talk-data.com

Topic: Analytics (tags: data_analysis, insights, metrics), 395 tagged activities

Activity trend: peak of 398 per quarter, 2020-Q1 through 2026-Q1

Activities
Filtered by: O'Reilly Data Engineering Books
Data Engineering with AWS - Second Edition

Learn data engineering and modern data pipeline design with AWS in this comprehensive guide. You will explore key AWS services like S3, Glue, Redshift, and QuickSight to ingest, transform, and analyze data, and you'll gain hands-on experience creating robust, scalable solutions.

What this book will help me do:
- Understand and implement data ingestion and transformation processes using AWS tools.
- Optimize data for analytics with advanced AWS-powered workflows.
- Build end-to-end modern data pipelines leveraging cutting-edge AWS technologies.
- Design data governance strategies using AWS services for security and compliance.
- Visualize data and extract insights using Amazon QuickSight and other tools.

Author(s): Gareth Eagar is a Senior Data Architect with over 25 years of experience in designing and implementing data solutions across various industries. He combines his deep technical expertise with a passion for teaching, aiming to make complex concepts approachable for learners at all levels.

Who is it for? This book is intended for current or aspiring data engineers, data architects, and analysts seeking to leverage AWS for data engineering. It suits beginners with a basic understanding of data concepts who want to gain practical experience, as well as intermediate professionals aiming to expand into AWS-based systems.

Designing a Modern Application Data Stack

Today's massive datasets represent an unprecedented opportunity for organizations to build data-intensive applications. With this report, product leads, architects, and others who deal with applications and application development will explore why a cloud data platform is a great fit for data-intensive applications. You'll learn how to carefully consider scalability, data processing, and application distribution when making data app design decisions.

Cloud data platforms are the modern infrastructure choice for data applications, as they offer improved scalability, elasticity, and cost efficiency. With a better understanding of data-intensive application architectures on cloud-based data platforms and the best practices outlined in this report, application teams can take full advantage of advances in data processing and app distribution to accelerate development, deployment, and adoption cycles.

With this insightful report, you will:
- Learn why a modern cloud data platform is essential for building data-intensive applications
- Explore how scalability, data processing, and distribution models are key for today's data apps
- Implement best practices to improve application scalability and simplify data processing for efficiency gains
- Modernize application distribution plans to meet the needs of app providers and consumers

About the authors: Adam Morton works with Intelligen Group, a Snowflake pure-play data and analytics consultancy. Kevin McGinley is technical director of the Snowflake customer acceleration team. Brad Culberson is a data platform architect specializing in data applications at Snowflake.

Delta Lake: Up and Running

With the surge in big data and AI, organizations can rapidly create data products. However, the effectiveness of their analytics and machine learning models depends on the data's quality. Delta Lake's open source format offers a robust lakehouse framework over platforms like Amazon S3, ADLS, and GCS. This practical book shows data engineers, data scientists, and data analysts how to get Delta Lake and its features up and running.

The ultimate goal of building data pipelines and applications is to gain insights from data. You'll understand how your storage solution choice determines the robustness and performance of the data pipeline, from raw data to insights.

You'll learn how to:
- Use modern data management and data engineering techniques
- Understand how ACID transactions bring reliability to data lakes at scale
- Run streaming and batch jobs against your data lake concurrently
- Execute update, delete, and merge commands against your data lake
- Use time travel to roll back and examine previous data versions
- Build a streaming data quality pipeline following the medallion architecture
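The medallion architecture named in that last bullet can be sketched without Spark or Delta Lake itself. Below is a minimal pure-Python illustration of the bronze-silver-gold flow (raw ingest, validation, aggregation); the records and the cleaning rule are invented for the example, and none of this is Delta Lake API code.

```python
from collections import defaultdict

# Bronze layer: raw events land as-is, including malformed records
# (illustrative data, standing in for a raw Delta table).
bronze = [
    {"user": "a", "amount": "10"},
    {"user": "b", "amount": "oops"},  # bad record, dropped at the silver stage
    {"user": "a", "amount": "5"},
]

def to_silver(rows):
    """Silver layer: validate and type-cast; drop rows that fail parsing."""
    clean = []
    for row in rows:
        try:
            clean.append({"user": row["user"], "amount": float(row["amount"])})
        except ValueError:
            continue  # in a real pipeline this row would be quarantined
    return clean

def to_gold(rows):
    """Gold layer: business-level aggregate, ready for analytics."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["user"]] += row["amount"]
    return dict(totals)

gold = to_gold(to_silver(bronze))
print(gold)  # {'a': 15.0}
```

In Delta Lake each layer would be a versioned table with ACID guarantees and time travel; the point here is only the progressive refinement from raw to curated data.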

Amazon Redshift: The Definitive Guide

Amazon Redshift powers analytic cloud data warehouses worldwide, from startups to some of the largest enterprise data warehouses available today. This practical guide thoroughly examines this managed service and demonstrates how you can use it to extract value from your data immediately, rather than go through the heavy lifting required to run a typical data warehouse.

Analytic specialists Rajesh Francis, Rajiv Gupta, and Milind Oke detail Amazon Redshift's underlying mechanisms and options to help you explore out-of-the-box automation. Whether you're a data engineer who wants to learn the art of the possible or a DBA looking to take advantage of machine learning-based auto-tuning, this book helps you get the most value from Amazon Redshift. By understanding Amazon Redshift features, you'll achieve excellent analytic performance at the best price, with the least effort.

This book helps you:
- Build a cloud data strategy around Amazon Redshift as a foundational data warehouse
- Get started with Amazon Redshift with simple-to-use data models and design best practices
- Understand how and when to use Redshift Serverless and Redshift provisioned clusters
- Take advantage of auto-tuning options inherent in Amazon Redshift and understand manual tuning options
- Transform your data platform for predictive analytics using Redshift ML and break silos using data sharing
- Learn best practices for security, monitoring, resilience, and disaster recovery
- Leverage Amazon Redshift integration with other AWS services to unlock additional value

The Unrealized Opportunities with Real-Time Data

The amount of data generated from various processes and platforms has increased exponentially in the past decade, and the challenges of filtering useful data out of streams of raw data have become even greater. Meanwhile, extracting useful insights from that data has become even more important. In this incisive report, Federico Castanedo examines the challenges companies face when acting on data at rest, as well as the benefits you unlock when acting on data as it's generated. Data engineers, enterprise architects, CTOs, and CIOs will explore the tools, processes, and mindset your company needs to process streaming data in real time. Learn how to make quick data-driven decisions to gain an edge on competitors.

This report helps you:
- Explore gaps in today's real-time data architectures, including the limitations of real-time analytics to act on data immediately
- Examine use cases that can't be served efficiently with real-time analytics
- Understand how stream processing engines work with real-time data
- Learn how distributed data processing architectures, stream processing, streaming analytics, and event-based architectures relate to real-time data
- Understand how to transition from traditional batch processing environments to stream processing

Federico Castanedo is an academic director and adjunct professor at IE University in Spain. A data science and AI leader, he has extensive experience in academia, industry, and startups.

Building Real-Time Analytics Systems

Gain deep insight into real-time analytics, including the features of these systems and the problems they solve. With this practical book, data engineers at organizations that use event-processing systems such as Kafka, Google Pub/Sub, and AWS Kinesis will learn how to analyze data streams in real time. The faster you derive insights, the quicker you can spot changes in your business and act accordingly.

Author Mark Needham from StarTree provides an overview of the real-time analytics space and an understanding of what goes into building real-time applications. The book's second part offers a series of hands-on tutorials that show you how to combine multiple software products to build real-time analytics applications for an imaginary pizza delivery service.

You will:
- Learn common architectures for real-time analytics
- Discover how event processing differs from real-time analytics
- Ingest event data from Apache Kafka into Apache Pinot
- Combine event streams with OLTP data using Debezium and Kafka Streams
- Write real-time queries against event data stored in Apache Pinot
- Build a real-time dashboard and order tracking app
- Learn how Uber, Stripe, and Just Eat use real-time analytics
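The difference between event processing and real-time analytics often comes down to maintaining incremental aggregates that queries can read instantly instead of rescanning history. As a rough sketch of that idea (plain Python, not Pinot or Kafka code; the timestamps and the 60-second window are invented for the example):

```python
from collections import defaultdict

class TumblingWindowCounter:
    """Keep per-window event counts up to date as events arrive, so a
    dashboard query is a dictionary lookup rather than a full scan."""

    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self.counts = defaultdict(int)

    def ingest(self, event_ts):
        bucket = event_ts - (event_ts % self.window)  # start of the window
        self.counts[bucket] += 1

    def count_at(self, ts):
        return self.counts[ts - (ts % self.window)]

# Simulated pizza-order timestamps in seconds (illustrative)
counter = TumblingWindowCounter()
for ts in [0, 10, 59, 60, 61, 125]:
    counter.ingest(ts)

print(counter.count_at(30))  # 3 orders fell in the window [0, 60)
```

Systems like Pinot maintain aggregates of this sort across distributed segments, which is what makes subsecond queries over fresh events possible.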

MCA Microsoft Certified Associate Azure Data Engineer Study Guide

Prepare for the Azure Data Engineering certification, and an exciting new career in analytics, with this must-have study aid.

In the MCA Microsoft Certified Associate Azure Data Engineer Study Guide: Exam DP-203, accomplished data engineer and tech educator Benjamin Perkins delivers a hands-on, practical guide to preparing for the challenging Azure Data Engineer certification and for a new career in an exciting and growing field of tech. In the book, you'll explore all the objectives covered on the DP-203 exam while learning the job roles and responsibilities of a newly minted Azure data engineer: integrating, transforming, and consolidating data from various structured and unstructured data systems into a structure that is suitable for building analytics solutions. You'll get up to speed quickly and efficiently with Sybex's easy-to-use study aids and tools.

This Study Guide also offers:
- Career-ready advice for anyone hoping to ace their first data engineering job interview and excel on their first day in the field
- Indispensable tips and tricks to familiarize yourself with the DP-203 exam structure and help reduce test anxiety
- Complimentary access to Sybex's expansive online study tools, accessible across multiple devices, offering hundreds of bonus practice questions, electronic flashcards, and a searchable, digital glossary of key terms

A one-of-a-kind study aid designed to help you get straight to the crucial material you need to succeed on the exam and on the job, this book belongs on the bookshelf of anyone hoping to increase their data analytics skills, advance their data engineering career with an in-demand certification, or make a career change into a popular new area of tech.

Serverless Machine Learning with Amazon Redshift ML

Serverless Machine Learning with Amazon Redshift ML provides a hands-on guide to using Amazon Redshift Serverless and Redshift ML for building and deploying machine learning models. Through SQL-focused examples and practical walkthroughs, you will learn efficient techniques for cloud data analytics and serverless machine learning.

What this book will help me do:
- Grasp the workflow of building machine learning models with Redshift ML using SQL.
- Learn to handle supervised learning tasks like classification and regression.
- Apply unsupervised learning techniques, such as K-means clustering, in Redshift ML.
- Develop time-series forecasting models within Amazon Redshift.
- Understand how to operationalize machine learning in serverless cloud architecture.

Author(s): Debu Panda, Phil Bates, Bhanu Pittampally, and Sumeet Joshi are seasoned professionals in cloud computing and machine learning technologies. They combine deep technical knowledge with teaching expertise to guide learners through mastering Amazon Redshift ML. Their collaborative approach ensures that the content is accessible, engaging, and practically applicable.

Who is it for? This book is perfect for data scientists, machine learning engineers, and database administrators using or intending to use Amazon Redshift. It's tailored for professionals with basic knowledge of machine learning and SQL who aim to enhance their efficiency and specialize in serverless machine learning within cloud architectures.
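In Redshift ML, the K-means clustering mentioned above is launched from SQL with a CREATE MODEL statement and trained behind the scenes. To show what the algorithm itself does, here is a minimal pure-Python sketch on one-dimensional data; the points, starting centers, and iteration count are invented for the example, and this is not Redshift code.

```python
def kmeans_1d(points, centers, iterations=10):
    """Plain K-means on 1-D data: assign each point to its nearest
    center, then move each center to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        # Empty clusters keep their old center.
        centers = [sum(ps) / len(ps) if ps else c
                   for c, ps in clusters.items()]
    return sorted(centers)

# Two obvious groups, around 1 and around 10 (illustrative data)
data = [0.9, 1.0, 1.1, 9.8, 10.0, 10.2]
print(kmeans_1d(data, centers=[0.0, 5.0]))  # ≈ [1.0, 10.0]
```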

High-Performance Data Architectures

By choosing the right database, you can maximize your business potential, improve performance, increase efficiency, and gain a competitive edge. This insightful report examines the benefits of using a simplified data architecture containing cloud-based HTAP (hybrid transactional and analytical processing) database capabilities. You'll learn how this data architecture can help data engineers and data decision makers focus on what matters most: growing your business.

Authors Joe McKendrick and Ed Huang explain how cloud native infrastructure supports enterprise businesses and operations with a much more agile foundation. Just one layer up from the infrastructure, cloud-based databases are a crucial part of data management and analytics. Learn how distributed SQL databases with HTAP capabilities provide more efficient and streamlined data processing to improve cost efficiency and expedite business operations and decision making.

This report helps you:
- Explore industry trends in database development
- Learn the benefits of a simplified data architecture
- Comb through the complex and crowded database choices on the market
- Examine the process of selecting the right database for your business
- Learn the latest database innovations for improving your company's efficiency and performance

Graph-Powered Analytics and Machine Learning with TigerGraph

With the rapid rise of graph databases, organizations are now implementing advanced analytics and machine learning solutions to help drive business outcomes. This practical guide shows data scientists, data engineers, architects, and business analysts how to get started with a graph database using TigerGraph, one of the leading graph databases available. You'll explore a three-stage approach to deriving value from connected data: connect, analyze, and learn.

Victor Lee, Phuc Kien Nguyen, and Alexander Thomas present real use cases covering several contemporary business needs. By diving into hands-on exercises using TigerGraph Cloud, you'll quickly become proficient at designing and managing advanced analytics and machine learning solutions for your organization.

You will:
- Use graph thinking to connect, analyze, and learn from data for advanced analytics and machine learning
- Learn how graph analytics and machine learning can deliver key business insights and outcomes
- Use five core categories of graph algorithms to drive advanced analytics and machine learning
- Deliver a real-time 360-degree view of core business entities, including customer, product, service, supplier, and citizen
- Discover insights from connected data through machine learning and advanced analytics
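One of the algorithm categories such a book covers is centrality, with PageRank as the classic example: a node matters more when important nodes point to it. The sketch below is a plain-Python illustration on a tiny invented graph; it is not TigerGraph GSQL, and the damping factor is the conventional 0.85 rather than a value from the book.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Iterative PageRank over an adjacency-list graph: each node
    repeatedly shares its rank across its outgoing edges."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, outs in graph.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for w in outs:
                    new[w] += share
            else:  # dangling node: spread its rank evenly
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank

# Tiny illustrative graph: both "a" and "b" point at "hub"
g = {"a": ["hub"], "b": ["hub"], "hub": ["a"]}
ranks = pagerank(g)
print(max(ranks, key=ranks.get))  # hub
```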

Data Engineering with dbt

Data Engineering with dbt provides a comprehensive guide to building modern, reliable data platforms using dbt and SQL. You'll gain hands-on experience building automated ELT pipelines, using dbt Cloud with Snowflake, and embracing patterns for scalable and maintainable data solutions.

What this book will help me do:
- Set up and manage a dbt Cloud environment and create reliable ELT pipelines.
- Integrate Snowflake with dbt to implement robust data engineering workflows.
- Transform raw data into analytics-ready data using dbt's features and SQL.
- Apply advanced dbt functionality such as macros and Jinja for efficient coding.
- Ensure data accuracy and platform reliability with built-in testing and monitoring.

Author(s): Roberto Zagni is a seasoned data engineering professional with a wealth of experience in designing scalable data platforms. Through practical insights and real-world applications, Zagni demystifies complex data engineering practices. His approachable teaching style makes technical concepts accessible and actionable.

Who is it for? This book is perfect for data engineers, analysts, and analytics engineers looking to leverage dbt for data platform development. If you're a manager or decision maker interested in fostering efficient data workflows, or a professional with basic SQL knowledge aiming to deepen your expertise, this resource will be invaluable.
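Much of dbt's leverage comes from compiling Jinja expressions such as {{ ref('model_name') }} into physical relation names before the SQL runs, which is what lets models reference each other by name. The snippet below imitates that substitution step in plain Python; the regex renderer and the relation mapping are simplifications invented for the example, not dbt's actual implementation.

```python
import re

# Illustrative mapping of model names to physical relations,
# standing in for dbt's manifest.
RELATIONS = {
    "stg_orders": "analytics.staging.stg_orders",
    "customers": "analytics.marts.customers",
}

REF_PATTERN = re.compile(r"\{\{\s*ref\('([^']+)'\)\s*\}\}")

def compile_model(sql):
    """Replace each {{ ref('name') }} with its physical relation name."""
    return REF_PATTERN.sub(lambda m: RELATIONS[m.group(1)], sql)

model_sql = ("select * from {{ ref('stg_orders') }} "
             "join {{ ref('customers') }} using (customer_id)")
print(compile_model(model_sql))
```

Running this prints the model with both references resolved to schema-qualified names, which is essentially the transformation dbt performs at compile time.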

Geospatial Data Analytics on AWS

In Geospatial Data Analytics on AWS, you will learn how to store, manage, and analyze geospatial data effectively using various AWS services. This book provides insight into building geospatial data lakes, leveraging AWS databases, and applying best practices to derive insights from spatial data in the cloud.

What this book will help me do:
- Design and manage geospatial data lakes on AWS, leveraging S3 and other storage solutions.
- Analyze geospatial data using AWS services such as Athena and Redshift.
- Utilize machine learning models for geospatial data processing and analytics using SageMaker.
- Visualize geospatial data through services like Amazon QuickSight and OpenStreetMap integration.
- Avoid common pitfalls when managing geospatial data in the cloud.

Author(s): Scott Bateman, Janahan Gnanachandran, and Jeff DeMuth bring their extensive experience in cloud computing and geospatial analytics to this book. With backgrounds in cloud architecture, data science, and geospatial applications, they aim to make complex topics accessible. Their collaborative approach ensures readers can practically apply concepts to real-world challenges.

Who is it for? This book is ideal for GIS and data professionals, including developers, analysts, and scientists. It suits readers with a basic understanding of geographical concepts but no prior AWS experience. If you're aiming to enhance your cloud-based geospatial data management and analytics skills, this is the guide for you.
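A staple computation underneath most geospatial analytics, whether run in Athena, Redshift, or SageMaker, is great-circle distance between coordinate pairs. As a self-contained sketch (the standard haversine formula with illustrative coordinates; not AWS code):

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Seattle to Portland, roughly 234 km (illustrative coordinates)
print(round(haversine_km(47.6062, -122.3321, 45.5152, -122.6784)))
```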

Data for All

Do you know what happens to your personal data when you are browsing, buying, or using apps? Discover how your data is harvested and exploited, and what you can do to access, delete, and monetize it. Data for All empowers everyone, from tech experts to the general public, to control how third parties use personal data.

Read this eye-opening book to learn:
- The types of data you generate with every action, every day
- Where your data is stored, who controls it, and how much money they make from it
- How you can manage access and monetization of your own data
- How to restrict data access to only companies and organizations you want to support
- The history of how we think about data, and why that is changing
- The new data ecosystem being built right now for your benefit

The data you generate every day is the lifeblood of many large companies, and they make billions of dollars using it. In Data for All, bestselling author John K. Thompson outlines how this one-sided data economy is about to undergo a dramatic change. Thompson pulls back the curtain to reveal the true nature of data ownership, and how you can turn your data from a revenue stream for companies into a financial asset for your benefit.

About the Technology: New global laws are turning the tide on companies who make billions from your clicks, searches, and likes. This eye-opening book provides an inspiring vision of how you can take back control of the data you generate every day.

About the Book: Data for All gives you a step-by-step plan to transform your relationship with data and start earning a "data dividend": hundreds or thousands of dollars paid out simply for your online activities. You'll learn how to oversee who accesses your data, how much different types of data are worth, and how to keep private details private.

About the Reader: For anyone who is curious or concerned about how their data is used. No technical knowledge required.

About the Author: John K. Thompson is an international technology executive with over 37 years of experience in the fields of data, advanced analytics, and artificial intelligence.

Quotes:
"An honest, direct, pull-no-punches source on one of the most important personal issues of our time.... I changed some of my own behaviors after reading the book, and I suggest you do so as well. You have more to lose than you may think." - From the Foreword by Thomas H. Davenport, author of Competing on Analytics and The AI Advantage
"A must-read for anyone interested in the future of data. It helped me understand the reasons behind the current data ecosystem and the laws that are shaping its future. A great resource for both professionals and individuals. I highly recommend it." - Ravit Jain, Founder & Host of The Ravit Show, Data Science Evangelist

IBM Power System AC922 Technical Overview and Introduction

This IBM® Redpaper™ publication is a comprehensive guide that covers the IBM Power System AC922 server (8335-GTH and 8335-GTX models). The Power AC922 server is the next generation of the IBM POWER® processor-based systems, which are designed for deep learning (DL) and artificial intelligence (AI), high-performance analytics, and high-performance computing (HPC).

This paper introduces the major innovative Power AC922 server features and their relevant functions:
- Powerful IBM POWER9™ processors that offer up to 22 cores at up to 2.80 GHz (3.10 GHz turbo) performance, with up to 2 TB of memory
- IBM Coherent Accelerator Processor Interface (CAPI) 2.0, IBM OpenCAPI™, and second-generation NVIDIA NVLink 2.0 technology for exceptional processor-to-accelerator intercommunication
- Up to six dedicated NVIDIA Tesla V100 graphics processing units (GPUs)

This publication is for professionals who want to acquire a better understanding of IBM Power Systems™ products and is intended for the following audiences:
- Clients
- Sales and marketing professionals
- Technical support professionals
- IBM Business Partners
- Independent software vendors (ISVs)

This paper expands the set of IBM Power Systems documentation by providing a desktop reference that offers a detailed technical description of the Power AC922 server. This paper does not replace the current marketing materials and configuration tools. It is intended as an extra source of information that, together with existing sources, can be used to enhance your knowledge of IBM server solutions.

Automating Data Transformations

The modern data stack has evolved rapidly in the past decade. Yet, as enterprises migrate vast amounts of data from on-premises platforms to the cloud, data teams continue to face limitations executing data transformation at scale. Data transformation is an integral part of the analytics workflow, but it's also the most time-consuming, expensive, and error-prone part of the process.

In this report, Satish Jayanthi and Armon Petrossian examine key concepts that will enable you to automate data transformation at scale. IT decision makers, CTOs, and data team leaders will explore ways to democratize data transformation by shifting from activity-oriented to outcome-oriented teams: from manufacturing-line assembly to an approach that lets even junior analysts implement data transformations with only a brief code review.

With this insightful report, you will:
- Learn how successful data systems rely on simplicity, flexibility, user-friendliness, and a metadata-first approach
- Adopt a product-first mindset (data as a product, or DaaP) for developing data resources that focus on discoverability, understanding, trust, and exploration
- Build a transformation platform that delivers the most value, using a column-first approach
- Use data architecture as a service (DAaaS) to help teams build and maintain their own data infrastructure as they work collaboratively

About the authors: Armon Petrossian is CEO and cofounder of Coalesce. Previously, he was part of the founding team at WhereScape in North America, where he served as national sales manager for almost a decade. Satish Jayanthi is CTO and cofounder of Coalesce. Prior to that, he was senior solutions architect at WhereScape, where he met his cofounder Armon.

IBM FlashSystem 7300 Product Guide

This IBM® Redpaper Product Guide describes the IBM FlashSystem® 7300 solution, which is a next-generation IBM FlashSystem control enclosure. It combines the performance of flash and a Non-Volatile Memory Express (NVMe)-optimized architecture with the reliability and innovation of IBM FlashCore® technology and the rich feature set and high availability (HA) of IBM Spectrum® Virtualize.

To take advantage of artificial intelligence (AI)-enhanced applications, real-time big data analytics, and cloud architectures that require higher levels of system performance and storage capacity, enterprises around the globe are rapidly moving to modernize established IT infrastructures. However, for many organizations, staff resources and expertise are limited, and cost-efficiency is a top priority. These organizations have important investments in existing infrastructure that they want to maximize. They need enterprise-grade solutions that optimize cost-efficiency while simplifying the pathway to modernization. IBM FlashSystem 7300 is designed specifically for these requirements and use cases. It also delivers cyber resilience without compromising application performance.
IBM FlashSystem 7300 provides a rich set of software-defined storage (SDS) features that are delivered by IBM Spectrum Virtualize, including the following examples:
- Data reduction and deduplication
- Dynamic tiering
- Thin-provisioning
- Snapshots
- Cloning
- Replication and data copy services
- Cyber resilience
- Transparent Cloud Tiering (TCT)
- IBM HyperSwap®, including 3-site replication for high availability
- Scale-out and scale-up configurations that further enhance capacity and throughput for better availability

With the release of IBM Spectrum Virtualize V8.5, extra functions and features are available, including support for new third-generation IBM FlashCore Modules (NVMe-type drives) within the control enclosure, and 100 Gbps Ethernet adapters that provide NVMe Remote Direct Memory Access (RDMA) options. New software features include GUI enhancements, security enhancements such as multifactor authentication and single sign-on, and Fibre Channel (FC) portsets.

Snowflake SnowPro™ Advanced Architect Certification Companion: Hands-on Preparation and Practice

Master the intricacies of Snowflake and prepare for the SnowPro Advanced Architect Certification exam with this comprehensive study companion. This book provides robust and effective study tools to help you prepare for the exam and is also designed for those who are interested in learning the advanced features of Snowflake. The practical examples and in-depth background on theory in this book help you unleash the power of Snowflake in building a high-performance system. The best practices demonstrated in the book help you use Snowflake more powerfully and effectively as a data warehousing and analytics platform.

Reading this book and reviewing the concepts will help you gain the knowledge you need to take the exam. The book guides you through a study of the different domains covered on the exam: Accounts and Security, Snowflake Architecture, Data Engineering, and Performance Optimization. You'll also be well positioned to apply your newly acquired practical skills to real-world Snowflake solutions. You will have a deep understanding of Snowflake to help you take full advantage of Snowflake's architecture to deliver valuable analytics insights to your business.
What You Will Learn:
- Gain the knowledge you need to prepare for the exam
- Review in-depth theory on Snowflake to help you build high-performance systems
- Broaden your skills as a data warehouse designer to cover the Snowflake ecosystem
- Optimize performance and costs associated with your use of the Snowflake data platform
- Share data securely both inside your organization and with external partners
- Apply your practical skills to real-world Snowflake solutions

Who This Book Is For: Anyone who is planning to take the SnowPro Advanced Architect Certification exam, those who want to move beyond traditional database technologies and build their skills to design and architect solutions using Snowflake services, and veteran database professionals seeking an on-the-job reference to understand one of the newest and fastest-growing technologies in data.

What Every Engineer Should Know About Data-Driven Analytics

What Every Engineer Should Know About Data-Driven Analytics provides a comprehensive introduction to the machine learning concepts and approaches used in predictive data analytics, illustrated through practical applications and case studies.
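The simplest instance of the predictive modeling such a book builds toward is fitting a line to data by ordinary least squares. As a minimal sketch (the closed-form slope and intercept for a single feature; the data points are invented for the example):

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Noiseless illustrative data on the line y = 2x + 1
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # 2.0 1.0
```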

Principles of Data Fabric

In Principles of Data Fabric, you will gain a comprehensive understanding of Data Fabric solutions and architectures. This book provides a clear picture of how to design, implement, and optimize Data Fabric solutions to tackle complex data challenges. By the end, you'll be equipped with the knowledge to unify and leverage your organizational data efficiently.

What this book will help me do:
- Design and architect Data Fabric solutions tailored to specific organizational needs.
- Learn to integrate Data Fabric with DataOps and Data Mesh for holistic data management.
- Master the principles of Data Governance and self-service analytics within the Data Fabric.
- Implement best practices for distributed data management and regulatory compliance.
- Apply industry insights and frameworks to optimize Data Fabric deployment.

Author(s): Sonia Mezzetta, the author of Principles of Data Fabric, is an experienced data professional with a deep understanding of data management frameworks and architectures like Data Fabric, Data Mesh, and DataOps. With years of industry expertise, Sonia has helped organizations implement effective data strategies. Her writing combines technical know-how with an approachable style to enlighten and guide readers on their data journey.

Who is it for? This book is ideal for data engineers, data architects, and business analysts who seek to understand and implement Data Fabric solutions. It will also appeal to senior data professionals like Chief Data Officers aiming to integrate Data Fabric into their enterprises. Novice to intermediate knowledge of data management would be beneficial for readers. The content provides clear pathways to achieve actionable results in data strategies.

Building Real-Time Analytics Applications

Every organization needs insight to succeed and excel, and the primary foundation for insights today is data, whether it's internal data from operational systems or external data from partners, vendors, and public sources. But how can you use this data to create and maintain analytics applications capable of gaining real insights in real time?

In this report, Darin Briskman explains that leading organizations like Netflix, Walmart, and Confluent have found that while traditional analytics still have value, they are not enough. These companies and many others are now building real-time analytics that deliver insights continually, on demand, and at scale, complete with interactive drill-down data conversations, subsecond performance at scale, and always-on reliability.

Ideal for data engineers, data scientists, data architects, and software developers, this report helps you:
- Learn the elements of real-time analytics, including subsecond performance, high concurrency, and the combination of real-time and historical data
- Examine case studies that show how Netflix, Walmart, and Confluent have adopted real-time analytics
- Explore Apache Druid, the real-time database that powers real-time analytics applications
- Learn how to create real-time analytics applications through data design and interfaces
- Understand the importance of security, resilience, and managed services

Darin Briskman is director of technology at Imply Data, Inc., a software company committed to advancing open source technology and making it simple for developers to realize the power of Apache Druid.