talk-data.com

Topic

Data Engineering

etl data_pipelines big_data

16 tagged

Activity Trend: 127 peak per quarter, 2020-Q1 to 2026-Q1

Activities

Filtering by: O'Reilly Data Science Books

Microsoft Power BI Data Analyst Associate Study Guide

Passing the PL-300 exam with 2025 revisions isn't just about memorization; you need to thoroughly know the basic features of Power BI. However, data professionals must also apply best practices that make Power BI solutions scalable and future-proof. The first half of this go-to companion by Paul Turley provides complete coverage of the PL-300 exam objectives for desktop and self-service users, while the second half equips you with necessary best practices and practical skills for real-world success after the exam. Immerse yourself in exam prep, practice questions, and hands-on references for applying time-tested design patterns in Power BI. You'll learn how to transform raw data into actionable insights using Power Query, DAX, and dimensional modeling. Perfect for data analysts and business intelligence developers, this guide shows how Power BI fits into modern data platforms like Azure and Microsoft Fabric, preparing you for the exam and for the evolving world of data engineering.

- Understand PL-300 exam topics and key prep strategies
- Discover scalable, enterprise-grade Power BI solutions using best practices
- Learn how to correctly apply Power Query, DAX, and visualizations in real-world scenarios, with real business data
- Uncover how to build for scale
- See how Power BI fits into modern architectures like Azure and Microsoft Fabric

Microsoft Fabric Analytics Engineer Associate Certification Companion: Preparation for DP-600 Microsoft Certification

As organizations increasingly leverage Microsoft Fabric to unify their data engineering, analytics, and governance strategies, the role of the Fabric Analytics Engineer has become more crucial than ever. This book equips readers with the knowledge and hands-on skills required to excel in this domain and pass the DP-600 certification exam confidently. This book covers the entire certification syllabus with clarity and depth, beginning with an overview of Microsoft Fabric. You will gain an understanding of the platform’s architecture and how it integrates with data and AI workloads to provide a unified analytics solution. You will then delve into implementing a data warehouse in Microsoft Fabric, exploring techniques to ingest, transform, and store data efficiently. Next, you will learn how to work with semantic models in Microsoft Fabric, enabling you to create intuitive, meaningful data representations for visualization and reporting. Then, you will focus on administration and governance in Microsoft Fabric, emphasizing best practices for security, compliance, and efficient management of analytics solutions. Lastly, you will find detailed practice tests and exam strategies along with supplementary materials to reinforce key concepts. After reading the book, you will have the background and capability to learn the skills and concepts necessary both to pass the DP-600 exam and to become a confident Fabric Analytics Engineer.

What You Will Learn
- A complete understanding of all DP-600 certification exam objectives and requirements
- Key concepts and terminology related to Microsoft Fabric Analytics
- Step-by-step preparation for successfully passing the DP-600 certification exam
- Insights into exam structure, question patterns, and strategies for tackling challenging sections
- Confidence in demonstrating skills validated by the Microsoft Certified: Fabric Analytics Engineer Associate credential

Who This Book Is For
Data engineers, analysts, and professionals with some experience in data engineering or analytics, seeking to expand their knowledge of Microsoft Fabric

Time Series Analysis with Spark

Time Series Analysis with Spark provides a practical introduction to leveraging Apache Spark and Databricks for time series analysis. You'll learn to prepare, model, and deploy robust and scalable time series solutions for real-world applications. From data preparation to advanced generative AI techniques, this guide prepares you to excel in big data analytics.

What this Book will help me do
- Understand the core concepts and architectures of Apache Spark for time series analysis.
- Learn to clean, organize, and prepare time series data for big data environments.
- Gain expertise in choosing, building, and training various time series models tailored to specific projects.
- Master techniques to scale your models in production using Spark and Databricks.
- Explore the integration of advanced technologies such as generative AI to enhance predictions and derive insights.

Author(s)
Yoni Ramaswami, a Senior Solutions Architect at Databricks, has extensive experience in data engineering and AI solutions. With a focus on creating innovative big data and AI strategies across industries, Yoni authored this book to empower professionals to efficiently handle time series data. Yoni's approachable style ensures that both foundational concepts and advanced techniques are accessible to readers.

Who is it for?
This book is ideal for data engineers, machine learning engineers, data scientists, and analysts interested in enhancing their expertise in time series analysis using Apache Spark and Databricks. Whether you're new to time series or looking to refine your skills, you'll find both foundational insights and advanced practices explained clearly. A basic understanding of Spark is helpful but not required.

Fundamentals of Analytics Engineering

Master the art and science of analytics engineering with 'Fundamentals of Analytics Engineering.' This book takes you on a comprehensive journey from understanding foundational concepts to implementing end-to-end analytics solutions. You'll gain not just theoretical knowledge but practical expertise in building scalable, robust data platforms to meet organizational needs.

What this Book will help me do
- Design and implement effective data pipelines leveraging modern tools like Airbyte, BigQuery, and dbt.
- Adopt best practices for data modeling and schema design to enhance system performance and develop clearer data structures.
- Learn advanced techniques for ensuring data quality, governance, and observability in your data solutions.
- Master collaborative coding practices, including version control with Git and strategies for maintaining well-documented codebases.
- Automate and manage data workflows efficiently using CI/CD pipelines and workflow orchestrators.

Author(s)
Dumky De Wilde, alongside six co-authors who are experienced professionals from various facets of the analytics field, delivers a cohesive exploration of analytics engineering. The authors blend their expertise in software development, data analysis, and engineering to offer actionable advice and insights. Their approachable style makes complex concepts understandable.

Who is it for?
This book is a perfect fit for data analysts and engineers curious about transitioning into analytics engineering. Aspiring professionals as well as seasoned analytics engineers looking to deepen their understanding of modern practices will find guidance. It's tailored for individuals aiming to boost their career trajectory in data engineering roles, addressing fundamental to advanced topics.

Mastering Microsoft Fabric: SAASification of Analytics

Learn and explore the capabilities of Microsoft Fabric, the latest evolution in cloud analytics suites. This book will help you understand how users can leverage a Microsoft Office-equivalent experience for performing data management and advanced analytics activities. The book starts with an overview of the analytics evolution from on-premises to cloud infrastructure as a service (IaaS), platform as a service (PaaS), and now software as a service (SaaS), and provides an introduction to Microsoft Fabric. You will learn how to provision Microsoft Fabric in your tenant along with the key capabilities of SaaS analytics products and the advantage of using Fabric in the enterprise analytics platform. OneLake and Lakehouse for data engineering are discussed, as well as OneLake for data science. Author Ghosh teaches you about data warehouse offerings inside Microsoft Fabric and the new data integration experience, which brings Azure Data Factory and the Power Query Editor of Power BI together in a single platform. Also demonstrated is Real-Time Analytics in Fabric, including capabilities such as Kusto query and database. You will understand how the new event stream feature integrates with OneLake and other computations. You will also know how to configure the real-time alert capability in a zero-code manner and go through the Power BI experience in the Fabric workspace. Fabric pricing and licensing are also covered. After reading this book, you will understand the capabilities of Microsoft Fabric and its integration with current and upcoming Azure OpenAI capabilities.

What You Will Learn
- Build OneLake for all data like OneDrive for Microsoft Office
- Leverage shortcuts for cross-cloud data virtualization in Azure and AWS
- Understand upcoming OpenAI integration
- Discover new event streaming and Kusto query inside Fabric real-time analytics
- Utilize seamless tooling for machine learning and data science

Who This Book Is For
Citizen users and experts in the data engineering and data science fields, along with chief AI officers

Data Science: The Hard Parts

This practical guide provides a collection of techniques and best practices that are generally overlooked in most data engineering and data science pedagogy. A common misconception is that great data scientists are experts in the "big themes" of the discipline: machine learning and programming. But most of the time, these tools can only take us so far. In practice, the smaller tools and skills really separate a great data scientist from a not-so-great one. Taken as a whole, the lessons in this book make the difference between an average data scientist candidate and a qualified data scientist working in the field. Author Daniel Vaughan has collected, extended, and used these skills to create value and train data scientists from different companies and industries.

With this book, you will:
- Understand how data science creates value
- Deliver compelling narratives to sell your data science project
- Build a business case using unit economics principles
- Create new features for an ML model using storytelling
- Learn how to decompose KPIs
- Perform growth decompositions to find root causes for changes in a metric

Daniel Vaughan is head of data at Clip, the leading paytech company in Mexico. He's the author of Analytical Skills for AI and Data Science (O'Reilly).

Serverless Analytics with Amazon Athena

Delve into the serverless world of Amazon Athena with the comprehensive book 'Serverless Analytics with Amazon Athena'. This guide introduces you to the power of Athena, showing you how to efficiently query data in Amazon S3 using SQL without the hassle of managing infrastructure. With clear instructions and practical examples, you'll master querying structured, unstructured, and semi-structured data seamlessly.

What this Book will help me do
- Effectively query and analyze both structured and unstructured data stored in S3 using Amazon Athena.
- Integrate Athena with other AWS services to create powerful, secure, and cost-efficient data workflows.
- Develop ETL pipelines and machine learning workflows leveraging Athena's compatibility with AWS Glue.
- Monitor and troubleshoot Athena queries for consistent performance and build scalable serverless data solutions.
- Implement security best practices and optimize costs when managing your Athena-driven data solutions.

Author(s)
Virtuoso, along with co-authors Mert Turkay Hocanin and Wishnick, brings a wealth of experience in cloud solutions, serverless technologies, and data engineering. They excel in demystifying complex technical topics and have a passion for empowering readers with practical skills and knowledge.

Who is it for?
This book is tailored for business intelligence analysts, application developers, and system administrators who want to harness Amazon Athena for seamless, cost-efficient data analytics. It suits individuals with basic SQL knowledge looking to expand their capabilities in querying and processing data. Whether you're managing growing datasets or building data-driven applications, this book provides the know-how to get it right.

Power Query Cookbook

The "Power Query Cookbook" is your comprehensive guide to mastering data preparation and transformation using Power Query. With this book, you'll learn to connect to data sources, reshape data to fit business requirements, and use both no-code transformations and custom M code solutions to unlock the full potential of your data. Step-by-step examples will guide you through optimizing dataflows in Power BI.

What this Book will help me do
- Master connecting to various data sources and performing intuitive transformations using Power Query.
- Learn to reshape and enrich data to meet complex business requirements efficiently.
- Explore advanced capabilities of Power Query, including M code and online dataflows.
- Develop custom data transformations with a blend of GUI-based and M code techniques.
- Optimize the performance of Power BI Dataflows using best practices and diagnostics tools.

Author(s)
Janicijevic is a seasoned expert in data analytics, specializing in Microsoft Power BI and Power Query. With years of experience in data engineering and a passion for teaching, Janicijevic brings a clear, actionable, and results-driven approach to demystifying complex technical concepts. Their work empowers professionals with the tools they need to excel in data-driven decision-making.

Who is it for?
This book is designed for data analysts, business intelligence developers, and data engineers aiming to enhance their skills in data preparation using Power Query. If you have a basic understanding of Power BI and want to delve into integrating and optimizing data from multiple sources, this book is for you. It's ideal for professionals seeking practical insights and techniques to improve data transformations. Novices with some exposure to BI tools will also find the material accessible and rewarding.

Data Science on AWS

With this practical book, AI and machine learning practitioners will learn how to successfully build and deploy data science projects on Amazon Web Services. The Amazon AI and machine learning stack unifies data science, data engineering, and application development to help level up your skills. This guide shows you how to build and run pipelines in the cloud, then integrate the results into applications in minutes instead of days. Throughout the book, authors Chris Fregly and Antje Barth demonstrate how to reduce cost and improve performance.

- Apply the Amazon AI and ML stack to real-world use cases for natural language processing, computer vision, fraud detection, conversational devices, and more
- Use automated machine learning to implement a specific subset of use cases with SageMaker Autopilot
- Dive deep into the complete model development lifecycle for a BERT-based NLP use case including data ingestion, analysis, model training, and deployment
- Tie everything together into a repeatable machine learning operations pipeline
- Explore real-time ML, anomaly detection, and streaming analytics on data streams with Amazon Kinesis and Managed Streaming for Apache Kafka
- Learn security best practices for data science projects and workflows including identity and access management, authentication, authorization, and more

Practical Time Series Analysis

Time series data analysis is increasingly important due to the massive production of such data through the internet of things, the digitalization of healthcare, and the rise of smart cities. As continuous monitoring and data collection become more common, the need for competent time series analysis with both statistical and machine learning techniques will increase. Covering innovations in time series data analysis and use cases from the real world, this practical guide will help you solve the most common data engineering and analysis challenges in time series, using both traditional statistical and modern machine learning techniques. Author Aileen Nielsen offers an accessible, well-rounded introduction to time series in both R and Python that will have data scientists, software engineers, and researchers up and running quickly.

You’ll get the guidance you need to confidently:
- Find and wrangle time series data
- Undertake exploratory time series data analysis
- Store temporal data
- Simulate time series data
- Generate and select features for a time series
- Measure error
- Forecast and classify time series with machine or deep learning
- Evaluate accuracy and performance

Practical Data Science with Python 3: Synthesizing Actionable Insights from Data

Gain insight into essential data science skills in a holistic manner using data engineering and associated scalable computational methods. This book covers the most popular Python 3 frameworks for both local and distributed (on-premises and cloud-based) processing. Along the way, you will be introduced to many popular open-source frameworks, like SciPy, scikit-learn, Numba, Apache Spark, etc. The book is structured around examples, so you will grasp core concepts via case studies and Python 3 code. As data science projects get continuously larger and more complex, software engineering knowledge and experience are crucial to produce evolvable solutions. You'll see how to create maintainable software for data science and how to document data engineering practices. This book is a good starting point for people who want to gain practical skills to perform data science. All the code will be available in the form of IPython notebooks and Python 3 programs, which allow you to reproduce all analyses from the book and customize them for your own purpose. You'll also benefit from advanced topics like Machine Learning, Recommender Systems, and Security in Data Science. Practical Data Science with Python will empower you to analyze data, formulate proper questions, and produce actionable insights, three core stages in most data science endeavors.

What You'll Learn
- Play the role of a data scientist when completing increasingly challenging exercises using Python 3
- Work with proven data science techniques/technologies
- Review scalable software engineering practices to ramp up data analysis abilities in the realm of Big Data
- Apply theory of probability, statistical inference, and algebra to understand the data science practices

Who This Book Is For
Anyone who would like to embark into the realm of data science using Python 3.

Practical Data Science: A Guide to Building the Technology Stack for Turning Data Lakes into Business Assets

Learn how to build a data science technology stack and perform good data science with repeatable methods. You will learn how to turn data lakes into business assets. The data science technology stack demonstrated in Practical Data Science is built from components in general use in the industry. Data scientist Andreas Vermeulen demonstrates in detail how to build and provision a technology stack to yield repeatable results. He shows you how to apply practical methods to extract actionable business knowledge from data lakes consisting of data from a polyglot of data types and dimensions.

What You'll Learn
- Become fluent in the essential concepts and terminology of data science and data engineering
- Build and use a technology stack that meets industry criteria
- Master the methods for retrieving actionable business knowledge
- Coordinate the handling of polyglot data types in a data lake for repeatable results

Who This Book Is For
Data scientists and data engineers who are required to convert data from a data lake into actionable knowledge for their business, and students who aspire to be data scientists and data engineers

Python Web Scraping Cookbook

Python Web Scraping Cookbook is your comprehensive guide to building efficient and functional web scraping tools using Python. With practical recipes, you'll learn to overcome the challenges of dynamic content, captcha, and irregular web structures while deploying scalable solutions.

What this Book will help me do
- Master the use of Python libraries like BeautifulSoup and Scrapy for scraping data.
- Perfect techniques for handling JavaScript-heavy sites using Selenium.
- Learn to overcome web scraping challenges, such as captchas and rate-limiting.
- Design scalable scraping pipelines with cloud deployment in AWS.
- Understand web data extraction techniques with XPath, CSS selectors, and more.

Author(s)
Michael Heydt is a seasoned software engineer and technical author with a focus on data engineering and cloud solutions. Having worked with Python extensively, he brings real-world insights into web scraping. His practical approach simplifies complex concepts.

Who is it for?
This book is perfect for Python developers and data enthusiasts keen to master web scraping techniques. If you're a programmer with insights into Python scripting and wish to scrape, analyze, and utilize web data efficiently, this book is for you.

Scala: Guide for Data Science Professionals

Scala will be a valuable tool to have on hand during your data science journey for everything from data cleaning to cutting-edge machine learning.

About This Book
- Build data science and data engineering solutions with ease
- An in-depth look at each stage of the data analysis process, from reading and collecting data to distributed analytics
- Explore a broad variety of data processing, machine learning, and genetic algorithms through diagrams, mathematical formulations, and source code

Who This Book Is For
This learning path is perfect for those who are comfortable with Scala programming and now want to enter the field of data science. Some knowledge of statistics is expected.

What You Will Learn
- Transfer and filter tabular data to extract features for machine learning
- Read, clean, transform, and write data to both SQL and NoSQL databases
- Create Scala web applications that couple with JavaScript libraries such as D3 to create compelling interactive visualizations
- Load data from HDFS and Hive with ease
- Run streaming and graph analytics in Spark for exploratory analysis
- Bundle and scale up Spark jobs by deploying them into a variety of cluster managers
- Build dynamic workflows for scientific computing
- Leverage open source libraries to extract patterns from time series
- Master probabilistic models for sequential data

In Detail
Scala is especially good for analyzing large sets of data as the scale of the task doesn’t have any significant impact on performance. Scala’s powerful functional libraries can interact with databases and build scalable frameworks, resulting in the creation of robust data pipelines. The first module introduces you to Scala libraries to ingest, store, manipulate, process, and visualize data. Using real-world examples, you will learn how to design scalable architecture to process and model data, starting from simple concurrency constructs and progressing to actor systems and Apache Spark. After this, you will also learn how to build interactive visualizations with web frameworks. Once you have become familiar with all the tasks involved in data science, you will explore data analytics with Scala in the second module. You’ll see how Scala can be used to make sense of data through easy-to-follow recipes. You will learn about Bokeh bindings for exploratory data analysis and quintessential machine learning algorithms with the Spark ML library. You’ll get a sufficient understanding of Spark Streaming, machine learning for streaming data, and Spark GraphX. Armed with a firm understanding of data analysis, you will be ready to explore the most cutting-edge aspect of data science: machine learning. The final module teaches you the A to Z of machine learning with Scala. You’ll explore Scala for dependency injections and implicits, which are used to write machine learning algorithms. You’ll also explore machine learning topics such as clustering, dimensionality reduction, Naïve Bayes, regression models, SVMs, neural networks, and more.

This learning path combines some of the best that Packt has to offer into one complete, curated package. It includes content from the following Packt products:
- Scala for Data Science, Pascal Bugnion
- Scala Data Analysis Cookbook, Arun Manivannan
- Scala for Machine Learning, Patrick R. Nicolas

Style and approach
A complete package with all the information necessary to start building useful data engineering and data science solutions straight away. It contains a diverse set of recipes that cover the full spectrum of interesting data analysis tasks and will help you revolutionize your data analysis skills using Scala.

Downloading the example code for this book
You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the code file.

Big Data Analytics with R

Unlock the potential of big data analytics by mastering R programming with this comprehensive guide. This book takes you step-by-step through real-world scenarios where R's capabilities shine, providing you with practical skills to handle, process, and analyze large and complex datasets effectively.

What this Book will help me do
- Understand the latest big data processing methods and how R can enhance their application.
- Set up and use big data platforms such as Hadoop and Spark in conjunction with R.
- Utilize R for practical big data problems, such as analyzing consumption and behavioral datasets.
- Integrate R with SQL and NoSQL databases to maximize its versatility in data management.
- Discover advanced machine learning implementations using R and Spark MLlib for predictive analytics.

Author(s)
Walkowiak is an experienced data analyst and R programming expert with a passion for data engineering and machine learning. With a deep knowledge of big data platforms and extensive teaching experience, they bring a clear and approachable writing style to help learners excel.

Who is it for?
Ideal for data analysts, scientists, and engineers with fundamental data analysis knowledge looking to enhance their big data capabilities using R. If you aim to adapt R for large-scale data management and analysis workflows, this book is your ideal companion to bridge the gap.

Doing Data Science

Now that people are aware that data can make the difference in an election or a business model, data science as an occupation is gaining ground. But how can you get started working in a wide-ranging, interdisciplinary field that’s so clouded in hype? This insightful book, based on Columbia University’s Introduction to Data Science class, tells you what you need to know. In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.

Topics include:
- Statistical inference, exploratory data analysis, and the data science process
- Algorithms
- Spam filters, Naive Bayes, and data wrangling
- Logistic regression
- Financial modeling
- Recommendation engines and causality
- Data visualization
- Social networks and data journalism
- Data engineering, MapReduce, Pregel, and Hadoop

Doing Data Science is a collaboration between course instructor Rachel Schutt, Senior VP of Data Science at News Corp, and data science consultant Cathy O’Neil, a senior data scientist at Johnson Research Labs, who attended and blogged about the course.