talk-data.com

Topic

Python

programming_language data_science web_development

220

tagged

Activity Trend: 185 peak/qtr, 2020-Q1 to 2026-Q1

Activities

Showing filtered results

Filtering by: O'Reilly Data Science Books
Causal Inference with Bayesian Networks

Leverage the power of graphical models for probabilistic and causal inference to build knowledge-based system applications and to address causal effect queries with observational data for decision aiding and policy making. Key Features Gain a firm understanding of Bayesian networks and structured algorithms for probabilistic inference Acquire a comprehensive understanding of graphical models and their applications in causal inference Gain insights into real-world applications of causal models in multiple domains Enhance your coding skills in R and Python through hands-on examples of causal inference Book Description This is a practical guide that explores the theory and application of Bayesian networks (BN) for probabilistic and causal inference. The book provides step-by-step explanations of graphical models of BN and their structural properties; the causal interpretations of BN and the notion of conditioning by intervention; and the mathematical model of structural equations and the representation in structured causal models (SCM). For probabilistic inference in Bayesian networks, you will learn methods of variable elimination and tree clustering. For causal inference you will learn the computational framework of Pearl's do-calculus for the identification and estimation of causal effects with causal models. In the context of causal inference with observational data, you will be introduced to the potential outcomes framework and explore various classes of meta-learning algorithms that are used to estimate the conditional average treatment effect in causal inference. The book includes practical exercises using R and Python for you to engage in and solidify your understanding of different approaches to probabilistic and causal inference. By the end of this book, you will be able to build and deploy your own causal inference application. You will learn from causal inference sample use cases for diagnosis, epidemiology, social sciences, economics, and finance. 
What you will learn: Representation of knowledge with Bayesian networks; interpretation of conditional independence assumptions; interpretation of causality assumptions in graphical models; probabilistic inference with Bayesian networks; causal effect identification and estimation; machine learning methods for causal inference; coding in R and Python for probabilistic and causal inference.
Who this book is for: This book will serve as a valuable resource for a wide range of professionals, including data scientists, software engineers, policy analysts, decision-makers, and information technology professionals involved in developing expert systems or knowledge-based applications that deal with uncertainty, as well as researchers across diverse disciplines seeking insights into causal analysis and estimating treatment effects in randomized studies. The book will enable readers to leverage libraries in R and Python and build software prototypes for their own applications.

Time Series Analysis with Python Cookbook - Second Edition

Perform time series analysis and forecasting confidently with this Python code bank and reference manual. Purchase of the print or Kindle book includes a free PDF eBook.
Key Features: Explore up-to-date forecasting and anomaly detection techniques using statistical, machine learning, and deep learning algorithms; learn different techniques for evaluating, diagnosing, and optimizing your models; work with a variety of complex data with trends, multiple seasonal patterns, and irregularities.
Book Description: To use time series data to your advantage, you need to be well-versed in data preparation, analysis, and forecasting. This fully updated second edition includes chapters on probabilistic models and signal processing techniques, as well as new content on transformers. Additionally, you will leverage popular libraries and their latest releases, covering Pandas, Polars, sktime, statsmodels, statsforecast, Darts, and Prophet, for time series with new and relevant examples. You'll start by ingesting time series data from various sources and formats, and learn strategies for handling missing data, dealing with time zones and custom business days, and detecting anomalies using intuitive statistical methods. Further, you'll explore forecasting using classical statistical models (Holt-Winters, SARIMA, and VAR). You'll learn practical techniques for handling non-stationary data, using power transforms, ACF and PACF plots, and decomposing time series data with multiple seasonal patterns. You'll then move into more advanced topics such as building ML and DL models using TensorFlow and PyTorch, and explore probabilistic modeling techniques. In this part, you’ll also learn how to evaluate, compare, and optimize models, making sure that you finish this book well-versed in wrangling data with Python.
What you will learn: Understand what makes time series data different from other data; apply imputation and interpolation strategies to handle missing data; implement an array of models for univariate and multivariate time series; plot interactive time series visualizations using hvPlot; explore state-space models and the unobserved components model (UCM); detect anomalies using statistical and machine learning methods; forecast complex time series with multiple seasonal patterns; use conformal prediction to construct prediction intervals for time series.
Who this book is for: This book is for data analysts, business analysts, data scientists, data engineers, and Python developers who want practical Python recipes for time series analysis and forecasting techniques. Fundamental knowledge of Python programming is a prerequisite. Prior experience working with time series data to solve business problems will also help you to better utilize and apply the different recipes in this book.
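As a rough illustration of one topic in this listing, interpolation for missing data, here is a minimal pure-Python sketch. It is not taken from the book, which works with libraries such as Pandas; the function name `interpolate_linear` and the sample series are hypothetical, and the series is assumed to be evenly spaced.

```python
from typing import List, Optional


def interpolate_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps (None) by linear interpolation between known neighbours.

    Leading and trailing gaps are left as None, because there is no known
    value on one side to interpolate from.
    """
    filled = list(values)
    # Positions of the observations that are actually present.
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over consecutive pairs of known positions and fill what lies between.
    for left, right in zip(known, known[1:]):
        gap = right - left
        if gap > 1:
            step = (values[right] - values[left]) / gap
            for k in range(1, gap):
                filled[left + k] = values[left] + step * k
    return filled


# Hypothetical evenly spaced series with missing observations.
series = [10.0, None, None, 16.0, None, 20.0, None]
print(interpolate_linear(series))
```

Leaving the trailing gap unfilled is a deliberate choice in this sketch: extrapolating past the last observation is a forecasting problem rather than an imputation one.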

Learn Data Science Using SAS Studio : From Clicks to Code

Do you want to create data analysis reports without writing a line of code? This book introduces SAS Studio, a free, web-based data science product for educational and non-commercial purposes. The power of SAS Studio lies in its visual, point-and-click user interface, which generates SAS code. It is easier to learn SAS Studio than to learn R and Python to accomplish data cleaning, statistics, and visualization tasks. The book includes a case study analyzing the data required to predict the results of presidential elections in the state of Maine for 2016 and 2020. In addition to the presidential elections, the book provides real-life examples, including analyses of stock, oil, and gold prices, crime, marketing, and healthcare. You will see data science in action and how easily complicated tasks and visualizations can be performed in SAS Studio. You will learn, step by step, how to perform visualizations, including creating maps. In most cases, you will not need a line of code as you work with the SAS Studio graphical user interface. The book includes explanations of the code that SAS Studio generates automatically. You will learn how to edit this code to perform more advanced tasks.
What You Will Learn: Become familiar with the SAS Studio IDE. Create essential visualizations. Know the fundamental statistical analysis required in most data science and analytics reports. Clean the most common dataset problems. Learn linear and logistic regression for data prediction and analysis. Write programs in SAS. Analyze data and get insights from it for decision-making. Learn character, numeric, date, time, and datetime functions and typecasting.
Who This Book Is For: A general audience of people who are new to data science, students, and data analysts and scientists who are new to SAS. No prior programming or statistical knowledge is required.

The Data Flow Map: A Practical Guide to Clear and Creative Analytics in Any Data Environment

Unlock the secrets of practical data analysis with the Data Flow Map framework, a game-changing approach that transcends tools and platforms. This book isn’t just another programming manual; it’s a guide to thinking and communicating about data at a higher level. Whether you're working with spreadsheets, databases, or AI-driven models, you'll learn how to express your analytics in clear, common language that anyone can understand. In today’s data-rich world, clarity is the real challenge. Technical details often obscure insights that could drive real impact. The Data Flow Map framework simplifies complexity into three core motions: source, focus, and build. The first half of the book explores these concepts through illustrations and stories. The second half applies them to real-world datasets using tools like Excel, SQL, and Python, showing how the framework works across platforms and use cases. A vital resource for analysts at any level, this book offers a practical, tool-agnostic approach to data analysis. With hands-on examples and a universal mental model, you’ll gain the confidence to tackle any dataset, align your team, and deliver insights that matter. Whether you're a beginner or a seasoned pro, the Data Flow Map framework will transform how you approach data analytics.
What You Will Learn: Grasp essential elements applicable to every data analysis workflow; adapt quickly to any dataset, tool, or platform; master analytic thinking at a higher level; use analytics patterns to better understand the world; break complex analysis into manageable, repeatable steps; iterate faster to uncover deeper insights and better solutions; communicate findings clearly for better decision-making.
Who This Book Is For: Aspiring data professionals and experienced analysts, from beginners to seasoned data engineers, focused on data collection, analysis, and decision making.

Bioinformatics with Python Cookbook - Fourth Edition

Bioinformatics with Python Cookbook provides a practical, hands-on approach to solving computational biology challenges with Python, enabling readers to analyze sequencing data, leverage AI for bioinformatics applications, and design robust computational pipelines.
What this book will help me do: Perform comprehensive sequence analysis using Python libraries for refined data interpretation. Configure and run bioinformatics workflows on cloud environments for scalable solutions. Apply advanced data science practices to analyze and visualize bioinformatics data. Explore the integration of AI tools in processing multimodal biological datasets. Understand and utilize bioinformatics databases for research and development.
Author(s): Shane Brubaker is an experienced computational biologist and software developer with a strong background in bioinformatics and Python programming. With years of experience in data analysis and software engineering, Shane has authored numerous solutions for real-world bioinformatics issues. He brings a practical, example-driven teaching approach, aimed at empowering readers to apply techniques effectively in their work.
Who is it for? This book is suitable for bioinformatics professionals, data scientists, and software engineers with moderate experience seeking to expand their computational biology knowledge. Readers should have a basic understanding of biology, programming, and cloud tools. By engaging with this book, learners can advance their skills in Python and bioinformatics to address complex biological data challenges effectively.

The Definitive Guide to Microsoft Fabric

Master Microsoft Fabric from basics to advanced architectures with expert guidance to unify, secure, and scale analytics on real-world data platforms Key Features Build a complete data analytics platform with Microsoft Fabric Apply proven architectures, governance, and security strategies Gain real-world insights from five seasoned data experts Purchase of the print or Kindle book includes a free PDF eBook Book Description Microsoft Fabric is reshaping how organizations manage, analyze, and act on data by unifying ingestion, storage, transformation, analytics, AI, and visualization in a single platform. The Definitive Guide to Microsoft Fabric takes you from your very first workspace to building a secure, scalable, and future-proof analytics environment. You’ll learn how to unify data in OneLake, design data meshes, transform and model data, implement real-time analytics, and integrate AI capabilities. The book also covers advanced topics, such as governance, security, cost optimization, and team collaboration using DevOps and DataOps principles. Drawing on the real-world expertise of five seasoned professionals who have built and advised on platforms for startups, SMEs, and Europe’s largest enterprises, this book blends strategic insight with practical guidance. By the end of this book, you’ll have gained the knowledge and skills to design, deploy, and operate a Microsoft Fabric platform that delivers sustainable business value. 
What you will learn: Understand Microsoft Fabric architecture and concepts; unify data storage and data governance with OneLake; ingest and transform data using multiple Fabric tools; implement real-time analytics and event processing; design effective semantic models and reports; integrate AI and machine learning into data workflows; apply governance, security, and compliance controls; optimize performance and costs at scale.
Who this book is for: This book is for data engineers, analytics engineers, architects, and data analysts moving into platform design roles. It’s also valuable for technical leaders seeking to unify analytics in their organizations. You’ll need only a basic grasp of databases, SQL, and Python.

Time Series Forecasting Using Foundation Models

Make accurate time series predictions with powerful pretrained foundation models! You don’t need to spend weeks—or even months—coding and training your own models for time series forecasting. Time Series Forecasting Using Foundation Models shows you how to make accurate predictions using flexible pretrained models. In Time Series Forecasting Using Foundation Models you will discover: The inner workings of large time models Zero-shot forecasting on custom datasets Fine-tuning foundation forecasting models Evaluating large time models Time Series Forecasting Using Foundation Models teaches you how to do efficient forecasting using powerful time series models that have already been pretrained on billions of data points. You’ll appreciate the hands-on examples that show you what you can accomplish with these amazing models. Along the way, you’ll learn how time series foundation models work, how to fine-tune them, and how to use them with your own data. About the Technology Time-series forecasting is the art of analyzing historical, time-stamped data to predict future outcomes. Foundational time series models like TimeGPT and Chronos, pre-trained on billions of data points, can now effectively augment or replace painstakingly-built custom time-series models. About the Book Time Series Forecasting Using Foundation Models explores the architecture of large time models and shows you how to use them to generate fast, accurate predictions. You’ll learn to fine-tune time models on your own data, execute zero-shot probabilistic forecasting, point forecasting, and more. You’ll even find out how to reprogram an LLM into a time series forecaster—all following examples that will run on an ordinary laptop. What's Inside How large time models work Zero-shot forecasting on custom datasets Fine-tuning and evaluating foundation models About the Reader For data scientists and machine learning engineers familiar with the basics of time series forecasting theory. Examples in Python. 
About the Author: Marco Peixeiro builds cutting-edge open-source forecasting Python libraries at Nixtla. He is the author of Time Series Forecasting in Python.
Quotes: "Clear and hands-on, featuring both theory and easy-to-follow examples." - Eryk Lewinson, Author of Python for Finance Cookbook. "Bridges the gap between classical forecasting methods and the new developments in the foundational models. A fantastic resource." - Juan Orduz, PyMC Labs. "A foundational guide to forecasting’s next chapter." - Tyler Blume, daybreak. "An immensely practical introduction to forecasting using foundation models." - Stephan Kolassa, SAP Switzerland.

Getting Started with Taipy

Share your machine learning models, create chatbots, as well as build and deploy insightful dashboards speedily using Taipy with this hands-on book featuring real-world application examples from multiple industries Free with your book: DRM-free PDF version + access to Packt's next-gen Reader Key Features Create visually compelling, interactive data applications with Taipy Bring predictive models to end users and create data pipelines to compare scenarios with what-if analyses Go beyond prototypes to build and deploy production-ready applications using the cloud provider of your choice Purchase of the print or Kindle book includes a free PDF eBook in full color Book Description While data analysts, data scientists, and BI experts have the tools to analyze data, build models, and create compelling visuals, they often struggle to translate these insights into practical, user-friendly applications that help end users answer real-world questions, such as identifying revenue trends, predicting inventory needs, or detecting fraud, without wading through complex code. This book is a comprehensive guide to overcoming this challenge. This book teaches you how to use Taipy, a powerful open-source Python library, to build intuitive, production-ready data apps quickly and efficiently. Instead of creating prototypes that nobody uses, you'll learn how to build faster applications that process large amounts of data for multiple users and deliver measurable business impact. Taipy does the heavy lifting to enable your users to visualize their KPIs, interact with charts and maps, and compare scenarios for better decision-making. You’ll learn to use Taipy to build apps that make your data accessible and actionable in production environments like the cloud or Docker. By the end of this book, you won’t just understand Taipy, you'll be able to transform your data skills into impactful solutions that address real-world needs and deliver valuable insights. 
Email sign-up and proof of purchase required.
What you will learn: Explore Taipy, its use cases, and how it's different from other projects; discover how to create visually appealing interactive apps and display KPIs, charts, and maps; understand how to compare scenarios to make better decisions; connect Taipy applications to several data sources and services; develop apps for diverse use cases, including chatbots, dashboards, ML apps, and maps; deploy Taipy applications on different types of servers and services; master advanced concepts for simplifying and accelerating your development workflow.
Who this book is for: If you’re a data analyst, data scientist, or BI analyst looking to build production-ready data apps entirely in Python, this book is for you. If your scripts and models sit idle because non-technical stakeholders can’t use them, this book shows you how to turn them into full applications fast with Taipy, so your work delivers real business value. It’s also valuable for developers and engineers who want to streamline their data workflows and build UIs in pure Python.

Investing for Programmers

Maximize your portfolio, analyze markets, and make data-driven investment decisions using Python and generative AI. Investing for Programmers shows you how you can turn your existing skills as a programmer into a knack for making sharper investment choices. You’ll learn how to use the Python ecosystem, modern analytic methods, and cutting-edge AI tools to make better decisions and improve the odds of long-term financial success. In Investing for Programmers you’ll learn how to: Build stock analysis tools and predictive models Identify market-beating investment opportunities Design and evaluate algorithmic trading strategies Use AI to automate investment research Analyze market sentiments with media data mining In Investing for Programmers you'll learn the basics of financial investment as you conduct real market analysis, connect with trading APIs to automate buy-sell, and develop a systematic approach to risk management. Don’t worry—there’s no dodgy financial advice or flimsy get-rich-quick schemes. Real-life examples help you build your own intuition about financial markets, and make better decisions for retirement, financial independence, and getting more from your hard-earned money. About the Technology A programmer has a unique edge when it comes to investing. Using open-source Python libraries and AI tools, you can perform sophisticated analysis normally reserved for expensive financial professionals. This book guides you step-by-step through building your own stock analysis tools, forecasting models, and more so you can make smart, data-driven investment decisions. About the Book Investing for Programmers shows you how to analyze investment opportunities using Python and machine learning. In this easy-to-read handbook, experienced algorithmic investor Stefan Papp shows you how to use Pandas, NumPy, and Matplotlib to dissect stock market data, uncover patterns, and build your own trading models. 
You’ll also discover how to use AI agents and LLMs to enhance your financial research and decision-making process.
What's Inside: Build stock analysis tools and predictive models; design algorithmic trading strategies; use AI to automate investment research; analyze market sentiment with media data mining.
About the Reader: For professional and hobbyist Python programmers with basic personal finance experience.
About the Author: Stefan Papp combines 20 years of investment experience in stocks, cryptocurrency, and bonds with decades of work as a data engineer, architect, and software consultant.
Quotes: "Especially valuable for anyone looking to improve their investing." - Armen Kherlopian, Covenant Venture Capital. "A great breadth of topics, from basic finance concepts to cutting-edge technology." - Ilya Kipnis, Quantstrat Trader. "A top tip for people who want to leverage development skills to improve their investment possibilities." - Michael Zambiasi, Raiffeisen Digital Bank. "Brilliantly bridges the worlds of coding and finance." - Thomas Wiecki, PyMC Labs.

Statistics Every Programmer Needs

Put statistics into practice with Python! Data-driven decisions rely on statistics. Statistics Every Programmer Needs introduces the statistical and quantitative methods that will help you go beyond “gut feeling” for tasks like predicting stock prices or assessing quality control, with examples using the rich tools of the Python ecosystem. Statistics Every Programmer Needs will teach you how to: Apply foundational and advanced statistical techniques Build predictive models and simulations Optimize decisions under constraints Interpret and validate results with statistical rigor Implement quantitative methods using Python In this hands-on guide, stats expert Gary Sutton blends the theory behind these statistical techniques with practical Python-based applications, offering structured, reproducible, and defensible methods for tackling complex decisions. Well-annotated and reusable Python code listings illustrate each method, with examples you can follow to practice your new skills. About the Technology Whether you’re analyzing application performance metrics, creating relevant dashboards and reports, or immersing yourself in a numbers-heavy coding project, every programmer needs to know how to turn raw data into actionable insight. Statistics and quantitative analysis are the essential tools every programmer needs to clarify uncertainty, optimize outcomes, and make informed choices. About the Book Statistics Every Programmer Needs teaches you how to apply statistics to the everyday problems you’ll face as a software developer. Each chapter is a new tutorial. You’ll predict ultramarathon times using linear regression, forecast stock prices with time series models, analyze system reliability using Markov chains, and much more. The book emphasizes a balance between theory and hands-on Python implementation, with annotated code and real-world examples to ensure practical understanding and adaptability across industries. 
What's Inside: Probability basics and distributions; random variables; regression; decision trees and random forests; time series analysis; linear programming; Monte Carlo and Markov methods; and much more.
About the Reader: Examples are in Python.
About the Author: Gary Sutton is a business intelligence and analytics leader and the author of Statistics Slam Dunk: Statistical analysis with R on real NBA data.
Quotes: "A well-organized tour of the statistical, machine learning and optimization tools every data science programmer needs." - Peter Bruce, Author of Statistics for Data Science and Analytics. "Turns statistics from a stumbling block into a superpower. Clear, relevant, and written with a coder’s mindset!" - Mahima Bansod, LogicMonitor. "Essential! Stats and modeling with an emphasis on real-world system design." - Anupam Samanta, Google. "A great blend of theory and practice." - Ariel Andres, Scotia Global Asset Management.
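The blurb for this book mentions analyzing system reliability using Markov chains. As a hedged illustration only (not code from the book), the sketch below estimates the long-run availability of a two-state system by repeatedly applying a transition matrix. The function name `steady_state` and the transition probabilities are hypothetical values chosen for the example.

```python
from typing import List


def steady_state(P: List[List[float]], iterations: int = 1000) -> List[float]:
    """Approximate the stationary distribution of a Markov chain.

    P[i][j] is the probability of moving from state i to state j in one step.
    Starts from a uniform distribution and applies the transition matrix
    repeatedly (power iteration).
    """
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        # New probability of state j = sum over i of P(in i) * P(i -> j).
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist


# Hypothetical system: state 0 = "up", state 1 = "down".
# An "up" system fails with probability 0.01 per step;
# a "down" system is repaired with probability 0.20 per step.
P = [
    [0.99, 0.01],
    [0.20, 0.80],
]

up, down = steady_state(P)
print(f"long-run availability: {up:.4f}")
```

For a two-state chain the result can be checked by hand: the long-run share of time spent "up" is repair / (failure + repair) = 0.20 / 0.21, roughly 0.952.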

96 Common Challenges in Power Query: Practical Solutions for Mastering Data Transformation in Excel and Power BI

This comprehensive guide is designed to address the most frequent and challenging issues faced by users of Power Query, a powerful data transformation tool integrated into Excel, Power BI, and Microsoft Azure. By tackling 96 real-world problems with practical, step-by-step solutions, this book is an essential resource for data analysts, Excel enthusiasts, and Power BI professionals. It aims to enhance your data transformation skills and improve efficiency in handling complex data sets. Structured into 12 chapters, the book covers specific areas of Power Query such as data extraction, referencing, column splitting and merging, sorting and filtering, and pivoting and unpivoting tables. You will learn to combine data from Excel files with varying column names, handle multi-row headers, perform advanced filtering, and manage missing values using techniques such as linear interpolation and K-nearest neighbors (K-NN) imputation. The book also dives into advanced Power Query functions such as Table.Group, List.Accumulate, and List.Generate, explored through practical examples such as calculating running totals and implementing complex grouping and iterative processes. Additionally, it covers crucial topics such as error-handling strategies, custom function creation, and the integration of Python and R with Power Query. In addition to providing explanations on the use of functions and the M language for solving real-world challenges, this book discusses optimization techniques for data cleaning processes and improving computational speed. It also compares the execution time of functions across different patterns and proposes the optimal approach based on these comparisons. In today’s data-driven world, mastering Power Query is crucial for accurate and efficient data processing. But as data complexity grows, so do the challenges and pitfalls that users face. This book serves as your guide through the noise and your key to unlocking the full potential of Power Query. 
You’ll quickly learn to navigate and resolve common issues, enabling you to transform raw data into actionable insights with confidence and precision.
What You Will Learn: Master data extraction and transformation techniques for various Excel file structures; apply advanced filtering, sorting, and grouping methods to organize and analyze data; leverage powerful functions such as Table.Group, List.Accumulate, and List.Generate for complex transformations; optimize queries to execute faster; create and utilize custom functions to handle iterative processes and advanced list transformation; implement effective error-handling strategies, including removing erroneous rows and extracting error reasons; customize Power Query solutions to meet specific business needs and share custom functions across files.
Who This Book Is For: Aspiring and developing data professionals using Power Query in Excel or Power BI who seek practical solutions to enhance their skills and streamline complex data transformation workflows.

Data Without Labels

Discover practical implementations of the key algorithms and models for handling unlabeled data, with case studies demonstrating how to apply each technique to real-world problems.
In Data Without Labels you’ll learn: Fundamental building blocks and concepts of machine learning and unsupervised learning; data cleaning for structured and unstructured data like text and images; clustering algorithms like K-means, hierarchical clustering, DBSCAN, Gaussian mixture models, and spectral clustering; dimensionality reduction methods like principal component analysis (PCA), SVD, multidimensional scaling, and t-SNE; association rule algorithms like Apriori, ECLAT, and SPADE; unsupervised time series clustering, Gaussian mixture models, and statistical methods; building neural networks such as GANs and autoencoders; working with Python tools and libraries like scikit-learn, NumPy, Pandas, Matplotlib, Seaborn, Keras, TensorFlow, and Flask; how to interpret the results of unsupervised learning; choosing the right algorithm for your problem; deploying unsupervised learning to production; maintenance and refresh of an ML solution.
Data Without Labels introduces mathematical techniques, key algorithms, and Python implementations that will help you build machine learning models for unannotated data. You’ll discover hands-off and unsupervised machine learning approaches that can still untangle raw, real-world datasets and support sound strategic decisions for your business. Don’t get bogged down in theory: the book bridges the gap between complex math and practical Python implementations, covering end-to-end model development all the way through to production deployment. You’ll discover the business use cases for machine learning and unsupervised learning, and access insightful research papers to complete your knowledge.
About the Technology: Generative AI, predictive algorithms, fraud detection, and many other analysis tasks rely on cheap and plentiful unlabeled data. Machine learning on data without labels, or unsupervised learning, turns raw text, images, and numbers into insights about your customers, accurate computer vision, and high-quality datasets for training AI models. This book will show you how.
About the Book: Data Without Labels is a comprehensive guide to unsupervised learning, offering a deep dive into its mathematical foundations, algorithms, and practical applications. It presents practical examples from retail, aviation, and banking using fully annotated Python code. You’ll explore core techniques like clustering and dimensionality reduction along with advanced topics like autoencoders and GANs. As you go, you’ll learn where to apply unsupervised learning in business applications and discover how to develop your own machine learning models end-to-end.
What's Inside: Master unsupervised learning algorithms; real-world business applications; curate AI training datasets; explore autoencoders and GANs applications.
About the Reader: Intended for data science professionals. Assumes knowledge of Python and basic machine learning.
About the Author: Vaibhav Verdhan is a seasoned data science professional with extensive experience working on data science projects in a large pharmaceutical company.
Quotes: "An invaluable resource for anyone navigating the complexities of unsupervised learning. A must-have." - Ganna Pogrebna, The Alan Turing Institute. "Empowers the reader to unlock the hidden potential within their data." - Sonny Shergill, AstraZeneca. "A must-have for teams working with unstructured data. Cuts through the fog of theory. Explains the theory and delivers practical solutions." - Leonardo Gomes da Silva, onGRID Sports Technology. "The Bible for unsupervised learning! Full of real-world applications, clear explanations, and excellent Python implementations." - Gary Bake, Falconhurst Technologies.

Applied Machine Learning for Data Science Practitioners

A single-volume reference on data science techniques for evaluating and solving business problems using Applied Machine Learning (ML). Applied Machine Learning for Data Science Practitioners offers a practical, step-by-step guide to building end-to-end ML solutions for real-world business challenges, empowering data science practitioners to make informed decisions and select the right techniques for any use case. Unlike many data science books that focus on popular algorithms and coding, this book takes a holistic approach. It equips you with the knowledge to evaluate a range of techniques and algorithms. The book balances theoretical concepts with practical examples to illustrate key concepts, derive insights, and demonstrate applications. In addition to code snippets and reviewing output, the book provides guidance on interpreting results. This book is an essential resource if you are looking to elevate your understanding of ML and your technical capabilities, combining theoretical and practical coding examples. A basic understanding of using data to solve business problems, high school-level math and statistics, and basic Python coding skills are assumed. Written by a recognized data science expert, Applied Machine Learning for Data Science Practitioners covers essential topics, including: Data Science Fundamentals that provide you with an overview of core concepts, laying the foundation for understanding ML. Data Preparation covers the process of framing ML problems and preparing data and features for modeling. ML Problem Solving introduces you to a range of ML algorithms, including Regression, Classification, Ranking, Clustering, Patterns, Time Series, and Anomaly Detection. Model Optimization explores frameworks, decision trees, and ensemble methods to enhance performance and guide the selection of the most effective model. ML Ethics addresses ethical considerations, including fairness, accountability, transparency, and ethics. 
Model Deployment and Monitoring focuses on production deployment, performance monitoring, and adapting to model drift.

Think Stats, 3rd Edition

If you know how to program, you have the skills to turn data into knowledge. This thoroughly revised edition presents statistical concepts computationally, rather than mathematically, using programs written in Python. Through practical examples and exercises based on real-world datasets, you'll learn the entire process of exploratory data analysis—from wrangling data and generating statistics to identifying patterns and testing hypotheses. Whether you're a data scientist, software engineer, or data enthusiast, you'll get up to speed on commonly used tools including NumPy, SciPy, and Pandas. You'll explore distributions, relationships between variables, visualization, and many other concepts. And all chapters are available as Jupyter notebooks, so you can read the text, run the code, and work on exercises all in one place. Analyze data distributions and visualize patterns using Python libraries Improve predictions and insights with regression models Dive into specialized topics like time series analysis and survival analysis Integrate statistical techniques and tools for validation, inference, and more Communicate findings with effective data visualization Troubleshoot common data analysis challenges Boost reproducibility and collaboration in data analysis projects with interactive notebooks

3D Data Science with Python

Our physical world is grounded in three dimensions. To create technology that can reason about and interact with it, our data must be 3D too. This practical guide offers data scientists, engineers, and researchers a hands-on approach to working with 3D data using Python. From 3D reconstruction to 3D deep learning techniques, you'll learn how to extract valuable insights from massive datasets, including point clouds, voxels, 3D CAD models, meshes, images, and more. Dr. Florent Poux helps you leverage the potential of cutting-edge algorithms and spatial AI models to develop production-ready systems with a focus on automation. You'll get the 3D data science knowledge and code to: Understand core concepts and representations of 3D data Load, manipulate, analyze, and visualize 3D data using powerful Python libraries Apply advanced AI algorithms for 3D pattern recognition (supervised and unsupervised) Use 3D reconstruction techniques to generate 3D datasets Implement automated 3D modeling and generative AI workflows Explore practical applications in areas like computer vision/graphics, geospatial intelligence, scientific computing, robotics, and autonomous driving Build accurate digital environments that spatial AI solutions can leverage Florent Poux is an esteemed authority in the field of 3D data science who teaches and conducts research for top European universities. He's also head professor at the 3D Geodata Academy and innovation director for French Tech 120 companies.

Data Insight Foundations: Step-by-Step Data Analysis with R

This book is an essential guide designed to equip you with the vital tools and knowledge needed to excel in data science. Master the end-to-end process of data collection, processing, validation, and imputation using R, and understand fundamental theories to achieve transparency with literate programming, renv, and Git, and much more. Each chapter is concise and focused, rendering complex topics accessible and easy to understand. Data Insight Foundations caters to a diverse audience, including web developers, mathematicians, data analysts, and economists, and its flexible structure allows you to explore chapters in sequence or navigate directly to the topics most relevant to you. While examples are primarily in R, a basic understanding of the language is advantageous but not essential. Many chapters, especially those focusing on theory, require no programming knowledge at all. Dive in and discover how to manipulate data, ensure reproducibility, conduct thorough literature reviews, collect data effectively, and present your findings with clarity. What You Will Learn Data Management: Master the end-to-end process of data collection, processing, validation, and imputation using R. Reproducible Research: Understand fundamental theories and achieve transparency with literate programming, renv, and Git. Academic Writing: Conduct scientific literature reviews and write structured papers and reports with Quarto. Survey Design: Design well-structured surveys and manage data collection effectively. Data Visualization: Understand data visualization theory and create well-designed and captivating graphics using ggplot2. Who this Book is For Career professionals such as research and data analysts transitioning from academia to a professional setting where production quality significantly impacts career progression. Some familiarity with data analytics processes and an interest in learning R or Python are ideal.

Effective Data Analysis

Learn the technical and soft skills you need to succeed in your career as a data analyst. You’ve learned how to use Python, R, SQL, and the statistical skills needed to get started as a data analyst—so, what’s next? Effective Data Analysis bridges the gap between foundational skills and real-world application. This book provides clear, actionable guidance on transforming business questions into impactful data projects, ensuring you’re tracking the right metrics, and equipping you with a modern data analyst’s essential toolbox. In Effective Data Analysis, you’ll gain the skills needed to excel as a data analyst, including: Maximizing the impact of your analytics projects and deliverables Identifying and leveraging data sources to enhance organizational insights Mastering statistical tests, understanding their strengths, limitations, and when to use them Overcoming the challenges and caveats at every stage of an analytics project Applying your expertise across a variety of domains with confidence Effective Data Analysis is full of sage advice on how to be an effective data analyst in a real production environment. Inside, you’ll find methods that enhance the value of your work—from choosing the right analysis approach, to developing a data-informed organizational culture. About the Technology Data analysts need top-notch knowledge of statistics and programming. They also need to manage clueless stakeholders, navigate messy problems, and advocate for resources. This unique book covers the essential technical topics and soft skills you need to be effective in the real world. About the Book Effective Data Analysis helps you lock down those skills along with unfiltered insight into what the job really looks like. You’ll build out your technical toolbox with tips for defining metrics, testing code, automation, sourcing data, and more. 
Along the way, you’ll learn to handle the human side of data analysis, including how to turn vague requirements into efficient data pipelines. And you’re sure to love author Mona Khalil’s illustrations, industry examples, and a friendly writing style. What's Inside Identify and incorporate external data Communicate with non-technical stakeholders Apply and interpret statistical tests Techniques to approach any business problem About the Reader Written for early-career data analysts, but useful for all. About the Author Mona Khalil is the Senior Manager of Analytics Engineering at Justworks. Quotes Your roadmap to becoming a standout data analyst! An intriguing blend of technical expertise and practical wisdom. - Chester Ismay, MATE Seminars A thoughtful guide to delivering real-world data analysis. It will be an eye-opening read for all data professionals! - David Lee, Justworks Inc. Compelling insights into the relationship between organizations and data. The real-life examples will help you excel in your data career. - Jeremy Moulton, Greenhouse Mona’s wide range of experience shines in her thoughtful, relevant examples. - Jessica Cherny, Fivetran

Hands-On APIs for AI and Data Science

Are you ready to grow your skills in AI and data science? A great place to start is learning to build and use APIs in real-world data and AI projects. API skills have become essential for AI and data science success, because they are used in a variety of ways in these fields. With this practical book, data scientists and software developers will gain hands-on experience developing and using APIs with the Python programming language and popular frameworks like FastAPI and Streamlit. As you complete the chapters in the book, you'll be creating portfolio projects that teach you how to: Design APIs that data scientists and AIs love Develop APIs using Python and FastAPI Deploy APIs using multiple cloud providers Create data science projects such as visualizations and models using APIs as a data source Access APIs using generative AI and LLMs

The Well-Grounded Data Analyst

Complete eight data science projects that lock in important real-world skills—along with a practical process you can use to learn any new technique quickly and efficiently. Data analysts need to be problem solvers—and The Well-Grounded Data Analyst will teach you how to solve the most common problems you'll face in industry. You'll explore eight scenarios that your class or bootcamp won’t have covered, so you can accomplish what your boss is asking for. In The Well-Grounded Data Analyst you'll learn: High-value skills to tackle specific analytical problems Deconstructing problems for faster, practical solutions Data modeling, PDF data extraction, and categorical data manipulation Handling vague metrics, deciphering inherited projects, and defining customer records The Well-Grounded Data Analyst is for junior and early-career data analysts looking to supplement their foundational data skills with real-world problem solving. As you explore each project, you'll also master a proven process for quickly learning new skills developed by author and Half Stack Data Science podcast host David Asboth. You'll learn how to determine a minimum viable answer for your stakeholders, identify and obtain the data you need to deliver, and reliably present and iterate on your findings. The book can be read cover-to-cover or opened to the chapter most relevant to your current challenges. About the Technology Real world data analysis is messy. Success requires tackling challenges like unreliable data sources, ambiguous requests, and incompatible formats—often with limited guidance. This book goes beyond the clean, structured examples you see in classrooms and bootcamps, offering a step-by-step framework you can use to confidently solve any data analysis problem like a pro. About the Book The Well-Grounded Data Analyst introduces you to eight scenarios that every data analyst is bound to face. 
You’ll practice author David Asboth’s results-oriented approach as you model data by identifying customer records, navigate poorly-defined metrics, extract data from PDFs, and much more! It also teaches you how to take over incomplete projects and create rapid prototypes with real data. Along the way, you’ll build an impressive portfolio of projects you can showcase at your next interview. What's Inside Deconstructing problems Handling vague metrics Data modeling Categorical data manipulation About the Reader For early-career data scientists. About the Author David Asboth is a data generalist, educator, and software architect. He co-hosts the Half Stack Data Science podcast. Quotes Well reasoned and well written, with approaches to solve many sorts of data analysis problems. - Naomi Ceder, Fellow of the Python Software Foundation An excellent resource for any aspiring data scientist! - Andrew R. Freed, IBM David’s clear and repeatable framework will give you confidence to tackle open-ended stakeholder requests and reach an answer much faster! - Shaun McGirr, DevOn Software Services A book version of shadowing a senior data analyst while they explain handling frequent data problems at work, including all the ugly gotchas. - Randy Au, Google

Causal Inference for Data Science

When you know the cause of an event, you can affect its outcome. This accessible introduction to causal inference shows you how to determine causality and estimate effects using statistics and machine learning. A/B tests or randomized controlled trials are expensive and often unfeasible in a business environment. Causal Inference for Data Science reveals the techniques and methodologies you can use to identify causes from data, even when no experiment or test has been performed. In Causal Inference for Data Science you will learn how to: Model reality using causal graphs Estimate causal effects using statistical and machine learning techniques Determine when to use A/B tests, causal inference, and machine learning Explain and assess objectives, assumptions, risks, and limitations Determine if you have enough variables for your analysis It’s possible to predict events without knowing what causes them. Understanding causality allows you both to make data-driven predictions and also intervene to affect the outcomes. Causal Inference for Data Science shows you how to build data science tools that can identify the root cause of trends and events. You’ll learn how to interpret historical data, understand customer behaviors, and empower management to apply optimal decisions. About the Technology Why did you get a particular result? What would have led to a different outcome? These are the essential questions of causal inference. This powerful methodology improves your decisions by connecting cause and effect, even when you can’t run experiments, A/B tests, or expensive controlled trials. About the Book Causal Inference for Data Science introduces techniques to apply causal reasoning to ordinary business scenarios. And with this clearly-written, practical guide, you won’t need advanced statistics or high-level math to put causal inference into practice! 
By applying a simple approach based on Directed Acyclic Graphs (DAGs), you’ll learn to assess advertising performance, pick productive health treatments, deliver effective product pricing, and more. What's Inside When to use A/B tests, causal inference, and ML Assess objectives, assumptions, risks, and limitations Apply causal inference to real business data About the Reader For data scientists, ML engineers, and statisticians. About the Author Aleix Ruiz de Villa Robert is a data scientist with a PhD in mathematical analysis from the Universitat Autònoma de Barcelona. Quotes With intuitive explanations, application-focused insights, and real-world examples, this book offers immense practical value. - Philipp Bach, Maintainer of the DoubleML libraries for Python and R An essential guide for navigating the complexities of real-world data analysis. - Adi Shavit, SWAPP A must-read! Demystifies causal inference with a blend of theory and practice. - Karan Gupta, SunPower Corporation Causal relationships can mask and distort results. This book provides a set of tools to extract insights correctly. - Peter V. Henstock, Harvard Extension