O'Reilly Data Science Books

Exam Ref DP-100 Designing and Implementing a Data Science Solution on Azure

2024-12-06 O'Reilly Amazon

book

Dayne Sorvisto

it-operations cloud-computing cloud-platforms microsoft-azure microsoft-azure-certifications microsoft-azure-certifications-associate-tier

Prepare for Microsoft Exam DP-100 and demonstrate your real-world knowledge of managing data ingestion and preparation, model training and deployment, and machine learning solution monitoring with Python, Azure Machine Learning, and MLflow. Designed for professionals with data science experience, this Exam Ref focuses on the critical thinking and decision-making acumen needed for success at the Microsoft Certified: Azure Data Scientist Associate level. Focus on the expertise measured by these objectives: design and prepare a machine learning solution; explore data and train models; prepare a model for deployment; and deploy and retrain a model. This Microsoft Exam Ref organizes its coverage by exam objectives, features strategic what-if scenarios to challenge you, and assumes you have experience in designing and creating a suitable working environment for data science workloads, training machine learning models, and managing, deploying, and monitoring scalable machine learning solutions. About the Exam: Exam DP-100 focuses on the knowledge needed to design and prepare a machine learning solution, manage an Azure Machine Learning workspace, explore data and train models, create models by using the Azure Machine Learning designer, prepare a model for deployment, manage models in Azure Machine Learning, deploy and retrain a model, and apply machine learning operations (MLOps) practices. About Microsoft Certification: Passing this exam fulfills your requirements for the Microsoft Certified: Azure Data Scientist Associate credential, demonstrating your expertise in applying data science and machine learning to implement and run machine learning workloads on Azure, including knowledge and experience using Azure Machine Learning and MLflow.

Just Enough Data Science and Machine Learning: Essential Tools and Techniques

2024-12-05 O'Reilly Amazon

book

Mark Levene , Martyn Harris

data data-science AI/ML Data Science DataViz Python

An accessible introduction to applied data science and machine learning, with minimal math and code required to master the foundational and technical aspects of data science. In Just Enough Data Science and Machine Learning, authors Mark Levene and Martyn Harris present a comprehensive and accessible introduction to data science. It allows readers to develop an intuition for the methods adopted in both data science and machine learning, the algorithmic component of data science that involves discovering patterns in input data. This book looks at data science from an applied perspective, placing emphasis on its algorithmic aspects and on the fundamental statistical concepts necessary to understand the subject. The book begins by exploring the nature of data science and its origins in basic statistics. The authors then guide readers through the essential steps of data science, starting with exploratory data analysis using visualisation tools. They explain the process of forming hypotheses, building statistical models, and utilising algorithmic methods to discover patterns in the data. Finally, the authors discuss general issues and preliminary concepts needed to understand machine learning, which is central to the discipline of data science. The book is packed with practical examples and real-world data sets throughout to reinforce the concepts. All examples are supported by Python code kept external to the reading material so that the book remains timeless. Notable features of this book: clear explanations of fundamental statistical notions and concepts; coverage of various types of data and techniques for analysis; in-depth exploration of popular machine learning tools and methods; insight into specific data science topics, such as social networks and sentiment analysis; practical examples and case studies for real-world application; and recommended further reading for deeper exploration of specific topics. ....

Pandas Cookbook - Third Edition

2024-10-31 O'Reilly Amazon

book

William Ayd , Matthew Harrison

data data-science data-science-tools Pandas Data Science NumPy

Discover the power of pandas for your data analysis tasks. Pandas Cookbook provides practical, hands-on recipes for mastering pandas 2.x, guiding you through real-world scenarios quickly and effectively. What this book will help you do: efficiently manipulate and clean data using pandas; perform advanced grouping and aggregation operations; handle time series data with pandas' robust functions; optimize pandas code for better performance; and integrate pandas with tools like NumPy and databases. Author(s): William Ayd and Matthew Harrison co-authored this insightful cookbook. With years of practical experience in data science and Python development, both authors aim to make data analysis accessible and efficient using pandas. Who is it for? This book is perfect for Python developers and data analysts looking to enhance their data manipulation skills. Whether you're a beginner aiming to understand pandas or a professional seeking advanced insights, this book is tailored for anyone handling structured data.

Hands-On Prescriptive Analytics

2024-10-22 O'Reilly Amazon

book

Walter R. Paczkowski

data data-science business-intelligence prescriptive-analytics Analytics Cloud Computing

Business decisions in any context (operational, tactical, or strategic) can have considerable consequences. Whether the outcome will be positive and rewarding or negative and damaging to the business, its employees, and its stakeholders is unknown when the action is approved. These decisions are usually made under the proverbial cloud of uncertainty. With this practical guide, data analysts, data scientists, and business analysts will learn why and how maximizing positive consequences and minimizing negative ones requires three forms of rich information. Descriptive analytics explores the results of an action: what has already happened. Predictive analytics focuses on what could happen. The third, prescriptive analytics, tells us what should happen in the future. While all three are important for decision-makers, the primary focus of this book is on the third: prescriptive analytics. Author Walter R. Paczkowski, Ph.D., shows you: the distinction among descriptive, predictive, and prescriptive analytics; how predictive analytics produces a menu of action options; how prescriptive analytics narrows that menu; the forms of prescriptive analytics, covering eight prescriptive methods; the two broad classes of these methods, non-stochastic and stochastic; how to develop prescriptive analyses for action recommendations; and ways to use an appropriate toolset in Python.

Numerical Python: Scientific Computing and Data Science Applications with Numpy, SciPy and Matplotlib

2024-09-27 O'Reilly Amazon

book

Robert Johansson

data data-science data-science-tools NumPy AI/ML Analytics

Learn how to leverage the scientific computing and data analysis capabilities of Python, its standard library, and popular open-source numerical Python packages like NumPy, SymPy, SciPy, Matplotlib, and more. This book demonstrates how to work with mathematical modeling and solve problems with numerical, symbolic, and visualization techniques. It explores applications in science, engineering, data analytics, and more. Numerical Python, Third Edition, presents many case study examples of applications in fundamental scientific computing disciplines, as well as in data science and statistics. This fully revised edition, updated for each library's latest version, demonstrates Python's power for rapid development and exploratory computing, thanks to its simple, high-level syntax and its many powerful libraries and tools for computation and data analysis. After reading this book, readers will be familiar with many computing techniques, including array-based and symbolic computing, visualization and numerical file I/O, equation solving, optimization, interpolation and integration, and domain-specific computational problems such as differential equation solving, data analysis, statistical modeling, and machine learning. What You'll Learn: work with vectors and matrices using NumPy; review symbolic computing with SymPy; plot and visualize data with Matplotlib; perform data analysis tasks with pandas and SciPy; understand statistical modeling and machine learning with statsmodels and scikit-learn; and optimize Python code using Numba and Cython. Who This Book Is For: developers who want to understand how to use Python and its ecosystem of libraries for scientific computing and data analysis.

Data Storytelling with Altair and AI

2024-09-10 O'Reilly Amazon

book

Angelica Lo Duca

data data-science data-science-tasks data-visualization python-viz-tools AI/ML

Great data presentations tell a story. Learn how to organize, visualize, and present data using Python, generative AI, and the cutting-edge Altair data visualization toolkit. Take the fast track to amazing data presentations! Data Storytelling with Altair and AI introduces a stack of useful tools and tried-and-tested methodologies that will rapidly increase your productivity, streamline the visualization process, and leave your audience inspired. In Data Storytelling with Altair and AI you’ll discover: using Python Altair for data visualization; using generative AI tools for data storytelling; the main concepts of data storytelling; building data stories with the DIKW pyramid approach; and transforming raw data into a data story. Data Storytelling with Altair and AI teaches you how to turn raw data into effective, insightful data stories. You’ll learn exactly what goes into an effective data story, then combine your Python data skills with the Altair library and AI tools to rapidly create amazing visualizations. Your bosses and decision-makers will love your new presentations, and you’ll love how quick generative AI makes the whole process! About the Technology: Every dataset tells a story. After you’ve cleaned, crunched, and organized the raw data, it’s your job to share its story in a way that connects with your audience. Python’s Altair data visualization library, combined with generative AI tools like Copilot and ChatGPT, provides an amazing toolbox for transforming numbers, code, text, and graphics into intuitive data presentations. About the Book: Data Storytelling with Altair and AI teaches you how to build enhanced data visualizations using these tools. The book uses hands-on examples to build powerful narratives that can inform, inspire, and motivate. It covers the Altair data visualization library, along with AI techniques like generating text with ChatGPT, creating images with DALL-E, and Python coding with Copilot.
You’ll learn by practicing with each interesting data story, from tourist arrivals in Portugal to population growth in the USA to fake news, salmon aquaculture, and more. What's Inside: the Data-Information-Knowledge-Wisdom (DIKW) pyramid; publishing data stories using Streamlit, Tableau, and Comet; and Vega and Vega-Lite visualization grammar. About the Reader: For data analysts and data scientists experienced with Python. No previous knowledge of Altair or generative AI required. About the Author: Angelica Lo Duca is a researcher at the Institute of Informatics and Telematics of the National Research Council, Italy. The technical editor on this book was Ninoslav Cerkez. Quotes: "This book’s step-by-step approach, illustrated through real-world examples, makes complex data accessible and actionable." - Alexey Grigorev, DataTalks.Club. "A clear and concise guide to data storytelling. Highly recommended." - Andrew Madson, Insights x Design. "Data storytelling in a way that anyone can do! This book feels ahead of its time." - Avery Smith, Data Career Jumpstart. "Excellent hands-on exercises that combine two of my favorite tools: AI and the Altair library." - Jose Berengueres, Author of DataViz and Storytelling

Statistics for Data Science and Analytics

2024-09-04 O'Reilly Amazon

book

Janet Dobbins , Peter C. Bruce , Peter Gedeck

data data-science data-science-tasks statistics AI/ML Analytics

Introductory statistics textbook with a focus on data science topics such as prediction, correlation, and data exploration. Statistics for Data Science and Analytics is a comprehensive guide to statistical analysis using Python, presenting important topics useful for data science such as prediction, correlation, and data exploration. The authors provide an introduction to statistical science and big data, as well as an overview of Python data structures and operations. A range of statistical techniques is presented with their implementation in Python, including hypothesis testing, probability, exploratory data analysis, categorical variables, surveys and sampling, A/B testing, and correlation. The text introduces binary classification, a foundational element of machine learning; validation of statistical models by applying them to holdout data; and probability and inference via the easy-to-understand method of resampling and the bootstrap instead of a myriad of “kitchen sink” formulas. Regression is taught both as a tool for explanation and as a tool for prediction. This book is informed by the authors’ experience designing and teaching both introductory statistics and machine learning at Statistics.com. Each chapter includes practical examples, explanations of the underlying concepts, and Python code snippets to help readers apply the techniques themselves.
Statistics for Data Science and Analytics includes information on sample topics such as: int, float, and string data types, numerical operations, manipulating strings, converting data types, and advanced data structures like lists, dictionaries, and sets; experiment design via randomizing, blinding, and before-after pairing, as well as proportions and percents when handling binary data; specialized Python packages like numpy, scipy, pandas, scikit-learn, and statsmodels (the workhorses of data science) and how to get the most value from them; and statistical versus practical significance, random number generators, functions for code reuse, and binomial and normal probability distributions. Written by and for data science instructors, Statistics for Data Science and Analytics is an excellent learning resource for data science instructors prescribing a required intro stats course for their programs, as well as for other students and professionals seeking to transition to the data science field.

Polars Cookbook

2024-08-23 O'Reilly Amazon

book

Yuki Kakegawa

data data-science data-science-tools Pandas Analytics Big Data

Dive into the world of data analysis with the Polars Cookbook. This book, ideal for data professionals, covers practical recipes to manipulate, transform, and analyze data using the Python Polars library. You'll learn both the fundamentals and advanced techniques to build efficient and scalable data workflows. What this book will help you do: master the basics of Python Polars, including installation and setup; perform complex data manipulation like pivoting, grouping, and joining; handle large-scale time series data for accurate analysis; understand data integration with libraries like pandas and NumPy; and optimize workflows for both on-premises and cloud environments. Author(s): Yuki Kakegawa is an experienced data analytics consultant who has collaborated with companies such as Microsoft and Stanford Health Care. His passion for data led him to create this detailed guide on Polars, and his expertise ensures you gain real-world, actionable insights from every chapter. Who is it for? This book is perfect for data analysts, engineers, and scientists eager to enhance their efficiency with Python Polars. If you are familiar with Python and tools like pandas but are new to Polars, this book will upskill you. Whether you are handling big data or optimizing code for performance, the Polars Cookbook has the guidance you need to succeed.

DuckDB in Action

2024-08-21 O'Reilly Amazon

book

Michael Simons , Mark Needham , Michael Hunger

data data-science data-science-tools Pandas Analytics API

Dive into DuckDB and start processing gigabytes of data with ease, all with no data warehouse. DuckDB is a cutting-edge SQL database that makes it incredibly easy to analyze big data sets right from your laptop. In DuckDB in Action you’ll learn everything you need to know to get the most out of this awesome tool, keep your data secure on prem, and save hundreds on your cloud bill. From data ingestion to advanced data pipelines, you’ll learn everything you need to get the most out of DuckDB, all through hands-on examples. Open up DuckDB in Action and learn how to: read and process data from CSV, JSON, and Parquet sources, both locally and remotely; write analytical SQL queries, including aggregations, common table expressions, window functions, special types of joins, and pivot tables; use DuckDB from Python, both with SQL and its "Relational" API, interacting with databases but also data frames; prepare, ingest, and query large datasets; build cloud data pipelines; and extend DuckDB with custom functionality. Pragmatic and comprehensive, DuckDB in Action introduces the DuckDB database and shows you how to use it to solve common data workflow problems. You won’t need to read through pages of documentation; you’ll learn as you work. Get to grips with DuckDB's unique SQL dialect, learning to seamlessly load, prepare, and analyze data using SQL queries. Extend DuckDB with both Python and built-in tools such as MotherDuck, and gain practical insights into building robust and automated data pipelines. About the Technology: DuckDB makes data analytics fast and fun! You don’t need to set up a Spark cluster or run a cloud data warehouse just to process a few hundred gigabytes of data. DuckDB is easily embeddable in any data analytics application, runs on a laptop, and processes data from almost any source, including JSON, CSV, Parquet, SQLite, and Postgres.
About the Book: DuckDB in Action guides you example by example from setup, through your first SQL query, to advanced topics like building data pipelines and embedding DuckDB as a local data store for a Streamlit web app. You’ll explore DuckDB’s handy SQL extensions, get to grips with aggregation, analysis, and data without persistence, and use Python to customize DuckDB. A hands-on project accompanies each new topic, so you can see DuckDB in action. What's Inside: prepare, ingest, and query large datasets; build cloud data pipelines; extend DuckDB with custom functionality; and a fast-paced SQL recap, from simple queries to advanced analytics. About the Reader: For data pros comfortable with Python and CLI tools. About the Authors: Mark Needham is a blogger and video creator at @LearnDataWithMark. Michael Hunger leads product innovation for the Neo4j graph database. Michael Simons is a Java Champion, author, and engineer at Neo4j. Quotes: "I use DuckDB every day, and I still learned a lot about how DuckDB makes things that are hard in most databases easy!" - Jordan Tigani, Founder, MotherDuck. "An excellent resource! Unlocks possibilities for storing, processing, analyzing, and summarizing data at the edge using DuckDB." - Pramod Sadalage, Director, Thoughtworks. "Clear and accessible. A comprehensive resource for harnessing the power of DuckDB for both novices and experienced professionals." - Qiusheng Wu, Associate Professor, University of Tennessee. "Excellent! The book all we ducklings have been waiting for!" - Gunnar Morling, Decodable

Getting Started with DuckDB

2024-06-24 O'Reilly Amazon

book

Simon Aubury , Ned Letcher

data data-science data-science-tools Pandas Analytics Data Analytics

Unlock the full potential of DuckDB with 'Getting Started with DuckDB,' your guide to mastering data analysis efficiently. By reading this book, you'll discover how to load, transform, and query data using DuckDB, leveraging its unique capabilities for processing large datasets. Gain hands-on experience with SQL, Python, and R to enhance your data science and engineering workflows. What this book will help you do: effectively load and manage various types of data in DuckDB for seamless processing; gain hands-on experience writing and optimizing SQL queries tailored for analytical tasks; integrate DuckDB capabilities into Python and R workflows for streamlined data analysis; understand DuckDB's optimizations and extensions for specialized data applications; and explore the broader ecosystem of data tools that complement DuckDB's capabilities. Author(s): Simon Aubury and Ned Letcher are seasoned experts in the field of data analytics and engineering. With extensive experience using both SQL and programming languages like Python and R, they bring practical insights into the innovative uses of DuckDB. They have designed this book to provide a hands-on and approachable way to learn DuckDB, making complex concepts accessible. Who is it for? This book is well suited for data analysts aiming to accelerate their data analysis workflows, data engineers looking for effective tools for data processing, and data scientists searching for a versatile library for scalable data manipulation. Prior exposure to SQL and to programming in Python or R will help readers get the most out of the book.

Modern Graph Theory Algorithms with Python

2024-06-07 O'Reilly Amazon

book

Colleen M. Farrelly , Franck Kalala Mutombo

data data-science data-science-tasks graph-analytics AI/ML Analytics

Dive into the fascinating world of graph theory and its applications with 'Modern Graph Theory Algorithms with Python.' Through Python programming and real-world case studies, this book equips you with the tools to transform data into graph structures, apply algorithms, and uncover insights, enabling effective solutions in diverse domains such as finance, epidemiology, and social networks. What this book will help you do: understand how to wrangle a variety of data types into network formats suitable for analysis; learn to use graph theory algorithms and toolkits such as NetworkX and igraph in Python; apply network theory to predict and analyze trends, from epidemics to stock market dynamics; explore the intersection of machine learning and graph theory through advanced neural network techniques; and gain expertise in graph database solutions, including querying and applications. Author(s): Colleen M. Farrelly, an experienced data scientist, and Franck Kalala Mutombo, a seasoned software engineer, bring years of expertise in network science and Python programming to every page of this book. Their professional experience includes working on cutting-edge problems in data analytics, graph theory, and scalable solutions for real-world issues. Combining their practical know-how, they deliver a resource aimed at both learning and applying techniques effectively. Who is it for? This book is tailored for data scientists, researchers, and analysts interested in using graph-based approaches to solve complex data problems. Ideal for those with basic Python knowledge and familiarity with libraries like pandas and NumPy, the content bridges the gap between theory and application. It also provides insights into broad fields where network science can have an impact, offering value to both students and professionals.

Cognitive Science, Computational Intelligence, and Data Analytics

2024-06-06 O'Reilly Amazon

book

Monica Bhatia , Sanjeet Kumar Dwivedi , Vikas Khare

data data-science AI/ML Analytics Data Analytics Python

Cognitive Science, Computational Intelligence, and Data Analytics: Methods and Applications with Python introduces readers to the foundational concepts of data analysis, cognitive science, and computational intelligence, including AI and machine learning. The book focuses on fundamental ideas, procedures, and computational intelligence tools that can be applied to a wide range of data analysis approaches, with applications that include mathematical programming, evolutionary simulation, machine learning, and logic-based models. It offers readers the fundamental and practical aspects of cognitive science and data analysis, exploring data analytics in terms of description, evolution, and applicability to real-life problems. The authors cover the history and evolution of cognitive analytics; methodological concerns in philosophy; syntax and semantics; generative linguistics; the theory of memory and processing theory; structured and unstructured data; qualitative and quantitative data; measurement of variables; and nominal, ordinal, interval, and ratio scale data. The content in this book is tailored to the reader's needs in terms of both type and fundamentals, including coverage of multivariate analysis, the CRISP-DM methodology, and the SEMMA methodology. Each chapter provides practical, hands-on learning with real-world applications, including case studies and Python programs related to the key concepts being presented.
The book demystifies the theory of data analytics using a step-by-step approach; covers the intersection of cognitive science, computational intelligence, and data analytics by providing examples and case studies with applied algorithms, mathematics, and Python programming code; introduces foundational data analytics techniques such as CRISP-DM, SEMMA, and object detection models in the context of computational intelligence methods and tools; and covers key concepts of multivariate and cognitive data analytics, such as factor analysis, principal component analysis, linear regression analysis, logistic regression analysis, and value chain applications.

Pandas Workout

2024-06-05 O'Reilly Amazon

book

Reuven M. Lerner

data data-science data-science-tools Pandas CSV Data Science

Practice makes perfect pandas! Work out your pandas skills against dozens of real-world challenges, each carefully designed to build an intuitive knowledge of essential pandas tasks. In Pandas Workout you’ll learn how to: clean your data for accurate analysis; work with rows and columns for retrieving and assigning data; handle indexes, including hierarchical indexes; read and write data in a number of common formats, such as CSV and JSON; process and manipulate textual data from within pandas; work with dates and times in pandas; perform aggregate calculations on selected subsets of data; and produce attractive and useful visualizations that make your data come alive. Pandas Workout hones your pandas skills to a professional level through two hundred exercises. You’ll test your abilities against common pandas challenges such as importing and exporting, data cleaning, visualization, and performance optimization. Each exercise uses a real-world scenario based on real-world data, from tracking parking tickets in New York City to working out which country makes the best wines. You’ll soon find your pandas skills becoming second nature, with no more trips to Stack Overflow for what is now a natural part of your skillset. About the Technology: Python’s pandas library can massively reduce the time you spend analyzing, cleaning, exploring, and manipulating data. And the only path to pandas mastery is practice, practice, and, you guessed it, more practice. In this book, Python guru Reuven Lerner is your personal trainer and guide through over 200 exercises guaranteed to boost your pandas skills. About the Book: Pandas Workout is a thoughtful collection of practice problems, challenges, and mini-projects designed to build your data analysis skills using Python and pandas. The workouts use realistic data from many sources: the New York taxi fleet, Olympic athletes, SAT scores, oil prices, and more.
Each can be completed in ten minutes or less. You’ll explore pandas’ rich functionality for string and date/time handling, complex indexing, and visualization, along with practical tips for every stage of a data analysis project. What's Inside: clean data with less manual labor; retrieve and assign data; process and manipulate text; and perform calculations on selected data subsets. About the Reader: For Python programmers and data analysts. About the Author: Reuven M. Lerner teaches Python and data science around the world and publishes the “Bamboo Weekly” newsletter. He is the author of Manning’s Python Workout (2020). Quotes: "A carefully crafted tour through the pandas library, jam-packed with wisdom that will help you become a better pandas user and a better data scientist." - Kevin Markham, Founder of Data School, Creator of pandas in 30 days. "Will help you apply pandas to real problems and push you to the next level." - Michael Driscoll, RFA Engineering, creator of Teach Me Python. "The explanations, paired with Reuven’s storytelling and personal tone, make the concepts simple. I’ll never get them wrong again!" - Rodrigo Girão Serrão, Python developer and educator. "The definitive source!" - Kiran Anantha, Amazon

Visualize This, 2nd Edition

2024-05-29 O'Reilly Amazon

book

Nathan Yau

data data-science data-science-tasks data-visualization DataViz JavaScript

One of the most influential data visualization books, updated with new techniques, technologies, and examples. Visualize This demonstrates how to explain data visually, so that you can present and communicate information in a way that is appealing and easy to understand. Today, there is a continuous flow of data available to answer almost any question. Thoughtful charts, maps, and analysis can help us make sense of this data. But the data does not speak for itself. As leading data expert Nathan Yau explains in this book, graphics provide little value unless they are built upon a firm understanding of the data behind them. Visualize This teaches you a data-first approach from a practical point of view. You'll start by exploring what your data has to say, and then you'll design visualizations that are both remarkable and meaningful. With this book, you'll discover what tools are available to you without becoming overwhelmed with options. You'll be exposed to a variety of software and code and jump right into real-world datasets so that you can learn visualization by doing. You'll learn to ask and answer questions with data, so that you can make charts that are both beautiful and useful. Visualize This also provides you with opportunities to apply what you learn to your own data.
This completely updated, full-color second edition presents a unique approach to visualizing and telling stories with data, from data visualization expert Nathan Yau; offers step-by-step tutorials and practical design tips for creating statistical graphics, geographical maps, and information design; details tools that can be used to visualize data graphics for reports, presentations, and stories, for the web or for print, with major updates for the latest R packages, Python libraries, JavaScript libraries, illustration software, and point-and-click applications; and contains numerous examples and descriptions of patterns and outliers, explaining how to show them. Information designers, analysts, journalists, statisticians, and data scientists, as well as anyone studying for careers in these fields, will gain a valuable background in the concepts and techniques of data visualization, thanks to this legendary book.

Predictive Analytics for the Modern Enterprise

2024-05-20 O'Reilly Amazon

book

Nooruddin Abbas Ali

data data-science data-science-tasks statistics time-series forecasting

The surging predictive analytics market is expected to grow from $10.5 billion today to $28 billion by 2026. With the rise in automation across industries, the increase in data-driven decision-making, and the proliferation of IoT devices, predictive analytics has become an operational necessity in today's forward-thinking companies. If you're a data professional, you need to be aligned with your company's business activities more than ever before. This practical book provides the background, tools, and best practices necessary to help you design, implement, and operationalize predictive analytics on-premises or in the cloud. You will: explore ways that predictive analytics can provide direct input back to your business; understand mathematical tools commonly used in predictive analytics; learn the development frameworks used in predictive analytics applications; appreciate the role of predictive analytics in the machine learning process; examine industry implementations of predictive analytics; and build, train, and retrain predictive models using Python and TensorFlow.

Statistical Tableau

2024-05-02 O'Reilly Amazon

book

Ethan Lang

data data-science data-science-tasks statistics AI/ML Analytics

In today's data-driven world, understanding statistical models is crucial for effective analysis and decision-making. Whether you're a beginner or an experienced user, this book equips you with the foundational knowledge to grasp and implement statistical models within Tableau. Gain the confidence to speak fluently about the models you employ, driving adoption of your insights and analysis across your organization. As AI continues to revolutionize industries, possessing the skills to leverage statistical models is no longer optional; it's a necessity. Stay ahead of the curve and harness the full potential of your data by mastering the ability to interpret and use the insights generated by these models. Whether you're a data enthusiast, analyst, or business professional, this book empowers you to navigate the ever-evolving landscape of data analytics with confidence and proficiency. Start your journey toward data mastery today. In this book, you will learn: the basics of foundational statistical modeling with Tableau; how to prove your analysis is statistically significant; how to calculate and interpret confidence intervals; best practices for incorporating statistics into data visualizations; and how to connect external analytics resources from Tableau using R and Python.

Mastering Marketing Data Science

2024-04-29 O'Reilly Amazon

book

Iain Brown

data data-science AI/ML Analytics Data Collection Data Science

Unlock the Power of Data: Transform Your Marketing Strategies with Data Science. In the digital age, understanding the symbiosis between marketing and data science is not just an advantage; it's a necessity. In Mastering Marketing Data Science: A Comprehensive Guide for Today's Marketers, Dr. Iain Brown, a leading expert in data science and marketing analytics, offers a comprehensive journey through the cutting-edge methodologies and applications that are defining the future of marketing. This book bridges the gap between theoretical data science concepts and their practical applications in marketing, providing readers with the tools and insights needed to elevate their strategies in a data-driven world. Whether you're a master's student, a marketing professional, or a data scientist keen on applying your skills in a marketing context, this guide will give you a deep understanding of marketing data science principles and the competence to apply them effectively. Comprehensive Coverage: from data collection to predictive analytics, NLP, and beyond, explore every facet of marketing data science. Practical Applications: engage with real-world examples, hands-on exercises in both Python and SAS, and actionable insights to apply in your marketing campaigns. Expert Guidance: benefit from Dr. Iain Brown's decade of experience as he shares cutting-edge techniques and ethical considerations in marketing data science. Future-Ready Skills: learn about the latest advancements, including generative AI, to stay ahead in the rapidly evolving marketing landscape. Accessible Learning: tailored for both beginners and seasoned professionals, this book ensures a smooth learning curve with a clear, engaging narrative. Mastering Marketing Data Science is designed as a comprehensive how-to guide, weaving together theory and practice to offer a dynamic, workbook-style learning experience. Dr. Brown's voice and expertise guide you through the complexities of marketing data science, making sophisticated concepts accessible and actionable.

Data Science Fundamentals with R, Python, and Open Data

2024-04-16 O'Reilly Amazon

book

Marco Cremonini

software-development programming-languages Python Computer Science CSV Data Science

Data Science Fundamentals with R, Python, and Open Data: an introduction to the essential concepts and techniques of R and Python needed to start data science projects. Organized with a strong focus on open data, Data Science Fundamentals with R, Python, and Open Data discusses concepts, techniques, tools, and first steps for carrying out data science projects, with a focus on Python and RStudio, reflecting a clear industry trend toward the integration of the two. The text examines intricacies and inconsistencies often found in real data, explaining how to recognize them and guiding readers through possible solutions, and enables readers to handle real data confidently and apply transformations to reorganize, index, aggregate, and elaborate it. This book is full of reader interactivity, with a companion website hosting supplementary material, including the datasets used in the examples and complete running code (R scripts and Jupyter notebooks) for all examples. Exam-style and multiple-choice questions support readers' active learning. Each chapter presents one or more case studies. Written by a highly qualified academic, Data Science Fundamentals with R, Python, and Open Data discusses sample topics such as: data organization and operations on data frames, covering reading CSV datasets and common errors, and slicing, creating, and deleting columns in R; logical conditions and row selection, covering selection of rows with logical conditions and operations on dates, strings, and missing values; pivoting operations and wide-form/long-form transformations, indexing by groups with multiple variables, and indexing by group and aggregations; and conditional statements and iterations, multicolumn functions and operations, data frame joins, and handling data in list/dictionary format. Data Science Fundamentals with R, Python, and Open Data is a highly accessible learning resource for students from heterogeneous disciplines where data science and quantitative, computational methods are gaining popularity, as well as for students from hard sciences not closely related to computer science and from medical fields that use stochastic and quantitative models.

Extending Power BI with Python and R - Second Edition

2024-03-29 O'Reilly Amazon

book

Luca Zavarella

data data-science business-intelligence microsoft-power-platform power-bi AI/ML

In "Extending Power BI with Python and R," you'll learn how to enhance your Power BI reports and analyses by leveraging the advanced analytical capabilities of Python and R. From working with large datasets to creating sophisticated visuals, this book provides practical instructions on powerful techniques that unlock new possibilities in Power BI. What this book will help you do: Configure and optimize Python and R integration in Power BI for enhanced performance. Implement advanced data transformation techniques to overcome Power BI limitations. Develop advanced visualizations using the Grammar of Graphics in Python and R. Analyze data by leveraging powerful Python and R algorithms, including machine learning models. Secure your Power BI data with anonymization and pseudonymization techniques. Author(s): Luca Zavarella is a data analytics expert with years of practical experience in business intelligence and data analytics. With a passion for enhancing data tools with programming languages like Python and R, they bring practical knowledge and technical acumen to this comprehensive resource. They aim to make complex concepts approachable to their readers. Who is it for? This book is aimed at professionals such as business analysts, business intelligence specialists, and data scientists who leverage Power BI for their data solutions. Readers should have a working knowledge of Power BI basics and a desire to extend its capabilities. Familiarity with Python and R programming basics is also beneficial for following the advanced techniques presented.

Cracking the Data Science Interview

2024-02-29 O'Reilly Amazon

book

Aaren Stubberfield, Leondra R. Gonzalez

data data-science AI/ML Bash Data Science Git

"Cracking the Data Science Interview" is your ultimate resource for preparing for roles in the competitive field of data science. With this book, you'll explore essential topics such as Python, SQL, statistics, and machine learning, as well as learn practical skills for building portfolios and acing interviews. Follow its guidance and you'll be equipped to stand out in any data science interview. What this book will help you do: Confidently explain complex statistical and machine learning concepts. Develop models and deploy them while ensuring version control and efficiency. Learn and apply scripting skills in shell and Bash for productivity. Master Git workflows to handle collaborative coding in projects. Perfectly tailor portfolios and resumes to land data science opportunities. Author(s): Leondra R. Gonzalez, with years of data science and mentorship experience, co-authors this book with Aaren Stubberfield, a seasoned expert in technology and machine learning. Together, they integrate their expertise to provide practical advice for navigating the data science job market. Who is it for? If you're preparing for data science interviews, this book is for you. It's ideal for candidates with a foundational knowledge of Python, SQL, and statistics who want to refine and expand their technical and professional skills. Professionals transitioning into data science will also find it invaluable for building confidence and succeeding in this rewarding field.

Web Scraping with Python, 3rd Edition

2024-02-15 O'Reilly Amazon

book

Ryan Mitchell

data data-science data-science-tasks web-scraping API HTML

If programming is magic, then web scraping is surely a form of wizardry. By writing a simple automated program, you can query web servers, request data, and parse it to extract the information you need. This thoroughly updated third edition not only introduces you to web scraping but also serves as a comprehensive guide to scraping almost every type of data from the modern web. Part I focuses on web scraping mechanics: using Python to request information from a web server, performing basic handling of the server's response, and interacting with sites in an automated fashion. Part II explores a variety of more specific tools and applications to fit any web scraping scenario you're likely to encounter. You'll learn how to: parse complicated HTML pages; develop crawlers with the Scrapy framework; store the data you scrape; read and extract data from documents; clean and normalize badly formatted data; read and write natural languages; crawl through forms and logins; scrape JavaScript and crawl through APIs; use and write image-to-text software; avoid scraping traps and bot blockers; and use scrapers to test your website.

Learn Python the Hard Way: A Deceptively Simple Introduction to the Terrifyingly Beautiful World of Computers and Data Science, 5th Edition

2024-02-07 O'Reilly Amazon

book

Zed A. Shaw

software-development programming-languages Python Data Science SQL

You Will Learn Python! Zed Shaw has created the world's most reliable system for learning Python. Follow it and you will succeed, just like the millions of beginners Zed has taught to date! You bring the discipline, persistence, and attention; the author supplies the masterful knowledge you need to succeed. In Learn Python the Hard Way, Fifth Edition, you'll learn Python by working through 60 lovingly crafted exercises. Read them. Type in the code. Run it. Fix your mistakes. Repeat. As you do, you'll learn how a computer works, how to solve problems, and how to enjoy programming, even when it's driving you crazy. Topics include: installing a complete Python environment; organizing and writing code; fixing and breaking code; basic mathematics; strings and text; interacting with users; working with files; looping and logic; object-oriented programming; data structures using lists and dictionaries; modules, classes, and objects; Python packaging; automated testing; basic SQL for data science; web scraping; fixing bad data (munging); and the "Data" part of "Data Science." It'll be frustrating at first. But if you keep trying, you'll get it, and it'll feel amazing! This course will reward you for every minute you put into it. Soon, you'll know one of the world's most powerful, popular programming languages. You'll be a Python programmer. This book is perfect for: total beginners with zero programming experience; junior developers who know one or two languages; returning professionals who haven't written code in years; aspiring data scientists or academics who need to learn to code; and seasoned professionals looking for a fast, simple crash course in Python for data science. Register your book for convenient access to downloads, updates, and/or corrections as they become available. See inside book for details.

Hands-On Entity Resolution

2024-02-02 O'Reilly Amazon

book

Michael Shearer

data data-science data-science-tasks entity-resolution-record-linkage entity resolution / record linkage AI/ML

Entity resolution is a key analytic technique that enables you to identify multiple data records that refer to the same real-world entity. With this hands-on guide, product managers, data analysts, and data scientists will learn how to add value to data by cleansing, analyzing, and resolving datasets using open source Python libraries and cloud APIs. Author Michael Shearer shows you how to scale up your data matching processes and improve the accuracy of your reconciliations. You'll be able to remove duplicate entries within a single source and join disparate data sources together when common keys aren't available. Using real-world data examples, this book helps you gain practical understanding to accelerate the delivery of real business value. With entity resolution, you'll build rich and comprehensive data assets that reveal relationships for marketing and risk management purposes, key to harnessing the full potential of ML and AI. This book covers: challenges in deduplicating and joining datasets; extracting, cleansing, and preparing datasets for matching; text matching algorithms to identify equivalent entities; techniques for deduplicating and joining datasets at scale; matching datasets containing persons and organizations; evaluating data matches; optimizing and tuning data matching algorithms; entity resolution using cloud APIs; and matching using privacy-enhancing technologies.

Principles of Data Science - Third Edition

2024-01-31 O'Reilly Amazon

book

Sinan Ozdemir

data data-science AI/ML Computer Science Data Science NLP

Principles of Data Science offers an end-to-end introduction to data science fundamentals, blending key mathematical concepts with practical programming. You'll learn how to clean and prepare data, construct predictive models, and leverage modern tools like pre-trained models for NLP and computer vision. By integrating theory and practice, this book sets the foundation for impactful data-driven decision-making. What this book will help you do: Develop a solid understanding of foundational statistics and machine learning. Learn how to clean, transform, and visualize data for impactful analysis. Explore transfer learning and pre-trained models for advanced AI tasks. Understand ethical implications, biases, and governance in AI and ML. Gain the knowledge to implement complete data pipelines effectively. Author(s): Sinan Ozdemir is an experienced data scientist, educator, and author with a deep passion for making complex topics accessible. With a background in computer science and applied statistics, Sinan has taught data science at leading institutions and authored multiple books on the topic. His practical approach to teaching combines real-world examples with insightful explanations, ensuring learners gain both competence and confidence. Who is it for? This book is ideal for beginners in data science who want to gain a comprehensive understanding of the field. If you have a background in programming or mathematics and are eager to combine these skills to analyze and extract insights from data, this book will guide you. Individuals working with machine learning or AI who need to solidify their foundational knowledge will find it invaluable. Some familiarity with Python is recommended to follow along seamlessly.

Data Science for Web3

2023-12-29 O'Reilly Amazon

book

Gabriela Castillo Areco

data data-science AI/ML Analytics Blockchain Data Science

Discover how to navigate the world of Web3 data with 'Data Science for Web3,' an expertly crafted guide by Gabriela Castillo Areco. Through practical examples, industry insights, and real-world use cases, you will learn the skills needed to analyze blockchain data and extract actionable business insights. What this book will help you do: Understand blockchain transactions and data structures to build robust datasets. Leverage on-chain and off-chain data for valuable Web3 business insights. Create DeFi- and NFT-specific datasets for targeted analysis. Develop machine learning models tailored for blockchain use cases. Apply data science techniques to innovate in the Web3 ecosystem. Author(s): Gabriela Castillo Areco is a seasoned data scientist and an expert in blockchain analytics. With years of experience in the technology and finance sectors, Gabriela brings a practical perspective to understanding intricate data within the emerging Web3 paradigm. Her engaging approach makes technical concepts accessible and actionable. Who is it for? This book is ideal for data professionals, such as analysts, scientists, or engineers, who aim to harness the potential of blockchain analytics. It's also suitable for business professionals exploring data-driven opportunities within Web3. Whether you're a beginner or an experienced learner with some Python background, this book will meet you at your level.
