O'Reilly Data Science Books

Learn Microsoft Fabric

2024-02-29 O'Reilly Amazon

book

Arshad Ali, Bradley Schacht

data data-science analytics-platforms microsoft-fabric AI/ML Analytics

Dive into the wonders of Microsoft Fabric, the ultimate solution for mastering data analytics in the AI era. Through engaging real-world examples and hands-on scenarios, this book will equip you with all the tools to design, build, and maintain analytics systems for various use cases like lakehouses, data warehouses, real-time analytics, and data science. What this Book will help me do Understand and utilize the key components of Microsoft Fabric for modern analytics. Build scalable and efficient data analytics solutions with medallion architecture. Implement real-time analytics and machine learning models to derive actionable insights. Monitor and administer your analytics platform for high performance and security. Leverage AI-powered assistant Copilot to boost analytics productivity. Author(s) Arshad Ali and Bradley Schacht bring years of expertise in data analytics and system architecture to this book. Arshad is a seasoned professional specialized in AI-integrated analytics platforms, while Bradley Schacht has a proven track record in deploying enterprise data solutions. Together, they provide deep insights and practical knowledge with a structured and approachable teaching method. Who is it for? Ideal for data professionals such as data analysts, engineers, scientists, and AI/ML experts aiming to enhance their data analytics skills and master Microsoft Fabric. It's also suited for students and new entrants to the field looking to establish a firm foundation in analytics systems. Requires a basic understanding of SQL and Spark.

Graph Algorithms for Data Science

2024-02-26 O'Reilly Amazon

book

Tomaz Bratanic

data data-science AI/ML CSV Data Science NLP

Practical methods for analyzing your data with graphs, revealing hidden connections and new insights. Graphs are the natural way to represent and understand connected data. This book explores the most important algorithms and techniques for graphs in data science, with concrete advice on implementation and deployment. You don’t need any graph experience to start benefiting from this insightful guide. These powerful graph algorithms are explained in clear, jargon-free text and illustrations that make them easy to apply to your own projects. In Graph Algorithms for Data Science you will learn: Labeled-property graph modeling Constructing a graph from structured data such as CSV or SQL NLP techniques to construct a graph from unstructured data Cypher query language syntax to manipulate data and extract insights Social network analysis algorithms like PageRank and community detection How to translate graph structure to a ML model input with node embedding models Using graph features in node classification and link prediction workflows Graph Algorithms for Data Science is a hands-on guide to working with graph-based data in applications like machine learning, fraud detection, and business data analysis. It’s filled with fascinating and fun projects, demonstrating the ins-and-outs of graphs. You’ll gain practical skills by analyzing Twitter, building graphs with NLP techniques, and much more. About the Technology A graph, put simply, is a network of connected data. Graphs are an efficient way to identify and explore the significant relationships naturally occurring within a dataset. This book presents the most important algorithms for graph data science with examples from machine learning, business applications, natural language processing, and more. About the Book Graph Algorithms for Data Science shows you how to construct and analyze graphs from structured and unstructured data. 
In it, you’ll learn to apply graph algorithms like PageRank, community detection/clustering, and knowledge graph models by putting each new algorithm to work in a hands-on data project. This cutting-edge book also demonstrates how you can create graphs that optimize input for AI models using node embedding. What's Inside Creating knowledge graphs Node classification and link prediction workflows NLP techniques for graph construction About the Reader For data scientists who know machine learning basics. Examples use the Cypher query language, which is explained in the book. About the Author Tomaž Bratanič works at the intersection of graphs and machine learning. Arturo Geigel was the technical editor for this book. Quotes Undoubtedly the quickest route to grasping the practical applications of graph algorithms. Enjoyable and informative, with real-world business context and practical problem-solving. - Roger Yu, Feedzai Brilliantly eases you into graph-based applications. - Sumit Pal, Independent Consultant I highly recommend this book to anyone involved in analyzing large network databases. - Ivan Herreros, talentsconnect Insightful and comprehensive. The author’s expertise is evident. Be prepared for a rewarding journey. - Michal Štefaňák, Volke

Mastering Microsoft Fabric: SAASification of Analytics

2024-02-21 O'Reilly Amazon

book

Debananda Ghosh

data data-science analytics-platforms microsoft-fabric AI/ML Analytics

Learn and explore the capabilities of Microsoft Fabric, the latest evolution in cloud analytics suites. This book will help you understand how users can leverage a Microsoft Office-equivalent experience for performing data management and advanced analytics activities. The book starts with an overview of the analytics evolution from on premises to cloud infrastructure as a service (IaaS), platform as a service (PaaS), and now software as a service (SaaS), and provides an introduction to Microsoft Fabric. You will learn how to provision Microsoft Fabric in your tenant along with the key capabilities of SaaS analytics products and the advantage of using Fabric in the enterprise analytics platform. OneLake and Lakehouse for data engineering are discussed, as well as OneLake for data science. Author Ghosh teaches you about data warehouse offerings inside Microsoft Fabric and the new data integration experience, which brings Azure Data Factory and Power Query Editor of Power BI together in a single platform. Also demonstrated is Real-Time Analytics in Fabric, including capabilities such as Kusto query and database. You will understand how the new event stream feature integrates with OneLake and other computations. You will also learn how to configure the real-time alert capability in a zero-code manner and go through the Power BI experience in the Fabric workspace. Fabric pricing and licensing are also covered. After reading this book, you will understand the capabilities of Microsoft Fabric and its integration with current and upcoming Azure OpenAI capabilities. 
What You Will Learn Build OneLake for all data like OneDrive for Microsoft Office Leverage shortcuts for cross-cloud data virtualization in Azure and AWS Understand upcoming OpenAI integration Discover new event streaming and Kusto query inside Fabric real-time analytics Utilize seamless tooling for machine learning and data science Who This Book Is For Citizen users and experts in the data engineering and data science fields, along with chief AI officers

Hands-On Entity Resolution

2024-02-02 O'Reilly Amazon

book

Michael Shearer

data data-science data-science-tasks entity-resolution-record-linkage entity resolution / record linkage AI/ML

Entity resolution is a key analytic technique that enables you to identify multiple data records that refer to the same real-world entity. With this hands-on guide, product managers, data analysts, and data scientists will learn how to add value to data by cleansing, analyzing, and resolving datasets using open source Python libraries and cloud APIs. Author Michael Shearer shows you how to scale up your data matching processes and improve the accuracy of your reconciliations. You'll be able to remove duplicate entries within a single source and join disparate data sources together when common keys aren't available. Using real-world data examples, this book helps you gain practical understanding to accelerate the delivery of real business value. With entity resolution, you'll build rich and comprehensive data assets that reveal relationships for marketing and risk management purposes, key to harnessing the full potential of ML and AI. This book covers: Challenges in deduplicating and joining datasets Extracting, cleansing, and preparing datasets for matching Text matching algorithms to identify equivalent entities Techniques for deduplicating and joining datasets at scale Matching datasets containing persons and organizations Evaluating data matches Optimizing and tuning data matching algorithms Entity resolution using cloud APIs Matching using privacy-enhancing technologies

Principles of Data Science - Third Edition

2024-01-31 O'Reilly Amazon

book

Sinan Ozdemir

data data-science AI/ML Computer Science Data Science NLP

Principles of Data Science offers an end-to-end introduction to data science fundamentals, blending key mathematical concepts with practical programming. You'll learn how to clean and prepare data, construct predictive models, and leverage modern tools like pre-trained models for NLP and computer vision. By integrating theory and practice, this book sets the foundation for impactful data-driven decision-making. What this Book will help me do Develop a solid understanding of foundational statistics and machine learning. Learn how to clean, transform, and visualize data for impactful analysis. Explore transfer learning and pre-trained models for advanced AI tasks. Understand ethical implications, biases, and governance in AI and ML. Gain the knowledge to implement complete data pipelines effectively. Author(s) Sinan Ozdemir is an experienced data scientist, educator, and author with a deep passion for making complex topics accessible. With a background in computer science and applied statistics, Sinan has taught data science at leading institutions and authored multiple books on the topic. His practical approach to teaching combines real-world examples with insightful explanations, ensuring learners gain both competence and confidence. Who is it for? This book is ideal for beginners in data science who want to gain a comprehensive understanding of the field. If you have a background in programming or mathematics and are eager to combine these skills to analyze and extract insights from data, this book will guide you. Individuals working with machine learning or AI who need to solidify their foundational knowledge will find it invaluable. Some familiarity with Python is recommended to follow along seamlessly.

MATLAB for Machine Learning - Second Edition

2024-01-30 O'Reilly Amazon

book

Giuseppe Ciaburro

data data-science data-science-tools MATLAB AI/ML Data Science

"MATLAB for Machine Learning" is your comprehensive guide to leveraging MATLAB's powerful tools and toolbox for machine learning and deep learning tasks. Through this book, you will explore practical applications and processes that streamline the development of machine learning models while tackling real-world problems effectively. What this Book will help me do Gain proficiency in utilizing MATLAB's Machine Learning Toolbox for developing machine learning algorithms. Learn how to handle data preprocessing, from data cleansing to visualization, within MATLAB. Explore and implement foundational to advanced machine learning techniques, such as classification and regression models. Comprehend and apply the principles of neural networks for pattern recognition and cluster analysis. Dive into advanced concepts of deep learning, including convolutional networks, natural language processing, and time series analysis, using MATLAB's inbuilt functionality. Author(s) Giuseppe Ciaburro is an expert in the field of machine learning and MATLAB programming. With a robust academic background in data science and years of experience in applying these principles across domains, Giuseppe provides a clear and approachable pathway for learners in his writing. Who is it for? This book is ideal for machine learning professionals, data scientists, and engineers specializing in fields such as deep learning, computer vision, and natural language processing. It is suitable for those with a fundamental understanding of programming concepts who seek to apply MATLAB in solving complex learning problems. A prior familiarity with MATLAB basics will be advantageous.

Statistics Slam Dunk

2024-01-30 O'Reilly Amazon

book

Gary Sutton

data data-science data-science-tasks statistics AI/ML Analytics

Learn statistics by analyzing professional basketball data! In this action-packed book, you’ll build your skills in exploratory data analysis by digging into the fascinating world of NBA games and player stats using the R language. Statistics Slam Dunk is an engaging how-to guide for statistical analysis with R. Each chapter contains an end-to-end data science or statistics project delving into NBA data and revealing real-world sporting insights. Written by a former basketball player turned business intelligence and analytics leader, the book gives you practical experience tidying, wrangling, exploring, testing, modeling, and otherwise analyzing data with the best and latest R packages and functions. In Statistics Slam Dunk you’ll develop a toolbox of R programming skills including: Reading and writing data Installing and loading packages Transforming, tidying, and wrangling data Applying best-in-class exploratory data analysis techniques Creating compelling visualizations Developing supervised and unsupervised machine learning algorithms Executing hypothesis tests, including t-tests and chi-square tests for independence Computing expected values, Gini coefficients, z-scores, and other measures If you’re looking to switch to R from another language, or trade base R for tidyverse functions, this book is the perfect training coach. Much more than a beginner’s guide, it teaches statistics and data science methods that have tons of use cases. And just like in the real world, you’ll get no clean pre-packaged data sets in Statistics Slam Dunk. You’ll take on the challenge of wrangling messy data to drill on the skills that will make you the star player on any data team. About the Technology Statistics Slam Dunk is a data science manual with a difference. Each chapter is a complete, self-contained statistics or data science project for you to work through, from importing data, to wrangling it, testing it, visualizing it, and modeling it. 
Throughout the book, you’ll work exclusively with NBA data sets and the R language, applying best-in-class statistics techniques to reveal fun and fascinating truths about the NBA. About the Book Is losing basketball games on purpose a rational strategy? Which hustle statistics have an impact on wins and losses? Does spending more on player salaries translate into a winning record? You’ll answer all these questions and more. Plus, R’s visualization capabilities shine through in the book’s 300 plots and charts, including Pareto charts, Sankey diagrams, Cleveland dot plots, and dendrograms. What's Inside Transforming, tidying, and wrangling data Applying best-in-class exploratory data analysis techniques Developing supervised and unsupervised machine learning algorithms Executing hypothesis tests and effect size tests About the Reader For readers who know basic statistics. No advanced knowledge of R—or basketball—required. About the Author Gary Sutton is a former basketball player who has built and led high-performing business intelligence and analytics organizations across multiple verticals. Quotes In this journey of exploration, every computer scientist will find a valuable ally in understanding the language of data. - Kim Lokøy, areo Transcends other R titles by revealing the hidden narratives that lie within the numbers. - Christian Sutton, Shell International Exploration and Production Seamlessly blending theory and practical insights, this book serves as an indispensable guide for those venturing into the field of data analytics. - Juan Delgado, Sodexo BRS

Data Science for Web3

2023-12-29 O'Reilly Amazon

book

Gabriela Castillo Areco

data data-science AI/ML Analytics Blockchain Data Science

Discover how to navigate the world of Web3 data with 'Data Science for Web3,' an expertly crafted guide by Gabriela Castillo Areco. Through practical examples, industry insights, and real-world use cases, you will learn the skills needed to analyze blockchain data and extract actionable business insights. What this Book will help me do Understand blockchain transactions and data structures to build robust datasets. Leverage on-chain and off-chain data for valuable Web3 business insights. Create DeFi- and NFT-specific datasets for targeted analysis. Develop machine learning models tailored for blockchain use cases. Apply data science techniques to innovate in the Web3 ecosystem. Author(s) Gabriela Castillo Areco is a seasoned data scientist and an expert in blockchain analytics. With years of experience in the technology and finance sectors, Gabriela brings a practical perspective to understanding intricate data within the emerging Web3 paradigm. Her engaging approach makes technical concepts accessible and actionable. Who is it for? This book is ideal for data professionals such as analysts, scientists, or engineers, aiming to harness the potential of blockchain analytics. It's also suitable for business professionals exploring data-driven opportunities within Web3. Whether you're a beginner or an experienced learner with some Python background, this book will meet you at your level.

Bayesian Optimization in Action

2023-12-17 O'Reilly Amazon

book

Quan Nguyen

data data-science data-science-tasks statistics bayesian-statistics AI/ML

Bayesian optimization helps pinpoint the best configuration for your machine learning models with speed and accuracy. Put its advanced techniques into practice with this hands-on guide. In Bayesian Optimization in Action you will learn how to: Train Gaussian processes on both sparse and large data sets Combine Gaussian processes with deep neural networks to make them flexible and expressive Find the most successful strategies for hyperparameter tuning Navigate a search space and identify high-performing regions Apply Bayesian optimization to cost-constrained, multi-objective, and preference optimization Implement Bayesian optimization with PyTorch, GPyTorch, and BoTorch Bayesian Optimization in Action shows you how to optimize hyperparameter tuning, A/B testing, and other aspects of the machine learning process by applying cutting-edge Bayesian techniques. Using clear language, illustrations, and concrete examples, this book proves that Bayesian optimization doesn’t have to be difficult! You’ll get in-depth insights into how Bayesian optimization works and learn how to implement it with cutting-edge Python libraries. The book’s easy-to-reuse code samples let you hit the ground running by plugging them straight into your own projects. About the Technology In machine learning, optimization is about achieving the best predictions—shortest delivery routes, perfect price points, most accurate recommendations—in the fewest number of steps. Bayesian optimization uses the mathematics of probability to fine-tune ML functions, algorithms, and hyperparameters efficiently when traditional methods are too slow or expensive. About the Book Bayesian Optimization in Action teaches you how to create efficient machine learning processes using a Bayesian approach. In it, you’ll explore practical techniques for training large datasets, hyperparameter tuning, and navigating complex search spaces. 
This interesting book includes engaging illustrations and fun examples like perfecting coffee sweetness, predicting weather, and even debunking psychic claims. You’ll learn how to navigate multi-objective scenarios, account for decision costs, and tackle pairwise comparisons. What's Inside Gaussian processes for sparse and large datasets Strategies for hyperparameter tuning Identify high-performing regions Examples in PyTorch, GPyTorch, and BoTorch About the Reader For machine learning practitioners who are confident in math and statistics. About the Author Quan Nguyen is a research assistant at Washington University in St. Louis. He writes for the Python Software Foundation and has authored several books on Python programming. Quotes Using a hands-on approach, clear diagrams, and real-world examples, Quan lifts the veil off the complexities of Bayesian optimization. - From the Foreword by Luis Serrano, Author of Grokking Machine Learning This book teaches Bayesian optimization, starting from its most basic components. You’ll find enough depth to make you comfortable with the tools and methods and enough code to do real work very quickly. - From the Foreword by David Sweet, Author of Experimentation for Engineers Combines modern computational frameworks with visualizations and infographics you won’t find anywhere else. It gives readers the confidence to apply Bayesian optimization to real world problems! - Ravin Kumar, Google

Distributed Machine Learning with PySpark: Migrating Effortlessly from Pandas and Scikit-Learn

2023-11-23 O'Reilly Amazon

book

Abdelaziz Testas

data data-science data-science-tools Pandas AI/ML Big Data

Migrate from pandas and scikit-learn to PySpark to handle vast amounts of data and achieve faster data processing time. This book will show you how to make this transition by adapting your skills and leveraging the similarities in syntax, functionality, and interoperability between these tools. Distributed Machine Learning with PySpark offers a roadmap to data scientists considering transitioning from small data libraries (pandas/scikit-learn) to big data processing and machine learning with PySpark. You will learn to translate Python code from pandas/scikit-learn to PySpark to preprocess large volumes of data and build, train, test, and evaluate popular machine learning algorithms such as linear and logistic regression, decision trees, random forests, support vector machines, Naïve Bayes, and neural networks. After completing this book, you will understand the foundational concepts of data preparation and machine learning and will have the skills necessary to apply these methods using PySpark, the industry standard for building scalable ML data pipelines. What You Will Learn Master the fundamentals of supervised learning, unsupervised learning, NLP, and recommender systems Understand the differences between PySpark, scikit-learn, and pandas Perform linear regression, logistic regression, and decision tree regression with pandas, scikit-learn, and PySpark Distinguish between the pipelines of PySpark and scikit-learn Who This Book Is For Data scientists, data engineers, and machine learning practitioners who have some familiarity with Python, but who are new to distributed machine learning and the PySpark framework.

Alteryx Designer: The Definitive Guide

2023-11-22 O'Reilly Amazon

book

Joshua Burkhow

data data-science analytics-platforms Alteryx AI/ML Analytics

Analytics projects are frequently long, drawn-out affairs, requiring multiple teams and skills to clean, join, and eventually turn data into analysis for timely decision-making. Alteryx Designer changes all of that. With this low-code, self-service, drag-and-drop workflow platform, new and experienced data and business analysts can deliver results in hours instead of weeks. This practical book shows you how to master all areas of Alteryx Designer quickly. Author and Alteryx ACE Joshua Burkhow starts with the basics of building a workflow, then introduces more than 200 tools for working with intermediate and advanced analytics functionality. With Alteryx Designer's all-in-one toolkit, you'll migrate from legacy analytics software or Excel with ease. Ready to work with data quickly and efficiently? This guide gets you started. Learn the fundamentals of cleaning, prepping, and analyzing data with Alteryx Designer Install, navigate, and quickly become competent with the Alteryx Designer layout and functionality Construct accurate, performant, reliable, and well-documented workflows that automate business processes Learn intermediate techniques using spatial analytics, reporting, and in-database tools Dive into advanced Alteryx capabilities, including predictive and machine learning tools Get introduced to the entire Alteryx Analytic Process Automation (APA) Platform

Fundamentals of Data Science

2023-11-17 O'Reilly Amazon

book

Swarup Roy, Dhruba K. Bhattacharyya, Jugal K. Kalita

data data-science AI/ML Analytics Big Data Data Analytics

Fundamentals of Data Science: Theory and Practice presents basic and advanced concepts in data science along with real-life applications. The book provides students, researchers, and professionals at different levels with a good understanding of the concepts of data science, machine learning, data mining, and analytics. Readers will find the authors’ research experiences and achievements in data science applications, along with in-depth discussions on topics that are essential for data science projects, including pre-processing, which is carried out before predictive and descriptive data analysis tasks, and proximity measures for numeric, categorical, and mixed-type data. The authors include a systematic presentation of many predictive and descriptive learning algorithms, including recent developments that have successfully handled large datasets with high accuracy. In addition, a number of descriptive learning tasks are included. Presents the foundational concepts of data science along with advanced concepts and real-life applications for applied learning Includes coverage of a number of key topics such as data quality and pre-processing, proximity and validation, predictive data science, descriptive data science, ensemble learning, association rule mining, Big Data analytics, as well as incremental and distributed learning Provides updates on key applications of data science techniques in areas such as Computational Biology, Network Intrusion Detection, Natural Language Processing, Software Clone Detection, Financial Data Analysis, and Scientific Time Series Data Analysis Covers computer program code for implementing descriptive and predictive algorithms

Google Cloud Platform for Data Science: A Crash Course on Big Data, Machine Learning, and Data Analytics Services

2023-11-17 O'Reilly Amazon

book

Sandika S. Sukhdeve, Dr. Shitalkumar R. Sukhdeve

it-operations cloud-computing cloud-platforms google-cloud AI/ML Analytics

This book is your practical and comprehensive guide to learning Google Cloud Platform (GCP) for data science, using only the free tier services offered by the platform. Data science and machine learning are increasingly becoming critical to businesses of all sizes, and the cloud provides a powerful platform for these applications. GCP offers a range of data science services that can be used to store, process, and analyze large datasets, and train and deploy machine learning models. The book is organized into seven chapters covering various topics such as GCP account setup, Google Colaboratory, Big Data and Machine Learning, Data Visualization and Business Intelligence, Data Processing and Transformation, Data Analytics and Storage, and Advanced Topics. Each chapter provides step-by-step instructions and examples illustrating how to use GCP services for data science and big data projects. Readers will learn how to set up a Google Colaboratory account and run Jupyter notebooks, access GCP services and data from Colaboratory, use BigQuery for data analytics, and deploy machine learning models using Vertex AI. The book also covers how to visualize data using Looker Data Studio, run data processing pipelines using Google Cloud Dataflow and Dataprep, and store data using Google Cloud Storage and SQL. 
What You Will Learn Set up a GCP account and project Explore BigQuery and its use cases, including machine learning Understand Google Cloud AI Platform and its capabilities Use Vertex AI for training and deploying machine learning models Explore Google Cloud Dataproc and its use cases for big data processing Create and share data visualizations and reports with Looker Data Studio Explore Google Cloud Dataflow and its use cases for batch and stream data processing Run data processing pipelines on Cloud Dataflow Explore Google Cloud Storage and its use cases for data storage Get an introduction to Google Cloud SQL and its use cases for relational databases Get an introduction to Google Cloud Pub/Sub and its use cases for real-time data streaming Who This Book Is For Data scientists, machine learning engineers, and analysts who want to learn how to use Google Cloud Platform (GCP) for their data science and big data projects

Consumer Behaviour and Analytics, 2nd Edition

2023-11-08 O'Reilly Amazon

book

Andrew Smith

data data-science web-analytics AI/ML Analytics Big Data

The 2nd edition of Consumer Behaviour and Analytics provides a consumer behaviour textbook for the new marketing reality. In a world of Big Data, machine learning and artificial intelligence, this key text reviews the issues, research and concepts essential for navigating this new terrain.

Data Smart, 2nd Edition

2023-11-07 O'Reilly Amazon

book

Jordan Goldmeier

data data-science AI/ML Data Science Microsoft

Want to jump into data science but don't know where to start? Let's be real, data science is presented as something mystical and unattainable without the most powerful software, hardware, and data expertise. Real data science isn't about technology. It's about how you approach the problem. In this updated edition of Data Smart: Using Data Science to Transform Information into Insight, award-winning data scientist and bestselling author Jordan Goldmeier shows you how to implement data science problems using Excel while exposing how things work behind the scenes. Data Smart is your field guide to building statistics, machine learning, and powerful artificial intelligence concepts right inside your spreadsheet. Inside you'll find: Four-color data visualizations that highlight and illustrate the concepts discussed in the book Tutorials explaining complicated data science using just Microsoft Excel How to take what you’ve learned and apply it to everyday problems at work and life Advice for using formulas, Power Query, and some of Excel's latest features to solve tough data problems Smart data science solutions for common business challenges Explanations of what algorithms do, how they work, and what you can tweak to take your Excel skills to the next level Data Smart is a must-read for students, analysts, and managers ready to become data science savvy and share their findings with the world.

Data Science: The Hard Parts

2023-11-01 O'Reilly Amazon

book

Daniel Vaughan

data data-science AI/ML Data Engineering Data Science KPI

This practical guide provides a collection of techniques and best practices that are generally overlooked in most data engineering and data science pedagogy. A common misconception is that great data scientists are experts in the "big themes" of the discipline—machine learning and programming. But most of the time, these tools can only take us so far. In practice, the smaller tools and skills really separate a great data scientist from a not-so-great one. Taken as a whole, the lessons in this book make the difference between an average data scientist candidate and a qualified data scientist working in the field. Author Daniel Vaughan has collected, extended, and used these skills to create value and train data scientists from different companies and industries. With this book, you will: Understand how data science creates value Deliver compelling narratives to sell your data science project Build a business case using unit economics principles Create new features for a ML model using storytelling Learn how to decompose KPIs Perform growth decompositions to find root causes for changes in a metric Daniel Vaughan is head of data at Clip, the leading paytech company in Mexico. He's the author of Analytical Skills for AI and Data Science (O'Reilly).

R Bioinformatics Cookbook - Second Edition

2023-10-31 O'Reilly Amazon

book

Dan MacLean

data data-science data-science-domains bioinformatics AI/ML Data Science

R Bioinformatics Cookbook is your guide to leveraging the power of R for advanced bioinformatics tasks. This updated second edition uses a recipe-based method to teach data analysis, visualization, and machine learning tailored for biological datasets. You'll gain hands-on experience with popular tools like Bioconductor, ggplot2, and tidyverse to solve real-world genomics problems. What this Book will help me do Set up a reproducible bioinformatics analysis environment using R. Clean, analyze, and visualize biological data with R's powerful packages. Apply RNA-seq and ChIP-seq workflows to study genetic information effectively. Incorporate machine learning techniques into bioinformatics pipelines using R. Automate tasks and create professional-grade reports using functional programming and reporting tools. Author(s) The author, Dan MacLean, brings years of expertise in bioinformatics and computational biology. Known for clear explanations and practical approaches, they ensure the material is accessible yet challenging. With a strong focus on real-world applications, this book reflects their commitment to bridging bioinformatics and modern data science. Who is it for? This book is perfect for bioinformaticians, researchers, and data scientists with prior R experience. It's tailored for those looking to delve deeper into genomics, data visualization, and bioinformatics techniques. Intermediate knowledge of bioinformatics concepts and familiarity with R programming are assumed for readers to fully benefit from the content.

Machine Learning with Qlik Sense

2023-10-27 O'Reilly Amazon

book

Hannu Ranta

data data-science analytics-platforms qlik-sense AI/ML Analytics

Machine Learning with Qlik Sense introduces practical applications of machine learning within the Qlik platform. Through this book, you will gain a thorough understanding of fundamental ML concepts, learn to apply these within Qlik Sense, and see how to use predictive analytics to solve real-world problems. The hands-on examples ensure you can translate learnings into actionable insights. What this Book will help me do Understand the key principles of machine learning and how to apply them using the Qlik platform. Develop skills in data preprocessing and analysis to prepare datasets for machine learning models. Learn to validate and interpret machine learning models and evaluate their performance. Master advanced visualization techniques for presenting insights derived from data. Apply newfound knowledge to practical business problems through real-world use-case examples. Author(s) Hannu Ranta is an expert in data analytics and has extensive experience utilizing the Qlik platform to derive actionable insights from data. With years of practical exposure and a focus on teaching, Hannu brings a clear and structured approach to using machine learning for analytics. His writing seeks to empower readers to achieve practical solutions using Qlik's powerful tools. Who is it for? This book is perfect for data analysts, data scientists, or anyone working in data analytics who wants to incorporate machine learning into their skill set. It is especially suited to those with a basic familiarity with Qlik tools or data analysis concepts. Beginners in machine learning can also benefit because the book starts from foundational concepts and builds step-by-step.

The Statistics and Machine Learning with R Workshop

2023-10-25 O'Reilly Amazon

book

Liu Peng

data data-science data-science-tools r AI/ML Data Science

This book guides readers through the essentials of applied statistics and machine learning using the R programming language. By delving into robust data processing techniques, visualization, and statistical modeling with R, you will develop skills to effectively analyze data and design predictive models. Each chapter includes hands-on exercises to reinforce the concepts in a practical, intuitive way. What this Book will help me do Understand and apply key statistical concepts such as probability distributions and hypothesis testing to analyze data. Master foundational mathematical principles like linear algebra and calculus relevant to data science and machine learning. Develop proficiency in data manipulation and visualization using robust R libraries such as dplyr and ggplot2. Build predictive models through practical exercises and learn advanced concepts like Bayesian statistics and linear regression. Gain the practical knowledge needed to apply statistical and machine learning methodologies in real-world scenarios. Author(s) Liu Peng is an accomplished author with a strong academic and practical background in statistics and data science. Armed with extensive experience in applying R to real-world problems, he brings a blend of technical mastery and teaching expertise. His commitment is to transform complex concepts into accessible, enriching learning experiences for readers. Who is it for? This book is ideal for data scientists and analysts ranging from beginners to those at an intermediate level. It caters especially to those interested in practicing statistical modeling and learning R in depth. If you have basic familiarity with statistics and are looking to expand your data science capabilities using R, this book is well-suited for you.

Learn Microsoft Power Apps - Second Edition

2023-09-29 O'Reilly Amazon

book

Elisa Bárcena Martín , Matthew Weston

data data-science business-intelligence microsoft-power-platform AI/ML Microsoft

Learn Microsoft Power Apps is your complete guide to building personalized business applications using Microsoft's low-code platform. You'll discover how to create interactive, secure apps tailored to your needs, with the help of detailed examples, best practices, and progressive tutorials. Unleash the power of tools like AI Builder and Dataverse to add cutting-edge functionality to your applications. What this Book will help me do Understand the Power Apps ecosystem and its licensing to make informed decisions. Create canvas applications to address specific business challenges effectively. Incorporate integration with SharePoint, Power Automate, and other Microsoft tools for enhanced app capabilities. Use Dataverse for data storage and employ model-driven approaches for robust applications. Leverage artificial intelligence features like AI Builder and Copilot to accelerate and improve development. Author(s) Matthew Weston and Elisa Bárcena Martín are seasoned professionals in the Microsoft and business solutions field. Their combined experience includes decades of expertise in developing applications, consulting, and teaching others how to harness Power Platform technologies. They excel in breaking down complex topics into understandable, actionable content, and their supportive tone makes learning enjoyable and productive. Who is it for? This book is ideal for business analysts, IT professionals, and solution developers seeking to streamline business processes through custom applications. Whether you're a seasoned developer looking to expand into low-code platforms or a beginner eager to tackle real-world problems, this book guides you step by step. A basic understanding of Microsoft 365 is all that's needed to get started, giving non-developers and tech enthusiasts alike the confidence to create impactful applications.

Streamlit for Data Science - Second Edition

2023-09-29 O'Reilly Amazon

book

Tyler Richards

data data-science AI/ML Cloud Computing Data Science DataViz

Streamlit for Data Science is your complete guide to mastering the creation of powerful, interactive data-driven applications using Python and Streamlit. With this comprehensive resource, you'll learn everything from foundational Streamlit skills to advanced techniques like integrating machine learning models and deploying apps to cloud platforms, enabling you to significantly enhance your data science toolkit. What this Book will help me do Master building interactive applications using Streamlit, including techniques for user interfaces and integrations. Develop visually appealing and functional data visualizations using Python libraries in Streamlit. Learn to integrate Streamlit applications with machine learning frameworks and tools like Hugging Face and OpenAI. Understand and apply best practices to deploy Streamlit apps to cloud platforms such as Streamlit Community Cloud and Heroku. Improve practical Python skills through implementing end-to-end data applications and prototyping data workflows. Author(s) Tyler Richards, the author of Streamlit for Data Science, is a senior data scientist with in-depth practical experience in building data-driven applications. With a passion for Python and data visualization, Tyler leverages his knowledge to help data professionals craft effective and compelling tools. His teaching approach combines clarity, hands-on exercises, and practical relevance. Who is it for? This book is written for data scientists, engineers, and enthusiasts who use Python and want to create dynamic data-driven applications. With a focus on those who have some familiarity with Python and libraries like Pandas or NumPy, it assists readers in building on their knowledge by offering tailored guidance. Perfect for those looking to prototype data projects or enhance their programming toolkit.

Python Data Analytics: With Pandas, NumPy, and Matplotlib

2023-09-01 O'Reilly Amazon

book

Fabio Nelli

data data-science data-science-tools Pandas AI/ML Analytics

Explore the latest Python tools and techniques to help you tackle the world of data acquisition and analysis. You'll review scientific computing with NumPy, visualization with matplotlib, and machine learning with scikit-learn. This third edition is fully updated for the latest version of Python and its related libraries, and includes coverage of social media data analysis, image analysis with OpenCV, and deep learning libraries. Each chapter includes multiple examples demonstrating how to work with each library. At its heart lies the coverage of pandas, for high-performance, easy-to-use data structures and tools for data manipulation. Author Fabio Nelli expertly demonstrates using Python for data processing, management, and information retrieval. Later chapters apply what you've learned to handwriting recognition and extending graphical capabilities with the JavaScript D3 library. Whether you are dealing with sales data, investment data, medical data, web page usage, or other data sets, Python Data Analytics, Third Edition is an invaluable reference with its examples of storing, accessing, and analyzing data. What You'll Learn Understand the core concepts of data analysis and the Python ecosystem Go in depth with pandas for reading, writing, and processing data Use tools and techniques for data visualization and image analysis Examine popular deep learning libraries Keras, Theano, TensorFlow, and PyTorch Who This Book Is For Experienced Python developers who need to learn about Pythonic tools for data analysis

Mastering Tableau 2023 - Fourth Edition

2023-08-29 O'Reilly Amazon

book

Marleen Meier

data data-science data-science-tasks data-visualization Tableau AI/ML

This comprehensive book on Tableau 2023 is your practical guide to mastering data visualization and business intelligence techniques. You will explore the latest features of Tableau, learn how to create insightful dashboards, and gain proficiency in integrating analytics and machine learning workflows. By the end, you'll have the skills to address a variety of analytics challenges using Tableau. What this Book will help me do Master the latest Tableau 2023 features and use cases to tackle analytics challenges. Develop and implement ETL workflows using Tableau Prep Builder for optimized data preparation. Integrate Tableau with programming languages such as Python and R to enhance analytics. Create engaging, visually impactful dashboards for effective data storytelling. Understand and apply data governance to ensure data quality and compliance. Author(s) Marleen Meier is an experienced data visualization expert and Tableau consultant with over a decade of experience helping organizations transform data into actionable insights. Her approach integrates her technical expertise and a keen eye for design to make analytics accessible rather than overwhelming. Her passion for teaching others to use visualization tools effectively shines through in her writing. Who is it for? This book is ideal for business analysts, BI professionals, or data analysts looking to enhance their Tableau expertise. It caters to both newcomers seeking to understand the foundations of Tableau and experienced users aiming to refine their skills in advanced analytics and data visualization. If your goal is to leverage Tableau as a strategic tool in your organization's BI projects, this book is for you.

Fundamentals of Data Observability

2023-08-14 O'Reilly Amazon

book

Andy Petrella

data data-science data-science-tasks exploratory-data-analysis AI/ML Analytics

Quickly detect, troubleshoot, and prevent a wide range of data issues through data observability, a set of best practices that enables data teams to gain greater visibility of data and its usage. If you're a data engineer, data architect, or machine learning engineer who depends on the quality of your data, this book shows you how to focus on the practical aspects of introducing data observability in your everyday work. Author Andy Petrella helps you build the right habits to identify and solve data issues, such as data drifts and poor quality, so you can stop their propagation in data applications, pipelines, and analytics. You'll learn ways to introduce data observability, including setting up a framework for generating and collecting all the information you need. Learn the core principles and benefits of data observability Use data observability to detect, troubleshoot, and prevent data issues Follow the book's recipes to implement observability in your data projects Use data observability to create a trustworthy communication framework with data consumers Learn how to educate your peers about the benefits of data observability

Building Data Science Applications with FastAPI - Second Edition

2023-07-31 O'Reilly Amazon

book

François Voron

web-mobile web-development python-web-frameworks fastapi AI/ML API

Building Data Science Applications with FastAPI is your comprehensive guide to mastering the FastAPI framework to build efficient, reliable data science applications and APIs. You'll explore examples and projects that integrate machine learning models, manage databases, and leverage advanced FastAPI features like asynchronous I/O and WebSockets. What this Book will help me do Develop an understanding of the fundamentals and advanced features of the FastAPI framework, like dependency injection and type hinting. Learn how to integrate machine learning models into a FastAPI-based web backend effectively. Master concepts of authentication, database connections, and asynchronous programming in Python. Build and deploy two practical AI applications: a real-time object detection tool and a text-to-image generator. Acquire skills to monitor, log, and maintain software systems for optimal performance and reliability. Author(s) François Voron is an experienced Python developer and data scientist with extensive knowledge of web frameworks including FastAPI. With years of experience designing and deploying machine learning and data science applications, François focuses on empowering developers with practical techniques and real-world applications. His guidance helps readers tackle contemporary challenges in software development. Who is it for? This book is ideal for data scientists and software engineers looking to broaden their skillset by creating robust web APIs for data science applications. Readers are expected to have a working knowledge of Python and basic data science concepts, offering them a chance to expand into backend development. If you're keen to deploy machine learning models and integrate them seamlessly with web technologies, this book is for you. It provides both fundamental insights and advanced techniques to serve a broad range of learners.

talk-data.com
