O'Reilly Data Science Books

Bayesian Optimization in Action

2023-12-17 O'Reilly Amazon

book

Quan Nguyen

data data-science data-science-tasks statistics bayesian-statistics AI/ML

Bayesian optimization helps pinpoint the best configuration for your machine learning models with speed and accuracy. Put its advanced techniques into practice with this hands-on guide.
In Bayesian Optimization in Action you will learn how to:
• Train Gaussian processes on both sparse and large data sets
• Combine Gaussian processes with deep neural networks to make them flexible and expressive
• Find the most successful strategies for hyperparameter tuning
• Navigate a search space and identify high-performing regions
• Apply Bayesian optimization to cost-constrained, multi-objective, and preference optimization
• Implement Bayesian optimization with PyTorch, GPyTorch, and BoTorch
Bayesian Optimization in Action shows you how to optimize hyperparameter tuning, A/B testing, and other aspects of the machine learning process by applying cutting-edge Bayesian techniques. Using clear language, illustrations, and concrete examples, this book proves that Bayesian optimization doesn’t have to be difficult! You’ll get in-depth insights into how Bayesian optimization works and learn how to implement it with cutting-edge Python libraries. The book’s easy-to-reuse code samples let you hit the ground running by plugging them straight into your own projects.
About the Technology
In machine learning, optimization is about achieving the best predictions (shortest delivery routes, perfect price points, most accurate recommendations) in the fewest number of steps. Bayesian optimization uses the mathematics of probability to fine-tune ML functions, algorithms, and hyperparameters efficiently when traditional methods are too slow or expensive.
About the Book
Bayesian Optimization in Action teaches you how to create efficient machine learning processes using a Bayesian approach. In it, you’ll explore practical techniques for training large datasets, hyperparameter tuning, and navigating complex search spaces. This interesting book includes engaging illustrations and fun examples like perfecting coffee sweetness, predicting weather, and even debunking psychic claims. You’ll learn how to navigate multi-objective scenarios, account for decision costs, and tackle pairwise comparisons.
What's Inside
• Gaussian processes for sparse and large datasets
• Strategies for hyperparameter tuning
• Identify high-performing regions
• Examples in PyTorch, GPyTorch, and BoTorch
About the Reader
For machine learning practitioners who are confident in math and statistics.
About the Author
Quan Nguyen is a research assistant at Washington University in St. Louis. He writes for the Python Software Foundation and has authored several books on Python programming.
Quotes
Using a hands-on approach, clear diagrams, and real-world examples, Quan lifts the veil off the complexities of Bayesian optimization. - From the Foreword by Luis Serrano, Author of Grokking Machine Learning
This book teaches Bayesian optimization, starting from its most basic components. You’ll find enough depth to make you comfortable with the tools and methods and enough code to do real work very quickly. - From the Foreword by David Sweet, Author of Experimentation for Engineers
Combines modern computational frameworks with visualizations and infographics you won’t find anywhere else. It gives readers the confidence to apply Bayesian optimization to real world problems! - Ravin Kumar, Google

Distributed Machine Learning with PySpark: Migrating Effortlessly from Pandas and Scikit-Learn

2023-11-23 O'Reilly Amazon

book

Abdelaziz Testas

data data-science data-science-tools Pandas AI/ML Big Data

Migrate from pandas and scikit-learn to PySpark to handle vast amounts of data and achieve faster data processing. This book will show you how to make this transition by adapting your skills and leveraging the similarities in syntax, functionality, and interoperability between these tools.
Distributed Machine Learning with PySpark offers a roadmap to data scientists considering transitioning from small data libraries (pandas/scikit-learn) to big data processing and machine learning with PySpark. You will learn to translate Python code from pandas/scikit-learn to PySpark to preprocess large volumes of data and build, train, test, and evaluate popular machine learning algorithms such as linear and logistic regression, decision trees, random forests, support vector machines, Naïve Bayes, and neural networks. After completing this book, you will understand the foundational concepts of data preparation and machine learning and will have the skills necessary to apply these methods using PySpark, the industry standard for building scalable ML data pipelines.
What You Will Learn
• Master the fundamentals of supervised learning, unsupervised learning, NLP, and recommender systems
• Understand the differences between PySpark, scikit-learn, and pandas
• Perform linear regression, logistic regression, and decision tree regression with pandas, scikit-learn, and PySpark
• Distinguish between the pipelines of PySpark and scikit-learn
Who This Book Is For
Data scientists, data engineers, and machine learning practitioners who have some familiarity with Python, but who are new to distributed machine learning and the PySpark framework.

Python for Data Science For Dummies, 3rd Edition

2023-11-07 O'Reilly Amazon

book

John Paul Mueller, Luca Massaron

software-development programming-languages Python Cloud Computing Data Science RDBMS

Let Python do the heavy lifting for you as you analyze large datasets
Python for Data Science For Dummies lets you get your hands dirty with data using one of the top programming languages. This beginner’s guide takes you step by step through getting started, performing data analysis, understanding datasets and example code, working with Google Colab, sampling data, and beyond. Coding your data analysis tasks will make your life easier, make you more in-demand as an employee, and open the door to valuable knowledge and insights. This new edition is updated for the latest version of Python and includes current, relevant data examples.
• Get a firm background in the basics of Python coding for data analysis
• Learn about data science careers you can pursue with Python coding skills
• Integrate data analysis with multimedia and graphics
• Manage and organize data with cloud-based relational databases
Python careers are on the rise. Grab this user-friendly Dummies guide and gain the programming skills you need to become a data pro.

Hands-On Web Scraping with Python - Second Edition

2023-10-06 O'Reilly Amazon

book

Anish Chapagain

data data-science data-science-tasks web-scraping API Data Science

In "Hands-On Web Scraping with Python," you'll learn how to harness the power of Python libraries to extract, process, and analyze data from the web. This book provides a practical, step-by-step guide for beginners and data enthusiasts alike. What this Book will help me do Master the use of Python libraries like requests, lxml, Scrapy, and Beautiful Soup for web scraping. Develop advanced techniques for secure browsing and data extraction using APIs and Selenium. Understand the principles behind regex and PDF data parsing for comprehensive scraping. Analyze and visualize data using data science tools such as Pandas and Plotly. Build a portfolio of real-world scraping projects to demonstrate your capabilities. Author(s) Anish Chapagain, the author of "Hands-On Web Scraping with Python," is an experienced programmer and instructor who specializes in Python and data-related technologies. With his vast experience in teaching individuals from diverse backgrounds, Anish approaches complex concepts with clarity and a hands-on methodology. Who is it for? This book is perfect for aspiring data scientists, Python beginners, and anyone who wants to delve into web scraping. Readers should have a basic understanding of how websites work but no prior coding experience is required. If you aim to develop scraping skills and understand data analysis, this book is the ideal starting point.

Streamlit for Data Science - Second Edition

2023-09-29 O'Reilly Amazon

book

Tyler Richards

data data-science AI/ML Cloud Computing Data Science DataViz

Streamlit for Data Science is your complete guide to mastering the creation of powerful, interactive data-driven applications using Python and Streamlit. With this comprehensive resource, you'll learn everything from foundational Streamlit skills to advanced techniques like integrating machine learning models and deploying apps to cloud platforms, enabling you to significantly enhance your data science toolkit. What this Book will help me do Master building interactive applications using Streamlit, including techniques for user interfaces and integrations. Develop visually appealing and functional data visualizations using Python libraries in Streamlit. Learn to integrate Streamlit applications with machine learning frameworks and tools like Hugging Face and OpenAI. Understand and apply best practices to deploy Streamlit apps to cloud platforms such as Streamlit Community Cloud and Heroku. Improve practical Python skills through implementing end-to-end data applications and prototyping data workflows. Author(s) Tyler Richards, the author of Streamlit for Data Science, is a senior data scientist with in-depth practical experience in building data-driven applications. With a passion for Python and data visualization, Tyler leverages his knowledge to help data professionals craft effective and compelling tools. His teaching approach combines clarity, hands-on exercises, and practical relevance. Who is it for? This book is written for data scientists, engineers, and enthusiasts who use Python and want to create dynamic data-driven applications. With a focus on those who have some familiarity with Python and libraries like Pandas or NumPy, it assists readers in building on their knowledge by offering tailored guidance. Perfect for those looking to prototype data projects or enhance their programming toolkit.

Learning Data Science

2023-09-15 O'Reilly Amazon

book

Sam Lau, Joseph Gonzalez, Deborah Nolan

data data-science Data Collection Data Science Pandas Python

As an aspiring data scientist, you appreciate why organizations rely on data for important decisions, whether it's for companies designing websites, cities deciding how to improve services, or scientists discovering how to stop the spread of disease. And you want the skills required to distill a messy pile of data into actionable insights. We call this the data science lifecycle: the process of collecting, wrangling, analyzing, and drawing conclusions from data. Learning Data Science is the first book to cover foundational skills in both programming and statistics that encompass this entire lifecycle. It's aimed at those who wish to become data scientists or who already work with data scientists, and at data analysts who wish to cross the "technical/nontechnical" divide. If you have a basic knowledge of Python programming, you'll learn how to work with data using industry-standard tools like pandas.
• Refine a question of interest to one that can be studied with data
• Pursue data collection that may involve text processing, web scraping, etc.
• Glean valuable insights about data through data cleaning, exploration, and visualization
• Learn how to use modeling to describe the data
• Generalize findings beyond the data

Python Data Analytics: With Pandas, NumPy, and Matplotlib

2023-09-01 O'Reilly Amazon

book

Fabio Nelli

data data-science data-science-tools Pandas AI/ML Analytics

Explore the latest Python tools and techniques to help you tackle the world of data acquisition and analysis. You'll review scientific computing with NumPy, visualization with matplotlib, and machine learning with scikit-learn. This third edition is fully updated for the latest version of Python and its related libraries, and includes coverage of social media data analysis, image analysis with OpenCV, and deep learning libraries. Each chapter includes multiple examples demonstrating how to work with each library. At its heart lies the coverage of pandas, for high-performance, easy-to-use data structures and tools for data manipulation.
Author Fabio Nelli expertly demonstrates using Python for data processing, management, and information retrieval. Later chapters apply what you've learned to handwriting recognition and extending graphical capabilities with the JavaScript D3 library. Whether you are dealing with sales data, investment data, medical data, web page usage, or other data sets, Python Data Analytics, Third Edition is an invaluable reference with its examples of storing, accessing, and analyzing data.
What You'll Learn
• Understand the core concepts of data analysis and the Python ecosystem
• Go in depth with pandas for reading, writing, and processing data
• Use tools and techniques for data visualization and image analysis
• Examine popular deep learning libraries Keras, Theano, TensorFlow, and PyTorch
Who This Book Is For
Experienced Python developers who need to learn about Pythonic tools for data analysis

Building Statistical Models in Python

2023-08-31 O'Reilly Amazon

book

Huy Hoang Nguyen, Stuart J Miller, Paul N Adams

data data-science data-science-tasks statistics Data Science Python

Building Statistical Models in Python is your go-to guide for mastering statistical modeling techniques using Python. By reading this book, you will explore how to use Python libraries like statsmodels and others to tackle tasks such as regression, classification, and time series analysis. What this Book will help me do Develop a deep practical knowledge of statistical concepts and their implementation in Python. Create regression and classification models to solve real-world problems. Gain expertise analyzing time series data and generating valuable forecasts. Learn to perform hypothesis verification to interpret data correctly. Understand survival analysis and apply it in various industry scenarios. Author(s) Huy Hoang Nguyen, Paul N Adams, and Stuart J Miller bring their extensive expertise in data science and Python programming to the table. With years of professional experience in both industry and academia, they aim to make statistical modeling approachable and applicable. Combining technical depth with hands-on coding, their goal is to ensure readers not only understand the theory but also gain confidence in its application. Who is it for? This book is tailored for beginners and intermediate programmers seeking to learn statistical modeling without a prerequisite in mathematics. It's ideal for data analysts, data scientists, and Python enthusiasts who want to leverage statistical models to gain insights from data. With this book, you will journey from the basics to advanced applications, making it perfect for those who aim to master statistical analysis.

Mastering Tableau 2023 - Fourth Edition

2023-08-29 O'Reilly Amazon

book

Marleen Meier

data data-science data-science-tasks data-visualization Tableau AI/ML

This comprehensive book on Tableau 2023 is your practical guide to mastering data visualization and business intelligence techniques. You will explore the latest features of Tableau, learn how to create insightful dashboards, and gain proficiency in integrating analytics and machine learning workflows. By the end, you'll have the skills to address a variety of analytics challenges using Tableau. What this Book will help me do Master the latest Tableau 2023 features and use cases to tackle analytics challenges. Develop and implement ETL workflows using Tableau Prep Builder for optimized data preparation. Integrate Tableau with programming languages such as Python and R to enhance analytics. Create engaging, visually impactful dashboards for effective data storytelling. Understand and apply data governance to ensure data quality and compliance. Author(s) Marleen Meier is an experienced data visualization expert and Tableau consultant with over a decade of experience helping organizations transform data into actionable insights. Her approach integrates her technical expertise and a keen eye for design to make analytics accessible rather than overwhelming. Her passion for teaching others to use visualization tools effectively shines through in her writing. Who is it for? This book is ideal for business analysts, BI professionals, or data analysts looking to enhance their Tableau expertise. It caters to both newcomers seeking to understand the foundations of Tableau and experienced users aiming to refine their skills in advanced analytics and data visualization. If your goal is to leverage Tableau as a strategic tool in your organization's BI projects, this book is for you.

Building Data Science Applications with FastAPI - Second Edition

2023-07-31 O'Reilly Amazon

book

François Voron

web-mobile web-development python-web-frameworks fastapi AI/ML API

Building Data Science Applications with FastAPI is your comprehensive guide to mastering the FastAPI framework to build efficient, reliable data science applications and APIs. You'll explore examples and projects that integrate machine learning models, manage databases, and leverage advanced FastAPI features like asynchronous I/O and WebSockets. What this Book will help me do Develop an understanding of the fundamentals and advanced features of the FastAPI framework, like dependency injection and type hinting. Learn how to integrate machine learning models into a FastAPI-based web backend effectively. Master concepts of authentication, database connections, and asynchronous programming in Python. Build and deploy two practical AI applications: a real-time object detection tool and a text-to-image generator. Acquire skills to monitor, log, and maintain software systems for optimal performance and reliability. Author(s) François Voron is an experienced Python developer and data scientist with extensive knowledge of web frameworks including FastAPI. With years of experience designing and deploying machine learning and data science applications, François focuses on empowering developers with practical techniques and real-world applications. His guidance helps readers tackle contemporary challenges in software development. Who is it for? This book is ideal for data scientists and software engineers looking to broaden their skillset by creating robust web APIs for data science applications. Readers are expected to have a working knowledge of Python and basic data science concepts, offering them a chance to expand into backend development. If you're keen to deploy machine learning models and integrate them seamlessly with web technologies, this book is for you. It provides both fundamental insights and advanced techniques to serve a broad range of learners.

Scaling Python with Dask

2023-07-26 O'Reilly Amazon

book

Mika Kimmins, Holden Karau

data data-science data-science-tools dask API Cloud Computing

Modern systems contain multi-core CPUs and GPUs that have the potential for parallel computing. But many scientific Python tools were not designed to leverage this parallelism. With this short but thorough resource, data scientists and Python programmers will learn how the Dask open source library for parallel computing provides APIs that make it easy to parallelize PyData libraries including NumPy, pandas, and scikit-learn. Authors Holden Karau and Mika Kimmins show you how to use Dask computations in local systems and then scale to the cloud for heavier workloads. This practical book explains why Dask is popular among industry experts and academics and is used by organizations that include Walmart, Capital One, Harvard Medical School, and NASA.
With this book, you'll learn:
• What Dask is, where you can use it, and how it compares with other tools
• How to use Dask for batch data parallel processing
• Key distributed system concepts for working with Dask
• Methods for using Dask with higher-level APIs and building blocks
• How to work with integrated libraries such as scikit-learn, pandas, and PyTorch
• How to use Dask with GPUs

Learn Enough Python to Be Dangerous: Software Development, Flask Web Apps, and Beginning Data Science with Python

2023-07-07 O'Reilly Amazon

book

Michael Hartl

software-development programming-languages Python AI/ML Data Science DataViz

All You Need to Know, and Nothing You Don't, to Solve Real Problems with Python
Python is one of the most popular programming languages in the world, used for everything from shell scripts to web development to data science. As a result, Python is a great language to learn, but you don't need to learn "everything" to get started, just how to use it efficiently to solve real problems. In Learn Enough Python to Be Dangerous, renowned instructor Michael Hartl teaches the specific concepts, skills, and approaches you need to be professionally productive. Even if you've never programmed before, Hartl helps you quickly build technical sophistication and master the lore you need to succeed. Hartl introduces Python both as a general-purpose language and as a specialist tool for web development and data science, presenting focused examples and exercises that help you internalize what matters, without wasting time on details pros don't care about. Soon, it'll be like you were born knowing this stuff--and you'll be suddenly, seriously dangerous.
Learn enough about . . .
• Applying core Python concepts with the interactive interpreter and command line
• Writing object-oriented code with Python's native objects
• Developing and publishing self-contained Python packages
• Using elegant, powerful functional programming techniques, including Python comprehensions
• Building new objects, and extending them via Test-Driven Development (TDD)
• Leveraging Python's exceptional shell scripting capabilities
• Creating and deploying a full web app, using routes, layouts, templates, and forms
• Getting started with data-science tools for numerical computations, data visualization, data analysis, and machine learning
• Mastering concrete and informal skills every developer needs
Michael Hartl's Learn Enough Series includes books and video courses that focus on the most important parts of each subject, so you don't have to learn everything to get started--you just have to learn enough to be dangerous and solve technical problems yourself. Like this book? Don't miss Michael Hartl's companion video tutorial, Learn Enough Python to Be Dangerous LiveLessons. Register your book for convenient access to downloads, updates, and/or corrections as they become available. See inside book for details.

Dive Into Data Science

2023-07-04 O'Reilly Amazon

book

Bradford Tuckfield

data data-science AI/ML Data Science Marketing Python

Dive into the exciting world of data science with this practical introduction. Packed with essential skills and useful examples, Dive Into Data Science will show you how to obtain, analyze, and visualize data so you can leverage its power to solve common business challenges. With only a basic understanding of Python and high school math, you’ll be able to effortlessly work through the book and start implementing data science in your day-to-day work. From improving a bike sharing company to extracting data from websites and creating recommendation systems, you’ll discover how to find and use data-driven solutions to make business decisions. Topics covered include conducting exploratory data analysis, running A/B tests, performing binary classification using logistic regression models, and using machine learning algorithms.
You’ll also learn how to:
• Forecast consumer demand
• Optimize marketing campaigns
• Reduce customer attrition
• Predict website traffic
• Build recommendation systems
With this practical guide at your fingertips, harness the power of programming, mathematical theory, and good old common sense to find data-driven solutions that make a difference. Don’t wait; dive right in!

Time Series Indexing

2023-06-30 O'Reilly Amazon

book

Mihalis Tsoukalos

data data-science data-science-tasks statistics time-series Python

Time series data is at the heart of many applications, from finance and system monitoring to weather forecasting and medical data analysis. "Time Series Indexing" offers a hands-on guide to implementing and leveraging the iSAX indexing technique in Python to efficiently manage, search, and analyze time series data. What this Book will help me do Gain the know-how to implement algorithms like SAX and iSAX with illustrative Python examples. Learn to construct robust time series indexes tailored to real-world data sets. Understand the theoretical underpinnings of time series processing and indexing techniques. Explore and employ visualization techniques to interpret time series structures and insights. Gain the skills to adapt iSAX methodologies to other programming environments and practices. Author(s) Mihalis Tsoukalos is an accomplished developer and author specializing in Python programming and data processing techniques. With years of experience translating complex academic research into practical applications, Mihalis excels at bridging the gap between theory and practice. His writing approach ensures readers grasp both the foundational principles and the hands-on methods needed to succeed. Who is it for? This book best suits researchers, analysts, and developers who work with time series data and seek to elevate their proficiency in indexing and managing such data. It is perfect for professionals with a foundational knowledge of Python and programming concepts. This material also supports learners eager to derive actionable insights from theory-heavy academic research.

Power BI Machine Learning and OpenAI

2023-05-31 O'Reilly Amazon

book

Greg Beaumont

data data-science business-intelligence microsoft-power-platform power-bi AI/ML

Microsoft Power BI Machine Learning and OpenAI offers a comprehensive exploration into advanced data analytics and artificial intelligence using Microsoft Power BI. Through hands-on, workshop-style examples, readers will discover the integration of machine learning models and OpenAI features to enhance business intelligence. This book provides practical examples, real-world scenarios, and step-by-step guidance. What this Book will help me do Learn to apply machine learning capabilities within Power BI to create predictive analytics. Understand how to integrate OpenAI services to build enhanced analytics workflows. Gain hands-on experience in using R and Python for advanced data visualization in Power BI. Master the skills needed to build and deploy SaaS auto ML models within Power BI. Leverage Power BI's AI visuals and features to elevate data storytelling. Author(s) Greg Beaumont, an expert in data science and business intelligence, brings years of experience in Power BI and analytics to this book. With a focus on practical applications, Greg empowers readers to harness the power of AI and machine learning to elevate their data solutions. As a consultant and trainer, he shares his deep knowledge to help readers unlock the full potential of their tools. Who is it for? This book is ideal for data analysts, BI professionals, and data scientists who aim to integrate machine learning and OpenAI into their workflows. If you're familiar with Power BI's fundamentals and are eager to explore its advanced capabilities, this guide is tailored for you. Perfect for professionals looking to elevate their analytics to a new level, combining data science concepts with Power BI's features.

Practical Business Analytics Using R and Python: Solve Business Problems Using a Data-driven Approach

2023-04-03 O'Reilly Amazon

book

Umesh R. Hodeghatta, Umesha Nayak

data data-science data-science-tools r AI/ML Analytics

This book illustrates how data can be useful in solving business problems. It explores various analytics techniques for using data to discover hidden patterns and relationships, predict future outcomes, optimize efficiency and improve the performance of organizations. You’ll learn how to analyze data by applying concepts of statistics, probability theory, and linear algebra. In this new edition, both R and Python are used to demonstrate these analyses. Practical Business Analytics Using R and Python also features new chapters covering databases, SQL, Neural networks, Text Analytics, and Natural Language Processing.
Part one begins with an introduction to analytics, the foundations required to perform data analytics, and explains different analytics terms and concepts such as databases and SQL, basic statistics, probability theory, and data exploration. Part two introduces predictive models using statistical machine learning and discusses concepts like regression, classification, and neural networks. Part three covers two of the most popular unsupervised learning techniques, clustering and association mining, as well as text mining and natural language processing (NLP). The book concludes with an overview of big data analytics, R and Python essentials for analytics including libraries such as pandas and NumPy. Upon completing this book, you will understand how to improve business outcomes by leveraging R and Python for data analytics.
What You Will Learn
• Master the mathematical foundations required for business analytics
• Understand various analytics models and data mining techniques such as regression, supervised machine learning algorithms for modeling, unsupervised modeling techniques, and how to choose the correct algorithm for analysis in any given task
• Use R and Python to develop descriptive models, predictive models, and optimize models
• Interpret and recommend actions based on analytical model outcomes
Who This Book Is For
Software professionals and developers, managers, and executives who want to understand and learn the fundamentals of analytics using R and Python.

Applied Geospatial Data Science with Python

2023-02-28 O'Reilly Amazon

book

David S. Jordan

software-development programming-languages Python AI/ML Analytics Data Science

"Applied Geospatial Data Science with Python" introduces readers to the power of integrating geospatial data into data science workflows. This book equips you with practical methods for processing, analyzing, and visualizing spatial data to solve real-world problems. Through hands-on examples and clear, actionable advice, you will master the art of spatial data analysis using Python. What this Book will help me do Learn to process, analyze, and visualize geospatial data using Python libraries. Develop a foundational understanding of GIS and geospatial data science principles. Gain skills in building geospatial AI and machine learning models for specific use cases. Apply geospatial data workflows to practical scenarios like optimization and clustering. Create a portfolio of geospatial data science projects relevant across different industries. Author(s) David S. Jordan is an experienced data scientist with years of expertise in GIS and geospatial analytics. With a passion for making complex topics accessible, David leverages his deep technical knowledge to provide practical, hands-on instruction. His approach emphasizes real-world applications and encourages learners to develop confidence as they work with geospatial data. Who is it for? This book is perfect for data scientists looking to integrate geospatial data analysis into their existing workflows, and GIS professionals seeking to expand into data science. If you already have a basic knowledge of Python for data analysis or data science and want to explore how to work effectively with geospatial data to drive impactful solutions, this is the book for you.

Experimentation for Engineers

2023-02-23 O'Reilly Amazon

book

David Sweet

data data-science data-science-tasks statistics bayesian-statistics AI/ML

Optimize the performance of your systems with practical experiments used by engineers in the world’s most competitive industries. In Experimentation for Engineers: From A/B testing to Bayesian optimization you will learn how to: Design, run, and analyze an A/B test Break the "feedback loops" caused by periodic retraining of ML models Increase experimentation rate with multi-armed bandits Tune multiple parameters experimentally with Bayesian optimization Clearly define business metrics used for decision-making Identify and avoid the common pitfalls of experimentation Experimentation for Engineers: From A/B testing to Bayesian optimization is a toolbox of techniques for evaluating new features and fine-tuning parameters. You’ll start with a deep dive into methods like A/B testing, and then graduate to advanced techniques used to measure performance in industries such as finance and social media. Learn how to evaluate the changes you make to your system and ensure that your testing doesn’t undermine revenue or other business metrics. By the time you’re done, you’ll be able to seamlessly deploy experiments in production while avoiding common pitfalls. About the Technology Does my software really work? Did my changes make things better or worse? Should I trade features for performance? Experimentation is the only way to answer questions like these. This unique book reveals sophisticated experimentation practices developed and proven in the world’s most competitive industries that will help you enhance machine learning systems, software applications, and quantitative trading solutions. About the Book Experimentation for Engineers: From A/B testing to Bayesian optimization delivers a toolbox of processes for optimizing software systems. You’ll start by learning the limits of A/B testing, and then graduate to advanced experimentation strategies that take advantage of machine learning and probabilistic methods. 
The skills you’ll master in this practical guide will help you minimize the costs of experimentation and quickly reveal which approaches and features deliver the best business results. What's Inside Design, run, and analyze an A/B test Break the “feedback loops” caused by periodic retraining of ML models Increase experimentation rate with multi-armed bandits Tune multiple parameters experimentally with Bayesian optimization About the Reader For ML and software engineers looking to extract the most value from their systems. Examples in Python and NumPy. About the Author David Sweet has worked as a quantitative trader at GETCO and a machine learning engineer at Instagram. He teaches in the AI and Data Science master's programs at Yeshiva University. Quotes Putting an ‘improved’ version of a system into production can be really risky. This book focuses you on what is important! - Simone Sguazza, University of Applied Sciences and Arts of Southern Switzerland A must-have for anyone setting up experiments, from A/B tests to contextual bandits and Bayesian optimization. - Maxim Volgin, KLM Shows a non-mathematical programmer exactly what they need to write powerful mathematically-based testing algorithms. - Patrick Goetz, The University of Texas at Austin Gives you the tools you need to get the most out of your experiments. - Marc-Anthony Taylor, Raiffeisen Bank International

Data Mining and Predictive Analytics for Business Decisions

2023-02-13 O'Reilly Amazon

book

Andres Fortino

data data-science data-science-tasks exploratory-data-analysis AI/ML Analytics

With many recent advances in data science, we have many more tools and techniques available for data analysts to extract information from data sets. This book will assist data analysts to move up from simple tools such as Excel for descriptive analytics to answer more sophisticated questions using machine learning. Most of the exercises use R and Python, but rather than focus on coding algorithms, the book employs interactive interfaces to these tools to perform the analysis. Using the CRISP-DM data mining standard, the early chapters cover conducting the preparatory steps in data mining: translating business information needs into framed analytical questions and data preparation. The Jamovi and the JASP interfaces are used with R and the Orange3 data mining interface with Python. Where appropriate, Voyant and other open-source programs are used for text analytics. The techniques covered in this book range from basic descriptive statistics, such as summarization and tabulation, to more sophisticated predictive techniques, such as linear and logistic regression, clustering, classification, and text analytics. Includes companion files with case study files, solution spreadsheets, data sets and charts, etc. from the book. Features: Covers basic descriptive statistics, such as summarization and tabulation, to more sophisticated predictive techniques, such as linear and logistic regression, clustering, classification, and text analytics Uses R, Python, Jamovi and JASP interfaces, and the Orange3 data mining interface Includes companion files with the case study files from the book, solution spreadsheets, data sets, etc.

Pandas for Everyone: Python Data Analysis, 2nd Edition

2022-12-22 O'Reilly Amazon

book

Daniel Y. Chen

data data-science data-science-tools Pandas AI/ML Data Science

Manage and Automate Data Analysis with Pandas in Python Today, analysts must manage data characterized by extraordinary variety, velocity, and volume. Using the open source Pandas library, you can use Python to rapidly automate and perform virtually any data analysis task, no matter how large or complex. Pandas can help you ensure the veracity of your data, visualize it for effective decision-making, and reliably reproduce analyses across multiple data sets. Pandas for Everyone, 2nd Edition, brings together practical knowledge and insight for solving real problems with Pandas, even if you're new to Python data analysis. Daniel Y. Chen introduces key concepts through simple but practical examples, incrementally building on them to solve more difficult, real-world data science problems such as using regularization to prevent data overfitting, or when to use unsupervised machine learning methods to find the underlying structure in a data set. New features to the second edition include: Extended coverage of plotting and the seaborn data visualization library Expanded examples and resources Updated Python 3.9 code and packages coverage, including statsmodels and scikit-learn libraries Online bonus material on geopandas, Dask, and creating interactive graphics with Altair Chen gives you a jumpstart on using Pandas with a realistic data set and covers combining data sets, handling missing data, and structuring data sets for easier analysis and visualization. He demonstrates powerful data cleaning techniques, from basic string manipulation to applying functions simultaneously across dataframes. Once your data is ready, Chen guides you through fitting models for prediction, clustering, inference, and exploration. He provides tips on performance and scalability and introduces you to the wider Python data analysis ecosystem. 
Work with DataFrames and Series, and import or export data Create plots with matplotlib, seaborn, and pandas Combine data sets and handle missing data Reshape, tidy, and clean data sets so they're easier to work with Convert data types and manipulate text strings Apply functions to scale data manipulations Aggregate, transform, and filter large data sets with groupby Leverage Pandas' advanced date and time capabilities Fit linear models using statsmodels and scikit-learn libraries Use generalized linear modeling to fit models with different response variables Compare multiple models to select the best one Regularize to overcome overfitting and improve performance Use clustering in unsupervised machine learning ...

Python Data Science Handbook, 2nd Edition

2022-12-08 O'Reilly Amazon

book

Jake VanderPlas

software-development programming-languages Python AI/ML Data Science Matplotlib

Python is a first-class tool for many researchers, primarily because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the new edition of Python Data Science Handbook do you get them all—IPython, NumPy, pandas, Matplotlib, Scikit-Learn, and other related tools. Working scientists and data crunchers familiar with reading and writing Python code will find the second edition of this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python. With this handbook, you'll learn how: IPython and Jupyter provide computational environments for scientists using Python NumPy includes the ndarray for efficient storage and manipulation of dense data arrays Pandas contains the DataFrame for efficient storage and manipulation of labeled/columnar data Matplotlib includes capabilities for a flexible range of data visualizations Scikit-learn helps you build efficient and clean Python implementations of the most important and established machine learning algorithms

The Art of Data-Driven Business

2022-12-02 O'Reilly Amazon

book

Alan Bernardo Palacio

data data-science data-science-tools Pandas AI/ML Analytics

Learn how to integrate data-driven methodologies and machine learning into your business decision-making processes with 'The Art of Data-Driven Business.' This comprehensive guide shows you how to apply Python-based machine learning techniques to real-world challenges, transforming your organization into an innovative and well-informed enterprise. What this Book will help me do Create professional-quality data visualizations using Python's seaborn library to derive business insights. Analyze customer behavior, including predicting churn, with machine learning techniques. Apply clustering algorithms to segment customers for targeted marketing campaigns. Utilize pandas effectively for pricing and sales analytics to optimize your pricing strategies. Forecast outcomes of promotional strategies to determine costs and benefits and maximize performance. Author(s) Alan Bernardo Palacio is an experienced data scientist and educator who specializes in the application of machine learning to solve business problems. With extensive real-world industry experience, Palacio brings practical insights and methodologies to learners. Their teaching connects technical knowledge to actionable business strategies. Who is it for? This book is ideal for business professionals aiming to incorporate data science into their strategies and technical experts seeking to leverage machine learning for business scenarios. Beginners to Python can find foundational help, while data scientists will appreciate the focused practical applications. It's perfect for individuals seeking a strong data-driven perspective in marketing, sales, and customer management.

Scaling Python with Ray

2022-11-29 O'Reilly Amazon

book

Boris Lublinsky, Holden Karau

data data-science Python

Serverless computing enables developers to concentrate solely on their applications rather than worry about where they've been deployed. With the Ray general-purpose serverless implementation in Python, programmers and data scientists can hide servers, implement stateful applications, support direct communication between tasks, and access hardware accelerators. In this book, experienced software architecture practitioners Holden Karau and Boris Lublinsky show you how to scale existing Python applications and pipelines, allowing you to stay in the Python ecosystem while reducing single points of failure and manual scheduling. Scaling Python with Ray is ideal for software architects and developers eager to explore successful case studies and learn more about decision and measurement effectiveness. If your data processing or server application has grown beyond what a single computer can handle, this book is for you. You'll explore distributed processing (the pure Python implementation of serverless) and learn how to: Implement stateful applications with Ray actors Build workflow management in Ray Use Ray as a unified system for batch and stream processing Apply advanced data processing with Ray Build microservices with Ray Implement reliable Ray applications

The Book of Dash

2022-10-25 O'Reilly Amazon

book

Christian Mayer, Adam Schroeder, Ann Marie Ward

data data-science data-science-tasks data-visualization dashboards AI/ML

A swift and practical introduction to building interactive data visualization apps in Python, known as dashboards. You've seen dashboards before; think election result visualizations you can update in real time, or population maps you can filter by demographic. With the Python Dash library you'll create analytic dashboards that present data in effective, usable, elegant ways in just a few lines of code. The book is fast-paced and caters to those entirely new to dashboards. It will talk you through the necessary software, then get straight into building the dashboards themselves. You'll learn the basic format of a Dash app by building a Twitter analysis dashboard that maps the number of likes certain accounts gained over time. You'll build up skills through three more sophisticated projects. The first is a global analysis app that compares country data in three areas: the percentage of a population using the internet, percentage of parliament seats held by women, and CO2 emissions. You'll then build an investment portfolio dashboard, and an app that allows you to visualize and explore machine learning algorithms. In this book you will: • Create and run your first Dash apps • Use the pandas library to manipulate and analyze social media data • Use Git to download and build on existing apps written by the pros • Visualize machine learning models in your apps • Create and manipulate statistical and scientific charts and maps using Plotly. Dash combines several technologies to get you building dashboards quickly and efficiently. This book will do the same.

Practical Linear Algebra for Data Science

2022-09-06 O'Reilly Amazon

book

Mike X Cohen

math-science-engineering math linear-algebra AI/ML Data Science Python

If you want to work in any computational or technical field, you need to understand linear algebra. As the study of matrices and operations acting upon them, linear algebra is the mathematical basis of nearly all algorithms and analyses implemented in computers. But the way it's presented in decades-old textbooks is much different from how professionals use linear algebra today to solve real-world modern applications. This practical guide from Mike X Cohen teaches the core concepts of linear algebra as implemented in Python, including how they're used in data science, machine learning, deep learning, computational simulations, and biomedical data processing applications. Armed with knowledge from this book, you'll be able to understand, implement, and adapt myriad modern analysis methods and algorithms. Ideal for practitioners and students using computer technology and algorithms, this book introduces you to: The interpretations and applications of vectors and matrices Matrix arithmetic (various multiplications and transformations) Independence, rank, and inverses Important decompositions used in applied linear algebra (including LU and QR) Eigendecomposition and singular value decomposition Applications including least-squares model fitting and principal components analysis
