talk-data.com

Event

O'Reilly Data Science Books

2013-08-09 – 2026-02-25 O'Reilly

Activities tracked

324

Collection of O'Reilly books on Data Science.

Filtering by: Data Science

Sessions & talks

Showing 101–125 of 324 · Newest first

Extending Power BI with Python and R

Dive into the world of advanced analytics and visualizations in Power BI with "Extending Power BI with Python and R". This comprehensive guide will teach you how to integrate Python and R scripting into your Power BI projects, allowing you to build data models, transform data, and create rich visualizations. Learn practical techniques to make your Power BI dashboards more interactive and insightful. What this Book will help me do: Master the integration of Python and R scripts into Power BI to enhance its functionality. Learn to implement advanced data transformations and enrichments using external APIs. Create advanced visualizations and custom visuals with R for improved analytics. Perform advanced data analysis including handling missing data using Python and R. Leverage machine learning techniques within Power BI projects to extract actionable insights. Author(s): Zavarella is a data science expert and renowned author specializing in data analytics and visualization tools. With years of experience working with Power BI, Python, and R in diverse data-driven projects, Zavarella offers a unique perspective on enhancing Power BI capabilities. Passionate about teaching, they craft clear and impactful tutorials for learners. Who is it for? This book is perfect for business intelligence professionals, data scientists, and business analysts who already use Power BI and want to augment its features with Python and R. If you have a foundational understanding of Power BI and some basic familiarity with Python and R, this book will help you explore their combined potential for advanced analytics.

Data Science Bookcamp

Learn data science with Python by building five real-world projects! Experiment with card game predictions, tracking disease outbreaks, and more, as you build a flexible and intuitive understanding of data science. In Data Science Bookcamp you will learn: Techniques for computing and plotting probabilities; Statistical analysis using SciPy; How to organize datasets with clustering algorithms; How to visualize complex multi-variable datasets; How to train a decision tree machine learning algorithm. In Data Science Bookcamp you’ll test and build your knowledge of Python with the kind of open-ended problems that professional data scientists work on every day. Downloadable data sets and thoroughly explained solutions help you lock in what you’ve learned, building your confidence and making you ready for an exciting new data science career. About the Technology: A data science project has a lot of moving parts, and it takes practice and skill to get all the code, algorithms, datasets, formats, and visualizations working together harmoniously. This unique book guides you through five realistic projects, including tracking disease outbreaks from news headlines, analyzing social networks, and finding relevant patterns in ad click data. About the Book: Data Science Bookcamp doesn’t stop with surface-level theory and toy examples. As you work through each project, you’ll learn how to troubleshoot common problems like missing data, messy data, and algorithms that don’t quite fit the model you’re building. You’ll appreciate the detailed setup instructions and the fully explained solutions that highlight common failure points. In the end, you’ll be confident in your skills because you can see the results. What's Inside: Web scraping; Organize datasets with clustering algorithms; Visualize complex multi-variable datasets; Train a decision tree machine learning algorithm. About the Reader: For readers who know the basics of Python. No prior data science or machine learning skills required.
About the Author Leonard Apeltsin is the Head of Data Science at Anomaly, where his team applies advanced analytics to uncover healthcare fraud, waste, and abuse. Quotes Valuable and accessible… a solid foundation for anyone aspiring to be a data scientist. - Amaresh Rajasekharan, IBM Corporation Really good introduction of statistical data science concepts. A must-have for every beginner! - Simone Sguazza, University of Applied Sciences and Arts of Southern Switzerland A full-fledged tutorial in data science including common Python libraries and language tricks! - Jean-François Morin, Laval University This book is a complete package for understanding how the data science process works end to end. - Ayon Roy, Internshala

Econometrics and Data Science: Apply Data Science Techniques to Model Complex Problems and Implement Solutions for Economic Problems

Get up to speed on the application of machine learning approaches in macroeconomic research. This book brings together economics and data science. Author Tshepo Chris Nokeri begins by introducing you to covariance analysis, correlation analysis, cross-validation, hyperparameter optimization, regression analysis, and residual analysis. In addition, he presents an approach to contend with multi-collinearity. He then debunks a time series model recognized as the additive model. He reveals a technique for binarizing an economic feature to perform classification analysis using logistic regression. He brings in the Hidden Markov Model, used to discover hidden patterns and growth in the world economy. The author demonstrates unsupervised machine learning techniques such as principal component analysis and cluster analysis. Key deep learning concepts and ways of structuring artificial neural networks are explored along with training them and assessing their performance. The Monte Carlo simulation technique is applied to simulate the purchasing power of money in an economy. Lastly, the Structural Equation Model (SEM) is considered to integrate correlation analysis, factor analysis, multivariate analysis, causal analysis, and path analysis. After reading this book, you should be able to recognize the connection between econometrics and data science. You will know how to apply a machine learning approach to modeling complex economic problems and others beyond this book. You will know how to circumvent and enhance model performance, together with the practical implications of a machine learning approach in econometrics, and you will be able to deal with pressing economic problems.
What You Will Learn Examine complex, multivariate, linear-causal structures through the path and structural analysis technique, including non-linearity and hidden states Be familiar with practical applications of machine learning and deep learning in econometrics Understand theoretical framework and hypothesis development, and techniques for selecting appropriate models Develop, test, validate, and improve key supervised (i.e., regression and classification) and unsupervised (i.e., dimension reduction and cluster analysis) machine learning models, alongside neural networks, Markov, and SEM models Represent and interpret data and models Who This Book Is For Beginning and intermediate data scientists, economists, machine learning engineers, statisticians, and business executives

Modern Analytics Platforms

From a global pandemic to extreme weather, the events of 2020 and 2021 have caused organizations to make quick and constant adjustments to their strategy and operations. This transformation is likely to continue and have a major impact on analytics. Not only do respondents to Experian's annual Global Data Management survey confirm more demand for data insights, but most of them also believe the lack of agility hurt their organization's responses to fast-changing business needs. With this O'Reilly report, you'll learn how organizations have begun to take new approaches to analytics for business reinvention and digital transformation. Chief analytics and data officers and data analytics, data science, and data visualization leaders will explore converged analytics and find out how it differs from legacy and current analytics approaches. You'll see where your organization stands in its journey to convergence, and what you need to do next. This report helps you: Examine how three organizations in different industries and with different objectives have benefited from modern analytics; Learn how analytics has evolved to support greater business agility at scale; Examine the alignment of people, processes, tools, and data in converged analytics; Learn the five stages of analytical competition and six dimensions for benchmarking maturity; Explore practices that you can adopt to improve your analytics capabilities and your agility.

Building Data Science Applications with FastAPI

This comprehensive guide to FastAPI walks readers through developing modern web backends optimized for data science applications. By mastering key concepts like dependency injection and asynchronous programming, you will create high-performing REST APIs and machine learning powered systems. What this Book will help me do: Master asynchronous programming and type hinting in Python for efficient coding. Design comprehensive RESTful APIs for machine learning with FastAPI. Build, test, and maintain scalable data science applications. Integrate Python libraries like NumPy and scikit-learn into web backends. Deploy modular and efficient FastAPI-backed systems to production. Author(s): Voron is a seasoned software developer specialized in web frameworks and data science applications. With a strong background in building scalable systems, they bring invaluable insights on utilizing FastAPI. Voron emphasizes clarity and hands-on learning, sharing their expertise to help developers master the technology efficiently. Who is it for? This book is ideal for data scientists and Python developers interested in creating efficient data science backends. If you have groundwork knowledge of machine learning concepts and Python programming, this book will enhance your ability to deploy and manage APIs for data-driven applications.

Practical Data Science with Python

Practical Data Science with Python guides you through the entire process of leveraging Python tools to analyze and gain insights from data. You'll start with foundational concepts and coding essentials, progressing through statistical analysis, machine learning techniques, and ethical considerations. What this Book will help me do Clean, prepare, and explore data using pandas and NumPy. Understand and implement machine learning models such as random forests and support vector machines. Perform statistical tests and analyze distributions to enhance data insights. Utilize SQL with Python for efficient data interaction. Generate automated reports and dashboards for data storytelling. Author(s) Nathan George has extensive professional experience as a data scientist and Python developer. He specializes in the application of machine learning and statistical methods to solve real-world problems. His writing combines technical depth with an approachable style, aiming to provide readers with actionable knowledge and skills. Who is it for? This book is perfect for data science beginners who have a basic understanding of Python and want to build practical data analysis skills. Students in analytics programs or professionals looking to transition into a data science role will find value in its approachable yet comprehensive coverage. Aspiring data analysts and career changers will gain firsthand exposure to Python-based data science best practices. If you're eager to develop practical, hands-on experience in the data science field, this is the guide for you.

Pandas in Action

Take the next steps in your data science career! This friendly and hands-on guide shows you how to start mastering Pandas with skills you already know from spreadsheet software. In Pandas in Action you will learn how to: Import datasets, identify issues with their data structures, and optimize them for efficiency Sort, filter, pivot, and draw conclusions from a dataset and its subsets Identify trends from text-based and time-based data Organize, group, merge, and join separate datasets Use a GroupBy object to store multiple DataFrames Pandas has rapidly become one of Python's most popular data analysis libraries. In Pandas in Action, a friendly and example-rich introduction, author Boris Paskhaver shows you how to master this versatile tool and take the next steps in your data science career. You’ll learn how easy Pandas makes it to efficiently sort, analyze, filter and munge almost any type of data. About the Technology Data analysis with Python doesn’t have to be hard. If you can use a spreadsheet, you can learn pandas! While its grid-style layouts may remind you of Excel, pandas is far more flexible and powerful. This Python library quickly performs operations on millions of rows, and it interfaces easily with other tools in the Python data ecosystem. It’s a perfect way to up your data game. About the Book Pandas in Action introduces Python-based data analysis using the amazing pandas library. You’ll learn to automate repetitive operations and gain deeper insights into your data that would be impractical—or impossible—in Excel. Each chapter is a self-contained tutorial. Realistic downloadable datasets help you learn from the kind of messy data you’ll find in the real world. What's Inside Organize, group, merge, split, and join datasets Find trends in text-based and time-based data Sort, filter, pivot, optimize, and draw conclusions Apply aggregate operations About the Reader For readers experienced with spreadsheets and basic Python programming. 
About the Author Boris Paskhaver is a software engineer, Agile consultant, and online educator. His programming courses have been taken by 300,000 students across 190 countries. Quotes Of all the introductory pandas books I’ve read—and I did read a few—this is the best, by a mile. - Erico Lendzian, idibu.com This approachable guide will get you up and running quickly with all the basics you need to analyze your data. - Jonathan Sharley, SiriusXM Media Understanding and putting in practice the concepts of this book will help you increase productivity and make you look like a pro. - Jose Apablaza, Steadfast Networks Teaches both novice and expert Python users the essential concepts required for data analysis and data science. - Ben McNamara, DataGeek

Data Science For Dummies, 3rd Edition

Monetize your company’s data and data science expertise without spending a fortune on hiring independent strategy consultants to help. What if there was one simple, clear process for ensuring that all your company’s data science projects achieve a high return on investment? What if you could validate your ideas for future data science projects, and select the one idea that’s most prime for achieving profitability while also moving your company closer to its business vision? There is. Industry-acclaimed data science consultant, Lillian Pierson, shares her proprietary STAR Framework, a simple, proven process for leading profit-forming data science projects. Not sure what data science is yet? Don’t worry! Parts 1 and 2 of Data Science For Dummies will get all the bases covered for you. And if you’re already a data science expert? Then you really won’t want to miss the data science strategy and data monetization gems that are shared in Part 3 onward throughout this book. Data Science For Dummies demonstrates: The only process you’ll ever need to lead profitable data science projects; Secret, reverse-engineered data monetization tactics that no one’s talking about; The shocking truth about how simple natural language processing can be; How to beat the crowd of data professionals by cultivating your own unique blend of data science expertise. Whether you’re new to the data science field or already a decade in, you’re sure to learn something new and incredibly valuable from Data Science For Dummies. Discover how to generate massive business wins from your company’s data by picking up your copy today.

Data Science for Marketing Analytics - Second Edition

In 'Data Science for Marketing Analytics', you'll embark on a journey that integrates the power of data analytics with strategic marketing. With a focus on practical application, this guide walks you through using Python to analyze datasets, implement machine learning models, and derive data-driven insights. What this Book will help me do Gain expertise in cleaning, exploring, and visualizing marketing data using Python. Build machine learning models to predict customer behavior and sales outcomes. Leverage unsupervised learning techniques for effective customer segmentation. Compare and optimize predictive models using advanced evaluation methods. Master Python libraries like pandas and Matplotlib for data manipulation and visualization. Author(s) Mirza Rahim Baig, Gururajan Govindan, and Vishwesh Ravi Shrimali combine their extensive expertise in data analytics and marketing to bring you this comprehensive guide. Drawing from years of applying analytics in real-world marketing scenarios, they provide a hands-on approach to learning data science tools and techniques. Who is it for? This book is perfect for marketing professionals and analysts eager to harness the capabilities of Python to enhance their data-driven strategies. It is also ideal for data scientists looking to apply their skills in marketing across various roles. While a basic understanding of data analysis and Python will help, all key concepts are introduced comprehensively for beginners.

Pandas Brain Teasers

This book contains 25 short programs that will challenge your understanding of Pandas. Like any big project, the Pandas developers had to make some design decisions that at times seem surprising. This book uses those quirks as a teaching opportunity. By understanding the gaps in your knowledge, you'll become better at what you do. Some of the teasers are from the author's experience shipping bugs to production, and some from others doing the same. Teasers and puzzles are fun, and learning how to solve them can teach you to avoid programming mistakes and maybe even impress your colleagues and future employers. Working with data is central to nearly everything we do, from disease contact tracing and analyzing health records to smart meters that track utility consumption behavior. With the power of Python's pandas library, you can process and analyze this data in a highly efficient and simple-to-understand way. And with 25 brain teasers designed to turn this technology's quirks into a teaching opportunity, you'll be honing your data science skills while having fun at the same time. Following a simple format, you'll challenge yourself and your understanding of pandas. Read a short Python program that uses pandas, try to guess the output, run the code yourself, and then go to the next page for an explanation of the solution. From common pitfalls and hidden gotchas to unexpected twists and turns, you'll deepen your understanding of pandas, learn to write more efficient code, and reduce the number of bugs in the software you develop. You may even impress your colleagues and your employers, both present and future. Learn the tricks of the trade with Python's pandas, in one of the most fun and creative ways around. What You Need: To run the code you'll need Python version 3.8 or later and Pandas version 1.0 or later installed. We use Python version 3.8.3 and Pandas version 1.0.5; the output might change in future versions.

Getting Started with Streamlit for Data Science

Getting Started with Streamlit for Data Science is your essential guide to quickly and efficiently building dynamic data science web applications in Python using Streamlit. Whether you're embedding machine learning models, visualizing data, or deploying projects, this book helps you excel in creating and sharing interactive apps with ease. What this Book will help me do Set up a development environment to create your first Streamlit application. Implement and visualize dynamic data workflows by integrating various Python libraries into Streamlit. Develop and showcase machine learning models within Streamlit for clear and interactive presentations. Deploy your projects effortlessly using platforms like Streamlit Sharing, Heroku, and AWS. Utilize tools like Streamlit Components and themes to enhance the aesthetics and usability of your apps. Author(s) Tyler Richards is a data science expert with extensive experience in leveraging technology to present complex data models in an understandable way. He brings practical solutions to readers, aiming to empower them with the tools they need to succeed in the field of data science. Tyler adopts a hands-on teaching method with illustrative examples to ensure clarity and easy learning. Who is it for? This book is designed for anyone involved in data science, from beginners just starting in the field to experienced professionals who want to learn to create interactive web applications using Streamlit. Ideal for those with a working knowledge of Python, this resource will help you streamline your workflows and enhance your project presentations.

Data Science at the Command Line, 2nd Edition

This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You'll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, author Jeroen Janssens provides a Docker image packed with over 100 Unix power tools--useful whether you work with Windows, macOS, or Linux. You'll quickly discover why the command line is an agile, scalable, and extensible technology. Even if you're comfortable processing data with Python or R, you'll learn how to greatly improve your data science workflow by leveraging the command line's power. This book is ideal for data scientists, analysts, engineers, system administrators, and researchers. Obtain data from websites, APIs, databases, and spreadsheets Perform scrub operations on text, CSV, HTML, XML, and JSON files Explore data, compute descriptive statistics, and create visualizations Manage your data science workflow Create your own tools from one-liners and existing Python or R code Parallelize and distribute data-intensive pipelines Model data with dimensionality reduction, regression, and classification algorithms Leverage the command line from Python, Jupyter, R, RStudio, and Apache Spark

Introduction to Statistical and Machine Learning Methods for Data Science

Boost your understanding of data science techniques to solve real-world problems. Data science is an exciting, interdisciplinary field that extracts insights from data to solve business problems. This book introduces common data science techniques and methods and shows you how to apply them in real-world case studies. From data preparation and exploration to model assessment and deployment, this book describes every stage of the analytics life cycle, including a comprehensive overview of unsupervised and supervised machine learning techniques. The book guides you through the necessary steps to pick the best techniques and models and then implement those models to successfully address the original business need. No software is shown in the book, and mathematical details are kept to a minimum. This allows you to develop an understanding of the fundamentals of data science, no matter what background or experience level you have.

Knowledge Graphs

Applying knowledge in the right context is the most powerful lever businesses can use to become agile, creative, and resilient. Knowledge graphs add context, meaning, and utility to business data. They drive intelligence into data for unparalleled automation and visibility into processes, products, and customers. Businesses use knowledge graphs to anticipate downstream effects, make decisions based on all relevant information, and quickly respond to dynamic markets. In this report for chief information and data officers, Jesús Barrasa, Amy E. Hodler, and Jim Webber from Neo4j show how to use knowledge graphs to gain insights, reveal a flexible and intuitive representation of complex data relationships, and make better predictions based on holistic information. Explore knowledge graph mechanics and common organizing principles; Build and exploit a connected representation of your enterprise data environment; Use decisioning knowledge graphs to explore the advantages of adding relationships to data analytics and data science; Conduct virtual testing using software versions of real-world processes; Deploy knowledge graphs for more trusted data, higher accuracies, and better reasoning for contextual AI.

Data Science Projects with Python - Second Edition

Data Science Projects with Python offers a hands-on, project-based approach to learning data science using real-world data sets and tools. You will explore data using Python libraries like pandas and Matplotlib, build machine learning models with scikit-learn, and apply advanced techniques like XGBoost and SHAP values. This book equips you to confidently extract insights, evaluate models, and deliver results with clarity. What this Book will help me do Learn to load, clean, and preprocess data using Python and pandas. Build and evaluate predictive models, including logistic regression and random forests. Visualize data effectively using Python libraries like Matplotlib. Master advanced techniques like XGBoost and algorithmic fairness. Communicate data-driven insights to aid decision making in practical scenarios. Author(s) Stephen Klosterman is an experienced data scientist with a strong focus on practical applications of machine learning in business. Combining a rich academic background with hands-on industry experience, he excels at explaining complex concepts in an approachable way. As the author of 'Data Science Projects with Python,' his goal is to provide learners with the skills needed for real-world data science challenges. Who is it for? This book is ideal for beginners in data science and machine learning who have some basic programming knowledge in Python. Aspiring data scientists will benefit from its practical, end-to-end examples. Professionals seeking to expand their skillset in predictive modeling and delivering business insights will find this book invaluable. Some foundation in statistics and programming is recommended.

Essentials of Data Science and Analytics

Data science and analytics have emerged as the most desired fields in driving business decisions. Using the techniques and methods of data science, decision makers can uncover hidden patterns in their data, develop algorithms and models that help improve processes and make key business decisions. Data science is a data driven decision making approach that uses several different areas and disciplines with a purpose of extracting insights and knowledge from structured and unstructured data. The algorithms and models of data science along with machine learning and predictive modeling are widely used in solving business problems and predicting future outcomes. This book combines the key concepts of data science and analytics to help you gain a practical understanding of these fields. The four different sections of the book are divided into chapters that explain the core of data science. Given the booming interest in data science, this book is timely and informative.

Behavioral Data Analysis with R and Python

Harness the full power of the behavioral data in your company by learning tools specifically designed for behavioral data analysis. Common data science algorithms and predictive analytics tools treat customer behavioral data, such as clicks on a website or purchases in a supermarket, the same as any other data. Instead, this practical guide introduces powerful methods specifically tailored for behavioral data analysis. Advanced experimental design helps you get the most out of your A/B tests, while causal diagrams allow you to tease out the causes of behaviors even when you can't run experiments. Written in an accessible style for data scientists, business analysts, and behavioral scientists, this practical book provides complete examples and exercises in R and Python to help you gain more insight from your data immediately. Understand the specifics of behavioral data; Explore the differences between measurement and prediction; Learn how to clean and prepare behavioral data; Design and analyze experiments to drive optimal business decisions; Use behavioral data to understand and measure cause and effect; Segment customers in a transparent and insightful way.

Becoming a Data Head

"Turn yourself into a Data Head. You'll become a more valuable employee and make your organization more successful."Thomas H. Davenport, Research Fellow, Author of Competing on Analytics, Big Data @ Work, and The AI Advantage You've heard the hype around data—now get the facts. In Becoming a Data Head: How to Think, Speak, and Understand Data Science, Statistics, and Machine Learning, award-winning data scientists Alex Gutman and Jordan Goldmeier pull back the curtain on data science and give you the language and tools necessary to talk and think critically about it. You'll learn how to: Think statistically and understand the role variation plays in your life and decision making Speak intelligently and ask the right questions about the statistics and results you encounter in the workplace Understand what's really going on with machine learning, text analytics, deep learning, and artificial intelligence Avoid common pitfalls when working with and interpreting data Becoming a Data Head is a complete guide for data science in the workplace: covering everything from the personalities you’ll work with to the math behind the algorithms. The authors have spent years in data trenches and sought to create a fun, approachable, and eminently readable book. Anyone can become a Data Head—an active participant in data science, statistics, and machine learning. Whether you're a business professional, engineer, executive, or aspiring data scientist, this book is for you.

Responsible Data Science

Explore the most serious and prevalent ethical issues in data science with this insightful new resource. The increasing popularity of data science has resulted in numerous well-publicized cases of bias, injustice, and discrimination. The widespread deployment of “black box” algorithms that are difficult or impossible to understand and explain, even for their developers, is a primary source of these unanticipated harms, making modern techniques and methods for manipulating large data sets seem sinister, even dangerous. When put in the hands of authoritarian governments, these algorithms have enabled suppression of political dissent and persecution of minorities. To prevent these harms, data scientists everywhere must come to understand how the algorithms that they build and deploy may harm certain groups or be unfair. Responsible Data Science delivers a comprehensive, practical treatment of how to implement data science solutions in an even-handed and ethical manner that minimizes the risk of undue harm to vulnerable members of society. Both data science practitioners and managers of analytics teams will learn how to: Improve model transparency, even for black box models; Diagnose bias and unfairness within models using multiple metrics; Audit projects to ensure fairness and minimize the possibility of unintended harm. Perfect for data science practitioners, Responsible Data Science will also earn a spot on the bookshelves of technically inclined managers, software developers, and statisticians.

Hands-On Data Analysis with Pandas - Second Edition

'Hands-On Data Analysis with Pandas' guides you to gain expertise in the Python pandas library for data analysis and manipulation. With practical, real-world examples, you'll learn to analyze datasets, visualize data trends, and implement machine learning models for actionable insights. What this Book will help me do: Understand and implement data analysis techniques with Python. Develop expertise in data manipulation using pandas and NumPy. Visualize data effectively with pandas visualization tools and seaborn. Apply machine learning techniques with Python libraries. Combine datasets and handle complex data workflows efficiently. Author(s): Stefanie Molin is a software engineer and data scientist with extensive experience in analytics and Python. She has worked with large data-driven systems and has a strong focus on teaching data analysis effectively. Stefanie's books are known for their practical, hands-on approach to solving real data problems. Who is it for? This book is perfect for aspiring data scientists, data analysts, and Python developers. Readers with beginner to intermediate skill levels in Python will find it accessible and informative. It is designed for those seeking to build practical data analysis skills. If you're looking to add data science and pandas to your toolkit, this book is ideal.

Data Science on AWS

With this practical book, AI and machine learning practitioners will learn how to successfully build and deploy data science projects on Amazon Web Services. The Amazon AI and machine learning stack unifies data science, data engineering, and application development to help level up your skills. This guide shows you how to build and run pipelines in the cloud, then integrate the results into applications in minutes instead of days. Throughout the book, authors Chris Fregly and Antje Barth demonstrate how to reduce cost and improve performance. Apply the Amazon AI and ML stack to real-world use cases for natural language processing, computer vision, fraud detection, conversational devices, and more. Use automated machine learning to implement a specific subset of use cases with SageMaker Autopilot. Dive deep into the complete model development lifecycle for a BERT-based NLP use case including data ingestion, analysis, model training, and deployment. Tie everything together into a repeatable machine learning operations pipeline. Explore real-time ML, anomaly detection, and streaming analytics on data streams with Amazon Kinesis and Managed Streaming for Apache Kafka. Learn security best practices for data science projects and workflows including identity and access management, authentication, authorization, and more.

Cleaning Data for Effective Data Science

Dive into the intricacies of data cleaning, a crucial aspect of any data science and machine learning pipeline, with 'Cleaning Data for Effective Data Science.' This comprehensive guide walks you through tools and methodologies like Python, R, and command-line utilities to prepare raw data for analysis. Learn practical strategies to manage, clean, and refine data encountered in the real world. What this Book will help me do: Understand and utilize various data formats such as JSON, SQL, and PDF for data ingestion and processing. Master key tools like pandas, SciPy, and Tidyverse to manipulate and analyze datasets efficiently. Develop heuristics and methodologies for assessing data quality, detecting bias, and identifying irregularities. Apply advanced techniques like feature engineering and statistical adjustments to enhance data usability. Gain confidence in handling time series data by employing methods for de-trending and interpolating missing values. Author(s): David Mertz has years of experience as a Python programmer and data scientist. Known for his engaging and accessible teaching style, David has authored numerous technical articles and books. He emphasizes not only the technicalities of data science tools but also the critical thinking needed to reach solutions creatively and effectively. Who is it for? 'Cleaning Data for Effective Data Science' is designed for data scientists, software developers, and educators dealing with data preparation. Whether you're an aspiring data enthusiast or an experienced professional looking to refine your skills, this book provides essential tools and frameworks. Prior programming knowledge, particularly in Python or R, coupled with an understanding of statistical fundamentals, will help you make the most of this resource.

Data Science for Supply Chain Forecasting

Using data science to solve a problem requires a scientific mindset more than coding skills. Data Science for Supply Chain Forecasting, Second Edition contends that a true scientific method, which includes experimentation, observation, and constant questioning, must be applied to supply chains to achieve excellence in demand forecasting. This second edition adds more than 45 percent extra content with four new chapters, including an introduction to neural networks and the forecast value added framework. Part I focuses on "traditional" statistical models, Part II on machine learning, and the all-new Part III on demand forecasting process management. The various chapters focus on both forecast models and new concepts such as metrics, underfitting, overfitting, outliers, feature optimization, and external demand drivers. The book is replete with do-it-yourself sections with implementations provided in Python (and Excel for the statistical models) to show readers how to apply these models themselves. This hands-on book, covering the entire range of forecasting, from the basics all the way to leading-edge models, will benefit supply chain practitioners, forecasters, and analysts looking to go the extra mile with demand forecasting. Events around the book: Link to a De Gruyter Online Event in which the author Nicolas Vandeput, together with Stefan de Kok, supply chain innovator and CEO of Wahupa; Spyros Makridakis, professor at the University of Nicosia and director of the Institute For the Future (IFF); and Edouard Thieuleux, founder of AbcSupplyChain, discuss the general issues and challenges of demand forecasting, provide insights into best practices (process, models), and discuss how data science and machine learning impact those forecasts. The event will be moderated by Michael Gilliland, marketing manager for SAS forecasting software: https://youtu.be/1rXjXcabW2s

Forecasting Time Series Data with Facebook Prophet

Delve into the art of time series forecasting with the comprehensive power of Facebook Prophet. This tool enables users to develop precise forecasting models with simplicity and effectiveness. Through this book, you'll explore Prophet's core functionality and advanced configurations, equipping yourself with the knowledge to proficiently model and predict data trends. What this Book will help me do: Build intuitive and effective forecasting models using Facebook Prophet. Understand the role and implementation of seasonality and holiday effects in time series data. Identify and address outliers and special data events effectively. Optimize forecasts using advanced techniques like hyperparameter tuning and additional regressors. Evaluate and deploy forecasting models in production settings for practical applications. Author(s): Greg Rafferty is a seasoned data science professional with extensive experience in time series forecasting. Having worked on diverse forecasting projects, Greg brings a unique perspective that integrates practicality and depth. His approachable writing style makes complex topics accessible and actionable. Who is it for? This book is tailored for data scientists, analysts, and developers seeking to enhance their forecasting capabilities using Python. If you have a grounding in Python and a basic understanding of forecasting principles, you will find this book a valuable resource to sharpen your expertise and achieve greater forecasting precision.

Beginning Mathematica and Wolfram for Data Science: Applications in Data Analysis, Machine Learning, and Neural Networks

Enhance your data science programming and analysis with the Wolfram programming language and Mathematica, an applied mathematical tools suite. The book will introduce you to the Wolfram programming language and its syntax, as well as the structure of Mathematica and its advantages and disadvantages. You’ll see how to use the Wolfram language for data science from a theoretical and practical perspective. Learning this language makes your data science code better because it is very intuitive and comes with pre-existing functions that can provide a welcoming experience for those who use other programming languages. You’ll cover how to use Mathematica where data management and mathematical computations are needed. Along the way you’ll appreciate how Mathematica provides a complete integrated platform: it has a mixed syntax as a result of its symbolic and numerical calculations, allowing it to carry out various processes without superfluous lines of code. You’ll learn to use its notebooks as a standard format, which also serves to create detailed reports of the processes carried out. What You Will Learn: Use Mathematica to explore data and describe the concepts using Wolfram language commands. Create datasets, work with data frames, and create tables. Import, export, analyze, and visualize data. Work with the Wolfram data repository. Build reports on the analysis. Use Mathematica for machine learning, with different algorithms, including linear, multiple, and logistic regression; decision trees; and data clustering. Who This Book Is For: Data scientists new to using Wolfram and Mathematica as a language/tool to program in. Programmers should have some prior programming experience, but can be new to the Wolfram language.