O'Reilly Data Science Books

Pro Data Mashup for Power BI: Powering Up with Power Query and the M Language to Find, Load, and Transform Data

2022-08-25 O'Reilly Amazon

book

Adam Aspin

data data-science business-intelligence microsoft-power-platform power-bi Analytics

This book provides all you need to find data from external sources and load and transform that data into Power BI where you can mine it for business insights and a competitive edge. This ranges from connecting to corporate databases such as Azure SQL and SQL Server to file-based data sources, and cloud- and web-based data sources. The book also explains the use of Direct Query and Live Connect to establish instant connections to databases and data warehouses and avoid loading data. The book provides detailed guidance on techniques for transforming inbound data into normalized data sets that are easy to query and analyze. This covers data cleansing, data modification, and standardization as well as merging source data into robust data structures that can feed into your data model. You will learn how to pivot and transpose data and extrapolate missing values as well as harness external programs such as R and Python into a Power Query data flow. You also will see how to handle errors in source data and extend basic data ingestion to create robust and parameterized data load and transformation processes. Everything in this book is aimed at helping you deliver compelling and interactive insight with remarkable ease using Power BI’s built-in data load and transformation tools. What You Will Learn Connect Power BI to a range of external data sources Prepare data from external sources for easy analysis in Power BI Cleanse data from duplicates, outliers, and other bad values Make live connections from which to refresh data quickly and easily Apply advanced techniques to interpolate missing data Who This Book Is For All Power BI users from beginners to super users. Any user of the world’s leading dashboarding tool can leverage the techniques explained in this book to turbo-charge their data preparation skills and learn how a wide range of external data sources can be harnessed and loaded into Power BI to drive their analytics.
No previous knowledge of working with data, databases, or external data sources is required—merely the need to find, transform, and load data into Power BI.

Effective Data Science Infrastructure

2022-08-09 O'Reilly Amazon

book

Ville Tuulos

data data-science AI/ML Analytics AWS Cloud Computing

Simplify data science infrastructure to give data scientists an efficient path from prototype to production. In Effective Data Science Infrastructure you will learn how to: Design data science infrastructure that boosts productivity Handle compute and orchestration in the cloud Deploy machine learning to production Monitor and manage performance and results Combine cloud-based tools into a cohesive data science environment Develop reproducible data science projects using Metaflow, Conda, and Docker Architect complex applications for multiple teams and large datasets Customize and grow data science infrastructure Effective Data Science Infrastructure: How to make data scientists more productive is a hands-on guide to assembling infrastructure for data science and machine learning applications. It reveals the processes used at Netflix and other data-driven companies to manage their cutting edge data infrastructure. In it, you’ll master scalable techniques for data storage, computation, experiment tracking, and orchestration that are relevant to companies of all shapes and sizes. You’ll learn how you can make data scientists more productive with your existing cloud infrastructure, a stack of open source software, and idiomatic Python. The author is donating proceeds from this book to charities that support women and underrepresented groups in data science. About the Technology Growing data science projects from prototype to production requires reliable infrastructure. Using the powerful new techniques and tooling in this book, you can stand up an infrastructure stack that will scale with any organization, from startups to the largest enterprises. About the Book Effective Data Science Infrastructure teaches you to build data pipelines and project workflows that will supercharge data scientists and their projects. 
Based on state-of-the-art tools and concepts that power data operations of Netflix, this book introduces a customizable cloud-based approach to model development and MLOps that you can easily adapt to your company’s specific needs. As you roll out these practical processes, your teams will produce better and faster results when applying data science and machine learning to a wide array of business problems. What's Inside Handle compute and orchestration in the cloud Combine cloud-based tools into a cohesive data science environment Develop reproducible data science projects using Metaflow, AWS, and the Python data ecosystem Architect complex applications that require large datasets and models, and a team of data scientists About the Reader For infrastructure engineers and engineering-minded data scientists who are familiar with Python. About the Author At Netflix, Ville Tuulos designed and built Metaflow, a full-stack framework for data science. Currently, he is the CEO of a startup focusing on data science infrastructure. Quotes By reading and referring to this book, I’m confident you will learn how to make your machine learning operations much more efficient and productive. - From the Foreword by Travis Oliphant, Author of NumPy, Founder of Anaconda, PyData, and NumFOCUS Effective Data Science Infrastructure is a brilliant book. It’s a must-have for every data science team. - Ninoslav Cerkez, Logit More data science. Less headaches. - Dr. Abel Alejandro Coronado Iruegas, National Institute of Statistics and Geography of Mexico Indispensable. A copy should be on every data engineer’s bookshelf. - Matthew Copple, Grand River Analytics

Python for Data Science

2022-08-02 O'Reilly Amazon

book

Yuli Vasiliev

software-development programming-languages Python AI/ML Data Science Marketing

Python is an ideal choice for accessing, manipulating, and gaining insights from data of all kinds. Python for Data Science introduces you to the Pythonic world of data analysis with a learn-by-doing approach rooted in practical examples and hands-on activities. You’ll learn how to write Python code to obtain, transform, and analyze data, practicing state-of-the-art data processing techniques for use cases in business management, marketing, and decision support. You will discover Python’s rich set of built-in data structures for basic operations, as well as its robust ecosystem of open-source libraries for data science, including NumPy, pandas, scikit-learn, matplotlib, and more. Examples show how to load data in various formats, how to streamline, group, and aggregate data sets, and how to create charts, maps, and other visualizations. Later chapters go in-depth with demonstrations of real-world data applications, including using location data to power a taxi service, market basket analysis to identify items commonly purchased together, and machine learning to predict stock prices.

Data Democratization with Domo

2022-06-17 O'Reilly Amazon

book

Jeff Burtenshaw

data data-science business-intelligence AI/ML BI Cloud Computing

Discover how to leverage the full potential of Domo, a robust cloud-based business intelligence platform, in your organization. This comprehensive guide walks you through data integration, transformation, visualization, and governance techniques, enabling you to deliver impactful, data-driven results quickly and effectively. What this Book will help me do Understand and utilize Domo's cloud data architecture for comprehensive data analysis. Seamlessly acquire and manage data using Domo connectors and tools. Create and customize dashboards that communicate data insights effectively. Build and deploy Python applications and machine learning models on Domo. Securely govern your organization's data with robust Domo features. Author(s) The author, Jeff Burtenshaw, is an expert in business intelligence and data platforms. With years of experience working with data integration tools, their writing combines technical thoroughness with practical insights. They aim to empower professionals with the skills to excel in data-driven decision making, reflecting their passion for making technology accessible and actionable. Who is it for? This book is ideal for business intelligence professionals, including developers and analysts, looking to elevate their understanding of Domo. It is suited for those with a fundamental knowledge of data platforms seeking advanced skills in data management and visualization. BI managers will gain insights into governance and security, while analysts will find inspiration for data storytelling. If you're aiming to master the possibilities of Domo, this book is for you.

The Pandas Workshop

2022-06-17 O'Reilly Amazon

book

William So, Thomas Joseph, Blaine Bateman, Saikat Basak

data data-science data-science-tools Pandas Data Science Matplotlib

The Pandas Workshop offers a detailed journey into the world of data analysis using Python and the pandas library. Throughout the book, you'll build skills in accessing, transforming, visualizing, and modeling data, all while focusing on real-world data science challenges. You will gain the knowledge and confidence needed to dissect and derive insights from complex datasets. What this Book will help me do Understand how to access and load data from various formats including databases and web-based sources. Manipulate and transform data for analysis using efficient pandas techniques. Create insightful visualizations using Matplotlib integrated with pandas for clearer data presentation. Build predictive and descriptive data models and glean data-driven insights. Handle and analyze time-series data to uncover trends and seasonal effects in data patterns. Author(s) Blaine Bateman, Saikat Basak, Thomas Joseph, and William So collectively bring diverse expertise in data analysis, programming, and teaching. Their goal is to make cutting-edge data science techniques accessible through clear explanations and practical exercises, helping learners from varied backgrounds master the pandas library. Who is it for? This book is best suited for novice to intermediate programmers and data enthusiasts who are already familiar with Python but are new to the pandas library. Ideal readers are those interested in honing their skills in data analysis and visualization, as well as leveraging data for informed decision-making. Whether you're an analyst, aspiring data scientist, or business professional seeking to strengthen your analytical toolkit, this book provides beneficial insights and techniques.

Building Data Science Solutions with Anaconda

2022-05-27 O'Reilly Amazon

book

Dan Meador

data data-science AI/ML Data Science NumPy Pandas

Explore the comprehensive world of data science with "Building Data Science Solutions with Anaconda." This book covers essential topics like managing environments with Anaconda, detecting and overcoming bias, and ensuring model interpretability. Delve into practical tools and solutions, all explained in an approachable way to help you become proficient in data science workflows. What this Book will help me do Master environment management for data science projects using Anaconda and conda. Detect and mitigate dataset biases to ensure fair and ethical machine learning models. Learn advanced data science techniques with tools like NumPy, pandas, and Jupyter Notebooks. Understand and explain your machine learning models using LIME and SHAP. Grow your expertise in selecting and fine-tuning AI/ML algorithms for diverse applications. Author(s) Dan Meador combines extensive expertise in data science with a thorough understanding of Anaconda tools and open source software. With a background in engineering and AI model management, Meador provides an insightful perspective on the field. Their practical and analogy-driven approach makes technical concepts accessible to learners of any level. Who is it for? This book is ideal for data analysts, aspiring machine learning engineers, and data science professionals who wish to deepen their knowledge and make the most of Anaconda's capabilities. A prior understanding of Python and basic data science principles is assumed. If you're looking to optimize your data science workflows and gain hands-on practice, this book is for you.

Reproducible Data Science with Pachyderm

2022-03-18 O'Reilly Amazon

book

Svetlana Karslioglu

data data-science AI/ML AWS Azure Cloud Computing

Dive into the world of reproducible data science with Pachyderm, a specialized platform designed for version-controlled data pipelines. By following this book, 'Reproducible Data Science with Pachyderm,' you'll gain the skills to implement robust, scalable machine learning workflows with Pachyderm 2.0, covering setup, integration, and advanced use cases. What this Book will help me do Build scalable, version-controlled data pipelines with Pachyderm's unique features. Understand the principles behind reproducible data science and implement them effectively. Deploy Pachyderm on AWS, Google Cloud, and Azure while integrating with popular tools. Create and manage end-to-end machine learning workflows, including hyperparameter tuning. Leverage advanced integrations, such as Pachyderm Notebooks and language clients like Python and Go. Author(s) Svetlana Karslioglu is a seasoned data scientist with extensive experience in constructing scalable machine learning and data processing systems. With years in both practical implementation and educational endeavors, she has a talent for breaking down complex concepts into accessible learning paths. Her approach is hands-on and results-oriented, aimed at empowering professionals to excel in the field of data science. Who is it for? This book is intended for data scientists, machine learning engineers, and data engineers who are keen to ensure reproducibility in their workflows. Ideal readers may have familiarity with data science basics and some exposure to Kubernetes and programming languages like Python. By studying the book, learners will establish confidence in implementing Pachyderm for scalable and reliable data pipelines.

Hands-on Matplotlib: Learn Plotting and Visualizations with Python 3

2021-11-27 O'Reilly Amazon

book

Ashwin Pajankar

data data-science data-science-tasks data-visualization python-viz-tools Matplotlib

Learn the core aspects of NumPy, Matplotlib, and Pandas, and use them to write programs with Python 3. This book focuses heavily on various data visualization techniques and will help you acquire expert-level knowledge of working with Matplotlib, a MATLAB-style plotting library for the Python programming language that provides an object-oriented API for embedding plots into applications. You'll begin with an introduction to Python 3 and the scientific Python ecosystem. Next, you'll explore NumPy and ndarray data structures, creation routines, and data visualization. You'll examine useful concepts related to style sheets, legends, and layouts, followed by line, bar, and scatter plots. Chapters then cover recipes of histograms, contours, streamplots, and heatmaps, and how to visualize images and audio with pie and polar charts. Moving forward, you'll learn how to visualize with pcolor, pcolormesh, and colorbar, and how to visualize in 3D in Matplotlib, create simple animations, and embed Matplotlib with different frameworks. The concluding chapters cover how to visualize data with Pandas and Matplotlib, Seaborn, and how to work with real-life data and visualize it. After reading Hands-on Matplotlib you'll be proficient with Matplotlib and able to comfortably work with ndarrays in NumPy and data frames in Pandas. What You'll Learn Understand Data Visualization and Python using Matplotlib Review the fundamental data structures in NumPy and Pandas Work with 3D plotting, visualizations, and animations Visualize images and audio data Who This Book Is For Data scientists, machine learning engineers and software professionals with basic programming skills.

Extending Power BI with Python and R

2021-11-26 O'Reilly Amazon

book

Luca Zavarella

data data-science business-intelligence microsoft-power-platform power-bi AI/ML

Dive into the world of advanced analytics and visualizations in Power BI with "Extending Power BI with Python and R". This comprehensive guide will teach you how to integrate Python and R scripting into your Power BI projects, allowing you to build data models, transform data, and create rich visualizations. Learn practical techniques to make your Power BI dashboards more interactive and insightful. What this Book will help me do Master the integration of Python and R scripts into Power BI to enhance its functionality. Learn to implement advanced data transformations and enrichments using external APIs. Create advanced visualizations and custom visuals with R for improved analytics. Perform advanced data analysis including handling missing data using Python and R. Leverage machine learning techniques within Power BI projects to extract actionable insights. Author(s) Luca Zavarella is a data science expert and renowned author specializing in data analytics and visualization tools. With years of experience working with Power BI, Python, and R in diverse data-driven projects, Zavarella offers a unique perspective on enhancing Power BI capabilities. Passionate about teaching, they craft clear and impactful tutorials for learners. Who is it for? This book is perfect for business intelligence professionals, data scientists, and business analysts who already use Power BI and want to augment its features with Python and R. If you have a foundational understanding of Power BI and some basic familiarity with Python and R, this book will help you explore their combined potential for advanced analytics.

Data Science Bookcamp

2021-11-17 O'Reilly Amazon

book

Leonard Apeltsin

data data-science AI/ML Analytics Data Science IBM

Learn data science with Python by building five real-world projects! Experiment with card game predictions, tracking disease outbreaks, and more, as you build a flexible and intuitive understanding of data science. In Data Science Bookcamp you will learn: Techniques for computing and plotting probabilities Statistical analysis using Scipy How to organize datasets with clustering algorithms How to visualize complex multi-variable datasets How to train a decision tree machine learning algorithm In Data Science Bookcamp you’ll test and build your knowledge of Python with the kind of open-ended problems that professional data scientists work on every day. Downloadable data sets and thoroughly-explained solutions help you lock in what you’ve learned, building your confidence and making you ready for an exciting new data science career. About the Technology A data science project has a lot of moving parts, and it takes practice and skill to get all the code, algorithms, datasets, formats, and visualizations working together harmoniously. This unique book guides you through five realistic projects, including tracking disease outbreaks from news headlines, analyzing social networks, and finding relevant patterns in ad click data. About the Book Data Science Bookcamp doesn’t stop with surface-level theory and toy examples. As you work through each project, you’ll learn how to troubleshoot common problems like missing data, messy data, and algorithms that don’t quite fit the model you’re building. You’ll appreciate the detailed setup instructions and the fully explained solutions that highlight common failure points. In the end, you’ll be confident in your skills because you can see the results. What's Inside Web scraping Organize datasets with clustering algorithms Visualize complex multi-variable datasets Train a decision tree machine learning algorithm About the Reader For readers who know the basics of Python. No prior data science or machine learning skills required. 
About the Author Leonard Apeltsin is the Head of Data Science at Anomaly, where his team applies advanced analytics to uncover healthcare fraud, waste, and abuse. Quotes Valuable and accessible… a solid foundation for anyone aspiring to be a data scientist. - Amaresh Rajasekharan, IBM Corporation Really good introduction of statistical data science concepts. A must-have for every beginner! - Simone Sguazza, University of Applied Sciences and Arts of Southern Switzerland A full-fledged tutorial in data science including common Python libraries and language tricks! - Jean-François Morin, Laval University This book is a complete package for understanding how the data science process works end to end. - Ayon Roy, Internshala

Building Data Science Applications with FastAPI

2021-10-08 O'Reilly Amazon

book

François Voron

web-mobile web-development python-web-frameworks fastapi AI/ML API

This comprehensive guide to FastAPI walks readers through developing modern web backends optimized for data science applications. By mastering key concepts like dependency injection and asynchronous programming, you will create high-performing REST APIs and machine learning powered systems. What this Book will help me do Master asynchronous programming and type hinting in Python for efficient coding. Design comprehensive RESTful APIs for machine learning with FastAPI. Build, test, and maintain scalable data science applications. Integrate Python libraries like NumPy and scikit-learn into web backends. Deploy modular and efficient FastAPI-backed systems to production. Author(s) François Voron is a seasoned software developer specialized in web frameworks and data science applications. With a strong background in building scalable systems, they bring invaluable insights on utilizing FastAPI. Voron emphasizes clarity and hands-on learning, sharing their expertise to help developers master the technology efficiently. Who is it for? This book is ideal for data scientists and Python developers interested in creating efficient data science backends. If you have groundwork knowledge of machine learning concepts and Python programming, this book will enhance your ability to deploy and manage APIs for data-driven applications.

Practical Data Science with Python

2021-09-30 O'Reilly Amazon

book

Nathan George

software-development programming-languages Python AI/ML Analytics Data Science

Practical Data Science with Python guides you through the entire process of leveraging Python tools to analyze and gain insights from data. You'll start with foundational concepts and coding essentials, progressing through statistical analysis, machine learning techniques, and ethical considerations. What this Book will help me do Clean, prepare, and explore data using pandas and NumPy. Understand and implement machine learning models such as random forests and support vector machines. Perform statistical tests and analyze distributions to enhance data insights. Utilize SQL with Python for efficient data interaction. Generate automated reports and dashboards for data storytelling. Author(s) Nathan George has extensive professional experience as a data scientist and Python developer. He specializes in the application of machine learning and statistical methods to solve real-world problems. His writing combines technical depth with an approachable style, aiming to provide readers with actionable knowledge and skills. Who is it for? This book is perfect for data science beginners who have a basic understanding of Python and want to build practical data analysis skills. Students in analytics programs or professionals looking to transition into a data science role will find value in its approachable yet comprehensive coverage. Aspiring data analysts and career changers will gain firsthand exposure to Python-based data science best practices. If you're eager to develop practical, hands-on experience in the data science field, this is the guide for you.

Pandas in Action

2021-09-22 O'Reilly Amazon

book

Boris Paskhaver

data data-science data-science-tools Pandas Agile/Scrum Data Science

Take the next steps in your data science career! This friendly and hands-on guide shows you how to start mastering Pandas with skills you already know from spreadsheet software. In Pandas in Action you will learn how to: Import datasets, identify issues with their data structures, and optimize them for efficiency Sort, filter, pivot, and draw conclusions from a dataset and its subsets Identify trends from text-based and time-based data Organize, group, merge, and join separate datasets Use a GroupBy object to store multiple DataFrames Pandas has rapidly become one of Python's most popular data analysis libraries. In Pandas in Action, a friendly and example-rich introduction, author Boris Paskhaver shows you how to master this versatile tool and take the next steps in your data science career. You’ll learn how easy Pandas makes it to efficiently sort, analyze, filter and munge almost any type of data. About the Technology Data analysis with Python doesn’t have to be hard. If you can use a spreadsheet, you can learn pandas! While its grid-style layouts may remind you of Excel, pandas is far more flexible and powerful. This Python library quickly performs operations on millions of rows, and it interfaces easily with other tools in the Python data ecosystem. It’s a perfect way to up your data game. About the Book Pandas in Action introduces Python-based data analysis using the amazing pandas library. You’ll learn to automate repetitive operations and gain deeper insights into your data that would be impractical—or impossible—in Excel. Each chapter is a self-contained tutorial. Realistic downloadable datasets help you learn from the kind of messy data you’ll find in the real world. What's Inside Organize, group, merge, split, and join datasets Find trends in text-based and time-based data Sort, filter, pivot, optimize, and draw conclusions Apply aggregate operations About the Reader For readers experienced with spreadsheets and basic Python programming. 
About the Author Boris Paskhaver is a software engineer, Agile consultant, and online educator. His programming courses have been taken by 300,000 students across 190 countries. Quotes Of all the introductory pandas books I’ve read—and I did read a few—this is the best, by a mile. - Erico Lendzian, idibu.com This approachable guide will get you up and running quickly with all the basics you need to analyze your data. - Jonathan Sharley, SiriusXM Media Understanding and putting in practice the concepts of this book will help you increase productivity and make you look like a pro. - Jose Apablaza, Steadfast Networks Teaches both novice and expert Python users the essential concepts required for data analysis and data science. - Ben McNamara, DataGeek

Data Science for Marketing Analytics - Second Edition

2021-09-07 O'Reilly Amazon

book

Vishwesh Ravi Shrimali, Mirza Rahim Baig, Gururajan Govindan

data data-science AI/ML Analytics Data Analytics Data Science

In 'Data Science for Marketing Analytics', you'll embark on a journey that integrates the power of data analytics with strategic marketing. With a focus on practical application, this guide walks you through using Python to analyze datasets, implement machine learning models, and derive data-driven insights. What this Book will help me do Gain expertise in cleaning, exploring, and visualizing marketing data using Python. Build machine learning models to predict customer behavior and sales outcomes. Leverage unsupervised learning techniques for effective customer segmentation. Compare and optimize predictive models using advanced evaluation methods. Master Python libraries like pandas and Matplotlib for data manipulation and visualization. Author(s) Mirza Rahim Baig, Gururajan Govindan, and Vishwesh Ravi Shrimali combine their extensive expertise in data analytics and marketing to bring you this comprehensive guide. Drawing from years of applying analytics in real-world marketing scenarios, they provide a hands-on approach to learning data science tools and techniques. Who is it for? This book is perfect for marketing professionals and analysts eager to harness the capabilities of Python to enhance their data-driven strategies. It is also ideal for data scientists looking to apply their skills in marketing across various roles. While a basic understanding of data analysis and Python will help, all key concepts are introduced comprehensively for beginners.

Pandas Brain Teasers

2021-08-30 O'Reilly Amazon

book

Miki Tebeka

data data-science data-science-tools Pandas Data Science Python

This book contains 25 short programs that will challenge your understanding of Pandas. Like any big project, the Pandas developers had to make some design decisions that at times seem surprising. This book uses those quirks as a teaching opportunity. By understanding the gaps in your knowledge, you'll become better at what you do. Some of the teasers are from the author's experience shipping bugs to production, and some from others doing the same. Teasers and puzzles are fun, and learning how to solve them can teach you to avoid programming mistakes and maybe even impress your colleagues and future employers. Working with data is central to nearly everything we do, from disease contact tracing and analyzing health records to smart meters that track utility consumption behavior. With the power of Python's pandas library, you can process and analyze this data in a highly efficient and simple-to-understand way. And with 25 brain teasers designed to turn this technology's quirks into a teaching opportunity, you'll be honing your data science skills while having fun at the same time. Following a simple format, you'll challenge yourself and your understanding of pandas. Read a short Python program that uses pandas, try to guess the output, run the code yourself, and then go to the next page for an explanation of the solution. From common pitfalls and hidden gotchas to unexpected twists and turns, you'll deepen your understanding of pandas, learn to write more efficient code, and reduce the number of bugs in the software you develop. You may even impress your colleagues and your employers, both present and future. Learn the tricks of the trade with Python's pandas, in one of the most fun and creative ways around. What You Need: To run the code you'll need Python version 3.8 or later and Pandas version 1.0 or later installed. We use Python version 3.8.3 and Pandas version 1.0.5; the output might change in future versions.

Getting Started with Streamlit for Data Science

2021-08-20 O'Reilly Amazon

book

Tyler Richards

data data-science AI/ML AWS Data Science Python

Getting Started with Streamlit for Data Science is your essential guide to quickly and efficiently building dynamic data science web applications in Python using Streamlit. Whether you're embedding machine learning models, visualizing data, or deploying projects, this book helps you excel in creating and sharing interactive apps with ease. What this Book will help me do Set up a development environment to create your first Streamlit application. Implement and visualize dynamic data workflows by integrating various Python libraries into Streamlit. Develop and showcase machine learning models within Streamlit for clear and interactive presentations. Deploy your projects effortlessly using platforms like Streamlit Sharing, Heroku, and AWS. Utilize tools like Streamlit Components and themes to enhance the aesthetics and usability of your apps. Author(s) Tyler Richards is a data science expert with extensive experience in leveraging technology to present complex data models in an understandable way. He brings practical solutions to readers, aiming to empower them with the tools they need to succeed in the field of data science. Tyler adopts a hands-on teaching method with illustrative examples to ensure clarity and easy learning. Who is it for? This book is designed for anyone involved in data science, from beginners just starting in the field to experienced professionals who want to learn to create interactive web applications using Streamlit. Ideal for those with a working knowledge of Python, this resource will help you streamline your workflows and enhance your project presentations.

Data Science at the Command Line, 2nd Edition

2021-08-17 O'Reilly Amazon

book

Jeroen Janssens

data data-science Agile/Scrum API CSV Data Science

This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You'll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, author Jeroen Janssens provides a Docker image packed with over 100 Unix power tools--useful whether you work with Windows, macOS, or Linux. You'll quickly discover why the command line is an agile, scalable, and extensible technology. Even if you're comfortable processing data with Python or R, you'll learn how to greatly improve your data science workflow by leveraging the command line's power. This book is ideal for data scientists, analysts, engineers, system administrators, and researchers.
- Obtain data from websites, APIs, databases, and spreadsheets
- Perform scrub operations on text, CSV, HTML, XML, and JSON files
- Explore data, compute descriptive statistics, and create visualizations
- Manage your data science workflow
- Create your own tools from one-liners and existing Python or R code
- Parallelize and distribute data-intensive pipelines
- Model data with dimensionality reduction, regression, and classification algorithms
- Leverage the command line from Python, Jupyter, R, RStudio, and Apache Spark

Data Science Projects with Python - Second Edition

2021-07-29 O'Reilly Amazon

book

Stephen Klosterman

software-development programming-languages Python AI/ML Data Science Matplotlib

Data Science Projects with Python offers a hands-on, project-based approach to learning data science using real-world data sets and tools. You will explore data using Python libraries like pandas and Matplotlib, build machine learning models with scikit-learn, and apply advanced techniques like XGBoost and SHAP values. This book equips you to confidently extract insights, evaluate models, and deliver results with clarity.
What this Book will help me do:
- Learn to load, clean, and preprocess data using Python and pandas.
- Build and evaluate predictive models, including logistic regression and random forests.
- Visualize data effectively using Python libraries like Matplotlib.
- Master advanced techniques like XGBoost and algorithmic fairness.
- Communicate data-driven insights to aid decision making in practical scenarios.
Author(s): Stephen Klosterman is an experienced data scientist with a strong focus on practical applications of machine learning in business. Combining a rich academic background with hands-on industry experience, he excels at explaining complex concepts in an approachable way. As the author of 'Data Science Projects with Python,' his goal is to provide learners with the skills needed for real-world data science challenges.
Who is it for? This book is ideal for beginners in data science and machine learning who have some basic programming knowledge in Python. Aspiring data scientists will benefit from its practical, end-to-end examples. Professionals seeking to expand their skillset in predictive modeling and delivering business insights will find this book invaluable. Some foundation in statistics and programming is recommended.

Advanced Forecasting with Python: With State-of-the-Art-Models Including LSTMs, Facebook’s Prophet, and Amazon’s DeepAR

2021-07-02 O'Reilly Amazon

book

Joos Korstanje

data data-science data-science-tasks statistics time-series forecasting

Cover all the machine learning techniques relevant for forecasting problems, ranging from univariate and multivariate time series to supervised learning, to state-of-the-art deep forecasting models such as LSTMs, recurrent neural networks, Facebook’s open-source Prophet model, and Amazon’s DeepAR model. Rather than focus on a specific set of models, this book presents an exhaustive overview of all the techniques relevant to practitioners of forecasting. It begins by explaining the different categories of models that are relevant for forecasting in a high-level language. Next, it covers univariate and multivariate time series models followed by advanced machine learning and deep learning models. It concludes with reflections on model selection such as benchmark scores vs. understandability of models vs. compute time, and automated retraining and updating of models. Each of the models presented in this book is covered in depth, with an intuitive, simple explanation of the model, a mathematical transcription of the idea, and Python code that applies the model to an example data set. Reading this book will add a competitive edge to your current forecasting skillset. The book is also suited to those who have recently started working on forecasting tasks and are looking for an exhaustive book that allows them to start with traditional models and gradually move into more and more advanced models.
What You Will Learn:
- Carry out forecasting with Python
- Mathematically and intuitively understand traditional forecasting models and state-of-the-art machine learning techniques
- Gain the basics of forecasting and machine learning, including evaluation of models, cross-validation, and back testing
- Select the right model for the right use case
Who This Book Is For: The advanced nature of the later chapters makes the book relevant for applied experts working in the domain of forecasting, as the models covered have been published only recently. Experts working in the domain will want to update their skills as traditional models are regularly being outperformed by newer models.

Behavioral Data Analysis with R and Python

2021-06-16 O'Reilly Amazon

book

Florent Buisson

data data-science Analytics Data Science Python

Harness the full power of the behavioral data in your company by learning tools specifically designed for behavioral data analysis. Common data science algorithms and predictive analytics tools treat customer behavioral data, such as clicks on a website or purchases in a supermarket, the same as any other data. Instead, this practical guide introduces powerful methods specifically tailored for behavioral data analysis. Advanced experimental design helps you get the most out of your A/B tests, while causal diagrams allow you to tease out the causes of behaviors even when you can't run experiments. Written in an accessible style for data scientists, business analysts, and behavioral scientists, this practical book provides complete examples and exercises in R and Python to help you gain more insight from your data--immediately.
- Understand the specifics of behavioral data
- Explore the differences between measurement and prediction
- Learn how to clean and prepare behavioral data
- Design and analyze experiments to drive optimal business decisions
- Use behavioral data to understand and measure cause and effect
- Segment customers in a transparent and insightful way

Mastering Tableau 2021 - Third Edition

2021-05-31 O'Reilly Amazon

book

David Baldwin, Marleen Meier

data data-science data-science-tasks data-visualization Tableau Analytics

Tableau 2021 brings a wide range of tools and techniques for mastering data visualization and business intelligence. In this book, you will delve into the advanced methodologies to fully utilize Tableau's capabilities. Whether you're dealing with geo-spatial, time-series analytics, or complex dashboards, this resource provides expertise through real-world data challenges.
What this Book will help me do:
- Draw connections between multiple databases and create insightful Tableau dashboards.
- Master advanced data visualization techniques that lead to impactful storytelling.
- Understand Tableau's integration with programming languages such as Python and R.
- Analyze datasets with time-series and geo-spatial methods to gain predictive insights.
- Leverage Tableau Prep Builder for efficient data cleaning and transformation processes.
Author(s): Marleen Meier and David Baldwin are seasoned professionals in business intelligence and data analytics. They bring years of practical experience and have helped numerous organizations worldwide transform their data visualization strategies using Tableau. Their collaborative approach ensures a comprehensive, beginner to advanced learning experience.
Who is it for? This book is perfect for business intelligence analysts, data analysts, and industry professionals who are already familiar with Tableau's basics and wish to expand their knowledge. It provides advanced techniques and implementations of Tableau for improving data storytelling and dashboard performance. Readers seeking to connect Tableau with external programming tools will also greatly benefit from this guide.

Interactive Dashboards and Data Apps with Plotly and Dash

2021-05-21 O'Reilly Amazon

book

Elias Dabbas

data data-science data-science-tasks data-visualization dashboards Dashboard

This book, "Interactive Dashboards and Data Apps with Plotly and Dash", is a practical guide to building dynamic dashboards and applications using the Dash Python framework. It covers creating visualizations, integrating interactive controls, and deploying the apps, all without requiring JavaScript expertise.
What this Book will help me do:
- Master creating interactive data dashboards using Dash and Plotly.
- Understand how to integrate controls such as sliders and dropdowns into apps.
- Learn to use Plotly Express for visually representing data with ease.
- Develop capabilities to deploy a fully functional web app for data interaction.
- Understand how to use multi-page configurations and URLs for advanced apps.
Author(s): Elias Dabbas is a seasoned Python developer with extensive expertise in data visualization and full-stack development. Drawing from real-world experience, Elias brings a practical approach to teaching, ensuring that learners understand not only how to build applications but why the approach works.
Who is it for? This book is ideal for data analysts, engineers, and developers looking to enhance their visualization capabilities. If you are familiar with Python and have basic HTML skills, you will find this book accessible and rewarding. Beginners looking to explore advanced dashboard creation without JavaScript will also appreciate the clear approach.

Think Bayes, 2nd Edition

2021-05-18 O'Reilly Amazon

book

Allen B. Downey

data data-science data-science-tasks statistics bayesian-statistics Python

If you know how to program, you're ready to tackle Bayesian statistics. With this book, you'll learn how to solve statistical problems with Python code instead of mathematical formulas, using discrete probability distributions rather than continuous mathematics. Once you get the math out of the way, the Bayesian fundamentals will become clearer and you'll begin to apply these techniques to real-world problems. Bayesian statistical methods are becoming more common and more important, but there aren't many resources available to help beginners. Based on undergraduate classes taught by author Allen B. Downey, this book's computational approach helps you get a solid start.
- Use your programming skills to learn and understand Bayesian statistics
- Work with problems involving estimation, prediction, decision analysis, evidence, and Bayesian hypothesis testing
- Get started with simple examples, using coins, dice, and a bowl of cookies
- Learn computational methods for solving real-world problems

Hands-On Data Analysis with Pandas - Second Edition

2021-04-29 O'Reilly Amazon

book

Stefanie Molin

data data-science data-science-tools Pandas AI/ML Analytics

'Hands-On Data Analysis with Pandas' guides you to gain expertise in the Python pandas library for data analysis and manipulation. With practical, real-world examples, you'll learn to analyze datasets, visualize data trends, and implement machine learning models for actionable insights.
What this Book will help me do:
- Understand and implement data analysis techniques with Python.
- Develop expertise in data manipulation using pandas and NumPy.
- Visualize data effectively with pandas visualization tools and seaborn.
- Apply machine learning techniques with Python libraries.
- Combine datasets and handle complex data workflows efficiently.
Author(s): Stefanie Molin is a software engineer and data scientist with extensive experience in analytics and Python. She has worked with large data-driven systems and has a strong focus on teaching data analysis effectively. Stefanie's books are known for their practical, hands-on approach to solving real data problems.
Who is it for? This book is perfect for aspiring data scientists, data analysts, and Python developers. Readers with beginner to intermediate skill levels in Python will find it accessible and informative. It is designed for those seeking to build practical data analysis skills. If you're looking to add data science and pandas to your toolkit, this book is ideal.

Bootstrapping

2021-04-19 O'Reilly Amazon

book

Felix Bittmann

data data-science data-science-tasks statistics Python

Bootstrapping is a conceptually simple statistical technique to increase the quality of estimates, conduct robustness checks and compute standard errors for virtually any statistic. This book provides an intelligible and compact introduction for students, scientists and practitioners. It not only gives a clear explanation of the underlying concepts but also demonstrates the application of bootstrapping using Python and Stata.
