talk-data.com

Event

O'Reilly Data Science Books

2013-08-09 – 2026-02-25 O'Reilly

Activities tracked

2118

Collection of O'Reilly books on Data Science.

Sessions & talks

Applied Geospatial Data Science with Python

"Applied Geospatial Data Science with Python" introduces readers to the power of integrating geospatial data into data science workflows. This book equips you with practical methods for processing, analyzing, and visualizing spatial data to solve real-world problems. Through hands-on examples and clear, actionable advice, you will master the art of spatial data analysis using Python.

What this Book will help me do:
Learn to process, analyze, and visualize geospatial data using Python libraries.
Develop a foundational understanding of GIS and geospatial data science principles.
Gain skills in building geospatial AI and machine learning models for specific use cases.
Apply geospatial data workflows to practical scenarios like optimization and clustering.
Create a portfolio of geospatial data science projects relevant across different industries.

Author(s): David S. Jordan is an experienced data scientist with years of expertise in GIS and geospatial analytics. With a passion for making complex topics accessible, David leverages his deep technical knowledge to provide practical, hands-on instruction. His approach emphasizes real-world applications and encourages learners to develop confidence as they work with geospatial data.

Who is it for? This book is perfect for data scientists looking to integrate geospatial data analysis into their existing workflows, and GIS professionals seeking to expand into data science. If you already have a basic knowledge of Python for data analysis or data science and want to explore how to work effectively with geospatial data to drive impactful solutions, this is the book for you.

Leading Biotech Data Teams

With hundreds of startups founded each year, the relatively new field of data-focused biotech (or TechBio) is growing rapidly. But without enough experienced practitioners to go around, most organizations hire data scientists with minimal biotech experience and lab scientists who've taken a crash course in data science. This arrangement is problematic. The way lab scientists and data scientists think and work is fundamentally different. But there is a solution. This report introduces biocode principles to help these scientists reframe the way they think about their role, their team's role, and the tools they use to fulfill those roles. Lab and data scientists alike will learn how to address the underlying issues so they can focus on solving these technology problems together.

Each of the following chapters presents a vital biocode principle:
"Defining Objectives" explores how to broaden the way teams view their work, shifting from purely technical objectives to organizational-level scientific objectives
"Building Collaborations" encourages teams to focus their energy on collaboration with partner teams rather than guard their time for technical work
"Deploying Tooling" covers ways to coordinate each team's work with the cadence of experiments and lab work

The Kaggle Workbook

"The Kaggle Workbook" is an engaging and practical guide for anyone looking to excel in Kaggle competitions by learning from real past case studies and hands-on exercises. Inside, you'll dive deep into key data science concepts, explore how Kaggle Grandmasters tackle challenges, and apply new skills to your own projects.

What this Book will help me do:
Master the methodology used in past Kaggle competitions for real-world applications.
Discover and implement advanced data science techniques such as gradient boosting and NLP.
Build a portfolio that demonstrates hands-on experience solving complex data problems.
Learn time-series forecasting and computer vision by exploring detailed case studies.
Develop a practical mindset for competitive data science problem solving.

Author(s): Konrad Banachewicz and Luca Massaron bring their expertise as Kaggle Grandmasters to the pages of this book. With extensive experience in data science and collaborative problem-solving, they guide readers through practical exercises with a clear, approachable style. Their passion for sharing knowledge shines through in every chapter.

Who is it for? "The Kaggle Workbook" is ideal for aspiring and experienced data scientists who want to sharpen their competitive data science skills. It caters to those with a foundational knowledge of data science and an interest in enhancing it through practical exercises. The book is a perfect fit for anyone aiming to succeed in Kaggle competitions, whether starting out or advancing further.

Data Wrangling with R

Data Wrangling with R guides you through mastering data preparation in the R programming language using tidyverse libraries. You will learn techniques to load, explore, transform, and visualize data effectively, gaining the skills needed for data modeling and insights extraction.

What this Book will help me do:
Understand how to use R and tidyverse libraries to handle data wrangling tasks.
Learn methods to work with diverse data types like numbers, strings, and dates.
Gain proficiency in building visual representations of data using ggplot2.
Build and validate your first predictive model for useful insights.
Create an interactive web application with Shiny in R.

Author(s): Gustavo Santos is an experienced data scientist specializing in R programming and data visualization. With a background in statistics and several years of professional experience in industry and academia, Gustavo excels at translating complex data analytics concepts into practical skills. His approach to teaching is hands-on and example-driven, aiming to empower readers to excel in real-world applications.

Who is it for? If you are a data scientist, data analyst, or even a beginner programmer who wants to enhance their data manipulation and visualization skills, this book is perfect for you. Familiarity with R or a general understanding of programming concepts is suggested but not mandatory. It caters to professionals looking to refine their data wrangling workflow and to students aspiring to break into data-centered fields. By the end, you'll be ready to apply data wrangling and visualization tools in your projects.

Experimentation for Engineers

Optimize the performance of your systems with practical experiments used by engineers in the world’s most competitive industries.

In Experimentation for Engineers: From A/B testing to Bayesian optimization you will learn how to:
Design, run, and analyze an A/B test
Break the "feedback loops" caused by periodic retraining of ML models
Increase experimentation rate with multi-armed bandits
Tune multiple parameters experimentally with Bayesian optimization
Clearly define business metrics used for decision-making
Identify and avoid the common pitfalls of experimentation

Experimentation for Engineers: From A/B testing to Bayesian optimization is a toolbox of techniques for evaluating new features and fine-tuning parameters. You’ll start with a deep dive into methods like A/B testing, and then graduate to advanced techniques used to measure performance in industries such as finance and social media. Learn how to evaluate the changes you make to your system and ensure that your testing doesn’t undermine revenue or other business metrics. By the time you’re done, you’ll be able to seamlessly deploy experiments in production while avoiding common pitfalls.

About the Technology: Does my software really work? Did my changes make things better or worse? Should I trade features for performance? Experimentation is the only way to answer questions like these. This unique book reveals sophisticated experimentation practices developed and proven in the world’s most competitive industries that will help you enhance machine learning systems, software applications, and quantitative trading solutions.

About the Book: Experimentation for Engineers: From A/B testing to Bayesian optimization delivers a toolbox of processes for optimizing software systems. You’ll start by learning the limits of A/B testing, and then graduate to advanced experimentation strategies that take advantage of machine learning and probabilistic methods. The skills you’ll master in this practical guide will help you minimize the costs of experimentation and quickly reveal which approaches and features deliver the best business results.

What's Inside:
Design, run, and analyze an A/B test
Break the “feedback loops” caused by periodic retraining of ML models
Increase experimentation rate with multi-armed bandits
Tune multiple parameters experimentally with Bayesian optimization

About the Reader: For ML and software engineers looking to extract the most value from their systems. Examples in Python and NumPy.

About the Author: David Sweet has worked as a quantitative trader at GETCO and a machine learning engineer at Instagram. He teaches in the AI and Data Science master's programs at Yeshiva University.

Quotes:
Putting an ‘improved’ version of a system into production can be really risky. This book focuses you on what is important! - Simone Sguazza, University of Applied Sciences and Arts of Southern Switzerland
A must-have for anyone setting up experiments, from A/B tests to contextual bandits and Bayesian optimization. - Maxim Volgin, KLM
Shows a non-mathematical programmer exactly what they need to write powerful mathematically-based testing algorithms. - Patrick Goetz, The University of Texas at Austin
Gives you the tools you need to get the most out of your experiments. - Marc-Anthony Taylor, Raiffeisen Bank International

Microsoft Power BI Data Analyst Certification Companion: Preparation for Exam PL-300

Use this book to study for the PL-300 Microsoft Power BI Data Analyst exam. The book follows the “Skills Measured” outline provided by Microsoft to help focus your study. Each topic area from the outline corresponds to an area covered by the exam, and the book helps you build a good base of knowledge in each area. Each topic is presented with a blend of practical explanations, theory, and best practices.

Power BI is more than just the Power BI Desktop or the Power BI Service. It is two distinct applications and an online service that, together, enable business users to gather, shape, and analyze data to generate and present insights. This book clearly delineates the purpose of each component and explains the key concepts necessary to use each component effectively. Each chapter provides best practices and tips to help an inexperienced Power BI practitioner develop good habits that will support larger or more complex analyses.

Many business analysts come to Power BI with a wealth of experience in Excel and particularly with pivot tables. Some of this experience translates readily into Power BI concepts. This book leverages that overlap in skill sets to help seasoned Excel users overcome the initial learning curve in Power BI, but no prior knowledge of any kind is assumed, terminology is defined in non-technical language, and key concepts are explained using analogies and ideas from experiences common to any reader.

After reading this book, you will have the background and capability to learn the skills and concepts necessary both to pass the PL-300 exam and become a confident Power BI practitioner.

What You Will Learn:
Create user-friendly, responsive reports with drill-throughs, bookmarks, and tool tips
Construct a star schema with relationships, ensuring that your analysis will be both accurate and responsive
Publish reports and datasets to the Power BI Service, enabling the report (and the dataset) to be viewed and used by your colleagues
Extract data from a variety of sources, enabling you to leverage the data that your organization has collected and stored in a variety of sources
Schedule data refreshes for published datasets so your reports and dashboards stay up to date
Develop dashboards with visuals from different reports and streaming content

Who This Book Is For: Power BI users who are planning to take the PL-300 exam, Power BI users who want help studying the topic areas listed in Microsoft’s outline for the PL-300 exam, and those who are not planning to take the exam but want to close any knowledge gaps they might have

API Analytics for Product Managers

In API Analytics for Product Managers, you will learn how to approach APIs as products to drive revenue and business growth. The book provides actionable insights on researching, strategizing, marketing, and evaluating the performance of APIs in SaaS contexts.

What this Book will help me do:
Learn to develop long-term strategies for managing APIs as a product.
Master the concepts of the API lifecycle and API maturity for better management.
Understand and apply key metrics to measure activation, retention, and engagement of APIs.
Design support models for APIs that ensure scalability and efficiency.
Gain techniques for deriving actionable business insights from metrics analysis.

Author(s): Deepa Goyal is an experienced product manager who specializes in API lifecycle management and analytics strategies. With years of industry experience, she has developed deep expertise in scaling and optimizing APIs to deliver business value. Her practical and results-oriented writing style makes complex topics accessible for professionals looking to enhance their API strategies.

Who is it for? Ideal for product managers, engineers, and executives in SaaS companies looking to maximize the potential of APIs. This book is especially suited for individuals with foundational knowledge of APIs aiming to refine their analytical and strategic skills. Readers will gain actionable insights to track API performance effectively and implement metrics-driven decisions. It's a must-read for those focused on leveraging APIs for business growth.

Democratizing Application Development with Betty Blocks

"Democratizing Application Development with Betty Blocks" is a hands-on guide for learning the Betty Blocks no-code platform to develop impactful, dynamic business applications. This book introduces both basic and advanced concepts, empowering readers to create valuable IT solutions, from prototypes to complete applications.

What this Book will help me do:
Understand the capabilities and low-code functionalities of Betty Blocks through engaging examples.
Learn to create business applications using data models, workflows, and dynamic web components.
Master rapid application development techniques to build prototypes and applications quickly.
Discover how to use Betty Blocks' drag-and-drop interfaces for effective front-end design.
Gain insight into collaborating as a citizen developer to deliver functional custom applications.

Author(s): Reinier van Altena is an experienced professional in no-code application development, specializing in empowering users of varying technical skills to create business solutions. With practical insights derived from extensive use of platforms like Betty Blocks, Reinier shares approachable and actionable advice. His expertise bridges the gap between technology and innovation.

Who is it for? This book is tailored for individuals interested in building business applications without prior coding knowledge. Ideal readers include citizen developers, business professionals, and anyone seeking to fulfill specific organizational IT needs through creativity and innovation. The book emphasizes learning fundamentals and advanced application-building strategies in an accessible manner.

Data Mining and Predictive Analytics for Business Decisions

Recent advances in data science have given data analysts many more tools and techniques for extracting information from data sets. This book helps data analysts move up from simple tools such as Excel for descriptive analytics to answering more sophisticated questions using machine learning. Most of the exercises use R and Python, but rather than focus on coding algorithms, the book employs interactive interfaces to these tools to perform the analysis. Using the CRISP-DM data mining standard, the early chapters cover conducting the preparatory steps in data mining: translating business information needs into framed analytical questions and data preparation. The Jamovi and the JASP interfaces are used with R and the Orange3 data mining interface with Python. Where appropriate, Voyant and other open-source programs are used for text analytics. The techniques covered in this book range from basic descriptive statistics, such as summarization and tabulation, to more sophisticated predictive techniques, such as linear and logistic regression, clustering, classification, and text analytics. Includes companion files with case study files, solution spreadsheets, data sets and charts, etc. from the book.

Features:
Covers basic descriptive statistics, such as summarization and tabulation, to more sophisticated predictive techniques, such as linear and logistic regression, clustering, classification, and text analytics
Uses R, Python, Jamovi and JASP interfaces, and the Orange3 data mining interface
Includes companion files with the case study files from the book, solution spreadsheets, data sets, etc.

Advances in Business Statistics, Methods and Data Collection

Advances in Business Statistics, Methods and Data Collection delivers insights into the latest state of play in producing establishment statistics, obtained from businesses, farms and institutions. Presenting materials and reflecting discussions from the 6th International Conference on Establishment Statistics (ICES-VI), this edited volume provides a broad overview of methodology underlying current establishment statistics from every aspect of the production life cycle while spotlighting innovative and impactful advancements in the development, conduct, and evaluation of modern establishment statistics programs.

Highlights include:
Practical discussions on agile, timely, and accurate measurement of rapidly evolving economic phenomena such as globalization, new computer technologies, and the informal sector.
Comprehensive explorations of administrative and new data sources and technologies, covering big (organic) data sources and methods for data integration, linking, machine learning and visualization.
Detailed compilations of statistical programs’ responses to wide-ranging data collection and production challenges, including those caused by the Covid-19 pandemic.
In-depth examinations of business survey questionnaire design, computerization, pretesting methods, experimentation, and paradata.
Methodical presentations of conventional and emerging procedures in survey statistics techniques for establishment statistics, encompassing probability sampling designs and sample coordination, non-probability sampling, missing data treatments, small area estimation and Bayesian methods.

Providing a broad overview of the most up-to-date science, this book challenges the status quo and prepares researchers for current and future challenges in establishment statistics and methods. Perfect for survey researchers, government statisticians, National Bank employees, economists, and undergraduate and graduate students in survey research and economics, Advances in Business Statistics, Methods and Data Collection will also earn a place in the toolkit of researchers working with data in industries across a variety of fields.

Business Analysis For Dummies, 2nd Edition

Build a successful career in business analysis

When it comes to doing good business, change is a very good thing. And effective business analysts are at the heart of identifying opportunities for growth and implementing the solutions that can transform an organization’s foundation and ultimately increase its profitability. Whether you’re an aspiring business analysis professional or a seasoned analyst looking for the latest techniques and approaches, Business Analysis For Dummies helps you discover the newest tips and tricks for turning knowledge into the changes that have a real and meaningful impact on business and drive your organization towards value delivery.

Identify areas for growth and create solutions
Learn how to bring people together to collaborate effectively
Discover ways to better understand and serve your customers
See how business analysis works in the real world
Learn the technology to make the job easier
Find business solutions to improve your organization’s performance
Understand how to dig deeply into your organization’s data, processes, and business rules

Dummies makes the path to business success clear. Start here to turn your love of business analysis into the catalyst that makes a difference.

R All-in-One For Dummies

A deep dive into the programming language of choice for statistics and data

With R All-in-One For Dummies, you get five mini-books in one, offering a complete and thorough resource on the R programming language and a road map for making sense of the sea of data we're all swimming in. Maybe you're pursuing a career in data science, maybe you're looking to infuse a little statistics know-how into your existing career, or maybe you're just R-curious. This book has your back. Along with providing an overview of coding in R and how to work with the language, this book delves into the types of projects and applications R programmers tend to tackle the most. You'll find coverage of statistical analysis, machine learning, and data management with R.

Grasp the basics of the R programming language and write your first lines of code
Understand how R programmers use code to analyze data and perform statistical analysis
Use R to create data visualizations and machine learning programs
Work through sample projects to hone your R coding skill

This is an excellent all-in-one resource for beginning coders who'd like to move into the data space by knowing more about R.

Microsoft Power Platform Enterprise Architecture - Second Edition

Microsoft Power Platform Enterprise Architecture is your essential guide to designing powerful, enterprise-grade solutions using Microsoft technology. This thoroughly structured book equips you with architectural insights, methodologies, and best practices necessary to optimize solutions using the Microsoft Power Platform and integrate them seamlessly with M365 and Azure.

What this Book will help me do:
Design robust enterprise solutions leveraging Microsoft Power Platform and Dynamics 365.
Integrate Power Platform tools with Microsoft 365 and Azure effectively for comprehensive solutions.
Implement advanced security, extensibility, and lifecycle management methodologies.
Migrate and manage data efficiently within the Power Platform ecosystem.
Overcome architectural challenges in multi-system integration using proven techniques.

Author(s): Robert Rybaric is an experienced enterprise architect specializing in Microsoft technologies. With years of expertise in designing enterprise systems, Robert brings practical insights into crafting effective solutions using Microsoft Power Platform. His approach emphasizes clarity and practicality, ensuring concepts are both illustrative and applicable for readers.

Who is it for? This book is ideal for enterprise architects and technical decision-makers aiming to design and deploy complex solutions using the Microsoft Power Platform. It is tailored for professionals familiar with Microsoft Power Platform and Azure services who wish to refine their skills in enterprise architecture to meet growing business demands efficiently.

Tableau Desktop Specialist Certification

This book is your comprehensive guide to achieving the Tableau Desktop Specialist certification. By working through its structured content, you'll gain the skills necessary to understand and utilize Tableau for data analysis and visualization, and you'll be confidently prepared to pass the certification exam.

What this Book will help me do:
Master how to load and prepare data in Tableau efficiently for analysis.
Design and create visually impactful charts and dashboards tailored to your audience.
Learn to utilize calculations, parameters, and advanced functions in Tableau.
Develop an understanding of managing dimensions, measures, and their application.
Gain confidence in sharing and presenting insights effectively with Tableau.

Author(s): Adam Mico, a renowned Tableau ambassador and visionary, has extensive experience in the data visualization field. Having successfully obtained the Tableau Desktop Specialist certification himself, Adam combines practical expertise with an intuitive teaching approach, ensuring readers gain both the knowledge and confidence required to excel.

Who is it for? If you're starting your journey in data visualization or looking to formalize your Tableau skills, this book is intended for you. Beginners without prior Tableau experience will find structured guidance, while those with some knowledge will appreciate the detailed preparation tips. Ideal for professionals aiming to pass the Tableau Desktop Specialist certification or improve their data analysis capabilities.

CompTIA Data+ DA0-001 Exam Cram

CompTIA® Data+ DA0-001 Exam Cram is an all-inclusive study guide designed to help you pass the CompTIA Data+ DA0-001 exam. Prepare for test day success with complete coverage of exam objectives and topics, plus hundreds of realistic practice questions. Extensive prep tools include quizzes, Exam Alerts, and our essential last-minute review CramSheet. The powerful Pearson Test Prep practice software provides real-time assessment and feedback with two complete exams.

Covers the critical information needed to score higher on your Data+ DA0-001 exam!
Understand data concepts, environments, mining, analysis, visualization, governance, quality, and controls
Work with databases, data warehouses, database schemas, dimensions, data types, structures, and file formats
Acquire data and understand how it can be monetized
Clean and profile data so it's more accurate, consistent, and useful
Review essential techniques for manipulating and querying data
Explore essential tools and techniques of modern data analytics
Understand both descriptive and inferential statistical methods
Get started with data visualization, reporting, and dashboards
Leverage charts, graphs, and reports for data-driven decision-making
Learn important data governance concepts ...

Introduction to System Science with MATLAB, 2nd Edition

Explores the mathematical basis for developing and evaluating continuous and discrete systems

In this revised Second Edition of Introduction to System Science with MATLAB®, the authors Gary Sandquist and Zakary Wilde provide a comprehensive exploration of essential concepts, mathematical framework, analytical resources, and productive skills required to address any rational system confidently and adequately for quantitative evaluation. This Second Edition is supplemented with new updates to the mathematical and technical materials from the first edition. It includes a new chapter that helps readers generalize and execute algorithms for systems development and analysis, as well as an expanded chapter covering specific system science applications. The book provides the mathematical basis for developing and evaluating single and multiple input/output systems that are continuous or discrete. It offers the mathematical basis for the recognition, definition, quantitative modeling, analysis, and evaluation in system science.

The book also provides:
A comprehensive introduction to system science and the principles of causality and cause and effect operations, including their historical and scientific background
A complete exploration of fundamental systems concepts and basic system equations, including definitions and classifications
Practical applications and discussions of single-input systems, multiple-input systems, and system modeling and evaluation
An in-depth examination of generalized system analysis methods and specific system science applications

Perfect for upper-level undergraduate and graduate students in engineering, mathematics, and physical sciences, Introduction to System Science with MATLAB® will also earn a prominent place in libraries of researchers in the life and social sciences.

Building Table Views with Phoenix LiveView

Data is at the core of every business, but it is useless if nobody can access and analyze it. Learn how to generate business value by making your data accessible with advanced table UIs. This definitive guide teaches you how to bring your data to the fingertips of nontechnical users with advanced features like pagination, sorting, filtering, and infinite scrolling. Build reactive and reusable table components by leveraging Phoenix LiveView, schemaless changesets, and Ecto query composition. Table UIs are the bread and butter for every web developer, so it is time to learn how to build them right.

As a web developer, you have to build tables. Lots and lots of tables. With table UIs making up such a significant part of your daily work, you need to know how to build the right table for the task, with all the needed features. Building a simple table is easy, but tables only become really useful with advanced features like pagination, sorting, and filtering. That's where building a table can quickly become complicated.

This book shows you how to implement advanced table features in a clean and reusable way. You'll build fast and interactive table UIs by leveraging Phoenix LiveView. Make vast amounts of data manageable with common but complex features like pagination, sorting, filtering, and infinite scrolling. Use SOLID coding principles to make your queries reusable with query composition. Compartmentalize your UI with LiveComponents and learn how to handle user input securely with schemaless changesets. Share your view onto the data painlessly by storing your search parameters in the URL.

Data is one of the most valuable assets of your business, but you cannot unlock its potential if you don't know how to make it accessible. This book shows you how to deliver that data to your users' fingertips quickly.

What You Need: You'll need Elixir 1.12 or later, Erlang/OTP 24 or later, Phoenix 1.6 or later, and PostgreSQL installed on your machine.

Building Solutions with the Microsoft Power Platform

With the accelerating speed of business and the increasing dependence on technology, companies today are significantly changing the way they build in-house business solutions. Many now use low-code and no-code technologies to help them deal with specific issues, but that's just the beginning. With this practical guide, power users and developers will discover ways to resolve everyday challenges by building end-to-end solutions with the Microsoft Power Platform.

Author Jason Rivera, who specializes in SharePoint and the Microsoft 365 solution architecture, provides a comprehensive overview of how to use the Power Platform to build end-to-end solutions that address tactical business needs. By learning key components of the platform, including Power Apps, Power Automate, and Power BI, you'll be able to build low-code and no-code applications, automate repeatable business processes, and create interactive reports from available data.

Learn how the Power Platform apps work together
Incorporate AI into the Power Platform without extensive ML or AI knowledge
Create end-to-end solutions to solve tactical business needs, including data collection, process automation, and reporting
Build AI-based solutions using Power Virtual Agents and AI Builder

CompTIA Data+: DAO-001 Certification Guide

The "CompTIA Data+: DAO-001 Certification Guide" is your complete resource to approaching and passing the CompTIA Data+ certification exam. This book offers clear explanations, step-by-step exercises, and practical examples designed to help you master the domain concepts essential for the DAO-001 exam. Prepare confidently and expand your career opportunities in data analytics. What this Book will help me do Understand and apply the five domains covered in the DAO-001 certification exam. Learn data preparation techniques such as collection, cleaning, and wrangling. Master descriptive statistical methods and hypothesis testing to analyze data. Create insightful visualizations and professional reports for stakeholders. Grasp the fundamentals of data governance, including data quality standards. Author(s) Cameron Dodd is an experienced data analyst and educator passionate about breaking down complex concepts. With years of teaching and hands-on analytics expertise, he has developed a student-centric approach to helping professionals achieve certification and career advancement. His structured yet relatable writing style makes learning intuitive. Who is it for? The ideal readers of this book are data professionals aiming to achieve CompTIA Data+ certification (DAO-001 exam), individuals entering the growing field of data analytics, and professionals looking to validate or expand their skills. Whether you're starting from scratch or solidifying your knowledge, this book is designed for all levels.

Pandas for Everyone: Python Data Analysis, 2nd Edition

Manage and Automate Data Analysis with Pandas in Python Today, analysts must manage data characterized by extraordinary variety, velocity, and volume. Using the open source Pandas library, you can use Python to rapidly automate and perform virtually any data analysis task, no matter how large or complex. Pandas can help you ensure the veracity of your data, visualize it for effective decision-making, and reliably reproduce analyses across multiple data sets. Pandas for Everyone, 2nd Edition, brings together practical knowledge and insight for solving real problems with Pandas, even if you're new to Python data analysis. Daniel Y. Chen introduces key concepts through simple but practical examples, incrementally building on them to solve more difficult, real-world data science problems such as using regularization to prevent data overfitting, or when to use unsupervised machine learning methods to find the underlying structure in a data set. New features to the second edition include: Extended coverage of plotting and the seaborn data visualization library Expanded examples and resources Updated Python 3.9 code and packages coverage, including statsmodels and scikit-learn libraries Online bonus material on geopandas, Dask, and creating interactive graphics with Altair Chen gives you a jumpstart on using Pandas with a realistic data set and covers combining data sets, handling missing data, and structuring data sets for easier analysis and visualization. He demonstrates powerful data cleaning techniques, from basic string manipulation to applying functions simultaneously across dataframes. Once your data is ready, Chen guides you through fitting models for prediction, clustering, inference, and exploration. He provides tips on performance and scalability and introduces you to the wider Python data analysis ecosystem.
Work with DataFrames and Series, and import or export data Create plots with matplotlib, seaborn, and pandas Combine data sets and handle missing data Reshape, tidy, and clean data sets so they're easier to work with Convert data types and manipulate text strings Apply functions to scale data manipulations Aggregate, transform, and filter large data sets with groupby Leverage Pandas' advanced date and time capabilities Fit linear models using statsmodels and scikit-learn libraries Use generalized linear modeling to fit models with different response variables Compare multiple models to select the best one Regularize to overcome overfitting and improve performance Use clustering in unsupervised machine learning ...

Python Data Science Handbook, 2nd Edition

Python is a first-class tool for many researchers, primarily because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the new edition of Python Data Science Handbook do you get them all—IPython, NumPy, pandas, Matplotlib, Scikit-Learn, and other related tools. Working scientists and data crunchers familiar with reading and writing Python code will find the second edition of this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python. With this handbook, you'll learn how: IPython and Jupyter provide computational environments for scientists using Python NumPy includes the ndarray for efficient storage and manipulation of dense data arrays Pandas contains the DataFrame for efficient storage and manipulation of labeled/columnar data Matplotlib includes capabilities for a flexible range of data visualizations Scikit-learn helps you build efficient and clean Python implementations of the most important and established machine learning algorithms