talk-data.com talk-data.com

Topic

Pandas

data_manipulation data_analysis python

72

tagged

Activity Trend

17 peak/qtr
2020-Q1 2026-Q1

Activities

Showing filtered results

Filtering by: O'Reilly Data Science Books ×
Time Series Analysis with Python Cookbook - Second Edition

Perform time series analysis and forecasting confidently with this Python code bank and reference manual Purchase of the print or Kindle book includes a free PDF eBook Key Features Explore up-to-date forecasting and anomaly detection techniques using statistical, machine learning, and deep learning algorithms Learn different techniques for evaluating, diagnosing, and optimizing your models Work with a variety of complex data with trends, multiple seasonal patterns, and irregularities Book Description To use time series data to your advantage, you need to be well-versed in data preparation, analysis, and forecasting. This fully updated second edition includes chapters on probabilistic models and signal processing techniques, as well as new content on transformers. Additionally, you will leverage popular libraries and their latest releases covering Pandas, Polars, Sktime, stats models, stats forecast, Darts, and Prophet for time series with new and relevant examples. You'll start by ingesting time series data from various sources and formats, and learn strategies for handling missing data, dealing with time zones and custom business days, and detecting anomalies using intuitive statistical methods. Further, you'll explore forecasting using classical statistical models (Holt-Winters, SARIMA, and VAR). Learn practical techniques for handling non-stationary data, using power transforms, ACF and PACF plots, and decomposing time series data with multiple seasonal patterns. Then we will move into more advanced topics such as building ML and DL models using TensorFlow and PyTorch, and explore probabilistic modeling techniques. In this part, you’ll also learn how to evaluate, compare, and optimize models, making sure that you finish this book well-versed in wrangling data with Python. What you will learn Understand what makes time series data different from other data Apply imputation and interpolation strategies to handle missing data Implement an array of models for univariate and multivariate time series Plot interactive time series visualizations using hvPlot Explore state-space models and the unobserved components model (UCM) Detect anomalies using statistical and machine learning methods Forecast complex time series with multiple seasonal patterns Use conformal prediction for constructing prediction intervals for time series Who this book is for This book is for data analysts, business analysts, data scientists, data engineers, and Python developers who want practical Python recipes for time series analysis and forecasting techniques. Fundamental knowledge of Python programming is a prerequisite. Prior experience working with time series data to solve business problems will also help you to better utilize and apply the different recipes in this book.

Getting Started with Taipy

Share your machine learning models, create chatbots, as well as build and deploy insightful dashboards speedily using Taipy with this hands-on book featuring real-world application examples from multiple industries Free with your book: DRM-free PDF version + access to Packt's next-gen Reader Key Features Create visually compelling, interactive data applications with Taipy Bring predictive models to end users and create data pipelines to compare scenarios with what-if analyses Go beyond prototypes to build and deploy production-ready applications using the cloud provider of your choice Purchase of the print or Kindle book includes a free PDF eBook in full color Book Description While data analysts, data scientists, and BI experts have the tools to analyze data, build models, and create compelling visuals, they often struggle to translate these insights into practical, user-friendly applications that help end users answer real-world questions, such as identifying revenue trends, predicting inventory needs, or detecting fraud, without wading through complex code. This book is a comprehensive guide to overcoming this challenge. This book teaches you how to use Taipy, a powerful open-source Python library, to build intuitive, production-ready data apps quickly and efficiently. Instead of creating prototypes that nobody uses, you'll learn how to build faster applications that process large amounts of data for multiple users and deliver measurable business impact. Taipy does the heavy lifting to enable your users to visualize their KPIs, interact with charts and maps, and compare scenarios for better decision-making. You’ll learn to use Taipy to build apps that make your data accessible and actionable in production environments like the cloud or Docker. By the end of this book, you won’t just understand Taipy, you'll be able to transform your data skills into impactful solutions that address real-world needs and deliver valuable insights. Email sign-up and proof of purchase required What you will learn Explore Taipy, its use cases, and how it's different from other projects Discover how to create visually appealing interactive apps, display KPIs, charts, and maps Understand how to compare scenarios to make better decisions Connect Taipy applications to several data sources and services Develop apps for diverse use cases, including chatbots, dashboards, ML apps, and maps Deploy Taipy applications on different types of servers and services Master advanced concepts for simplifying and accelerating your development workflow Who this book is for If you’re a data analyst, data scientist, or BI analyst looking to build production-ready data apps entirely in Python, this book is for you. If your scripts and models sit idle because non-technical stakeholders can’t use them, this book shows you how to turn them into full applications fast with Taipy, so your work delivers real business value. It’s also valuable for developers and engineers who want to streamline their data workflows and build UIs in pure Python.

Investing for Programmers

Maximize your portfolio, analyze markets, and make data-driven investment decisions using Python and generative AI. Investing for Programmers shows you how you can turn your existing skills as a programmer into a knack for making sharper investment choices. You’ll learn how to use the Python ecosystem, modern analytic methods, and cutting-edge AI tools to make better decisions and improve the odds of long-term financial success. In Investing for Programmers you’ll learn how to: Build stock analysis tools and predictive models Identify market-beating investment opportunities Design and evaluate algorithmic trading strategies Use AI to automate investment research Analyze market sentiments with media data mining In Investing for Programmers you'll learn the basics of financial investment as you conduct real market analysis, connect with trading APIs to automate buy-sell, and develop a systematic approach to risk management. Don’t worry—there’s no dodgy financial advice or flimsy get-rich-quick schemes. Real-life examples help you build your own intuition about financial markets, and make better decisions for retirement, financial independence, and getting more from your hard-earned money. About the Technology A programmer has a unique edge when it comes to investing. Using open-source Python libraries and AI tools, you can perform sophisticated analysis normally reserved for expensive financial professionals. This book guides you step-by-step through building your own stock analysis tools, forecasting models, and more so you can make smart, data-driven investment decisions. About the Book Investing for Programmers shows you how to analyze investment opportunities using Python and machine learning. In this easy-to-read handbook, experienced algorithmic investor Stefan Papp shows you how to use Pandas, NumPy, and Matplotlib to dissect stock market data, uncover patterns, and build your own trading models. You’ll also discover how to use AI agents and LLMs to enhance your financial research and decision-making process. What's Inside Build stock analysis tools and predictive models Design algorithmic trading strategies Use AI to automate investment research Analyze market sentiment with media data mining About the Reader For professional and hobbyist Python programmers with basic personal finance experience. About the Author Stefan Papp combines 20 years of investment experience in stocks, cryptocurrency, and bonds with decades of work as a data engineer, architect, and software consultant. Quotes Especially valuable for anyone looking to improve their investing. - Armen Kherlopian, Covenant Venture Capital A great breadth of topics—from basic finance concepts to cutting-edge technology. - Ilya Kipnis, Quantstrat Trader A top tip for people who want to leverage development skills to improve their investment possibilities. - Michael Zambiasi, Raiffeisen Digital Bank Brilliantly bridges the worlds of coding and finance. - Thomas Wiecki, PyMC Labs

Data Without Labels

Discover all-practical implementations of the key algorithms and models for handling unlabeled data. Full of case studies demonstrating how to apply each technique to real-world problems. In Data Without Labels you’ll learn: Fundamental building blocks and concepts of machine learning and unsupervised learning Data cleaning for structured and unstructured data like text and images Clustering algorithms like K-means, hierarchical clustering, DBSCAN, Gaussian Mixture Models, and Spectral clustering Dimensionality reduction methods like Principal Component Analysis (PCA), SVD, Multidimensional scaling, and t-SNE Association rule algorithms like aPriori, ECLAT, SPADE Unsupervised time series clustering, Gaussian Mixture models, and statistical methods Building neural networks such as GANs and autoencoders Dimensionality reduction methods like Principal Component Analysis and multidimensional scaling Association rule algorithms like aPriori, ECLAT, and SPADE Working with Python tools and libraries like sci-kit learn, numpy, Pandas, matplotlib, Seaborn, Keras, TensorFlow, and Flask How to interpret the results of unsupervised learning Choosing the right algorithm for your problem Deploying unsupervised learning to production Maintenance and refresh of an ML solution Data Without Labels introduces mathematical techniques, key algorithms, and Python implementations that will help you build machine learning models for unannotated data. You’ll discover hands-off and unsupervised machine learning approaches that can still untangle raw, real-world datasets and support sound strategic decisions for your business. Don’t get bogged down in theory—the book bridges the gap between complex math and practical Python implementations, covering end-to-end model development all the way through to production deployment. You’ll discover the business use cases for machine learning and unsupervised learning, and access insightful research papers to complete your knowledge. About the Technology Generative AI, predictive algorithms, fraud detection, and many other analysis tasks rely on cheap and plentiful unlabeled data. Machine learning on data without labels—or unsupervised learning—turns raw text, images, and numbers into insights about your customers, accurate computer vision, and high-quality datasets for training AI models. This book will show you how. About the Book Data Without Labels is a comprehensive guide to unsupervised learning, offering a deep dive into its mathematical foundations, algorithms, and practical applications. It presents practical examples from retail, aviation, and banking using fully annotated Python code. You’ll explore core techniques like clustering and dimensionality reduction along with advanced topics like autoencoders and GANs. As you go, you’ll learn where to apply unsupervised learning in business applications and discover how to develop your own machine learning models end-to-end. What's Inside Master unsupervised learning algorithms Real-world business applications Curate AI training datasets Explore autoencoders and GANs applications About the Reader Intended for data science professionals. Assumes knowledge of Python and basic machine learning. About the Author Vaibhav Verdhan is a seasoned data science professional with extensive experience working on data science projects in a large pharmaceutical company. Quotes An invaluable resource for anyone navigating the complexities of unsupervised learning. A must-have. - Ganna Pogrebna, The Alan Turing Institute Empowers the reader to unlock the hidden potential within their data. - Sonny Shergill, Astra Zeneca A must-have for teams working with unstructured data. Cuts through the fog of theory ili Explains the theory and delivers practical solutions. - Leonardo Gomes da Silva, onGRID Sports Technology The Bible for unsupervised learning! Full of real-world applications, clear explanations, and excellent Python implementations. - Gary Bake, Falconhurst Technologies

Think Stats, 3rd Edition

If you know how to program, you have the skills to turn data into knowledge. This thoroughly revised edition presents statistical concepts computationally, rather than mathematically, using programs written in Python. Through practical examples and exercises based on real-world datasets, you'll learn the entire process of exploratory data analysis—from wrangling data and generating statistics to identifying patterns and testing hypotheses. Whether you're a data scientist, software engineer, or data enthusiast, you'll get up to speed on commonly used tools including NumPy, SciPy, and Pandas. You'll explore distributions, relationships between variables, visualization, and many other concepts. And all chapters are available as Jupyter notebooks, so you can read the text, run the code, and work on exercises all in one place. Analyze data distributions and visualize patterns using Python libraries Improve predictions and insights with regression models Dive into specialized topics like time series analysis and survival analysis Integrate statistical techniques and tools for validation, inference, and more Communicate findings with effective data visualization Troubleshoot common data analysis challenges Boost reproducibility and collaboration in data analysis projects with interactive notebooks

DuckDB: Up and Running

DuckDB, an open source in-process database created for OLAP workloads, provides key advantages over more mainstream OLAP solutions: It's embeddable and optimized for analytics. It also integrates well with Python and is compatible with SQL, giving you the performance and flexibility of SQL right within your Python environment. This handy guide shows you how to get started with this versatile and powerful tool. Author Wei-Meng Lee takes developers and data professionals through DuckDB's primary features and functions, best practices, and practical examples of how you can use DuckDB for a variety of data analytics tasks. You'll also dive into specific topics, including how to import data into DuckDB, work with tables, perform exploratory data analysis, visualize data, perform spatial analysis, and use DuckDB with JSON files, Polars, and JupySQL. Understand the purpose of DuckDB and its main functions Conduct data analytics tasks using DuckDB Integrate DuckDB with pandas, Polars, and JupySQL Use DuckDB to query your data Perform spatial analytics using DuckDB's spatial extension Work with a diverse range of data including Parquet, CSV, and JSON

Pandas Cookbook - Third Edition

Discover the power of pandas for your data analysis tasks. Pandas Cookbook provides practical, hands-on recipes for mastering pandas 2.x, guiding you through real-world scenarios quickly and effectively. What this Book will help me do Efficiently manipulate and clean data using pandas. Perform advanced grouping and aggregation operations. Handle time series data with pandas robust functions. Optimize pandas code for better performance. Integrate pandas with tools like NumPy and databases. Author(s) William Ayd and Matthew Harrison co-authored this insightful cookbook. With years of practical experience in data science and Python development, both authors aim to make data analysis accessible and efficient using pandas. Who is it for? This book is perfect for Python developers and data analysts looking to enhance their data manipulation skills. Whether you're a beginner aiming to understand pandas or a professional seeking advanced insights, this book is tailored for anyone handling structured data.

Numerical Python: Scientific Computing and Data Science Applications with Numpy, SciPy and Matplotlib

Learn how to leverage the scientific computing and data analysis capabilities of Python, its standard library, and popular open-source numerical Python packages like NumPy, SymPy, SciPy, matplotlib, and more. This book demonstrates how to work with mathematical modeling and solve problems with numerical, symbolic, and visualization techniques. It explores applications in science, engineering, data analytics, and more. Numerical Python, Third Edition, presents many case study examples of applications in fundamental scientific computing disciplines, as well as in data science and statistics. This fully revised edition, updated for each library's latest version, demonstrates Python's power for rapid development and exploratory computing due to its simple and high-level syntax and many powerful libraries and tools for computation and data analysis. After reading this book, readers will be familiar with many computing techniques, including array-based and symbolic computing, visualization and numerical file I/O, equation solving, optimization, interpolation and integration, and domain-specific computational problems, such as differential equation solving, data analysis, statistical modeling, and machine learning. What You'll Learn Work with vectors and matrices using NumPy Review Symbolic computing with SymPy Plot and visualize data with Matplotlib Perform data analysis tasks with Pandas and SciPy Understand statistical modeling and machine learning with statsmodels and scikit-learn Optimize Python code using Numba and Cython Who This Book Is For Developers who want to understand how to use Python and its ecosystem of libraries for scientific computing and data analysis.

Statistics for Data Science and Analytics

Introductory statistics textbook with a focus on data science topics such as prediction, correlation, and data exploration Statistics for Data Science and Analytics is a comprehensive guide to statistical analysis using Python, presenting important topics useful for data science such as prediction, correlation, and data exploration. The authors provide an introduction to statistical science and big data, as well as an overview of Python data structures and operations. A range of statistical techniques are presented with their implementation in Python, including hypothesis testing, probability, exploratory data analysis, categorical variables, surveys and sampling, A/B testing, and correlation. The text introduces binary classification, a foundational element of machine learning, validation of statistical models by applying them to holdout data, and probability and inference via the easy-to-understand method of resampling and the bootstrap instead of using a myriad of “kitchen sink” formulas. Regression is taught both as a tool for explanation and for prediction. This book is informed by the authors’ experience designing and teaching both introductory statistics and machine learning at Statistics.com. Each chapter includes practical examples, explanations of the underlying concepts, and Python code snippets to help readers apply the techniques themselves. Statistics for Data Science and Analytics includes information on sample topics such as: Int, float, and string data types, numerical operations, manipulating strings, converting data types, and advanced data structures like lists, dictionaries, and sets Experiment design via randomizing, blinding, and before-after pairing, as well as proportions and percents when handling binary data Specialized Python packages like numpy, scipy, pandas, scikit-learn and statsmodels—the workhorses of data science—and how to get the most value from them Statistical versus practical significance, random number generators, functions for code reuse, and binomial and normal probability distributions Written by and for data science instructors, Statistics for Data Science and Analytics is an excellent learning resource for data science instructors prescribing a required intro stats course for their programs, as well as other students and professionals seeking to transition to the data science field.

Polars Cookbook

Dive into the world of data analysis with the Polars Cookbook. This book, ideal for data professionals, covers practical recipes to manipulate, transform, and analyze data using the Python Polars library. You'll learn both the fundamentals and advanced techniques to build efficient and scalable data workflows. What this Book will help me do Master the basics of Python Polars including installation and setup. Perform complex data manipulation like pivoting, grouping, and joining. Handle large-scale time series data for accurate analysis. Understand data integration with libraries like pandas and numpy. Optimize workflows for both on-premise and cloud environments. Author(s) Yuki Kakegawa is an experienced data analytics consultant who has collaborated with companies such as Microsoft and Stanford Health Care. His passion for data led him to create this detailed guide on Polars. His expertise ensures you gain real-world, actionable insights from every chapter. Who is it for? This book is perfect for data analysts, engineers, and scientists eager to enhance their efficiency with Python Polars. If you are familiar with Python and tools like pandas but are new to Polars, this book will upskill you. Whether handling big data or optimizing code for performance, the Polars Cookbook has the guidance you need to succeed.

DuckDB in Action

Dive into DuckDB and start processing gigabytes of data with ease—all with no data warehouse. DuckDB is a cutting-edge SQL database that makes it incredibly easy to analyze big data sets right from your laptop. In DuckDB in Action you’ll learn everything you need to know to get the most out of this awesome tool, keep your data secure on prem, and save you hundreds on your cloud bill. From data ingestion to advanced data pipelines, you’ll learn everything you need to get the most out of DuckDB—all through hands-on examples. Open up DuckDB in Action and learn how to: Read and process data from CSV, JSON and Parquet sources both locally and remote Write analytical SQL queries, including aggregations, common table expressions, window functions, special types of joins, and pivot tables Use DuckDB from Python, both with SQL and its "Relational"-API, interacting with databases but also data frames Prepare, ingest and query large datasets Build cloud data pipelines Extend DuckDB with custom functionality Pragmatic and comprehensive, DuckDB in Action introduces the DuckDB database and shows you how to use it to solve common data workflow problems. You won’t need to read through pages of documentation—you’ll learn as you work. Get to grips with DuckDB's unique SQL dialect, learning to seamlessly load, prepare, and analyze data using SQL queries. Extend DuckDB with both Python and built-in tools such as MotherDuck, and gain practical insights into building robust and automated data pipelines. About the Technology DuckDB makes data analytics fast and fun! You don’t need to set up a Spark or run a cloud data warehouse just to process a few hundred gigabytes of data. DuckDB is easily embeddable in any data analytics application, runs on a laptop, and processes data from almost any source, including JSON, CSV, Parquet, SQLite and Postgres. About the Book DuckDB in Action guides you example-by-example from setup, through your first SQL query, to advanced topics like building data pipelines and embedding DuckDB as a local data store for a Streamlit web app. You’ll explore DuckDB’s handy SQL extensions, get to grips with aggregation, analysis, and data without persistence, and use Python to customize DuckDB. A hands-on project accompanies each new topic, so you can see DuckDB in action. What's Inside Prepare, ingest and query large datasets Build cloud data pipelines Extend DuckDB with custom functionality Fast-paced SQL recap: From simple queries to advanced analytics About the Reader For data pros comfortable with Python and CLI tools. About the Authors Mark Needham is a blogger and video creator at @‌LearnDataWithMark. Michael Hunger leads product innovation for the Neo4j graph database. Michael Simons is a Java Champion, author, and Engineer at Neo4j. Quotes I use DuckDB every day, and I still learned a lot about how DuckDB makes things that are hard in most databases easy! - Jordan Tigani, Founder, MotherDuck An excellent resource! Unlocks possibilities for storing, processing, analyzing, and summarizing data at the edge using DuckDB. - Pramod Sadalage, Director, Thoughtworks Clear and accessible. A comprehensive resource for harnessing the power of DuckDB for both novices and experienced professionals. - Qiusheng Wu, Associate Professor, University of Tennessee Excellent! The book all we ducklings have been waiting for! - Gunnar Morling, Decodable

Getting Started with DuckDB

Unlock the full potential of DuckDB with 'Getting Started with DuckDB,' your guide to mastering data analysis efficiently. By reading this book, you'll discover how to load, transform, and query data using DuckDB, leveraging its unique capabilities for processing large datasets. Gain hands-on experience with SQL, Python, and R to enhance your data science and engineering workflows. What this Book will help me do Effectively load and manage various types of data in DuckDB for seamless processing. Gain hands-on experience writing and optimizing SQL queries tailored for analytical tasks. Integrate DuckDB capabilities into Python and R workflows for streamlined data analysis. Understand DuckDB's optimizations and extensions for specialized data applications. Explore the broader ecosystem of data tools that complement DuckDB's capabilities. Author(s) Simon Aubury and Ned Letcher are seasoned experts in the field of data analytics and engineering. With extensive experience in using both SQL and programming languages like Python and R, they bring practical insights into the innovative uses of DuckDB. They have designed this book to provide a hands-on and approachable way to learn DuckDB, making complex concepts accessible. Who is it for? This book is well-suited for data analysts aiming to accelerate their data analysis workflows, data engineers looking for effective tools for data processing, and data scientists searching for a versatile library for scalable data manipulation. Prior exposure to SQL and programming in Python or R will be beneficial for readers to maximize their learning.

Modern Graph Theory Algorithms with Python

Dive into the fascinating world of graph theory and its applications with 'Modern Graph Theory Algorithms with Python.' Through Python programming and real-world case studies, this book equips you with the tools to transform data into graph structures, apply algorithms, and uncover insights, enabling effective solutions in diverse domains such as finance, epidemiology, and social networks. What this Book will help me do Understand how to wrangle a variety of data types into network formats suitable for analysis. Learn to use graph theory algorithms and toolkits such as NetworkX and igraph in Python. Apply network theory to predict and analyze trends, from epidemics to stock market dynamics. Explore the intersection of machine learning and graph theory through advanced neural network techniques. Gain expertise in database solutions with graph database querying and applications. Author(s) Colleen M. Farrelly, an experienced data scientist, and Franck Kalala Mutombo, a seasoned software engineer, bring years of expertise in network science and Python programming to every page of this book. Their professional experience includes working on cutting-edge problems in data analytics, graph theory, and scalable solutions for real-world issues. Combining their practical know-how, they deliver a resource aimed at both learning and applying techniques effectively. Who is it for? This book is tailored for data scientists, researchers, and analysts with an interest in using graph-based approaches for solving complex data problems. Ideal for those with a basic Python knowledge and familiarity with libraries like pandas and NumPy, the content bridges the gap between theory and application. It also provides insights into broad fields where network science can be impactful, contributing value to both students and professionals.

Pandas Workout

Practice makes perfect pandas! Work out your pandas skills against dozens of real-world challenges, each carefully designed to build an intuitive knowledge of essential pandas tasks. In Pandas Workout you’ll learn how to: Clean your data for accurate analysis Work with rows and columns for retrieving and assigning data Handle indexes, including hierarchical indexes Read and write data with a number of common formats, such as CSV and JSON Process and manipulate textual data from within pandas Work with dates and times in pandas Perform aggregate calculations on selected subsets of data Produce attractive and useful visualizations that make your data come alive Pandas Workout hones your pandas skills to a professional-level through two hundred exercises, each designed to strengthen your pandas skills. You’ll test your abilities against common pandas challenges such as importing and exporting, data cleaning, visualization, and performance optimization. Each exercise utilizes a real-world scenario based on real-world data, from tracking the parking tickets in New York City, to working out which country makes the best wines. You’ll soon find your pandas skills becoming second nature—no more trips to StackOverflow for what is now a natural part of your skillset. About the Technology Python’s pandas library can massively reduce the time you spend analyzing, cleaning, exploring, and manipulating data. And the only path to pandas mastery is practice, practice, and, you guessed it, more practice. In this book, Python guru Reuven Lerner is your personal trainer and guide through over 200 exercises guaranteed to boost your pandas skills. About the Book Pandas Workout is a thoughtful collection of practice problems, challenges, and mini-projects designed to build your data analysis skills using Python and pandas. The workouts use realistic data from many sources: the New York taxi fleet, Olympic athletes, SAT scores, oil prices, and more. Each can be completed in ten minutes or less. You’ll explore pandas’ rich functionality for string and date/time handling, complex indexing, and visualization, along with practical tips for every stage of a data analysis project. What's Inside Clean data with less manual labor Retrieving and assigning data Process and manipulate text Calculations on selected data subsets About the Reader For Python programmers and data analysts. About the Author Reuven M. Lerner teaches Python and data science around the world and publishes the “Bamboo Weekly” newsletter. He is the author of Manning’s Python Workout (2020). Quotes A carefully crafted tour through the pandas library, jam-packed with wisdom that will help you become a better pandas user and a better data scientist. - Kevin Markham, Founder of Data School, Creator of pandas in 30 days Will help you apply pandas to real problems and push you to the next level. - Michael Driscoll, RFA Engineering, creator of Teach Me Python The explanations, paired with Reuven’s storytelling and personal tone, make the concepts simple. I’ll never get them wrong again! - Rodrigo Girão Serrão, Python developer and educator The definitive source! - Kiran Anantha, Amazon

Distributed Machine Learning with PySpark: Migrating Effortlessly from Pandas and Scikit-Learn

Migrate from pandas and scikit-learn to PySpark to handle vast amounts of data and achieve faster data processing time. This book will show you how to make this transition by adapting your skills and leveraging the similarities in syntax, functionality, and interoperability between these tools. Distributed Machine Learning with PySpark offers a roadmap to data scientists considering transitioning from small data libraries (pandas/scikit-learn) to big data processing and machine learning with PySpark. You will learn to translate Python code from pandas/scikit-learn to PySpark to preprocess large volumes of data and build, train, test, and evaluate popular machine learning algorithms such as linear and logistic regression, decision trees, random forests, support vector machines, Naïve Bayes, and neural networks. After completing this book, you will understand the foundational concepts of data preparation and machine learning and will have the skills necessary toapply these methods using PySpark, the industry standard for building scalable ML data pipelines. What You Will Learn Master the fundamentals of supervised learning, unsupervised learning, NLP, and recommender systems Understand the differences between PySpark, scikit-learn, and pandas Perform linear regression, logistic regression, and decision tree regression with pandas, scikit-learn, and PySpark Distinguish between the pipelines of PySpark and scikit-learn Who This Book Is For Data scientists, data engineers, and machine learning practitioners who have some familiarity with Python, but who are new to distributed machine learning and the PySpark framework.

Hands-On Web Scraping with Python - Second Edition

In "Hands-On Web Scraping with Python," you'll learn how to harness the power of Python libraries to extract, process, and analyze data from the web. This book provides a practical, step-by-step guide for beginners and data enthusiasts alike. What this Book will help me do Master the use of Python libraries like requests, lxml, Scrapy, and Beautiful Soup for web scraping. Develop advanced techniques for secure browsing and data extraction using APIs and Selenium. Understand the principles behind regex and PDF data parsing for comprehensive scraping. Analyze and visualize data using data science tools such as Pandas and Plotly. Build a portfolio of real-world scraping projects to demonstrate your capabilities. Author(s) Anish Chapagain, the author of "Hands-On Web Scraping with Python," is an experienced programmer and instructor who specializes in Python and data-related technologies. With his vast experience in teaching individuals from diverse backgrounds, Anish approaches complex concepts with clarity and a hands-on methodology. Who is it for? This book is perfect for aspiring data scientists, Python beginners, and anyone who wants to delve into web scraping. Readers should have a basic understanding of how websites work but no prior coding experience is required. If you aim to develop scraping skills and understand data analysis, this book is the ideal starting point.

Streamlit for Data Science - Second Edition

Streamlit for Data Science is your complete guide to mastering the creation of powerful, interactive data-driven applications using Python and Streamlit. With this comprehensive resource, you'll learn everything from foundational Streamlit skills to advanced techniques like integrating machine learning models and deploying apps to cloud platforms, enabling you to significantly enhance your data science toolkit. What this Book will help me do Master building interactive applications using Streamlit, including techniques for user interfaces and integrations. Develop visually appealing and functional data visualizations using Python libraries in Streamlit. Learn to integrate Streamlit applications with machine learning frameworks and tools like Hugging Face and OpenAI. Understand and apply best practices to deploy Streamlit apps to cloud platforms such as Streamlit Community Cloud and Heroku. Improve practical Python skills through implementing end-to-end data applications and prototyping data workflows. Author(s) Tyler Richards, the author of Streamlit for Data Science, is a senior data scientist with in-depth practical experience in building data-driven applications. With a passion for Python and data visualization, Tyler leverages his knowledge to help data professionals craft effective and compelling tools. His teaching approach combines clarity, hands-on exercises, and practical relevance. Who is it for? This book is written for data scientists, engineers, and enthusiasts who use Python and want to create dynamic data-driven applications. With a focus on those who have some familiarity with Python and libraries like Pandas or NumPy, it assists readers in building on their knowledge by offering tailored guidance. Perfect for those looking to prototype data projects or enhance their programming toolkit.

Learning Data Science

As an aspiring data scientist, you appreciate why organizations rely on data for important decisions—whether it's for companies designing websites, cities deciding how to improve services, or scientists discovering how to stop the spread of disease. And you want the skills required to distill a messy pile of data into actionable insights. We call this the data science lifecycle: the process of collecting, wrangling, analyzing, and drawing conclusions from data. Learning Data Science is the first book to cover foundational skills in both programming and statistics that encompass this entire lifecycle. It's aimed at those who wish to become data scientists or who already work with data scientists, and at data analysts who wish to cross the "technical/nontechnical" divide. If you have a basic knowledge of Python programming, you'll learn how to work with data using industry-standard tools like pandas. Refine a question of interest to one that can be studied with data Pursue data collection that may involve text processing, web scraping, etc. Glean valuable insights about data through data cleaning, exploration, and visualization Learn how to use modeling to describe the data Generalize findings beyond the data

Python Data Analytics: With Pandas, NumPy, and Matplotlib

Explore the latest Python tools and techniques to help you tackle the world of data acquisition and analysis. You'll review scientific computing with NumPy, visualization with matplotlib, and machine learning with scikit-learn. This third edition is fully updated for the latest version of Python and its related libraries, and includes coverage of social media data analysis, image analysis with OpenCV, and deep learning libraries. Each chapter includes multiple examples demonstrating how to work with each library. At its heart lies the coverage of pandas, for high-performance, easy-to-use data structures and tools for data manipulation Author Fabio Nelli expertly demonstrates using Python for data processing, management, and information retrieval. Later chapters apply what you've learned to handwriting recognition and extending graphical capabilities with the JavaScript D3 library. Whether you are dealing with sales data, investment data, medical data, web page usage, or other data sets, Python Data Analytics, Third Edition is an invaluable reference with its examples of storing, accessing, and analyzing data. What You'll Learn Understand the core concepts of data analysis and the Python ecosystem Go in depth with pandas for reading, writing, and processing data Use tools and techniques for data visualization and image analysis Examine popular deep learning libraries Keras, Theano,TensorFlow, and PyTorch Who This Book Is For Experienced Python developers who need to learn about Pythonic tools for data analysis

Scaling Python with Dask

Modern systems contain multi-core CPUs and GPUs that have the potential for parallel computing. But many scientific Python tools were not designed to leverage this parallelism. With this short but thorough resource, data scientists and Python programmers will learn how the Dask open source library for parallel computing provides APIs that make it easy to parallelize PyData libraries including NumPy, pandas, and scikit-learn. Authors Holden Karau and Mika Kimmins show you how to use Dask computations in local systems and then scale to the cloud for heavier workloads. This practical book explains why Dask is popular among industry experts and academics and is used by organizations that include Walmart, Capital One, Harvard Medical School, and NASA. With this book, you'll learn: What Dask is, where you can use it, and how it compares with other tools How to use Dask for batch data parallel processing Key distributed system concepts for working with Dask Methods for using Dask with higher-level APIs and building blocks How to work with integrated libraries such as scikit-learn, pandas, and PyTorch How to use Dask with GPUs