NumPy

Investing for Programmers

2025-09-29 · O'Reilly Data Science Books O'Reilly Amazon

book

by Stefan Papp

AI/ML API GenAI LLM Matplotlib Pandas Python data data-science data-science-tools

Maximize your portfolio, analyze markets, and make data-driven investment decisions using Python and generative AI. Investing for Programmers shows you how you can turn your existing skills as a programmer into a knack for making sharper investment choices. You’ll learn how to use the Python ecosystem, modern analytic methods, and cutting-edge AI tools to make better decisions and improve the odds of long-term financial success. In Investing for Programmers you’ll learn how to: Build stock analysis tools and predictive models Identify market-beating investment opportunities Design and evaluate algorithmic trading strategies Use AI to automate investment research Analyze market sentiments with media data mining In Investing for Programmers you'll learn the basics of financial investment as you conduct real market analysis, connect with trading APIs to automate buy-sell, and develop a systematic approach to risk management. Don’t worry—there’s no dodgy financial advice or flimsy get-rich-quick schemes. Real-life examples help you build your own intuition about financial markets, and make better decisions for retirement, financial independence, and getting more from your hard-earned money. About the Technology A programmer has a unique edge when it comes to investing. Using open-source Python libraries and AI tools, you can perform sophisticated analysis normally reserved for expensive financial professionals. This book guides you step-by-step through building your own stock analysis tools, forecasting models, and more so you can make smart, data-driven investment decisions. About the Book Investing for Programmers shows you how to analyze investment opportunities using Python and machine learning. In this easy-to-read handbook, experienced algorithmic investor Stefan Papp shows you how to use Pandas, NumPy, and Matplotlib to dissect stock market data, uncover patterns, and build your own trading models. 
You’ll also discover how to use AI agents and LLMs to enhance your financial research and decision-making process. What's Inside Build stock analysis tools and predictive models Design algorithmic trading strategies Use AI to automate investment research Analyze market sentiment with media data mining About the Reader For professional and hobbyist Python programmers with basic personal finance experience. About the Author Stefan Papp combines 20 years of investment experience in stocks, cryptocurrency, and bonds with decades of work as a data engineer, architect, and software consultant. Quotes Especially valuable for anyone looking to improve their investing. - Armen Kherlopian, Covenant Venture Capital A great breadth of topics—from basic finance concepts to cutting-edge technology. - Ilya Kipnis, Quantstrat Trader A top tip for people who want to leverage development skills to improve their investment possibilities. - Michael Zambiasi, Raiffeisen Digital Bank Brilliantly bridges the worlds of coding and finance. - Thomas Wiecki, PyMC Labs

Data Without Labels

2025-05-26 · O'Reilly Data Science Books O'Reilly Amazon

book

by Vaibhav Verdhan

AI/ML Data Science GenAI Keras Matplotlib Pandas Python Seaborn TensorFlow data data-science data-science-tools

Discover all-practical implementations of the key algorithms and models for handling unlabeled data. Full of case studies demonstrating how to apply each technique to real-world problems. In Data Without Labels you’ll learn: Fundamental building blocks and concepts of machine learning and unsupervised learning Data cleaning for structured and unstructured data like text and images Clustering algorithms like K-means, hierarchical clustering, DBSCAN, Gaussian Mixture Models, and Spectral clustering Dimensionality reduction methods like Principal Component Analysis (PCA), SVD, Multidimensional scaling, and t-SNE Association rule algorithms like Apriori, ECLAT, and SPADE Unsupervised time series clustering, Gaussian Mixture models, and statistical methods Building neural networks such as GANs and autoencoders Working with Python tools and libraries like scikit-learn, NumPy, Pandas, Matplotlib, Seaborn, Keras, TensorFlow, and Flask How to interpret the results of unsupervised learning Choosing the right algorithm for your problem Deploying unsupervised learning to production Maintenance and refresh of an ML solution Data Without Labels introduces mathematical techniques, key algorithms, and Python implementations that will help you build machine learning models for unannotated data. You’ll discover hands-off and unsupervised machine learning approaches that can still untangle raw, real-world datasets and support sound strategic decisions for your business. Don’t get bogged down in theory: the book bridges the gap between complex math and practical Python implementations, covering end-to-end model development all the way through to production deployment. You’ll discover the business use cases for machine learning and unsupervised learning, and access insightful research papers to complete your knowledge. 
About the Technology Generative AI, predictive algorithms, fraud detection, and many other analysis tasks rely on cheap and plentiful unlabeled data. Machine learning on data without labels, or unsupervised learning, turns raw text, images, and numbers into insights about your customers, accurate computer vision, and high-quality datasets for training AI models. This book will show you how. About the Book Data Without Labels is a comprehensive guide to unsupervised learning, offering a deep dive into its mathematical foundations, algorithms, and practical applications. It presents practical examples from retail, aviation, and banking using fully annotated Python code. You’ll explore core techniques like clustering and dimensionality reduction along with advanced topics like autoencoders and GANs. As you go, you’ll learn where to apply unsupervised learning in business applications and discover how to develop your own machine learning models end-to-end. What's Inside Master unsupervised learning algorithms Real-world business applications Curate AI training datasets Explore autoencoders and GANs applications About the Reader Intended for data science professionals. Assumes knowledge of Python and basic machine learning. About the Author Vaibhav Verdhan is a seasoned data science professional with extensive experience working on data science projects in a large pharmaceutical company. Quotes An invaluable resource for anyone navigating the complexities of unsupervised learning. A must-have. - Ganna Pogrebna, The Alan Turing Institute Empowers the reader to unlock the hidden potential within their data. - Sonny Shergill, AstraZeneca A must-have for teams working with unstructured data. Cuts through the fog of theory and delivers practical solutions. - Leonardo Gomes da Silva, onGRID Sports Technology The Bible for unsupervised learning! Full of real-world applications, clear explanations, and excellent Python implementations. - Gary Bake, Falconhurst Technologies

Think Stats, 3rd Edition

2025-04-11 · O'Reilly Data Science Books O'Reilly Amazon

book

by Allen B. Downey

DataViz Pandas Python SciPy data data-science data-science-tools

If you know how to program, you have the skills to turn data into knowledge. This thoroughly revised edition presents statistical concepts computationally, rather than mathematically, using programs written in Python. Through practical examples and exercises based on real-world datasets, you'll learn the entire process of exploratory data analysis—from wrangling data and generating statistics to identifying patterns and testing hypotheses. Whether you're a data scientist, software engineer, or data enthusiast, you'll get up to speed on commonly used tools including NumPy, SciPy, and Pandas. You'll explore distributions, relationships between variables, visualization, and many other concepts. And all chapters are available as Jupyter notebooks, so you can read the text, run the code, and work on exercises all in one place. Analyze data distributions and visualize patterns using Python libraries Improve predictions and insights with regression models Dive into specialized topics like time series analysis and survival analysis Integrate statistical techniques and tools for validation, inference, and more Communicate findings with effective data visualization Troubleshoot common data analysis challenges Boost reproducibility and collaboration in data analysis projects with interactive notebooks

Pandas Cookbook - Third Edition

2024-10-31 · O'Reilly Data Science Books O'Reilly Amazon

book

by William Ayd, Matthew Harrison

Data Science Pandas Python data data-science data-science-tools

Discover the power of pandas for your data analysis tasks. Pandas Cookbook provides practical, hands-on recipes for mastering pandas 2.x, guiding you through real-world scenarios quickly and effectively. What this Book will help me do Efficiently manipulate and clean data using pandas. Perform advanced grouping and aggregation operations. Handle time series data with pandas' robust functions. Optimize pandas code for better performance. Integrate pandas with tools like NumPy and databases. Author(s) William Ayd and Matthew Harrison co-authored this insightful cookbook. With years of practical experience in data science and Python development, both authors aim to make data analysis accessible and efficient using pandas. Who is it for? This book is perfect for Python developers and data analysts looking to enhance their data manipulation skills. Whether you're a beginner aiming to understand pandas or a professional seeking advanced insights, this book is tailored for anyone handling structured data.

Numerical Python: Scientific Computing and Data Science Applications with Numpy, SciPy and Matplotlib

2024-09-27 · O'Reilly Data Science Books O'Reilly Amazon

book

by Robert Johansson

AI/ML Analytics Data Analytics Data Science Matplotlib Pandas Python Scikit-learn SciPy data data-science data-science-tools

Learn how to leverage the scientific computing and data analysis capabilities of Python, its standard library, and popular open-source numerical Python packages like NumPy, SymPy, SciPy, matplotlib, and more. This book demonstrates how to work with mathematical modeling and solve problems with numerical, symbolic, and visualization techniques. It explores applications in science, engineering, data analytics, and more. Numerical Python, Third Edition, presents many case study examples of applications in fundamental scientific computing disciplines, as well as in data science and statistics. This fully revised edition, updated for each library's latest version, demonstrates Python's power for rapid development and exploratory computing due to its simple and high-level syntax and many powerful libraries and tools for computation and data analysis. After reading this book, readers will be familiar with many computing techniques, including array-based and symbolic computing, visualization and numerical file I/O, equation solving, optimization, interpolation and integration, and domain-specific computational problems, such as differential equation solving, data analysis, statistical modeling, and machine learning. What You'll Learn Work with vectors and matrices using NumPy Review Symbolic computing with SymPy Plot and visualize data with Matplotlib Perform data analysis tasks with Pandas and SciPy Understand statistical modeling and machine learning with statsmodels and scikit-learn Optimize Python code using Numba and Cython Who This Book Is For Developers who want to understand how to use Python and its ecosystem of libraries for scientific computing and data analysis.

Statistics for Data Science and Analytics

2024-09-04 · O'Reilly Data Science Books O'Reilly Amazon

book

by Janet Dobbins, Peter C. Bruce, Peter Gedeck

AI/ML Analytics Big Data Data Science Pandas Python Scikit-learn SciPy data data-science data-science-tasks statistics

Introductory statistics textbook with a focus on data science topics such as prediction, correlation, and data exploration Statistics for Data Science and Analytics is a comprehensive guide to statistical analysis using Python, presenting important topics useful for data science such as prediction, correlation, and data exploration. The authors provide an introduction to statistical science and big data, as well as an overview of Python data structures and operations. A range of statistical techniques are presented with their implementation in Python, including hypothesis testing, probability, exploratory data analysis, categorical variables, surveys and sampling, A/B testing, and correlation. The text introduces binary classification, a foundational element of machine learning, validation of statistical models by applying them to holdout data, and probability and inference via the easy-to-understand method of resampling and the bootstrap instead of using a myriad of “kitchen sink” formulas. Regression is taught both as a tool for explanation and for prediction. This book is informed by the authors’ experience designing and teaching both introductory statistics and machine learning at Statistics.com. Each chapter includes practical examples, explanations of the underlying concepts, and Python code snippets to help readers apply the techniques themselves. 
Statistics for Data Science and Analytics includes information on sample topics such as: Int, float, and string data types, numerical operations, manipulating strings, converting data types, and advanced data structures like lists, dictionaries, and sets Experiment design via randomizing, blinding, and before-after pairing, as well as proportions and percents when handling binary data Specialized Python packages like numpy, scipy, pandas, scikit-learn and statsmodels—the workhorses of data science—and how to get the most value from them Statistical versus practical significance, random number generators, functions for code reuse, and binomial and normal probability distributions Written by and for data science instructors, Statistics for Data Science and Analytics is an excellent learning resource for data science instructors prescribing a required intro stats course for their programs, as well as other students and professionals seeking to transition to the data science field.

Polars Cookbook

2024-08-23 · O'Reilly Data Science Books O'Reilly Amazon

book

by Yuki Kakegawa

Analytics Big Data Cloud Computing Data Analytics Microsoft Pandas Polars Python data data-science data-science-tools

Dive into the world of data analysis with the Polars Cookbook. This book, ideal for data professionals, covers practical recipes to manipulate, transform, and analyze data using the Python Polars library. You'll learn both the fundamentals and advanced techniques to build efficient and scalable data workflows. What this Book will help me do Master the basics of Python Polars including installation and setup. Perform complex data manipulation like pivoting, grouping, and joining. Handle large-scale time series data for accurate analysis. Understand data integration with libraries like pandas and numpy. Optimize workflows for both on-premise and cloud environments. Author(s) Yuki Kakegawa is an experienced data analytics consultant who has collaborated with companies such as Microsoft and Stanford Health Care. His passion for data led him to create this detailed guide on Polars. His expertise ensures you gain real-world, actionable insights from every chapter. Who is it for? This book is perfect for data analysts, engineers, and scientists eager to enhance their efficiency with Python Polars. If you are familiar with Python and tools like pandas but are new to Polars, this book will upskill you. Whether handling big data or optimizing code for performance, the Polars Cookbook has the guidance you need to succeed.

Modern Graph Theory Algorithms with Python

2024-06-07 · O'Reilly Data Science Books O'Reilly Amazon

book

by Colleen M. Farrelly, Franck Kalala Mutombo

AI/ML Analytics Data Analytics Pandas Python data data-science data-science-tasks graph-analytics

Dive into the fascinating world of graph theory and its applications with 'Modern Graph Theory Algorithms with Python.' Through Python programming and real-world case studies, this book equips you with the tools to transform data into graph structures, apply algorithms, and uncover insights, enabling effective solutions in diverse domains such as finance, epidemiology, and social networks. What this Book will help me do Understand how to wrangle a variety of data types into network formats suitable for analysis. Learn to use graph theory algorithms and toolkits such as NetworkX and igraph in Python. Apply network theory to predict and analyze trends, from epidemics to stock market dynamics. Explore the intersection of machine learning and graph theory through advanced neural network techniques. Gain expertise in database solutions with graph database querying and applications. Author(s) Colleen M. Farrelly, an experienced data scientist, and Franck Kalala Mutombo, a seasoned software engineer, bring years of expertise in network science and Python programming to every page of this book. Their professional experience includes working on cutting-edge problems in data analytics, graph theory, and scalable solutions for real-world issues. Combining their practical know-how, they deliver a resource aimed at both learning and applying techniques effectively. Who is it for? This book is tailored for data scientists, researchers, and analysts with an interest in using graph-based approaches for solving complex data problems. Ideal for those with basic Python knowledge and familiarity with libraries like pandas and NumPy, the content bridges the gap between theory and application. It also provides insights into broad fields where network science can be impactful, contributing value to both students and professionals.

Streamlit for Data Science - Second Edition

2023-09-29 · O'Reilly Data Science Books O'Reilly Amazon

book

by Tyler Richards

AI/ML Cloud Computing Data Science DataViz LLM Pandas Python data data-science

Streamlit for Data Science is your complete guide to mastering the creation of powerful, interactive data-driven applications using Python and Streamlit. With this comprehensive resource, you'll learn everything from foundational Streamlit skills to advanced techniques like integrating machine learning models and deploying apps to cloud platforms, enabling you to significantly enhance your data science toolkit. What this Book will help me do Master building interactive applications using Streamlit, including techniques for user interfaces and integrations. Develop visually appealing and functional data visualizations using Python libraries in Streamlit. Learn to integrate Streamlit applications with machine learning frameworks and tools like Hugging Face and OpenAI. Understand and apply best practices to deploy Streamlit apps to cloud platforms such as Streamlit Community Cloud and Heroku. Improve practical Python skills through implementing end-to-end data applications and prototyping data workflows. Author(s) Tyler Richards, the author of Streamlit for Data Science, is a senior data scientist with in-depth practical experience in building data-driven applications. With a passion for Python and data visualization, Tyler leverages his knowledge to help data professionals craft effective and compelling tools. His teaching approach combines clarity, hands-on exercises, and practical relevance. Who is it for? This book is written for data scientists, engineers, and enthusiasts who use Python and want to create dynamic data-driven applications. With a focus on those who have some familiarity with Python and libraries like Pandas or NumPy, it assists readers in building on their knowledge by offering tailored guidance. Perfect for those looking to prototype data projects or enhance their programming toolkit.

Python Data Analytics: With Pandas, NumPy, and Matplotlib

2023-09-01 · O'Reilly Data Science Books O'Reilly Amazon

book

by Fabio Nelli

AI/ML Analytics Data Analytics DataViz JavaScript Keras Matplotlib Pandas Python PyTorch Scikit-learn TensorFlow +3 more

Explore the latest Python tools and techniques to help you tackle the world of data acquisition and analysis. You'll review scientific computing with NumPy, visualization with matplotlib, and machine learning with scikit-learn. This third edition is fully updated for the latest version of Python and its related libraries, and includes coverage of social media data analysis, image analysis with OpenCV, and deep learning libraries. Each chapter includes multiple examples demonstrating how to work with each library. At its heart lies the coverage of pandas, for high-performance, easy-to-use data structures and tools for data manipulation. Author Fabio Nelli expertly demonstrates using Python for data processing, management, and information retrieval. Later chapters apply what you've learned to handwriting recognition and extending graphical capabilities with the JavaScript D3 library. Whether you are dealing with sales data, investment data, medical data, web page usage, or other data sets, Python Data Analytics, Third Edition is an invaluable reference with its examples of storing, accessing, and analyzing data. What You'll Learn Understand the core concepts of data analysis and the Python ecosystem Go in depth with pandas for reading, writing, and processing data Use tools and techniques for data visualization and image analysis Examine popular deep learning libraries Keras, Theano, TensorFlow, and PyTorch Who This Book Is For Experienced Python developers who need to learn about Pythonic tools for data analysis

Scaling Python with Dask

2023-07-26 · O'Reilly Data Science Books O'Reilly Amazon

book

by Holden Karau (Fight Health Insurance), Mika Kimmins

API Cloud Computing Pandas Python PyTorch Scikit-learn dask data data-science data-science-tools

Modern systems contain multi-core CPUs and GPUs that have the potential for parallel computing. But many scientific Python tools were not designed to leverage this parallelism. With this short but thorough resource, data scientists and Python programmers will learn how the Dask open source library for parallel computing provides APIs that make it easy to parallelize PyData libraries including NumPy, pandas, and scikit-learn. Authors Holden Karau and Mika Kimmins show you how to use Dask computations in local systems and then scale to the cloud for heavier workloads. This practical book explains why Dask is popular among industry experts and academics and is used by organizations that include Walmart, Capital One, Harvard Medical School, and NASA. With this book, you'll learn: What Dask is, where you can use it, and how it compares with other tools How to use Dask for batch data parallel processing Key distributed system concepts for working with Dask Methods for using Dask with higher-level APIs and building blocks How to work with integrated libraries such as scikit-learn, pandas, and PyTorch How to use Dask with GPUs

Practical Business Analytics Using R and Python: Solve Business Problems Using a Data-driven Approach

2023-04-03 · O'Reilly Data Science Books O'Reilly Amazon

book

by Umesh R. Hodeghatta, Umesha Nayak

AI/ML Analytics Big Data Data Analytics NLP Pandas Python SQL data data-science data-science-tools r

This book illustrates how data can be useful in solving business problems. It explores various analytics techniques for using data to discover hidden patterns and relationships, predict future outcomes, optimize efficiency and improve the performance of organizations. You’ll learn how to analyze data by applying concepts of statistics, probability theory, and linear algebra. In this new edition, both R and Python are used to demonstrate these analyses. Practical Business Analytics Using R and Python also features new chapters covering databases, SQL, neural networks, text analytics, and natural language processing. Part one begins with an introduction to analytics, the foundations required to perform data analytics, and explains different analytics terms and concepts such as databases and SQL, basic statistics, probability theory, and data exploration. Part two introduces predictive models using statistical machine learning and discusses concepts like regression, classification, and neural networks. Part three covers two of the most popular unsupervised learning techniques, clustering and association mining, as well as text mining and natural language processing (NLP). The book concludes with an overview of big data analytics, R and Python essentials for analytics including libraries such as pandas and NumPy. Upon completing this book, you will understand how to improve business outcomes by leveraging R and Python for data analytics. 
What You Will Learn Master the mathematical foundations required for business analytics Understand various analytics models and data mining techniques such as regression, supervised machine learning algorithms for modeling, unsupervised modeling techniques, and how to choose the correct algorithm for analysis in any given task Use R and Python to develop descriptive models, predictive models, and optimize models Interpret and recommend actions based on analytical model outcomes Who This Book Is For Software professionals and developers, managers, and executives who want to understand and learn the fundamentals of analytics using R and Python.

Experimentation for Engineers

2023-02-23 · O'Reilly Data Science Books O'Reilly Amazon

book

by David Sweet

AI/ML Data Science Python bayesian-statistics data data-science data-science-tasks statistics

Optimize the performance of your systems with practical experiments used by engineers in the world’s most competitive industries. In Experimentation for Engineers: From A/B testing to Bayesian optimization you will learn how to: Design, run, and analyze an A/B test Break the "feedback loops" caused by periodic retraining of ML models Increase experimentation rate with multi-armed bandits Tune multiple parameters experimentally with Bayesian optimization Clearly define business metrics used for decision-making Identify and avoid the common pitfalls of experimentation Experimentation for Engineers: From A/B testing to Bayesian optimization is a toolbox of techniques for evaluating new features and fine-tuning parameters. You’ll start with a deep dive into methods like A/B testing, and then graduate to advanced techniques used to measure performance in industries such as finance and social media. Learn how to evaluate the changes you make to your system and ensure that your testing doesn’t undermine revenue or other business metrics. By the time you’re done, you’ll be able to seamlessly deploy experiments in production while avoiding common pitfalls. About the Technology Does my software really work? Did my changes make things better or worse? Should I trade features for performance? Experimentation is the only way to answer questions like these. This unique book reveals sophisticated experimentation practices developed and proven in the world’s most competitive industries that will help you enhance machine learning systems, software applications, and quantitative trading solutions. About the Book Experimentation for Engineers: From A/B testing to Bayesian optimization delivers a toolbox of processes for optimizing software systems. You’ll start by learning the limits of A/B testing, and then graduate to advanced experimentation strategies that take advantage of machine learning and probabilistic methods. 
The skills you’ll master in this practical guide will help you minimize the costs of experimentation and quickly reveal which approaches and features deliver the best business results. What's Inside Design, run, and analyze an A/B test Break the “feedback loops” caused by periodic retraining of ML models Increase experimentation rate with multi-armed bandits Tune multiple parameters experimentally with Bayesian optimization About the Reader For ML and software engineers looking to extract the most value from their systems. Examples in Python and NumPy. About the Author David Sweet has worked as a quantitative trader at GETCO and a machine learning engineer at Instagram. He teaches in the AI and Data Science master's programs at Yeshiva University. Quotes Putting an ‘improved’ version of a system into production can be really risky. This book focuses you on what is important! - Simone Sguazza, University of Applied Sciences and Arts of Southern Switzerland A must-have for anyone setting up experiments, from A/B tests to contextual bandits and Bayesian optimization. - Maxim Volgin, KLM Shows a non-mathematical programmer exactly what they need to write powerful mathematically-based testing algorithms. - Patrick Goetz, The University of Texas at Austin Gives you the tools you need to get the most out of your experiments. - Marc-Anthony Taylor, Raiffeisen Bank International

Python Data Science Handbook, 2nd Edition

2022-12-08 · O'Reilly Data Science Books O'Reilly Amazon

book

by Jake VanderPlas

AI/ML Data Science Matplotlib Pandas Python Scikit-learn programming-languages software-development

Python is a first-class tool for many researchers, primarily because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the new edition of Python Data Science Handbook do you get them all—IPython, NumPy, pandas, Matplotlib, Scikit-Learn, and other related tools. Working scientists and data crunchers familiar with reading and writing Python code will find the second edition of this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python. With this handbook, you'll learn how: IPython and Jupyter provide computational environments for scientists using Python NumPy includes the ndarray for efficient storage and manipulation of dense data arrays Pandas contains the DataFrame for efficient storage and manipulation of labeled/columnar data Matplotlib includes capabilities for a flexible range of data visualizations Scikit-learn helps you build efficient and clean Python implementations of the most important and established machine learning algorithms

Effective Data Science Infrastructure

2022-08-09 · O'Reilly Data Science Books O'Reilly Amazon

book

by Ville Tuulos

AI/ML Analytics AWS Cloud Computing Data Science Docker MLOps Python data data-science

Simplify data science infrastructure to give data scientists an efficient path from prototype to production. In Effective Data Science Infrastructure you will learn how to: Design data science infrastructure that boosts productivity Handle compute and orchestration in the cloud Deploy machine learning to production Monitor and manage performance and results Combine cloud-based tools into a cohesive data science environment Develop reproducible data science projects using Metaflow, Conda, and Docker Architect complex applications for multiple teams and large datasets Customize and grow data science infrastructure Effective Data Science Infrastructure: How to make data scientists more productive is a hands-on guide to assembling infrastructure for data science and machine learning applications. It reveals the processes used at Netflix and other data-driven companies to manage their cutting edge data infrastructure. In it, you’ll master scalable techniques for data storage, computation, experiment tracking, and orchestration that are relevant to companies of all shapes and sizes. You’ll learn how you can make data scientists more productive with your existing cloud infrastructure, a stack of open source software, and idiomatic Python. The author is donating proceeds from this book to charities that support women and underrepresented groups in data science. About the Technology Growing data science projects from prototype to production requires reliable infrastructure. Using the powerful new techniques and tooling in this book, you can stand up an infrastructure stack that will scale with any organization, from startups to the largest enterprises. About the Book Effective Data Science Infrastructure teaches you to build data pipelines and project workflows that will supercharge data scientists and their projects. 
Based on state-of-the-art tools and concepts that power data operations of Netflix, this book introduces a customizable cloud-based approach to model development and MLOps that you can easily adapt to your company’s specific needs. As you roll out these practical processes, your teams will produce better and faster results when applying data science and machine learning to a wide array of business problems. What's Inside Handle compute and orchestration in the cloud Combine cloud-based tools into a cohesive data science environment Develop reproducible data science projects using Metaflow, AWS, and the Python data ecosystem Architect complex applications that require large datasets and models, and a team of data scientists About the Reader For infrastructure engineers and engineering-minded data scientists who are familiar with Python. About the Author At Netflix, Ville Tuulos designed and built Metaflow, a full-stack framework for data science. Currently, he is the CEO of a startup focusing on data science infrastructure. Quotes By reading and referring to this book, I’m confident you will learn how to make your machine learning operations much more efficient and productive. - From the Foreword by Travis Oliphant, Author of NumPy, Founder of Anaconda, PyData, and NumFOCUS Effective Data Science Infrastructure is a brilliant book. It’s a must-have for every data science team. - Ninoslav Cerkez, Logit More data science. Less headaches. - Dr. Abel Alejandro Coronado Iruegas, National Institute of Statistics and Geography of Mexico Indispensable. A copy should be on every data engineer’s bookshelf. - Matthew Copple, Grand River Analytics

Python for Data Science

2022-08-02 · O'Reilly Data Science Books O'Reilly Amazon

book

by Yuli Vasiliev

AI/ML Data Science Marketing Matplotlib Pandas Python Scikit-learn programming-languages software-development

Python is an ideal choice for accessing, manipulating, and gaining insights from data of all kinds. Python for Data Science introduces you to the Pythonic world of data analysis with a learn-by-doing approach rooted in practical examples and hands-on activities. You'll learn how to write Python code to obtain, transform, and analyze data, practicing state-of-the-art data processing techniques for use cases in business management, marketing, and decision support. You will discover Python's rich set of built-in data structures for basic operations, as well as its robust ecosystem of open-source libraries for data science, including NumPy, pandas, scikit-learn, matplotlib, and more. Examples show how to load data in various formats, how to streamline, group, and aggregate data sets, and how to create charts, maps, and other visualizations. Later chapters go in-depth with demonstrations of real-world data applications, including using location data to power a taxi service, market basket analysis to identify items commonly purchased together, and machine learning to predict stock prices.

Building Data Science Solutions with Anaconda

2022-05-27 · O'Reilly Data Science Books O'Reilly Amazon

book

by Dan Meador

AI/ML Data Science Pandas Python data data-science

Explore the comprehensive world of data science with "Building Data Science Solutions with Anaconda." This book covers essential topics like managing environments with Anaconda, detecting and overcoming bias, and ensuring model interpretability. Delve into practical tools and solutions, all explained in an approachable way to help you become proficient in data science workflows. What this Book will help me do Master environment management for data science projects using Anaconda and conda. Detect and mitigate dataset biases to ensure fair and ethical machine learning models. Learn advanced data science techniques with tools like NumPy, pandas, and Jupyter Notebooks. Understand and explain your machine learning models using LIME and SHAP. Grow your expertise in selecting and fine-tuning AI/ML algorithms for diverse applications. Author(s) Dan Meador combines extensive expertise in data science with a thorough understanding of Anaconda tools and open source software. With a background in engineering and AI model management, Meador provides an insightful perspective on the field. Their practical and analogy-driven approach makes technical concepts accessible to learners of any level. Who is it for? This book is ideal for data analysts, aspiring machine learning engineers, and data science professionals who wish to deepen their knowledge and make the most of Anaconda's capabilities. A prior understanding of Python and basic data science principles is assumed. If you're looking to optimize your data science workflows and gain hands-on practice, this book is for you.

Hands-on Matplotlib: Learn Plotting and Visualizations with Python 3

2021-11-27 · O'Reilly Data Science Books O'Reilly Amazon

book

by Ashwin Pajankar

AI/ML API DataViz MATLAB Matplotlib Pandas Python Seaborn data data-science data-science-tasks data-visualization

Learn the core aspects of NumPy, Matplotlib, and Pandas, and use them to write programs with Python 3. This book focuses heavily on various data visualization techniques and will help you acquire expert-level knowledge of working with Matplotlib, a MATLAB-style plotting library for the Python programming language that provides an object-oriented API for embedding plots into applications. You'll begin with an introduction to Python 3 and the scientific Python ecosystem. Next, you'll explore NumPy and ndarray data structures, creation routines, and data visualization. You'll examine useful concepts related to style sheets, legends, and layouts, followed by line, bar, and scatter plots. Chapters then cover recipes of histograms, contours, streamplots, and heatmaps, and how to visualize images and audio with pie and polar charts. Moving forward, you'll learn how to visualize with pcolor, pcolormesh, and colorbar, and how to visualize in 3D in Matplotlib, create simple animations, and embed Matplotlib with different frameworks. The concluding chapters cover how to visualize data with Pandas, Matplotlib, and Seaborn, and how to work with real-life data and visualize it. After reading Hands-on Matplotlib you'll be proficient with Matplotlib and able to comfortably work with ndarrays in NumPy and data frames in Pandas. What You'll Learn Understand Data Visualization and Python using Matplotlib Review the fundamental data structures in NumPy and Pandas Work with 3D plotting, visualizations, and animations Visualize images and audio data Who This Book Is For Data scientists, machine learning engineers and software professionals with basic programming skills.

Building Data Science Applications with FastAPI

2021-10-08 · O'Reilly Data Science Books O'Reilly Amazon

book

by François Voron

AI/ML API Data Science Python Scikit-learn fastapi python-web-frameworks web-development web-mobile

This comprehensive guide to FastAPI walks readers through developing modern web backends optimized for data science applications. By mastering key concepts like dependency injection and asynchronous programming, you will create high-performing REST APIs and machine learning powered systems. What this Book will help me do Master asynchronous programming and type hinting in Python for efficient coding. Design comprehensive RESTful APIs for machine learning with FastAPI. Build, test, and maintain scalable data science applications. Integrate Python libraries like NumPy and scikit-learn into web backends. Deploy modular and efficient FastAPI-backed systems to production. Author(s) François Voron is a seasoned software developer specializing in web frameworks and data science applications. With a strong background in building scalable systems, they bring invaluable insights on utilizing FastAPI. Voron emphasizes clarity and hands-on learning, sharing their expertise to help developers master the technology efficiently. Who is it for? This book is ideal for data scientists and Python developers interested in creating efficient data science backends. If you have groundwork knowledge of machine learning concepts and Python programming, this book will enhance your ability to deploy and manage APIs for data-driven applications.

Practical Data Science with Python

2021-09-30 · O'Reilly Data Science Books O'Reilly Amazon

book

by Nathan George

AI/ML Analytics Data Science Pandas Python SQL programming-languages software-development

Practical Data Science with Python guides you through the entire process of leveraging Python tools to analyze and gain insights from data. You'll start with foundational concepts and coding essentials, progressing through statistical analysis, machine learning techniques, and ethical considerations. What this Book will help me do Clean, prepare, and explore data using pandas and NumPy. Understand and implement machine learning models such as random forests and support vector machines. Perform statistical tests and analyze distributions to enhance data insights. Utilize SQL with Python for efficient data interaction. Generate automated reports and dashboards for data storytelling. Author(s) Nathan George has extensive professional experience as a data scientist and Python developer. He specializes in the application of machine learning and statistical methods to solve real-world problems. His writing combines technical depth with an approachable style, aiming to provide readers with actionable knowledge and skills. Who is it for? This book is perfect for data science beginners who have a basic understanding of Python and want to build practical data analysis skills. Students in analytics programs or professionals looking to transition into a data science role will find value in its approachable yet comprehensive coverage. Aspiring data analysts and career changers will gain firsthand exposure to Python-based data science best practices. If you're eager to develop practical, hands-on experience in the data science field, this is the guide for you.
