Pandas

Pandas Brain Teasers

2021-08-30 · O'Reilly Data Science Books O'Reilly Amazon

book

by Miki Tebeka

Data Science Python data data-science data-science-tools

This book contains 25 short programs that will challenge your understanding of Pandas. As with any big project, the Pandas developers had to make some design decisions that at times seem surprising. This book uses those quirks as a teaching opportunity. By understanding the gaps in your knowledge, you'll become better at what you do. Some of the teasers are from the author's experience shipping bugs to production, and some from others doing the same. Teasers and puzzles are fun, and learning how to solve them can teach you to avoid programming mistakes and maybe even impress your colleagues and future employers.

Working with data is central to nearly everything we do, from disease contact tracing and analyzing health records to smart meters that track utility consumption behavior. With the power of Python's pandas library, you can process and analyze this data in a highly efficient and simple-to-understand way. And with 25 brain teasers designed to turn this technology's quirks into a teaching opportunity, you'll be honing your data science skills while having fun at the same time.

Following a simple format, you'll challenge yourself and your understanding of pandas. Read a short Python program that uses pandas, try to guess the output, run the code yourself, and then go to the next page for an explanation of the solution. From common pitfalls and hidden gotchas to unexpected twists and turns, you'll deepen your understanding of pandas, learn to write more efficient code, and reduce the number of bugs in the software you develop. You may even impress your colleagues and your employers, both present and future. Learn the tricks of the trade with Python's pandas, in one of the most fun and creative ways around.

What You Need

To run the code you'll need Python version 3.8 or later and Pandas version 1.0 or later installed. We use Python version 3.8.3 and Pandas version 1.0.5; the output might change in future versions.

Data Science Projects with Python - Second Edition

2021-07-29 · O'Reilly Data Science Books O'Reilly Amazon

book

by Stephen Klosterman

AI/ML Data Science Matplotlib Python Scikit-learn programming-languages software-development

Data Science Projects with Python offers a hands-on, project-based approach to learning data science using real-world data sets and tools. You will explore data using Python libraries like pandas and Matplotlib, build machine learning models with scikit-learn, and apply advanced techniques like XGBoost and SHAP values. This book equips you to confidently extract insights, evaluate models, and deliver results with clarity.

What this Book will help me do

• Learn to load, clean, and preprocess data using Python and pandas.
• Build and evaluate predictive models, including logistic regression and random forests.
• Visualize data effectively using Python libraries like Matplotlib.
• Master advanced techniques like XGBoost and algorithmic fairness.
• Communicate data-driven insights to aid decision making in practical scenarios.

Author(s)

Stephen Klosterman is an experienced data scientist with a strong focus on practical applications of machine learning in business. Combining a rich academic background with hands-on industry experience, he excels at explaining complex concepts in an approachable way. As the author of 'Data Science Projects with Python,' his goal is to provide learners with the skills needed for real-world data science challenges.

Who is it for?

This book is ideal for beginners in data science and machine learning who have some basic programming knowledge in Python. Aspiring data scientists will benefit from its practical, end-to-end examples. Professionals seeking to expand their skillset in predictive modeling and delivering business insights will find this book invaluable. Some foundation in statistics and programming is recommended.

Apache Airflow and Ray: Orchestrating ML at Scale

2021-07-01 · Airflow Summit 2021

session

by Daniel Imberman

AI/ML Airflow TensorFlow

As the Apache Airflow project grows, we seek both ways to incorporate rising technologies and novel ways to expose them to our users. Ray is one of the fastest-growing distributed computation systems on the market today. In this talk, we will introduce the Ray decorator and Ray backend. These features, built with the help of the Ray maintainers at Anyscale, will allow data scientists to natively integrate their distributed pandas, XGBoost, and TensorFlow jobs into their Airflow pipelines with a single decorator. By merging the orchestration of Airflow with the distributed computation of Ray, this combination opens up a whole host of new possibilities for Airflow users when designing their pipelines.

Customizing Xcom to enhance data sharing between tasks

2021-07-01 · Airflow Summit 2021

session

by Vikram Koka (Astronomer), Ephraim Anierobi

Airflow API Cloud Computing Cloud Storage JSON S3

In Apache Airflow, XCom is the default mechanism for passing data between tasks in a DAG. In practice, this has been restricted to small data elements, since the XCom data is persisted in the Airflow metadata database and is constrained by database and performance limitations. With the new TaskFlow API introduced in Airflow 2.0, passing data between tasks is seamless and the use of XCom is invisible. However, the ability to pass data is restricted to a relatively small set of data types that can be natively converted to JSON. This tutorial describes how to go beyond these limitations by developing and deploying a custom XCom backend within Airflow to enable the sharing of large and varied data elements, such as Pandas data frames, between tasks in a data pipeline, using cloud storage such as Google Cloud Storage or Amazon S3.
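The idea behind such a backend can be sketched without Airflow at all: write the large payload to external storage and hand only a small, JSON-friendly reference from one task to the next. The snippet below is a minimal illustration under stated assumptions, not the session's implementation. `FAKE_BUCKET`, `REF_PREFIX`, and both function names are hypothetical; a plain dict stands in for S3 or Google Cloud Storage; and a dict holding `datetime` and `Decimal` values stands in for a data frame so the sketch runs on the standard library alone.

```python
import json
import pickle
import uuid
from datetime import datetime
from decimal import Decimal

# Hypothetical stand-in for a cloud bucket (S3, Google Cloud Storage, ...).
FAKE_BUCKET = {}
REF_PREFIX = "fake-bucket://"


def serialize_value(value):
    """Store the payload externally and return a small JSON reference to it."""
    key = f"{REF_PREFIX}{uuid.uuid4().hex}.pkl"
    FAKE_BUCKET[key] = pickle.dumps(value)
    return json.dumps({"ref": key})


def deserialize_value(stored):
    """Resolve the JSON reference and load the original payload back."""
    key = json.loads(stored)["ref"]
    return pickle.loads(FAKE_BUCKET[key])


# A payload that plain JSON cannot encode, much like a data frame.
payload = {"loaded_at": datetime(2021, 7, 1), "amount": Decimal("12.50")}

try:
    json.dumps(payload)
    json_ok = True
except TypeError:
    json_ok = False  # datetime and Decimal are not JSON serializable

ref = serialize_value(payload)      # the small string a task would hand over
restored = deserialize_value(ref)   # what the downstream task would receive
```

Only the short reference string would travel through the metadata database; the payload itself stays in storage. Pickle is used here for brevity and should only be used to load data from trusted sources; a columnar file format would be a more natural choice for real data frames.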

Lessons Learned From The Pipeline Data Engineering Academy

2021-06-26 · Data Engineering Podcast Listen

podcast_episode

by Daniel Molnar, Peter Fabian (Pipeline Academy), Tobias Macey

BI Data Engineering Data Management Docker DWH ETL/ELT GitHub Kafka Kubernetes Looker Modern Data Stack Prefect

Summary

Data Engineering is a broad and constantly evolving topic, which makes it difficult to teach in a concise and effective manner. Despite that, Daniel Molnar and Peter Fabian started the Pipeline Academy to do exactly that. In this episode they reflect on the lessons that they learned while teaching the first cohort of their bootcamp how to be effective data engineers. By focusing on the fundamentals, and making everyone write code, they were able to build confidence and impart the importance of context for their students.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!

Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at dataengineeringpodcast.com/hightouch.

Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets and code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a Data Engineering Podcast listener, you get credits worth $3000 on an annual subscription.

Your host is Tobias Macey and today I’m interviewing Daniel Molnar and Peter Fabian about the lessons that they learned from their first cohort at the Pipeline data engineering academy.

Interview

• Introduction
• How did you get involved in the area of data management?
• Can you start by sharing the curriculum and learning goals for the students?
• How did you set a common baseline for all of the students to build from throughout the program?
• What was your process for determining the structure of the tasks and the tooling used?
• What were some of the topics/tools that the students had the most difficulty with?
• What topics/tools were the easiest to grasp?
• What are some difficulties that you encountered while trying to teach different concepts?
• How did you deal with the tension of teaching the fundamentals while tying them to toolchains that hiring managers are looking for?
• What are the successes that you had with this cohort and what changes are you making to your approach/curriculum to build on them?
• What are some of the failures that you encountered and what lessons have you taken from them?
• How did the pandemic impact your overall plan and execution of the initial cohort?
• What were the skills that you focused on for interview preparation?
• What level of ongoing support/engagement do you have with students once they complete the curriculum?
• What are the most interesting, innovative, or unexpected solutions that you saw from your students?
• What are the most interesting, unexpected, or challenging lessons that you have learned while working with your first cohort?
• When is a bootcamp the wrong approach for skill development?
• What do you have planned for the future of the Pipeline Academy?

Contact Info

Daniel

LinkedIn Website @soobrosa on Twitter

Peter

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

Pipeline Academy

Blog

Scikit

Pandas

Urchin

Kafka

Three "C"s – Context, Confidence, and Code

Prefect (Podcast Episode)

Great Expectations (Podcast Episode, Podcast.init Episode)

Docker

Kubernetes

Become a Data Engineer On A Shoestring

James Mickens

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA


Hands-On Data Analysis with Pandas - Second Edition

2021-04-29 · O'Reilly Data Science Books O'Reilly Amazon

book

by Stefanie Molin

AI/ML Analytics Data Science NumPy Python Seaborn data data-science data-science-tools

'Hands-On Data Analysis with Pandas' guides you to gain expertise in the Python pandas library for data analysis and manipulation. With practical, real-world examples, you'll learn to analyze datasets, visualize data trends, and implement machine learning models for actionable insights.

What this Book will help me do

• Understand and implement data analysis techniques with Python.
• Develop expertise in data manipulation using pandas and NumPy.
• Visualize data effectively with pandas visualization tools and seaborn.
• Apply machine learning techniques with Python libraries.
• Combine datasets and handle complex data workflows efficiently.

Author(s)

Stefanie Molin is a software engineer and data scientist with extensive experience in analytics and Python. She has worked with large data-driven systems and has a strong focus on teaching data analysis effectively. Stefanie's books are known for their practical, hands-on approach to solving real data problems.

Who is it for?

This book is perfect for aspiring data scientists, data analysts, and Python developers. Readers with beginner to intermediate skill levels in Python will find it accessible and informative. It is designed for those seeking to build practical data analysis skills. If you're looking to add data science and pandas to your toolkit, this book is ideal.

Cleaning Data for Effective Data Science

2021-03-31 · O'Reilly Data Science Books O'Reilly Amazon

book

by David Mertz

AI/ML Data Quality Data Science JSON Python SciPy SQL data data-science

Dive into the intricacies of data cleaning, a crucial aspect of any data science and machine learning pipeline, with 'Cleaning Data for Effective Data Science.' This comprehensive guide walks you through tools and methodologies like Python, R, and command-line utilities to prepare raw data for analysis. Learn practical strategies to manage, clean, and refine data encountered in the real world.

What this Book will help me do

• Understand and utilize various data formats such as JSON, SQL, and PDF for data ingestion and processing.
• Master key tools like pandas, SciPy, and Tidyverse to manipulate and analyze datasets efficiently.
• Develop heuristics and methodologies for assessing data quality, detecting bias, and identifying irregularities.
• Apply advanced techniques like feature engineering and statistical adjustments to enhance data usability.
• Gain confidence in handling time series data by employing methods for de-trending and interpolating missing values.

Author(s)

David Mertz has years of experience as a Python programmer and data scientist. Known for his engaging and accessible teaching style, David has authored numerous technical articles and books. He emphasizes not only the technicalities of data science tools but also the critical thinking needed to approach solutions creatively and effectively.

Who is it for?

'Cleaning Data for Effective Data Science' is designed for data scientists, software developers, and educators dealing with data preparation. Whether you're an aspiring data enthusiast or an experienced professional looking to refine your skills, this book provides essential tools and frameworks. Prior programming knowledge, particularly in Python or R, coupled with an understanding of statistical fundamentals, will help you make the most of this resource.

Python for Algorithmic Trading

2020-11-12 · O'Reilly Data Science Books O'Reilly Amazon

book

by Yves Hilpisch

AI/ML Analytics NumPy Python Data Streaming data data-science

Algorithmic trading, once the exclusive domain of institutional players, is now open to small organizations and individual traders using online platforms. The tool of choice for many traders today is Python and its ecosystem of powerful packages. In this practical book, author Yves Hilpisch shows students, academics, and practitioners how to use Python in the fascinating field of algorithmic trading.

You'll learn several ways to apply Python to different aspects of algorithmic trading, such as backtesting trading strategies and interacting with online trading platforms. Some of the biggest buy- and sell-side institutions make heavy use of Python. By exploring options for systematically building and deploying automated algorithmic trading strategies, this book will help you level the playing field.

• Set up a proper Python environment for algorithmic trading
• Learn how to retrieve financial data from public and proprietary data sources
• Explore vectorization for financial analytics with NumPy and pandas
• Master vectorized backtesting of different algorithmic trading strategies
• Generate market predictions by using machine learning and deep learning
• Tackle real-time processing of streaming data with socket programming tools
• Implement automated algorithmic trading strategies with the OANDA and FXCM trading platforms

Practical Python Data Visualization: A Fast Track Approach To Learning Data Visualization With Python

2020-10-24 · O'Reilly Data Visualization Books O'Reilly Amazon

book

by Ashwin Pajankar

Data Science DataViz Matplotlib NumPy Python programming-languages software-development

Quickly start programming with Python 3 for data visualization with this step-by-step, detailed guide. This book’s programming-friendly approach using libraries such as Leather, NumPy, Matplotlib, and Pandas will serve as a template for business and scientific visualizations.

You’ll begin by installing Python 3, see how to work in Jupyter Notebook, and explore Leather, Python’s popular data visualization charting library. You’ll also be introduced to the scientific Python 3 ecosystem and work with the basics of NumPy, an integral part of that ecosystem. Later chapters focus on various NumPy routines and on getting started with scientific data visualization using Matplotlib. You’ll review the visualization of 3D data using graphs and networks and finish up by looking at data visualization with Pandas, including the visualization of COVID-19 data sets.

The code examples are tested on popular platforms like Ubuntu, Windows, and Raspberry Pi OS. With Practical Python Data Visualization you’ll master the core concepts of data visualization with Pandas and the Jupyter Notebook interface.

What You'll Learn

• Review practical aspects of Python data visualization with programming-friendly abstractions
• Install Python 3 and Jupyter on multiple platforms including Windows, Raspberry Pi, and Ubuntu
• Visualize COVID-19 data sets with Pandas

Who This Book Is For

Data science enthusiasts and professionals, business analysts and managers, software engineers, and data engineers.

The Data Science Workshop - Second Edition

2020-08-28 · O'Reilly Data Science Books O'Reilly Amazon

book

by Anthony So, Thomas Joseph, Andrew Worsley, Dr. Samuel Asare, Robert Thas John

AI/ML Data Science Python Scikit-learn data data-science

The Data Science Workshop provides a comprehensive introduction to building real-world data science projects. Through a hands-on approach, you will learn how to analyze data, build machine learning models, and deploy them effectively in various scenarios. This book is designed to equip you with the skills to confidently tackle data science challenges.

What this Book will help me do

• Understand the differences between supervised and unsupervised learning to select the appropriate technique.
• Master data manipulation and analysis using popular Python libraries like pandas and scikit-learn.
• Develop skills in regression, classification, and clustering to solve diverse data science problems.
• Learn advanced methods to improve model accuracy, including hyperparameter tuning and feature engineering.
• Implement and deploy machine learning models efficiently in production workflows.

Author(s)

The authors of The Data Science Workshop are experienced professionals and educators in the field of data science and machine learning. They have extensive expertise in using practical methods to solve data challenges and have a passion for teaching others through engaging and clear instructional material.

Who is it for?

This book is ideal for aspiring data analysts, data scientists, and business analysts who wish to build foundational skills in data science. It caters to those new to the field and professionals transitioning to a data-centric role, providing practical knowledge without requiring an advanced mathematical background. Familiarity with Python is recommended.

The Data Wrangling Workshop - Second Edition

2020-07-29 · O'Reilly Data Science Books O'Reilly Amazon

book

by Shubhadeep Roychowdhury, John Wesley Doyle, Harshil Jain, Samik Sen, Akshay Khare, Dr. Tirthajyoti Sarkar, Nagendra Nagaraj, Dr. Vlad Sebastian Ionescu, Robert Thas John, Brian Lipp

Analytics Data Quality Data Science Matplotlib NumPy Python RDBMS SQL data data-science data-science-tools

The Data Wrangling Workshop is your beginner's guide to the essential techniques and practices of data manipulation using Python. Throughout the book, you will progressively build your skills, learning key concepts such as extracting, cleaning, and transforming data into actionable insights. By the end, you'll be confident in handling various data wrangling tasks efficiently.

What this Book will help me do

• Understand and apply the fundamentals of data wrangling using Python.
• Combine and aggregate data from diverse sources like web data, SQL databases, and spreadsheets.
• Use descriptive statistics and plotting to examine dataset properties.
• Handle missing or incorrect data effectively to maintain data quality.
• Gain hands-on experience with Python's powerful data science libraries like Pandas, NumPy, and Matplotlib.

Author(s)

Brian Lipp, Shubhadeep Roychowdhury, and Dr. Tirthajyoti Sarkar are experienced educators and professionals in the fields of data science and engineering. Their collective expertise spans years of teaching and working with data technologies. They aim to make data wrangling accessible and comprehensible, focusing on practical examples to equip learners with real-world skills.

Who is it for?

The Data Wrangling Workshop is ideal for developers, data analysts, and business analysts aiming to become data scientists or analytics experts. If you're just getting started with Python, you will find this book guiding you step-by-step. A basic understanding of Python programming, as well as relational databases and SQL, is recommended for smooth learning.

The Data Visualization Workshop

2020-07-28 · O'Reilly Data Visualization Books O'Reilly Amazon

book

by Tim Großmann, Piotr Malak, Rohan Chikorde, Anshu Kumar, Joshua Görner, Mario Döbler, Ankit Verma

Data Science DataViz Matplotlib NumPy Python Seaborn data data-science data-science-tasks data-visualization

In "The Data Visualization Workshop," you will explore the fascinating world of data visualization and learn how to turn raw data into compelling visualizations that clearly communicate your insights. This book provides practical guidance and hands-on exercises to familiarize you with essential topics such as plotting techniques and interactive visualizations using Python.

What this Book will help me do

• Prepare and clean raw data for visualization using NumPy and pandas.
• Create effective and visually appealing charts using libraries like Matplotlib and Seaborn.
• Generate geospatial visualizations utilizing tools like geoplotlib.
• Develop interactive visualizations for web integration with the Bokeh library.
• Apply visualization techniques to real-world data analysis scenarios, including stock data and Airbnb datasets.

Author(s)

Mario Döbler and Tim Großmann are experienced authors and professionals in the field of Python programming and data science. They bring a wealth of knowledge and practical insights to data visualization. Through their collaborative efforts, they aim to empower readers with the skills to create compelling data visualizations and uncover meaningful data narratives.

Who is it for?

This book is ideal for beginners new to data visualization, as well as developers and data scientists seeking to enhance their practical skills. It is approachable for readers without prior visualization experience but assumes familiarity with Python programming and basic mathematics. If you're eager to bring your data to life in insightful and engaging ways, this book is for you.

The Applied Data Science Workshop - Second Edition

2020-07-22 · O'Reilly Data Science Books O'Reilly Amazon

book

by Shovon Sengupta, Paul Van Branteghem, Alex Galea, Karen Yang, Guillermina Bea j

AI/ML Data Science Matplotlib Python Seaborn data data-science

Embark on an interactive journey into the world of data science with 'The Applied Data Science Workshop'. By following real-world scenarios and hands-on exercises, you will explore the fundamentals of data analysis and machine learning modeling within Jupyter Notebooks, leveraging Python libraries like pandas and scikit-learn to draw meaningful insights from data.

What this Book will help me do

• Master the process of setting up and using Jupyter Notebooks effectively for data science tasks.
• Learn to preprocess, analyze, and visualize data using Python libraries such as pandas, Matplotlib, and Seaborn.
• Discover methods to train and evaluate machine learning models using real-world data scenarios.
• Apply techniques to assess model performance and optimize models with advanced validation.
• Gain the skills to communicate insights through well-documented analyses and stakeholder-ready reports.

Author(s)

Alex Galea, an accomplished author in the data science domain, focuses on making technical concepts understandable and relatable. With this book, Galea leverages years of experience to introduce readers to practical applications of data science using Python. The author's approach ensures that readers not only learn the concepts but also apply them hands-on.

Who is it for?

This book caters to aspiring data scientists and developers interested in data analysis and practical applications of data science techniques. Beginners will find the step-by-step methodology approachable, while those with a basic understanding of Python programming or machine learning can quickly extend their skills. It suits anyone eager to add applied data science to their professional toolbox.

Thinking in Pandas: How to Use the Python Data Analysis Library the Right Way

2020-06-05 · O'Reilly Data Science Books O'Reilly Amazon

book

by Hannah Stepanek

Big Data Python data data-science data-science-tools

Understand and implement big data analysis solutions in pandas with an emphasis on performance. This book strengthens your intuition for working with pandas, the Python data analysis library, by exploring its underlying implementation and data structures.

Thinking in Pandas introduces the topic of big data and demonstrates concepts by looking at exciting and impactful projects that pandas helped to solve. From there, you will learn to assess your own projects by size and type to see if pandas is the appropriate library for your needs. Author Hannah Stepanek explains how to load and normalize data in pandas efficiently, and reviews some of the most commonly used loaders and several of their most powerful options. You will then learn how to access and transform data efficiently, what methods to avoid, and when to employ more advanced performance techniques. You will also go over basic data access and munging in pandas and the intuitive dictionary syntax. Choosing the right DataFrame format, working with multi-level DataFrames, and how pandas might be improved upon in the future are also covered.

By the end of the book, you will have a solid understanding of how the pandas library works under the hood. Get ready to make confident decisions in your own projects by utilizing pandas the right way.

What You Will Learn

• Understand the underlying data structure of pandas and why it performs the way it does under certain circumstances
• Discover how to use pandas to extract, transform, and load data correctly with an emphasis on performance
• Choose the right DataFrame so that the data analysis is simple and efficient
• Improve performance of pandas operations with other Python libraries

Who This Book Is For

Software engineers with basic programming skills in Python keen on using pandas for a big data analysis project. Python software developers interested in big data.

Interactive Data Visualization with Python - Second Edition

2020-04-14 · O'Reilly Data Visualization Books O'Reilly Amazon

book

by Abha Belorkar, Sharath Chandra Guntuku, Anshu Kumar, Shubhangi Hora

Data Science DataViz Matplotlib Plotly Python Seaborn data data-science data-science-tasks data-visualization

With Interactive Data Visualization with Python, you will learn to turn raw data into compelling, interactive visual stories. This book guides you through the practical uses of Python libraries such as Bokeh and Plotly, teaching you skills to create visualizations that captivate and inform.

What this Book will help me do

• Understand and apply different principles and techniques of interactive data visualization to bring your data to life.
• Master the use of libraries like Matplotlib, Seaborn, Altair, and Bokeh for creating a variety of data visualizations.
• Learn how to customize data visualizations effectively to meet the needs of different audiences and use cases.
• Gain proficiency in using advanced tools like Plotly for creating dynamic and engaging visual presentations.
• Acquire the ability to identify common pitfalls in visualization and learn strategies to avoid them, ensuring clarity and impact.

Author(s)

Abha Belorkar, Sharath Chandra Guntuku, Shubhangi Hora, and Anshu Kumar are experts in Python programming and data visualization with years of experience in data science and software development. They have collaborated to blend their knowledge into this book, a clear and practical guide to mastering interactive visualization with Python.

Who is it for?

This book is perfect for Python developers, data analysts, and data scientists who want to enhance their skills in data presentation. If you are ready to transform complex data into digestible and interactive visuals, this book is for you. A basic familiarity with Python programming and libraries like pandas is recommended. By the end of the book, you'll feel confident in creating professional-grade data visualizations.

Pandas 1.x Cookbook - Second Edition

2020-02-27 · O'Reilly Data Science Books O'Reilly Amazon

book

by Theodore Petrou, Matthew Harrison

AI/ML Data Science Matplotlib Python Seaborn data data-science data-science-tools

The 'Pandas 1.x Cookbook' offers a recipe-based guide for mastering the powerful Python library, pandas. You will gain practical knowledge for handling and manipulating data efficiently, from the fundamentals to advanced techniques. The book is an essential resource for exploring and analyzing datasets with pandas.

What this Book will help me do

• Understand and apply data exploration techniques in pandas.
• Use pandas to manipulate, aggregate, and clean datasets to extract meaningful insights.
• Combine pandas with Matplotlib and Seaborn to create effective visualizations.
• Perform time series analysis and transform datasets for machine learning.
• Implement workflows for handling large-scale data that exceeds your computer's memory.

Author(s)

Matthew Harrison and Theodore Petrou are highly experienced educators and practitioners in data science and Python programming. With their extensive expertise in using pandas, they provide insights through practical exercises and approachable narratives. Their aim is to make complex concepts accessible to learners of varying skill levels.

Who is it for?

This book is ideal for Python programmers, analysts, and data scientists seeking to expand their data handling and analysis capabilities. It caters to both beginners who are new to pandas and those looking to deepen their understanding of its advanced features. If your goal is to explore, clean, and analyze complex datasets efficiently, this book is tailored for you.

The Data Science Workshop

2020-01-29 · O'Reilly Data Science Books O'Reilly Amazon

book

by Tiffany Ford, Anthony So, Pritesh Tiwari, Thomas Joseph, Andrew Worsley, Ivan Liu, Dr. Samuel Asare, Robert Thas John, Barbora Stetinova

AI/ML Data Science Python Scikit-learn data data-science

The Data Science Workshop is designed for beginners looking to step into the rigorous yet rewarding world of data science. By leveraging a hands-on approach, this book demystifies key concepts and guides you gently into creating practical machine learning models with Python.

What this Book will help me do

• Understand supervised and unsupervised learning and their applications.
• Gain hands-on experience with Python libraries like scikit-learn and pandas for data manipulation.
• Learn practical use cases of machine learning techniques such as regression and clustering.
• Discover techniques to ensure robustness in machine learning with hyperparameter tuning and ensembling.
• Develop efficiency in feature engineering with automated tools to accelerate workflows.

Author(s)

Anthony So, Thomas Joseph, Robert Thas John, and Andrew Worsley are seasoned experts in data science and Python programming. Along with Dr. Samuel Asare, they bring decades of experience and practical knowledge to this book, delivering an engaging and approachable learning experience.

Who is it for?

This book is targeted toward individuals who are beginners in data science and are eager to acquire foundational knowledge and practical skills. It appeals to those who prefer a structured, hands-on approach to learning, possibly having some prior programming experience or interest in Python. Professionals aspiring to pivot into data-oriented roles or students aiming to strengthen their understanding of data science concepts will find this book particularly valuable. If you're looking to gain confidence in implementing data science projects and solving real-world problems, this text is for you.

Mining Social Media

2019-12-10 · O'Reilly Data Science Books O'Reilly Amazon

book

by Lam Thuy Vo

API Google Sheets HTML Python data data-science data-science-tasks web-scraping

Did fake Twitter accounts help sway a presidential election? What can Facebook and Reddit archives tell us about human behavior? In Mining Social Media, senior BuzzFeed reporter Lam Thuy Vo shows you how to use Python and key data analysis tools to find the stories buried in social media. Whether you’re a professional journalist, an academic researcher, or a citizen investigator, you’ll learn how to use technical tools to collect and analyze data from social media sources to build compelling, data-driven stories.
Learn how to:
• Write Python scripts and use APIs to gather data from the social web
• Download data archives and dig through them for insights
• Inspect HTML downloaded from websites for useful content
• Format, aggregate, sort, and filter your collected data using Google Sheets
• Create data visualizations to illustrate your discoveries
• Perform advanced data analysis using Python, Jupyter Notebooks, and the pandas library
• Apply what you’ve learned to research topics on your own
Social media is filled with thousands of hidden stories just waiting to be told. Learn to use the data-sleuthing tools that professionals use to write your own data-driven stories.

Mastering pandas - Second Edition

2019-10-25 · O'Reilly Data Science Books O'Reilly Amazon

book

by Ashish Kumar (Grainite)

Analytics Data Science Python data data-science data-science-tools

Mastering pandas is the ultimate guide to harnessing the power of the pandas library for data analysis. Covering everything from installation to advanced techniques, this book provides comprehensive instructions and examples to help you perform efficient data manipulation and visualization. Explore key features of pandas, such as multi-indexing and time series analysis, and become proficient in actionable analytics.
What this Book will help me do
• Master importing and managing datasets of various formats using pandas.
• Expertly handle missing data and clean datasets for robust analysis.
• Create powerful visualizations and reports using pandas and Jupyter notebooks.
• Leverage advanced indexing and grouping techniques to derive insights.
• Utilize pandas for time series analysis to analyze trends and patterns.
Author(s)
Ashish Kumar is an experienced data scientist specializing in data analysis and visualization using Python. With a deep understanding of the pandas library, Kumar has been helping professionals and enthusiasts alike to make data-driven decisions. Known for an example-driven teaching style, Kumar bridges complex theoretical concepts with practical applications in data science.
Who is it for?
If you're a data scientist, analyst, or Python developer seeking to enhance your data analysis capabilities, this book is for you. Prior knowledge of Python is beneficial but not mandatory, as foundational concepts are explained. This guide spans beginner to advanced topics, accommodating users looking to deepen their skills and those aiming to start with pandas.

Learn Python by Building Data Science Applications

2019-08-30 · O'Reilly Data Science Books O'Reilly Amazon

book

by Philipp Kats, David Katz

AI/ML CI/CD Data Science Matplotlib NumPy Python Scikit-learn data data-science

Learn Python by Building Data Science Applications takes a hands-on approach to teaching Python programming by guiding you through building engaging real-world data science projects. This book introduces Python's rich ecosystem and equips you with the skills to analyze data, train models, and deploy them as efficient applications.
What this Book will help me do
• Get proficient in Python programming by learning core topics like data structures, loops, and functions.
• Explore data science libraries such as NumPy, Pandas, and scikit-learn to analyze and process data.
• Learn to create visualizations with Matplotlib and Altair, simplifying data communication.
• Build and deploy machine learning models using Python and share them as web services.
• Understand development practices such as testing, packaging, and continuous integration for professional workflows.
Author(s)
Philipp Kats and David Katz are seasoned Python developers with years of experience in teaching programming and deploying data science applications. They combine clear explanations with engaging projects to give learners practical knowledge and versatile skills.
Who is it for?
This book is ideal for individuals new to programming or data science who want to learn Python through practical projects. Researchers, analysts, and ambitious students with minimal coding background but a keen interest in data analysis and application development will find this book beneficial. It's a perfect choice for anyone eager to explore and leverage Python for real-world solutions.
