Python

Effective Data Analysis

2025-03-06 · O'Reilly Data Science Books O'Reilly Amazon

book

by Mona Khalil

Analytics Analytics Engineering Fivetran SQL data data-science

Learn the technical and soft skills you need to succeed in your career as a data analyst. You’ve learned how to use Python, R, SQL, and the statistical skills needed to get started as a data analyst—so, what’s next? Effective Data Analysis bridges the gap between foundational skills and real-world application. This book provides clear, actionable guidance on transforming business questions into impactful data projects, ensuring you’re tracking the right metrics, and equipping you with a modern data analyst’s essential toolbox. In Effective Data Analysis, you’ll gain the skills needed to excel as a data analyst, including: Maximizing the impact of your analytics projects and deliverables Identifying and leveraging data sources to enhance organizational insights Mastering statistical tests, understanding their strengths, limitations, and when to use them Overcoming the challenges and caveats at every stage of an analytics project Applying your expertise across a variety of domains with confidence Effective Data Analysis is full of sage advice on how to be an effective data analyst in a real production environment. Inside, you’ll find methods that enhance the value of your work—from choosing the right analysis approach, to developing a data-informed organizational culture. About the Technology Data analysts need top-notch knowledge of statistics and programming. They also need to manage clueless stakeholders, navigate messy problems, and advocate for resources. This unique book covers the essential technical topics and soft skills you need to be effective in the real world. About the Book Effective Data Analysis helps you lock down those skills along with unfiltered insight into what the job really looks like. You’ll build out your technical toolbox with tips for defining metrics, testing code, automation, sourcing data, and more. Along the way, you’ll learn to handle the human side of data analysis, including how to turn vague requirements into efficient data pipelines. And you’re sure to love author Mona Khalil’s illustrations, industry examples, and a friendly writing style. What's Inside Identify and incorporate external data Communicate with non-technical stakeholders Apply and interpret statistical tests Techniques to approach any business problem About the Reader Written for early-career data analysts, but useful for all. About the Author Mona Khalil is the Senior Manager of Analytics Engineering at Justworks. Quotes Your roadmap to becoming a standout data analyst! An intriguing blend of technical expertise and practical wisdom. - Chester Ismay, MATE Seminars A thoughtful guide to delivering real-world data analysis. It will be an eye-opening read for all data professionals! - David Lee, Justworks Inc. Compelling insights into the relationship between organizations and data. The real-life examples will help you excel in your data career. - Jeremy Moulton, Greenhouse Mona’s wide range of experience shines in her thoughtful, relevant examples. - Jessica Cherny, Fivetran

Hands-On APIs for AI and Data Science

2025-03-05 · O'Reilly Data Science Books O'Reilly Amazon

book

by Ryan Day

AI/ML API Cloud Computing Data Science GenAI LLM data data-science

Are you ready to grow your skills in AI and data science? A great place to start is learning to build and use APIs in real-world data and AI projects. API skills have become essential for AI and data science success, because they are used in a variety of ways in these fields. With this practical book, data scientists and software developers will gain hands-on experience developing and using APIs with the Python programming language and popular frameworks like FastAPI and StreamLit. As you complete the chapters in the book, you'll be creating portfolio projects that teach you how to: Design APIs that data scientists and AIs love Develop APIs using Python and FastAPI Deploy APIs using multiple cloud providers Create data science projects such as visualizations and models using APIs as a data source Access APIs using generative AI and LLMs

150: 9 Huge LIES About Becoming a Data Analyst Nobody Talks About

2025-03-04 · Data Career Podcast: Helping You Land a Data Analyst Job FAST Listen

podcast_episode

by Avery Smith

AI/ML Analytics Computer Science Data Analytics

In this episode, I uncover the nine biggest LIES about landing a data job. Maybe what's stopping you from pursuing a data career is just a big lie. No College Degree As A Data Analyst YT Playlist: https://www.youtube.com/playlist?list=PLo0oTKi2fPNjHi6iXT3Pu68kUmiT-xDWs Don’t Learn Python as a Data Analyst (Learn This Instead): https://www.youtube.com/watch?v=VVhURHXMSlA 💌 Join 10k+ aspiring data analysts & get my tips in your inbox weekly 👉 https://www.datacareerjumpstart.com/newsletter 🆘 Feeling stuck in your data journey? Come to my next free "How to Land Your First Data Job" training 👉 https://www.datacareerjumpstart.com/training 👩‍💻 Want to land a data job in less than 90 days? 👉 https://www.datacareerjumpstart.com/daa 👔 Ace The Interview with Confidence 👉 https://www.datacareerjumpstart.com/interviewsimulator ⌚ TIMESTAMPS 00:00 Introduction 00:05 You Need a Computer Science or Math Degree 01:20 You Have to Be Good at Math and Statistics 03:00 You Must Know Everything About Data Analytics 04:27 Certifications Matter 05:35 Skills Are Enough 07:20 AI Will Take Your Job 09:24 You'll Spend 80% of Your Time Cleaning Data 10:08 Data Titles 11:44 There Are Lots of Remote Jobs 13:17 The "Self-Taught" Data Analyst 🔗 CONNECT WITH AVERY 🎥 YouTube Channel: https://www.youtube.com/@averysmith 🤝 LinkedIn: https://www.linkedin.com/in/averyjsmith/ 📸 Instagram: https://instagram.com/datacareerjumpstart 🎵 TikTok: https://www.tiktok.com/@verydata 💻 Website: https://www.datacareerjumpstart.com/ Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://DataCareerJumpstart.com/daa https://www.datacareerjumpstart.com/daa

#81 AI Code Assistants: The Good, The Bad & The Overhyped, plus Python’s UV Glow-Up & Postman’s Existential Crisis

2025-02-27 · DataTopics: All Things Data, AI & Tech Listen

podcast_episode

AI/ML GitHub LLM

Send us a text Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. This week, we dive into the latest in AI-assisted coding, software quality, and the ongoing debate on whether LLMs will replace developers—or just make their lives easier: My LLM Codegen workflow atm: A deep dive into using LLMs for coding, including structured workflows, tool recommendations, and the fine line between automation and chaos.Cline & Cursor: Exploring VSCode extensions and AI-powered coding tools that aim to supercharge development—but are they game-changers or just fancy autocomplete?To avoid being replaced by LLMs, do what they can’t: A thought-provoking take on the future of programming, the value of human intuition, and how to stay ahead in an AI-driven world.The wired brain: Why we should stop using glowing-brain stock images to talk about AI—and what that says about how we understand machine intelligence.A year of uv: Reflecting on a year of UV, the rising star of Python package managers. Should you switch? Maybe. Probably.Posting: A look at a fun GitHub project that makes sharing online a little more structured.Software Quality: AI may generate code, but does it generate good code? A discussion on testing, maintainability, and avoiding spaghetti.movingWithTheTimes: A bit of programmer humor to lighten the mood—because tech discussions need memes too.

The Future of Data Engineering: AI, LLMs, and Automation

2025-02-26 · Data Engineering Podcast Listen

podcast_episode

by Gleb Mezhanskiy (Datafold) , Tobias Macey

AI/ML Data Engineering Data Management Datafold LLM Modern Data Stack SQL

Summary In this episode of the Data Engineering Podcast Gleb Mezhanskiy, CEO and co-founder of DataFold, talks about the intersection of AI and data engineering. He discusses the challenges and opportunities of integrating AI into data engineering, particularly using large language models (LLMs) to enhance productivity and reduce manual toil. The conversation covers the potential of AI to transform data engineering tasks, such as text-to-SQL interfaces and creating semantic graphs to improve data accessibility, and explores practical applications of LLMs in automating code reviews, testing, and understanding data lineage.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. Your host is Tobias Macey and today I'm interviewing Gleb Mezhanskiy about Interview IntroductionHow did you get involved in the area of data management?modern data stack is deadwhere is AI in the data stack?"buy our tool to ship AI"opportunities for LLM in DE workflowContact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links DatafoldCopilotCursor IDEAI AgentsDataChatAI Engineering Podcast EpisodeMetrics LayerEmacsLangChainLangGraphCrewAIThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Fine-tuning a Pre-trained Language Model

2025-02-23 · Pré-entrainement et finetuning des LLMs à partir de zéro (from scratch)

talk

PyTorch transformers

Session on 2025-02-23 focusing on fine-tuning techniques for a pre-trained model, in line with the series described in the description.

From scratch: Building a Language Model

2025-02-23 · Pré-entrainement et finetuning des LLMs à partir de zéro (from scratch)

talk

object-oriented programming

Session on 2025-02-23 focusing on the complete creation of a language model from zero, based on 'Building a Large Language Model from Scratch' by Sebastian Raschka. The session is chapter-based and alternates between theoretical and practical components.

Breakout Session 1 - Telling a Good Story with Data

2025-02-21 · Microsoft DevConnect

Breakout Session

Matplotlib Pandas Plotly Seaborn jupyter notebook vs code

Master data storytelling with Python using Pandas, Matplotlib, Seaborn, and Plotly. Gain hands-on insights into data analysis and visualization with Jupyter Notebook in VS Code.

Evolving Responsibilities in AI Data Management

2025-02-16 · Data Engineering Podcast Listen

podcast_episode

by Bartosz Mikulski , Tobias Macey

AI/ML BI Data Engineering Data Management Data Modelling Datafold DWH GenAI MLOps RAG Vector DB

Summary In this episode of the Data Engineering Podcast Bartosz Mikulski talks about preparing data for AI applications. Bartosz shares his journey from data engineering to MLOps and emphasizes the importance of data testing over software development in AI contexts. He discusses the types of data assets required for AI applications, including extensive test datasets, especially in generative AI, and explains the differences in data requirements for various AI application styles. The conversation also explores the skills data engineers need to transition into AI, such as familiarity with vector databases and new data modeling strategies, and highlights the challenges of evolving AI applications, including frequent reprocessing of data when changing chunking strategies or embedding models.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. Your host is Tobias Macey and today I'm interviewing Bartosz Mikulski about how to prepare data for use in AI applicationsInterview IntroductionHow did you get involved in the area of data management?Can you start by outlining some of the main categories of data assets that are needed for AI applications?How does the nature of the application change those requirements? (e.g. RAG app vs. agent, etc.)How do the different assets map to the stages of the application lifecycle?What are some of the common roles and divisions of responsibility that you see in the construction and operation of a "typical" AI application?For data engineers who are used to data warehousing/BI, what are the skills that map to AI apps?What are some of the data modeling patterns that are needed to support AI apps?chunking strategies metadata managementWhat are the new categories of data that data engineers need to manage in the context of AI applications?agent memory generation/evolution conversation history managementdata collection for fine tuningWhat are some of the notable evolutions in the space of AI applications and their patterns that have happened in the past ~1-2 years that relate to the responsibilities of data engineers?What are some of the skills gaps that teams should be aware of and identify training opportunities for?What are the most interesting, innovative, or unexpected ways that you have seen data teams address the needs of AI applications?What are the most interesting, unexpected, or challenging lessons that you have learned while working on AI applications and their reliance on data?What are some of the emerging trends that you are paying particular attention to?Contact Info WebsiteLinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links SparkRayChunking StrategiesHypothetical document embeddingsModel Fine TuningPrompt CompressionThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Learning LangChain

2025-02-14 · O'Reilly AI & ML Books O'Reilly Amazon

book

by Nuno Campos , Mayo Oshin

AI/ML API Databricks JavaScript LLM RAG ai-ml artificial-intelligence-ai data langchain large-language-models-llms

If you're looking to build production-ready AI applications that can reason and retrieve external data for context-awareness, you'll need to master--;a popular development framework and platform for building, running, and managing agentic applications. LangChain is used by several leading companies, including Zapier, Replit, Databricks, and many more. This guide is an indispensable resource for developers who understand Python or JavaScript but are beginners eager to harness the power of AI. Authors Mayo Oshin and Nuno Campos demystify the use of LangChain through practical insights and in-depth tutorials. Starting with basic concepts, this book shows you step-by-step how to build a production-ready AI agent that uses your data. Harness the power of retrieval-augmented generation (RAG) to enhance the accuracy of LLMs using external up-to-date data Develop and deploy AI applications that interact intelligently and contextually with users Make use of the powerful agent architecture with LangGraph Integrate and manage third-party APIs and tools to extend the functionality of your AI applications Monitor, test, and evaluate your AI applications to improve performance Understand the foundations of LLM app development and how they can be used with LangChain

#79 The $6 AI Model? France’s $85B Bet, DeepSeek's Censorship & The Python Upgrades You Need

2025-02-13 · DataTopics: All Things Data, AI & Tech Listen

podcast_episode

by Charlie Marsh (Astral)

AI/ML LLM

Send us a text Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. DataTopics Unpluggedis your go-to spot for relaxed discussions around tech, news, data, and society. Dive into conversations that should flow as smoothly as your morning coffee (but don’t), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style! This week, we break down some of the biggest developments in AI, investments, and automation: France’s AI Boom: $85 billion in investments – A look at how a mix of international and domestic funds is fueling France’s AI ecosystem, and why Mistral AI might be Europe's best shot at competing with OpenAI.Anthropic’s AI Job Index: Who’s using AI at work? – A deep dive into the latest report on how AI is being used in different industries, from software development to education, and the surprising ways automation is creeping into unexpected jobs.The $6 AI Model: How low can costs go? – Researchers have managed to create a reasoning model for just $6. We unpack how they pulled it off and what this means for the AI landscape.AI Censorship & Model Distillation: What’s really going on? – A discussion on recent claims that certain AI models come with baked-in censorship, and whether fine-tuning is playing a bigger role than we think.PromptLayer’s No-Code AI Tools – Are no-code AI development platforms the next big thing?Predicted Outputs: OpenAI’s approach to efficient code editing – A look at how OpenAI’s "Predicted Outputs" feature could make AI-assisted coding more efficient.MacOS System Monitoring & Dev Tooling: The geeky stuff – A breakdown of system monitoring tools for Mac users who love to keep an eye on every process running in the background.Snapshot Testing with Birdie – Exploring the concept of snapshot testing beyond UI testing and into function outputs.BeeWare & the Python Ecosystem – A look at how BeeWare is helping Python developers build cross-platform applications.Astral, Ruff, and UV: Python’s performance evolution – The latest from Charlie Marsh on the tools shaping Python development.

AI Agents in Action

2025-02-12 · O'Reilly AI & ML Books O'Reilly Amazon

book

by Micheal Lanham (Brilliant Harvest)

AI/ML API LLM ai-ml artificial-intelligence-ai artificial intelligence (ai) data

Create LLM-powered autonomous agents and intelligent assistants tailored to your business and personal needs. From script-free customer service chatbots to fully independent agents operating seamlessly in the background, AI-powered assistants represent a breakthrough in machine intelligence. In AI Agents in Action, you'll master a proven framework for developing practical agents that handle real-world business and personal tasks. Author Micheal Lanham combines cutting-edge academic research with hands-on experience to help you: Understand and implement AI agent behavior patterns Design and deploy production-ready intelligent agents Leverage the OpenAI Assistants API and complementary tools Implement robust knowledge management and memory systems Create self-improving agents with feedback loops Orchestrate collaborative multi-agent systems Enhance agents with speech and vision capabilities You won't find toy examples or fragile assistants that require constant supervision. AI Agents in Action teaches you to build trustworthy AI capable of handling high-stakes negotiations. You'll master prompt engineering to create agents with distinct personas and profiles, and develop multi-agent collaborations that thrive in unpredictable environments. Beyond just learning a new technology, you'll discover a transformative approach to problem-solving. About the Technology Most production AI systems require many orchestrated interactions between the user, AI models, and a wide variety of data sources. AI agents capture and organize these interactions into autonomous components that can process information, make decisions, and learn from interactions behind the scenes. This book will show you how to create AI agents and connect them together into powerful multi-agent systems. About the Book In AI Agents in Action, you’ll learn how to build production-ready assistants, multi-agent systems, and behavioral agents. You’ll master the essential parts of an agent, including retrieval-augmented knowledge and memory, while you create multi-agent applications that can use software tools, plan tasks autonomously, and learn from experience. As you explore the many interesting examples, you’ll work with state-of-the-art tools like OpenAI Assistants API, GPT Nexus, LangChain, Prompt Flow, AutoGen, and CrewAI. What's Inside Knowledge management and memory systems Feedback loops for continuous agent learning Collaborative multi-agent systems Speech and computer vision About the Reader For intermediate Python programmers. About the Author Micheal Lanham is a software and technology innovator with over 20 years of industry experience. He has authored books on deep learning, including Manning’s Evolutionary Deep Learning. Quotes This is about to become the hottest area of applied AI. Get a head start with this book! - Richard Davies, author of Prompt Engineering in Practice Couldn’t put this book down! It’s so comprehensive and clear that I felt like I was learning from a master teacher. - Radhika Kanubaddhi, Amazon An enlightening journey! This book transformed my questions into answers. - Jose San Leandro, ACM-SL Expertly guides through creating agent profiles, using tools, memory, planning, and multi-agent systems. Couldn’t be more timely! - Grigory Sapunov author of JAX in Action

Machine Learning for Tabular Data

2025-02-12 · O'Reilly AI & ML Books O'Reilly Amazon

book

by Luca Massaron , Mark Ryan (Google)

AI/ML Cloud Computing GenAI Keras LLM Marketing MLOps PyTorch TensorFlow ai-ml data machine-learning

Business runs on tabular data in databases, spreadsheets, and logs. Crunch that data using deep learning, gradient boosting, and other machine learning techniques. Machine Learning for Tabular Data teaches you to train insightful machine learning models on common tabular business data sources such as spreadsheets, databases, and logs. You’ll discover how to use XGBoost and LightGBM on tabular data, optimize deep learning libraries like TensorFlow and PyTorch for tabular data, and use cloud tools like Vertex AI to create an automated MLOps pipeline. Machine Learning for Tabular Data will teach you how to: Pick the right machine learning approach for your data Apply deep learning to tabular data Deploy tabular machine learning locally and in the cloud Pipelines to automatically train and maintain a model Machine Learning for Tabular Data covers classic machine learning techniques like gradient boosting, and more contemporary deep learning approaches. By the time you’re finished, you’ll be equipped with the skills to apply machine learning to the kinds of data you work with every day. About the Technology Machine learning can accelerate everyday business chores like account reconciliation, demand forecasting, and customer service automation—not to mention more exotic challenges like fraud detection, predictive maintenance, and personalized marketing. This book shows you how to unlock the vital information stored in spreadsheets, ledgers, databases and other tabular data sources using gradient boosting, deep learning, and generative AI. About the Book Machine Learning for Tabular Data delivers practical ML techniques to upgrade every stage of the business data analysis pipeline. In it, you’ll explore examples like using XGBoost and Keras to predict short-term rental prices, deploying a local ML model with Python and Flask, and streamlining workflows using large language models (LLMs). Along the way, you’ll learn to make your models both more powerful and more explainable. What's Inside Master XGBoost Apply deep learning to tabular data Deploy models locally and in the cloud Build pipelines to train and maintain models About the Reader For readers experienced with Python and the basics of machine learning. About the Authors Mark Ryan is the AI Lead of the Developer Knowledge Platform at Google. A three-time Kaggle Grandmaster, Luca Massaron is a Google Developer Expert (GDE) in machine learning and AI. He has published 17 other books. Quotes

An in-depth look at Data Prep Kit

2025-02-06 · AI Alliance: An in-depth look at Data Prep Kit

talk

by Sujee Maniyam (IBM / The AI Alliance)

data prep kit open-source

A detailed look at Data Prep Kit, its features and usage.

The Well-Grounded Data Analyst

2025-02-06 · O'Reilly Data Science Books O'Reilly Amazon

book

by David Asboth

Data Modelling Data Science IBM data data-science

Complete eight data science projects that lock in important real-world skills—along with a practical process you can use to learn any new technique quickly and efficiently. Data analysts need to be problem solvers—and The Well-Grounded Data Analyst will teach you how to solve the most common problems you'll face in industry. You'll explore eight scenarios that your class or bootcamp won’t have covered, so you can accomplish what your boss is asking for. In The Well-Grounded Data Analyst you'll learn: High-value skills to tackle specific analytical problems Deconstructing problems for faster, practical solutions Data modeling, PDF data extraction, and categorical data manipulation Handling vague metrics, deciphering inherited projects, and defining customer records The Well-Grounded Data Analyst is for junior and early-career data analysts looking to supplement their foundational data skills with real-world problem solving. As you explore each project, you'll also master a proven process for quickly learning new skills developed by author and Half Stack Data Science podcast host David Asboth. You'll learn how to determine a minimum viable answer for your stakeholders, identify and obtain the data you need to deliver, and reliably present and iterate on your findings. The book can be read cover-to-cover or opened to the chapter most relevant to your current challenges. About the Technology Real world data analysis is messy. Success requires tackling challenges like unreliable data sources, ambiguous requests, and incompatible formats—often with limited guidance. This book goes beyond the clean, structured examples you see in classrooms and bootcamps, offering a step-by-step framework you can use to confidently solve any data analysis problem like a pro. About the Book The Well-Grounded Data Analyst introduces you to eight scenarios that every data analyst is bound to face. You’ll practice author David Asboth’s results-oriented approach as you model data by identifying customer records, navigate poorly-defined metrics, extract data from PDFs, and much more! It also teaches you how to take over incomplete projects and create rapid prototypes with real data. Along the way, you’ll build an impressive portfolio of projects you can showcase at your next interview. What's Inside Deconstructing problems Handling vague metrics Data modeling Categorical data manipulation About the Reader For early-career data scientists. About the Author David Asboth is a data generalist educator, and software architect. He co-hosts the Half Stack Data Science podcast. Quotes Well reasoned and well written, with approaches to solve many sorts of data analysis problems. - Naomi Ceder, Fellow of the Python Software Foundation An excellent resource for any aspiring data scientist! - Andrew R. Freed, IBM David’s clear and repeatable framework will give you confidence to tackle open-ended stakeholder requests and reach an answer much faster! - Shaun McGirr, DevOn Software Services A book version of shadowing a senior data analyst while they explain handling frequent data problems at work, including all the ugly gotchas. - Randy Au, Google

Scaling Semantic Segmentation with Blender

2025-01-30 · Jan 30 - AI, Machine Learning and Computer Vision Meetup

talk

by Vincent Vandenbussche

blender synthetic data

Generating datasets for semantic segmentation can be time-intensive. Learn how to use Blender's Python API to create diverse and realistic synthetic data with automated labels, saving time and improving model performance. Preview the topics to be discussed in this Medium post.

Getting Started with Docling and Data Prep Kit

2025-01-30 · AI Alliance: Getting Started with Docling and Data Prep Kit

talk

by Sujee Maniyam (IBM / The AI Alliance)

data prep kit docling

In this talk, I will introduce the capabilities of Data Prep Kit and Docling, walk you through their key features, and demonstrate how to get started with these powerful tools to streamline your data preparation workflows.

Causal Inference for Data Science

2025-01-30 · O'Reilly Data Science Books O'Reilly Amazon

book

by Aleix Ruiz de Villa

AI/ML Data Science data data-science

When you know the cause of an event, you can affect its outcome. This accessible introduction to causal inference shows you how to determine causality and estimate effects using statistics and machine learning. A/B tests or randomized controlled trials are expensive and often unfeasible in a business environment. Causal Inference for Data Science reveals the techniques and methodologies you can use to identify causes from data, even when no experiment or test has been performed. In Causal Inference for Data Science you will learn how to: Model reality using causal graphs Estimate causal effects using statistical and machine learning techniques Determine when to use A/B tests, causal inference, and machine learning Explain and assess objectives, assumptions, risks, and limitations Determine if you have enough variables for your analysis It’s possible to predict events without knowing what causes them. Understanding causality allows you both to make data-driven predictions and also intervene to affect the outcomes. Causal Inference for Data Science shows you how to build data science tools that can identify the root cause of trends and events. You’ll learn how to interpret historical data, understand customer behaviors, and empower management to apply optimal decisions. About the Technology Why did you get a particular result? What would have lead to a different outcome? These are the essential questions of causal inference. This powerful methodology improves your decisions by connecting cause and effect—even when you can’t run experiments, A/B tests, or expensive controlled trials. About the Book Causal Inference for Data Science introduces techniques to apply causal reasoning to ordinary business scenarios. And with this clearly-written, practical guide, you won’t need advanced statistics or high-level math to put causal inference into practice! By applying a simple approach based on Directed Acyclic Graphs (DAGs), you’ll learn to assess advertising performance, pick productive health treatments, deliver effective product pricing, and more. What's Inside When to use A/B tests, causal inference, and ML Assess objectives, assumptions, risks, and limitations Apply causal inference to real business data About the Reader For data scientists, ML engineers, and statisticians. About the Author Aleix Ruiz de Villa Robert is a data scientist with a PhD in mathematical analysis from the Universitat Autònoma de Barcelona. Quotes With intuitive explanations, application-focused insights, and real-world examples, this book offers immense practical value. - Philipp Bach, Maintainer of the DoubleML libraries for Python and R An essential guide for navigating the complexities of real-world data analysis. - Adi Shavit, SWAPP A must-read! Demystifies causal inference with a blend of theory and practice. - Karan Gupta, SunPower Corporation Causal relationships can mask and distort results. This book provides a set of tools to extract insights correctly. - Peter V. Henstock, Harvard Extension

Machine Learning Algorithms in Depth

2025-01-27 · O'Reilly AI & ML Books O'Reilly Amazon

book

by Vadim Smolyakov

AI/ML LLM Microsoft Monte Carlo NLP Cyber Security ai-ml data machine-learning

Learn how machine learning algorithms work from the ground up so you can effectively troubleshoot your models and improve their performance. Fully understanding how machine learning algorithms function is essential for any serious ML engineer. In Machine Learning Algorithms in Depth you’ll explore practical implementations of dozens of ML algorithms including: Monte Carlo Stock Price Simulation Image Denoising using Mean-Field Variational Inference EM algorithm for Hidden Markov Models Imbalanced Learning, Active Learning and Ensemble Learning Bayesian Optimization for Hyperparameter Tuning Dirichlet Process K-Means for Clustering Applications Stock Clusters based on Inverse Covariance Estimation Energy Minimization using Simulated Annealing Image Search based on ResNet Convolutional Neural Network Anomaly Detection in Time-Series using Variational Autoencoders Machine Learning Algorithms in Depth dives into the design and underlying principles of some of the most exciting machine learning (ML) algorithms in the world today. With a particular emphasis on probabilistic algorithms, you’ll learn the fundamentals of Bayesian inference and deep learning. You’ll also explore the core data structures and algorithmic paradigms for machine learning. Each algorithm is fully explored with both math and practical implementations so you can see how they work and how they’re put into action. About the Technology Learn how machine learning algorithms work from the ground up so you can effectively troubleshoot your models and improve their performance. This book guides you from the core mathematical foundations of the most important ML algorithms to their Python implementations, with a particular focus on probability-based methods. About the Book Machine Learning Algorithms in Depth dissects and explains dozens of algorithms across a variety of applications, including finance, computer vision, and NLP. Each algorithm is mathematically derived, followed by its hands-on Python implementation along with insightful code annotations and informative graphics. You’ll especially appreciate author Vadim Smolyakov’s clear interpretations of Bayesian algorithms for Monte Carlo and Markov models. What's Inside Monte Carlo stock price simulation EM algorithm for hidden Markov models Imbalanced learning, active learning, and ensemble learning Bayesian optimization for hyperparameter tuning Anomaly detection in time-series About the Reader For machine learning practitioners familiar with linear algebra, probability, and basic calculus. About the Author Vadim Smolyakov is a data scientist in the Enterprise & Security DI R&D team at Microsoft. Quotes I love this book! It shows you how to implement common ML algorithms in plain Python with only the essential libraries, so you can see how the computation and math works in practice. - Junpeng Lao, Senior Data Scientist at Google I highly recommend this book. In the era of ChatGPT real knowledge of algorithms is invaluable. - Vatsal Desai, InfoDesk Explains algorithms so well that even a novice can digest it. - Harsh Raval, Zymr

Statistical Quantitative Methods in Finance: From Theory to Quantitative Portfolio Management

2025-01-22 · O'Reilly Data Science Books O'Reilly Amazon

book

by Samit Ahlawat

AI/ML Data Science data data-science data-science-tasks statistics

Statistical quantitative methods are vital for financial valuation models and benchmarking machine learning models in finance. This book explores the theoretical foundations of statistical models, from ordinary least squares (OLS) to the generalized method of moments (GMM) used in econometrics. It enriches your understanding through practical examples drawn from applied finance, demonstrating the real-world applications of these concepts. Additionally, the book delves into non-linear methods and Bayesian approaches, which are becoming increasingly popular among practitioners thanks to advancements in computational resources. By mastering these topics, you will be equipped to build foundational models crucial for applied data science, a skill highly sought after by software engineering and asset management firms. The book also offers valuable insights into quantitative portfolio management, showcasing how traditional data science tools can be enhanced with machine learning models. These enhancements are illustrated through real-world examples from finance and econometrics, accompanied by Python code. This practical approach ensures that you can apply what you learn, gaining proficiency in the statsmodels library and becoming adept at designing, implementing, and calibrating your models. By understanding and applying these statistical models, you enhance your data science skills and effectively tackle financial challenges. What You Will Learn Understand the fundamentals of linear regression and its applications in financial data analysis and prediction Apply generalized linear models for handling various types of data distributions and enhancing model flexibility Gain insights into regime switching models to capture different market conditions and improve financial forecasting Benchmark machine learning models against traditional statistical methods to ensure robustness and reliability in financial applications Who This Book Is For Data scientists, machine learning engineers, finance professionals, and software engineers

talk-data.com

Activity Trend

Top Events

Top Speakers

Effective Data Analysis

Hands-On APIs for AI and Data Science

150: 9 Huge LIES About Becoming a Data Analyst Nobody Talks About

#81 AI Code Assistants: The Good, The Bad & The Overhyped, plus Python’s UV Glow-Up & Postman’s Existential Crisis

The Future of Data Engineering: AI, LLMs, and Automation

Fine-tuning a Pre-trained Language Model

From scratch: Building a Language Model

Breakout Session 1 - Telling a Good Story with Data

Evolving Responsibilities in AI Data Management

Learning LangChain

#79 The $6 AI Model? France’s $85B Bet, DeepSeek's Censorship & The Python Upgrades You Need

AI Agents in Action

Machine Learning for Tabular Data

An in-depth look at Data Prep Kit

The Well-Grounded Data Analyst

Scaling Semantic Segmentation with Blender

Getting Started with Docling and Data Prep Kit

Causal Inference for Data Science

Machine Learning Algorithms in Depth

Statistical Quantitative Methods in Finance: From Theory to Quantitative Portfolio Management