talk-data.com

Topic: Python

Tags: programming_language, data_science, web_development

1446 tagged activities

Activity Trend: peak of 185 activities per quarter (2020-Q1 to 2026-Q1)

Activities: 1446 activities · Newest first

My guest in this episode is Coert du Plessis, an impressive data and analytics executive, entrepreneur, and general lover of life. Coert shares the wealth of knowledge and experience he has gained through a career and life full of interesting twists and turns. In this wide-ranging conversation, we talk about:

- Coert’s journey from South African farmland to Australian boardrooms
- How Coert became the CEO of MaxMine
- Why our ability to tackle climate change depends on the mining industry
- How to build and sell successful data products
- Coert’s approach to building a fulfilling and rewarding career in data and analytics
- The importance of taking risks and running life experiments, and much more

Coert on LinkedIn: https://www.linkedin.com/in/coertdup/
My new book, 'Data-Centric Machine Learning with Python': https://www.packtpub.com/product/data-centric-machine-learning-with-python/9781804618127

A hands-on session on building and evaluating generative AI solutions with LLMs responsibly at scale. Learn to create visual executable flows linking LLMs, vector embeddings, prompts, and Python tools, and to evaluate performance metrics and responsible AI issues such as groundedness, hallucinations, and relevance. Prerequisites: a basic understanding of Python.
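As a taste of what such a flow involves, here is a minimal, self-contained sketch that links a prompt template to an LLM call and scores the answer for groundedness. The `call_llm` stub and the token-overlap heuristic are illustrative assumptions, not the session's actual tooling; real evaluations typically use model-based judges.

```python
# A minimal sketch of a flow: prompt template -> LLM call -> groundedness check.
# `call_llm` is a hypothetical stand-in for a real model client.

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real client (OpenAI, Azure OpenAI, etc.).
    return "Paris is the capital of France."

def build_prompt(question: str, context: str) -> str:
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

def groundedness(answer: str, context: str) -> float:
    # Crude heuristic: fraction of answer tokens that appear in the context.
    # Production evaluations use model-based judges, but the idea is the same.
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    return len(answer_tokens & context_tokens) / max(len(answer_tokens), 1)

context = "France's capital is Paris, a city of about two million people."
answer = call_llm(build_prompt("What is the capital of France?", context))
print(answer, f"groundedness={groundedness(answer, context):.2f}")
```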

An online workshop covering the basics of Python programming and data analysis with Jupyter Notebook and Pandas, using Airbnb datasets. The workshop covers Python basics, Pandas basics, data visualization, and how to source and analyze Airbnb datasets. Access to Le Wagon's learning platform is included at the end of the course.
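For a flavor of the workshop's material, here is a minimal Pandas sketch assuming a local Inside Airbnb listings export; the filename and column names are typical of those exports but may vary by city and snapshot.

```python
# A minimal sketch of the workshop's workflow: load an Airbnb listings
# export and compute a first summary. "listings.csv" is a stand-in filename.
import pandas as pd

listings = pd.read_csv("listings.csv")

# Basic inspection, as covered in the Pandas portion of the workshop
print(listings.shape)
print(listings[["neighbourhood", "room_type", "price"]].head())

# Average price per neighbourhood, a typical first analysis.
# Note: in some exports, price is a string like "$1,000.00" and needs cleaning.
avg_price = (
    listings.groupby("neighbourhood")["price"]
    .mean()
    .sort_values(ascending=False)
)
print(avg_price.head(10))
```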

Cracking the Data Science Interview

"Cracking the Data Science Interview" is your ultimate resource for preparing for roles in the competitive field of data science. With this book, you'll explore essential topics such as Python, SQL, statistics, and machine learning, as well as learn practical skills for building portfolios and acing interviews. Follow its guidance and you'll be equipped to stand out in any data science interview. What this Book will help me do Confidently explain complex statistical and machine learning concepts. Develop models and deploy them while ensuring version control and efficiency. Learn and apply scripting skills in shell and Bash for productivity. Master Git workflows to handle collaborative coding in projects. Perfectly tailor portfolios and resumes to land data science opportunities. Author(s) Leondra R. Gonzalez, with years of data science and mentorship experience, co-authors this book with None Stubberfield, a seasoned expert in technology and machine learning. Together, they integrate their expertise to provide practical advice for navigating the data science job market. Who is it for? If you're preparing for data science interviews, this book is for you. It's ideal for candidates with a foundational knowledge of Python, SQL, and statistics looking to refine and expand their technical and professional skills. Professionals transitioning into data science will also find it invaluable for building confidence and succeeding in this rewarding field.

A fireside chat between Hugo and Simon Willison exploring LLMs, GenAI, and democratizing data tools. They discuss what LLMs are capable of, the evolving ecosystem, running LLMs locally, and how Unix philosophy, Python, and LLMs can be combined into a productivity toolkit. Includes a live coding intro to Simon’s LLM CLI utility and Python library.
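For context, here is a short sketch of the kind of usage Simon demonstrates with his llm Python library (pip install llm). It assumes an OpenAI API key is configured; the model name shown is one common choice and depends on your installed plugins.

```python
# A minimal sketch of the llm library's Python API, assuming an API key
# is configured (e.g., via `llm keys set openai`).
import llm

model = llm.get_model("gpt-4o-mini")
response = model.prompt("Summarize the Unix philosophy in one sentence.")
print(response.text())
```

The CLI equivalent (llm "your prompt here") composes naturally with pipes and other command-line tools, which is where the Unix-philosophy angle of the conversation comes in.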

Kyle Gallatin is currently a Senior Machine Learning (ML) Engineer at Handshake. A former Data Scientist and Software Engineer, Kyle has extensive experience engineering data and ML model features, building ML models and ML pipelines, and deploying ML models to production. Kyle is also the author of the O'Reilly report The Framework for ML Governance, a co-author of O'Reilly's Machine Learning with Python Cookbook, an instructor at New York City Data Science Academy, and frequently publishes on other ML topics across multiple publications.

This session will be in Spanish. About this session: In this final talk of the series, we will introduce the Microsoft Azure Quantum Development Kit (QDK), set up the programming environment with VS Code, explore that environment, create an Azure Quantum workspace, and write our first quantum programs from base examples. Q# and its integration with Python and Qiskit will be presented, along with the Quantum Katas for continuing the learning path.
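As a flavor of the kind of first quantum program the session builds toward, here is a minimal Bell-state circuit in Qiskit (an illustrative example, not the session's exact material; the Q# equivalents are covered in the talk).

```python
# A minimal Qiskit sketch: prepare a Bell state and measure both qubits.
from qiskit import QuantumCircuit

qc = QuantumCircuit(2, 2)
qc.h(0)            # put qubit 0 into superposition
qc.cx(0, 1)        # entangle qubit 1 with qubit 0
qc.measure([0, 1], [0, 1])

print(qc)          # print an ASCII diagram of the circuit
```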

Send us a text

Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style!

In episode #37, titled "Text2Video with Sora and Blazing-Fast Python Packaging," we're joined by guests Vitala Sparacello and Lukas Valatka to navigate the latest tech frontiers. Here's what's on the agenda:

- AMD's Open-Source Endeavor: A peek into AMD's quietly funded drop-in CUDA implementation built on ROCm.
- Chat with RTX: Nvidia's local AI chatbot promises a revolution in PC interactions.
- uv, Python's Speedy Installer: Exploring why uv's lightning-fast package installation is a game-changer for the Python community.
- SORA Unveiled: A dive into OpenAI's groundbreaking text-to-video model and what it means for content creation.
- GPT-5's Leap: Insights into how GPT-5 is outperforming its predecessors "across the board."
- Google's Gemini 1.5: A look at how Google's latest model scales up to 1 million tokens.
- Apple's AI Ambitions: Rumors of AI updates to Spotlight and Xcode could reshape developer experiences.
- The AI Copyright Conundrum: Discussing how copyright lawsuits could potentially upend the AI industry.

Intro music courtesy of fesliyanstudios.com.

Web Scraping with Python, 3rd Edition

If programming is magic, then web scraping is surely a form of wizardry. By writing a simple automated program, you can query web servers, request data, and parse it to extract the information you need. This thoroughly updated third edition not only introduces you to web scraping but also serves as a comprehensive guide to scraping almost every type of data from the modern web. Part I focuses on web scraping mechanics: using Python to request information from a web server, performing basic handling of the server's response, and interacting with sites in an automated fashion. Part II explores a variety of more specific tools and applications to fit any web scraping scenario you're likely to encounter.

- Parse complicated HTML pages
- Develop crawlers with the Scrapy framework
- Learn methods to store the data you scrape
- Read and extract data from documents
- Clean and normalize badly formatted data
- Read and write natural languages
- Crawl through forms and logins
- Scrape JavaScript and crawl through APIs
- Use and write image-to-text software
- Avoid scraping traps and bot blockers
- Use scrapers to test your website
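To illustrate the Part I mechanics the book opens with, here is a minimal request-and-parse sketch using Requests and BeautifulSoup; the URL is a stand-in.

```python
# A minimal sketch of web scraping mechanics: request a page and parse
# its HTML. Requires `pip install requests beautifulsoup4`.
import requests
from bs4 import BeautifulSoup

response = requests.get("https://example.com")
response.raise_for_status()  # fail loudly on HTTP errors

soup = BeautifulSoup(response.text, "html.parser")

# Extract the page title and every link on the page
print(soup.title.string)
for link in soup.find_all("a"):
    print(link.get("href"))
```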

Summary

Sharing data is a simple concept, but complicated to implement well. There are numerous business rules and regulatory concerns that need to be applied. There are also numerous technical considerations to be made, particularly if the producer and consumer of the data aren't using the same platforms. In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data lakes are notoriously complex. For data engineers who battle to build and scale high-quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs, ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and DoorDash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake, and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free!

Your host is Tobias Macey, and today I'm interviewing Andy Jefferson about how to solve the problem of data sharing.

Interview

Introduction

- How did you get involved in the area of data management?
- Can you start by giving some context and scope of what we mean by "data sharing" for the purposes of this conversation?
- What is the current state of the ecosystem for data sharing protocols/practices/platforms?
- What are some of the main challenges/shortcomings that teams/organizations experience with these options?
- What are the technical capabilities that need to be present for an effective data sharing solution?
- How does that change as a function of the type of data? (e.g. tabular, image, etc.)
- What are the requirements around governance and auditability of data access that need to be addressed when sharing data?
- What are the typical boundaries along which data access requires special consideration for how the sharing is managed?
- Many data platform vendors have their own interfaces for data sharing. What are the shortcomings of those options, and what are the opportunities for abstracting the sharing capability from the underlying platform?
- What are the most interesting, innovative, or unexpected ways that you have seen data sharing/Bobsled used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on data sharing?
- When is Bobsled the wrong choice?
- What do you have planned for the future of data sharing?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.

Avery talks with Michael Kennedy about the many ways Python is used.

Michael hosts the Talk Python to Me podcast and is a Python expert; he explains how experts use Python across a variety of fields.

The episode also offers advice for beginners who want to learn and use Python, including how to choose an IDE and why to focus on projects.

Connect with Michael Kennedy

🤝 Connect on Linkedin

🐤 Follow on Twitter (X)

Ⓜ️ Follow on Fosstodon

🐍 Learn About TalkPython Podcast

🤝 Ace your data analyst interview with the interview simulator

📩 Get my weekly email with helpful data career tips

📊 Come to my next free “How to Land Your First Data Job” training

🏫 Check out my 10-week data analytics bootcamp

Timestamps:

(11:18) - Python vs Other Programming Languages
(17:15) - The Future of Python and Its Applications
(32:06) - How the Rock Band Weezer Uses Python

Connect with Avery:

📺 Subscribe on YouTube

🎙Listen to My Podcast

👔 Connect with me on LinkedIn

📸 Instagram

🎵 TikTok

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://DataCareerJumpstart.com/daa

Learn Python the Hard Way: A Deceptively Simple Introduction to the Terrifyingly Beautiful World of Computers and Data Science, 5th Edition

You Will Learn Python! Zed Shaw has created the world's most reliable system for learning Python. Follow it and you will succeed--just like the millions of beginners Zed has taught to date! You bring the discipline, persistence, and attention; the author supplies the masterful knowledge you need to succeed.

In Learn Python the Hard Way, Fifth Edition, you'll learn Python by working through 60 lovingly crafted exercises. Read them. Type in the code. Run it. Fix your mistakes. Repeat. As you do, you'll learn how a computer works, how to solve problems, and how to enjoy programming . . . even when it's driving you crazy.

- Install a complete Python environment
- Organize and write code
- Fix and break code
- Basic mathematics
- Strings and text
- Interact with users
- Work with files
- Looping and logic
- Object-oriented programming
- Data structures using lists and dictionaries
- Modules, classes, and objects
- Python packaging
- Automated testing
- Basic SQL for Data Science
- Web scraping
- Fixing bad data (munging)
- The "Data" part of "Data Science"

It'll be frustrating at first. But if you keep trying, you'll get it--and it'll feel amazing! This course will reward you for every minute you put into it. Soon, you'll know one of the world's most powerful, popular programming languages. You'll be a Python programmer.

This book is perfect for:
- Total beginners with zero programming experience
- Junior developers who know one or two languages
- Returning professionals who haven't written code in years
- Aspiring data scientists or academics who need to learn to code
- Seasoned professionals looking for a fast, simple crash course in Python for data science

Register your book for convenient access to downloads, updates, and/or corrections as they become available. See inside book for details.
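In the spirit of the book's type-it-in exercises (this example is ours, not Zed's), a few of the topics listed above, strings, looping, and dictionaries, fit in a handful of lines:

```python
# A tiny exercise-style program: a dictionary, a loop, and string formatting.
inventory = {"apples": 3, "bananas": 5}

for fruit, count in inventory.items():
    print(f"We have {count} {fruit}.")

total = sum(inventory.values())
print(f"Total pieces of fruit: {total}")
```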

Hands-On Entity Resolution

Entity resolution is a key analytic technique that enables you to identify multiple data records that refer to the same real-world entity. With this hands-on guide, product managers, data analysts, and data scientists will learn how to add value to data by cleansing, analyzing, and resolving datasets using open source Python libraries and cloud APIs. Author Michael Shearer shows you how to scale up your data matching processes and improve the accuracy of your reconciliations. You'll be able to remove duplicate entries within a single source and join disparate data sources together when common keys aren't available. Using real-world data examples, this book helps you gain practical understanding to accelerate the delivery of real business value. With entity resolution, you'll build rich and comprehensive data assets that reveal relationships for marketing and risk management purposes, key to harnessing the full potential of ML and AI.

This book covers:
- Challenges in deduplicating and joining datasets
- Extracting, cleansing, and preparing datasets for matching
- Text matching algorithms to identify equivalent entities
- Techniques for deduplicating and joining datasets at scale
- Matching datasets containing persons and organizations
- Evaluating data matches
- Optimizing and tuning data matching algorithms
- Entity resolution using cloud APIs
- Matching using privacy-enhancing technologies
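To make the core text-matching idea concrete, here is a toy sketch using only the Python standard library. The sample records and the 0.6 similarity threshold are illustrative assumptions; production entity resolution combines multiple fields and far more sophisticated matching, as the book describes.

```python
# A toy illustration of text matching for deduplication: score name pairs
# and flag likely duplicates above a similarity threshold.
from difflib import SequenceMatcher

records = [
    ("Acme Corp.", "123 Main St"),
    ("ACME Corporation", "123 Main Street"),
    ("Globex Inc.", "9 Elm Ave"),
]

def similarity(a: str, b: str) -> float:
    # Normalize case before comparing character sequences
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Compare every pair of record names (real systems also match addresses etc.)
for i in range(len(records)):
    for j in range(i + 1, len(records)):
        score = similarity(records[i][0], records[j][0])
        if score > 0.6:  # threshold chosen for this toy example
            print(f"Possible match: {records[i][0]!r} ~ {records[j][0]!r} ({score:.2f})")
```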

A 90-minute hands-on workshop on the FiftyOne computer vision toolset. Part 1 covers FiftyOne Basics (terms, architecture, installation, and general usage), an overview of useful workflows to explore, understand, and curate data, and how FiftyOne represents and semantically slices unstructured computer vision data. Part 2 is a hands-on introduction to FiftyOne where you load datasets from the FiftyOne Dataset Zoo, navigate the FiftyOne App, programmatically inspect attributes of a dataset, add new samples and custom attributes, generate and evaluate model predictions, and save insightful views into the data. Prerequisites: working knowledge of Python and basic computer vision.
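For reference, the Part 2 workflow looks roughly like this in code (pip install fiftyone); the "quickstart" dataset is a small Dataset Zoo dataset commonly used in FiftyOne tutorials.

```python
# A minimal sketch of the Part 2 workflow: load a zoo dataset, inspect it
# programmatically, and open the FiftyOne App.
import fiftyone as fo
import fiftyone.zoo as foz

# "quickstart" is a small zoo dataset with images and model predictions
dataset = foz.load_zoo_dataset("quickstart")
print(dataset)

# Programmatically inspect a sample's attributes
sample = dataset.first()
print(sample.filepath, sample.ground_truth)

# Launch the FiftyOne App to browse and curate the data visually
session = fo.launch_app(dataset)
session.wait()
```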