Hands-on Python masterclass for ages 12-18 featuring the Emoji Master Challenge. Students progress through levels including Level 3 (display a rose emoji 10 times) and Level 5 (conceal a superhero with emojis), starting with a Python introduction and ending with a superhero reveal. The course covers practical Python skills, interactive games with PyGame, and graphics with Turtle, with an emphasis on problem solving and collaboration.
Top Events
In this episode, we talk with Orell about his journey from electrical engineering to freelancing in data engineering, exploring lessons from startup life, working with messy industrial data, the realities of freelancing, and how to stay up to date with new tools.
Topics covered:
- Why Orell left a PhD and a simulation-focused start-up after Covid hit
- What he learned trying (and failing) to commercialise medical-imaging simulations
- The first freelance project and the long, quiet months that followed
- How he now finds clients, keeps projects small, and delivers value quickly
- Typical work he does for industrial companies: parsing messy machine logs, building simple pipelines, adding structure later
- Favorite everyday tools (Python, DuckDB, a bit of C++) and the habit of blocking time for learning
- Advice for anyone thinking about freelancing: cash runway, networking, and focusing on problems rather than "perfect" tech choices
A practical conversation for listeners who are curious about moving from research or permanent roles into freelance data engineering.
🕒 TIMECODES
0:00 Orell's career and move to freelancing
9:04 Startup experience and data engineering lessons
16:05 Academia vs. startups and starting freelancing
25:33 Early freelancing challenges and networking
34:22 Freelance data engineering and messy industrial data
43:27 Staying practical, learning tools, and growth
50:33 Freelancing challenges and client acquisition
58:37 Tools, problem-solving, and manual work
🔗 CONNECT WITH ORELL
Bluesky - https://bsky.app/profile/orgarten.bsk...
LinkedIn - / ogarten
Github - https://github.com/orgarten
Website - https://orellgarten.com
🔗 CONNECT WITH DataTalksClub
Join the community - https://datatalks.club/slack.html
Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/...
Check other upcoming events - https://lu.ma/dtc-events
GitHub: https://github.com/DataTalksClub
LinkedIn - / datatalks-club
Twitter - / datatalksclub
Website - https://datatalks.club/
🔗 CONNECT WITH ALEXEY
Twitter - / al_grigor
Linkedin - / agrigorev
This will be a walkthrough of the modern data platform: the capabilities it provides, the challenges Python solves within it, and the role that other tools like Airflow and dbt play.
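To make that concrete, here is a minimal sketch of how two of those pieces might fit together: an Airflow DAG that runs an extract step and then a dbt build. The DAG id, schedule, and project path are illustrative assumptions, not details from the talk.

```python
# Minimal sketch (assumed names/paths): an Airflow DAG that runs a dbt build
# after an extract step, illustrating how the two tools fit together.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder for pulling raw data into the warehouse.
    print("extracting source data...")


with DAG(
    dag_id="modern_data_platform_demo",  # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    dbt_build = BashOperator(
        task_id="dbt_build",
        bash_command="dbt build --project-dir /opt/dbt/project",  # assumed path
    )
    extract_task >> dbt_build
```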
A hands-on coding session developing the Capital Asset Pricing Model (CAPM) using Python.
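For a taste of the material, a minimal sketch of the computation such a session might build: estimate beta from return history, then apply E(R) = Rf + beta * (E(Rm) - Rf). All numbers are fabricated.

```python
# Minimal CAPM sketch: estimate beta from historical returns, then compute
# the expected return E(R) = Rf + beta * (E(Rm) - Rf).
import numpy as np

# Hypothetical daily returns for one stock and the market index.
stock_returns = np.array([0.012, -0.004, 0.008, 0.015, -0.007])
market_returns = np.array([0.010, -0.002, 0.006, 0.011, -0.005])

# Beta = Cov(stock, market) / Var(market); use ddof=1 to match np.cov.
beta = np.cov(stock_returns, market_returns)[0, 1] / np.var(market_returns, ddof=1)

risk_free_rate = 0.03          # assumed annual risk-free rate
expected_market_return = 0.08  # assumed annual market return

expected_return = risk_free_rate + beta * (expected_market_return - risk_free_rate)
print(f"beta = {beta:.2f}, expected return = {expected_return:.2%}")
```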
Put statistics into practice with Python! Data-driven decisions rely on statistics. Statistics Every Programmer Needs introduces the statistical and quantitative methods that will help you go beyond “gut feeling” for tasks like predicting stock prices or assessing quality control, with examples using the rich tools of the Python ecosystem.

Statistics Every Programmer Needs will teach you how to:
- Apply foundational and advanced statistical techniques
- Build predictive models and simulations
- Optimize decisions under constraints
- Interpret and validate results with statistical rigor
- Implement quantitative methods using Python

In this hands-on guide, stats expert Gary Sutton blends the theory behind these statistical techniques with practical Python-based applications, offering structured, reproducible, and defensible methods for tackling complex decisions. Well-annotated and reusable Python code listings illustrate each method, with examples you can follow to practice your new skills.

About the Technology
Whether you’re analyzing application performance metrics, creating relevant dashboards and reports, or immersing yourself in a numbers-heavy coding project, every programmer needs to know how to turn raw data into actionable insight. Statistics and quantitative analysis are the essential tools every programmer needs to clarify uncertainty, optimize outcomes, and make informed choices.

About the Book
Statistics Every Programmer Needs teaches you how to apply statistics to the everyday problems you’ll face as a software developer. Each chapter is a new tutorial. You’ll predict ultramarathon times using linear regression, forecast stock prices with time series models, analyze system reliability using Markov chains, and much more. The book emphasizes a balance between theory and hands-on Python implementation, with annotated code and real-world examples to ensure practical understanding and adaptability across industries.

What's Inside
- Probability basics and distributions
- Random variables
- Regression
- Decision trees and random forests
- Time series analysis
- Linear programming
- Monte Carlo and Markov methods
- and much more

About the Reader
Examples are in Python.

About the Author
Gary Sutton is a business intelligence and analytics leader and the author of Statistics Slam Dunk: Statistical analysis with R on real NBA data.

Quotes
“A well-organized tour of the statistical, machine learning and optimization tools every data science programmer needs.” - Peter Bruce, Author of Statistics for Data Science and Analytics
“Turns statistics from a stumbling block into a superpower. Clear, relevant, and written with a coder’s mindset!” - Mahima Bansod, LogicMonitor
“Essential! Stats and modeling with an emphasis on real-world system design.” - Anupam Samanta, Google
“A great blend of theory and practice.” - Ariel Andres, Scotia Global Asset Management
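In the spirit of the book's regression tutorial, a minimal sketch of predicting an ultramarathon finish time with scikit-learn; the data is fabricated for illustration and is not from the book.

```python
# Toy linear-regression sketch (fabricated data): predict an ultramarathon
# finish time (hours) from weekly training kilometers.
import numpy as np
from sklearn.linear_model import LinearRegression

weekly_km = np.array([[40], [55], [70], [85], [100]])
finish_hours = np.array([14.5, 13.2, 12.1, 11.4, 10.8])

model = LinearRegression().fit(weekly_km, finish_hours)
print(f"slope = {model.coef_[0]:.3f} h/km, intercept = {model.intercept_:.2f} h")
print(f"predicted finish at 90 km/week: {model.predict([[90]])[0]:.1f} h")
```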
Summary
In this episode of the Data Engineering Podcast, Akshay Agrawal from Marimo discusses the innovative new Python notebook environment, which offers a reactive execution model, full Python integration, and built-in UI elements to enhance the interactive computing experience. He discusses the challenges of traditional Jupyter notebooks, such as hidden state and lack of interactivity, and how Marimo addresses these issues with features like reactive execution and Python-native file formats. Akshay also explores the broader landscape of programmatic notebooks, comparing Marimo to other tools like Jupyter, Streamlit, and Hex, and highlighting its unique approach of creating data apps directly from notebooks, eliminating the need for separate app development. The conversation delves into the technical architecture of Marimo, its community-driven development, and future plans, including a commercial offering and enhanced AI integration, emphasizing Marimo's role in bridging the gap between data exploration and production-ready applications. (A minimal notebook sketch follows the show notes below.)
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Tired of data migrations that drag on for months or even years? What if I told you there's a way to cut that timeline by up to 6x while guaranteeing accuracy? Datafold's Migration Agent is the only AI-powered solution that doesn't just translate your code; it validates every single data point to ensure perfect parity between your old and new systems. Whether you're moving from Oracle to Snowflake, migrating stored procedures to dbt, or handling complex multi-system migrations, they deliver production-ready code with a guaranteed timeline and fixed price. Stop burning budget on endless consulting hours. Visit dataengineeringpodcast.com/datafold to book a demo and see how they're turning months-long migration nightmares into week-long success stories.
Your host is Tobias Macey and today I'm interviewing Akshay Agrawal about Marimo, a reusable and reproducible Python notebook environment.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Marimo is and the story behind it?
- What are the core problems and use cases that you are focused on addressing with Marimo?
- What are you explicitly not trying to solve for with Marimo?
- Programmatic notebooks have been around for decades now. Jupyter was largely responsible for making them popular outside of academia. How have the applications of notebooks changed in recent years?
- What are the limitations that have been most challenging to address in production contexts?
- Jupyter has long had support for multi-language notebooks/notebook kernels. What is your opinion on the utility of that feature as a core concern of the notebook system?
- Beyond notebooks, Streamlit and Hex have become quite popular for publishing the results of notebook-style analysis. How would you characterize the feature set of Marimo for those use cases?
- For a typical data team that is working across data pipelines, business analytics, ML/AI engineering, etc., how do you see Marimo applied within and across those contexts?
- One of the common difficulties with notebooks is that they are largely a single-player experience. They may connect into a shared compute cluster for scaling up execution (e.g. Ray, Dask, etc.). How does Marimo address the situation where a data platform team wants to offer notebooks as a service to reduce the friction to getting started with analyzing data in a warehouse/lakehouse context?
- How are you seeing teams integrate Marimo with orchestrators (e.g. Dagster, Airflow, Prefect)?
- What are some of the most interesting or complex engineering challenges that you have had to address while building and evolving Marimo?
- What are the most interesting, innovative, or unexpected ways that you have seen Marimo used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Marimo?
- When is Marimo the wrong choice?
- What do you have planned for the future of Marimo?

Contact Info
- LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used.
The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- Marimo
- Jupyter
- IPython
- Streamlit
- Podcast.init Episode
- Vector Embeddings
- Dimensionality Reduction
- Kaggle
- Pytest
- PEP 723 script dependency metadata
- MATLAB
- VisiCalc
- Mathematica
- RMarkdown
- RShiny
- Elixir Livebook
- Databricks Notebooks
- Papermill
- Pluto - Julia Notebook
- Hex
- Directed Acyclic Graph (DAG)
- Sumble - Kaggle founder Anthony Goldbloom's startup
- Ray
- Dask
- Jupytext
- nbdev
- DuckDB (Podcast Episode)
- Iceberg
- Superset
- jupyter-marimo-proxy
- JupyterHub
- Binder
- Nix
- AnyWidget
- Jupyter Widgets
- Matplotlib
- Altair
- Plotly
- DataFusion
- Polars
- MotherDuck

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
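To ground the discussion of Marimo's Python-native file format and reactive execution model, a rough sketch of what a marimo notebook file looks like: each cell is a function, and the names a cell defines and consumes give marimo the dependency graph it uses to re-run only affected cells. Cell contents are illustrative, and the exact generated-file boilerplate varies by marimo version.

```python
# Sketch of a marimo notebook file: cells are functions, and marimo infers
# the dependency DAG from the names each cell defines and consumes, so
# moving the slider reactively re-runs only the dependent cell.
import marimo

app = marimo.App()


@app.cell
def _():
    import marimo as mo
    return (mo,)


@app.cell
def _(mo):
    slider = mo.ui.slider(start=1, stop=10, label="x")
    slider
    return (slider,)


@app.cell
def _(slider):
    result = slider.value ** 2  # re-computed whenever the slider changes
    print(result)
    return


if __name__ == "__main__":
    app.run()
```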
Abstract: In this talk we will explore the world of imaging in digital pathology and discover gigapixel images and how to look at them. We will learn how deep learning can help predict cancer recurrence in the case of prostate cancer and how models can help pathologists discover new biomarkers.
BYOP session: bring your own project and work with fellow Python and data professionals and enthusiasts.
Building production-ready ML systems is rarely straightforward—especially when predictions must be triggered by real-world events in near real time. In this talk, I’ll walk through how FastAPI and Pydantic can be used to architect an event-driven ML system, where model workflows are orchestrated using message queues and jobs vary in latency and compute requirements. The goal is to show how Python developers can move fast while maintaining control over validation, orchestration, and deployment in complex ML architectures.
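A minimal sketch of the pattern described above, with an in-process asyncio.Queue standing in for a real message broker; the endpoint, event schema, and scoring stub are illustrative assumptions, not the speaker's actual stack.

```python
# Sketch: validate an incoming event with Pydantic, then enqueue it for an
# ML worker instead of running inference inside the request cycle.
import asyncio

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
job_queue: asyncio.Queue = asyncio.Queue()  # stand-in for a real message queue


class PredictionEvent(BaseModel):
    user_id: int
    features: list[float]


@app.post("/events")
async def ingest(event: PredictionEvent):
    await job_queue.put(event)  # hand off; a worker picks this up later
    return {"status": "queued"}


async def worker():
    while True:
        event = await job_queue.get()
        # Placeholder for the actual model call, which may vary in latency
        # and compute requirements per job type.
        score = sum(event.features) / max(len(event.features), 1)
        print(f"user={event.user_id} score={score:.3f}")
        job_queue.task_done()


@app.on_event("startup")
async def start_worker():
    asyncio.create_task(worker())
```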
Summary
In this episode of the Data Engineering Podcast, Dan Sotolongo from Snowflake talks about the complexities of incremental data processing in warehouse environments. Dan discusses the challenges of handling continuously evolving datasets and the importance of incremental data processing for optimized resource use and reduced latency. He explains how delayed view semantics can address these challenges by maintaining up-to-date results with minimal work, leveraging Snowflake's dynamic tables feature. The conversation also explores the broader landscape of data processing, comparing batch and streaming systems, and highlights the trade-offs between them. Dan emphasizes the need for a unified theoretical framework to discuss semantic guarantees in data pipelines and introduces the concept of delayed view semantics, touching on the limitations of current systems and the potential of dynamic tables to simplify complex data workflows. (A small dynamic-table sketch follows the show notes below.)
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
Your host is Tobias Macey and today I'm interviewing Dan Sotolongo about the challenges of incremental data processing in warehouse environments and how delayed view semantics help to address the problem.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by defining the scope of the term "incremental data processing"?
- What are some of the common solutions that data engineers build when creating workflows to implement that pattern?
- What are some common difficulties that they encounter in the pursuit of incremental data?
- Can you describe what delayed view semantics are and the story behind it?
- What are the problems that DVS explicitly doesn't address?
- How does the approach that you have taken in delayed view semantics compare to systems like Materialize, Feldera, etc.?
- Can you describe the technical architecture of the implementation of Dynamic Tables?
- What are the elements of the problem that are as-yet unsolved?
- How has the implementation changed/evolved as you learned more about the solution space?
- What would be involved in implementing the delayed view semantics pattern in other DBMS engines?
- For someone who wants to use DVS/Dynamic Tables for managing their incremental data loads, what does the workflow look like?
- What are the options for being able to apply tests/validation logic to a dynamic table while it is operating?
- What are the most interesting, innovative, or unexpected ways that you have seen Dynamic Tables used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Dynamic Tables/Delayed View Semantics?
- When are Dynamic Tables/DVS the wrong choice?
- What do you have planned for the future of Dynamic Tables?

Contact Info
- LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- Delayed View Semantics: Presentation Slides
- Snowflake
- NumPy
- IPython
- Jupyter
- Flink
- Spark Streaming
- Kafka
- Snowflake Dynamic Tables
- Airflow
- Dagster
- Streaming Watermarks
- Materialize
- Feldera
- ACID
- CAP Theorem
- Linearizability
- Serializable Consistency
- SIGMOD
- Materialized Views
- dbt
- Data Vault
- Apache Iceberg
- Databricks Delta
- Hudi
- Dead Letter Queue
- pg_ivm
- Property Based Testing
- Iceberg V3 Row Lineage
- Prometheus

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
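To make the dynamic-tables idea concrete, a hedged sketch using the Snowflake Python connector to issue the DDL for a dynamic table with a declared freshness target; the connection parameters, table names, and query are placeholders, not details from the episode.

```python
# Sketch: create a Snowflake dynamic table that incrementally maintains an
# aggregate with a declared freshness target (TARGET_LAG), instead of a
# hand-built incremental pipeline. All identifiers are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",  # placeholder credentials
    user="my_user",
    password="...",
    warehouse="MY_WH",
    database="MY_DB",
    schema="PUBLIC",
)

conn.cursor().execute(
    """
    CREATE OR REPLACE DYNAMIC TABLE daily_orders
      TARGET_LAG = '15 minutes'
      WAREHOUSE = MY_WH
      AS
        SELECT order_date, COUNT(*) AS orders, SUM(amount) AS revenue
        FROM raw_orders
        GROUP BY order_date
    """
)
```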
When working out, deciding what workout to do next to achieve a specific goal can be challenging. This talk will guide you through creating a four-week workout plan as a Notion page with to-do list items tailored to your workout history, helping you run a faster 5K.
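As a rough sketch of the core step, creating a Notion page of to-do items with the official notion-client package; the token, parent page id, and workouts are placeholders, and the plan-generation logic is omitted.

```python
# Sketch: create a Notion page containing to-do blocks for week one of a
# workout plan, using the official notion-client SDK. Token, page id, and
# workouts are illustrative placeholders.
from notion_client import Client

notion = Client(auth="secret_...")  # placeholder integration token

week_one = ["Easy run 5 km", "Interval session 6x400m", "Long run 10 km"]

notion.pages.create(
    parent={"page_id": "your-parent-page-id"},  # placeholder
    properties={
        "title": [{"type": "text", "text": {"content": "4-Week 5K Plan - Week 1"}}]
    },
    children=[
        {
            "object": "block",
            "type": "to_do",
            "to_do": {
                "rich_text": [{"type": "text", "text": {"content": workout}}],
                "checked": False,
            },
        }
        for workout in week_one
    ],
)
```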
Most Python scripts in data science are synchronous — fetching one record at a time, waiting for APIs, or slowly scraping websites. In this talk, we’ll introduce Python’s asyncio ecosystem and show how it transforms IO-heavy data workflows. You'll see how httpx, aiofiles, and async constructs speed up tasks like web scraping and batch API calls. We’ll compare async vs threading, walk through a real-world case study, and wrap with performance benchmarks that demonstrate async's value.
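As a preview, a minimal sketch of the batch-API pattern with httpx.AsyncClient and asyncio.gather; the URLs are placeholders.

```python
# Sketch: fetch many URLs concurrently with httpx + asyncio, instead of one
# blocking request at a time. URLs are placeholders.
import asyncio

import httpx

URLS = [f"https://api.example.com/items/{i}" for i in range(20)]


async def fetch(client: httpx.AsyncClient, url: str) -> int:
    resp = await client.get(url)
    return resp.status_code


async def main() -> None:
    async with httpx.AsyncClient(timeout=10.0) as client:
        statuses = await asyncio.gather(*(fetch(client, u) for u in URLS))
    print(statuses)


asyncio.run(main())
```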
Hands-on workshop for aspiring programmers aged 12-18. The Emoji Master Challenge guides students through levels—e.g., Level 3 displays a rose emoji 10 times; Level 5 conceals a superhero behind emojis. The session begins with a Python introduction and ends with a reveal of the superheroes created by the students. The full course expands to practical Python skills and projects using PyGame, Turtle, and PyQt.
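For instance, Level 3 reduces to one or two lines of Python; a minimal sketch:

```python
# Level 3 of the Emoji Master Challenge: display a rose emoji 10 times.
print("🌹" * 10)      # string repetition: ten roses on one line

for _ in range(10):   # or a beginner-friendly loop: one rose per line
    print("🌹")
```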
In this episode, Conor and Bryce chat about language learning apps, recent C++/CUDA/Python meetups and more!
Link to Episode 243 on Website
Discuss this episode, leave a comment, or ask a question (on GitHub)
Socials
ADSP: The Podcast: Twitter
Conor Hoekstra: Twitter | BlueSky | Mastodon
Bryce Adelstein Lelbach: Twitter
Show Notes
Date Generated: 2025-07-01
Date Released: 2025-07-18
Mondly
duolingo
Babbel
ADSP Episode 213: NumPy & Summed-Area Tables
ADSP Episode 227: Re: The CUDA C++ Developer’s Toolbox
Intro Song Info
Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic
Creative Commons — Attribution 3.0 Unported — CC BY 3.0
Free Download / Stream: http://bit.ly/l-miss-you
Music promoted by Audio Library https://youtu.be/iYYxnasvfx8
An in-depth look at agentic AI and how to build agent-driven applications using the LlamaIndex framework. We'll cover core design patterns such as event-driven workflows, routing, parallelization, orchestrator–worker setups, and evaluator–optimizer loops, and show how to implement them in LlamaIndex. The talk also explores how these pieces fit together into multi-agent systems, with a focus on MCP servers and tools that provide live context to agents. By the end, you'll learn to build agents with LlamaIndex, compose multi-agent systems, design reusable tools for agents, and give your agents real-time knowledge. The session uses the open-source LlamaIndex Python framework and models from OpenAI and Anthropic.
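As a flavor of the first step, a minimal sketch of a single tool-using agent in LlamaIndex; the tool, model name, and prompt are illustrative, and exact imports can vary across llama-index versions.

```python
# Sketch: a single tool-using (ReAct) agent in LlamaIndex. The tool and
# prompt are toy examples; imports may differ by llama-index version.
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI


def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the product."""
    return a * b


llm = OpenAI(model="gpt-4o-mini")  # assumed model name
tools = [FunctionTool.from_defaults(fn=multiply)]

agent = ReActAgent.from_tools(tools, llm=llm, verbose=True)
print(agent.chat("What is 12.5 times 8?"))
```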
Workshop on building a Machine Learning model with Python.
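As a flavor of what such a workshop typically builds, a minimal supervised-learning sketch with scikit-learn; the dataset and model choice are illustrative assumptions.

```python
# Minimal ML sketch: train and evaluate a classifier on a toy dataset.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(f"accuracy: {accuracy_score(y_test, model.predict(X_test)):.2%}")
```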
Summary
In this episode of the Data Engineering Podcast, Kacper Łukawski from Qdrant talks about integrating MCP servers with vector databases to process unstructured data. Kacper shares his experience in data engineering, from building big data pipelines in the automotive industry to leveraging large language models (LLMs) for transforming unstructured datasets into valuable assets. He discusses the challenges of building data pipelines for unstructured data and how vector databases facilitate semantic search and retrieval-augmented generation (RAG) applications. Kacper delves into the intricacies of vector storage and search, including metadata and contextual elements, and explores the evolution of vector engines beyond RAG to applications like semantic search and anomaly detection. The conversation covers the role of Model Context Protocol (MCP) servers in simplifying data integration and retrieval processes, highlighting the need for experimentation and evaluation when adopting LLMs, and offering practical advice on optimizing vector search costs and fine-tuning embedding models for improved search quality. (A small vector-search sketch follows the show notes below.)
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
Your host is Tobias Macey and today I'm interviewing Kacper Łukawski about how MCP servers can be paired with vector databases to streamline processing of unstructured data.

Interview
- Introduction
- How did you get involved in the area of data management?
- LLMs are enabling the derivation of useful data assets from unstructured sources. What are the challenges that teams face in building the pipelines to support that work?
- How has the role of vector engines grown or evolved in the past ~2 years as LLMs have gained broader adoption?
- Beyond its role as a store of context for agents, RAG, etc., what other applications are common for vector databases?
- In the ecosystem of vector engines, what are the distinctive elements of Qdrant?
- How has the MCP specification simplified the work of processing unstructured data?
- Can you describe the toolchain and workflow involved in building a data pipeline that leverages an MCP for generating embeddings?
- Helping data engineers gain confidence in non-deterministic workflows
- Bringing application/ML/data teams into collaboration for determining the impact of e.g. chunking strategies, embedding model selection, etc.
- What are the most interesting, innovative, or unexpected ways that you have seen MCP and Qdrant used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on vector use cases?
- When is MCP and/or Qdrant the wrong choice?
- What do you have planned for the future of MCP with Qdrant?

Contact Info
- LinkedIn
- Twitter/X
- Personal website

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- Qdrant
- Kafka
- Apache Oozie
- Named Entity Recognition
- GraphRAG
- pgvector
- Elasticsearch
- Apache Lucene
- OpenSearch
- BM25
- Semantic Search
- MCP == Model Context Protocol
- Anthropic Contextualized Chunking
- Cohere

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
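To make the vector-database side concrete, a minimal qdrant-client sketch in in-memory mode, with tiny made-up vectors standing in for real embedding output; collection name and payloads are illustrative.

```python
# Sketch: store and search vectors with qdrant-client in in-memory mode.
# The 4-dimensional vectors are stand-ins for real embedding output.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")  # local, no server needed

client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)

client.upsert(
    collection_name="docs",
    points=[
        PointStruct(id=1, vector=[0.1, 0.9, 0.1, 0.0], payload={"text": "invoices"}),
        PointStruct(id=2, vector=[0.8, 0.1, 0.0, 0.1], payload={"text": "sensor logs"}),
    ],
)

hits = client.search(collection_name="docs", query_vector=[0.7, 0.2, 0.0, 0.1], limit=1)
print(hits[0].payload)
```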
Info session about the LLM Mini Bootcamp; join to ask questions and receive a discount coupon.