talk-data.com
Activities & events
| Title & Speakers | Event |
|---|---|
|
From Data to Deployment: Building Production AI Systems
2026-01-22 · 18:00
PyData Wolverhampton Launch Event: From Data to Deployment Join us for the inaugural PyData Wolverhampton meetup! We're bringing together data scientists, engineers, AI practitioners, and anyone interested in Python and data science. What to Expect: This event features two practical talks on building AI systems that work in production: Talk 1: "From Demos to Deployed: Building AI Systems That Work, and Work Right" Speaker: Stephen Toriola (Software & AI Engineer at Compare the Market) Explore how AI has evolved from simple demos to production systems, what makes AI work in real-world applications, and how to build responsibly. Talk 2: "Building AI Right: Ethics and Implementation in Practice" Speaker: Nazeh Abel (AI Consultant at Medallion Technologies) Practical insights on implementing AI ethically, common pitfalls to avoid, and making better decisions when building AI systems. Agenda:
What to Bring: Just yourself! No laptop or preparation needed. Bring business cards if you'd like to connect with other attendees. Food & Drinks: Free pizza and soft drinks provided. How to Find Us: University of Wolverhampton Science Park, Wolverhampton WV10 9RU By Public Transport: From Wolverhampton train station, walk 5 minutes to the bus station. Take bus 32 or 33, ride for 7 stops (approximately 12 minutes) and get off at Stafford Road. Walk 5 minutes to the Science Park. By Taxi: 7-minute drive from Wolverhampton train/bus station. By Car: Free parking available on-site. Accessibility: The venue is on the ground floor and fully accessible. We'll have PyData signage at the entrance to help you find us. Who Should Attend: Data scientists, data analysts, machine learning engineers, software developers, students, and anyone interested in Python, data science, or AI. All skill levels welcome! PyData Wolverhampton is part of the global PyData network, supported by NumFOCUS. We're building a community for data professionals in the Black Country. Follow us on LinkedIn: [PyData Wolverhampton] See you there! |
From Data to Deployment: Building Production AI Systems
|
|
AI Webinar Series (Virtual) - Evaluating AI Agent Reliability
2026-01-21 · 18:00
Important: Register on the event website to receive the joining link (RSVPs on Meetup will NOT receive the joining link). This is a virtual event for our global AI community, so please double-check your local time. Can't make it live? Register anyway to receive the webinar recording. Description: Welcome to the weekly AI Deep Dive Webinar Series. Join us for deep-dive tech talks on AI, hands-on code labs, workshops, and networking with speakers and fellow developers from all over the world. Tech Talk: Evaluating AI Agent Reliability Speakers: Anupam Datta (Snowflake) \| Josh Reini (Snowflake) Abstract: Agents often fail in ways you can’t see. They can return a final answer while taking a broken path: drifting from the goal, making irrational plan jumps, or misusing tools. Was the goal achieved efficiently? Did the plan make sense? Were the right tools used? Did the agent follow through? These hidden mistakes silently rack up compute costs, spike latency, and cause brittle behavior that collapses in production. Traditional evals won’t flag any of it because they only check the output, not the decisions that produced it. This session introduces the Agent GPA (Goal-Plan-Action) framework, available in the open-source TruLens library. In benchmark tests, the Agent GPA framework consistently outperformed standard LLM evaluators, giving teams scalable, trustworthy insight into agent behavior.
You’ll learn how to inspect an agent’s reasoning steps; detect issues like hallucinations, bad tool calls, and missed actions; and leave knowing how to make your agent truly production-ready. Speakers/Topics: Stay tuned as we update speakers and schedules. If you are interested in speaking to our community, we invite you to submit topics for consideration: Submit Topics More upcoming sessions:
Local and Global AI Community on Discord Join us on Discord for our local and global AI tech community:
|
AI Webinar Series (Virtual) - Evaluating AI Agent Reliability
|
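The Goal-Plan-Action idea described in the abstract can be sketched in a few lines of plain Python. This is an illustrative toy, not the TruLens or Agent GPA API: the trace format, check names, and scoring rule are all invented for the example.

```python
# Toy sketch of Goal-Plan-Action style agent evaluation.
# Illustrative only: the trace schema and checks below are invented
# for this example and are NOT the TruLens / Agent GPA API.

def evaluate_trace(trace):
    """Score an agent trace on goal, plan, and action dimensions."""
    issues = []

    # Goal: did the final answer address the stated goal?
    if trace["goal"].lower() not in trace["final_answer"].lower():
        issues.append("goal_drift: answer does not mention the goal")

    # Plan: were any planned steps skipped?
    executed = {step["name"] for step in trace["actions"]}
    for planned in trace["plan"]:
        if planned not in executed:
            issues.append(f"missed_action: planned step '{planned}' never ran")

    # Action: were only allowed tools used, and did any call fail?
    for step in trace["actions"]:
        if step["tool"] not in trace["allowed_tools"]:
            issues.append(f"bad_tool_call: '{step['tool']}' is not allowed")
        if step.get("error"):
            issues.append(f"failed_call: '{step['name']}' errored")

    score = max(0.0, 1.0 - 0.25 * len(issues))
    return {"score": score, "issues": issues}

trace = {
    "goal": "refund status",
    "final_answer": "Your refund status is: processed.",
    "plan": ["lookup_order", "check_refund"],
    "allowed_tools": ["orders_db", "refunds_api"],
    "actions": [
        {"name": "lookup_order", "tool": "orders_db"},
        {"name": "check_refund", "tool": "web_search"},  # misused tool
    ],
}
report = evaluate_trace(trace)  # flags the bad tool call despite a "good" answer
```

The point the talk makes shows up directly: the final answer alone looks fine, but the trace-level checks still surface the misused tool.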
|
Campfire talks about everything Domain Driven Design
2026-01-08 · 17:00
🏕️ Campfire talks about everything Domain Driven Design ⛺️ We organized our first DDD Campfire talks last year, and it was a great success. What better time than December to host another one! Who doesn't love sharing and listening to stories? Storytelling has been the way we share experiences and inspire each other for as long as humankind has existed. Don’t worry, we know it's a bit too cold to sit outside at the moment, but that won't stop us from finding a cozy corner inside for our campfire, enjoying food together, and sharing stories. No slides, no elaborate preparation, just a group of people with interesting experiences around Domain-Driven Design. Whether you're a seasoned storyteller or just enjoy listening to a good tale, this gathering is for you! 🧑🏻 Topics 🤖 Last year showed that the cozy atmosphere really allowed everybody to actively participate in sharing their experiences with Domain-Driven Design. We had some starter questions ready, but hardly needed them. This year we'll do the same, perhaps with topic starters like "What is the impact of AI 🤖 on what we are trying to achieve as a DDD community?". Or maybe the group will decide not to talk about AI at all; we've been talking about that too much already this year, so maybe it's better to talk about 🧑🏻 human stuff for a change 😀. 🚋 Location 🌍 Join us at the Total Design office in IJburg, 15 minutes from Amsterdam Central Station by tram 26. If you come by car, we don't have parking spots at the location itself, but ParkBee Sluishuis is available across the street at a reasonable price. Total Design and Luminis can be found on the 4th floor at Pedro de Medinalaan 1, Amsterdam. 🕖 Schedule 🕙 18:00 - 19:00 Doors open, welcome drinks and food 19:00 - 21:00 Campfire storytelling, sharing DDD experiences and discussing the current state of DDD 🎯 Target audience 👥 You have an interest in Domain-Driven Design, want to learn more about it, or want to share your experiences with others.
Feel free to drop a comment if you're not sure if you should attend. 🍝 RSVP correctly for your fellow members and host 🍕 Please ensure that you update your RSVP. We understand that attending a meetup isn't always possible even though you wanted to come. Throwing away food is just waste, so feel free to change your mind but update your RSVP accordingly :-) |
Campfire talks about everything Domain Driven Design
|
|
Turn Your Knowledge into an API for LLMs - Meetup @ Holberton School
2025-12-09 · 18:00
A practical, builder-first meetup focused on MCP integrations with Neo4j to enable context-aware AI. Live demos are welcome and content should be concise and technically grounded. Date & Time: December 9, 2025 from 7:00pm - 10:00pm
Propose a Talk: Submit your talk proposal via the form. The form requests a concise 5-minute talk or demo illustrating MCP integrations with context-aware AI and/or graph-powered workflows. Sponsors Neo4j - Neo4j is a leading graph database platform that empowers AI developers to harness the power of connected data. Their technology enables efficient storage, analysis, and visualization of complex data relationships, integrating seamlessly with popular AI and machine learning tools. Neo4j’s features, including native graph storage and advanced algorithms, make it ideal for powering AI applications like knowledge graphs and recommendation systems, helping developers extract deeper insights from their data.
— MLOps lead |
Turn Your Knowledge into an API for LLMs - Meetup @ Holberton School
|
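The meetup's theme of exposing knowledge as an API that an LLM can call can be sketched with a minimal tool registry. This is a hand-rolled illustration of the pattern only, not the real MCP SDK or the Neo4j driver; the tool names and the tiny in-memory "graph" are invented for the example.

```python
# Minimal sketch of the "knowledge as an API for LLMs" pattern:
# register plain functions as named tools, with machine-readable
# descriptions an LLM (or MCP client) could select from.
# Illustrative only: not the MCP SDK or the Neo4j driver.

TOOLS = {}

def tool(name, description):
    """Decorator that registers a function as a callable tool."""
    def register(fn):
        TOOLS[name] = {"fn": fn, "description": description}
        return fn
    return register

# A tiny in-memory stand-in for a knowledge graph.
KNOWS = {"alice": ["bob", "carol"], "bob": ["dave"]}

@tool("neighbors", "List people a person is directly connected to.")
def neighbors(person: str) -> list[str]:
    return KNOWS.get(person, [])

@tool("reachable", "List everyone reachable from a person, sorted.")
def reachable(person: str) -> list[str]:
    seen, stack = set(), list(KNOWS.get(person, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(KNOWS.get(p, []))
    return sorted(seen)

def call_tool(name, **kwargs):
    """What an MCP-style server would do on an incoming tool call."""
    return TOOLS[name]["fn"](**kwargs)
```

In a real setup the registry and call dispatch would come from an MCP server implementation, and the lookups would run Cypher queries against Neo4j instead of a dict.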
|
PyData Prague #32 - Scrapeyard Forge
2025-11-26 · 17:00
Hello data speakers and crawling Pythons! The 32nd Prague PyData meetup will be hosted at Apify offices right in the heart of Prague's city center (see below). The talks will start at 18:30 and we encourage you to come as soon as 18:00 to enjoy the opportunity to socialise and refresh yourselves (which you can continue doing during the break and after the talks). By attending, you agree that you will abide by the relevant parts of Apify Event Terms and Conditions and that you will provide your true identity for registration at the venue entrance. 🤗 Our main goal is to build the community around Python and data and make it welcoming to people of various skills and experience levels. ⚡ If you are interested in giving a lightning talk (up to 5 minutes to present an idea, tool or results related at least to some degree to Python and/or data), please contact us before or during the event. 📢 Jan Kislinger: Forged in Rust, Spoken in Python Python has long been the go-to language for data science and rapid experimentation, but when our models and algorithms start to hit performance limits, we naturally look toward something closer to the metal. In recent years, Rust has become a powerful partner: a language that offers high speed, strong safety guarantees, and a surprisingly pleasant developer experience. With tools like pyo3 and maturin, we can implement performance-critical components in Rust while keeping the flexible, expressive Python interface we love. This talk explores how the evolving Python–Rust ecosystem is gently reshaping the balance between productivity and performance. What new capabilities do we unlock by going lower level, and what, if anything, do we leave behind? 📢 Vladimír Dušek: Dealing with today's web scraping challenges in Python Web data powers today's AI revolution, but accessing it is becoming increasingly complex as websites grow more sophisticated. 
Modern web scraping and automation face challenges such as IP and geographical blocking, JavaScript-based content rendering, device and browser fingerprinting, CAPTCHAs, anti-scraping HTTP headers, and more. In this talk, you'll learn how to beat these challenges using Crawlee—a modern open-source Python library for web scraping and automation that we built from scratch at Apify. Refreshments will be available, thanks to our generous sponsors at Apify. How to get there? 1. Go to the Lucerna passage from Vodičkova street. 2. On the left-hand side (opposite Lucerna Music Bar), you'll see a glass door and gold bells. 3. Ring the "Apify" bell, then go to the 7th floor (no more than 4 people in the lift!). Please RSVP here. See you soon, PyData Prague team |
PyData Prague #32 - Scrapeyard Forge
|
|
State, Scale, and Signals: Rethinking Orchestration with Durable Execution
2025-11-16 · 23:19
Preeti Somal
– EVP of Engineering
@ Temporal
,
Tobias Macey
– host
Summary In this episode Preeti Somal, EVP of Engineering at Temporal, talks about the durable execution model and how it reshapes the way teams build reliable, stateful systems for data and AI. She explores Temporal’s code-first programming model—workflows, activities, task queues, and replay—and how it eliminates hand-rolled retry, checkpoint, and error-handling scaffolding while letting data remain where it lives. Preeti shares real-world patterns for replacing DAG-first orchestration, integrating application and data teams through signals and Nexus for cross-boundary calls, and using Temporal to coordinate long-running, human-in-the-loop, and agentic AI workflows with full observability and auditability. She also discusses heuristics for choosing Temporal alongside (or instead of) traditional orchestrators, managing scale without moving large datasets, and lessons from running durable execution as a cloud service. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed: flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI engineering, streaming - Prefect runs it all, from ingestion to activation, in one platform. Whoop and 1Password also trust Prefect for their data operations.
If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect. Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started.
And for dbt Cloud customers, they'll give you a $1,000 credit to migrate to Bruin Cloud. Your host is Tobias Macey, and today I'm interviewing Preeti Somal about how to incorporate durable execution and state management into AI application architectures. Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what durable execution is and how it impacts system architecture?
- With the strong focus on state maintenance and high reliability, what are some of the most impactful ways that data teams are incorporating tools like Temporal into their work?
- One of the core primitives in Temporal is a "workflow". How does that compare to similar primitives in common data orchestration systems such as Airflow, Dagster, Prefect, etc.?
- What are the heuristics that you recommend when deciding which tool to use for a given task, particularly in data/pipeline-oriented projects?
- Even if a team is using a more data-focused orchestration engine, what are some of the ways that Temporal can be applied to handle the processing logic of the actual data?
- AI applications are also very dependent on reliable data to be effective in production contexts. What are some of the design patterns where durable execution can be integrated into RAG/agent applications?
- What are some of the conceptual hurdles that teams experience when they are starting to adopt Temporal or other durable execution frameworks?
- What are the most interesting, innovative, or unexpected ways that you have seen Temporal/durable execution used for data/AI services?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Temporal?
- When is Temporal/durable execution the wrong choice?
- What do you have planned for the future of Temporal for data and AI systems?
Contact Info: LinkedIn Parting Question: From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements: Thank you for listening!
Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. Links: Temporal, Durable Execution, Flink, Machine Learning Epoch, Spark Streaming, Airflow, Directed Acyclic Graph (DAG), Temporal Nexus, TensorZero, AI Engineering Podcast Episode. The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA |
Data Engineering Podcast |
|
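The replay idea at the heart of durable execution, discussed throughout the episode, can be illustrated in plain Python. This is a toy sketch, not the Temporal SDK: completed steps record their results in an event log, so re-running the workflow after a crash replays history instead of repeating work.

```python
# Toy illustration of durable execution via replay.
# Not the Temporal SDK: just the core idea that side-effecting steps
# log their results, so a restarted workflow replays the log instead
# of rerunning completed work.

class DurableRun:
    def __init__(self, history=None):
        self.history = history if history is not None else []
        self.cursor = 0
        self.executed = []  # which steps actually ran this time

    def step(self, name, fn):
        if self.cursor < len(self.history):
            # Replay: result is already in the log, don't rerun.
            recorded_name, result = self.history[self.cursor]
            assert recorded_name == name, "non-deterministic workflow"
            self.cursor += 1
            return result
        result = fn()
        self.history.append((name, result))
        self.cursor += 1
        self.executed.append(name)
        return result

def workflow(run):
    a = run.step("extract", lambda: [3, 1, 2])
    b = run.step("transform", lambda: sorted(a))
    return run.step("load", lambda: f"loaded {len(b)} rows")

# First attempt: pretend the process crashes after two steps.
first = DurableRun()
first.step("extract", lambda: [3, 1, 2])
first.step("transform", lambda: sorted([3, 1, 2]))
saved_log = list(first.history)  # persisted before the "crash"

# Recovery: same workflow code, prior steps replayed from the log,
# only the remaining "load" step actually executes.
second = DurableRun(history=saved_log)
outcome = workflow(second)
```

This is also why Temporal requires workflow code to be deterministic: replay only works if the same code asks for the same steps in the same order.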
I Built an AI That Talks Like My Parents | Emotional AI, Empathy, and the Future of Human Tech
2025-11-11 · 12:00
Mukundan Sankar
– host
I missed my parents, so I built an AI that talks like them. This isn’t about replacing people—it’s about remembering the voices that make us feel safe.
In this 90-minute episode of Data & AI with Mukundan, we explore what happens when technology stops chasing efficiency and starts chasing empathy. Mukundan shares the story behind “What Would Mom & Dad Say?”, a Streamlit + GPT-4 experiment that generates comforting messages in the voice of loved ones.
You’ll hear:
- The emotional spark that inspired the project
- The plain-English prompts anyone can use to teach AI empathy
- Boundaries & ethics of emotional AI
- How this project reframed loneliness, creativity, and connection
Takeaway: AI can’t love you—but it can remind you of the people who do.
🔗 Try the free reflection prompts below:
THE ONE-PROMPT VERSION: “What Would Mom & Dad Say?”
Join the Discussion (comments hub): https://mukundansankar.substack.com/notes
Tools I use for my Podcast and Affiliate Partners:
Recording Partner: Riverside → Sign up here (affiliate)
Host Your Podcast: RSS.com (affiliate)
Research Tools: Sider.ai (affiliate)
Sourcetable AI: Join Here (affiliate)
🔗 Connect with Me:
Free Email Newsletter
Website: Data & AI with Mukundan
GitHub: https://github.com/mukund14
Twitter/X: @sankarmukund475
LinkedIn: Mukundan Sankar
YouTube: Subscribe |
Data & AI with Mukundan | Learn AI by Building |
|
The AI Data Paradox: High Trust in Models, Low Trust in Data
2025-11-09 · 23:53
Ariel Pohoryles
– guest
@ Rivery
,
Tobias Macey
– host
Summary
In this episode of the Data Engineering Podcast Ariel Pohoryles, head of product marketing for Boomi's data management offerings, talks about a recent survey of 300 data leaders on how organizations are investing in data to scale AI. He shares a paradox uncovered in the research: while 77% of leaders trust the data feeding their AI systems, only 50% trust their organization's data overall. Ariel explains why truly productionizing AI demands broader, continuously refreshed data with stronger automation and governance, and highlights the challenges posed by unstructured data and vector stores. The conversation covers the need to shift from manual reviews to automated pipelines, the resurgence of metadata and master data management, and the importance of guardrails, traceability, and agent governance. Ariel also predicts a growing convergence between data teams and application integration teams and advises leaders to focus on high-value use cases, aggressive pipeline automation, and cataloging and governing the coming sprawl of AI agents, all while using AI to accelerate data engineering itself.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI engineering, streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.
Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started.
And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud. Your host is Tobias Macey and today I'm interviewing Ariel Pohoryles about data management investments that organizations are making to enable them to scale AI implementations.
Interview
Introduction
How did you get involved in the area of data management?
Can you start by describing the motivation and scope of your recent survey on data management investments for AI across your respondents?
What are the key takeaways that were most significant to you?
The survey reveals a fascinating paradox: 77% of leaders trust the data used by their AI systems, yet only half trust their organization's overall data quality. For our data engineering audience, what does this suggest about how companies are currently sourcing data for AI? Does it imply they are using narrow, manually-curated "golden datasets," and what are the technical challenges and risks of that approach as they try to scale?
The report highlights a heavy reliance on manual data quality processes, with one expert noting companies feel it's "not reliable to fully automate validation" for external or customer data. At the same time, maturity in "Automated tools for data integration and cleansing" is low, at only 42%. What specific technical hurdles or organizational inertia are preventing teams from adopting more automation in their data quality and integration pipelines?
There was a significant point made that with generative AI, "biases can scale much faster," making automated governance essential. From a data engineering perspective, how does the data management strategy need to evolve to support generative AI versus traditional ML models? What new types of data quality checks, lineage tracking, or monitoring for feedback loops are required when the model itself is generating new content based on its own outputs?
The report champions a "centralized data management platform" as the "connective tissue" for reliable AI. How do you see the scale and data maturity impacting the realities of that effort?
How do architectural patterns in the shape of cloud warehouses, lakehouses, data mesh, data products, etc. factor into that need for centralized/unified platforms?
A surprising finding was that a third of respondents have not fully grasped the risk of significant inaccuracies in their AI models if they fail to prioritize data management. In your experience, what are the biggest blind spots for data and analytics leaders?
Looking at the maturity charts, companies rate themselves highly on "Developing a data management strategy" (65%) but lag significantly in areas like "Automated tools for data integration and cleansing" (42%) and "Conducting bias-detection audits" (24%). If you were advising a data engineering team lead based on these findings, what would you tell them to prioritize in the next 6-12 months to bridge the gap between strategy and a truly scalable, trustworthy data foundation for AI?
The report states that 83% of companies expect to integrate more data sources for their AI in the next year. For a data engineer on the ground, what is the most important capability they need to build into their platform to handle this influx?
What are the most interesting, innovative, or unexpected ways that you have seen teams addressing the new and accelerated data needs for AI applications?
What are some of the noteworthy trends or predictions that you have for the near-term future of the impact that AI is having or will have on data teams and systems?
Contact Info
LinkedIn
Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
Links
Boomi
Data Management
Integration & Automation Demo
Agentstudio
Data Connector Agent Webinar
Survey Results
Data Governance
Shadow IT
Podcast Episode
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA |
Data Engineering Podcast |
|
Product thinking, turned inward (Matt LeMay & James Gunaca)
2025-10-30 · 18:00
Topic: Product thinking, turned inward. Building product careers, teams and organizations for impact. Speakers: Matt LeMay and James Gunaca Language: English Where & when: In-person @ LeLaptop, 30th October 2025 from 7pm For this edition of ProductTank Paris we’re excited to welcome not one, but two, London-based product experts: Matt LeMay and James Gunaca. Matt and James have more than 35 years combined experience working with big names in global product teams. Join us for an evening of collective introspection (over a friendly drink, of course), and leave with some fresh ideas for how to apply smart product thinking to your careers, teams and orgs (and if you’re lucky, a brand new book!) Matt LeMay Matt has spent the last 15 years working with companies ranging from early-stage startups to Fortune 500 enterprises and tech companies like Google, Spotify, and Mailchimp. He'll be discussing the work he's done across industries and geographies to help product managers, teams, and organizations put business impact first without losing sight of customer needs. He'll also be giving away signed copies of his new book, IMPACT-FIRST PRODUCT TEAMS, which includes powerful questions and conversation starters to help every product team focus on the work that matters most to their particular business. James Gunaca Product Managers are experts at creating roadmaps and strategic plans for their products - but rarely apply these same powerful skills to their own careers. In this interactive talk, you'll get to assess your PM skills and see how they can inform your career roadmap. James Gunaca has a 20+ year career in Product Management and Digital Marketing, working as a Product Leader at companies like Amazon and ExpressVPN across the United States and Europe. He’s built and led multiple Product teams who have delivered global 0-to-1 products like Amazon Echo with the Alexa AI assistant, AppleCare+ as a Subscription, and more. 
James now coaches Product Managers, on a mission to make better products for the world by helping professionals build and grow their careers in Product Management. He’s also an organiser for ProductTank London and founder of Product Sphere.
Practical info: LeLaptop, 7 rue Geoffroy l’Angevin, 75004 Paris (nearest metros: Rambuteau, Châtelet, Hôtel de Ville)
Doors open @ 19h
Talks start @ 19h30
Drinks, snacks and chats @ 20h30 |
Product thinking, turned inward (Matt LeMay & James Gunaca)
|
|
Oct 30 - AI, ML and Computer Vision Meetup
2025-10-30 · 16:00
Join the virtual Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.
Date, Time and Location: Oct 30, 2025, 9 AM Pacific, Online. Register for the Zoom!
The Agent Factory: Building a Platform for Enterprise-Wide AI Automation
In this talk we will explore what it takes to build an enterprise-ready AI automation platform at scale. The topics covered will include:
About the Speaker Virender Bhargav at Flipkart is a seasoned engineering leader whose expertise spans business technology integration, enterprise applications, system design/architecture, and building highly scalable systems. With a deep understanding of technology, he has spearheaded teams, modernized technology landscapes, and managed core platform layers and strategic products. With extensive experience driving innovation at companies like Paytm and Flipkart, his contributions have left a lasting impact on the industry. Scaling Generative Models with Ray and PyTorch Generative image models like Stable Diffusion have opened up exciting possibilities for personalization, creativity, and scalable deployment. However, fine-tuning them in production-grade settings poses challenges: managing compute, hyperparameters, model size, data, and distributed coordination are nontrivial. In this talk, we’ll dive deep into learning how to fine-tune Stable Diffusion models using Ray Train (with HuggingFace Diffusers), including approaches like DreamBooth and LoRA. We’ll cover what works (and what doesn’t) in scaling out training jobs, handling large data, optimizing for GPU memory and speed, and validating outputs. Attendees will come away with practical insights and patterns they can use to fine-tune generative models in their own work. About the Speaker Suman Debnath is a Technical Lead (ML) at Anyscale, where he focuses on distributed training, fine-tuning, and inference optimization at scale on the cloud. His work centers around building and optimizing end-to-end machine learning workflows powered by distributed computing frameworks like Ray, enabling scalable and efficient ML systems. Suman’s expertise spans Natural Language Processing (NLP), Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG). Earlier in his career, he developed performance benchmarking and monitoring tools for distributed storage systems. 
Beyond engineering, Suman is an active community contributor, having spoken at over 100 global conferences and events, including PyCon, PyData, ODSC, AIE and numerous meetups worldwide. Privacy-preserving in Computer Vision through Optics Learning Cameras are now ubiquitous, powering computer vision systems that assist us in everyday tasks and critical settings such as operating rooms. Yet, their widespread use raises serious privacy concerns: traditional cameras are designed to capture high-resolution images, making it easy to identify sensitive attributes such as faces, nudity, or personal objects. Once acquired, such data can be misused if accessed by adversaries. Existing software-based privacy mechanisms, such as blurring or pixelation, often degrade task performance and leave vulnerabilities in the processing pipeline. In this talk, we explore an alternative question: how can we preserve privacy before or during image acquisition? By revisiting the image formation model, we show how camera optics themselves can be learned and optimized to acquire images that are unintelligible to humans yet remain useful for downstream vision tasks like action recognition. We will discuss recent approaches to learning camera lenses that intentionally produce privacy-preserving images, blurry and unrecognizable to the human eye, but still effective for machine perception. This paradigm shift opens the door to a new generation of cameras that embed privacy directly into their hardware design. About the Speaker Carlos Hinojosa is a Postdoctoral researcher at King Abdullah University of Science and Technology (KAUST) working with Prof. Bernard Ghanem. His research interests span Computer Vision, Machine Learning, AI Safety, and AI for Science. 
He focuses on developing safe, accurate, and efficient vision systems and machine-learning models that can reliably perceive, understand, and act on information, while ensuring robustness, protecting privacy, and aligning with societal values. It's a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data Can we match vision and language embeddings without any supervision? According to the platonic representation hypothesis, as model and dataset scales increase, distances between corresponding representations are becoming similar in both embedding spaces. Our study demonstrates that pairwise distances are often sufficient to enable unsupervised matching, allowing vision-language correspondences to be discovered without any parallel data. About the Speaker Dominik Schnaus is a third-year Ph.D. student in the Computer Vision Group at the Technical University of Munich (TUM), supervised by Daniel Cremers. His research centers on multimodal and self-supervised learning with a special emphasis on understanding similarities across embedding spaces of different modalities. |
Oct 30 - AI, ML and Computer Vision Meetup
|