talk-data.com

Activities & events

Join fellow Airflow enthusiasts and leaders at Salisbury House for an evening of engaging talks, great food and drinks, and exclusive swag!

We'll start you off with a deep dive into the Airflow 2026 survey results, and finish off with a community member presentation on the Apache TinkerPop provider.

PRESENTATIONS

Talk #1: The State of Apache Airflow® 2026

Apache Airflow® continues to thrive as the world’s leading open-source data orchestration platform, with 30M downloads per month and over 3k contributors. 2025 marked a major milestone with the release of Airflow 3, which introduced DAG versioning, enhanced security and task isolation, assets, and more. These changes have reshaped how data teams build, operate, and govern their pipelines.

In this session, our speaker will share insights from the State of Airflow 2026 report, including:

  • Latest trends in how teams are using Airflow today
  • What’s next for the project and ecosystem
  • A discussion of emerging best practices and evolving use cases

Join us to hear directly from a leader in the community and discover how to get the most out of Airflow in the year ahead.

Talk #2: Building the Apache TinkerPop Provider for Airflow

Graph databases are powering everything from recommendation engines to fraud detection, but integrating graph operations into modern data pipelines has often required custom code and workarounds. Earlier this year, Ahmad built a new Apache TinkerPop provider for Airflow, making it easier than ever to orchestrate Gremlin queries, manage graph workloads, and connect Airflow to TinkerPop-enabled systems. In this session, you’ll learn:

  • What the TinkerPop provider does and why it matters for graph-based workloads
  • How to run Gremlin queries and manage graph jobs directly within Airflow
  • Real examples from the development process, including design decisions and lessons learned
  • How this provider opens the door for new use cases in graph analytics and data engineering

Join us to explore how Airflow and TinkerPop can work together to streamline graph workflows and unlock new patterns in modern data pipelines.

AGENDA

  • 5:30-6 PM: Arrivals, networking, food & drinks
  • 6-7 PM: Presentations
  • 7-8 PM: Networking
The State of Airflow 2026: London Airflow Meetup!
Bennie Haelen – author

In today's race to harness generative AI, many teams struggle to integrate these advanced tools into their business systems. While platforms like GPT-4 and Google's Gemini are powerful, they aren't always tailored to specific business needs. This book offers a practical guide to building scalable, customized AI solutions using the full potential of data lakehouse architecture. Author Bennie Haelen covers everything from deploying ML and GenAI models in Databricks to optimizing performance with best practices. In this must-read for data professionals, you'll gain the tools to unlock the power of large language models (LLMs) by seamlessly combining data engineering and data science to create impactful solutions.

  • Learn to build, deploy, and monitor ML and GenAI models on a data lakehouse architecture using Databricks
  • Leverage LLMs to extract deeper, actionable insights from your business data residing in lakehouses
  • Discover how to integrate traditional ML and GenAI models for customized, scalable solutions
  • Utilize open source models to control costs while maintaining model performance and efficiency
  • Implement best practices for optimizing ML and GenAI models within the Databricks platform

data ai-ml artificial-intelligence-ai generative-ai AI/ML Data Engineering Data Lakehouse Data Science Databricks GenAI LLM
O'Reilly AI & ML Books

Start 2026 with the ClickHouse India community in Gurgaon!

Connect with fellow data practitioners and hear from industry experts through engaging talks focused on lessons learned, best practices, and modern data challenges.

Agenda:

  • 10:30 AM: Registration, light snacks & networking
  • 11:00 AM: Welcome & Introductions
  • 11:10 AM: Inside ClickStack: Engineering Observability for Scale by Rakesh Puttaswamy, Lead Solutions Architect @ ClickHouse
  • 11:35 AM: Supercharging Personalised Notifications At Jobhai With ClickHouse by Sumit Kumar and Arvind Saini, Tech Leads @ Info Edge
  • 12:00 PM: Simplifying CDC: Migrating from Debezium to ClickPipes by Abhash Solanki, DevOps Engineer @ Spyne AI
  • 12:25 PM: Solving Analytics at Scale: From CDC to Actionable Insights by Kunal Sharma, Software Developer @ Samarth eGov
  • 12:50 PM: Q&A
  • 1:30 PM: Lunch & Networking

👉🏼 RSVP to secure your spot!

Interested in speaking at this meetup or future ClickHouse events? 🎤Shoot an email to [email protected] and she'll be in touch.

🎤 Session Details: Inside ClickStack: Engineering Observability for Scale

Dive deep into ClickStack, ClickHouse’s fresh approach to observability built for engineers who care about speed, scale, and simplicity. We’ll unpack the technical architecture behind how ClickStack handles metrics, logs, and traces using ClickHouse as the backbone for real-time, high-cardinality analytics. Expect a hands-on look at ingestion pipelines, schema design patterns, query optimization, and the integrations that make ClickStack tick.

Speaker: Rakesh Puttaswamy, Lead Solutions Architect @ ClickHouse

🎤 Session Details: Supercharging Personalised Notifications At Jobhai With ClickHouse

Calculating personalized alerts for 2 million users is a data-heavy challenge that requires more than just standard indexing. This talk explores how Jobhai uses ClickHouse to power its morning notification pipeline, focusing on the architectural shifts and query optimizations that made our massive scale manageable and fast.

Speaker: Sumit Kumar and Arvind Saini, Tech Leads @ Info Edge

Sumit is a seasoned software engineer with deep expertise in databases, backend systems, and machine learning. For over six years, he has led the Jobhai engineering team, driving continuous improvements across their database infrastructure and user-facing systems while streamlining workflows through ongoing innovation. Connect with Sumit Kumar on LinkedIn.

Arvind is a Tech Lead at Info Edge India Ltd with experience building and scaling backend systems for large consumer and enterprise platforms. Over the years, they have worked across system design, backend optimization, and data-driven services, contributing to initiatives such as notification platforms, workflow automation, and product revamps. Their work focuses on improving reliability, performance, and scalability of distributed systems, and they enjoy solving complex engineering problems while mentoring teams and driving technical excellence.

🎤 Session Details: Simplifying CDC: Migrating from Debezium to ClickPipes

In this talk, Abhash will share his engineering team's journey migrating their core MySQL and MongoDB CDC flows to ClickPipes. He will contrast the previous architecture, where every schema change required manual intervention or complex Debezium configurations, with the new reality of ClickPipes' automated schema evolution, which handles upstream schema changes and ingests flexible data without breaking pipelines.

Speaker: Abhash Solanki, DevOps Engineer @ Spyne AI

Abhash serves as a DevOps Engineer at Spyne, orchestrating the AWS infrastructure behind the company's data warehouse and CDC pipelines. Having managed complex self-hosted Debezium and Kafka clusters, he understands the operational overhead of running stateful data stacks in the cloud. He recently led the architectural shift to ClickHouse Cloud, focusing on eliminating engineering toil and automating schema evolution handling.

🎤 Session Details: Solving Analytics at Scale: From CDC to Actionable Insights

As SAMARTH’s data volumes grew rapidly, our analytics systems faced challenges with frequent data changes and near real-time reporting. These challenges were compounded by the platform’s inherently high cardinality in multidimensional data models, spanning institutions, programmes, states, categories, workflow stages, and time, resulting in highly complex and dynamic query patterns.

This talk describes how we evolved from basic CDC pipelines to a fast, reliable, and scalable near real-time analytics platform using ClickHouse. We share key design and operational learnings that enabled us to process continuous high-volume transactional data and deliver low-latency analytics for operational monitoring and policy-level decision-making.

Speaker: Kunal Sharma, Software Developer @ Samarth eGov

Kunal Sharma is a data-focused professional with experience in building scalable data pipelines. His work includes designing and implementing robust ETL/ELT workflows, data-driven decision engines, and large-scale analytics platforms. At SAMARTH, he has contributed to building near real-time analytics systems, including the implementation of ClickHouse for large-scale, low-latency analytics.

ClickHouse Gurgaon/Delhi Meetup

These are the notes of the previous "How to Build a Portfolio That Reflects Your Real Skills" event:

Properties of an ideal portfolio repository:

  • Built to prove employable skills and readiness for real work
  • Fewer projects, carefully chosen to match job requirements
  • Clean, readable, refactored code that follows best practices
  • Detailed READMEs (setup, features, tech stack, decisions, how to deploy, testing strategy, etc)
  • Logical, meaningful commits that show development process <- you can follow the git history for important commits/features
  • Clear architecture (layers, packages, separation of concerns) <- use best practices
  • Unit and integration tests included and explained <-- also talk about them in the README
  • Proper validation, exceptions, and edge case handling
  • Polished, complete, production-like projects only
  • “Can this person work on our codebase?” <-- reviewers will ask this
  • Written for recruiters, hiring managers, and senior engineers
  • Uses industry-relevant and job-listed technologies <- tech stack should match the CV
  • Well-scoped, realistic features similar to real products
  • Consistent style, structure, and conventions across projects
  • Environment variables, clear setup steps, sample configs
  • Minimal, justified dependencies with clear versioning
  • Proper logging, and meaningful log messages
  • No secrets committed, basic security best practices applied
  • Shows awareness of scaling, performance, and future growth <- at least have a "possible improvements" section in the README
  • A list of ADRs (architecture decision records) explaining design choices and trade-offs <- should be a part of the documentation

📌 Backend & Frontend Portfolio Project Ideas

These projects are intentionally reusable across tech stacks. Following tutorials and reusing patterns is expected — what matters is:

  • understanding the architecture
  • explaining trade-offs
  • documenting decisions clearly

☕ Junior Java Backend Developer (Spring Boot)

1. Shop Manager Application

A monolithic Spring Boot app designed with microservice-style boundaries.

Features

  • Secure user registration & login
  • Role-based access control using JWT
  • REST APIs for users, products, inventory, and orders
  • Automatic inventory updates when orders are placed
  • CSV upload for bulk product & inventory import
  • Clear service boundaries (UserService, OrderService, InventoryService, etc.)

Engineering Focus

  • Clean architecture (controllers, services, repositories)
  • Global exception handling
  • Database migrations (Flyway/Liquibase)
  • Unit & integration testing
  • Clear README explaining architecture decisions

2. Parallel Data Processing Engine

Backend service for processing large datasets efficiently.

Features

  • Upload large CSV/log files
  • Split data into chunks
  • Process chunks in parallel using ExecutorService or CompletableFuture
  • Aggregate and return results

Demonstrates

  • Java concurrency
  • Thread pools & async execution
  • Performance optimization
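
The split, process, and aggregate flow above can be sketched as follows. This is a minimal illustration written in Python rather than Java: `ThreadPoolExecutor` stands in for the role `ExecutorService` would play, and the function names and the "count lines by first token" workload are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(lines, size):
    """Yield successive lists of at most `size` items from an iterable."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def count_levels(chunk):
    """Hypothetical per-chunk work: count non-blank lines by their first token."""
    return Counter(line.split(" ", 1)[0] for line in chunk if line.strip())


def process_in_parallel(lines, chunk_size=1000, workers=4):
    """Split the input into chunks, process them on a thread pool, merge results."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields partial results in submission order
        for partial in pool.map(count_levels, chunked(lines, chunk_size)):
            total.update(partial)
    return total
```

In CPython, threads mainly help with I/O-bound chunks; for CPU-bound work, `ProcessPoolExecutor` from the same module is the usual substitute. Explaining that kind of choice is a good candidate for the README.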

3. Distributed Task Queue System

Simple async job processing system.

Features

  • One service submits tasks
  • Another service processes them asynchronously
  • Uses Kafka or RabbitMQ
  • Tasks: report generation, data transformation

Demonstrates

  • Message-driven architecture
  • Async workflows
  • Eventual consistency

4. Rate Limiting & Load Control Service

Standalone service that protects APIs from abuse.

Features

  • Token bucket or sliding window algorithms
  • Redis-backed counters
  • Per-user or per-IP limits

Demonstrates

  • Algorithmic thinking
  • Distributed state
  • API protection patterns
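
As a starting point, a token bucket with per-key limits might look like the sketch below. It is written in Python for brevity and keeps its state in process memory, whereas the project above calls for Redis-backed counters; the class names and the injectable `clock` parameter are assumptions made for illustration and testing.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled continuously over time."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)
        self.updated_at = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        elapsed = now - self.updated_at
        # Refill for the time that has passed, never exceeding capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class RateLimiter:
    """Keeps one bucket per key, for example a user ID or an IP address."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.buckets = {}

    def allow(self, key):
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_second, self.clock)
            self.buckets[key] = bucket
        return bucket.allow()
```

Moving the bucket state into a shared store is where the "distributed state" part of the project begins, since the read, refill, and decrement steps then have to happen atomically.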

5. Search & Indexing Backend

Document or record search service.

Features

  • In-memory inverted index
  • Text search, filters, ranking
  • Optional Elasticsearch integration

Demonstrates

  • Data structures
  • Read-optimized design
  • Trade-offs between custom vs external tools
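
A minimal in-memory inverted index, sketched in Python under simple assumptions: lowercase alphanumeric tokenization, AND semantics across query terms, and ranking by summed term frequency. The class and method names are hypothetical, and a real project would likely add filters and a better scoring function.

```python
import re
from collections import defaultdict


class InvertedIndex:
    def __init__(self):
        # term -> {doc_id: number of occurrences of the term in that document}
        self.postings = defaultdict(dict)

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        for term in self.tokenize(text):
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, query):
        """Return IDs of documents containing every query term, best match first."""
        terms = self.tokenize(query)
        if not terms:
            return []
        candidates = None
        for term in terms:
            docs = set(self.postings.get(term, {}))
            candidates = docs if candidates is None else candidates & docs
        # Score by total term frequency; ties are broken by document ID
        scores = {d: sum(self.postings[t][d] for t in terms) for d in candidates}
        return sorted(scores, key=lambda d: (-scores[d], d))
```

Lookups touch only the postings of the query terms, which is the read-optimized property the project is meant to demonstrate.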

6. Distributed Configuration & Feature Flag Service

Centralized config service for other apps.

Features

  • Key-value configuration store
  • Feature flags
  • Caching & refresh mechanisms

Demonstrates

  • Caching strategies
  • Consistency vs availability trade-offs
  • System design for shared services

🐹 Mid-Level Go Backend Developer (Non-Kubernetes)

1. High-Throughput Event Processing Pipeline

Multi-stage concurrent pipeline.

Features

  • HTTP/gRPC ingestion
  • Validation & transformation stages
  • Goroutines & channels
  • Worker pools, batching, backpressure
  • Graceful shutdown
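
The staged design above can be illustrated with a small sketch. It uses Python's `asyncio` instead of goroutines and channels: bounded queues provide backpressure, and a sentinel value propagates shutdown through the stages. The stage names, the validation rule (an event must be a dict with an `"id"` key), and the single worker per stage are simplifying assumptions.

```python
import asyncio

_STOP = object()  # sentinel passed downstream to signal a clean shutdown


async def producer(events, out_q):
    for event in events:
        # put() waits while the queue is full, which is the backpressure
        await out_q.put(event)
    await out_q.put(_STOP)


async def validate(in_q, out_q):
    while True:
        event = await in_q.get()
        if event is _STOP:
            await out_q.put(_STOP)
            return
        if isinstance(event, dict) and "id" in event:
            await out_q.put(event)  # invalid events are dropped


async def batch_sink(in_q, batch_size, batches):
    batch = []
    while True:
        event = await in_q.get()
        if event is _STOP:
            break
        batch.append(event)
        if len(batch) == batch_size:
            batches.append(batch)
            batch = []
    if batch:
        batches.append(batch)  # flush the final partial batch before exiting


async def run_pipeline(events, batch_size=2, queue_size=4):
    raw = asyncio.Queue(maxsize=queue_size)
    valid = asyncio.Queue(maxsize=queue_size)
    batches = []
    await asyncio.gather(
        producer(events, raw),
        validate(raw, valid),
        batch_sink(valid, batch_size, batches),
    )
    return batches
```

With a pool of several workers per stage, each worker would need its own stop signal; that detail is left out here.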

2. Distributed Job Scheduler & Worker System

Async job execution platform.

Features

  • Job scheduling & delayed execution
  • Retries & idempotency
  • Job states (pending, running, failed, completed)
  • Message queue or gRPC-based workers
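
The job states, retries, and idempotency listed above can be sketched in a few lines. This is a single-process Python illustration, not a distributed scheduler: the `Job` and `JobRunner` names are hypothetical, jobs run synchronously, and idempotency is modeled as "the same key is only registered and completed once".

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class Job:
    key: str
    func: Callable[[], Any]
    state: str = "pending"  # pending -> running -> completed | failed
    attempts: int = 0
    result: Any = None
    error: Optional[str] = None


class JobRunner:
    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self.jobs: Dict[str, Job] = {}

    def submit(self, key, func):
        """Register a job once; submitting the same key again returns the existing job."""
        if key not in self.jobs:
            self.jobs[key] = Job(key, func)
        return self.jobs[key]

    def run(self, key):
        job = self.jobs[key]
        if job.state == "completed":
            return job  # already done, do not execute the work twice
        job.state = "running"
        while job.attempts < self.max_attempts:
            job.attempts += 1
            try:
                job.result = job.func()
                job.state = "completed"
                job.error = None
                return job
            except Exception as exc:
                job.error = str(exc)  # remember the last failure and retry
        job.state = "failed"
        return job
```

A fuller version would add delayed execution and backoff between attempts, and would persist job state so that workers in other processes can pick jobs up.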

3. In-Memory Caching Service

Redis-like cache written from scratch.

Features

  • TTL support
  • Eviction strategies (LRU/LFU)
  • Concurrent-safe access
  • Optional disk persistence
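
A compact sketch of the TTL, LRU eviction, and concurrent-safe access features, written in Python for brevity. It relies on `OrderedDict` for recency ordering and a single lock for safety; the class name, the lazy expiry on read, and the injectable `clock` are assumptions, and LFU eviction and disk persistence are left out.

```python
import threading
import time
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity, clock=time.monotonic):
        self.capacity = capacity
        self.clock = clock
        self.lock = threading.Lock()
        # key -> (value, expires_at or None); least recently used entries come first
        self.items = OrderedDict()

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self.clock() + ttl
        with self.lock:
            self.items[key] = (value, expires_at)
            self.items.move_to_end(key)
            if len(self.items) > self.capacity:
                self.items.popitem(last=False)  # evict the least recently used entry

    def get(self, key, default=None):
        with self.lock:
            entry = self.items.get(key)
            if entry is None:
                return default
            value, expires_at = entry
            if expires_at is not None and self.clock() >= expires_at:
                del self.items[key]  # expired entries are removed lazily on read
                return default
            self.items.move_to_end(key)  # mark as most recently used
            return value
```

One coarse lock is the simplest correct option; sharding the key space across several locks is a natural follow-up to discuss under "possible improvements".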

4. Rate Limiting & Traffic Shaping Gateway

Reverse-proxy-style rate limiter.

Features

  • Token bucket / leaky bucket
  • Circuit breakers
  • Redis-backed distributed limits

5. Log Aggregation & Query Engine

Incrementally built system.

Step-by-step

  1. REST API + Postgres (store logs, query logs)
  2. Optimize for massive concurrency
  3. Replace DB with in-memory data structures
  4. Add streaming endpoints using channels & batching

🐍 Mid-Level Python Backend Developer

1. Asynchronous Task Processing System

Async job execution platform.

Features

  • Async API submission
  • Worker pool (asyncio or Celery-like)
  • Retries & failure handling
  • Job status tracking
  • Idempotency

2. Event-Driven Data Pipeline

Streaming data processing service.

Features

  • Event ingestion
  • Validation & transformation
  • Batching & backpressure handling
  • Output to storage or downstream services

3. Distributed Rate Limiting Service

API protection service.

Steps

  • Step 1: Use an existing rate-limiting library
  • Step 2: Implement token bucket / sliding window yourself
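
For Step 2, a sliding-window variant could start from the sketch below. It keeps a per-key log of request timestamps in process memory, so it is not yet distributed; the class name and the injectable `clock` parameter are assumptions made for illustration and testing.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allows at most `limit` requests per key within the last `window_seconds`."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key):
        now = self.clock()
        hits = self.hits[key]
        cutoff = now - self.window_seconds
        # Drop timestamps that have fallen out of the window
        while hits and hits[0] <= cutoff:
            hits.popleft()
        if len(hits) < self.limit:
            hits.append(now)
            return True
        return False
```

The log costs memory proportional to `limit` per key, which is one of the trade-offs against a token bucket that is worth explaining in the README.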

4. Search & Indexing Backend

Search system for logs or documents.

Features

  • Custom indexing or Elasticsearch
  • Filtering & time-based queries
  • Read-heavy optimization

5. Configuration & Feature Flag Service

Shared configuration backend.

Steps

  • Step 1: Use a caching library
  • Step 2: Implement your own cache (explain in README)

🟦 Mid-Level TypeScript Backend Developer

1. Asynchronous Job Processing System

Queue-based task execution.

Features

  • BullMQ / RabbitMQ / Redis
  • Retries & scheduling
  • Status tracking

2. Real-Time Chat / Notification Service

WebSocket-based system.

Features

  • Presence tracking
  • Message persistence
  • Real-time updates

3. Rate Limiting & API Gateway

API gateway with protections.

Features

  • Token bucket / sliding window
  • Response caching
  • Request logging

4. Search & Filtering Engine

Search backend for products, logs, or articles.

Features

  • In-memory index or Elasticsearch
  • Pagination & sorting

5. Feature Flag & Configuration Service

Centralized config management.

Features

  • Versioning
  • Rollout strategies
  • Caching

🟨 Mid-Level Node.js Backend Developer

1. Async Task Queue System

Background job processor.

Features

  • Bull / Redis / RabbitMQ
  • Retries & scheduling
  • Status APIs

2. Real-Time Chat / Notification Service

Socket-based system.

Features

  • Rooms
  • Presence tracking
  • Message persistence

3. Rate Limiting & API Gateway

Traffic control service.

Features

  • Per-user/API-key limits
  • Logging
  • Optional caching

4. Search & Indexing Backend

Indexing & querying service.


5. Feature Flag / Configuration Service

Shared backend for app configs.


⚛️ Mid-Level Frontend Developer (React / Next.js)

1. Dynamic Analytics Dashboard

Interactive data visualization app.

Features

  • Charts & tables
  • Filters & live updates
  • React Query / Redux / Zustand
  • Responsive layouts

2. E-Commerce Store

Full shopping experience.

Features

  • Product listings
  • Search, filters, sorting
  • Cart & checkout
  • SSR/SSG with Next.js

3. Real-Time Chat / Collaboration App

Live multi-user UI.

Features

  • WebSockets or Firebase
  • Presence indicators
  • Real-time updates

4. CMS / Blogging Platform

SEO-focused content app.

Features

  • SSR for SEO
  • Markdown or API-based content
  • Admin editing interface

5. Personalized Analytics / Recommendation UI

Data-heavy frontend.

Features

  • Filtering & lazy loading
  • Large dataset handling
  • User-specific insights

6. AI Chatbot App — “My House Plant Advisor”

LLM-powered assistant with production-quality UX.

Core Features

  • Chat interface with real-time updates
  • Input normalization & validation
  • Offensive content filtering
  • Unsupported query detection
  • Rate limiting (per user)
  • Caching recent queries
  • Conversation history per session
  • Graceful fallbacks & error handling

Advanced Features

  • Prompt tuning (beginner vs expert users)
  • Structured advice formatting (cards, bullets)
  • Local LLM support
  • Analytics dashboard (popular questions)
  • Voice input/output (speech-to-text, TTS)

✅ Final Advice

You do NOT need to build everything. Instead, pick 1–2 strong projects per role and focus on depth:

  • Explain the architecture clearly
  • Document trade-offs (why you chose X over Y)
  • Show incremental improvements
  • Prove you understand why, not just how

📌 Portfolio Quality Signals (Very Important)

  • Have a large, organic commit history → A single or very few commits is a strong indicator of copy-paste work.
  • Prefer 3–5 complex projects over 20 simple ones → Many tiny projects often signal shallow understanding.

🎯 Why This Helps in Interviews

Working on serious projects gives you:

  • Real hands-on practice
  • Concrete anecdotes (stories you can tell in interviews)
  • A safe way to learn technologies you don’t fully know yet
  • Better focus and long-term learning discipline
  • A portfolio that can be ported to another tech stack later (Java → Go, Node → Python, etc.)

🎥 Demo & Documentation Best Practices

  • Create a 2–3 minute demo / walkthrough video: show the app running, explain what problem it solves, and highlight one or two technical decisions
  • At the top of every README, add a plain-English paragraph explaining what the project does, and assume the reader is a complete beginner

🤝 Open Source & Personal Projects (Interview Signal)

If you have contributed to open source or built personal projects, always mention it.

  • Shows team spirit
  • Shows you can read, understand, and navigate an existing codebase
  • Signals that you can onboard into a real-world repository
  • Makes you sound like an engineer, not just a tutorial follower
[Notes] How to Build a Portfolio That Reflects Your Real Skills

Click here to submit your questions before the Q&A session.

Join us for an exclusive live webinar where Twan Koot, Product Manager for NeoLoad at Tricentis, dives into the world of high-scale performance testing. Whether you’re a DevOps engineer, QA lead, performance specialist or product owner, you’ll gain real-world insights and practical advice around NeoLoad’s capabilities, best practices and roadmap.

🎯 Why this is a can’t-miss session

With modern applications becoming more complex, spanning web, mobile, APIs, microservices, legacy systems and cloud environments, ensuring performance is no longer a nice-to-have: it’s a competitive necessity. NeoLoad empowers organisations to test performance at scale, integrate into CI/CD pipelines, support both code-less and “test-as-code” workflows, and deliver rapid results.

During this live Q&A, you’ll:

  • Hear directly from the product owner how NeoLoad addresses today’s performance-engineering challenges.
  • Explore how teams accelerate test design, cut maintenance effort and scale performance initiatives across the SDLC.
  • Get a sneak-peek into NeoLoad’s roadmap and how it’s evolving with AI, real-browser testing and cloud orchestration.
  • Bring your toughest questions; Twan will tackle them live.

👥 Who should attend

  • Performance Test Engineers & Architects
  • DevOps / CI/CD practitioners
  • QA Managers & Test Automation Leads
  • Product Owners responsible for application performance
  • Anyone looking to shift from manual load testing to a modern, scalable solution

✅ What you’ll walk away with

  • A clearer understanding of what NeoLoad is.
  • Insight into how to accelerate test design and maintenance.
  • Practical ideas on integrating performance testing into DevOps pipelines.
  • Tips on choosing the right testing strategy for modern apps.

About the speaker: Twan Koot

Twan Koot leads product management for NeoLoad at Tricentis, where he's driving the platform's AI strategy and helping enterprises modernize their performance testing approach. A former Microsoft MVP for Cloud Native, he combines cloud-native expertise with performance engineering innovation. A frequent speaker at industry events and contributor to the Performance Advisory Council, he's passionate about making performance engineering a team sport rather than a specialist-only activity.

Live Neoload Q&A - Your Questions Answered

Speaker: Bianca Stratulat
Start Date: Thu, Nov 20th 2025 · 7:00 PM EEST (5:00 PM GMT)
Language: English
Location: Online (link visible for attendees)

===============================================================

Description:

We’ll explore how to integrate Databricks and Power BI effectively, enabling your organisation to unlock real-time analytics and create impactful data stories.

You’ll learn how to:

  • Leverage the Medallion architecture to design scalable and maintainable data workflows in Databricks.
  • Seamlessly connect Power BI to Databricks and consume data for reporting and dashboards.
  • Apply best practices for Direct Query vs Import, balancing real-time insights with performance and scalability.

Whether you’re a data engineer, analyst, or Power BI enthusiast, this session will provide practical techniques and lessons learned from real-world implementations to help you supercharge your analytics capabilities.

===============================================================

At the end of the meetup we'll hold a raffle with a prize offered by EDNA: one free one-year access license to the EDNA platform for one lucky winner among the live attendees!

===============================================================

Speaker: Bianca Stratulat, Databricks Champion & Chief Data Officer at UnifEye

Bianca is a Databricks Champion and Chief Data Officer at UnifEye, where she helps organisations unlock the full potential of their data using modern platforms and AI.

With over 10 years of experience in data engineering, analytics, and visual storytelling, she specialises in building scalable data solutions that bridge the gap between technical teams and business leaders.

She has been a speaker at the Databricks Data + AI Summit 2025 in San Francisco and regularly presents at industry events on topics like Lakehouse architecture, real-time analytics, and best practices for integrating Databricks with Power BI.

Bianca is passionate about empowering data communities and helping teams turn complex data into actionable insights that drive innovation and growth.

From Lakehouse to Dashboards integrate Databricks with PowerBI| Bianca Stratulat
Vijay Subramanian – Founder and CEO @ Trace, Tobias Macey – host

Summary

In this episode of the Data Engineering Podcast, Vijay Subramanian, founder and CEO of Trace, talks about metric trees, a new approach to data modeling that directly captures a company's business model. Vijay shares insights from his decade-long experience building data practices at Rent the Runway and explains how the modern data stack has led to a proliferation of dashboards without a coherent way for business consumers to reason about cause, effect, and action. He explores how metric trees differ from and interoperate with other data modeling approaches, serve as a backend for analytical workflows, and provide concrete examples like modeling Uber's revenue drivers and customer journeys. Vijay also discusses the potential of AI agents operating on metric trees to execute workflows, organizational patterns for defining inputs and outputs with business teams, and a vision for analytics that becomes invisible infrastructure embedded in everyday decisions.

Announcements

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management.
  • Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed: flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming: Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.
  • Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
  • Your host is Tobias Macey and today I'm interviewing Vijay Subramanian about metric trees and how they empower more effective and adaptive analytics.

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what metric trees are and their purpose?
  • How do metric trees relate to metric/semantic layers?
  • What are the shortcomings of existing data modeling frameworks that prevent effective use of those assets?
  • How do metric trees build on top of existing investments in dimensional data models?
  • What are some strategies for engaging with the business to identify metrics and their relationships?
  • What are your recommendations for storage, representation, and retrieval of metric trees?
  • How do metric trees fit into the overall lifecycle of organizational data workflows?
  • When creating any new data asset it introduces overhead of maintenance, monitoring, and evolution. How do metric trees fit into existing testing and validation frameworks that teams rely on for dimensional modeling?
  • What are some of the key differences in useful evaluation/testing that teams need to develop for metric trees?
  • How do metric trees assist in context engineering for AI-powered self-serve access to organizational data?
  • What are the most interesting, innovative, or unexpected ways that you have seen metric trees used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on metric trees and operationalizing them at Trace?
  • When is a metric tree the wrong abstraction?
  • What do you have planned for the future of Trace and applications of metric trees?

Contact Info

  • LinkedIn

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

  • Metric Tree
  • Trace
  • Modern Data Stack
  • Hadoop
  • Vertica
  • Luigi
  • dbt
  • Ralph Kimball
  • Bill Inmon
  • Metric Layer
  • Dimensional Data Warehouse
  • Master Data Management
  • Data Governance
  • Financial P&L (Profit and Loss)
  • EBITDA (earnings before interest, taxes, depreciation and amortization)

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

AI/ML Analytics Data Engineering Data Management Data Modelling Datafold ETL/ELT dimensional modeling Modern Data Stack Prefect Python Data Streaming
Data Engineering Podcast

Hear talks from experts on the latest topics in AI, ML, and computer vision.

Date and Time

Oct 2 at 9 AM Pacific

Location

Virtual. Register for the Zoom.

The Hidden Order of Intelligent Systems: Complexity, Autonomy, and the Future of AI

As artificial intelligence systems grow more autonomous and integrated into our world, they also become harder to predict, control, and fully understand. This talk explores how complexity theory can help us make sense of these challenges, by revealing the hidden patterns that drive collective behavior, adaptation, and resilience in intelligent systems. From emergent coordination among autonomous agents to nonlinear feedback in real-world deployments, we’ll explore how order arises from chaos, and what that means for the next generation of AI. Along the way, we’ll draw connections to neuroscience, agentic AI, and distributed systems that offer fresh insights into designing multi-faceted AI systems.

About the Speaker

Ria Cheruvu is a Senior Trustworthy AI Architect at NVIDIA. She holds a master’s degree in data science from Harvard and teaches data science and ethical AI across global platforms. Ria is passionate about uncovering the hidden dynamics that shape intelligent systems—from natural networks to artificial ones.

Managing Medical Imaging Datasets: From Curation to Evaluation

High-quality data is the cornerstone of effective machine learning in healthcare. This talk presents practical strategies and emerging techniques for managing medical imaging datasets, from synthetic data generation and curation to evaluation and deployment.

We’ll begin by highlighting real-world case studies from leading researchers and practitioners who are reshaping medical imaging workflows through data-centric practices. The session will then transition into a hands-on tutorial using FiftyOne, the open-source platform for visual dataset inspection and model evaluation. Attendees will learn how to load, visualize, curate, and evaluate medical datasets across various imaging modalities.

Whether you're a researcher, clinician, or ML engineer, this talk will equip you with practical tools and insights to improve dataset quality, model reliability, and clinical impact.

About the Speaker

Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia.

Building Agents That Learn: Managing Memory in AI Agents

In the rapidly evolving landscape of agentic systems, memory management has emerged as a key pillar for building intelligent, context-aware AI Agents. Different types of memory, such as short-term and long-term memory, play distinct roles in supporting an agent's functionality. In this talk, we will explore these types of memory, discuss challenges with managing agentic memory, and present practical solutions for building agentic systems that can learn from their past executions and personalize their interactions over time.

About the Speaker

Apoorva Joshi is a Data Scientist turned Developer Advocate, with over 7 years of experience applying machine learning to problems in domains such as cybersecurity and mental health. As an AI Developer Advocate at MongoDB, she now helps developers be successful at building AI applications through written content and hands-on workshops.

Human-Centered AI: Soft Skills That Make Visual AI Work in Manufacturing

Visual AI systems can spot defects and optimize workflows—but it’s people who train, deploy, and trust the results. This session explores the often-overlooked soft skills that make Visual AI implementations successful: communication, cross-functional collaboration, documentation habits, and on-the-floor leadership. Sheena Yap Chan shares practical strategies to reduce resistance to AI tools, improve adoption rates, and build inclusive teams where operators, engineers, and executives align. Attendees will leave with actionable techniques to drive smoother, people-first AI rollouts in manufacturing environments.

About the Speaker

Sheena Yap Chan is a Wall Street Journal Bestselling Author, leadership speaker and consultant who helps organizations develop confidence, communication, and collaboration skills that drive innovation and team performance—especially in high-tech, high-change industries. She’s worked with leaders across engineering, operations, and manufacturing to align people with digital transformation goals.

Oct 2 - Women in AI Virtual Event

Dustin Dorsey – author, Cameron Cyr – author

Master the art of data transformation with the second edition of this trusted guide to dbt. Building on the foundation of the first edition, this updated volume offers a deeper, more comprehensive exploration of dbt’s capabilities—whether you're new to the tool or looking to sharpen your skills. It dives into the latest features and techniques, equipping you with the tools to create scalable, maintainable, and production-ready data transformation pipelines. Unlocking dbt, Second Edition introduces key advancements, including the semantic layer, which allows you to define and manage metrics at scale, and dbt Mesh, empowering organizations to orchestrate decentralized data workflows with confidence. You’ll also explore more advanced testing capabilities, expanded CI/CD and deployment strategies, and enhancements in documentation—such as the newly introduced dbt Catalog. As in the first edition, you’ll learn how to harness dbt’s power to transform raw data into actionable insights, while incorporating software engineering best practices like code reusability, version control, and automated testing. From configuring projects with the dbt Platform or open source dbt to mastering advanced transformations using SQL and Jinja, this book provides everything you need to tackle real-world challenges effectively. 
What You Will Learn

  • Understand dbt and its role in the modern data stack
  • Set up projects using both the cloud-hosted dbt Platform and open source project
  • Connect dbt projects to cloud data warehouses
  • Build scalable models in SQL and Python
  • Configure development, testing, and production environments
  • Capture reusable logic with Jinja macros
  • Incorporate version control with your data transformation code
  • Seamlessly connect your projects using dbt Mesh
  • Build and manage a semantic layer using dbt
  • Deploy dbt using CI/CD best practices

Who This Book Is For

Current and aspiring data professionals, including architects, developers, analysts, engineers, data scientists, and consultants who are beginning the journey of using dbt as part of their data pipeline’s transformation layer. Readers should have a foundational knowledge of writing basic SQL statements, development best practices, and working with data in an analytical context such as a data warehouse.

data data-engineering storage-repositories data-warehouse CI/CD Cloud Computing dbt DWH Git Modern Data Stack Python SQL
O'Reilly Data Engineering Books

Microsoft added variable libraries on April 1 to markedly improve the development pipeline experience, which greatly improves the entire CI/CD process for releasing content in Fabric. To make variable libraries work, parameters need to be implemented throughout pipelines, notebooks, and semantic models. These parameters are changed when items are moved using deployment pipelines. Deployment pipelines were originally created for Power BI, and using them to deploy Fabric content was problematic, but with variable libraries they are the best way to migrate code from dev to test to prod.

We will explore the foundational concepts of parameter usage in Fabric, demonstrating how to define, manage, and adjust parameters for different environments. The session will highlight best practices for incorporating variable libraries, showcasing their role in facilitating seamless updates across multiple notebooks and pipelines. Attendees will gain insights into:

  • The architecture and benefits of using parameters in Fabric environments.
  • Step-by-step guidance on setting up and utilizing variable libraries.
  • Real-world examples illustrating the impact of parameterization on workflow efficiency.

By the end of this talk, participants will be equipped with the knowledge and tools to enhance their data engineering practices, leveraging the full potential of parameters and variable libraries in Fabric to drive more efficient and scalable data solutions.
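The general idea of adjusting parameters per environment can be illustrated with a small sketch. This is plain Python and does not use the Fabric variable library API; the variable names, stage names, and values (`lakehouse_name`, `source_path`, `batch_size`, `dev`/`test`/`prod`) are hypothetical and chosen only for illustration. It assumes a default set of values plus per-stage overrides, so that the same notebook or pipeline code reads different values in each stage.

```python
# Hypothetical stand-in for a variable library: one default value set plus
# per-stage overrides. Plain Python for illustration, not the Fabric API.
DEFAULT_VALUES = {
    "lakehouse_name": "lh_dev",
    "source_path": "Files/raw/dev",
    "batch_size": 500,
}

STAGE_OVERRIDES = {
    "dev": {},
    "test": {"lakehouse_name": "lh_test", "source_path": "Files/raw/test"},
    "prod": {
        "lakehouse_name": "lh_prod",
        "source_path": "Files/raw/prod",
        "batch_size": 5000,
    },
}


def resolve_variables(stage):
    """Return the effective values for a stage: defaults, then overrides."""
    if stage not in STAGE_OVERRIDES:
        raise ValueError(f"Unknown stage: {stage!r}")
    return {**DEFAULT_VALUES, **STAGE_OVERRIDES[stage]}


def build_load_config(stage):
    """Code that consumes the variables never hard-codes a stage-specific value."""
    values = resolve_variables(stage)
    return (
        f"{values['lakehouse_name']}:{values['source_path']} "
        f"(batch {values['batch_size']})"
    )


for stage in ("dev", "test", "prod"):
    print(stage, "->", build_load_config(stage))
```

Because the consuming code only refers to variable names, moving it from one stage to the next changes the resolved values without editing the code itself.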

Ginger Grant is a distinguished Microsoft Data Platform MVP and Microsoft Certified Trainer (MCT), renowned for her deep expertise in advanced analytics, machine learning, AI, data warehousing, and the evolving landscape of Microsoft Fabric. As a sought-after consultant, Ginger empowers organizations to harness the full potential of their data ecosystems.

Beyond consulting, Ginger is a prolific thought leader and speaker for both keynotes and technical training. She contributes regularly as a columnist for Pure AI, authors insightful books, and shares practical knowledge on her blog, DesertIsleSQL.com. Her educational impact spans a wide range of technologies, including Azure Synapse Analytics, Python, and Azure Machine Learning, making her a trusted voice in the data community.

Whether on stage, in print, or in the classroom, Ginger’s passion for data and commitment to knowledge-sharing make her a standout figure in the world of data and AI.

Checking out Variable Libraries in Fabric | Ginger Grant

The Latin American (LATAM) IT market is projected to reach $74.5 billion by 2029. What role does platform engineering play in this growth spike? Join our expert panel to discuss the real challenges, success stories, and future of platform engineering in the region.

In this session, Platform Engineering Ambassadors Pablo Castelo, Francisco Meneses, Sergio Canales and Caio Medeiros will discuss:

  • What is the #1 challenge a company in LATAM faces when wanting to implement an Internal Developer Platform (IDP) strategy?
  • What unique advantages, or 'superpowers', does LATAM bring to platform engineering?
  • What is the next frontier for platform engineering in the region?

From budget realities to cultural advantages and adoption stories from the field, join experienced practitioners for an engaging session that examines the realities of platform engineering beyond the typical narratives.

After a 45-minute talk there’ll be a 15-minute Q&A, for which we encourage you to submit questions in advance. A webinar recording and related materials will be shared with all attendees after the event.


Speakers:

Pablo Castelo - Architect @ Red Hat
Pablo is an experienced IT Architect at Red Hat with over 15 years of industry experience driving digital transformation. He specializes in designing and implementing robust, scalable solutions using cloud-native technologies and DevOps practices. With deep expertise in Kubernetes, OpenShift, and GitOps, Pablo leads technical teams to help organizations solve complex challenges with innovative open-source solutions.

Francisco Meneses - OpenShift Associate Manager @ Red Hat
Francisco is an experienced information technology professional with over a decade in the industry. Throughout his career, he has served as an analyst, developer, consultant, and technical leader, working with major clients across both the public and private sectors. Currently, as an Associate OpenShift Manager at Red Hat, Francisco leads a team of architects across Latin America, driving the successful implementation of Red Hat technologies and services. As a Platform Engineering Ambassador, he actively contributes to the broader tech community and the Backstage project, sharing insights and code shaped by his extensive hands-on experience as an architect on diverse regional initiatives.

Sergio Canales - Principal Architect @ Red Hat
Sergio is a Principal Architect, CNCF Ambassador, and Platform Engineering Ambassador with over a decade of experience in Cloud Native and Platform Engineering. He specializes in driving digital transformation, building resilient platforms, and enabling engineering teams to innovate effectively. Passionate about developer empowerment, open-source communities, and public speaking, Sergio is dedicated to mentoring and inspiring professionals to reach technical excellence. Throughout his career, he has helped leading organizations achieve continuous improvement in their cloud-native and enterprise-level initiatives.

Caio Medeiros - Senior DevOps Pre-sales Architect @ Testkube
Caio is an IT professional with a broad skill set and a strong background in DevOps, Lean practices, and Platform Engineering. He is a DevOps Institute and Platform Engineering Community Ambassador, experienced in cloud-native technologies, and an advocate for open-source solutions. With a developer’s mindset, a passion for mentoring, and a reputation as a thought leader in his field, Caio brings both technical expertise and guidance to the tech community. Outside of work, he enjoys life as a devoted husband and proud dog owner.

Latin America builds different: Platform engineering adoption stories

Speaker: Sue Bayes
Start Date: Thu, Aug 21st 2025 · 7:00 PM EEST (4:00 PM UTC)
Language: ENGLISH (with Live Translated CCs)
Location: Online (link visible for attendees)

===============================================================

This session introduces attendees to the fundamentals of web scraping using Python’s Beautiful Soup library. Attendees will learn how to navigate the complexities of HTML structures to extract valuable data efficiently. Key skills and concepts covered include:

  • Understanding HTML and CSS: Learn how web pages are structured to identify the data you need.
  • Setting up Beautiful Soup: Install and initialize the library to parse web content.
  • Scraping Techniques: Use tags, attributes, and classes to locate and extract specific elements from web pages.
  • Handling Dynamic Content: Work with tools like requests to scrape static pages and integrate with libraries like Selenium for dynamic content.
  • Saving Scraped Data: Export extracted data into structured formats like CSV or JSON for further analysis.

The session also emphasizes ethical considerations and best practices for web scraping, including handling website terms of service and respecting rate limits. Through hands-on examples, attendees will scrape a sample website and transform raw HTML into actionable insights. By the end of the session, participants will have the confidence to build their own web scraping workflows and apply them to real-world projects.
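As a rough illustration of the workflow described above (not the session's own material), the sketch below parses a small inline HTML page with Beautiful Soup, locates elements by tag, class, and attribute, and exports the result as CSV. The sample markup, the `book` and `price` class names, the `data-id` attribute, and the helper names `scrape_books` and `to_csv` are all hypothetical. It assumes the `beautifulsoup4` package is installed and uses Python's built-in `html.parser` backend.

```python
import csv
import io

from bs4 import BeautifulSoup

# Inline sample page (hypothetical markup) so the sketch runs without network access.
SAMPLE_HTML = """
<html><body>
  <h1>Sample Bookshop</h1>
  <ul id="books">
    <li class="book" data-id="1">
      <a href="/books/1">Python Basics</a><span class="price">10.50</span>
    </li>
    <li class="book" data-id="2">
      <a href="/books/2">Data Wrangling</a><span class="price">24.00</span>
    </li>
    <li class="advert">Sign up for our newsletter</li>
  </ul>
</body></html>
"""


def scrape_books(html):
    """Extract one dict per <li class="book"> element."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("li", class_="book"):  # filter by tag and class
        link = item.find("a")
        price = item.find("span", class_="price")
        rows.append(
            {
                "id": item["data-id"],  # read an attribute value
                "title": link.get_text(strip=True),
                "url": link["href"],
                "price": price.get_text(strip=True) if price else "",
            }
        )
    return rows


def to_csv(rows):
    """Serialize the scraped rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "title", "url", "price"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


books = scrape_books(SAMPLE_HTML)
print(to_csv(books))
```

For a live static page, the HTML string would typically come from something like `requests.get(url, timeout=10).text`, subject to the site's terms of service and with a pause between requests to respect rate limits.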

===============================================================

Speaker: Sue Bayes
Microsoft Data Platform MVP | Power BI & Microsoft Fabric Specialist

Sue is an independent data consultant and Microsoft MVP with over 20 years’ experience in business intelligence, analytics, and IT education. She’s a Microsoft Certified Fabric Data Engineer, Analytics Engineer, Azure Enterprise Data Analyst, and Power BI Data Analyst Associate.

For the past 7+ years, Sue has worked across public and private sectors, developing impactful Power BI solutions that span planning, project management, finance, and service-specific reporting.

Her work combines robust data engineering with insightful dashboard design, often integrating advanced techniques like sentiment analysis, Python forecasting, and bespoke data cleansing.

Before founding her consultancy, Sue spent 15 years lecturing in Business and Computing—an experience that continues to shape her passion for data literacy and enabling others.

Her current focus includes Microsoft Fabric, semantic modelling, and automating analytics using Python and DAX. She regularly speaks at major data conferences including SQLBits, Microsoft-led events, and other industry gatherings.

Sue also runs her own weekly tech user group, co-hosts the fortnightly Unpivot podcast, and actively contributes to the data community by sharing practical insights that demystify complex tools and help others embrace data with clarity and confidence.

When she’s not building reports or wrangling data, Sue’s often found training for marathons, running long distances across the Devon coast, walking her dog, or championing women in tech spaces.

Web scraping with Python and Beautiful Soup | Sue Bayes