Activities & events

(Online) From Raw to Refined: Building Production Data Pipelines That Scale
2026-01-21 · 18:30
This is an online event; the Teams link will be published on the right of this page for those who have registered.

18:30: From Raw to Refined: Building Production Data Pipelines That Scale - Pradeep Kalluri
19:55: Prize Draw - Packt eBooks

Session details: From Raw to Refined: Building Production Data Pipelines That Scale - Pradeep Kalluri
Every organization needs to move data from source systems to analytics platforms, but most teams struggle with reliability at scale. In this talk, I'll share the three-zone architecture pattern I use to build production data pipelines that process terabytes daily while maintaining data quality and operational simplicity.

You'll learn:
- Why the traditional "single pipeline" approach breaks at scale
- How to structure pipelines using Raw, Curated, and Refined zones
- Practical patterns for handling batch and streaming data with Kafka and Spark
- Real incidents and lessons learned from production systems
- Tools and technologies that work (PySpark, Airflow, Snowflake)

This isn't theory; it's battle-tested patterns from years of building data platforms. Whether you're designing your first data pipeline or scaling an existing platform, you'll walk away with actionable techniques you can apply immediately.

Speaker: Pradeep Kalluri, Data Engineer | NatWest | Building Scalable Data Platforms
Data Engineer with 3+ years of experience building production data platforms at NatWest, Accenture, and Capgemini. Specialized in cloud-native architectures, real-time processing with Kafka and Spark, and data quality frameworks. Published technical writer on Medium, sharing practical lessons from production systems. Passionate about making data platforms reliable and trustworthy.
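As a rough illustration of the Raw/Curated/Refined pattern the talk describes, here is a minimal PySpark sketch. It is not the speaker's code; the paths, schema, and cleansing rules are hypothetical placeholders.

```python
# Minimal sketch of the three-zone pattern: land data untouched in Raw,
# standardize it in Curated, aggregate it in Refined. Paths and rules
# are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("three-zone").getOrCreate()

# Raw zone: ingest source data as-is, adding only lineage metadata.
raw = (spark.read.json("s3://lake/raw/orders/")
       .withColumn("_ingested_at", F.current_timestamp()))
raw.write.mode("append").parquet("s3://lake/raw_zone/orders/")

# Curated zone: enforce types, drop malformed rows, deduplicate.
curated = (raw
           .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
           .filter(F.col("order_id").isNotNull())
           .dropDuplicates(["order_id"]))
curated.write.mode("overwrite").parquet("s3://lake/curated_zone/orders/")

# Refined zone: business-level aggregates served to analytics consumers.
refined = curated.groupBy("customer_id").agg(F.sum("amount").alias("lifetime_value"))
refined.write.mode("overwrite").parquet("s3://lake/refined_zone/customer_ltv/")
```
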
[Notes] How to Build a Portfolio That Reflects Your Real Skills
2025-12-28 · 18:00
These are the notes of the previous "How to Build a Portfolio That Reflects Your Real Skills" event: Properties of an ideal portfolio repository:
📌 Backend & Frontend Portfolio Project Ideas

☕ Junior Java Backend Developer (Spring Boot)
1. Shop Manager Application: A monolithic Spring Boot app designed with microservice-style boundaries. Features
Engineering Focus
2. Parallel Data Processing Engine: Backend service for processing large datasets efficiently. Features
Demonstrates
3. Distributed Task Queue System: Simple async job processing system. Features
Demonstrates
4. Rate Limiting & Load Control Service: Standalone service that protects APIs from abuse (a minimal token-bucket sketch appears after this list). Features
Demonstrates
5. Search & Indexing Backend: Document or record search service. Features
Demonstrates
6. Distributed Configuration & Feature Flag Service: Centralized config service for other apps. Features
Demonstrates

🐹 Mid-Level Go Backend Developer (Non-Kubernetes)
1. High-Throughput Event Processing Pipeline: Multi-stage concurrent pipeline. Features
2. Distributed Job Scheduler & Worker System: Async job execution platform. Features
3. In-Memory Caching Service: Redis-like cache written from scratch. Features
4. Rate Limiting & Traffic Shaping Gateway: Reverse-proxy-style rate limiter. Features
5. Log Aggregation & Query Engine: Incrementally built system. Step-by-step

🐍 Mid-Level Python Backend Developer
1. Asynchronous Task Processing System: Async job execution platform. Features
2. Event-Driven Data Pipeline: Streaming data processing service. Features
3. Distributed Rate Limiting Service: API protection service. Steps
4. Search & Indexing Backend: Search system for logs or documents. Features
5. Configuration & Feature Flag Service: Shared configuration backend. Steps

🟦 Mid-Level TypeScript Backend Developer
1. Asynchronous Job Processing System: Queue-based task execution. Features
2. Real-Time Chat / Notification Service: WebSocket-based system. Features
3. Rate Limiting & API Gateway: API gateway with protections. Features
4. Search & Filtering Engine: Search backend for products, logs, or articles. Features
5. Feature Flag & Configuration Service: Centralized config management. Features

🟨 Mid-Level Node.js Backend Developer
1. Async Task Queue System: Background job processor. Features
2. Real-Time Chat / Notification Service: Socket-based system. Features
3. Rate Limiting & API Gateway: Traffic control service. Features
4. Search & Indexing Backend: Indexing & querying service.
5. Feature Flag / Configuration Service: Shared backend for app configs.

⚛️ Mid-Level Frontend Developer (React / Next.js)
1. Dynamic Analytics Dashboard: Interactive data visualization app. Features
2. E-Commerce Store: Full shopping experience. Features
3. Real-Time Chat / Collaboration App: Live multi-user UI. Features
4. CMS / Blogging Platform: SEO-focused content app. Features
5. Personalized Analytics / Recommendation UI: Data-heavy frontend. Features
6. AI Chatbot App ("My House Plant Advisor"): LLM-powered assistant with production-quality UX. Core Features
Advanced Features

✅ Final Advice: You do NOT need to build everything. Instead, pick 1–2 strong projects per role and focus on depth:
📌 Portfolio Quality Signals (Very Important)
🎯 Why This Helps in Interviews: Working on serious projects gives you:
🎥 Demo & Documentation Best Practices
🤝 Open Source & Personal Projects (Interview Signal): Always mention that you have contributed to Open Source or built personal projects.
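Several of the project ideas above (across the Java, Go, Python, TypeScript, and Node.js tracks) involve building a rate limiter. As a starting point, here is a minimal Python token-bucket sketch; the class and parameters are illustrative and not part of the original notes:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: allows roughly `rate` requests
    per second, with bursts of up to `capacity` requests."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens added per second
        self.capacity = capacity     # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

limiter = TokenBucket(rate=5, capacity=10)  # ~5 req/s, bursts of 10
print(limiter.allow())                      # True until the bucket drains
```

A production version would add per-client buckets (keyed by API key or IP) and shared state (e.g. Redis) for the distributed variants of the project.
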
From Batch to Real-Time: The Business Case for Event Streaming
2025-11-20 · 16:00
In today’s digital landscape, data that moves slowly is data that loses value. As organizations strive for instant insights, responsive experiences, and operational agility, batch processing can no longer keep up. Event streaming offers a transformative alternative: enabling continuous data flow, faster decision-making, and real-time responsiveness across systems and services. In this session, we’ll explore why leading enterprises are embracing event-driven architectures and how event streaming unlocks new business potential. You’ll learn how real-time data pipelines drive innovation across industries, and how Kong’s Event Gateway brings consistency, observability, and governance to event-driven communication, bridging the gap between APIs and events for the modern enterprise. What You’ll Learn:

Geospatial Data on Databricks
2025-11-05 · 12:00
Most organizations capture huge volumes of spatial data, including addresses, coordinates, routes, and catchments, but struggle to operationalize it at scale. Traditional GIS (Geographic Information Systems) tools are powerful but isolated; unlocking value requires integrating spatial analytics directly within your data platform. In this session, we’ll cover:
- Geospatial fundamentals on Databricks: understanding geometry vs. geography, coordinate systems, and H3 grids.
- Scaling challenges: combining spatial and business data, processing millions of coordinates efficiently, and maintaining real-time freshness.
- Databricks capabilities: how Spatial SQL, Lakeflow, and Unity Catalog enable native spatial processing, federated access, and governed sharing across teams.
- Applied use cases: from network optimisation to asset tracking and location-based insights across industries.
We'll finish with a live demo: see how raw coordinates become actionable intelligence within the Lakehouse.
Why attend:
- Learn how to bring geospatial analytics natively into Databricks.
- Discover best practices for scaling spatial workloads efficiently.
- Understand how Unity Catalog underpins governance and reusability.
- See real-world examples and a live demo in action.
Join us to learn how Databricks unifies spatial and analytical workloads, delivering governed, high-performance geospatial insight at enterprise scale. This session will be delivered by Unifeye's CDO and Databricks Champion Bianca Stratulat, and Senior Data Engineers Jordan Begg and Hasnat Abdul.
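To make the H3 idea concrete, here is a hedged sketch of point-to-cell aggregation. It assumes a Databricks notebook (where `spark` is predefined) with the built-in H3 SQL expressions such as h3_longlatash3 available, and a hypothetical `deliveries` table with lon/lat columns; it is not material from the session itself.

```python
# Hedged sketch: bucket raw coordinates into H3 cells and count points per
# cell, a typical first step before joining spatial and business data.
df = spark.sql("""
    SELECT
        h3_longlatash3(lon, lat, 9) AS h3_cell,  -- index each point into a resolution-9 H3 cell
        COUNT(*)                    AS n_points
    FROM deliveries
    GROUP BY 1
    ORDER BY n_points DESC
""")
df.show(10)
```
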
Bridging the AI–Data Gap: Collect, Curate, Serve
2025-11-02 · 19:31
Summary
In this episode of the Data Engineering Podcast Omri Lifshitz (CTO) and Ido Bronstein (CEO) of Upriver talk about the growing gap between AI's demand for high-quality data and organizations' current data practices. They discuss why AI accelerates both the supply and demand sides of data, highlighting that the bottleneck lies in the "middle layer" of curation, semantics, and serving. Omri and Ido outline a three-part framework for making data usable by LLMs and agents (collect, curate, serve) and share challenges of scaling from POCs to production, including compounding error rates and reliability concerns. They also explore organizational shifts, patterns for managing context windows, pragmatic views on schema choices, and Upriver's approach to building autonomous data workflows using determinism and LLMs at the right boundaries. The conversation concludes with a look ahead to AI-first data platforms where engineers supervise business semantics while automation stitches technical details end-to-end.

Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed: flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming: Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.
- Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
- Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.
- Your host is Tobias Macey and today I'm interviewing Omri Lifshitz and Ido Bronstein about the challenges of keeping up with the demand for data when supporting AI systems

Interview
- Introduction
- How did you get involved in the area of data management?
- We're here to talk about "The Growing Gap Between Data & AI". From your perspective, what is this gap, and why do you think it's widening so rapidly right now?
- How does this gap relate to the founding story of Upriver? What problems were you and your co-founders experiencing that led you to build this?
- The core premise of new AI tools, from RAG pipelines to LLM agents, is that they are only as good as the data they're given. How does this "garbage in, garbage out" problem change when the "in" is not a static file but a complex, high-velocity, and constantly changing data pipeline?
- Upriver is described as an "intelligent agent system" and an "autonomous data engineer." This is a fascinating "AI to solve for AI" approach. Can you describe this agent-based architecture and how it specifically works to bridge that data-AI gap?
- Your website mentions a "Data Context Layer" that turns "tribal knowledge" into a "machine-usable mode." This sounds critical for AI. How do you capture that context, and how does it make data "AI-ready" in a way that a traditional data catalog or quality tool doesn't?
- What are the most innovative or unexpected ways you've seen companies trying to make their data "AI-ready"? And where are the biggest points of failure you observe?
- What has been the most challenging or unexpected lesson you've learned while building an AI system (Upriver) that is designed to fix the data foundation for other AI systems?
- When is an autonomous, agent-based approach not the right solution for a team's data quality problems? What organizational or technical maturity is required to even start closing this data-AI gap?
- What do you have planned for the future of Upriver? And looking more broadly, how do you see this gap between data and AI evolving over the next few years?

Contact Info
- Ido - LinkedIn
- Omri - LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- Upriver
- RAG == Retrieval Augmented Generation
- AI Engineering Podcast Episode
- AI Agent
- Context Window
- Model Finetuning

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Event: Data Engineering Podcast

Powering AI Agents With Real-time Analytics
2025-09-30 · 06:30
This talk will explore how real-time analytics, powered by ClickHouse, enables AI agents to achieve high-speed data processing for various use cases, including natural data access. Attendees will discover the challenges and solutions for scaling AI agents and see a live demo of AI agents working with ClickHouse.
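As a flavor of the kind of low-latency aggregation an AI agent might push down to ClickHouse, here is a small sketch using the clickhouse-connect Python client. The connection details and the `events` table are hypothetical, not from the talk.

```python
import clickhouse_connect

# Connect to a ClickHouse instance (connection details are placeholders).
client = clickhouse_connect.get_client(host='localhost', port=8123)

# The kind of aggregation an agent might issue to answer a natural-language
# question such as "what were yesterday's top event types?" (hypothetical table).
result = client.query("""
    SELECT event_type, count() AS n
    FROM events
    WHERE event_time >= yesterday()
    GROUP BY event_type
    ORDER BY n DESC
    LIMIT 5
""")
for row in result.result_rows:
    print(row)
```
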
Event: ClickHouse Meetup Madrid

From GPUs-as-a-Service to Workloads-as-a-Service: Flex AI’s Path to High-Utilization AI Infra
2025-09-28 · 23:46
Brijesh Tripathi – CEO @ Flex AI
Tobias Macey – host
Summary
In this crossover episode of the AI Engineering Podcast, host Tobias Macey interviews Brijesh Tripathi, CEO of Flex AI, about revolutionizing AI engineering by removing DevOps burdens through "workload as a service". Brijesh shares his expertise from leading AI/HPC architecture at Intel and deploying supercomputers like Aurora, highlighting how access friction and idle infrastructure slow progress. Join them as they discuss Flex AI's innovative approach to simplifying heterogeneous compute, standardizing on consistent Kubernetes layers, and abstracting inference across various accelerators, allowing teams to iterate faster without wrestling with drivers, libraries, or cloud-by-cloud differences. Brijesh also shares insights into Flex AI's strategies for lifting utilization, protecting real-time workloads, and spanning the full lifecycle from fine-tuning to autoscaled inference, all while keeping complexity at bay.

Pre-amble
I hope you enjoy this cross-over episode of the AI Engineering Podcast, another show that I run to act as your guide to the fast-moving world of building scalable and maintainable AI systems. As generative AI models have grown more powerful and are being applied to a broader range of use cases, the lines between data and AI engineering are becoming increasingly blurry. The responsibilities of data teams are being extended into the realm of context engineering, as well as designing and supporting new infrastructure elements that serve the needs of agentic applications. This episode is an example of the types of work that are not easily categorized into one or the other camp.

Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed: flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming: Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.
- Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
- Your host is Tobias Macey and today I'm interviewing Brijesh Tripathi about FlexAI, a platform offering a service-oriented abstraction for AI workloads

Interview
- Introduction
- How did you get involved in machine learning?
- Can you describe what FlexAI is and the story behind it?
- What are some examples of the ways that infrastructure challenges contribute to friction in developing and operating AI applications?
- How do those challenges contribute to issues when scaling new applications/businesses that are founded on AI?
- There are numerous managed services and deployable operational elements for operationalizing AI systems. What are some of the main pitfalls that teams need to be aware of when determining how much of that infrastructure to own themselves?
- Orchestration is a key element of managing the data and model lifecycles of these applications. How does your approach of "workload as a service" help to mitigate some of the complexities in the overall maintenance of that workload?
- Can you describe the design and architecture of the FlexAI platform?
- How has the implementation evolved from when you first started working on it?
- For someone who is going to build on top of FlexAI, what are the primary interfaces and concepts that they need to be aware of?
- Can you describe the workflow of going from problem to deployment for an AI workload using FlexAI?
- One of the perennial challenges of making a well-integrated platform is that there are inevitably pre-existing workloads that don't map cleanly onto the assumptions of the vendor. What are the affordances and escape hatches that you have built in to allow partial/incremental adoption of your service?
- What are the elements of AI workloads and applications that you are explicitly not trying to solve for?
- What are the most interesting, innovative, or unexpected ways that you have seen FlexAI used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on FlexAI?
- When is FlexAI the wrong choice?
- What do you have planned for the future of FlexAI?

Contact Info
- LinkedIn

Parting Question
- From your perspective, what are the biggest gaps in tooling, technology, or training for AI systems today?

Links
- Flex AI
- Aurora Super Computer
- CoreWeave
- Kubernetes
- CUDA
- ROCm
- Tensor Processing Unit (TPU)
- PyTorch
- Triton
- Trainium
- ASIC == Application Specific Integrated Circuit
- SOC == System On a Chip
- Loveable
- FlexAI Blueprints
- Tenstorrent

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Event: Data Engineering Podcast

Building a Data Analytics Agent using LLMs
2025-07-30 · 18:00
Build Your Own LLM-Powered Data Analytics Agent, Using Free & Open-Source Tools

As large language models continue to reshape the data landscape, one of the most exciting applications is creating intelligent data analytics assistants that make querying and exploring data as simple as asking a question. In this hands-on session, you’ll learn how to build your own interactive assistant using free, open-source tools, with no paid licenses or proprietary systems required. We’ll guide you through connecting your data to a language model to enable natural language queries that return automated insights, visualizations, and summaries. Whether you’re a data analyst, business user, or enthusiast, this session will help you turn static datasets into dynamic, conversational experiences. You’ll also see a live demo of an AI-powered agent processing queries, performing analysis, and returning visual insights, with minimal setup and no complex coding. We’ll share practical design tips to make your assistant more reliable, interpretable, and scalable. What We Will Cover:
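The session does not prescribe a specific stack, but the core loop of such an assistant can be sketched in a few lines of Python: an LLM turns a natural-language question into an expression over a DataFrame, which is then executed. The `ask_llm` helper below is a hypothetical stand-in for whatever open-source model client you use, and the dataset is a toy placeholder.

```python
import pandas as pd

def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for any open-source LLM client (swap in your own).
    Returns a canned pandas expression so the sketch runs end-to-end."""
    return "df.groupby('region')['revenue'].sum().idxmax()"

# Toy dataset standing in for whatever data source you connect.
df = pd.DataFrame({
    "region": ["EU", "EU", "US", "APAC"],
    "revenue": [120.0, 80.0, 150.0, 90.0],
})

question = "Which region had the highest total revenue?"
prompt = (
    "You write one line of pandas code against a DataFrame named df with "
    f"columns {list(df.columns)}. Question: {question} "
    "Reply with only the expression, no explanation."
)
expression = ask_llm(prompt)

# Evaluate the generated expression in a restricted namespace.
# Real assistants should validate/sandbox model-generated code before running it.
answer = eval(expression, {"df": df, "pd": pd})
print(f"{expression} -> {answer}")
```
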
Orchestrating Databricks with Airflow: Unlocking the Power of MVs, Streaming Tables, and AI
2025-07-01
Tahir Fayyaz – Product Manager at Databricks
Shanelle Roman – Product Manager at Databricks
As data workloads grow in complexity, teams need seamless orchestration to manage pipelines across batch, streaming, and AI/ML workflows. Apache Airflow provides a flexible and open-source way to orchestrate Databricks’ entire platform, from SQL analytics with Materialized Views (MVs) and Streaming Tables (STs) to AI/ML model training and deployment. In this session, we’ll showcase how Airflow can automate and optimize Databricks workflows, reducing costs and improving performance for large-scale data processing. We’ll highlight how MVs and STs eliminate manual incremental logic, enable real-time ingestion, and enhance query performance, all while maintaining governance and flexibility. Additionally, we’ll demonstrate how Airflow simplifies ML model lifecycle management by integrating Databricks’ AI/ML capabilities into end-to-end data pipelines. Whether you’re a dbt user seeking better performance, a data engineer managing streaming pipelines, or an ML practitioner scaling AI workloads, this session will provide actionable insights on using Airflow and Databricks together to build efficient, cost-effective, and future-proof data platforms.
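For readers unfamiliar with the integration, a minimal sketch of triggering a Databricks run from Airflow might look like the following. It assumes the apache-airflow-providers-databricks package and a configured "databricks_default" connection; the cluster spec and notebook path are placeholders, not material from the session.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

with DAG(
    dag_id="databricks_mv_refresh",
    start_date=datetime(2025, 7, 1),
    schedule="@hourly",  # Airflow 2.4+ keyword; older versions use schedule_interval
    catchup=False,
) as dag:
    # Submit a one-time run that executes a (hypothetical) refresh notebook.
    refresh = DatabricksSubmitRunOperator(
        task_id="refresh_streaming_table",
        databricks_conn_id="databricks_default",
        json={
            "new_cluster": {
                "spark_version": "14.3.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 2,
            },
            "notebook_task": {"notebook_path": "/pipelines/refresh_tables"},
        },
    )
```
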
Scaling Blockchain ML With Databricks: From Graph Analytics to Graph Machine Learning
2025-06-11 · 23:30
Indra Rustandi – Staff ML Engineer @ Coinbase
Coinbase leverages Databricks to scale ML on blockchain data, turning vast transaction networks into actionable insights. This session explores how Databricks’ scalable infrastructure, powered by Delta Lake, enables real-time processing for ML applications like NFT floor price predictions. We’ll show how GraphFrames helps us analyze billion-node transaction graphs (e.g., Bitcoin) for clustering and fraud detection, uncovering structural patterns in blockchain data. But traditional graph analytics has limits. We’ll go further with Graph Neural Networks (GNNs) using Kumo AI, which learn from the transaction network itself rather than relying on hand-engineered features. By encoding relationships directly into the model, GNNs adapt to new fraud tactics, capturing subtle relationships that evolve over time. Join us to see how Coinbase is advancing blockchain ML with Databricks and deep learning on graphs.
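As a hedged illustration of the GraphFrames portion, here is a tiny PySpark example of building a graph and running connected components, a common clustering step on transaction graphs. The vertex and edge data are toy stand-ins, not Coinbase's.

```python
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("tx-graph").getOrCreate()
# connectedComponents requires a checkpoint directory.
spark.sparkContext.setCheckpointDir("/tmp/graphframes-ckpt")

# Toy vertices (addresses) and edges (transactions with an amount).
vertices = spark.createDataFrame([("a",), ("b",), ("c",)], ["id"])
edges = spark.createDataFrame(
    [("a", "b", 0.5), ("b", "c", 1.2)], ["src", "dst", "amount"]
)

g = GraphFrame(vertices, edges)
# Cluster addresses into connected components, a typical first step
# before deeper fraud analysis on transaction graphs.
components = g.connectedComponents()
components.show()
```
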
Event: Data + AI Summit 2025

Scaling Talent & Compensation Planning: A DoorDash Story | The Data Apps Conference
2025-06-02 · 19:43
Ashwin Murugappan – People Applications & Intelligence Engineer @ DoorDash
Managing performance reviews, calibrations, and compensation adjustments across thousands of employees at DoorDash was becoming increasingly complex, especially after the Wolt acquisition doubled the employee base. Teams struggled with spreadsheet chaos, security risks, and inefficient manual processes. In this session, Ashwin Murugappan (People Applications & Intelligence Engineer) will share how DoorDash built the Cycle Management Hub using Sigma Data Apps to:
- Eliminate spreadsheet versioning issues with real-time, governed collaboration
- Improve efficiency and accuracy by integrating directly with Workday & Snowflake
- Enhance security & compliance with role-based access controls (RLS)
Watch the demo and learn how Sigma’s input tables, write-back capabilities, and real-time data processing helped DoorDash modernize its HR data workflows at scale.
Event: Sigma Data Apps Conference 2025

Empowering ISVs with Microsoft Fabric: Embed Data and AI into your applications

This webinar will showcase the transformative potential of Microsoft Fabric for ISVs and SaaS applications. Discover how you can leverage Fabric to build innovative, data-driven applications that integrate seamlessly with AI capabilities. Learn from your host, Tighe Brennan, and guest experts, Holly Kelly and James Boother, as they share practical insights and real-world examples of how ISVs are successfully building and scaling their applications with Microsoft Fabric.

Why you should attend:
- Unlock New Opportunities: Learn how Microsoft Fabric can help you modernize your infrastructure, reduce friction in data preparation, and enable new data science use cases.
- Enhance Your Offerings: Understand how to embed powerful analytics and AI features into your applications, providing added value to your customers.
- Simplify Integration: Discover the benefits of a unified platform that simplifies data integration, governance, and security, making it easier to manage multi-tenant architectures.

What we will cover:
- Introduction to Microsoft Fabric: Overview of the unified data platform and its components.
- Data Integration and OneLake: How OneLake serves as the center of gravity for data, enabling easy and cost-effective data ingestion and integration.
- Real-Time Intelligence: Leveraging real-time data processing for continuous data integration and actionable insights.
- AI and Copilot Integration: Utilizing AI Foundry and Copilot Studio to create generative AI solutions and enhance application capabilities.
- Multi-Tenancy and ISV Scenarios: Best practices for designing multi-tenant architectures, including tenant isolation models and deployment patterns.
- Embedding Analytics: Expanding Power BI embedded capabilities to modernize your applications and provide rich analytics experiences.
- Monetization Strategies: How to monetize your solutions through the Azure Marketplace and leverage Fabric's workload services.

ClickHouse Delhi/Gurgaon Meetup - March 2025
2025-03-22 · 05:00
We are excited to finally have the first ClickHouse Meetup in the vibrant city of Delhi! Join the ClickHouse crew, from Singapore and from different cities in India, for an engaging day of talks, food, and discussion with your fellow database enthusiasts. But here's the deal: to secure your spot, make sure you register ASAP!

🗓️ Agenda:

If anyone from the community is interested in sharing a talk at future meetups, complete this CFP form and we’ll be in touch.

🎤 Session Details: Introduction to ClickHouse
Discover the secrets behind ClickHouse's unparalleled efficiency and performance. Rakesh will give an overview of different use cases for which global companies are adopting this groundbreaking database to transform data storage and analytics.
Speaker: Rakesh Puttaswamy, Solution Architect @ ClickHouse
Rakesh Puttaswamy is a Solution Architect with ClickHouse, working with users across India, with over 12 years of experience in data architecture, big data, data science, and software engineering. Rakesh helps organizations design and implement cutting-edge data-driven solutions. With deep expertise in a broad range of databases and data warehousing technologies, he specializes in building scalable, innovative solutions to enable data transformation and drive business success.

🎤 Session Details: ClickPipes Overview and Demo
ClickPipes is a powerful integration engine that simplifies data ingestion at scale, making it as easy as a few clicks. With an intuitive onboarding process, setting up new ingestion pipelines takes just a few steps: select your data source, define the schema, and let ClickPipes handle the rest. Designed for continuous ingest, it automates pipeline management, ensuring seamless data flow without manual intervention. In this talk, Kunal will demo the Postgres CDC connector for ClickPipes, enabling seamless, native replication of Postgres data to ClickHouse Cloud in just a few clicks, with no external tools needed for fast, cost-effective analytics.
Speaker: Kunal Gupta, Sr. Software Engineer @ ClickHouse
Kunal Gupta is a Senior Software Engineer at ClickHouse, joining through the acquisition of PeerDB in 2024, where he played a pivotal role as a founding engineer. With several years of experience in architecting scalable systems and real-time applications, Kunal has consistently driven innovation and technical excellence. Previously, he was a founding engineer for new solutions at ICICIdirect and at AsknBid Tech, leading high-impact teams and advancing code analysis, storage solutions, and enterprise software development.

🎤 Session Details: Optimizing Log Management with ClickHouse: Cost-Effective & Scalable Solutions
Efficient log management is essential in today's cloud-native environments, yet traditional solutions like ElasticSearch often face scalability issues, high costs, and performance limitations. This talk will begin with an overview of common logging tools and their challenges, followed by an in-depth look at ClickHouse's architecture. We will compare ClickHouse with ElasticSearch, focusing on improvements in query performance, storage efficiency, and overall cost-effectiveness. A key highlight will be OLX India's migration to ClickHouse, detailing the motivations behind the shift, the migration strategy, key optimizations, and the resulting 50% reduction in log storage costs. By the end of this talk, attendees will gain a clear understanding of when and how to leverage ClickHouse for log management, along with best practices for optimizing performance and reducing operational costs. (A minimal log-table sketch appears after these session details.)
Speaker: Pushpender Kumar, DevOps Architect @ OLX India
Born and raised in Bijnor, moved to Delhi to stay ahead in the race of life. Currently working as a DevOps Architect at OLX India, specializing in cloud infrastructure, Kubernetes, and automation with over 10 years of experience. Successfully optimized log storage costs by 50% using ClickHouse, bringing scalability and efficiency to large-scale logging systems. Passionate about cloud optimization, DevOps hiring, and performance engineering.

🎤 Session Details: ClickHouse at Physics Wallah: Empowering Real-Time Analytics at Scale
This session explores how Physics Wallah revolutionized its real-time analytics capabilities by leveraging ClickHouse. We'll delve into the journey of implementing ClickHouse to efficiently handle large-scale data processing, optimize query performance, and power diverse use cases such as user activity tracking and engagement analysis. By enabling actionable insights and seamless decision-making, this transformation has significantly enhanced the learning experience for millions of users. Today, more than five customer-facing products at Physics Wallah are powered by ClickHouse, serving over 10 million students and parents, including 1.5 million Daily Active Users. Our in-house ClickHouse cluster, hosted and managed within our EKS infrastructure on AWS Cloud, ingests more than 10 million rows of data daily from various sources. Join us to learn about the architecture, challenges, and key strategies behind this scalable, high-performance analytics solution.
Speaker: Utkarsh G. Srivastava, Software Development Engineer III @ Physics Wallah
As a versatile Software Engineer with over 7 years of experience in the IT industry, I have had the privilege of taking on diverse roles, with a primary focus on backend development, data engineering, infrastructure, DevOps, and security. Throughout my career, I have played a pivotal role in transformative projects, consistently striving to craft innovative and effective solutions for customers in the SaaS space.

🎤 Session Details: FabFunnel & ClickHouse: Delivering Real-Time Marketing Analytics
We are a performance marketing company that relies on real-time reporting to drive data-driven decisions and maximize campaign effectiveness. As our client base expanded, we encountered significant challenges with our reporting system: frequent data updates meant handling large datasets inefficiently, leading to slow query execution and delays in delivering insights. This bottleneck hindered our ability to provide timely optimizations for ad campaigns. To address these issues, we needed a solution that could handle rapid data ingestion and querying at scale without the overhead of traditional refresh processes. In this talk, we’ll share how we transformed our reporting infrastructure to achieve real-time insights, enhancing speed, scalability, and efficiency in managing large-scale ad performance data.
Speakers: Anmol Jain, SDE-2 (Full Stack Developer), & Siddhant Gaba, SDE-2 (Python) @ Idea Clan
From competing as a national table tennis player to building high-performance software, Anmol Jain brings a unique mix of strategy and problem-solving to tech. With 3+ years of experience at Idea Clan, they play a key role in scaling Lookfinity and FabFunnel, managing multi-million-dollar ad spends every month. Specializing in ClickHouse, React.js, and Node.js, Anmol focuses on real-time data processing and scalable backend solutions. At this meet-up, they'll share insights on solving reporting challenges and driving real-time decision-making in performance marketing. Siddhant Gaba is an SDE II at Idea Clan, with expertise in Python, Java, and C#, specializing in scalable backend systems. With four years of experience working with FastAPI, PostgreSQL, MongoDB, and ClickHouse, he focuses on real-time analytics, database optimization, and distributed systems. Passionate about high-performance computing, asynchronous APIs, and system design, he aims to advance real-time data processing. Outside of work, he enjoys playing volleyball. At this meetup, he will share insights on how ClickHouse transformed real-time reporting and scalability.

🎤 Session Details: From SQL to AI: Building Intelligent Applications with ClickHouse and LangDB
As AI becomes a driving force behind innovation, building applications that seamlessly integrate AI capabilities with existing data infrastructures is critical. In this session, we explore the creation of agentic applications using ClickHouse and LangDB. We will introduce the concept of an AI gateway, explaining its role in connecting powerful AI models with the high-performance analytics engine of ClickHouse. By leveraging LangDB, we demonstrate how to directly interact with AI functions as User-Defined Functions (UDFs) in ClickHouse, enabling developers to design and execute complex AI workflows within SQL. Additionally, we will showcase how LangDB facilitates deep visibility into AI function behaviors and agent interactions, providing tools to analyze and optimize the performance of AI-driven logic. Finally, we will highlight how ClickHouse, powered by LangDB APIs, can be used to evaluate and refine the quality of LLM responses, ensuring reliable and efficient AI integrations.
Speaker: Matteo Pelati, Co-founder, LangDB.ai
Matteo Pelati is a seasoned software engineer with over two decades of experience, specializing in data engineering for the past ten years. He is the co-founder of LangDB, a company based in Singapore building the fastest Open Source AI Gateway. Before founding LangDB, he was part of the early team at DataRobot, where he contributed to scaling their product for enterprise clients. Subsequently, he joined DBS Bank where he built their data platform and team from the ground up. Prior to starting LangDB, Matteo led the data group for Asia Pacific and data engineering at Goldman Sachs.
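As a hedged illustration of the log-management session's theme, here is a minimal sketch of a ClickHouse table layout commonly used for logs, created via the clickhouse-connect Python client. The schema, TTL, and codec choices are illustrative assumptions, not OLX India's actual design.

```python
import clickhouse_connect

client = clickhouse_connect.get_client(host='localhost')  # placeholder connection

# A typical MergeTree layout for logs: order by (service, timestamp) so queries
# filtering by service and time range read little data, partition by day, and
# use TTL plus ZSTD compression to keep storage costs down.
client.command("""
    CREATE TABLE IF NOT EXISTS logs (
        timestamp DateTime64(3),
        service   LowCardinality(String),
        level     LowCardinality(String),
        message   String CODEC(ZSTD(3))
    )
    ENGINE = MergeTree
    PARTITION BY toDate(timestamp)
    ORDER BY (service, timestamp)
    TTL toDateTime(timestamp) + INTERVAL 30 DAY
""")
```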