talk-data.com
Activities & events
| Title & Speakers | Event |
|---|---|
|
Google NY Site Reliability Engineering (SRE) Tech Talks, 16 Dec 2025
2025-12-16 · 23:00
Google SRE NYC proudly announces our last Google SRE NYC Tech Talk for 2025. This event is co-sponsored by sentry.io. Thank you, Sentry, for your partnership! Let's bid farewell to 2025 with three amazing interactive short talks on Site Reliability and DevOps topics! As always, the event will include an opportunity to mingle with the speakers and attendees over some light snacks and beverages after the talks.
The Meetup will take place on Tuesday, 16th of December 2025 at 6:00 PM at our Chelsea Markets office in NYC. The doors will open at 5:30 PM. Please RSVP only if you're able to attend in person; there will be no live streaming. When RSVP'ing to this event, please enter your full name exactly as it appears on your government-issued ID. You will be required to present your ID at check-in.
Agenda:
Paul Jaffre - Senior Developer Experience Engineer, sentry.io
One Trace to Rule Them All: Unifying Sentry Errors with OpenTelemetry tracing
SREs face the challenge of operating reliable observability infrastructure while avoiding vendor lock-in from proprietary APM (Application Performance Monitoring) solutions. OpenTelemetry has become the standard for instrumenting applications, allowing teams to collect traces, metrics, and logs. But raw telemetry data isn't enough. SREs need tools to visualize, debug, and respond to production incidents quickly. Sentry now supports OTLP, enabling teams to send OpenTelemetry data directly to Sentry for analysis. This talk covers how Sentry's OTLP support works in practice: connecting frontend and backend traces across services, correlating logs with distributed traces, and using tools to identify slow queries and performance bottlenecks. We'll discuss the practical benefits for SREs, like faster incident resolution, better cross-team debugging, and the flexibility to change observability backends without re-instrumenting code.
Paul’s background spans engineering, product management, UX design, and open source. He has a soft spot for dev tools and loses sleep over making things easy to understand and use. Paul has a dynamic professional background, from strategy to stability. His time at Krossover Intelligence established a strong foundation by blending Product Management with hands-on development, and he later focused on core reliability at MakerBot, where he implemented automated end-to-end testing and drove performance improvements. He then extended this expertise in stability and scale at Cypress.io, where he served as a Developer Experience Engineer, focusing on improving workflow, contribution, and usability for their widely adopted open-source community.
Thiara Ortiz - Cloud Gaming SRE Manager, Netflix
Managing Black Box Systems
SREs often face ambiguity when managing black box systems (LLMs, games, poorly understood dependencies). We will discuss how Netflix monitors service health as black boxes using multiple measurement techniques to understand system behavior, aligning with the need for robust observability tools. These strategies are crucial for system reliability and user experience. By proactively identifying and resolving issues, we ensure a smoother playback experience and maintain user trust, even as the platform continues to evolve and gain maturity. The principles shared within this talk can be extended to other applications, such as AI reliability in data quality and model deployments.
Thiara has worked at some of the largest internet companies in the world, Meta and Netflix. During her time at Meta, Thiara found a passion for distributed systems and bringing new hardware into production. Always curious to explore new solutions to complex problems, Thiara developed Fleet Scanner, internally known as Lemonaid, to perform memory, compute, and storage benchmarks on each Meta server in production. This service runs on over 5 million servers and continues to be utilized at Meta. Since Meta, Thiara has been working at Netflix as a Senior CDN Reliability Engineer and now as Cloud Gaming SRE Manager. When incidents occur and Netflix's systems do not behave as expected, Thiara can be found engaging the necessary teams to remediate these issues.
Andrew Espira - Platform and Site Reliability Engineer, Founding Engineer, Kustode
ML-Powered Predictive SRE: Using Behavioral Signals to Prevent Cluster Inefficiencies Before They Impact Production
SREs managing ML clusters often discover resource inefficiencies and queue bottlenecks only after they've impacted production services. This talk presents a machine learning approach to predict these issues before they occur, transforming SRE from reactive firefighting to proactive system optimization. We demonstrate how to build predictive models using production cluster traces that identify two critical failure modes: (1) GPU under-utilization relative to requested resources, and (2) abnormal queue wait times that indicate impending service degradation. SRE practitioners will learn how to extract early warning indicators from standard cluster logs, build ML models that provide actionable confidence scores for operational decisions, and take practical steps to integrate predictive analytics into existing SRE toolchains to achieve a 50%+ reduction in resource waste and queue-related incidents. This talk bridges the gap between traditional SRE observability and modern predictive analytics, showing how teams can evolve from reactive monitoring to intelligent, forward-looking reliability engineering.
Andrew has over 8 years of experience architecting and maintaining large-scale distributed systems. He is the Founding Engineer of Kustode (kustode.com), where he develops cutting-edge reliability and observability solutions for modern infrastructure in the insurance and health care space. Currently pursuing graduate studies in Data Science at Saint Peter's University, he specializes in the intersection of reliability engineering and artificial intelligence. His research focuses on applying machine learning to operational challenges, with publications in peer-reviewed venues including ScienceDirect. He's passionate about making complex systems more predictable and maintainable through data-driven approaches. When not optimizing cluster performance or building the next generation of observability tools, Andrew enjoys contributing to open-source projects and mentoring early-career engineers in the SRE community.
Our Tech Talks series is for professional development and networking: no recruiters, sales or press please! Google is committed to providing a harassment-free and inclusive conference experience for everyone, and all participants must follow our Event Community Guidelines. The event will be photographed and video recorded. Event space is limited! A reservation is required to attend. Reserve your spot today and share the event details with your SRE/DevOps friends 🙂 |
Google NY Site Reliability Engineering (SRE) Tech Talks, 16 Dec 2025
|
|
*In Person Only MeetUp, Free To Attend*
Thank you to Accenture for sponsoring and hosting this event. Join us for an In-Person User Group Meeting (LDPaC), where you can network, learn, ask a question, and meet other like-minded folks. These events are a really great opportunity to socialise in an informal learning setting. Remember to tell your friends and the people you work with, and make sure you register as soon as you can. We will need to provide a list of names to Accenture before the event, so to ensure there are no issues with access on the day, please make sure you have registered.
17:45 - 18:00 Network 🤝
18:00 - 18:30 Drinks & Pizza 🍕
18:30 - 18:40 Intro 🎙️
18:40 - 19:30 An Introduction to Azure Terraform - Jake Walsh
Everything you need to start using Azure Terraform in around 60 minutes! Covering setup, tips, tricks, and deploying your first Cloud Environment using Terraform. For many people, the initial steps to using infrastructure as code can be daunting, but in this session I will alleviate those fears! In this session I will cover the following areas:
Introduction - what is Terraform and Infrastructure as Code?
Why? - Why should we use Terraform and what are the benefits?
How? - How do I get started?
Live Demo - I'll go through a full demo of the installation, code creation, and the deployment of an Azure environment using Terraform.
Tips/Tricks - covering tips and tricks that will allow you to get the most out of Terraform.
Next Steps - how to continue developing your skills and learning more, including sample challenges, recommended reading, exams, and sample code to try out.
19:30 - 19:40 10-min Break 🥤
19:40 - 20:30 The CEO will thank you: Cost-Cutting Techniques for Data Engineers - Miky Schreiber
Businesses spend at least 40% of their cloud costs on data engineering and, according to Gartner, 70% of cloud costs are wasted. You do the math: as a data engineer you can save more than 25% of your company’s cloud costs! Isn’t that a good reason for the CEO to personally thank you? In this talk, we will discuss and demonstrate real-world techniques, solutions and methodologies for data engineering cost reduction that we implemented at Next Insurance. We will show how we reached stable, managed, and predictable costs. We will also elevate our expertise and learn how to seamlessly integrate cost efficiency into design and code. We’ll look at managing and reducing storage and data scan costs, avoiding unexpected costs from serverless services such as Redshift Serverless and Lambda functions, decreasing and controlling the costs of EMR clusters, and building a blameless, cost-aware culture in the engineering group. This talk will focus on the AWS technological stack, but the concepts are the same for all cloud providers.
Come and join the Leeds Data Community and start learning and networking! All are welcome! |
LDPaC 09 Dec 25: Intro to Terraform | Cost-Cutting Techniques for Data Engineers
|
|
Databricks Cost Optimization | Data Engineering Meetup | Berlin, Dec 9th
2025-12-09 · 17:30
We're celebrating 1 year of applydata Meetups in Berlin! 🎉 Let’s kick things off for our last Meetup in 2025, this time focusing on Databricks Cost Optimization and featuring an interactive data engineering quiz. Join us on December 9th in Berlin and bring all your questions & curiosity!
Kaan Ara: "Databricks Cost Optimization: A Multi-Layered Strategy for Performance and Efficiency"
Kaan Ara, Senior Cloud Engineer at Diconium, about his talk: "Databricks cost optimization requires a multi-layered strategy that focuses on three pillars: efficient Compute, optimized Storage, and strict Governance. Efficiency is driven by leveraging technologies like Photon and Serverless SQL, while storage is optimized using Delta Lake features such as Z-ordering and aggressive vacuuming. Strict governance, enforced through cluster policies and auto-termination, ensures these technical gains translate into consistent budget predictability without sacrificing performance."
Who's the data expert in the room? Interactive data pub quiz
After the keynote, it’s your turn: we’ll fire up a pub-style quiz. There’s no prep needed – everyone is welcome to join, whether you're a data engineering expert or a data newbie!
What to expect:
Timetable:
More on the applydata data engineering meetup page. Our goal is to form a local data-loving community, so join us and let's talk data together!
---
At the event, sound, image and video recordings are created and published for documentation purposes, for the presentation of the event in publicly accessible media, on websites and blogs, and for presentation on social media. By participating in the event, the participant implicitly consents to the aforementioned photo and/or video recordings. Find more information on data protection here. |
Databricks Cost Optimization | Data Engineering Meetup | Berlin, Dec 9th
|
|
Cost-Efficient Data Engineering: Scaling Without Breaking the Bank
2025-12-09 · 17:30
Come along to the last meetup event of the year! In this edition, we'll be discussing cost efficiency and how you can scale without things getting out of hand. This meetup will feature two talks and a panel discussion, so make sure to join us. We’ll be sharing more details soon, but spots are limited, so be sure to grab your ticket before they run out!
When?
17:30 - 18:30 Networking with food and drinks from Altinity
18:30 - 20:00 Talk + Panel discussion
20:00 - 20:30 More networking
Where?
London Art Bar (300 High Holborn)
Speakers and Talks:
Talk: Fast, Big, and Cheap: Pro Tricks for Cost Control in Analytic Apps (Robert Hodges, CEO @ Altinity)
Analytic databases are heavy users of storage and compute that quickly grow into budget-busting behemoths. Altinity has helped thousands of companies build high performance analytics on open source ClickHouse. This talk covers our learnings about keeping systems cheap without sacrificing speed, from optimizing I/O to using compute at low-cost providers like Hetzner to keeping a single copy of data in Iceberg data lakes.
Places are limited, make sure you register!
Note: Please ensure your RSVP status is kept up to date, as this helps us offer spots to those on the waitlist. Please be aware that if you have three or more no-shows, you may be ineligible to attend future events. |
Cost-Efficient Data Engineering: Scaling Without Breaking the Bank
|
|
Oct 30 - AI, ML and Computer Vision Meetup
2025-10-30 · 16:00
Join the virtual Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.
Date, Time and Location
Oct 30, 2025
9 AM Pacific
Online. Register for the Zoom!
The Agent Factory: Building a Platform for Enterprise-Wide AI Automation
In this talk we will explore what it takes to build an enterprise-ready AI automation platform at scale. The topics covered will include:
About the Speaker
Virender Bhargav at Flipkart is a seasoned engineering leader whose expertise spans business technology integration, enterprise applications, system design/architecture, and building highly scalable systems. With a deep understanding of technology, he has spearheaded teams, modernized technology landscapes, and managed core platform layers and strategic products. With extensive experience driving innovation at companies like Paytm and Flipkart, his contributions have left a lasting impact on the industry.
Scaling Generative Models at Scale with Ray and PyTorch
Generative image models like Stable Diffusion have opened up exciting possibilities for personalization, creativity, and scalable deployment. However, fine-tuning them in production-grade settings poses challenges: managing compute, hyperparameters, model size, data, and distributed coordination are nontrivial. In this talk, we’ll dive deep into learning how to fine-tune Stable Diffusion models using Ray Train (with HuggingFace Diffusers), including approaches like DreamBooth and LoRA. We’ll cover what works (and what doesn’t) in scaling out training jobs, handling large data, optimizing for GPU memory and speed, and validating outputs. Attendees will come away with practical insights and patterns they can use to fine-tune generative models in their own work.
About the Speaker
Suman Debnath is a Technical Lead (ML) at Anyscale, where he focuses on distributed training, fine-tuning, and inference optimization at scale on the cloud. His work centers around building and optimizing end-to-end machine learning workflows powered by distributed computing frameworks like Ray, enabling scalable and efficient ML systems. Suman’s expertise spans Natural Language Processing (NLP), Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG). Earlier in his career, he developed performance benchmarking and monitoring tools for distributed storage systems. Beyond engineering, Suman is an active community contributor, having spoken at over 100 global conferences and events, including PyCon, PyData, ODSC, AIE and numerous meetups worldwide.
Privacy-preserving in Computer Vision through Optics Learning
Cameras are now ubiquitous, powering computer vision systems that assist us in everyday tasks and critical settings such as operating rooms. Yet, their widespread use raises serious privacy concerns: traditional cameras are designed to capture high-resolution images, making it easy to identify sensitive attributes such as faces, nudity, or personal objects. Once acquired, such data can be misused if accessed by adversaries. Existing software-based privacy mechanisms, such as blurring or pixelation, often degrade task performance and leave vulnerabilities in the processing pipeline. In this talk, we explore an alternative question: how can we preserve privacy before or during image acquisition? By revisiting the image formation model, we show how camera optics themselves can be learned and optimized to acquire images that are unintelligible to humans yet remain useful for downstream vision tasks like action recognition. We will discuss recent approaches to learning camera lenses that intentionally produce privacy-preserving images, blurry and unrecognizable to the human eye, but still effective for machine perception. This paradigm shift opens the door to a new generation of cameras that embed privacy directly into their hardware design.
About the Speaker
Carlos Hinojosa is a postdoctoral researcher at King Abdullah University of Science and Technology (KAUST) working with Prof. Bernard Ghanem. His research interests span Computer Vision, Machine Learning, AI Safety, and AI for Science. He focuses on developing safe, accurate, and efficient vision systems and machine-learning models that can reliably perceive, understand, and act on information, while ensuring robustness, protecting privacy, and aligning with societal values.
It's a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data
Can we match vision and language embeddings without any supervision? According to the platonic representation hypothesis, as model and dataset scales increase, distances between corresponding representations become similar in both embedding spaces. Our study demonstrates that pairwise distances are often sufficient to enable unsupervised matching, allowing vision-language correspondences to be discovered without any parallel data.
About the Speaker
Dominik Schnaus is a third-year Ph.D. student in the Computer Vision Group at the Technical University of Munich (TUM), supervised by Daniel Cremers. His research centers on multimodal and self-supervised learning with a special emphasis on understanding similarities across embedding spaces of different modalities. |
Oct 30 - AI, ML and Computer Vision Meetup
|
|
Dear data-loving community, we can't wait to present to you our new Meetup event: This time, it will be a collaboration with RisingWave, a platform for real-time streaming data management and analysis. Yingjun Wu, Founder and CEO at RisingWave Labs, will share his experience in a techy talk, as well as Behnaz Derakhshani, who works as a Specialist Data Engineer at Diconium's data department. Additionally, we're going to welcome external guest speaker Erik Schmiegelow, CEO at Hivemind Technologies. Exciting line-up, right? :D Join us on September 16th in Berlin and bring all your questions! Here are the topics you can expect: Yingjun Wu: Achieving Sub‑100 ms Real‑Time Stream Processing with an S3‑Native Architecture Stream processing systems have traditionally relied on local storage engines such as RocksDB to achieve low latency. While effective in single-node setups, this model doesn't scale well in the cloud, where elasticity and separation of compute and storage are essential. In this talk, we'll explore how RisingWave rethinks the architecture by building directly on top of S3 while still delivering sub-100 ms latency. At the core is Hummock, a log-structured state engine designed for object storage. Hummock organizes state into a three-tier hierarchy: in-memory cache for the hottest keys, disk cache managed by Foyer for warm data, and S3 as the persistent cold tier. This approach ensures queries never directly hit S3, avoiding its variable performance. We'll also examine how remote compaction offloads heavy maintenance tasks from query nodes, eliminating interference between user queries and background operations. Combined with fine-grained caching policies and eviction strategies, this architecture enables both consistent query performance and cloud-native elasticity. Attendees will walk away with a deeper understanding of how to design streaming systems that balance durability, scalability, and low latency in an S3-based environment. 
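To make the three-tier idea from the abstract concrete, here is a minimal Python sketch of a read path that checks a memory tier, then a disk tier, then a cold tier. It is an illustration under stated assumptions, not RisingWave's or Hummock's actual implementation: the class names `LRUTier` and `TieredStore`, the LRU eviction policy, the `cold_reads` counter, and the plain dict standing in for S3 are all hypothetical, and values are assumed never to be `None`.

```python
from collections import OrderedDict


class LRUTier:
    """A tiny LRU cache standing in for one cache tier (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        """Insert a value and return the evicted (key, value) pair, if any."""
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            return self.items.popitem(last=False)  # drop least recently used
        return None


class TieredStore:
    """Read path over hot (memory), warm (disk), and cold tiers."""

    def __init__(self, memory_capacity, disk_capacity, cold):
        self.memory = LRUTier(memory_capacity)
        self.disk = LRUTier(disk_capacity)
        self.cold = cold  # plain dict standing in for object storage
        self.cold_reads = 0  # how often a read fell through to the cold tier

    def get(self, key):
        value = self.memory.get(key)
        if value is not None:
            return value
        value = self.disk.get(key)
        if value is None:
            self.cold_reads += 1
            value = self.cold[key]
        self._promote(key, value)
        return value

    def _promote(self, key, value):
        # Move the key into the hot tier; demote whatever it displaces.
        evicted = self.memory.put(key, value)
        if evicted is not None:
            self.disk.put(*evicted)


store = TieredStore(memory_capacity=1, disk_capacity=2,
                    cold={"a": 1, "b": 2, "c": 3})
store.get("a")  # cold read, promoted to memory
store.get("a")  # served from memory
store.get("b")  # cold read; "a" is demoted to the disk tier
store.get("a")  # served from the disk tier, no cold read
print(store.cold_reads)
```

In this sketch a miss in both caches still falls through to the cold tier; how the real system keeps queries from hitting S3 directly is a topic for the talk itself and is not modelled here.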
Behnaz Derakhshani: From Raw Data to Trusted Assets: A Practical Walkthrough with AWS services and Collibra Expect a hands-on journey in which Behnaz shows how modern data lake tools and governance platforms connect the dots, making your data discoverable, governed, and productized for real-world use. Erik Schmiegelow: Effective Agentic GenAI in Data Streaming Successful genAI projects strike a balance between impact, accuracy, and cost. In this talk, Erik will cover how to create agentic data applications effectively, choosing when and how to integrate them into data streams while keeping response quality issues and costs in check. What you can expect:
Timetable:
Our goal is to form a local data-loving community, so join us and let's talk data together! -> Our event page, where you can also contact us if you want to present in the future at our Meetup: Data Engineering MeetUp Berlin - applydata --- At the event, sound, image and video recordings are created and published for documentation purposes as well as for the presentation of the event in publicly accessible media, on websites and blogs and for presentation on social media. By participating in the event, the participant implicitly consents to the aforementioned photo and/or video recordings. Find more information on data protection here. |
Data Builders’ Evening: Architecture, Engineering & Beyond | Berlin, Sep. 16th
|
|
ClickHouse Delhi/Gurgaon Meetup - March 2025
2025-03-22 · 05:00
We are excited to finally have the first ClickHouse Meetup in the vibrant city of Delhi! Join the ClickHouse crew, from Singapore and from different cities in India, for an engaging day of talks, food, and discussion with your fellow database enthusiasts. But here's the deal: to secure your spot, make sure you register ASAP! 🗓️ Agenda:
If anyone from the community is interested in sharing a talk at future meetups, complete this CFP form and we’ll be in touch. _______ 🎤 Session Details: Introduction to ClickHouse Discover the secrets behind ClickHouse's unparalleled efficiency and performance. Rakesh will give an overview of different use cases for which global companies are adopting this groundbreaking database to transform data storage and analytics. Speaker: Rakesh Puttaswamy, Solution Architect @ ClickHouse Rakesh Puttaswamy is a Solution Architect with ClickHouse, working with users across India, with over 12 years of experience in data architecture, big data, data science, and software engineering. Rakesh helps organizations design and implement cutting-edge data-driven solutions. With deep expertise in a broad range of databases and data warehousing technologies, he specializes in building scalable, innovative solutions to enable data transformation and drive business success. 🎤 Session Details: ClickPipes Overview and demo ClickPipes is a powerful integration engine that simplifies data ingestion at scale, making it as easy as a few clicks. With an intuitive onboarding process, setting up new ingestion pipelines takes just a few steps: select your data source, define the schema, and let ClickPipes handle the rest. Designed for continuous ingest, it automates pipeline management, ensuring seamless data flow without manual intervention. In this talk, Kunal will demo the Postgres CDC connector for ClickPipes, enabling seamless, native replication of Postgres data to ClickHouse Cloud in just a few clicks, with no external tools needed for fast, cost-effective analytics. Speaker: Kunal Gupta, Sr. Software Engineer @ ClickHouse Kunal Gupta is a Senior Software Engineer at ClickHouse, joining through the acquisition of PeerDB in 2024, where he played a pivotal role as a founding engineer. 
With several years of experience in architecting scalable systems and real-time applications, Kunal has consistently driven innovation and technical excellence. Previously, he was a founding engineer for new solutions at ICICIdirect and at AsknBid Tech, leading high-impact teams and advancing code analysis, storage solutions, and enterprise software development. 🎤 Session Details: Optimizing Log Management with Clickhouse: Cost-Effective & Scalable Solutions Efficient log management is essential in today's cloud-native environments, yet traditional solutions like ElasticSearch often face scalability issues, high costs, and performance limitations. This talk will begin with an overview of common logging tools and their challenges, followed by an in-depth look at ClickHouse's architecture. We will compare ClickHouse with ElasticSearch, focusing on improvements in query performance, storage efficiency, and overall cost-effectiveness. A key highlight will be OLX India's migration to ClickHouse, detailing the motivations behind the shift, the migration strategy, key optimizations, and the resulting 50% reduction in log storage costs. By the end of this talk, attendees will gain a clear understanding of when and how to leverage ClickHouse for log management, along with best practices for optimizing performance and reducing operational costs. Speaker: Pushpender Kumar, DevOps Architect @ OLX India Born and raised in Bijnor, moved to Delhi to stay ahead in the race of life. Currently working as a DevOps Architect at OLX India, specializing in cloud infrastructure, Kubernetes, and automation with over 10 years of experience. Successfully optimized log storage costs by 50% using Clickhouse, bringing scalability and efficiency to large-scale logging systems. Passionate about cloud optimization, DevOps hiring, and performance engineering. 
🎤 Session Details: ClickHouse at Physics Wallah: Empowering Real-Time Analytics at Scale This session explores how Physics Wallah revolutionized its real-time analytics capabilities by leveraging ClickHouse. We'll delve into the journey of implementing ClickHouse to efficiently handle large-scale data processing, optimize query performance, and power diverse use cases such as user activity tracking and engagement analysis. By enabling actionable insights and seamless decision-making, this transformation has significantly enhanced the learning experience for millions of users. Today, more than five customer-facing products at Physics Wallah are powered by ClickHouse, serving over 10 million students and parents, including 1.5 million Daily Active Users. Our in-house ClickHouse cluster, hosted and managed within our EKS infrastructure on AWS Cloud, ingests more than 10 million rows of data daily from various sources. Join us to learn about the architecture, challenges, and key strategies behind this scalable, high-performance analytics solution. Speaker: Utkarsh G. Srivastava, Software Development Engineer III @ Physics Wallah As a versatile Software Engineer with over 7 years of experience in the IT industry, I have had the privilege of taking on diverse roles, with a primary focus on backend development, data engineering, infrastructure, DevOps, and security. Throughout my career, I have played a pivotal role in transformative projects, consistently striving to craft innovative and effective solutions for customers in the SaaS space. 🎤 Session Details: FabFunnel & ClickHouse: Delivering Real-Time Marketing Analytics We are a performance marketing company that relies on real-time reporting to drive data-driven decisions and maximize campaign effectiveness. 
As our client base expanded, we encountered significant challenges with our reporting system—frequent data updates meant handling large datasets inefficiently, leading to slow query execution and delays in delivering insights. This bottleneck hindered our ability to provide timely optimizations for ad campaigns. To address these issues, we needed a solution that could handle rapid data ingestion and querying at scale without the overhead of traditional refresh processes. In this talk, we’ll share how we transformed our reporting infrastructure to achieve real-time insights, enhancing speed, scalability, and efficiency in managing large-scale ad performance data. Speakers: Anmol Jain, SDE-2 (Full stack Developer), & Siddhant Gaba, SDE-2 (Python) @ Idea Clan From competing as a national table tennis player to building high-performance software, Anmol Jain brings a unique mix of strategy and problem-solving to tech. With 3+ years of experience at Idea Clan, they play a key role in scaling Lookfinity and FabFunnel, managing multi-million-dollar ad spends every month. Specializing in ClickHouse, React.js, and Node.js, Anmol focuses on real-time data processing and scalable backend solutions. At this meet-up, they’ll share insights on solving reporting challenges and driving real-time decision-making in performance marketing. Siddhant Gaba is an SDE II at Idea Clan, with expertise in Python, Java, and C#, specializing in scalable backend systems. With four years of experience working with FastAPI, PostgreSQL, MongoDB, and ClickHouse, he focuses on real-time analytics, database optimization, and distributed systems. Passionate about high-performance computing, asynchronous APIs, and system design, he aims to advance real-time data processing. Outside of work, he enjoys playing volleyball. At this meetup, he will share insights on how ClickHouse transformed real-time reporting and scalability. 
🎤 Session Details: From SQL to AI: Building Intelligent Applications with ClickHouse and LangDB As AI becomes a driving force behind innovation, building applications that seamlessly integrate AI capabilities with existing data infrastructures is critical. In this session, we explore the creation of agentic applications using ClickHouse and LangDB. We will introduce the concept of an AI gateway, explaining its role in connecting powerful AI models with the high-performance analytics engine of ClickHouse. By leveraging LangDB, we demonstrate how to directly interact with AI functions as User-Defined Functions (UDFs) in ClickHouse, enabling developers to design and execute complex AI workflows within SQL. Additionally, we will showcase how LangDB facilitates deep visibility into AI function behaviors and agent interactions, providing tools to analyze and optimize the performance of AI-driven logic. Finally, we will highlight how ClickHouse, powered by LangDB APIs, can be used to evaluate and refine the quality of LLM responses, ensuring reliable and efficient AI integrations. Speaker: Matteo Pelati, Co-founder, LangDB.ai Matteo Pelati is a seasoned software engineer with over two decades of experience, specializing in data engineering for the past ten years. He is the co-founder of LangDB, a company based in Singapore building the fastest Open Source AI Gateway. Before founding LangDB, he was part of the early team at DataRobot, where he contributed to scaling their product for enterprise clients. Subsequently, he joined DBS Bank where he built their data platform and team from the ground up. Prior to starting LangDB, Matteo led the data group for Asia Pacific and data engineering at Goldman Sachs. |
ClickHouse Delhi/Gurgaon Meetup - March 2025
|
|
Munich dbt & Snowflake Meetup (in-person)
2025-02-20 · 16:30
This dbt Meetup is an opportunity for the local Munich dbt and Snowflake Community to connect and collaborate. If you work with data, this event is for you. We welcome data analysts, scientists, engineers, architects, and more! ➡️ Join the dbt Slack community: https://community.getdbt.com/ Join the conversation in the #local-munich channel in dbt Slack to connect with other data practitioners locally, and make sure to join the #local-dach channel as well. 🤝 Organizer: btelligent 🏠 Venue: Synabi Offices, Oskar-Schlemmer-Straße 13, 80807 München ✨ Agenda & Speakers ✨ 17.30: Registration **Registration will close at 17.50** 18.00: Welcome b.telligent \| dbt \| Snowflake 18.15: Common pitfalls when building a data warehouse [dbt] Tim Hiebenthal - Project A Ventures Over the years, we have collected a list of best practices to avoid common pitfalls like inconsistent metrics and slow response times. Discover how to build a robust and adaptable data warehouse that fosters trust and alignment between data teams and stakeholders. 18.45: Using the Snowflake Cortex Search Service to Interact with Your Documents [Snowflake] Maja Ferle Snowflake Data Superhero - In516ht This session will demonstrate how quick and easy it is to build a RAG (retrieval-augmented generation) solution that lets us chat with PDF documents in natural language. Maja will use Snowflake Cortex LLM (large language model) functions and the Cortex Search service to upload and process PDF documents and demonstrate how to ask questions to get specific information from the documents. 19.15-20.30: Networking With Food & Drinks dbt is the standard in data transformation, used by over 40,000 organizations worldwide. With practices like modularity, version control, testing, and documentation, dbt’s analytics engineering workflow helps teams work more efficiently and build trustworthy data for the whole organization. 
Learn more: https://www.getdbt.com/ Snowflake is a leading cloud-based data platform, trusted by thousands of organizations. Snowflake enables teams to manage and analyze their data with unprecedented flexibility and efficiency. Its architecture allows for the separation of storage and compute, providing cost-effective solutions tailored to diverse business needs. Snowflake empowers organizations to derive insights from their data in real-time, enhancing decision-making across the enterprise. Learn more: https://www.snowflake.com/ Note: To attend, please read the Health and Safety Policy and Terms of Participation: Health & Safety |
Munich dbt & Snowflake Meetup (in-person)
|
|
Apache Iceberg Bay Area Community Meetup
2025-01-30 · 21:00
Please complete your registration via this link to secure your seat: https://lu.ma/u1xq8c5t. CFP Submission: https://bit.ly/Iceberg-Meetup-CFP About the Event Join us for a special edition of the Apache Iceberg Bay Area Meetup—a half-day event co-hosted by PuppyGraph, Snowflake, AWS, Databricks, and the Apache Iceberg Community! This event offers both in-person attendance and virtual streaming, making it accessible to everyone. Our agenda is packed with the latest innovations and best practices in the Apache Iceberg ecosystem. The event features 4 hours of insightful talks and 2.5 hours of networking sessions. Since it's near Lunar New Year, we'll have festive food and beverages to celebrate the special occasion. Can't make it live? No worries—register anyway to receive the recordings. Don't miss this opportunity to engage with the Apache Iceberg community and stay updated on the latest advancements. Call For Presentations Have a compelling story, innovative solution, or unique insight to share about the Apache Iceberg ecosystem? We’re inviting community members and industry experts to present at our upcoming Apache Iceberg Bay Area Meetup. This is your chance to showcase your expertise and contribute to an agenda filled with cutting-edge discussions and best practices. Interested? Submit your presentation here. We can’t wait to hear your ideas! About PuppyGraph PuppyGraph is the first and only real time, zero-ETL graph query engine in the market, empowering companies to transform existing relational data stores into a unified graph model in under 10 minutes, bypassing traditional graph databases' cost, latency, and maintenance hurdles. 💬 Join PuppyGraph Community Slack 📚 Check out PuppyGraph Engineering Blog 📲 Follow PuppyGraph on LinkedIn & Twitter 🖥️ Subscribe to PuppyGraph YouTube 💾 Download PuppyGraph Forever Free Developer Edition (no form & no payment required) About Databricks Databricks is the data and AI company. 
More than 10,000 organizations worldwide — including Block, Comcast, Condé Nast, Rivian, Shell and over 60% of the Fortune 500 — rely on the Databricks Data Intelligence Platform to take control of their data and put it to work with AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of Lakehouse and Apache Spark™. 📚 Check out Tabular has joined Databricks towards a joint vision of the open lakehouse. 📲 Follow Databricks on LinkedIn, X, and Facebook. 🖥️ Subscribe to Databricks YouTube. 💾 Sign up for Databricks Express Setup and get $400 free credits when using your work email. About AWS Apache Iceberg is an open-source table format that simplifies table management while improving performance. AWS analytics services such as Amazon EMR, AWS Glue, Amazon Athena, and Amazon Redshift include native support for Apache Iceberg, so you can easily build transactional data lakes on top of Amazon Simple Storage Service (Amazon S3) on AWS. Additional Resources and Information: 📚 Workshop: Running Apache Iceberg on AWS 📚 Blogs: Apache Iceberg on AWS 📚 AWS Prescriptive Guidance: Using Apache Iceberg on AWS 🖥️ Subscribe to AWS Events and AWS Developers 💜 We’re hiring, join our team About Snowflake Snowflake makes enterprise AI easy, efficient and trusted. More than 10,000 companies around the globe, including hundreds of the world’s largest, use Snowflake’s AI Data Cloud to share data, build applications, and power their business with AI. Snowflake provides native support for Apache Iceberg™ and Apache Polaris™ (incubating). 📚 Check out how Snowflake can power your open data lakehouse 📲 Follow Snowflake on LinkedIn & X 🖥 Subscribe to Snowflake Developers YouTube ❄️ Start your 30-day free Snowflake trial which includes $400 worth of free usage |
Apache Iceberg Bay Area Community Meetup
|
|
LLM Meetup (Graphcore x PyData)
2024-11-27 · 17:00
When: 27th November 2024 Where: Graphcore Office, Olivia Star - 30th Floor, Aleja Grunwaldzka 472C [Registration info - important!] Number of seats is limited to 65. Please provide your full first and last name and email address while registering for the event, here on meetup. The list of attendees freezes 4 hours before the event starts - in case of any urgent changes please leave us a note via [email protected] . Don't forget to bring your ID for the security check. Agenda 18:00 - 18:10 Welcoming words by Graphcore and PyData 18:10 - 18:40 Low-Precision Data Formats for High-Performance AI Alex Titterton, ML Engineer GRAPHCORE 18:40 - 19:20 Make Llamas Run Faster: How to Speed Up LLM Inference, Luke Hudlass-Galley, Research Scientist GRAPHCORE Alexandre Payot, ML Engineer GRAPHCORE 19:20 – 19:30 Q&A session 19:30 – 21:00 Networking & pizza About the talks Talk #1 Low-Precision Data Formats for High-Performance AI Abstract: “In recent years AI models, in particular LLMs, have scaled up enormously both in terms of capability and hardware requirements. Providing the required computational power, storage capacity and memory bandwidth all come at a cost, leading to increased research activity into low-precision data formats both for storage and compute. In this talk we discuss recent advances in low-precision training and inference, quantisation methods and new microscaling (MX) data formats designed to offer efficient AI compute with minimal loss in accuracy and without requiring changes to model training workflows.” Presenters: Alex Titterton, ML Engineer, Graphcore Presenter intro: “Alex is a Machine Learning Engineer at Graphcore, who over the last 5 years has been working with customers across a wide range of applications from computer vision to large language models, and has led Graphcore’s Academic Programme, collaborating with AI researchers around the world. 
Before joining Graphcore in 2019 Alex completed his PhD in Particle Physics at the University of Bristol and University of Southampton, working on the Compact Muon Solenoid experiment at CERN in search for Supersymmetry.” Talk #2 Make Llamas Run Faster: How to Speed Up LLM Inference Abstract: “As we try and push the capability limits of large language models, we want to feed in larger and larger sequences into the model. However, these long sequences cause throughput to drop, bottlenecking the performance we can achieve with these models. In this talk, we will uncover what is causing this slow down, look at standard optimisations to resolve it, and present our solution, SparQ Attention, as a way to overcome this.” Presenters: Luke Hudlass-Galley, Research Scientist, Graphcore Presenter intro: “Luke is a Research Scientist at Graphcore, who has worked on a range of fundamental machine learning topics over the last six years, including computer vision, distributed processing, multimodal embedding alignment, and LLM inference. Luke currently leads Graphcore’s reasoning effort, helping to uncover how to get language models to solve difficult problems through step-by-step thought. Prior to Graphcore, Luke received his Masters in Engineering Mathematics from the University of Bristol.” Alexandre Payot, ML Engineer, Graphcore Presenter intro: “Alex is an ML Engineer on Graphcore’s Applied AI team. Over the last three years, he has contributed to a wide range of projects, from kernel implementation to delivering Graphcore's cloud models. His latest focus has been on LLM inference for an internal code assistant. Before joining Graphcore, Alex developed software for designing and optimizing steered carbon fiber composites, worked on data analytics and computer vision for microscopes, and even filled up the supercomputer at Bristol University while pursuing a PhD in aerodynamic optimization.” |
LLM Meetup (Graphcore x PyData)
|
|
PyData Southampton - 11th Meetup
2024-11-26 · 19:00
Venue: Carnival House, 100 Harbour Parade, Southampton, SO15 1ST 📢 Want to speak 📢: submit your talk proposal Please note:
If your RSVP status says "You're going" you will be able to get in. No further confirmation required. You will NOT need to show your RSVP confirmation when signing in. If you can no longer make it, please unRSVP as soon as you know so we can assign your place to someone on the waiting list. *** Code of Conduct: This event follows the NumFOCUS Code of Conduct, please familiarise yourself with it before the event. Please get in touch with the organisers with any questions or concerns regarding the Code of Conduct. *** There will be pizza & drinks, generously provided by our host, Carnival UK. *** Mastering Data Flow: Prefect Pipelines Workshop - Adam Hill & Chris Frohmaier Join us for an engaging workshop where we'll dive deep into the world of data engineering with Prefect 3. Throughout the session, participants will explore the following key topics:
Building Data Pipelines:
Advanced Techniques and Best Practices:
By the end of the workshop, attendees will have gained a comprehensive understanding of Prefect 3 and its capabilities, empowering them to design, execute, and optimise data pipelines efficiently in real-world scenarios. We invite you to join us on this exciting journey of mastering data flows with Prefect! Instructions to prepare in advance Workshop Materials and Requirements: In advance of the workshop please visit the github repo here: https://github.com/Cadarn/PyData-Prefect-Workshop. Clone a copy of the repository and follow the setup instructions in the README file including:
Please follow the instructions in advance of attending the workshop. Please note this is a practical session and you will need to bring your own laptop. We recommend you bring it fully charged, if you can, as there may not be enough plug sockets for everyone to use at the same time. Logistics Doors open at 6.30 pm, talks start at 7 pm. For those who wish to continue networking and chatting we will move to a nearby pub/bar for drinks from 9 pm. Please unRSVP in good time if you realise you can't make it. We're limited by building security on the number of attendees, so please free up your place for your fellow community members! Follow @pydatasoton (https://twitter.com/pydatasoton) for updates and early announcements. We are also on Instagram/Threads as @pydatasoton, and you can find us on LinkedIn. |
PyData Southampton - 11th Meetup
|
|
Data Engineering Meetup - Data Storage
2024-07-22 · 17:00
Welcome to the new edition of Data Engineering London on Data Storage! Join us for the fourth edition of the Data Engineering meetup with a range of talks looking at data storage. You'll have the chance to network and meet fellow data engineers (and other data enthusiasts)! 👉 The venue requires us to collect names/emails. If you have RSVP'ed yes, please make sure to fill out this Google Form: https://forms.gle/ZR3prm5HgtXQTv7X8 👈 When? 18:00 - 18:30 Networking with food and drinks from Dremio 18:30 - 19:45 Talks 19:45 - 20:30 More networking Where? Dremio offices (see address) Speakers and Talks: 1. Building an Open Data Lakehouse using Apache Spark, Apache Iceberg and Dremio - by Mike Flower (Solution Architect @ Dremio) 2. Micro-Partitions, Clustering and Pruning - Improving Query Performance with Storage Optimization - by Niall Woodward (Co-founder & CTO @ SELECT) 3. Geospatial Analysis in Snowflake: How native Snowflake capabilities make light work of Lidar data - by Mike Taylor (Principal Architect @ Snowflake) If you have a topic you're passionate about and wish to see discussed, let us know! We're always looking for more talks for our future events. Places are limited, make sure you register! |
Data Engineering Meetup - Data Storage
|
|
11th Jul 24 | East Midlands Data | Nottingham | Speaker - Simon Whiteley
2024-07-11 · 17:00
_________________ MeetUp Agenda 18:00 - Arrive. Networking 18:30 - Guest Speaker Talk 19:30 - Pizza. Networking 20:30 - Event Close _________________ An Introduction to Azure Databricks Azure Databricks is a cloud-based platform that provides a unified environment for data engineering, data science, and analytics. It is based on Apache Spark, an open-source framework for distributed computing, and offers various features and benefits that make it a powerful and scalable solution for data-driven applications. In this talk, we will introduce the main concepts and components of Azure Databricks and show how it can help you accelerate your data projects and deliver business value. Some of the topics that we will cover in this talk are:
By the end of this talk, you will have a better understanding of Azure Databricks and its capabilities, and you will be able to start using it for your own data projects. You will also learn some best practices and tips for working with Azure Databricks and optimizing its performance and cost. Whether you are new to Azure Databricks or already have some experience with it, this talk will provide you with valuable insights and knowledge that you can apply to your data scenarios. _________________ Venue Website https://www.castlerockbrewery.co.uk/pubs/vat-and-fiddle _________________ MeetUp Member Resources In GitHub Here: https://github.com/EastMidlandsData Including introduction slides and our Code of Conduct. _________________ |
11th Jul 24 | East Midlands Data | Nottingham | Speaker - Simon Whiteley
|
|
June 2024 Reading Data & AI MeetUp (In-person Only)
2024-06-05 · 17:00
Welcome to our June 2024 Data & AI Meetup. We have two great sessions with Pizza and Networking between them. Krishna Yogi Bio: I'm a Data Consultant with over 15 years of experience in data engineering and data science. I've worked with various companies like Microsoft and Deutsche Bank in the past, and I am currently heading the data division at Scrumconnect, a public sector-focused consultancy. I founded two start-ups in the past, Maptags (geotagging to a simple word) and Sashwat (certificate fraud prevention using blockchain), and I am the author of the short book 'Why Bitcoin'. Outside work I'm a Star Wars geek, board game nerd, and Bitcoiner. Session Abstract: This session explores the limitations of traditional data warehouses and the rise of data lakes for flexible data storage. We'll discuss how data lakehouses merge the structured schema of warehouses with the scalability of lakes, with a focus on Delta Lake, an open-source layer from Databricks. ------ Break for Networking and Pizza ------ Marc Dunbar Bio: Marc has been working in the IT industry for over 15 years, specialising in Infrastructure, Networking and Security as well as Microsoft Cloud Technologies. He has spent most of that time working with clients to navigate the ever-changing world of IT and has worked for some of the industry's best-known consultancies like Microsoft and Coeo. Session Abstract: The Digital Transformation is coming to an end with Quantum Computing around the corner and AI is the next technological revolution. What if we combine these technologies? In this session we'll explore where we are now, and where we are going with Quantum and AI. You might be surprised how far we have come already. Bring an inquisitive mind as we consider ways to prepare ourselves for the future. |
June 2024 Reading Data & AI MeetUp (In-person Only)
|