talk-data.com talk-data.com

Topic

MLOps

machine_learning devops ai

233

tagged

Activity Trend

26 peak/qtr
2020-Q1 2026-Q1

Activities

233 activities · Newest first

Scaling MLOps for a Demand Forecasting Across Multiple Markets for a Large CPG

In this session, we look at how one of the world’s largest CPG company setup a scalable MLOps pipeline for a demand forecasting use case that predicted demand at 100,000+ DFUs (demand forecasting units) on a weekly basis across more than 20 markets. This implementation resulted in significant cost savings in terms of improved productivity, reduced cloud usage and faster time to value amongst other benefits. You will leave this session with a clearer picture on the following:

  • Best practices in scaling MLOps with Databricks and Azure for a demand forecasting use case with a multi-market and multi-region roll-out.
  • Best practices related to model re-factoring and setting up standard CI-CD pipelines for MLOps.
  • What are some of the pitfalls to avoid in such scenarios?

Talk by: Sunil Ranganathan and Vinit Doshi

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Journey to Real-Time ML: A Look at Feature Platforms & Modern RT ML Architectures Using Tecton

Are you struggling to keep up with the demands of real-time machine learning? Like most organizations building real-time ML, you’re probably looking for a better way to: Manage the lifecycle of ML models and features, Implement batch, streaming, and real-time data pipelines, Generate accurate training datasets and serve models and data online with strict SLAs, supporting millisecond latencies and high query volumes. Look no further. In this session, we will unveil a modern technical architecture that simplifies the process of managing real-time ML models and features.

Using MLflow and Tecton, we’ll show you how to build a robust MLOps platform on Databricks that can easily handle the unique challenges of real-time data processing. Join us to discover how to streamline the lifecycle of ML models and features, implement data pipelines with ease, and generate accurate training datasets with minimal effort. See how to serve models and data online with mission-critical speed and reliability, supporting millisecond latencies and high query volumes.

Take a firsthand look at how FanDuel uses this solution to power their real-time ML applications, from responsible gaming to content recommendations and marketing optimization. See for yourself how this system can be used to define features, train models, process streaming data, and serve both models and features online for real-time inference with a live demo. Join us to learn how to build a modern MLOps platform for your real-time ML use cases.

Talk by: Mike Del Balso and Morgan Hsu

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored: AWS|Build Generative AI Solution on Open Source Databricks Dolly 2.0 on Amazon SageMaker

Create a custom chat-based solution to query and summarize your data within your VPC using Dolly 2.0 and Amazon SageMaker. In this talk, you will learn about Dolly 2.0, Databricks, state-of-the-art, open source, LLM, available for commercial and Amazon SageMaker, AWS’s premiere toolkit for ML builders. You will learn how to deploy and customize models to reference your data using retrieval augmented generation (RAG) and additional fine tuning techniques…all using open-source components available today.

Talk by: Venkat Viswanathan and Karl Albertsen

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored by: Immuta | Building an End-to-End MLOps Workflow with Automated Data Access Controls

WorldQuant Predictive’s customers rely on our predictions to understand how changing world and market conditions will impact decisions to be made. Speed is critical, and so are accuracy and resilience. To that end, our data team built a modern, automated MLOps data flow using Databricks as a key part of our data science tooling, and integrated with Immuta to provide automated data security and access control.

In this session, we will share details of how we used policy-as-code to support our globally distributed data science team with secure data sharing, testing, validation and other model quality requirements. We will also discuss our data science workflow that uses Databricks-hosted MLflow together with an Immuta-backed custom feature store to maximize speed and quality of model development through automation. Finally, we will discuss how we deploy the models into our customized serverless inference environment, and how that powers our industry solutions.

Talk by: Tyler Ditto

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Colossal AI: Scaling AI Models in Big Model Era

The proliferation of large models based on Transformer has outpaced advances in hardware, resulting in an urgent need for the ability to distribute enormous models across multiple GPUs. Despite this growing demand, best practices for choosing an optimal strategy are still lacking due to the breadth of knowledge required across HPC, DL, and distributed systems. These difficulties have stimulated both AI and HPC developers to explore the key questions: How can training and inference efficiency of large models be improved to reduce costs? How can larger AI models be accommodated even with limited resources?

What can be done to enable more community members to easily access large models and large-scale applications? In this session, we investigate efforts to solve the questions mentioned above. Firstly, diverse parallelization is an important tool to improve the efficiency of large model training and inference. Heterogeneous memory management can help enhance the model accommodation capacity of processors like GPUs.

Furthermore, user-friendly DL systems for large models significantly reduce the specialized background knowledge users need, allowing more community members to get started with larger models more efficiently. We will provide participants with a system-level open-source solution, Colossal-AI. More information can be found at https://github.com/hpcaitech/ColossalAI.

Talk by: James Demmel and Yang You

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Enterprise Use of Generative AI Needs Guardrails: Here's How to Build Them

Large Language Models (LLMs) such as ChatGPT have revolutionized AI applications, offering unprecedented potential for complex real-world scenarios. However, fully harnessing this potential comes with unique challenges such as model brittleness and the need for consistent, accurate outputs. These hurdles become more pronounced when developing production-grade applications that utilize LLMs as a software abstraction layer.

In this session, we will tackle these challenges head-on. We introduce Guardrails AI, an open-source platform designed to mitigate risks and enhance the safety and efficiency of LLMs. We will delve into specific techniques and advanced control mechanisms that enable developers to optimize model performance effectively. Furthermore, we will explore how implementing these safeguards can significantly improve the development process of LLMs, ultimately leading to safer, more reliable, and robust real-world AI applications

Talk by: Shreya Rajpal

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Navigating the Complexities of LLMs: Insights from Practitioners

Interested in diving deeper into the world of large language models (LLMs) and their real-life applications? In this session, we bring together our experienced team members and some of our esteemed customers to talk about their journey with LLMs. We'll delve into the complexities of getting these models to perform accurately and efficiently, the challenges, and the dynamic nature of LLM technology as it constantly evolves. This engaging conversation will offer you a broader perspective on how LLMs are being applied across different industries and how they’re revolutionizing our interaction with technology. Whether you're well-versed in AI or just beginning to explore, this session promises to enrich your understanding of the practical aspects of LLM implementation.

Talk by: Sai Ravuru, Eric Peter, Ankit Mathur, and Salman Mohammed

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

How to Train Your Own Large Language Models

Given the success of OpenAI’s GPT-4 and Google’s PaLM, every company is now assessing its own use cases for Large Language Models (LLMs). Many companies will ultimately decide to train their own LLMs for a variety of reasons, ranging from data privacy to increased control over updates and improvements. One of the most common reasons will be to make use of proprietary internal data.

In this session, we’ll go over how to train your own LLMs, from raw data to deployment in a user-facing production environment. We’ll discuss the engineering challenges, and the vendors that make up the modern LLM stack: Databricks, Hugging Face, and MosaicML. We’ll also break down what it means to train an LLM using your own data, including the various approaches and their associated tradeoffs.

Topics covered in this session: - How Replit trained a state-of-the-art LLM from scratch - The different approaches to using LLMs with your internal data - The differences between fine-tuning, instruction tuning, and RLHF

Talk by: Reza Shabani

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored by: Infosys | Topaz AI First Innovations

Insights into Infosys' Topaz AI First Innovations including AI-enabled Analytics and AI-enabled Automation to help clients in significant cost savings, improved efficiency and customer experience across industry segments.

Talk by: Neeraj Dixit

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Discuss How LLMs Will Change the Way We Work

Will LLMs change the way we work?  Ask questions from a panel of LLM and AI experts on what problems LLMs will solve and its potential new challenges

Talk by: Ben Harvey, Jan van der Vegt, Ankit Mathur, Debu Sinha, and Sean Owen

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Foundation Models in the Modern Data Stack

As Foundation Models (FMs) continue to grow in size, innovations continue to push the boundaries of what these models can do on language and image tasks. This talk will describe our work on applying FMs to structured data tasks like data linkage, cleaning and querying. We will then discuss challenges and solutions that these models present for production deployment in the modern data stack.

Talk by: Ines Chami

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

PaLM 2: A Smaller, Faster and More Capable LLM

PaLM 2 is a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction.

PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities.

Talk by: Andy Dai

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Perplexity: A Copilot for All Your Web Searches and Research

In this demo, we will show you the fastest and functional answer engine and search copilot that exists right now: Perplexity.ai. It can solve a wide array of problems starting from giving you fast answers to any topic to planning trips and doing market research on things unfamiliar to you, all in a trustworthy way without hallucinations, providing you references in the form of citations. This is made possible by harnessing the power of LLMs along with retrieval augmented generation from traditional search engines and indexes.

We will also show you how information discovery can now be fully personalized to you: personalization through prompt engineering. Finally, we will see use cases of how this search copilot can help you in your day to day tasks in a data team: be it a data engineer, data scientist, or a data analyst.

Talk by: Aravind Srinivas

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored by: Anomalo | Scaling Data Quality with Unsupervised Machine Learning Methods

The challenge is no longer how big, diverse, or distributed your data is. It's that you can't trust it. Companies are utilizing rules and metrics to monitor data quality, but they’re tedious to set up and maintain. We will present a set of fully unsupervised machine learning algorithms for monitoring data quality at scale, which requires no setup, catching unexpected issues and preventing alert fatigue by minimizing false positives. At the end of this talk, participants will be equipped with insight into unsupervised data quality monitoring, its advantages and limitations, and how it can help scale trust in your data.

Talk by: Vicky Andonova

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Sponsored by: Wipro | Personalized Price Transparency Using Generative AI

Patients are increasingly taking an active role in managing their healthcare costs and are more likely to choose providers and treatments based on cost considerations. Learn how technology can help build cost-efficient care models across the healthcare continuum, delivering higher quality care while improving patient experience and operational efficiency.

Talk by: Janine Pratt

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

MLOps at Gucci: From Zero to Hero

Delta Lake is an open-source storage format that can be ideally used for storing large-scale datasets, which can be used for single-node and distributed training of deep learning models. Delta Lake storage format gives deep learning practitioners unique data management capabilities for working with their datasets. The challenge is that, as of now, it’s not possible to use Delta Lake to train PyTorch models directly.

PyTorch community has recently introduced a Torchdata library for efficient data loading. This library supports many formats out of the box, but not Delta Lake. This talk will demonstrate using the Delta Lake storage format for single-node and distributed PyTorch training using the torchdata framework and standalone delta-rs Delta Lake implementation.

Talk by: Michael Shtelma

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

LLMOps: Everything You Need to Know to Manage LLMs

With the recent surge in popularity of ChatGPT and other LLMs such as Dolly, many people are going to start training, tuning, and deploying their own custom models to solve their domain-specific challenges. When training and tuning these models, there are certain considerations that need to be accounted for in the MLOps process that differ from traditional machine learning. Come watch this session where you’ll gain a better understanding of what to look out for when starting to enter the world of applying LLMs in your domain.

In this session, you’ll learn about:

  • Grabbing foundational models and fine-tuning them
  • Optimizing resource management such as GPUs
  • Integrating human feedback and reinforcement learning to improve model performance
  • Different evaluation methods for LLMs

Talk by: Joseph Bradley and Eric Peter

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

We talked about:

Bela's background Why startups even need investors Why open source is a viable go-to-market strategy Building a bottom-up community The investment thesis for the TKM Family Office and the blurriness of the funding round naming convention Angel investors vs VC Funds vs family offices Bela's investment criteria and GitHub stars as a metric Inbound sourcing, outbound sourcing, and investor networking Making a good impression on an investor Balancing open and closed source parts of a product The future of open source Recent successes of open source companies Bela's resource recommendations

Links:

Understand who is engaging with your open source project article: https://www.crowd.dev/ Top 6 Books on Developer Community Building: https://www.crowd.dev/post/top-6-books-on-developer-community-building Which open source software metrics matter: https://www.bvp.com/atlas/measuring-the-engagement-of-an-open-source-software-community#Which-open-source-software-metrics-matter

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Links:

Book: https://www.manning.com/books/machine-learning-system-design?utm_source=AGMLBookcamp&utm_medium=affiliate&utm_campaign=book_babushkin_machine_4_25_23&utm_content=twitter Discount: poddatatalks21 (35% off) Evidently: https://www.evidentlyai.com/ Article: https://medium.com/people-ai-engineering/design-documents-for-ml-models-bbcd30402ff7

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

We talked about:

Simon's background What MLOps is and what it isn't Skills needed to build an ML platform that serves 100s of models Ranking the importance of skills The point where you should think about building an ML platform The importance of processes in ML platforms Weighing your options with SaaS platforms The exploratory setup, experiment tracking, and model registry What comes after deployment? Stitching tools together to create an ML platform Keeping data governance in mind when building a platform What comes first – the model or the platform? Do MLOps engineers need to have deep knowledge of how models work? Is API design important for MLOps? Simon's recommendations for furthering MLOps knowledge

Links:

LinkedIn: https://www.linkedin.com/in/simonstiebellehner/ Github: https://github.com/stiebels Medium: https://medium.com/@sistel

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html