MLOps

Scaling MLOps for a Demand Forecasting Across Multiple Markets for a Large CPG

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Vinit Doshi, Sunil Ranganathan

Azure CI/CD Cloud Computing Databricks

In this session, we look at how one of the world’s largest CPG companies set up a scalable MLOps pipeline for a demand forecasting use case that predicted demand for 100,000+ DFUs (demand forecasting units) on a weekly basis across more than 20 markets. The implementation delivered significant cost savings through improved productivity, reduced cloud usage, and faster time to value, among other benefits. You will leave this session with a clearer picture of the following:

Best practices in scaling MLOps with Databricks and Azure for a demand forecasting use case with a multi-market and multi-region roll-out.
Best practices for model refactoring and setting up standard CI/CD pipelines for MLOps.
Pitfalls to avoid in such scenarios.

Talk by: Sunil Ranganathan and Vinit Doshi

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Journey to Real-Time ML: A Look at Feature Platforms & Modern RT ML Architectures Using Tecton

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Mike Del Balso, Morgan Hsu

AI/ML Databricks Marketing Data Streaming

Are you struggling to keep up with the demands of real-time machine learning? Like most organizations building real-time ML, you’re probably looking for a better way to manage the lifecycle of ML models and features; implement batch, streaming, and real-time data pipelines; and generate accurate training datasets and serve models and data online with strict SLAs, supporting millisecond latencies and high query volumes. Look no further. In this session, we will unveil a modern technical architecture that simplifies the process of managing real-time ML models and features.

Using MLflow and Tecton, we’ll show you how to build a robust MLOps platform on Databricks that can easily handle the unique challenges of real-time data processing. Join us to discover how to streamline the lifecycle of ML models and features, implement data pipelines with ease, and generate accurate training datasets with minimal effort. See how to serve models and data online with mission-critical speed and reliability, supporting millisecond latencies and high query volumes.

Take a firsthand look at how FanDuel uses this solution to power their real-time ML applications, from responsible gaming to content recommendations and marketing optimization. In a live demo, see for yourself how this system can be used to define features, train models, process streaming data, and serve both models and features online for real-time inference. Join us to learn how to build a modern MLOps platform for your real-time ML use cases.

Talk by: Mike Del Balso and Morgan Hsu

Sponsored: AWS|Build Generative AI Solution on Open Source Databricks Dolly 2.0 on Amazon SageMaker

Colossal AI: Scaling AI Models in Big Model Era

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Yang You, James Demmel

AI/ML Databricks GitHub LLM

The proliferation of large Transformer-based models has outpaced advances in hardware, resulting in an urgent need for the ability to distribute enormous models across multiple GPUs. Despite this growing demand, best practices for choosing an optimal strategy are still lacking due to the breadth of knowledge required across HPC, DL, and distributed systems. These difficulties have prompted both AI and HPC developers to explore three key questions: How can the training and inference efficiency of large models be improved to reduce costs? How can larger AI models be accommodated even with limited resources? What can be done to enable more community members to easily access large models and large-scale applications?

In this session, we investigate efforts to answer these questions. First, diverse parallelization is an important tool for improving the efficiency of large model training and inference. Second, heterogeneous memory management can help enhance the model accommodation capacity of processors like GPUs.

Furthermore, user-friendly DL systems for large models significantly reduce the specialized background knowledge users need, allowing more community members to get started with larger models more efficiently. We will provide participants with a system-level open-source solution, Colossal-AI. More information can be found at https://github.com/hpcaitech/ColossalAI.

Talk by: James Demmel and Yang You

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

Enterprise Use of Generative AI Needs Guardrails: Here's How to Build Them

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Shreya Rajpal (Guardrails AI)

AI/ML Databricks GenAI LLM

Large Language Models (LLMs) such as ChatGPT have revolutionized AI applications, offering unprecedented potential for complex real-world scenarios. However, fully harnessing this potential comes with unique challenges such as model brittleness and the need for consistent, accurate outputs. These hurdles become more pronounced when developing production-grade applications that utilize LLMs as a software abstraction layer.

In this session, we will tackle these challenges head-on. We introduce Guardrails AI, an open-source platform designed to mitigate risks and enhance the safety and efficiency of LLMs. We will delve into specific techniques and advanced control mechanisms that enable developers to optimize model performance effectively. Furthermore, we will explore how implementing these safeguards can significantly improve the development process of LLMs, ultimately leading to safer, more reliable, and robust real-world AI applications.

Talk by: Shreya Rajpal

Navigating the Complexities of LLMs: Insights from Practitioners

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Ankit Mathur (Databricks), Eric Peter (Databricks), Salman Mohammed, Sai Ravuru

AI/ML Databricks LLM

Interested in diving deeper into the world of large language models (LLMs) and their real-life applications? In this session, we bring together our experienced team members and some of our esteemed customers to talk about their journey with LLMs. We'll delve into the complexities of getting these models to perform accurately and efficiently, the challenges, and the dynamic nature of LLM technology as it constantly evolves. This engaging conversation will offer you a broader perspective on how LLMs are being applied across different industries and how they’re revolutionizing our interaction with technology. Whether you're well-versed in AI or just beginning to explore, this session promises to enrich your understanding of the practical aspects of LLM implementation.

Talk by: Sai Ravuru, Eric Peter, Ankit Mathur, and Salman Mohammed

How to Train Your Own Large Language Models

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Reza Shabani

Databricks LLM

Given the success of OpenAI’s GPT-4 and Google’s PaLM, every company is now assessing its own use cases for Large Language Models (LLMs). Many companies will ultimately decide to train their own LLMs for a variety of reasons, ranging from data privacy to increased control over updates and improvements. One of the most common reasons will be to make use of proprietary internal data.

In this session, we’ll go over how to train your own LLMs, from raw data to deployment in a user-facing production environment. We’ll discuss the engineering challenges, and the vendors that make up the modern LLM stack: Databricks, Hugging Face, and MosaicML. We’ll also break down what it means to train an LLM using your own data, including the various approaches and their associated tradeoffs.

Topics covered in this session:

How Replit trained a state-of-the-art LLM from scratch
The different approaches to using LLMs with your internal data
The differences between fine-tuning, instruction tuning, and RLHF

Talk by: Reza Shabani

Sponsored by: Infosys | Topaz AI First Innovations

Discuss How LLMs Will Change the Way We Work

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Ben Harvey, Sean Owen (Databricks), Ankit Mathur (Databricks), Debu Sinha, Jan van der Vegt

AI/ML Databricks LLM

Will LLMs change the way we work? Put your questions to a panel of LLM and AI experts on what problems LLMs will solve and what new challenges they may bring.

Talk by: Ben Harvey, Jan van der Vegt, Ankit Mathur, Debu Sinha, and Sean Owen

Foundation Models in the Modern Data Stack

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Ines Chami

Databricks LLM Modern Data Stack

As Foundation Models (FMs) continue to grow in size, innovations continue to push the boundaries of what these models can do on language and image tasks. This talk will describe our work on applying FMs to structured data tasks like data linkage, cleaning and querying. We will then discuss challenges and solutions that these models present for production deployment in the modern data stack.

Talk by: Ines Chami

PaLM 2: A Smaller, Faster and More Capable LLM

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Andy Dai

AI/ML Databricks LLM

PaLM 2 is a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction.

PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities.

Talk by: Andy Dai

Perplexity: A Copilot for All Your Web Searches and Research

2023-07-26 · Databricks DATA + AI Summit 2023 Watch

video

by Aravind Srinivas

AI/ML Databricks LLM RAG

In this demo, we will show you what we consider the fastest and most functional answer engine and search copilot available right now: Perplexity.ai. It can solve a wide array of problems, from giving you fast answers on any topic to planning trips and doing market research on things unfamiliar to you, all in a trustworthy way without hallucinations, with references provided in the form of citations. This is made possible by harnessing the power of LLMs along with retrieval augmented generation from traditional search engines and indexes.

We will also show you how information discovery can now be fully personalized to you through prompt engineering. Finally, we will walk through use cases showing how this search copilot can help with your day-to-day tasks on a data team, whether you are a data engineer, data scientist, or data analyst.

Talk by: Aravind Srinivas

Sponsored by: Anomalo | Scaling Data Quality with Unsupervised Machine Learning Methods

MLOps at Gucci: From Zero to Hero

2023-07-25 · Databricks DATA + AI Summit 2023 Watch

video

by Michael Shtelma

Data Management Databricks Delta PyTorch

Delta Lake is an open-source storage format that is well suited to storing large-scale datasets for single-node and distributed training of deep learning models. The Delta Lake storage format gives deep learning practitioners unique data management capabilities for working with their datasets. The challenge is that, as of now, it’s not possible to use Delta Lake to train PyTorch models directly.

The PyTorch community recently introduced the torchdata library for efficient data loading. This library supports many formats out of the box, but not Delta Lake. This talk will demonstrate using the Delta Lake storage format for single-node and distributed PyTorch training using the torchdata framework and the standalone delta-rs Delta Lake implementation.

Talk by: Michael Shtelma

LLMOps: Everything You Need to Know to Manage LLMs

2023-07-25 · Databricks DATA + AI Summit 2023 Watch

video

by Joseph Bradley, Eric Peter (Databricks)

AI/ML Databricks LLM

With the recent surge in popularity of ChatGPT and other LLMs such as Dolly, many people are going to start training, tuning, and deploying their own custom models to solve their domain-specific challenges. When training and tuning these models, there are certain considerations that need to be accounted for in the MLOps process that differ from traditional machine learning. Come watch this session where you’ll gain a better understanding of what to look out for when starting to enter the world of applying LLMs in your domain.

In this session, you’ll learn about:

Grabbing foundational models and fine-tuning them
Optimizing resource management such as GPUs
Integrating human feedback and reinforcement learning to improve model performance
Different evaluation methods for LLMs

Talk by: Joseph Bradley and Eric Peter

Investing in Open-Source Data Tools - Bela Wiertz

2023-07-21 · DataTalks.Club Listen

podcast_episode

by Bela Wiertz (TKM Family Office)

GitHub HTML

We talked about:

Bela's background
Why startups even need investors
Why open source is a viable go-to-market strategy
Building a bottom-up community
The investment thesis for the TKM Family Office and the blurriness of the funding round naming convention
Angel investors vs VC Funds vs family offices
Bela's investment criteria and GitHub stars as a metric
Inbound sourcing, outbound sourcing, and investor networking
Making a good impression on an investor
Balancing open and closed source parts of a product
The future of open source
Recent successes of open source companies
Bela's resource recommendations

Links:

Understand who is engaging with your open source project article: https://www.crowd.dev/
Top 6 Books on Developer Community Building: https://www.crowd.dev/post/top-6-books-on-developer-community-building
Which open source software metrics matter: https://www.bvp.com/atlas/measuring-the-engagement-of-an-open-source-software-community#Which-open-source-software-metrics-matter

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Why Machine Learning Design is Broken - Valerii Babushkin

2023-07-14 · DataTalks.Club Listen

podcast_episode

by Valerii Babushkin

AI/ML GitHub HTML

Links:

Book: https://www.manning.com/books/machine-learning-system-design?utm_source=AGMLBookcamp&utm_medium=affiliate&utm_campaign=book_babushkin_machine_4_25_23&utm_content=twitter
Discount: poddatatalks21 (35% off)
Evidently: https://www.evidentlyai.com/
Article: https://medium.com/people-ai-engineering/design-documents-for-ml-models-bbcd30402ff7

From Scratch to Success: Building an MLOps Team and ML Platform - Simon Stiebellehner

2023-06-30 · DataTalks.Club Listen

podcast_episode

by Simon Stiebellehner

AI/ML API Data Governance GitHub HTML SaaS

We talked about:

Simon's background
What MLOps is and what it isn't
Skills needed to build an ML platform that serves 100s of models
Ranking the importance of skills
The point where you should think about building an ML platform
The importance of processes in ML platforms
Weighing your options with SaaS platforms
The exploratory setup, experiment tracking, and model registry
What comes after deployment?
Stitching tools together to create an ML platform
Keeping data governance in mind when building a platform
What comes first, the model or the platform?
Do MLOps engineers need to have deep knowledge of how models work?
Is API design important for MLOps?
Simon's recommendations for furthering MLOps knowledge

Links:

LinkedIn: https://www.linkedin.com/in/simonstiebellehner/
Github: https://github.com/stiebels
Medium: https://medium.com/@sistel
