MLOps

Accelerating ML Training And Delivery With In-Database Machine Learning

2021-06-15 · Data Engineering Podcast

podcast_episode

by Paige Roberts (Vertica) , Tobias Macey

AI/ML API BigQuery Cloud Computing CSV Data Engineering Data Management dbt DWH Hubspot Kubernetes Marketing +7 more

Summary When you build a machine learning model, the first step is always to load your data. Typically this means downloading files from object storage, or querying a database. To speed up the process, why not build the model inside the database so that you don’t have to move the information? In this episode Paige Roberts explains the benefits of pushing the machine learning processing into the database layer and the approach that Vertica has taken for their implementation. If you are looking for a way to speed up your experimentation, or an easy way to apply AutoML then this conversation is for you.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
RudderStack’s smart customer data pipeline is warehouse-first. It builds your customer data warehouse and your identity graph on your data warehouse, with support for Snowflake, Google BigQuery, Amazon Redshift, and more. Their SDKs and plugins make event streaming easy, and their integrations with cloud applications like Salesforce and ZenDesk help you go beyond event streaming. With RudderStack you can use all of your customer data to answer more difficult questions and then send those insights to your whole customer data stack. Sign up free at dataengineeringpodcast.com/rudder today.
We’ve all been asked to help with an ad-hoc request for data by the sales and marketing team. Then it becomes a critical report that they need updated every week or every day. Then what do you do? Send a CSV via email? Write some Python scripts to automate it? But what about incremental sync, API quotas, error handling, and all of the other details that eat up your time? Today, there is a better way. With Census, just write SQL or plug in your dbt models and start syncing your cloud warehouse to SaaS applications like Salesforce, Marketo, Hubspot, and many more. Go to dataengineeringpodcast.com/census today to get a free 14-day trial.
Your host is Tobias Macey and today I’m interviewing Paige Roberts about machine learning workflows inside the database.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by giving an overview of the current state of the market for databases that support in-process machine learning?
What are the motivating factors for running a machine learning workflow inside the database?
What styles of ML are feasible to do inside the database? (e.g. bayesian inference, deep learning, etc.)
What are the performance implications of running a model training pipeline within the database runtime? (both in terms of training performance boosts, and database performance impacts)
Can you describe the architecture of how the machine learning process is managed by the database engine?
How do you manage interacting with Python/R/Jupyter/etc. when working within the database?
What is the impact on data pipeline and MLOps architectures when using the database to manage the machine learning workflow?
What are the most interesting, innovative, or unexpected ways that you have seen in-database ML used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on machine learning inside the database?
When is in-database ML the wrong choice?
What are the recent trends/

Building Online Tech Communities - Demetrios Brinkmann

2021-04-02 · DataTalks.Club

podcast_episode

by Demetrios Brinkmann (MLOps Community)

HTML

We talked about:

Demetrios’ background and starting the MLOps community
Growing the MLOps community
Community moderation and dealing with problems
Becoming a community and connecting with people
Feeling a sense of belonging
Managing a community as an introvert
Keeping communities active
Doing custdev and talking to users
Random coffee and meeting with community members
Organizing community activities
Is community a business?
Five steps for starting a community in 2021
Shameless plug from Demetrios

Links:

https://mlops.community/

Join DataTalks.Club: https://datatalks.club/slack.html

DataOps 101 - Lars Albertsson

2021-03-26 · DataTalks.Club

podcast_episode

by Lars Albertsson

DataOps HTML Data Streaming

We talked about:

Lars’ career
Doing DataOps before it existed
What is DataOps
Data platform
Main components of the data platform and tools to implement it
Books about functional programming principles
Batch vs Streaming
Maturity levels
Building self-service tools
MLOps vs DataOps
Data Mesh
Keeping track of transformations
Lake house

Links:

https://www.scling.com/reading-list/
https://www.scling.com/presentations/

Join DataTalks.Club: https://datatalks.club/slack.html

Bridging The Gap Between Machine Learning And Operations At Iguazio

2021-03-02 · Data Engineering Podcast

podcast_episode

by Yaron Haviv (Iguazio) , Tobias Macey

AI/ML Airflow Analytics BI BigQuery CI/CD Cloud Computing Data Engineering Data Management Data Quality Data Science Datafold +7 more

Summary The process of building and deploying machine learning projects requires a staggering number of systems and stakeholders to work in concert. In this episode Yaron Haviv, co-founder of Iguazio, discusses the complexities inherent to the process, as well as how he has worked to democratize the technologies necessary to make machine learning operations maintainable.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Modern Data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Datafold helps Data teams gain visibility and confidence in the quality of their analytical data through data profiling, column-level lineage and intelligent anomaly detection. Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Go to dataengineeringpodcast.com/datafold today to start a 30-day trial of Datafold. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask.
RudderStack’s smart customer data pipeline is warehouse-first. It builds your customer data warehouse and your identity graph on your data warehouse, with support for Snowflake, Google BigQuery, Amazon Redshift, and more. Their SDKs and plugins make event streaming easy, and their integrations with cloud applications like Salesforce and ZenDesk help you go beyond event streaming. With RudderStack you can use all of your customer data to answer more difficult questions and then send those insights to your whole customer data stack. Sign up free at dataengineeringpodcast.com/rudder today.
Your host is Tobias Macey and today I’m interviewing Yaron Haviv about Iguazio, a platform for end to end automation of machine learning applications using MLOps principles.

Interview

Introduction
How did you get involved in the area of data science & analytics?
Can you start by giving an overview of what Iguazio is and the story of how it got started?
How would you characterize your target or typical customer?
What are the biggest challenges that you see around building production grade workflows for machine learning?
How does Iguazio help to address those complexities?
For customers who have already invested in the technical and organizational capacity for data science and data engineering, how does Iguazio integrate with their environments?
What are the responsibilities of a data engineer throughout the different stages of the lifecycle for a machine learning application?
Can you describe how the Iguazio platform is architected?
How has the design of the platform evolved since you first began working on it?
How have the industry best practices around bringing machine learning to production changed?
How do you approach testing/validation of machine learning applications and releasing them to production environments? (e.g. CI/CD)
Once a model is in

The Rise of MLOps - Theofilos Papapanagiotou

2021-02-05 · DataTalks.Club

podcast_episode

by Theofilos Papapanagiotou

AI/ML Azure Cloud Computing DataOps GitHub Microsoft

We covered:

What is MLOps
The difference between MLOps and ML Engineering
Getting into MLOps
Kubeflow and its components, ML Platforms
Learning Kubeflow
DataOps

And other things

Links:

Microsoft MLOps maturity model: https://docs.microsoft.com/en-us/azure/architecture/example-scenario/mlops/mlops-maturity-model
Google MLOps maturity levels: https://cloud.google.com/solutions/machine-learning/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning
MLOps roadmap 2020-2025: https://github.com/cdfoundation/sig-mlops/blob/master/roadmap/2020/MLOpsRoadmap2020.md
Kubeflow website: https://www.kubeflow.org/
TFX Paper: https://research.google/pubs/pub46484/

Join DataTalks.Club: https://datatalks.club

Bringing Feature Stores and MLOps to the Enterprise at Tecton

2021-01-05 · Data Engineering Podcast

podcast_episode

by Kevin Stumpf (Tecton) , Tobias Macey

AI/ML Analytics API BI Computer Science Data Engineering Data Management Datadog ETL/ELT Kubernetes Monte Carlo New Relic +1 more

Summary As more organizations are gaining experience with data management and incorporating analytics into their decision making, their next move is to adopt machine learning. In order to make those efforts sustainable, the core capability they need is for data scientists and analysts to be able to build and deploy features in a self-service manner. As a result the feature store is becoming a required piece of the data platform. To fill that need Kevin Stumpf and the team at Tecton are building an enterprise feature store as a service. In this episode he explains how his experience building the Michelangelo platform at Uber has informed the design and architecture of Tecton, how it integrates with your existing data systems, and the elements that are required for a well-engineered feature store.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Do you want to get better at Python? Now is an excellent time to take an online course. Whether you’re just learning Python or you’re looking for deep dives on topics like APIs, memory management, async and await, and more, our friends at Talk Python Training have a top-notch course for you. If you’re just getting started, be sure to check out the Python for Absolute Beginners course. It’s like the first year of computer science that you never took compressed into 10 fun hours of Python coding and problem solving. Go to dataengineeringpodcast.com/talkpython today and get 10% off the course that will help you find your next level. That’s dataengineeringpodcast.com/talkpython, and don’t forget to thank them for supporting the show.
You invest so much in your data infrastructure – you simply can’t afford to settle for unreliable data. Fortunately, there’s hope: in the same way that New Relic, DataDog, and other Application Performance Management solutions ensure reliable software and keep application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo’s end-to-end Data Observability Platform monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence. The platform uses machine learning to infer and learn your data, proactively identify data issues, assess its impact through lineage, and notify those who need to know before it impacts the business. By empowering data teams with end-to-end data reliability, Monte Carlo helps organizations save time, increase revenue, and restore trust in their data. Visit dataengineeringpodcast.com/montecarlo today to request a demo and see how Monte Carlo delivers data observability across your data infrastructure. The first 25 will receive a free, limited edition Monte Carlo hat!
Your host is Tobias Macey and today I’m interviewing Kevin Stumpf about Tecton and the role that the feature store plays in a modern MLOps platform.

Interview

Introduction
How did you get involved in the area of data management?
Can you start by describing what you are building at Tecton and your motivation for starting the business?
For anyone who isn’t familiar with the concept, what is an example of a feature?
How do you define what a feature store is?
What role does a feature store play in the overall lifecycle of a machine learning p

Roles in a data team - Alexey Grigorev

2020-11-21 · DataTalks.Club

podcast_episode

by Alexey Grigorev (DataTalks.Club)

AI/ML

We talked about:

different roles in a data team: product managers, data analysts, data engineers, data scientists, ML engineers, MLOps engineers
their responsibilities
the skills they need

DataTalks.Club is the place to talk about data. Join our community: https://datatalks.club

ML Ops: Operationalizing Data Science

2020-04-25 · O'Reilly Data Science Books O'Reilly Amazon

book

by Dev Kannabiran , Michael O’Connell (TIBCO Software) , Dan Rope , Thomas Hill , Steven Hillion (Astronomer) , David Sweenor

AI/ML Analytics Data Analytics Data Science data data-science

More than half of the analytics and machine learning (ML) models created by organizations today never make it into production. Instead, many of these ML models do nothing more than provide static insights in a slideshow. If they aren’t truly operational, these models can’t possibly do what you’ve trained them to do. This report introduces practical concepts to help data scientists and application engineers operationalize ML models to drive real business change. Through lessons based on numerous projects around the world, six experts in data analytics provide an applied four-step approach—Build, Manage, Deploy and Integrate, and Monitor—for creating ML-infused applications within your organization.

You’ll learn how to:
Fulfill data science value by reducing friction throughout ML pipelines and workflows
Constantly refine ML models through retraining, periodic tuning, and even complete remodeling to ensure long-term accuracy
Design the ML Ops lifecycle to ensure that people-facing models are unbiased, fair, and explainable
Operationalize ML models not only for pipeline deployment but also for external business systems that are more complex and less standardized
Put the four-step Build, Manage, Deploy and Integrate, and Monitor approach into action

6 Trends Framing the State of AI and ML

2020-03-26 · O'Reilly AI & ML Books O'Reilly Amazon

book

by Roger Magoulas , Steve Swoyer

AI/ML ai-ml data machine-learning machine-learning-tasks mlflow

O’Reilly usage analysis shows continued growth in AI/ML and early signs that organizations are experimenting with advanced tools and methods.

ML Ops

2019-11-27 · Data Skeptic

podcast_episode

by Damian Brady , Kyle Polich

AI/ML

Kyle met up with Damian Brady at MS Ignite 2019 to discuss machine learning operations.

Automating LLM lifecycle on Databricks using Agent Bricks

· PyTorch Meetup #21

talk

by udaybhanu gaddam (Databricks) , sanjay ashok (Databricks)

Databricks agent bricks llm lifecycle

This talk explores the components of the LLM lifecycle: fine-tuning, evaluations, LLMs as judges, and human-in-the-loop feedback. We’ll show how Agent Bricks on Databricks automates these workflows, making it easier to build, assess, and scale trustworthy AI systems.

From theory to reality: How AI transforms network operations

· Microsoft Ignite 2025

talk

by Phillip Gervasi (Kentik)

AI/ML API LLM NLP RAG SQL

AI is reshaping NetOps from scripted automation to intelligent, data driven workflows. We will show uses: incident triage, knowledge retrieval, traffic analysis, prediction, and contrast legacy monitoring with ML, NLP, and LLMs. See how RAG, text to SQL, and agent workflows enable real time insights across hybrid data. We will outline data pipelines and MLOps, address accuracy, reliability, cost, compliance, and weigh build vs buy. We will cover API integration and human in the loop guardrails.

MLOps: Streamline AI operations

· Google Cloud Next '25

demo

AI/ML product-vertex-ai-pipelines

Automate AI deployment and management. Build efficient machine learning operations (MLOps) pipelines with Vertex AI.

talk-data.com
