Join us for a lightning talk summarizing the Google x Kaggle Gen AI Intensive, a 5-day live course that empowered over 140,000 participants with a comprehensive understanding of generative AI. From foundational models and prompt engineering to MLOps and real-world applications, this series covered it all through a mix of theory, hands-on learning, and community engagement, with learning material created by experts across Google. Learn how you can leverage the resources from this ongoing series to upskill yourself and stay ahead in the rapidly evolving field of generative AI.
talk-data.com
Topic: MLOps
Boost AI innovation in regulated industries! This session shows how to use Red Hat OpenShift AI/IBM Watsonx together with Vertex AI to split the model lifecycle: train securely on an isolated OpenShift AI/Watsonx environment, then deploy through Vertex AI's model registry. Explore use cases, deployment patterns, and the advantages of this approach: training on a trusted platform for sensitive data, straightforward deployment via Google Cloud's registry, and an MLOps architecture that helps meet regulatory requirements while leveraging cloud AI.
This Session is hosted by a Google Cloud Next Sponsor.
Visit your registration profile at g.co/cloudnext to opt out of sharing your contact information with the sponsor hosting this session.
Discover how Target modernized its MLOps workflows using Ray and Vertex AI to build scalable ML applications. This session will cover key strategies for optimizing model performance, ensuring security and compliance, and fostering collaboration between data science and platform teams. Whether you’re looking to streamline model deployment, enhance data access, or improve infrastructure management in a hybrid setup, this session provides practical insights and guidance for integrating Ray and Vertex AI into your MLOps roadmap.
In this podcast episode, we talked with Bartosz Mikulski about Data Intensive AI.
About the Speaker: Bartosz is an AI and data engineer. He specializes in moving AI projects from the good-enough-for-a-demo phase to production by building a testing infrastructure and fixing the issues detected by tests. On top of that, he teaches programmers and non-programmers how to use AI. He contributed one chapter to the book 97 Things Every Data Engineer Should Know, and he was a speaker at several conferences, including Data Natives, Berlin Buzzwords, and Global AI Developer Days.
In this episode, we discuss Bartosz’s career journey, the importance of testing in data pipelines, and how AI tools like ChatGPT and Cursor are transforming development workflows. From prompt engineering to building Chrome extensions with AI, we dive into practical use cases, tools, and insights for anyone working in data-intensive AI projects. Whether you’re a data engineer, AI enthusiast, or just curious about the future of AI in tech, this episode offers valuable takeaways and real-world experiences.
0:00 Introduction to Bartosz and his background
4:00 Bartosz’s career journey from Java development to AI engineering
9:05 The importance of testing in data engineering
11:19 How to create tests for data pipelines
13:14 Tools and approaches for testing data pipelines
17:10 Choosing Spark for data engineering projects
19:05 The connection between data engineering and AI tools
21:39 Use cases of AI in data engineering and MLOps
25:13 Prompt engineering techniques and best practices
31:45 Prompt compression and caching in AI models
33:35 Thoughts on DeepSeek and open-source AI models
35:54 Using AI for lead classification and LinkedIn automation
41:04 Building Chrome extensions with AI integration
43:51 Comparing Cursor and GitHub Copilot for coding
47:11 Using ChatGPT and Perplexity for AI-assisted tasks
52:09 Hosting static websites and using AI for development
54:27 How blogging helps attract clients and share knowledge
58:15 Using AI to assist with writing and content creation
🔗 CONNECT WITH Bartosz LinkedIn: https://www.linkedin.com/in/mikulskibartosz/ Github: https://github.com/mikulskibartosz Website: https://mikulskibartosz.name/blog/
🔗 CONNECT WITH DataTalksClub Join the community - https://datatalks.club/slack.html Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/r?cid=ZjhxaWRqbnEwamhzY3A4ODA5azFlZ2hzNjBAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ Check other upcoming events - https://lu.ma/dtc-events LinkedIn - https://www.linkedin.com/company/datatalks-club/ Twitter - https://twitter.com/DataTalksClub Website - https://datatalks.club/
In this podcast episode, we talked with Nemanja Radojkovic about MLOps in Corporations and Startups.
About the Speaker: Nemanja Radojkovic is Senior Machine Learning Engineer at Euroclear.
In this event, we’re diving into the world of MLOps, comparing life in startups versus big corporations. Joining us again is Nemanja, a seasoned machine learning engineer with experience spanning Fortune 500 companies and agile startups. We explore the challenges of scaling MLOps on a shoestring budget, the trade-offs between corporate stability and startup agility, and practical advice for engineers deciding between these two career paths, whether you’re navigating legacy frameworks or experimenting with cutting-edge tools.
1:00 MLOps in corporations versus startups
6:03 The agility and pace of startups
7:54 MLOps on a shoestring budget
12:54 Cloud solutions for startups
15:06 Challenges of cloud complexity versus on-premise
19:19 Selecting tools and avoiding vendor lock-in
22:22 Choosing between a startup and a corporation
27:30 Flexibility and risks in startups
29:37 Bureaucracy and processes in corporations
33:17 The role of frameworks in corporations
34:32 Advantages of large teams in corporations
40:01 Challenges of technical debt in startups
43:12 Career advice for junior data scientists
44:10 Tools and frameworks for MLOps projects
49:00 Balancing new and old technologies in skill development
55:43 Data engineering challenges and reliability in LLMs
57:09 On-premise vs. cloud solutions in data-sensitive industries
59:29 Alternatives like Dask for distributed systems
🔗 CONNECT WITH NEMANJA LinkedIn - / radojkovic Github - https://github.com/baskervilski
🔗 CONNECT WITH DataTalksClub Join the community - https://datatalks.club/slack.html Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/... Check other upcoming events - https://lu.ma/dtc-events LinkedIn - / datatalks-club Twitter - / datatalksclub Website - https://datatalks.club/
Get ready to dive into the world of DevOps & Cloud tech! This session will help you navigate the complex world of Cloud and DevOps with confidence. This session is ideal for new grads, career changers, and anyone feeling overwhelmed by the buzz around DevOps. We'll break down its core concepts, demystify the jargon, and explore how DevOps is essential for success in the ever-changing technology landscape, particularly in the emerging era of generative AI. A basic understanding of software development concepts is helpful, but enthusiasm to learn is most important.
Vishakha is a Senior Cloud Architect at Google Cloud Platform with over 8 years of DevOps and Cloud experience. Prior to Google, she was a DevOps engineer at AWS and a Subject Matter Expert (SME) for the IaC offering CloudFormation in the NorthAm region. She has experience in diverse domains including Financial Services, Retail, and Online Media. She primarily focuses on Infrastructure Architecture, Design & Automation (IaC), Public Cloud (AWS, GCP), Kubernetes/CNCF tools, Infrastructure Security & Compliance, CI/CD & GitOps, and MLOps.
Summary In this episode of the Data Engineering Podcast Bartosz Mikulski talks about preparing data for AI applications. Bartosz shares his journey from data engineering to MLOps and emphasizes the importance of data testing over software development in AI contexts. He discusses the types of data assets required for AI applications, including extensive test datasets, especially in generative AI, and explains the differences in data requirements for various AI application styles. The conversation also explores the skills data engineers need to transition into AI, such as familiarity with vector databases and new data modeling strategies, and highlights the challenges of evolving AI applications, including frequent reprocessing of data when changing chunking strategies or embedding models.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
Your host is Tobias Macey and today I'm interviewing Bartosz Mikulski about how to prepare data for use in AI applications.
Interview
Introduction
How did you get involved in the area of data management?
Can you start by outlining some of the main categories of data assets that are needed for AI applications?
How does the nature of the application change those requirements? (e.g. RAG app vs. agent, etc.)
How do the different assets map to the stages of the application lifecycle?
What are some of the common roles and divisions of responsibility that you see in the construction and operation of a "typical" AI application?
For data engineers who are used to data warehousing/BI, what are the skills that map to AI apps?
What are some of the data modeling patterns that are needed to support AI apps? (chunking strategies, metadata management)
What are the new categories of data that data engineers need to manage in the context of AI applications? (agent memory generation/evolution, conversation history management, data collection for fine tuning)
What are some of the notable evolutions in the space of AI applications and their patterns that have happened in the past ~1-2 years that relate to the responsibilities of data engineers?
What are some of the skills gaps that teams should be aware of and identify training opportunities for?
What are the most interesting, innovative, or unexpected ways that you have seen data teams address the needs of AI applications?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on AI applications and their reliance on data?
What are some of the emerging trends that you are paying particular attention to?
Contact Info
Website
LinkedIn
Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
Links
Spark
Ray
Chunking Strategies
Hypothetical document embeddings
Model Fine Tuning
Prompt Compression
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
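The chunking strategies mentioned in this episode can be illustrated with a minimal sketch. This is a hypothetical example, not code from the show: a fixed-size character chunker with overlap, whose parameters show why changing a chunking strategy forces reprocessing of the whole corpus.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks with overlap between neighbors.

    Changing chunk_size or overlap moves every chunk boundary, which is
    why adopting a new chunking strategy means re-embedding the corpus.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

Production systems typically chunk on semantic boundaries (sentences, headings) rather than raw character counts, but the reprocessing trade-off is the same.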
In this podcast episode, we talked with Alexander Guschin about launching a career off Kaggle.
About the Speaker:
Alexander Guschin is a Machine Learning Engineer with 10+ years of experience, a Kaggle Grandmaster ranked 5th globally, and a teacher to 100K+ students. He leads DS and SE teams and contributes to open-source ML tools.
0:00 Starting with Machine Learning: Challenges and Early Steps
13:05 Community and Learning Through Kaggle Sessions
17:10 Broadening Skills Through Kaggle Participation
18:54 Early Competitions and Lessons Learned
21:10 Transitioning to Simpler Solutions Over Time
23:51 Benefits of Kaggle for Starting a Career in Machine Learning
29:08 Teamwork vs. Solo Participation in Competitions
31:14 Schoolchildren in AI Competitions
42:33 Transition to Industry and MLOps
50:13 Encouraging teamwork in student projects
50:48 Designing competitive machine learning tasks
52:22 Leaderboard types for tracking performance
53:44 Managing small-scale university classes
54:17 Experience with Coursera and online teaching
59:40 Convincing managers about Kaggle's value
61:38 Secrets of Kaggle competition success
63:11 Generative AI's impact on competitive ML
65:13 Evolution of automated ML solutions
66:22 Reflecting on competitive data science experience
🔗 CONNECT WITH ALEXANDER GUSCHIN LinkedIn - https://www.linkedin.com/in/1aguschin/ Website - https://www.aguschin.com/
🔗 CONNECT WITH DataTalksClub Join DataTalks.Club: https://datatalks.club/slack.html Our events: https://datatalks.club/events.html Datalike Substack - https://datalike.substack.com/ LinkedIn: / datatalks-club
Business runs on tabular data in databases, spreadsheets, and logs. Crunch that data using deep learning, gradient boosting, and other machine learning techniques. Machine Learning for Tabular Data teaches you to train insightful machine learning models on common tabular business data sources such as spreadsheets, databases, and logs. You’ll discover how to use XGBoost and LightGBM on tabular data, optimize deep learning libraries like TensorFlow and PyTorch for tabular data, and use cloud tools like Vertex AI to create an automated MLOps pipeline.
Machine Learning for Tabular Data will teach you how to:
Pick the right machine learning approach for your data
Apply deep learning to tabular data
Deploy tabular machine learning locally and in the cloud
Build pipelines to automatically train and maintain a model
Machine Learning for Tabular Data covers classic machine learning techniques like gradient boosting, and more contemporary deep learning approaches. By the time you’re finished, you’ll be equipped with the skills to apply machine learning to the kinds of data you work with every day.
About the Technology
Machine learning can accelerate everyday business chores like account reconciliation, demand forecasting, and customer service automation, not to mention more exotic challenges like fraud detection, predictive maintenance, and personalized marketing. This book shows you how to unlock the vital information stored in spreadsheets, ledgers, databases, and other tabular data sources using gradient boosting, deep learning, and generative AI.
About the Book
Machine Learning for Tabular Data delivers practical ML techniques to upgrade every stage of the business data analysis pipeline. In it, you’ll explore examples like using XGBoost and Keras to predict short-term rental prices, deploying a local ML model with Python and Flask, and streamlining workflows using large language models (LLMs).
Along the way, you’ll learn to make your models both more powerful and more explainable.
What's Inside
Master XGBoost
Apply deep learning to tabular data
Deploy models locally and in the cloud
Build pipelines to train and maintain models
About the Reader
For readers experienced with Python and the basics of machine learning.
About the Authors
Mark Ryan is the AI Lead of the Developer Knowledge Platform at Google. A three-time Kaggle Grandmaster, Luca Massaron is a Google Developer Expert (GDE) in machine learning and AI. He has published 17 other books.
Unlock the power of Amazon SageMaker Studio, a comprehensive IDE for streamlining the machine learning (ML) lifecycle. Explore data exploration, transformation, automated feature engineering with AutoML, and collaborative coding using integrated Jupyter Notebooks. Discover how SageMaker Studio MLOps integration simplifies model deployment, monitoring, and governance. Through live demos and best practices, learn to leverage SageMaker Studio tools for efficient feature engineering, model development, collaboration, and data security.
Learn more: AWS re:Invent: https://go.aws/reinvent. More AWS events: https://go.aws/3kss9CP
Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4
About AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.
#AWSreInvent #AWSreInvent2024
Prepare for Microsoft Exam DP-100 and demonstrate your real-world knowledge of managing data ingestion and preparation, model training and deployment, and machine learning solution monitoring with Python, Azure Machine Learning, and MLflow. Designed for professionals with data science experience, this Exam Ref focuses on the critical thinking and decision-making acumen needed for success at the Microsoft Certified: Azure Data Scientist Associate level.
Focus on the expertise measured by these objectives:
Design and prepare a machine learning solution
Explore data and train models
Prepare a model for deployment
Deploy and retrain a model
This Microsoft Exam Ref:
Organizes its coverage by exam objectives
Features strategic, what-if scenarios to challenge you
Assumes you have experience in designing and creating a suitable working environment for data science workloads, training machine learning models, and managing, deploying, and monitoring scalable machine learning solutions
About the Exam
Exam DP-100 focuses on knowledge needed to design and prepare a machine learning solution, manage an Azure Machine Learning workspace, explore data and train models, create models by using the Azure Machine Learning designer, prepare a model for deployment, manage models in Azure Machine Learning, deploy and retrain a model, and apply machine learning operations (MLOps) practices.
About Microsoft Certification
Passing this exam fulfills your requirements for the Microsoft Certified: Azure Data Scientist Associate credential, demonstrating your expertise in applying data science and machine learning to implement and run machine learning workloads on Azure, including knowledge and experience using Azure Machine Learning and MLflow.
Amazon SageMaker provides purpose-built tools to create a reliable path to production for both machine learning and generative AI workflows. SageMaker MLOps helps you automate and standardize processes across generative AI and ML lifecycles. Using SageMaker, you can train, test, troubleshoot, deploy, and govern models at scale to boost your productivity while maintaining model performance in production. Explore the latest and greatest capabilities such as SageMaker Experiments with MLflow, SageMaker Pipelines, and SageMaker Model Registry supporting efficiencies in your ML workflow (MLOps) and generative AI workflows (FMOps). Learn how to bring generative AI concept to production quickly and securely.
Learn more: AWS re:Invent: https://go.aws/reinvent. More AWS events: https://go.aws/3kss9CP
Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4
About AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.
#AWSreInvent #AWSreInvent2024
We talked about:
00:00 DataTalks.Club intro
02:34 Career journey and transition into MLOps
08:41 Dutch agriculture and its challenges
10:36 The concept of "technical debt" in MLOps
13:37 Trade-offs in MLOps: moving fast vs. doing things right
14:05 Building teams and the role of coordination in MLOps
16:58 Key roles in an MLOps team: evangelists and tech translators
23:01 Role of the MLOps team in an organization
25:19 How MLOps teams assist product teams
27:56 Standardizing practices in MLOps
32:46 Getting feedback and creating buy-in from data scientists
36:55 The importance of addressing pain points in MLOps
39:06 Best practices and tools for standardizing MLOps processes
42:31 Value of data versioning and reproducibility
44:22 When to start thinking about data versioning
45:10 Importance of data science experience for MLOps
46:06 Skill mix needed in MLOps teams
47:33 Building a diverse MLOps team
48:18 Best practices for implementing MLOps in new teams
49:52 Starting with CI/CD in MLOps
51:21 Key components for a complete MLOps setup
53:08 Role of package registries in MLOps
54:12 Using Docker vs. packages in MLOps
57:56 Examples of MLOps success and failure stories
1:00:54 What MLOps is in simple terms
1:01:58 The complexity of achieving easy deployment, monitoring, and maintenance
Join our Slack: https://datatalks.club/slack.html
The "LLM Engineer's Handbook" is your comprehensive guide to mastering Large Language Models from concept to deployment. Written by leading experts, it combines theoretical foundations with practical examples to help you build, refine, and deploy LLM-powered solutions that solve real-world problems effectively and efficiently.
What this Book will help me do
Understand the principles and approaches for training and fine-tuning Large Language Models (LLMs).
Apply MLOps practices to design, deploy, and monitor your LLM applications effectively.
Implement advanced techniques such as retrieval-augmented generation (RAG) and preference alignment.
Optimize inference for high performance, addressing low latency and high availability for production systems.
Develop robust data pipelines and scalable architectures for building modular LLM systems.
Author(s)
Paul Iusztin and Maxime Labonne are experienced AI professionals specializing in natural language processing and machine learning. With years of industry and academic experience, they are dedicated to making complex AI concepts accessible and actionable. Their collaborative authorship ensures a blend of theoretical rigor and practical insights tailored for modern AI practitioners.
Who is it for?
This book is tailored for AI engineers, NLP professionals, and LLM practitioners who wish to deepen their understanding of Large Language Models. Ideal readers possess some familiarity with Python, AWS, and general AI concepts. If you aim to apply LLMs to real-world scenarios or enhance your expertise in AI-driven systems, this handbook is designed for you.
This book covers modern data engineering functions and important Python libraries, to help you develop state-of-the-art ML pipelines and integration code. The book begins by explaining data analytics and transformation, delving into the Pandas library, its capabilities, and nuances. It then explores emerging libraries such as Polars and CuDF, providing insights into GPU-based computing and cutting-edge data manipulation techniques. The text discusses the importance of data validation in engineering processes, introducing tools such as Great Expectations and Pandera to ensure data quality and reliability. The book delves into API design and development, with a specific focus on leveraging the power of FastAPI. It covers authentication, authorization, and real-world applications, enabling you to construct efficient and secure APIs using FastAPI. Also explored is concurrency in data engineering, examining Dask's capabilities from basic setup to crafting advanced machine learning pipelines. The book includes development and delivery of data engineering pipelines using leading cloud platforms such as AWS, Google Cloud, and Microsoft Azure. The concluding chapters concentrate on real-time and streaming data engineering pipelines, emphasizing Apache Kafka and workflow orchestration in data engineering. Workflow tools such as Airflow and Prefect are introduced to seamlessly manage and automate complex data workflows. What sets this book apart is its blend of theoretical knowledge and practical application, a structured path from basic to advanced concepts, and insights into using state-of-the-art tools. With this book, you gain access to cutting-edge techniques and insights that are reshaping the industry. This book is not just an educational tool. It is a career catalyst, and an investment in your future as a data engineering expert, poised to meet the challenges of today's data-driven world. 
What You Will Learn
Elevate your data wrangling jobs by utilizing the power of both CPU and GPU computing, and learn to process data using Pandas 2.0, Polars, and CuDF at unprecedented speeds
Design data validation pipelines, construct efficient data service APIs, develop real-time streaming pipelines, and master the art of workflow orchestration to streamline your engineering projects
Leverage concurrent programming to develop machine learning pipelines and get hands-on experience in development and deployment of machine learning pipelines across AWS, GCP, and Azure
Who This Book Is For
Data analysts, data engineers, data scientists, machine learning engineers, and MLOps specialists
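The data-validation idea at the heart of tools like Great Expectations and Pandera, declaring expectations about a dataset and failing fast when they are violated, can be sketched in plain Python. This is a simplified stand-in, not the API of either library; the schema format and function names here are invented for illustration.

```python
def validate_rows(rows, schema):
    """Check each row dict against a schema of {column: (type, predicate)}.

    Returns a list of human-readable error strings; an empty list means the
    batch passed. Real validation tools add typed schemas, coercion, and
    rich reporting on top of this basic pattern.
    """
    errors = []
    for i, row in enumerate(rows):
        for col, (expected_type, check) in schema.items():
            if col not in row:
                errors.append(f"row {i}: missing column '{col}'")
            elif not isinstance(row[col], expected_type):
                errors.append(f"row {i}: '{col}' is not {expected_type.__name__}")
            elif check is not None and not check(row[col]):
                errors.append(f"row {i}: '{col}' failed its value check")
    return errors

# A hypothetical schema: ids must be non-empty strings, prices positive floats.
SCHEMA = {
    "id": (str, lambda v: len(v) > 0),
    "price": (float, lambda v: v > 0),
}
```

In a pipeline, a non-empty error list would typically abort the run before bad data reaches downstream consumers.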
Scaling machine learning at large organizations like Renault Group presents unique challenges in terms of scale, legal requirements, and diversity of use cases. Data scientists require streamlined workflows and automated processes to efficiently deploy models into production. We present an MLOps pipeline based on Python, Kubeflow, and the GCP Vertex AI API designed specifically for this purpose. It enables data scientists to focus on code development for pre-processing, training, evaluation, and prediction. This MLOps pipeline is a cornerstone of the AI@Scale program, which aims to roll out AI across the Group.
We chose a Python-first approach, allowing data scientists to focus purely on writing preprocessing or ML-oriented Python code, while also allowing data retrieval through SQL queries. The pipeline addresses key questions such as prediction type (batch or API), model versioning, resource allocation, drift monitoring, and alert generation. It favors faster time to market with automated deployment and infrastructure management. Although we encountered pitfalls and design difficulties, which we will discuss during the presentation, this pipeline integrates with a CI/CD process, ensuring efficient and automated model deployment and serving.
Finally, this MLOps solution empowers Renault data scientists to seamlessly translate innovative models into production and smooths the development of scalable and impactful AI-driven solutions.
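The Python-first pattern the abstract describes can be sketched minimally: data scientists supply plain functions for each stage, and a runner chains them. This is an illustrative toy, not Renault's pipeline or the Kubeflow SDK; all names here are invented, and real Kubeflow/Vertex AI pipelines add containerized execution, versioning, and scheduling on top of this shape.

```python
class Pipeline:
    """Minimal sketch of a Python-first ML pipeline: each stage is a plain
    function registered in order, and run() threads the output of one stage
    into the next."""

    def __init__(self):
        self.stages = []

    def stage(self, func):
        # Used as a decorator: registers the function as the next stage.
        self.stages.append(func)
        return func

    def run(self, data):
        for func in self.stages:
            data = func(data)
        return data

pipeline = Pipeline()

@pipeline.stage
def preprocess(values):
    # Scale inputs to [0, 1] by the maximum value.
    return [v / max(values) for v in values]

@pipeline.stage
def train(features):
    # Stand-in for model fitting: return the mean as a trivial "model".
    return sum(features) / len(features)
```

The appeal of this shape is that each stage stays a testable Python function, while the orchestration layer decides where and when it runs.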
This talk explores the disconnect between fundamental MLOps principles and their practical application in designing ML pipelines.
In the world of GenAI, advancements are happening at breakneck speed. These advancements concern not only the algorithms but also the operations side of things. In this talk, we will go back to the basics, discuss the main principles of building robust ML systems (traceability, reproducibility, and monitoring), and explain what types of tools are required to support these principles for different types of applications.
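The traceability and reproducibility principles named in this talk can be made concrete with a small sketch: record enough metadata about each training run (code version, data fingerprint, parameters, metrics) to reproduce or audit it later. The record format here is invented for illustration; tools like MLflow provide this as a managed service.

```python
import hashlib

def record_run(params: dict, data: bytes, metrics: dict, code_version: str) -> dict:
    """Build a run record capturing everything needed to trace a model back
    to the exact data, code, and configuration that produced it."""
    return {
        "code_version": code_version,
        "data_sha256": hashlib.sha256(data).hexdigest(),
        "params": params,
        "metrics": metrics,
    }

def same_inputs(run_a: dict, run_b: dict) -> bool:
    # Two runs are reproducible peers if code, data, and params all match;
    # metrics are outputs, so they are deliberately excluded.
    keys = ("code_version", "data_sha256", "params")
    return all(run_a[k] == run_b[k] for k in keys)
```

Hashing the training data rather than storing a path is what makes the record robust: a silently changed file no longer masquerades as the same run.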
Crafting Tech Stacks to Embrace Traditional and Generative AI in Enterprise Environments In this talk, Bas will present a reference architecture for machine learning systems that incorporates MLOps standards and best practices. This blueprint promises scalability and effectiveness for ML platforms, integrating modern technological concepts such as feature stores, vector stores, and model registries seamlessly into the architecture. With a spotlight on emerging generative AI techniques like retrieval-augmented generation, attendees will gain valuable insights into harnessing the power of modern AI practices. Additionally, Bas will delve into the aspects of MLOps, including feedback loops and model monitoring, ensuring a holistic understanding of how to operationalize and optimize ML systems for sustained success.
AI is full of buzzwords, but what do they really mean for your business? In this 30-minute session, we’ll demystify key AI terms such as Artificial Intelligence, Machine Learning, Deep Learning, NLP, and MLOps. More importantly, we’ll demonstrate how these concepts can be applied to deliver tangible business value.
Through practical case studies, you’ll discover how organisations are using AI to optimise processes and achieve measurable outcomes. We’ll also discuss how to align AI initiatives with your business objectives to ensure success.
Join us for an insightful journey that simplifies AI and equips you with actionable strategies. Plus, stay for an interactive Q&A to explore how these ideas can be tailored to your needs.
Note: Visit Billigence at Stand Y239 for further insights.