Redefining AI Infrastructure: Open-Source, Chips, and the Future Beyond Kubernetes – Andrey Cheptsov

2025-01-31 · DataTalks.Club Listen

podcast_episode

by Andrey Cheptsov (dstack)

AI/ML Cloud Computing GitHub HTML LLM

In this podcast episode, we talked with Andrey Cheptsov about The future of AI infrastructure.

About the Speaker: Andrey Cheptsov is the founder and CEO of dstack, an open-source alternative to Kubernetes and Slurm, built to simplify the orchestration of AI infrastructure. Before dstack, Andrey worked at JetBrains for over a decade helping different teams make the best developer tools. During the event, the guest, Andrey Cheptsov, founder and CEO of dstack, discussed the complexities of AI infrastructure. We explore topics like the challenges of using Kubernetes for AI workloads, the need to rethink container orchestration, and the future of hybrid and cloud-only infrastructures. Andrey also shares insights into the role of on-premise and bare-metal solutions, edge computing, and federated learning. 00:00 Andrey's Career Journey: From JetBrains to DStack 5:00 The Motivation Behind DStack 7:00 Challenges in Machine Learning Infrastructure 10:00 Transitioning from Cloud to On-Prem Solutions 14:30 Reflections on OpenAI's Evolution 17:30 Open Source vs Proprietary Models: A Balanced Perspective 21:01 Monolithic vs. Decentralized AI businesses 22:05 The role of privacy and control in AI for industries like banking and healthcare 30:00 Challenges in training large AI models: GPUs and distributed systems 37:03 DeepSpeed's efficient training approach vs. brute force methods 39:00 Challenges for small and medium businesses: hosting and fine-tuning models 47:01 Managing Kubernetes challenges for AI teams 52:00 Hybrid vs. cloud-only infrastructure 56:03 On-premise vs. bare-metal solutions 58:05 Exploring edge computing and its challenges

🔗 CONNECT WITH ANDREY CHEPTSOV Twitter - / andrey_cheptsov Linkedin - / andrey-cheptsov GitHub - https://github.com/dstackai/dstack/ Website - https://dstack.ai/

🔗 CONNECT WITH DataTalksClub Join DataTalks.Club:⁠⁠⁠https://datatalks.club/slack.html⁠⁠⁠ Our events:⁠⁠⁠https://datatalks.club/events.html⁠⁠⁠ Datalike Substack -⁠⁠⁠https://datalike.substack.com/⁠⁠⁠ LinkedIn:⁠⁠⁠ / datatalks-club ⁠

Revolutionizing Hybrid Computing w/ Matthew Shaxted

2024-12-27 · Data Unchained

podcast_episode

by Matthew Shaxted (Parallel Works) , Molly Presley

AI/ML Big Data Cloud Computing

Welcome to Data Unchained, the podcast where we delve into the evolving world of decentralized data and workflows. Hosted by Molly Presley, this episode features a thought-provoking discussion with Matthew Shaxted, Co-Founder and CEO of Parallel Works, about the challenges and opportunities in hybrid and multi-cloud environments. Key Highlights: - The journey of Parallel Works: From HPC simulations to democratizing large-scale computing resources. - The convergence of HPC and AI infrastructure—how organizations are adapting to GPU-heavy workflows. - Overcoming decentralized data challenges: Solutions for application portability and cost-efficient workload management. The evolution of AI-driven task placement for seamless resource optimization. - Real-world insights into managing hybrid and multi-cloud workloads with cost controls and global namespaces. - Matthew also introduces ACTIVATE, Parallel Works' next-gen hybrid multi-cloud platform, and shares exciting announcements for the future, including advancements in Kubernetes integration and benchmarking AI task placement. Learn more about Parallel Works: https://parallel.works @parallel-works

dataunchained #DecentralizedData #HybridCloud #MultiCloud #HPC #AIWorkflows #ParallelWorks #DataManagement #CloudComputing #ArtificialIntelligence #DataInnovation #TechPodcast #BigData #MachineLearning #futureofai

Cyberpunk by jiglr | https://soundcloud.com/jiglrmusic Music promoted by https://www.free-stock-music.com Creative Commons Attribution 3.0 Unported License https://creativecommons.org/licenses/by/3.0/deed.en_US Hosted on Acast. See acast.com/privacy for more information.

Infrastructure migrations at scale

2024-11-27 · How We Improve Developer Experience at Grammarly

talk

by Roman Pogribnyi (Grammarly)

AWS ci/cd

How we tackle infrastructure migrations at scale, with hundreds of services and dozens of dev teams; Changing culture to aid teams in using the new tooling; DevX practices and tooling for team onboarding and self-service; Internally developed tools to keep on-call rotation manageable during and beyond the migration process.

Building Your Own Cloud: From Bare Metal to Modern Platform

2024-11-27 · Platform Engineering, Cloud Infrastructure, and Latest Innovations

talk

by Florian Stadler (Pulumi)

Pulumi hetzner

Are rising cloud costs keeping you up at night? With companies like 37signals making headlines for their cloud exodus, many organizations are reconsidering their infrastructure strategy. But what does it really take to build and run your own cloud platform?In this technical session, we'll explore how to build a modern cloud platform on bare metal infrastructure using Pulumi and Kubernetes. Using Hetzner as our example provider, we'll demonstrate how to create a cost-effective, controllable, and scalable infrastructure.

Bridging Code and UI in Data Orchestration with Kestra

2024-11-26 · Data Engineering Podcast Listen

podcast_episode

by Anna Geller , Tobias Macey

AI/ML Analytics API CI/CD Collibra Data Engineering Data Management Datafold ETL/ELT Kestra

Summary In this episode of the Data Engineering Podcast, Anna Geller talks about the integration of code and UI-driven interfaces for data orchestration. Anna defines data orchestration as automating the coordination of workflow nodes that interact with data across various business functions, discussing how it goes beyond ETL and analytics to enable real-time data processing across different internal systems. She explores the challenges of using existing scheduling tools for data-specific workflows, highlighting limitations and anti-patterns, and discusses Kestra's solution, a low-code orchestration platform that combines code-driven flexibility with UI-driven simplicity. Anna delves into Kestra's architectural design, API-first approach, and pluggable infrastructure, and shares insights on balancing UI and code-driven workflows, the challenges of open-core business models, and innovative user applications of Kestra's platform.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.As a listener of the Data Engineering Podcast you clearly care about data and how it affects your organization and the world. For even more perspective on the ways that data impacts everything around us you should listen to Data Citizens® Dialogues, the forward-thinking podcast from the folks at Collibra. You'll get further insights from industry leaders, innovators, and executives in the world's largest companies on the topics that are top of mind for everyone. They address questions around AI governance, data sharing, and working at global scale. In particular I appreciate the ability to hear about the challenges that enterprise scale businesses are tackling in this fast-moving field. While data is shaping our world, Data Citizens Dialogues is shaping the conversation. Subscribe to Data Citizens Dialogues on Apple, Spotify, Youtube, or wherever you get your podcasts.Your host is Tobias Macey and today I'm interviewing Anna Geller about incorporating both code and UI driven interfaces for data orchestrationInterview IntroductionHow did you get involved in the area of data management?Can you start by sharing a definition of what constitutes "data orchestration"?There are many orchestration and scheduling systems that exist in other contexts (e.g. CI/CD systems, Kubernetes, etc.). Those are often adapted to data workflows because they already exist in the organizational context. What are the anti-patterns and limitations that approach introduces in data workflows?What are the problems that exist in the opposite direction of using data orchestrators for CI/CD, etc.?Data orchestrators have been around for decades, with many different generations and opinions about how and by whom they are used. What do you see as the main motivation for UI vs. code-driven workflows?What are the benefits of combining code-driven and UI-driven capabilities in a single orchestrator?What constraints does it necessitate to allow for interoperability between those modalities?Data Orchestrators need to integrate with many external systems. How does Kestra approach building integrations and ensure governance for all their underlying configurations?Managing workflows at scale across teams can be challenging in terms of providing structure and visibility of dependencies across workflows and teams. What features does Kestra offer so that all pipelines and teams stay organised?What are

Kubernetes + Pulumi: Pulumi Kubernetes provider, Docker provider, and GitOps with Pulumi Operator

2024-11-18 · Workshop: Pulumi and Kubernetes - Better Together

workshop

by Josh Kodroff (Pulumi)

Pulumi YAML gitops helm pulumi docker provider pulumi kubernetes provider

Hands-on workshop on using Pulumi to deploy and manage Kubernetes applications, including the Pulumi Kubernetes provider, Pulumi Docker provider, integration with YAML manifests and Helm charts, and running Pulumi IaC programs in a GitOps fashion.

Application and Kubernetes Security

2024-10-29 · Best Practices for Designing a Secure Google Cloud Network

talk

application security gke

Application and Kubernetes Security

The irony of cloud cost-cutting: when saving money leads to losing money

2024-09-26 · BLN DevOps Sept edition #47

talk

by Timur Bublik (TIER Mobility SE)

AWS Docker Terraform kafka/confluent cloud opentofu spacelift

At TIER Mobility, we successfully reduced our cloud expenses by over 60% in less than two years. While this was a significant achievement, the journey wasn’t without its challenges. In this presentation, I’ll share insights into the potential pitfalls of cost reduction strategies that might end up being more expensive in the long run.

Batteries included: Making [open] Observability easier for users

2024-09-20 · Kubernetes & Cloud Native Berlin Meetup September Edition

talk

by Aditya Konarde (Grafana Labs)

observability open source

With dozens of both open and closed source tools available at hand, setting up observability for your applications may seem like a daunting task. In this talk, Aditya will share his experiences with observability, and show some ways to get you a head-start on your journey. With a collection of open-source tooling, we will take a look at how observability can be made easier for Kubernetes and beyond. This talk will conclude with a demo that shows up some of the latest advancements in open-source observability tools.

Crafting Your Platform Experience

2024-09-20 · Kubernetes & Cloud Native Berlin Meetup September Edition

talk

by René Dudfield (Microsoft)

cncf headlamp

Showing how you can construct a custom platform dashboard. Headlamp is an open-source CNCF sandbox project for making custom Kubernetes platform experiences. Making your own dashboard for your organization's platform has advantages: you can provide a minimal set of features for your users in one place, instead of all the features in a portal you can reduce it down to only the ones they need. I will show: how to extend Headlamp to craft this custom experience for your platform's users; how you can provide UIs for CNCF ecosystem tools inside your platform UI, rather than use separate tools.

One Size Fits All - On AI, Platforms & Cloud Native.

2024-08-21 · NYC Meetup On AI, Platforms & Cloud Native

talk

by Ramiro Berelleza (Okteto)

ai infrastructure genai remote development environments

Your AI team doesn't need a platform, but a paved ramp sure can help! In this session, Ramiro will discuss the risks of premature platformatization, why Kubernetes is the best tool for AI Infrastructure, and how remote development environments are especially useful when it comes to building paved roads for AI development.

Big Data on Kubernetes

2024-07-19 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Neylson Crepalde

Airflow BI Big Data Docker Kafka Python Spark SQL YAML data data-engineering streaming-messaging

Big Data on Kubernetes is your comprehensive guide to leveraging Kubernetes for scalable and efficient big data solutions. You will learn key concepts of Kubernetes architecture and explore tools like Apache Spark, Airflow, and Kafka. Gain hands-on experience building complete data pipelines to tackle real-world data challenges. What this Book will help me do Understand Kubernetes architecture and learn to deploy and manage clusters. Build and orchestrate big data pipelines using Spark, Airflow, and Kafka. Develop scalable and resilient data solutions with Docker and Kubernetes. Integrate and optimize data tools for real-time ingestion and processing. Apply concepts to hands-on projects addressing actual big data scenarios. Author(s) Neylson Crepalde is an experienced data specialist with extensive knowledge of Kubernetes and big data solutions. With deep practical experience, Neylson brings real-world insights to his writing. His approach emphasizes actionable guidance and relatable problem-solving with a strong foundation in scalable architecture. Who is it for? This book is ideal for data engineers, BI analysts, data team leaders, and tech managers familiar with Python, SQL, and YAML. Targeted at professionals seeking to develop or expand their expertise in scalable big data solutions, it provides practical insights into Docker, Kubernetes, and prominent big data tools.