API

#327 Building a Sales and Marketing Capability for Data Applications with Denise Persson, CMO at Snowflake, and Chris Degnan, former CRO at Snowflake

2025-10-20 · DataFramed Listen

podcast_episode

by Chris Degnan (Snowflake) , Richie (DataCamp) , Denise Persson (Snowflake)

AI/ML DWH Informatica Marketing Snowflake VMware

The journey from startup to billion-dollar enterprise requires more than just a great product—it demands strategic alignment between sales and marketing. How do you identify your ideal customer profile when you're just starting out? What data signals help you find the twins of your successful early adopters? With AI now automating everything from competitive analysis to content creation, the traditional boundaries between departments are blurring. But what personality traits should you look for when building teams that can scale with your growth? And how do you ensure your data strategy supports rather than hinders your AI ambitions in this rapidly evolving landscape? Denise Persson is CMO at Snowflake and has 20 years of technology marketing experience at high-growth companies. Prior to joining Snowflake, she served as CMO for Apigee, an API platform company that went public in 2015 and Google acquired in 2016. She began her career at collaboration software company Genesys, where she built and led a global marketing organization. Denise also helped lead Genesys through its expansion to become a successful IPO and acquired company. Denise holds a BA in Business Administration and Economics from Stockholm University, and holds an MBA from Georgetown University. Chris Degnan is the former CRO at Snowflake and has over 15 years of enterprise technology sales experience. Before working at Snowflake, Chris served as the AVP of the West at EMC, and prior to that as VP Western Region at Aveksa, where he helped grow the business 250% year-over-year. Before Aveksa, Chris spent eight years at EMC and managed a team responsible for 175 select accounts. Prior to EMC, Chris worked in enterprise sales at Informatica and Covalent Technologies (acquired by VMware). He holds a BA from the University of Delaware. In the episode, Richie, Denise, and Chris explore the journey to a billion-dollar ARR, the importance of customer obsession, aligning sales and marketing, leveraging data for decision-making, and the role of AI in scaling operations, and much more. Links Mentioned in the Show: SnowflakeSnowflake BUILDConnect with Denise and ChrisSnowflake is FREE on DataCamp this weekRelated Episode: Adding AI to the Data Warehouse with Sridhar Ramaswamy, CEO at SnowflakeRewatch RADAR AI New to DataCamp? Learn on the go using the DataCamp mobile appEmpower your business with world-class data and AI skills with DataCamp for business

Foundational Data Engineering At Two Sigma

2025-07-06 · Data Engineering Podcast Listen

podcast_episode

by Effie Baram (Two Sigma) , Tobias Macey

AI/ML Data Collection Data Engineering Data Management Data Quality Datafold Python

Summary In this episode of the Data Engineering Podcast Effie Baram, a leader in foundational data engineering at Two Sigma, talks about the complexities and innovations in data engineering within the finance sector. She discusses the critical role of data at Two Sigma, balancing data quality with delivery speed, and the socio-technical challenges of building a foundational data platform that supports research and operational needs while maintaining regulatory compliance and data quality. Effie also shares insights into treating data as code, leveraging modern data warehouses, and the evolving role of data engineers in a rapidly changing technological landscape.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. This episode is brought to you by Coresignal, your go-to source for high-quality public web data to power best-in-class AI products. Instead of spending time collecting, cleaning, and enriching data in-house, use ready-made multi-source B2B data that can be smoothly integrated into your systems via APIs or as datasets. With over 3 billion data records from 15+ online sources, Coresignal delivers high-quality data on companies, employees, and jobs. It is powering decision-making for more than 700 companies across AI, investment, HR tech, sales tech, and market intelligence industries. A founding member of the Ethical Web Data Collection Initiative, Coresignal stands out not only for its data quality but also for its commitment to responsible data collection practices. Recognized as the top data provider by Datarade for two consecutive years, Coresignal is the go-to partner for those who need fresh, accurate, and ethically sourced B2B data at scale. Discover how Coresignal's data can enhance your AI platforms. Visit dataengineeringpodcast.com/coresignal to start your free 14-day trial. Your host is Tobias Macey and today I'm interviewing Effie Baram about data engineering in the finance sectorInterview IntroductionHow did you get involved in the area of data management?Can you start by outlining the role of data in the context of Two Sigma?What are some of the key characteristics of the types of data sources that you work with?Your role is leading "foundational data engineering" at Two Sigma. Can you unpack that title and how it shapes the ways that you think about what you build?How does the concept of "foundational data" influence the ways that the business thinks about the organizational patterns around data?Given the regulatory environment around finance, how does that impact the ways that you think about the "what" and "how" of the data that you deliver to data consumers?Being the foundational team for data use at Two Sigma, how have you approached the design and architecture of your technical systems?How do you think about the boundaries between your responsibilities and the rest of the organization?What are the design patterns that you have found most helpful in empowering data consumers to build on top of your work?What are some of the elements of sociotechnical friction that have been most challenging to address?What are the most interesting, innovative, or unexpected ways that you have seen the ideas around "foundational data" applied in your organization?What are the most interesting, unexpected, or challenging lessons that you have learned while working with financial data?When is a foundational data team the wrong approach?What do you have planned for the future of your platform design?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links 2SigmaReliability EngineeringSLA == Service-Level AgreementAirflowParquet File FormatBigQuerySnowflakedbtGemini AssistMCP == Model Context ProtocoldtraceThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Enabling Agents In The Enterprise With A Platform Approach

2025-06-29 · Data Engineering Podcast Listen

podcast_episode

by Arun Joseph (Deutsche Telekom) , Tobias Macey

AI/ML Data Collection Data Engineering Data Management Data Quality Datafold GenAI Python

Summary In this episode of the Data Engineering Podcast Arun Joseph talks about developing and implementing agent platforms to empower businesses with agentic capabilities. From leading AI engineering at Deutsche Telekom to his current entrepreneurial venture focused on multi-agent systems, Arun shares insights on building agentic systems at an organizational scale, highlighting the importance of robust models, data connectivity, and orchestration loops. Listen in as he discusses the challenges of managing data context and cost in large-scale agent systems, the need for a unified context management platform to prevent data silos, and the potential for open-source projects like LMOS to provide a foundational substrate for agentic use cases that can transform enterprise architectures by enabling more efficient data management and decision-making processes.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. This episode is brought to you by Coresignal, your go-to source for high-quality public web data to power best-in-class AI products. Instead of spending time collecting, cleaning, and enriching data in-house, use ready-made multi-source B2B data that can be smoothly integrated into your systems via APIs or as datasets. With over 3 billion data records from 15+ online sources, Coresignal delivers high-quality data on companies, employees, and jobs. It is powering decision-making for more than 700 companies across AI, investment, HR tech, sales tech, and market intelligence industries. A founding member of the Ethical Web Data Collection Initiative, Coresignal stands out not only for its data quality but also for its commitment to responsible data collection practices. Recognized as the top data provider by Datarade for two consecutive years, Coresignal is the go-to partner for those who need fresh, accurate, and ethically sourced B2B data at scale. Discover how Coresignal's data can enhance your AI platforms. Visit dataengineeringpodcast.com/coresignal to start your free 14-day trial. Your host is Tobias Macey and today I'm interviewing Arun Joseph about building an agent platform to empower the business to adopt agentic capabilitiesInterview IntroductionHow did you get involved in the area of data management?Can you start by giving an overview of how Deutsche Telekom has been approaching applications of generative AI?What are the key challenges that have slowed adoption/implementation?Enabling non-engineering teams to define and manage AI agents in production is a challenging goal. From a data engineering perspective, what does the abstraction layer for these teams look like? How do you manage the underlying data pipelines, versioning of agents, and monitoring of these user-defined agents?What was your process for developing the architecture and interfaces for what ultimately became the LMOS?How do the principles of operatings systems help with managing the abstractions and composability of the framework?Can you describe the overall architecture of the LMOS?What does a typical workflow look like for someone who wants to build a new agent use case?How do you handle data discovery and embedding generation to avoid unnecessary duplication of processing?With your focus on openness and local control, how do you see your work complementing projects like OumiWhat are the most interesting, innovative, or unexpected ways that you have seen LMOS used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on LMOS?When is LMOS the wrong choice?What do you have planned for the future of LMOS and MASAIC?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links LMOSDeutsche TelekomMASAICOpenAI Agents SDKRAG == Retrieval Augmented GenerationLangChainMarvin MinskyVector DatabaseMCP == Model Context ProtocolA2A (Agent to Agent) ProtocolQdrantLlamaIndexDVC == Data Version ControlKubernetesKotlinIstioXerox PARC)OODA (Observe, Orient, Decide, Act) LoopThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Dagster's New Era: Modularizing Data Transformation in the Age of AI

2025-06-18 · Data Engineering Podcast Listen

podcast_episode

by Nick Schrock (Elementl) , Tobias Macey

AI/ML Analytics Dagster Dashboard Data Collection Data Contracts Data Engineering Data Management Data Quality Datafold KPI LLM

Summary In this episode of the Data Engineering Podcast we welcome back Nick Schrock, CTO and founder of Dagster Labs, to discuss the evolving landscape of data engineering in the age of AI. As AI begins to impact data platforms and the role of data engineers, Nick shares his insights on how it will ultimately enhance productivity and expand software engineering's scope. He delves into the current state of AI adoption, the importance of maintaining core data engineering principles, and the need for human oversight when leveraging AI tools effectively. Nick also introduces Dagster's new components feature, designed to modularize and standardize data transformation processes, making it easier for teams to collaborate and integrate AI into their workflows. Join in to explore the future of data engineering, the potential for AI to abstract away complexity, and the importance of open standards in preventing walled gardens in the tech industry.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementThis episode is brought to you by Coresignal, your go-to source for high-quality public web data to power best-in-class AI products. Instead of spending time collecting, cleaning, and enriching data in-house, use ready-made multi-source B2B data that can be smoothly integrated into your systems via APIs or as datasets. With over 3 billion data records from 15+ online sources, Coresignal delivers high-quality data on companies, employees, and jobs. It is powering decision-making for more than 700 companies across AI, investment, HR tech, sales tech, and market intelligence industries. A founding member of the Ethical Web Data Collection Initiative, Coresignal stands out not only for its data quality but also for its commitment to responsible data collection practices. Recognized as the top data provider by Datarade for two consecutive years, Coresignal is the go-to partner for those who need fresh, accurate, and ethically sourced B2B data at scale. Discover how Coresignal's data can enhance your AI platforms. Visit dataengineeringpodcast.com/coresignal to start your free 14-day trial. Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. This is a pharmaceutical Ad for Soda Data Quality. Do you suffer from chronic dashboard distrust? Are broken pipelines and silent schema changes wreaking havoc on your analytics? You may be experiencing symptoms of Undiagnosed Data Quality Syndrome — also known as UDQS. Ask your data team about Soda. With Soda Metrics Observability, you can track the health of your KPIs and metrics across the business — automatically detecting anomalies before your CEO does. It’s 70% more accurate than industry benchmarks, and the fastest in the category, analyzing 1.1 billion rows in just 64 seconds. And with Collaborative Data Contracts, engineers and business can finally agree on what “done” looks like — so you can stop fighting over column names, and start trusting your data again.Whether you’re a data engineer, analytics lead, or just someone who cries when a dashboard flatlines, Soda may be right for you. Side effects of implementing Soda may include: Increased trust in your metrics, reduced late-night Slack emergencies, spontaneous high-fives across departments, fewer meetings and less back-and-forth with business stakeholders, and in rare cases, a newfound love of data. Sign up today to get a chance to win a $1000+ custom mechanical keyboard. Visit dataengineeringpodcast.com/soda to sign up and follow Soda’s launch week. It starts June 9th.Your host is Tobias Macey and today I'm interviewing Nick Schrock about lowering the barrier to entry for data platform consumersInterview IntroductionHow did you get involved in the area of data management?Can you start by giving your summary of the impact that the tidal wave of AI has had on data platforms and data teams?For anyone who hasn't heard of Dagster, can you give a quick summary of the project?What are the notable changes in the Dagster project in the past year?What are the ecosystem pressures that have shaped the ways that you think about the features and trajectory of Dagster as a project/product/community?In your recent release you introduced "components", which is a substantial change in how you enable teams to collaborate on data problems. What was the motivating factor in that work and how does it change the ways that organizations engage with their data?tension between being flexible and extensible vs. opinionated and constrainedincreased dependency on orchestration with LLM use casesreducing the barrier to contribution for data platform/pipelinesbringing application engineers into the mixchallenges of meeting users/teams where they are (languages, platform investments, etc.)What are the most interesting, innovative, or unexpected ways that you have seen teams applying the Components pattern?What are the most interesting, unexpected, or challenging lessons that you have learned while working on the latest iterations of Dagster?When is Dagster the wrong choice?What do you have planned for the future of Dagster?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Links Dagster+ EpisodeDagster Components Slide DeckThe Rise Of Medium CodeLakehouse ArchitectureIcebergDagster ComponentsPydantic ModelsKubernetesDagster PipesRuby on RailsdbtSlingFivetranTemporalMCP == Model Context ProtocolThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

AI and the Lakehouse: How Starburst is Pioneering New Workflows

2025-06-11 · Data Engineering Podcast Listen

podcast_episode

by Tobias Macey , Alex Albu (Starburst)

AI/ML Analytics Dashboard Data Collection Data Contracts Data Engineering Data Lakehouse Data Management Data Quality Datafold Iceberg KPI +6 more

Summary In this episode of the Data Engineering Podcast Alex Albu, tech lead for AI initiatives at Starburst, talks about integrating AI workloads with the lakehouse architecture. From his software engineering roots to leading data engineering efforts, Alex shares insights on enhancing Starburst's platform to support AI applications, including an AI agent for data exploration and using AI for metadata enrichment and workload optimization. He discusses the challenges of integrating AI with data systems, innovations like SQL functions for AI tasks and vector databases, and the limitations of traditional architectures in handling AI workloads. Alex also shares his vision for the future of Starburst, including support for new data formats and AI-driven data exploration tools.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.This is a pharmaceutical Ad for Soda Data Quality. Do you suffer from chronic dashboard distrust? Are broken pipelines and silent schema changes wreaking havoc on your analytics? You may be experiencing symptoms of Undiagnosed Data Quality Syndrome — also known as UDQS. Ask your data team about Soda. With Soda Metrics Observability, you can track the health of your KPIs and metrics across the business — automatically detecting anomalies before your CEO does. It’s 70% more accurate than industry benchmarks, and the fastest in the category, analyzing 1.1 billion rows in just 64 seconds. And with Collaborative Data Contracts, engineers and business can finally agree on what “done” looks like — so you can stop fighting over column names, and start trusting your data again.Whether you’re a data engineer, analytics lead, or just someone who cries when a dashboard flatlines, Soda may be right for you. Side effects of implementing Soda may include: Increased trust in your metrics, reduced late-night Slack emergencies, spontaneous high-fives across departments, fewer meetings and less back-and-forth with business stakeholders, and in rare cases, a newfound love of data. Sign up today to get a chance to win a $1000+ custom mechanical keyboard. Visit dataengineeringpodcast.com/soda to sign up and follow Soda’s launch week. It starts June 9th. This episode is brought to you by Coresignal, your go-to source for high-quality public web data to power best-in-class AI products. Instead of spending time collecting, cleaning, and enriching data in-house, use ready-made multi-source B2B data that can be smoothly integrated into your systems via APIs or as datasets. With over 3 billion data records from 15+ online sources, Coresignal delivers high-quality data on companies, employees, and jobs. It is powering decision-making for more than 700 companies across AI, investment, HR tech, sales tech, and market intelligence industries. A founding member of the Ethical Web Data Collection Initiative, Coresignal stands out not only for its data quality but also for its commitment to responsible data collection practices. Recognized as the top data provider by Datarade for two consecutive years, Coresignal is the go-to partner for those who need fresh, accurate, and ethically sourced B2B data at scale. Discover how Coresignal's data can enhance your AI platforms. Visit dataengineeringpodcast.com/coresignal to start your free 14-day trial.Your host is Tobias Macey and today I'm interviewing Alex Albu about how Starburst is extending the lakehouse to support AI workloadsInterview IntroductionHow did you get involved in the area of data management?Can you start by outlining the interaction points of AI with the types of data workflows that you are supporting with Starburst?What are some of the limitations of warehouse and lakehouse systems when it comes to supporting AI systems?What are the points of friction for engineers who are trying to employ LLMs in the work of maintaining a lakehouse environment?Methods such as tool use (exemplified by MCP) are a means of bolting on AI models to systems like Trino. What are some of the ways that is insufficient or cumbersome?Can you describe the technical implementation of the AI-oriented features that you have incorporated into the Starburst platform?What are the foundational architectural modifications that you had to make to enable those capabilities?For the vector storage and indexing, what modifications did you have to make to iceberg?What was your reasoning for not using a format like Lance?For teams who are using Starburst and your new AI features, what are some examples of the workflows that they can expect?What new capabilities are enabled by virtue of embedding AI features into the interface to the lakehouse?What are the most interesting, innovative, or unexpected ways that you have seen Starburst AI features used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on AI features for Starburst?When is Starburst/lakehouse the wrong choice for a given AI use case?What do you have planned for the future of AI on Starburst?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links StarburstPodcast EpisodeAWS AthenaMCP == Model Context ProtocolLLM Tool UseVector EmbeddingsRAG == Retrieval Augmented GenerationAI Engineering Podcast EpisodeStarburst Data ProductsLanceLanceDBParquetORCpgvectorStarburst IcehouseThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Bringing AI Into The Inner Loop of Data Engineering With Ascend

2025-03-24 · Data Engineering Podcast Listen

podcast_episode

by Sean Knapp (Ascend) , Tobias Macey

AI/ML Data Engineering Data Management Datafold GenAI LLM Python

Summary In this episode of the Data Engineering Podcast Sean Knapp, CEO of Ascend.io, explores the intersection of AI and data engineering. He discusses the evolution of data engineering and the role of AI in automating processes, alleviating burdens on data engineers, and enabling them to focus on complex tasks and innovation. The conversation covers the challenges and opportunities presented by AI, including the need for intelligent tooling and its potential to streamline data engineering processes. Sean and Tobias also delve into the impact of generative AI on data engineering, highlighting its ability to accelerate development, improve governance, and enhance productivity, while also noting the current limitations and future potential of AI in the field.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. Your host is Tobias Macey and today I'm interviewing Sean Knapp about how Ascend is incorporating AI into their platform to help you keep up with the rapid rate of changeInterview IntroductionHow did you get involved in the area of data management?Can you describe what Ascend is and the story behind it?The last time we spoke was August of 2022. What are the most notable or interesting evolutions in your platform since then?In that same time "AI" has taken up all of the oxygen in the data ecosystem. How has that impacted the ways that you and your customers think about their priorities?The introduction of AI as an API has caused many organizations to try and leap-frog their data maturity journey and jump straight to building with advanced capabilities. How is that impacting the pressures and priorities felt by data teams?At the same time that AI-focused product goals are straining data teams capacities, AI also has the potential to act as an accelerator to their work. What are the roadblocks/speedbumps that are in the way of that capability?Many data teams are incorporating AI tools into parts of their workflow, but it can be clunky and cumbersome. How are you thinking about the fundamental changes in how your platform works with AI at its center?Can you describe the technical architecture that you have evolved toward that allows for AI to drive the experience rather than being a bolt-on?What are the concrete impacts that these new capabilities have on teams who are using Ascend?What are the most interesting, innovative, or unexpected ways that you have seen Ascend + AI used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on incorporating AI into the core of Ascend?When is Ascend the wrong choice?What do you have planned for the future of AI in Ascend?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links AscendCursor AI Code EditorDevinGitHub CopilotOpenAI DeepResearchS3 TablesAWS GlueAWS BedrockSnowparkCo-Intelligence: Living and Working with AI by Ethan Mollick (affiliate link)OpenAI o3The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

#82 AI Cracked a Decades-Old Science Problem & Europe’s Push for Digital Sovereignty, The Secret Behind MCP, and The Cloud Hack That Slashes Costs By 62%.

2025-03-20 · DataTopics: All Things Data, AI & Tech Listen

podcast_episode

AI/ML AWS Cloud Computing LLM

Send us a text Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. DataTopics Unpluggedis your go-to spot for relaxed discussions around tech, news, data, and society. Dive into conversations that should flow as smoothly as your morning coffee (but don’t), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data—unplugged style! In this episode: OpenAI asks White House for AI regulation relief: OpenAI seeks federal-level AI policy exceptions in exchange for transparency. But is this a sign they’re losing momentum?Hot take: GPT-4.5 is a ‘nothing burger’: Is GPT-4.5 actually an upgrade, or just a well-marketed rerun?Claude 3.7 & Blowing €100 in Two Days: One of the hosts tests Claude extensively—and racks up a pricey bill. Was it worth it?OpenAI’s Deep Research: How does OpenAI’s new research tool compare to Perplexity?AI cracks superbug problem in two days: AI speeds up decades of scientific research—should we be impressed or concerned?European tech coalition demands ‘radical action’ on digital sovereignty: Big names like Airbus and Proton push for homegrown European tech.Migrating from AWS to a European cloud: A real-world case study on cutting costs by 62%—is it worth the trade-offs?Docs by the French government: A Notion alternative for open-source government collaboration.Why people hate note-taking apps: A deep dive into the frustrations with Notion, Obsidian, and alternatives.Model Context Protocol (MCP): How MCP is changing AI tool integrations—and why OpenAI isn’t on board (yet).OpenRouter.ai: The one-stop API for switching between AI models. Does it live up to the hype?OTDiamond.ai: A multi-LLM approach that picks the best model for your queries to balance cost and performance.Are you polite to AI?: Study finds most people say "please" to ChatGPT—good manners or fear of the AI uprising?AI refusing to do your work?: A hilarious case of an AI refusing to generate code because it "wants you to learn."And finally, a big announcement—DataTopics Unplugged is evolving! Stay tuned for an updated format and a fresh take on tech discussions.

The Future of Time Management

2025-03-17 · Data & AI with Mukundan | Learn AI by Building Listen

podcast_episode

by Mukundan Sankar

AI/ML LLM

Summary In this episode of Data and AI with Mukundan, the host discusses the creation and impact of an AI life planner designed to enhance productivity and time management. The conversation covers the technology behind the planner, including the use of GPT-4, Google Calendar API, and the Pomodoro technique, as well as the personal transformation experienced by the host as a result of implementing this tool. Takeaways Most of us struggle with time management.AI can help optimize our schedules.The AI life planner analyzes daily habits.It syncs with Google Calendar for seamless planning.Reminders are sent via Slack API integration.A Pomodoro timer helps maintain focus.The planner allows for real-time adjustments.Productivity can skyrocket with the right tools.You can build your own AI life planner.Engaging with the audience for feedback is important.If you want to see exactly how I built this AI Life Planner, check out my full guide here: https://mukundansankar.substack.com/p/i-never-thought-i-had-my-life-together

How the App looks like: https://youtu.be/pyyWV7-Ty5w?feature=shared

Episode 225: CppNorth & Flux Plans, The Slow Death of Twitter and More!

2025-03-14 · ADSP: Algorithms + Data Structures = Programs Listen

podcast_episode

by Conor Hoekstra , Tristan Brindle (C++ London Uni) , Bryce Adelstein Lelbach (NVIDIA)

GitHub Rust

In this episode, Conor and Ben chat with Tristan Brindle about plans for CppNorth 2025, plans for Flux, the slow death of Twitter and more! Link to Episode 225 on WebsiteDiscuss this episode, leave a comment, or ask a question (on GitHub)Socials ADSP: The Podcast: TwitterConor Hoekstra: Twitter | BlueSky | MastodonBen Deane: Twitter | BlueSkyAbout the Guest Tristan Brindle a freelance programmer and trainer based in London, mostly focussing on C++. He is a member of the UK national body (BSI) and ISO WG21. Occasionally I can be found at C++ conferences. He is also a director of C++ London Uni, a not-for-profit organisation offering free beginner programming classes in London and online. He has a few fun projects on GitHub that you can find out about here. Show Notes Date Generated: 2025-02-17 Date Released: 2025-03-14 CppNorth 2025FluxIteration Revisited: A Safer Iteration Model for C++ - Tristan Brindle - CppNorth 2023ADSP Episode 126: Flux (and Flow) with Tristan BrindleIterators and Ranges: Comparing C++ to D to Rust - Barry Revzin - [CppNow 2021]Keynote: Iterators and Ranges: Comparing C++ to D, Rust, and Others - Barry Revzin - CPPP 2021Iteration Inside and Out - Bob Nystrom BlogExpanding the internal iteration API #99std::distancestd::ranges::distanceC++ London MeetupDenver C++ MeetupIntro Song Info Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic Creative Commons — Attribution 3.0 Unported — CC BY 3.0 Free Download / Stream: http://bit.ly/l-miss-you Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

Episode 224: Flux Updates & Internal Iteration

2025-03-07 · ADSP: Algorithms + Data Structures = Programs Listen

podcast_episode

by Conor Hoekstra , Tristan Brindle (C++ London Uni) , Bryce Adelstein Lelbach (NVIDIA)

GitHub Rust

In this episode, Conor and Ben chat with Tristan Brindle about recent updates to Flux, internal iteration vs external iteration and more. Link to Episode 224 on WebsiteDiscuss this episode, leave a comment, or ask a question (on GitHub)Socials ADSP: The Podcast: TwitterConor Hoekstra: Twitter | BlueSky | MastodonBen Deane: Twitter | BlueSkyAbout the Guest Tristan Brindle a freelance programmer and trainer based in London, mostly focussing on C++. He is a member of the UK national body (BSI) and ISO WG21. Occasionally I can be found at C++ conferences. He is also a director of C++ London Uni, a not-for-profit organisation offering free beginner programming classes in London and online. He has a few fun projects on GitHub that you can find out about here. Show Notes Date Generated: 2025-02-17 Date Released: 2025-03-07 FluxLightning Talk: Faster Filtering with Flux - Tristan Brindle - CppNorth 2023Arrays, Fusion & CPUs vs GPUs.pdfIteration Revisited: A Safer Iteration Model for C++ - Tristan Brindle - CppNorth 2023ADSP Episode 126: Flux (and Flow) with Tristan BrindleIterators and Ranges: Comparing C++ to D to Rust - Barry Revzin - [CppNow 2021]Keynote: Iterators and Ranges: Comparing C++ to D, Rust, and Others - Barry Revzin - CPPP 2021Iteration Inside and Out - Bob Nystrom BlogExpanding the internal iteration API #99Intro Song Info Miss You by Sarah Jansen https://soundcloud.com/sarahjansenmusic Creative Commons — Attribution 3.0 Unported — CC BY 3.0 Free Download / Stream: http://bit.ly/l-miss-you Music promoted by Audio Library https://youtu.be/iYYxnasvfx8

#282 Navigating the Challenges of Product Integrations with Gil Feig, Co-Founder and CTO of Merge

2025-02-10 · DataFramed Listen

podcast_episode

by Gil Feig (Merge) , Richie (DataCamp)

AI/ML

As the software landscape becomes more fragmented, the importance of product integrations continues to rise. For those working in data and engineering roles, this presents both challenges and opportunities. How do you efficiently manage and scale integrations across diverse systems? What tools and strategies can help you maintain data integrity and streamline workflows? And how can you ensure that your integration strategy aligns with broader business goals and customer expectations? Gil Feig is the Co-Founder and CTO of Merge, the leading unified API platform. Previously, Gil was the Head of Engineering at Untapped and worked as a software engineer at Wealthfront and LinkedIn. A graduate of Columbia University, he lives and works in New York City. In the episode, Richie and Gil explore the complexities of product integrations, the evolution of software ecosystems, the challenges of scaling integrations, the role of data engineers, the impact on business operations, and the future of integration technology, and much more. Links Mentioned in the Show: MergeConnect with GilCourse: Implementing AI Solutions in BusinessRelated Episode: Building Multi-Modal AI Applications with Russ d'Sa, CEO & Co-founder of LiveKitSign up to RADAR: Skills Edition New to DataCamp? Learn on the go using the DataCamp mobile appEmpower your business with world-class data and AI skills with DataCamp for business

#278 Building Multi-Modal AI Applications with Russ d'Sa, CEO & Co-founder of LiveKit

2025-01-27 · DataFramed Listen

podcast_episode

by Richie (DataCamp) , Russ D'Sa (LiveKit)

AI/ML LLM

As multimodal AI continues to grow, professionals are exploring new skills to harness its potential. From understanding real-time APIs to navigating new application architectures, the landscape is shifting. How can developers stay ahead in this evolving field? What opportunities do AI agents present for automating tasks and enhancing productivity? And how can businesses ensure they're ready for the future of AI-driven interactions? Russ D'Sa is the CEO & Co-founder at Livekit. Russ is building the transport layer for AI computing. He founded Livekit, the company that powers voice chat for OpenAI and Character.ai. Previously, he was a Product Manager at Medium and an engineer at Twitter. He's also a serial entrepreneur, having previously founded mobile search platform Evie Labs. In the episode, Richie and Russ explore the evolution of voice AI, the challenges of building voice applications, the rise of video AI, the implications of deep fakes, the potential of AI-generated worlds, the future of AI in customer service and education, and much more. Links Mentioned in the Show: LiveKitChatGPT VoiceCourse: Developing LLM Applications with LangChainRelated Episode: Creating High Quality AI Applications with Theresa Parker & Sudhi Balan, Rocket SoftwareRewatch sessions from RADAR: Forward Edition New to DataCamp? Learn on the go using the DataCamp mobile appEmpower your business with world-class data and AI skills with DataCamp for business

Revolutionizing Social Media Management

2024-12-11 · Data Product Management in Action: The Practitioner's Podcast Listen

podcast_episode

by Michael Toland (Pathfinder Product) , Max Fritzhand (Bolta AI)

AI/ML Analytics

The Data Product Management In Action podcast, brought to you by Soda and executive producer Scott Hirleman, is a platform for data product management practitioners to share insights and experiences. This episode of Data Product Management in Action, host Michael Toland sits down with Max Fritzhand, the creator of Bolta AI, to explore how he turned a simple frustration into a powerful platform for managing social media on Threads. Max discusses leveraging machine learning, navigating API challenges, and engaging users to create a feature-rich tool that offers analytics, content suggestions, and optimal posting times. A must-listen for innovators and product managers alike! About our host Michael Toland: Michael is a Product Management Coach and Consultant with Pathfinder Product, a Test Double Operation. Since 2016, Michael has worked on large-scale system modernizations and migration initiatives at Verizon. Outside his professional career, Michael serves as the Treasurer for the New Leaders Council, mentors with Venture for America, sings with the Columbus Symphony, and writes satire for his blog Dignified Product. He is excited to discuss data product management with the podcast audience. Connect with Michael on LinkedIn. About our guest Max Fritzhand: Max is a software engineer and entrepreneur based in Cincinnati, Ohio, with a passion for innovation and automation. With a background in developing user-centric tools, he’s built projects like Bolta to simplify social media management and empower creators. Known for his drive to achieve financial independence, he explores investment opportunities and strives to build impactful, scalable solutions. A creative thinker with ADHD, Max channels his energy into solving complex problems and building a fulfilling work-life balance. Connect with Max on LinkedIn. All views and opinions expressed are those of the individuals and do not necessarily reflect their employers or anyone else. Join the conversation on LinkedIn. Apply to be a guest or nominate someone that you know. Do you love what you're listening to? Please rate and review the podcast, and share it with fellow practitioners you know. Your support helps us reach more listeners and continue providing valuable insights!

Bridging Code and UI in Data Orchestration with Kestra

2024-11-26 · Data Engineering Podcast Listen

podcast_episode

by Anna Geller , Tobias Macey

AI/ML Analytics CI/CD Collibra Data Engineering Data Management Datafold ETL/ELT Kestra Kubernetes

Summary In this episode of the Data Engineering Podcast, Anna Geller talks about the integration of code and UI-driven interfaces for data orchestration. Anna defines data orchestration as automating the coordination of workflow nodes that interact with data across various business functions, discussing how it goes beyond ETL and analytics to enable real-time data processing across different internal systems. She explores the challenges of using existing scheduling tools for data-specific workflows, highlighting limitations and anti-patterns, and discusses Kestra's solution, a low-code orchestration platform that combines code-driven flexibility with UI-driven simplicity. Anna delves into Kestra's architectural design, API-first approach, and pluggable infrastructure, and shares insights on balancing UI and code-driven workflows, the challenges of open-core business models, and innovative user applications of Kestra's platform.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.As a listener of the Data Engineering Podcast you clearly care about data and how it affects your organization and the world. For even more perspective on the ways that data impacts everything around us you should listen to Data Citizens® Dialogues, the forward-thinking podcast from the folks at Collibra. You'll get further insights from industry leaders, innovators, and executives in the world's largest companies on the topics that are top of mind for everyone. They address questions around AI governance, data sharing, and working at global scale. In particular I appreciate the ability to hear about the challenges that enterprise scale businesses are tackling in this fast-moving field. While data is shaping our world, Data Citizens Dialogues is shaping the conversation. Subscribe to Data Citizens Dialogues on Apple, Spotify, Youtube, or wherever you get your podcasts.Your host is Tobias Macey and today I'm interviewing Anna Geller about incorporating both code and UI driven interfaces for data orchestrationInterview IntroductionHow did you get involved in the area of data management?Can you start by sharing a definition of what constitutes "data orchestration"?There are many orchestration and scheduling systems that exist in other contexts (e.g. CI/CD systems, Kubernetes, etc.). Those are often adapted to data workflows because they already exist in the organizational context. What are the anti-patterns and limitations that approach introduces in data workflows?What are the problems that exist in the opposite direction of using data orchestrators for CI/CD, etc.?Data orchestrators have been around for decades, with many different generations and opinions about how and by whom they are used. What do you see as the main motivation for UI vs. code-driven workflows?What are the benefits of combining code-driven and UI-driven capabilities in a single orchestrator?What constraints does it necessitate to allow for interoperability between those modalities?Data Orchestrators need to integrate with many external systems. How does Kestra approach building integrations and ensure governance for all their underlying configurations?Managing workflows at scale across teams can be challenging in terms of providing structure and visibility of dependencies across workflows and teams. What features does Kestra offer so that all pipelines and teams stay organised?What are

155 - Understanding Human Engagement Risk When Designing AI and GenAI User Experiences

2024-10-29 · Experiencing Data w/ Brian T. O’Neill (AI & data product management leadership—powered by UX design) Listen

podcast_episode

by Brian O’Neill (Designing for Analytics) , Ovetta Sampson (Google)

AI/ML GenAI LLM

The relationship between AI and ethics is both developing and delicate. On one hand, the GenAI advancements to date are impressive. On the other, extreme care needs to be taken as this tech continues to quickly become more commonplace in our lives. In today’s episode, Ovetta Sampson and I examine the crossroads ahead for designing AI and GenAI user experiences.

While professionals and the general public are eager to embrace new products, recent breakthroughs, etc.; we still need to have some guard rails in place. If we don’t, data can easily get mishandled, and people could get hurt. Ovetta possesses firsthand experience working on these issues as they sprout up. We look at who should be on a team designing an AI UX, exploring the risks associated with GenAI, ethics, and need to be thinking about going forward.

Highlights/ Skip to: (1:48) Ovetta's background and what she brings to Google’s Core ML group (6:03) How Ovetta and her team work with data scientists and engineers deep in the stack (9:09) How AI is changing the front-end of applications (12:46) The type of people you should seek out to design your AI and LLM UXs (16:15) Explaining why we’re only at the very start of major GenAI breakthroughs (22:34) How GenAI tools will alter the roles and responsibilities of designers, developers, and product teams (31:11) The potential harms of carelessly deploying GenAI technology (42:09) Defining acceptable levels of risk when using GenAI in real-world applications (53:16) Closing thoughts from Ovetta and where you can find her

Quotes from Today’s Episode “If artificial intelligence is just another technology, why would we build entire policies and frameworks around it? The reason why we do that is because we realize there are some real thorny ethical issues [surrounding AI]. Who owns that data? Where does it come from? Data is created by people, and all people create data. That’s why companies have strong legal, compliance, and regulatory policies around [AI], how it’s built, and how it engages with people. Think about having a toddler and then training the toddler on everything in the Library of Congress and on the internet. Do you release that toddler into the world without guardrails? Probably not.” - Ovetta Sampson (10:03) “[When building a team] you should look for a diverse thinker who focuses on the limitations of this technology- not its capability. You need someone who understands that the end destination of that technology is an engagement with a human being. You need somebody who understands how they engage with machines and digital products. You need that person to be passionate about testing various ways that relationships can evolve. When we go from execution on code to machine learning, we make a shift from [human] agency to a shared-agency relationship. The user and machine both have decision-making power. That’s the paradigm shift that [designers] need to understand. You want somebody who can keep that duality in their head as they’re testing product design.” - Ovetta Sampson (13:45) “We’re in for a huge taxonomy change. There are words that mean very specific definitions today. Software engineer. Designer. Technically skilled. Digital. Art. Craft. AI is changing all that. It’s changing what it means to be a software engineer. Machine learning used to be the purview of data scientists only, but with GenAI, all of that is baked in to Gemini. So, now you start at a checkpoint, and you’re like, all right, let’s go make an API, right? So, the skills, the understanding, the knowledge, the taxonomy even, how we talk about these things, how do we talk about the machine who speaks to us talks to us, who could create a podcast out of just voice memos?” - Ovetta Sampson (24:16) “We have to be very intentional [when building AI tools], and that’s the kind of folks you want on teams. [Designers] have to go and play scary scenarios. We have to do that. No designer wants to be “Negative Nancy,” but this technology has huge potential to harm. It has harmed. If we don’t have the skill sets to recognize, document, and minimize harm, that needs to be part of our skill set. If we’re not looking out for the humans, then who actually is?” - Ovetta Sampson (32:10) “[Research shows] things happen to our brain when we’re exposed to artificial intelligence… there are real human engagement risks that are an opportunity for design. When you’re designing a self-driving car, you can’t just let the person go to sleep unless the car is fully [automated] and every other car on the road is self-driving. If there are humans behind the wheel, you need to have a feedback loop system—something that’s going to happen [in case] the algorithm is wrong. If you don’t have that designed, there’s going to be a large human engagement risk that a car is going to run over somebody who’s [for example] pushing a bike up a hill[...] Why? The car could not calculate the right speed and pace of a person pushing their bike. It had the speed and pace of a person walking, the speed and pace of a person on a bike, but not the two together. Algorithms will be wrong, right?” - Ovetta Sampson (39:42) “Model goodness used to be the purview of companies and the data scientists. Think about the first search engines. Their model goodness was [about] 77%. That’s good, right? And then people started seeing photos of apes when [they] typed in ‘black people.’ Companies have to get used to going to their customers in a wide spectrum and asking them when they’re [models or apps are] right and wrong. They can’t take on that burden themselves anymore. Having ethically sourced data input and variables is hard work. If you’re going to use this technology, you need to put into place the governance that needs to be there.” - Ovetta Sampson (44:08)

The Role of Python in Shaping the Future of Data Platforms with DLT

2024-10-13 · Data Engineering Podcast Listen

podcast_episode

by Marcin Rudolf (DLT Hub) , Tobias Macey , Adrian Brudaru (dlthub)

AI/ML Arrow Data Engineering Data Lake Data Lakehouse Data Management Datafold DuckDB GenAI Python

Summary In this episode of the Data Engineering Podcast, Adrian Broderieux and Marcin Rudolph, co-founders of DLT Hub, delve into the principles guiding DLT's development, emphasizing its role as a library rather than a platform, and its integration with lakehouse architectures and AI application frameworks. The episode explores the impact of the Python ecosystem's growth on DLT, highlighting integrations with high-performance libraries and the benefits of Arrow and DuckDB. The episode concludes with a discussion on the future of DLT, including plans for a portable data lake and the importance of interoperability in data management tools. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementImagine catching data issues before they snowball into bigger problems. That’s what Datafold’s new Monitors do. With automatic monitoring for cross-database data diffs, schema changes, key metrics, and custom data tests, you can catch discrepancies and anomalies in real time, right at the source. Whether it’s maintaining data integrity or preventing costly mistakes, Datafold Monitors give you the visibility and control you need to keep your entire data stack running smoothly. Want to stop issues before they hit production? Learn more at dataengineeringpodcast.com/datafold today!Your host is Tobias Macey and today I'm interviewing Adrian Brudaru and Marcin Rudolf, cofounders at dltHub, about the growth of dlt and the numerous ways that you can use it to address the complexities of data integrationInterview IntroductionHow did you get involved in the area of data management?Can you describe what dlt is and how it has evolved since we last spoke (September 2023)?What are the core principles that guide your work on dlt and dlthub?You have taken a very opinionated stance against managed extract/load services. What are the shortcomings of those platforms, and when would you argue in their favor?The landscape of data movement has undergone some interesting changes over the past year. Most notably, the growth of PyAirbyte and the rapid shifts around the needs of generative AI stacks (vector stores, unstructured data processing, etc.). How has that informed your product development and positioning?The Python ecosystem, and in particular data-oriented Python, has also undergone substantial evolution. What are the developments in the libraries and frameworks that you have been able to benefit from?What are some of the notable investments that you have made in the developer experience for building dlt pipelines?How have the interfaces for source/destination development improved?You recently published a post about the idea of a portable data lake. What are the missing pieces that would make that possible, and what are the developments/technologies that put that idea within reach?What is your strategy for building a sustainable product on top of dlt?How does that strategy help to form a "virtuous cycle" of improving the open source foundation?What are the most interesting, innovative, or unexpected ways that you have seen dlt used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on dlt?When is dlt the wrong choice?What do you have planned for the future of dlt/dlthub?Contact Info AdrianLinkedInMarcinLinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links dltPodcast EpisodePyArrowPolarsIbisDuckDBPodcast Episodedlt Data ContractsRAG == Retrieval Augmented GenerationAI Engineering Podcast EpisodePyAirbyteOpenAI o1 ModelLanceDBQDrant EmbeddedAirflowGitHub ActionsArrow DataFusionApache ArrowPyIcebergDelta-RSSCD2 == Slowly Changing DimensionsSQLAlchemySQLGlotFSSpecPydanticSpacyEntity RecognitionParquet File FormatPython DecoratorREST API ToolkitOpenAPI Connector GeneratorConnectorXPython no-GILDelta LakePodcast EpisodeSQLMeshPodcast EpisodeHamiltonTabularPostHogPodcast.init EpisodeAsyncIOCursor.AIData MeshPodcast EpisodeFastAPILangChainGraphRAGAI Engineering Podcast EpisodeProperty GraphPython uvThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

#62 The End of Pandas, Rise of Ibis: AI, Function Calling, & Python’s New Tools

2024-09-26 · DataTopics: All Things Data, AI & Tech Listen

podcast_episode

AI/ML Docker DuckDB LLM Pandas Polars Python Rust

Send us a text Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. We dive into conversations smoother than your morning coffee (but let’s be honest, just as caffeinated) where industry insights meet light-hearted banter. Whether you’re a data wizard or just curious about the digital chaos around us, kick back and get ready to talk shop—unplugged style! In this episode: Farewell Pandas, Hello Future: Pandas is out, and Ibis is in. We're talking faster, smarter data processing—featuring the rise of DuckDB and the powerhouse that is Polars. Is this the end of an era for Pandas?UV vs. Rye: Forget pip—are these new Python package managers built in Rust the future? We break down UV, Rye, and what it all means for your next Python project.AI-Generated Podcasts: Is AI about to take over your favorite podcasts? We explore the potential of Google’s Notebook LM to transform content into audio gold.When AI Steals Your Voice: Jeff Geerling’s voice gets cloned by AI—without his consent. We dive into the wild world of voice cloning, the ethics, and the future of AI-generated media.Hacking AI with Prompt Injection: Could you outsmart AI? We share some wild strategies from the game Gandalf that challenge your prompt injection skills and teach you how to jailbreak even the toughest guardrails.Jony Ive’s New Gadget Rumor: Is Jony Ive plotting an Apple killer? Rumors are swirling about a new AI-powered handheld device that could shake up the smartphone market.Zero-Downtime Deployments with Kamal Proxy: No more downtime! We geek out over Kamal Proxy, the sleek HTTP tool designed for effortless Docker deployments.Function Calling and LLMs: Get ready for the next evolution in AI—function calling. We discuss its rise in LLMs and dive into the Gorilla project, the leaderboard testing the future of smart APIs.

Achieving Data Reliability: The Role of Data Contracts in Modern Data Management

2024-07-28 · Data Engineering Podcast Listen

podcast_episode

by Tom Baeyens (Soda Data) , Tobias Macey

AI/ML Cloud Computing Data Contracts Data Engineering Data Lake Data Lakehouse Data Management Data Quality dbt Delta GenAI Hive +3 more

Summary Data contracts are both an enforcement mechanism for data quality, and a promise to downstream consumers. In this episode Tom Baeyens returns to discuss the purpose and scope of data contracts, emphasizing their importance in achieving reliable analytical data and preventing issues before they arise. He explains how data contracts can be used to enforce guarantees and requirements, and how they fit into the broader context of data observability and quality monitoring. The discussion also covers the challenges and benefits of implementing data contracts, the organizational impact, and the potential for standardization in the field.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and Doordash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.At Outshift, the incubation engine from Cisco, they are driving innovation in AI, cloud, and quantum technologies with the powerful combination of enterprise strength and startup agility. Their latest innovation for the AI ecosystem is Motific, addressing a critical gap in going from prototype to production with generative AI. Motific is your vendor and model-agnostic platform for building safe, trustworthy, and cost-effective generative AI solutions in days instead of months. Motific provides easy integration with your organizational data, combined with advanced, customizable policy controls and observability to help ensure compliance throughout the entire process. Move beyond the constraints of traditional AI implementation and ensure your projects are launched quickly and with a firm foundation of trust and efficiency. Go to motific.ai today to learn more!Your host is Tobias Macey and today I'm interviewing Tom Baeyens about using data contracts to build a clearer API for your dataInterview IntroductionHow did you get involved in the area of data management?Can you describe the scope and purpose of data contracts in the context of this conversation?In what way(s) do they differ from data quality/data observability?Data contracts are also known as the API for data, can you elaborate on this?What are the types of guarantees and requirements that you can enforce with these data contracts?What are some examples of constraints or guarantees that cannot be represented in these contracts?Are data contracts related to the shift-left?Data contracts are also known as the API for data, can you elaborate on this?The obvious application of data contracts are in the context of pipeline execution flows to prevent failing checks from propagating further in the data flow. What are some of the other ways that these contracts can be integrated into an organization's data ecosystem?How did you approach the design of the syntax and implementation for Soda's data contracts?Guarantees and constraints around data in different contexts have been implemented in numerous tools and systems. What are the areas of overlap in e.g. dbt, great expectations?Are there any emerging standards or design patterns around data contracts/guarantees that will help encourage portability and integration across tooling/platform contexts?What are the most interesting, innovative, or unexpected ways that you have seen data contracts used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on data contracts at Soda?When are data contracts the wrong choice?What do you have planned for the future of data contracts?Contact Info LinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links SodaPodcast EpisodeJBossData ContractAirflowUnit TestingIntegration TestingOpenAPIGraphQLCircuit Breaker PatternSodaCLSoda Data ContractsData MeshGreat Expectationsdbt Unit TestsOpen Data ContractsODCS == Open Data Contract StandardODPS == Open Data Product SpecificationThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

#57 Can the Music Industry Win the Battle Against AI?

2024-07-04 · DataTopics: All Things Data, AI & Tech Listen

podcast_episode

AI/ML GenAI GitHub LLM Python Cyber Security

Send us a text Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. Dive into conversations that should flow as smoothly as your morning coffee (but don’t), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style! In this episode, we explore: LLMs Gaming the System: Uncover how LLMs are using political sycophancy and tool-using flattery to game the system. Dive deeper: paper, chain of thought prompting & post on x.Recording Industry Association of America (RIAA) Sue AI Music Generators: They are taking on Suno and Udio for using copyrighted music to train their models. Some ai generated music that is very similar to existing songs: song 1, song 2, song 3. More on GenAI: midjourney creating copyrighted images, and chatGPT reciting email-adresses.AI-Powered Olympic Recaps: NBC’s personalized daily recaps with Al Michaels' voice offer a new way to catch up on the Olympics.Figma’s AI Redesign: Discover Figma’s new AI tools that speed up design and creativity. We debate the tool's value and its application in the design process. Rabbit R1 Security Flaws: Hackers exposed hardcoded API keys in Rabbit R1’s source code, leading to major security issues. Find out more.Pyinstrument for Python: Meet Pyinstrument, the easy-to-use Python profiler that optimizes code performance. Explore it on GitHub.The Ultimate Font - Bart’s dreams come true: Explore the groundbreaking integration of True Type Fonts with AI for dynamic text rendering. Discover more here.Hot Takes on AI Competition: Google claims no one has a moat in AI, sparking debate on open-source models' future. We also explore Ladybird Browser Project, an independently funded browser project aiming to build a cutting-edge browser engine.

#248: The Fundamentally Fascinating World of APIs with Marco Palladino

2024-06-25 · The Analytics Power Hour Listen

podcast_episode

by Julie Hoyer , Tim Wilson (Analytics Power Hour - Columbus (OH) , Moe Kiss (Canva) , Marco Palladino (Kong, Inc.)

AI/ML

Application Programming Interfaces (APIs) are as pervasive as they are critical to the functioning of the modern world. That personalized and content-rich product page with a sub-second load time on Amazon? That's just a couple-hundred API calls working their magic. Every experience on your mobile device? Loaded with APIs. But, just because they're everywhere doesn't mean that they spring forth naturally from the keystrokes of a developer. There's a lot more going on that requires real thought and planning, and the boisterous arrival of AI to mainstream modernity has made the role of APIs and their underlying infrastructure even more critical. On this episode, Moe, Julie, and Tim dug into the fascinating world with API Maven Marco Palladino, the co-founder and CTO at Kong, Inc. For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

talk-data.com

Activity Trend

Top Events

Top Speakers

#327 Building a Sales and Marketing Capability for Data Applications with Denise Persson, CMO at Snowflake, and Chris Degnan, former CRO at Snowflake

Foundational Data Engineering At Two Sigma

Enabling Agents In The Enterprise With A Platform Approach

Dagster's New Era: Modularizing Data Transformation in the Age of AI

AI and the Lakehouse: How Starburst is Pioneering New Workflows

Bringing AI Into The Inner Loop of Data Engineering With Ascend

#82 AI Cracked a Decades-Old Science Problem & Europe’s Push for Digital Sovereignty, The Secret Behind MCP, and The Cloud Hack That Slashes Costs By 62%.

The Future of Time Management

Episode 225: CppNorth & Flux Plans, The Slow Death of Twitter and More!

Episode 224: Flux Updates & Internal Iteration

#282 Navigating the Challenges of Product Integrations with Gil Feig, Co-Founder and CTO of Merge

#278 Building Multi-Modal AI Applications with Russ d'Sa, CEO & Co-founder of LiveKit

Revolutionizing Social Media Management

Bridging Code and UI in Data Orchestration with Kestra

155 - Understanding Human Engagement Risk When Designing AI and GenAI User Experiences

The Role of Python in Shaping the Future of Data Platforms with DLT

#62 The End of Pandas, Rise of Ibis: AI, Function Calling, & Python’s New Tools

Achieving Data Reliability: The Role of Data Contracts in Modern Data Management

#57 Can the Music Industry Win the Battle Against AI?

#248: The Fundamentally Fascinating World of APIs with Marco Palladino