talk-data.com

Topic: Data Management
Tags: data_governance, data_quality, metadata_management
1097 activities tagged

Activity Trend: 88 peak/qtr (2020-Q1 to 2026-Q1)

Activities: 1097 activities · Newest first

Summary In this episode of the Data Engineering Podcast Roman Gershman, CTO and founder of DragonflyDB, explores the development and impact of high-speed in-memory databases. Roman shares his experience creating a more efficient alternative to Redis, focusing on performance gains, scalability, and cost efficiency, and on addressing Redis's limitations in high-throughput, low-latency scenarios. He explains how DragonflyDB solves operational complexities for users and delves into its technical aspects, including maintaining compatibility with Redis while innovating on memory efficiency. Roman discusses the importance of cost efficiency and operational simplicity in driving adoption and shares insights on the broader ecosystem of in-memory data stores, future directions like SSD tiering and vector search capabilities, and the lessons learned from building a new database engine.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
Your host is Tobias Macey and today I'm interviewing Roman Gershman about building a high-speed in-memory database and the impact of the performance gains on data applications.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what DragonflyDB is and the story behind it?
- What is the core problem/use case that is solved by making a "faster Redis"?
- The other major player in the high performance key/value database space is Aerospike. What are the heuristics that an engineer should use to determine whether to use that vs. Dragonfly/Redis?
- Common use cases for Redis involve application caches and queueing (e.g. Celery/RQ). What are some of the other applications that you have seen Redis/Dragonfly used for, particularly in data engineering use cases?
- There is a piece of tribal wisdom that it takes 10 years for a database to iron out all of the kinks. At the same time, there have been substantial investments in commoditizing the underlying components of database engines. Can you describe how you approached the implementation of DragonflyDB to arrive at a functional and reliable implementation?
- What are the architectural elements that contribute to the performance and scalability benefits of Dragonfly?
- How have the design and goals of the system changed since you first started working on it?
- For teams who migrate from Redis to Dragonfly, beyond the cost savings what are some of the ways that it changes how they think about their overall system design?
- What are the most interesting, innovative, or unexpected ways that you have seen Dragonfly used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on DragonflyDB?
- When is DragonflyDB the wrong choice?
- What do you have planned for the future of DragonflyDB?

Contact Info
- GitHub
- LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- DragonflyDB
- Redis
- Elasticache
- ValKey
- Aerospike
- Laravel
- Sidekiq
- Celery
- Seastar Framework
- Shared-Nothing Architecture
- io_uring
- midi-redis
- Dunning-Kruger Effect
- Rust

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
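Two of the linked topics (the Seastar Framework and Shared-Nothing Architecture) point at the design idea behind Dragonfly's performance: partition the keyspace so that each thread exclusively owns its shard and never contends on shared state. The snippet below is an illustrative Python toy of that idea, not Dragonfly's actual C++ implementation; all class and method names are invented for the sketch.

```python
import threading
import queue

class ShardedStore:
    """Toy shared-nothing key/value store: one thread per shard, no data locks."""

    def __init__(self, n_shards=4):
        self.n_shards = n_shards
        self.queues = [queue.Queue() for _ in range(n_shards)]
        self.shards = [{} for _ in range(n_shards)]
        for i in range(n_shards):
            threading.Thread(target=self._worker, args=(i,), daemon=True).start()

    def _worker(self, i):
        # Only this thread ever touches self.shards[i], so the dict itself
        # needs no locking; coordination happens via the message queue.
        data = self.shards[i]
        while True:
            op, key, value, reply = self.queues[i].get()
            if op == "set":
                data[key] = value
                reply.put(True)
            elif op == "get":
                reply.put(data.get(key))

    def _route(self, key):
        # Hash the key to pick the owning shard, the core shared-nothing move.
        return hash(key) % self.n_shards

    def set(self, key, value):
        reply = queue.Queue()
        self.queues[self._route(key)].put(("set", key, value, reply))
        return reply.get()

    def get(self, key):
        reply = queue.Queue()
        self.queues[self._route(key)].put(("get", key, None, reply))
        return reply.get()
```

The real systems discussed in the episode add an event-loop runtime, cross-shard transactions, and kernel-level I/O (io_uring) on top of this basic ownership rule.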

Summary In this episode of the Data Engineering Podcast Sean Knapp, CEO of Ascend.io, explores the intersection of AI and data engineering. He discusses the evolution of data engineering and the role of AI in automating processes, alleviating burdens on data engineers, and enabling them to focus on complex tasks and innovation. The conversation covers the challenges and opportunities presented by AI, including the need for intelligent tooling and its potential to streamline data engineering processes. Sean and Tobias also delve into the impact of generative AI on data engineering, highlighting its ability to accelerate development, improve governance, and enhance productivity, while also noting the current limitations and future potential of AI in the field.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Your host is Tobias Macey and today I'm interviewing Sean Knapp about how Ascend is incorporating AI into their platform to help you keep up with the rapid rate of change.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Ascend is and the story behind it?
- The last time we spoke was August of 2022. What are the most notable or interesting evolutions in your platform since then?
- In that same time "AI" has taken up all of the oxygen in the data ecosystem. How has that impacted the ways that you and your customers think about their priorities?
- The introduction of AI as an API has caused many organizations to try and leap-frog their data maturity journey and jump straight to building with advanced capabilities. How is that impacting the pressures and priorities felt by data teams?
- At the same time that AI-focused product goals are straining data teams' capacities, AI also has the potential to act as an accelerator to their work. What are the roadblocks/speedbumps that are in the way of that capability?
- Many data teams are incorporating AI tools into parts of their workflow, but it can be clunky and cumbersome. How are you thinking about the fundamental changes in how your platform works with AI at its center?
- Can you describe the technical architecture that you have evolved toward that allows for AI to drive the experience rather than being a bolt-on?
- What are the concrete impacts that these new capabilities have on teams who are using Ascend?
- What are the most interesting, innovative, or unexpected ways that you have seen Ascend + AI used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on incorporating AI into the core of Ascend?
- When is Ascend the wrong choice?
- What do you have planned for the future of AI in Ascend?

Contact Info
- LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- Ascend
- Cursor AI Code Editor
- Devin
- GitHub Copilot
- OpenAI DeepResearch
- S3 Tables
- AWS Glue
- AWS Bedrock
- Snowpark
- Co-Intelligence: Living and Working with AI by Ethan Mollick (affiliate link)
- OpenAI o3

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Jason's co-author, Barry Green, joins this next episode as they discuss the release of the second edition of "Data Means Business" on 25th March.  Returning for his third conversation on the podcast, Barry shares updates on the evolving landscape of data and AI, the impact of generative AI, and the importance of business capabilities. They delve into the changes since the first edition, including the role of the Chief Data Officer and the significance of adaptability in today's fast-paced world. Tune in to hear their thoughts on driving transformational change and delivering value with data & AI. The second edition of Data Means Business will be out on 25th March and will be available on Amazon. *****    Cynozure is a leading data, analytics and AI company that helps organisations to reach their data potential. It works with clients on data and AI strategy, data management, data architecture and engineering, analytics and AI, data culture and literacy, and data leadership. The company was named one of The Sunday Times' fastest-growing private companies in both 2022 and 2023 and recognised as The Best Place to Work in Data by DataIQ in 2023 and 2024. Cynozure is a certified B Corporation. 

Summary In this episode of the Data Engineering Podcast Pete DeJoy, co-founder and product lead at Astronomer, talks about building and managing Airflow pipelines on Astronomer and the upcoming improvements in Airflow 3. Pete shares his journey into data engineering, discusses Astronomer's contributions to the Airflow project, and highlights the critical role of Airflow in powering operational data products. He covers the evolution of Airflow, its position in the data ecosystem, and the challenges faced by data engineers, including infrastructure management and observability. The conversation also touches on the upcoming Airflow 3 release, which introduces data awareness, architectural improvements, and multi-language support, and Astronomer's observability suite, Astro Observe, which provides insights and proactive recommendations for Airflow users.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Your host is Tobias Macey and today I'm interviewing Pete DeJoy about building and managing Airflow pipelines on Astronomer and the upcoming improvements in Airflow 3.

Interview
- Introduction
- Can you describe what Astronomer is and the story behind it?
- How would you characterize the relationship between Airflow and Astronomer?
- Astronomer just released your State of Airflow 2025 Report yesterday and it is the largest data engineering survey ever with over 5,000 respondents. Can you talk a bit about top level findings in the report?
- What about the overall growth of the Airflow project over time?
- How have the focus and features of Astronomer changed since it was last featured on the show in 2017?
- Astro Observe GA'd in early February; what does the addition of pipeline observability mean for your customers?
- What are other capabilities similar in scope to observability that Astronomer is looking at adding to the platform?
- Why is Airflow so critical in providing an elevated observability (or cataloging, or something similar) experience in a DataOps platform?
- What are the notable evolutions in the Airflow project and ecosystem in that time?
- What are the core improvements that are planned for Airflow 3.0?
- What are the most interesting, innovative, or unexpected ways that you have seen Astro used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Airflow and Astro?
- What do you have planned for the future of Astro/Astronomer/Airflow?

Contact Info
- LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- Astronomer
- Airflow
- Maxime Beauchemin
- MongoDB
- Databricks
- Confluent
- Spark
- Kafka
- Dagster (Podcast Episode)
- Prefect
- Airflow 3
- The Rise of the Data Engineer blog post
- dbt
- Jupyter Notebook
- Zapier
- cosmos library for dbt in Airflow
- Ruff
- Airflow Custom Operator
- Snowflake

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Summary In this episode of the Data Engineering Podcast Rajan Goyal, CEO and co-founder of Datapelago, talks about improving efficiencies in data processing by reimagining system architecture. Rajan explains the shift from hyperconverged to disaggregated and composable infrastructure, highlighting the importance of accelerated computing in modern data centers. He discusses the evolution from proprietary to open, composable stacks, emphasizing the role of open table formats and the need for a universal data processing engine, and outlines Datapelago's strategy to leverage existing frameworks like Spark and Trino while providing accelerated computing benefits.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Your host is Tobias Macey and today I'm interviewing Rajan Goyal about how to drastically improve efficiencies in data processing by re-imagining the system architecture.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by outlining the main factors that contribute to performance challenges in data lake environments?
- The different components of open data processing systems have evolved from different starting points with different objectives. In your experience, how has that un-planned and un-synchronized evolution of the ecosystem hindered the capabilities and adoption of open technologies?
- The introduction of a new cross-cutting capability (e.g. Iceberg) has typically taken a substantial amount of time to gain support across different engines and ecosystems. What do you see as the point of highest leverage to improve the capabilities of the entire stack with the least amount of co-ordination?
- What was the motivating insight that led you to invest in the technology that powers Datapelago?
- Can you describe the system design of Datapelago and how it integrates with existing data engines?
- The growth in the generation and application of unstructured data is a notable shift in the work being done by data teams. What are the areas of overlap in the fundamental nature of data (whether structured, semi-structured, or unstructured) that you are able to exploit to bridge the processing gap?
- What are the most interesting, innovative, or unexpected ways that you have seen Datapelago used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Datapelago?
- When is Datapelago the wrong choice?
- What do you have planned for the future of Datapelago?

Contact Info
- LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links
- Datapelago
- MIPS Architecture
- ARM Architecture
- AWS Nitro
- Mellanox
- Nvidia
- Von Neumann Architecture
- TPU == Tensor Processing Unit
- FPGA == Field-Programmable Gate Array
- Spark
- Trino
- Iceberg (Podcast Episode)
- Delta Lake (Podcast Episode)
- Hudi (Podcast Episode)
- Apache Gluten
- Intermediate Representation
- Turing Completeness
- LLVM
- Amdahl's Law
- LSTM == Long Short-Term Memory

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
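Amdahl's Law, listed in the episode links, is the quantitative argument behind accelerating the whole processing stack rather than a single operator: if only a fraction p of the workload is accelerated by a factor s, overall speedup is capped at 1 / ((1 - p) + p / s). A quick illustration (the numbers are made up):

```python
def amdahl_speedup(p, s):
    """Overall speedup when fraction p of the work runs s times faster."""
    return 1.0 / ((1.0 - p) + p / s)

# Accelerating 90% of the work by 10x gives roughly 5.3x overall,
# and even an infinitely fast accelerator is capped at 1 / (1 - p) = 10x.
```

This is why an accelerated engine that leaves, say, parsing or shuffle on the slow path sees far less benefit than the raw hardware numbers suggest.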

This episode is a special edition in honour of International Women's Day on 8th March. Host Jason Foster is joined by Lou Hutchins, Director of Data Culture & Literacy at Cynozure, and Rose Attridge, Strategy Advisor at Cynozure. Together, they explore gender diversity in data and AI, the importance of sponsorship and allies, and challenges in male-dominated industries. They also discuss the role of data and AI in driving change, the need for role models, early engagement, and company action.

Generative AI has transformed the financial services sector, sparking interest at all organizational levels. As AI becomes more accessible, professionals are exploring its potential to enhance their work. How can AI tools improve personalization and fraud detection? What efficiencies can be gained in product development and internal processes? These are the questions driving the adoption of AI as companies strive to innovate responsibly while maximizing value. Andrew serves as the Chief Data Officer for Mastercard, leading the organization's data strategy and innovation efforts while navigating current and future data risks. Andrew's prior roles at Mastercard include Senior Vice President, Data Management, in which he was responsible for the quality, collection, and use of data for Mastercard's information services and advisory business, and Mastercard's Deputy Chief Privacy Officer, in which he was responsible for privacy and data protection issues globally for Mastercard. Andrew also spent many years as a Privacy & Intellectual Property Counsel advising the direct marketing services, interactive advertising, and industrial chemicals industries. Andrew holds a Juris Doctor from Columbia University School of Law and a bachelor's degree, cum laude, in Chemical Engineering from the University of Delaware. Andrew is a retired member of the State Bar of New York. In the episode, Adel and Andrew explore GenAI's transformative impact on financial services, the democratization of AI tools, efficiency gains in product development, the importance of AI governance and data quality, the cultural shifts and regulatory landscapes shaping AI's future, and much more.
Links Mentioned in the Show:
- Mastercard
- Connect with Andrew
- Skill Track: Artificial Intelligence (AI) Leadership
- Related Episode: How Generative AI is Changing Leadership with Christie Smith, Founder of the Humanity Institute and Kelly Monahan, Managing Director, Research Institute
- Sign up to attend RADAR: Skills Edition
- New to DataCamp? Learn on the go using the DataCamp mobile app
- Empower your business with world-class data and AI skills with DataCamp for business

S1 Ep#34: The Future of Data Mesh

The Data Product Management In Action podcast, brought to you by executive producer Scott Hirleman, is a platform for data product management practitioners to share insights and experiences.

In Season 01, Episode 34, host Michael Toland welcomes Tom DeWolf, a data mesh expert with a PhD in distributed systems and years of experience in software engineering. Tom shares insights from his four-year journey in data mesh, emphasizing the need for self-service in data products, the benefits of an evolutionary architecture, and the challenges of governance in multi-organization environments. He also discusses key lessons from past failures, highlighting the critical role of user engagement in building successful data ecosystems. Don't miss this deep dive into the future of data management!

About our Host Michael Toland: Michael is a Product Management Coach and Consultant with Pathfinder Product, a Test Double Operation. He has worked in product officially since 2016, when he worked at Verizon on large-scale system modernizations and migration initiatives for reference data and decision platforms. Outside of his professional career, Michael serves as the Treasurer for the New Leaders Council, mentors fellows with Venture for America, sings in the Columbus Symphony, writes satire posts for his blog Dignified Product or for Test Double, depending on the topic, and is excited to be chatting with folks on Data Product Management. Connect with Michael on LinkedIn.

About our guest Tom DeWolf: Tom is an experienced hands-on architect and serves as the innovation lead, spearheading new innovative initiatives for ACA Group. His expertise lies in data mesh platforms and platform engineering, leveraging his background in software engineering and experience in designing various architectures, including software, microservices, data platforms, and evolutionary architectures, among others. Tom is the founder and host of the Data Mesh Belgium meetup and the new Data Mesh Live conference, and an active thought leader in the Data Mesh community. Connect with Tom on LinkedIn.

All views and opinions expressed are those of the individuals and do not necessarily reflect their employers or anyone else. 

Join the conversation on LinkedIn. 

Apply to be a guest or nominate someone that you know. 

Do you love what you're listening to? Please rate and review the podcast, and share it with fellow practitioners you know. Your support helps us reach more listeners and continue providing valuable insights! 

Summary In this episode of the Data Engineering Podcast Gleb Mezhanskiy, CEO and co-founder of Datafold, talks about the intersection of AI and data engineering. He discusses the challenges and opportunities of integrating AI into data engineering, particularly using large language models (LLMs) to enhance productivity and reduce manual toil. The conversation covers the potential of AI to transform data engineering tasks, such as text-to-SQL interfaces and creating semantic graphs to improve data accessibility, and explores practical applications of LLMs in automating code reviews, testing, and understanding data lineage.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Your host is Tobias Macey and today I'm interviewing Gleb Mezhanskiy about the intersection of AI and data engineering.

Interview
- Introduction
- How did you get involved in the area of data management?
- The "modern data stack is dead"
- Where is AI in the data stack?
- "Buy our tool to ship AI"
- Opportunities for LLMs in the data engineering workflow

Contact Info
- LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- Datafold
- Copilot
- Cursor IDE
- AI Agents
- DataChat
- AI Engineering Podcast Episode
- Metrics Layer
- Emacs
- LangChain
- LangGraph
- CrewAI

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
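The text-to-SQL interfaces mentioned in the summary usually come down to grounding an LLM in schema metadata before asking it for a query. The sketch below is a minimal, hypothetical illustration (the schema, function name, and prompt wording are all invented; a real system would pull table and semantic-graph context from a catalog rather than a hard-coded dict):

```python
# Hypothetical schema metadata; in practice this comes from a catalog
# or semantic layer, not a literal in the code.
SCHEMA = {
    "orders": ["order_id", "customer_id", "total_usd", "created_at"],
    "customers": ["customer_id", "name", "region"],
}

def build_text_to_sql_prompt(question, schema=SCHEMA):
    """Assemble an LLM prompt that grounds the model in the table schemas."""
    tables = "\n".join(
        f"- {name}({', '.join(cols)})" for name, cols in schema.items()
    )
    return (
        "You are a SQL assistant. Use only these tables:\n"
        f"{tables}\n\n"
        f"Question: {question}\n"
        "Answer with a single SQL query."
    )
```

The quality of the generated SQL then depends far more on how rich and accurate this schema/semantic context is than on the prompt wording itself, which is the lineage and metadata angle discussed in the episode.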

In this episode, host Jason Foster is joined by Adam Elman, Director of Sustainability - EMEA, at Google.  

Together they explore defining sustainability from environmental, social and governance perspectives, and the role of AI in sustainability, including AI's potential to reduce global greenhouse gas emissions by up to 10% by 2030. 

They also discuss the future of sustainability and possibilities around technology's role in combating climate change, including advancements in clean energy and expanded AI applications.


Summary In this episode of the Data Engineering Podcast Bartosz Mikulski talks about preparing data for AI applications. Bartosz shares his journey from data engineering to MLOps and emphasizes the importance of data testing over software development in AI contexts. He discusses the types of data assets required for AI applications, including extensive test datasets, especially in generative AI, and explains the differences in data requirements for various AI application styles. The conversation also explores the skills data engineers need to transition into AI, such as familiarity with vector databases and new data modeling strategies, and highlights the challenges of evolving AI applications, including frequent reprocessing of data when changing chunking strategies or embedding models.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Your host is Tobias Macey and today I'm interviewing Bartosz Mikulski about how to prepare data for use in AI applications.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by outlining some of the main categories of data assets that are needed for AI applications?
- How does the nature of the application change those requirements? (e.g. RAG app vs. agent, etc.)
- How do the different assets map to the stages of the application lifecycle?
- What are some of the common roles and divisions of responsibility that you see in the construction and operation of a "typical" AI application?
- For data engineers who are used to data warehousing/BI, what are the skills that map to AI apps?
- What are some of the data modeling patterns that are needed to support AI apps? (chunking strategies, metadata management)
- What are the new categories of data that data engineers need to manage in the context of AI applications? (agent memory generation/evolution, conversation history management, data collection for fine tuning)
- What are some of the notable evolutions in the space of AI applications and their patterns that have happened in the past ~1-2 years that relate to the responsibilities of data engineers?
- What are some of the skills gaps that teams should be aware of and identify training opportunities for?
- What are the most interesting, innovative, or unexpected ways that you have seen data teams address the needs of AI applications?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on AI applications and their reliance on data?
- What are some of the emerging trends that you are paying particular attention to?

Contact Info
- Website
- LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
- Spark
- Ray
- Chunking Strategies
- Hypothetical document embeddings
- Model Fine Tuning
- Prompt Compression

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Welcome back to another podcast episode of Data Unchained. Jon Toor, CMO of Cloudian, joins us at Super Computing 2024 to discuss the future of decentralized data management, the evolving landscape of AI-driven storage, and what the next steps look like for metadata and object storage.

#DataUnchained #Supercomputing2024 #AI #GPUComputing #ObjectStorage #GPUDirect #Cloudian #Hammerspace #DataScience #MachineLearning #AIInfrastructure #DataStorage #TechPodcast #ArtificialIntelligence #SC24 #BigData #DataManagement

Cyberpunk by jiglr | https://soundcloud.com/jiglrmusic Music promoted by https://www.free-stock-music.com Creative Commons Attribution 3.0 Unported License https://creativecommons.org/licenses/by/3.0/deed.en_US Hosted on Acast. See acast.com/privacy for more information.

The Data Product Management In Action podcast, brought to you by executive producer Scott Hirleman, is a platform for data product management practitioners to share insights and experiences. In this episode, host Frannie Helforoush talks with our host Nick Zervoudis, Head of Product at CKDelta and founder of Value from Data and AI, about his new course designed to help data teams deliver maximum impact. Nick discusses the growing importance of being value-focused amidst economic challenges like inflation and layoffs. Tailored for data product managers and consultants, the cohort-based course emphasizes opportunity discovery, valuation, and aligning data initiatives with business profitability. Hosted on Maven, the interactive format fosters peer learning. Check out Nick's course on Maven and tune in to learn how to ensure your data and AI efforts deliver tangible results. About our host Frannie Helforoush: Frannie's journey began as a software engineer and evolved into a strategic product manager. Now, as a data product manager, she leverages her expertise in both fields to create impactful solutions. Frannie thrives on making data accessible and actionable, driving product innovation, and ensuring product thinking is integral to data management. Connect with Frannie on LinkedIn. Meet our Guest Nick Zervoudis: Nick is Head of Product at CKDelta, an AI software business within the CKHutchison Holdings group. Nick oversees a portfolio of data products and works with sister companies to uncover new opportunities to innovate using data,analytics, and machine learning.Nick's career has revolved around data and advanced analytics from day one,having worked as an analyst, consultant, product manager, and instructor for startups, SMEs, and enterprises including PepsiCo, Sainsbury's, Lloyds BankingGroup, IKEA, Capgemini Invent, BrainStation, QuantSpark, and Hg Capital. 
Nick is also the co-host of London's Data Product Management meetup, and speaks and writes regularly about data & AI product management. Connect with Nick on LinkedIn. Core offerings delivered by Value from Data & AI: data product management training, fractional data product manager engagements, data startup advisory, 1:1 coaching, one-off data and AI discovery projects, and data monetization advisory. All views and opinions expressed are those of the individuals and do not necessarily reflect their employers or anyone else. Join the conversation on LinkedIn. Apply to be a guest or nominate someone that you know. Do you love what you're listening to? Please rate and review the podcast, and share it with fellow practitioners you know. Your support helps us reach more listeners and continue providing valuable insights! 

In this episode, host Jason Foster is joined by Amanda Bickerstaff, CEO & Founder, AI for Education. 

Together they explore the impact of AI on young people, particularly in the context of education. They discuss AI literacy, the need for AI literacy among students, teachers, and policymakers and the need for understanding AI's capabilities, limitations, and ethical considerations. 

They also discuss social media & AI integration and challenges in the education system. The conversation touches on how AI is embedded in platforms like Snapchat, TikTok, and video games, influencing young people's perceptions and behaviours, sometimes without their awareness. *****

Cynozure is a leading data, analytics and AI company that helps organisations to reach their data potential. It works with clients on data and AI strategy, data management, data architecture and engineering, analytics and AI, data culture and literacy, and data leadership. The company was named one of The Sunday Times' fastest-growing private companies in both 2022 and 2023, and recognised as The Best Place to Work in Data by DataIQ in 2023 and 2024. Cynozure is a certified B Corporation.  

S1 Ep#30: From Engineering to Data Strategy: Driving AI and Decision-Making. The Data Product Management In Action podcast, season 1, brought to you by Soda and executive producer Scott Hirleman, is a platform for data product management practitioners to share insights and experiences. Our guest this week is Theo Bell, a data product manager. She chats with host Nick Zervoudis and shares her transition from mechanical engineering to roles at Goldman Sachs and Palantir, emphasizing the importance of data integration in strategic decision-making. She discusses how Palantir helped a manufacturer prioritize client orders during raw material shortages and explores the challenges of convincing stakeholders to adopt new data models, advocating for production-ready pilots over proof-of-concepts. Theo also offers insights on fostering AI adoption within organizations, using a news summarization tool for a CEO as an example. She recommends the GTD framework and Surrounded by Idiots for enhancing productivity and communication. About our Host Nick Zervoudis: Nick is Head of Product at CKDelta, an AI software business within the CK Hutchison Holdings group. Nick oversees a portfolio of data products and works with sister companies to uncover new opportunities to innovate using data, analytics, and machine learning. Nick's career has revolved around data and advanced analytics from day one, having worked as an analyst, consultant, product manager, and instructor for startups, SMEs, and enterprises including PepsiCo, Sainsbury's, Lloyds Banking Group, IKEA, Capgemini Invent, BrainStation, QuantSpark, and Hg Capital. Nick is also the co-host of London's Data Product Management meetup, and speaks & writes regularly about data & AI product management. Connect with Nick on LinkedIn and through his newsletter, Value from Data & AI. 
About our Guest Theo Bell: Theo is the Head of AI Product at Rimes, where she leads the company’s efforts to leverage AI technology in order to provide cutting-edge data management solutions to clients. Previously, Theo held key roles at Palantir Technologies and Goldman Sachs, where she enabled various industries to leverage data through AI/ML-driven software, notably Airbus' Skywise platform, the NHS, and the UK Ministry of Defence. Theo is dedicated to using AI and technology for global challenges, particularly in improving health, enhancing society, and fostering sustainable businesses. She holds a PhD in Engineering from the University of Cambridge. Connect with Theo on LinkedIn. All views and opinions expressed are those of the individuals and do not necessarily reflect their employers or anyone else. Join the conversation on LinkedIn. Apply to be a guest or nominate someone that you know. Do you love what you're listening to? Please rate and review the podcast, and share it with fellow practitioners you know. Your support helps us reach more listeners and continue providing valuable insights!

This session explores the rise of Lakehouse architecture and its industry-wide adoption, highlighting its ability to simplify Data Management. We’ll also examine how Large Language Models (LLMs) are transforming Data Engineering, enabling analysts to solve complex problems that once required advanced technical skills.

In this episode, host Jason Foster is joined by Lara Menke, Leadership Psychologist & Executive Coach. 

Together they explore leadership development and the application of business psychology to improve workplace dynamics and team performance. 

They also discuss the transformative potential of emotionally intelligent, self-aware, and empathetic leadership in fostering resilient and effective teams. 

*****  


The Data Product Management In Action podcast, brought to you by executive producer Scott Hirleman, is a platform for data product management practitioners to share insights and experiences. In the 25th celebration minisode of Data Product Management in Action, hosts Frannie Helforoush and Nadiem von Heydebrand reflect on the progress of data product management in 2024. They highlight the growing clarity and recognition of the field, the rise of AI product management, and the importance of thoughtful integration without succumbing to overhype. The episode revisits key 2024 discussions on building data platforms, decision support products, and data mesh implementation. Looking forward to 2025, they foresee increased interest and adoption, emphasizing the field's potential for driving organizational value. Frannie and Nadiem express excitement for future episodes and community contributions. About our Host Nadiem von Heydebrand: Nadiem is CEO and Co-Founder at Mindfuel. In 2019, he combined his passion for data science with product management, and he is a thought leader in data product management today, aiming to prove the true value contribution of data. Working as an expert in the data industry for over a decade, he has seen hundreds of data science initiatives, built scaled data teams, and enabled global organizations like Volkswagen, Munich Re, Allianz, Red Bull, and Vorwerk to become data-driven. With Mindfuel “Delight”, a data product management SaaS solution combined with professional services, he brings experience from hands-on challenges such as scaling out data platforms and architecture, implementing data mesh concepts, and transforming AI performance into business performance to delight consumers all over the globe. Connect with Nadiem on LinkedIn.

About our Host Frannie Helforoush: Frannie's journey began as a software engineer and evolved into a strategic product manager. With an innate curiosity for problem-solving, she fuses her expertise in data and product management to create impactful solutions as a data product manager. With a background in both software engineering and product management, she seamlessly bridges the gap between the data and product worlds. She thrives on making data accessible and actionable, driving product innovation, and ensuring that product thinking is applied to every aspect of data management. Connect with Frannie on LinkedIn. All views and opinions expressed are those of the individuals and do not necessarily reflect their employers or anyone else. Join the conversation on LinkedIn. Apply to be a guest or nominate someone that you know. Do you love what you're listening to? Please rate and review the podcast, and share it with fellow practitioners you know. Your support helps us reach more listeners and continue providing valuable insights!

Welcome to Data Unchained! In this episode, recorded live at the Supercomputing 24 Conference in Atlanta, Georgia, Molly Presley sits down with Mark Seamans from Penguin Solutions to explore the exciting intersection of high-performance computing (HPC) and AI innovations. Episode Highlights: - The explosive growth of AI and large language models in HPC. - How Penguin Solutions helps enterprises overcome GPU and AI complexity. - The role of OriginAI in simplifying AI project deployment. - Challenges of decentralized and unstructured data in AI workflows. - Emerging trends in hybrid cloud solutions and GPU-specific clouds. - The power of ClusterWare for optimizing high-performance clusters. Mark Seamans shares insights on how enterprises can effectively implement AI strategies, manage data complexity, and maximize their IT investments with innovative solutions like ClusterWare and OriginAI. Whether you're navigating AI for the first time or optimizing your HPC systems, this episode is packed with actionable takeaways!

#AI #HighPerformanceComputing #DataScience #Supercomputing #PenguinSolutions #Hammerspace #CloudComputing #DataManagement #GPUComputing #AIProjects #TechInnovation #HybridCloud #ClusterWare #OriginAI #Supercomputing24 #Podcast


Summary In this episode of the Data Engineering Podcast Andrew Luo, CEO of OneSchema, talks about handling CSV data in business operations. Andrew shares his background in data engineering and CRM migration, which led to the creation of OneSchema, a platform designed to automate CSV imports and improve data validation processes. He discusses the challenges of working with CSVs, including inconsistent type representation, lack of schema information, and technical complexities, and explains how OneSchema addresses these issues using multiple CSV parsers and AI for data type inference and validation. Andrew highlights the business case for OneSchema, emphasizing efficiency gains for companies dealing with large volumes of CSV data, and shares plans to expand support for other data formats and integrate AI-driven transformation packs for specific industries.
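The type-inference problem Andrew describes can be illustrated with a minimal sketch. This is a hypothetical helper, not OneSchema's actual implementation: since CSV cells carry no schema and every value arrives as a string, the only way to assign column types is to sample values and find the narrowest type they all fit, falling back to string on any disagreement.

```python
import csv
import io

def infer_column_types(csv_text, sample_rows=100):
    """Infer a per-column type name from sampled rows of a headered CSV.

    CSV cells arrive as untyped strings; this tries int first, then
    float, and falls back to "str" when any non-empty sampled value
    in the column disagrees.
    """
    def fits(value, cast):
        try:
            cast(value)
            return True
        except ValueError:
            return False

    reader = csv.DictReader(io.StringIO(csv_text))
    samples = {name: [] for name in reader.fieldnames}
    for i, row in enumerate(reader):
        if i >= sample_rows:
            break
        for name, value in row.items():
            if value not in ("", None):  # empty cells carry no type signal
                samples[name].append(value)

    types = {}
    for name, values in samples.items():
        if values and all(fits(v, int) for v in values):
            types[name] = "int"
        elif values and all(fits(v, float) for v in values):
            types[name] = "float"
        else:
            types[name] = "str"
    return types

demo = "id,price,note\n1,9.99,ok\n2,12.50,pending\n3,7,\n"
print(infer_column_types(demo))  # {'id': 'int', 'price': 'float', 'note': 'str'}
```

Even this toy version shows why the problem is hard in practice: the `price` column mixes `7` and `9.99`, so a single-row sample would guess the wrong type, which is one reason a production importer leans on multiple parsers and larger samples rather than a single pass.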

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. Your host is Tobias Macey and today I'm interviewing Andrew Luo about how OneSchema addresses the headaches of dealing with CSV data for your business.
Interview
Introduction
How did you get involved in the area of data management?
Despite the years of evolution and improvement in data storage and interchange formats, CSVs are just as prevalent as ever. What are your opinions/theories on why they are so ubiquitous?
What are some of the major sources of CSV data for teams that rely on them for business and analytical processes?
The most obvious challenge with CSVs is their lack of type information, but they are notorious for having numerous other problems. What are some of the other major challenges involved with using CSVs for data interchange/ingestion?
Can you describe what you are building at OneSchema and the story behind it?
What are the core problems that you are solving, and for whom?
Can you describe how you have architected your platform to be able to manage the variety, volume, and multi-tenancy of data that you process?
How have the design and goals of the product changed since you first started working on it?
What are some of the major performance issues that you have encountered while dealing with CSV data at scale?
What are some of the most surprising things that you have learned about CSVs in the process of building OneSchema?
What are the most interesting, innovative, or unexpected ways that you have seen OneSchema used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on OneSchema?
When is OneSchema the wrong choice?
What do you have planned for the future of OneSchema?
Contact Info
LinkedIn
Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.
Links
OneSchema
EDI == Electronic Data Interchange
UTF-8 BOM (Byte Order Mark) Characters
SOAP
CSV RFC
Iceberg
SSIS == SQL Server Integration Services
MS Access
Datafusion
JSON Schema
SFTP == Secure File Transfer Protocol
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA