talk-data.com

Topic: Data Management

Tags: data_governance, data_quality, metadata_management

1097 tagged activities

Activity Trend: 88 peak/qtr (2020-Q1 to 2026-Q1)

Activities: 1097 activities · Newest first

In Analytics and Data Science departments, we've got a pretty good sense of why investing in data is important for any organization. But how well could you pitch your company to spend its precious resources on improving data quality or better data management practices? Could you tell that data story to the right stakeholders when it matters? In this episode, you'll hear from The Data Whisperer, Scott Taylor, sharing his best advice and practical tips for becoming a better storyteller and getting people to take action.

What You'll Learn:
- Why storytelling is a key skill for anyone who works in data
- The importance of data management, and what that really means
- Practical tips and frameworks for telling an effective data story

Register for free to be part of the next live session: https://bit.ly/3XB3A8b

About our guest: Scott Taylor
The Data Whisperer, Scott Taylor, has helped countless companies by enlightening business executives to the strategic value of master data and proper data management. He focuses on business alignment and the "strategic WHY" rather than system implementation and the "technical HOW." At MetaMeta Consulting he works with Enterprise Data Leadership teams and Innovative Tech Brands to tell their data story.
Get Scott's book: Telling Your Data Story: Data Storytelling for Data Management
Follow Scott on LinkedIn

Follow us on Socials: LinkedIn YouTube Instagram (Mavens of Data) Instagram (Maven Analytics) TikTok Facebook Medium X/Twitter

Summary
Data contracts are both an enforcement mechanism for data quality and a promise to downstream consumers. In this episode Tom Baeyens returns to discuss the purpose and scope of data contracts, emphasizing their importance in achieving reliable analytical data and preventing issues before they arise. He explains how data contracts can be used to enforce guarantees and requirements, and how they fit into the broader context of data observability and quality monitoring. The discussion also covers the challenges and benefits of implementing data contracts, the organizational impact, and the potential for standardization in the field.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and Doordash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
At Outshift, the incubation engine from Cisco, they are driving innovation in AI, cloud, and quantum technologies with the powerful combination of enterprise strength and startup agility. Their latest innovation for the AI ecosystem is Motific, addressing a critical gap in going from prototype to production with generative AI. Motific is your vendor- and model-agnostic platform for building safe, trustworthy, and cost-effective generative AI solutions in days instead of months. Motific provides easy integration with your organizational data, combined with advanced, customizable policy controls and observability to help ensure compliance throughout the entire process. Move beyond the constraints of traditional AI implementation and ensure your projects are launched quickly and with a firm foundation of trust and efficiency. Go to motific.ai today to learn more!
Your host is Tobias Macey and today I'm interviewing Tom Baeyens about using data contracts to build a clearer API for your data.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe the scope and purpose of data contracts in the context of this conversation?
- In what way(s) do they differ from data quality/data observability?
- Data contracts are also known as the API for data; can you elaborate on this?
- What are the types of guarantees and requirements that you can enforce with these data contracts?
- What are some examples of constraints or guarantees that cannot be represented in these contracts?
- Are data contracts related to the shift-left movement?
- The obvious application of data contracts is in pipeline execution flows, to prevent failing checks from propagating further in the data flow. What are some of the other ways that these contracts can be integrated into an organization's data ecosystem?
- How did you approach the design of the syntax and implementation for Soda's data contracts?
- Guarantees and constraints around data in different contexts have been implemented in numerous tools and systems. What are the areas of overlap with e.g. dbt or Great Expectations?
- Are there any emerging standards or design patterns around data contracts/guarantees that will help encourage portability and integration across tooling/platform contexts?
- What are the most interesting, innovative, or unexpected ways that you have seen data contracts used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on data contracts at Soda?
- When are data contracts the wrong choice?
- What do you have planned for the future of data contracts?

Contact Info: LinkedIn
Parting Question: From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links: Soda, Podcast Episode, JBoss, Data Contract, Airflow, Unit Testing, Integration Testing, OpenAPI, GraphQL, Circuit Breaker Pattern, SodaCL, Soda Data Contracts, Data Mesh, Great Expectations, dbt Unit Tests, Open Data Contracts, ODCS == Open Data Contract Standard, ODPS == Open Data Product Specification

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
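The episode frames a data contract as a machine-checkable promise that gates data before it flows downstream. That core idea can be sketched in a few lines of Python; note this is an illustrative model only, not Soda's actual contract syntax, and the column names and checks are invented for the example.

```python
# Minimal sketch of a data contract: a schema plus guarantees that a
# batch must satisfy before it is allowed to propagate downstream.
# (Hypothetical columns; real tools express this declaratively, e.g. in YAML.)

CONTRACT = {
    "columns": {
        "order_id": {"type": int,   "required": True},
        "amount":   {"type": float, "required": True},
        "coupon":   {"type": str,   "required": False},
    }
}

def check_batch(rows, contract):
    """Return a list of violations; an empty list means the batch
    satisfies the contract and may flow downstream."""
    violations = []
    for i, row in enumerate(rows):
        for name, spec in contract["columns"].items():
            value = row.get(name)
            if value is None:
                if spec["required"]:
                    violations.append((i, name, "missing required value"))
                continue
            if not isinstance(value, spec["type"]):
                violations.append((i, name, f"expected {spec['type'].__name__}"))
    return violations

good = [{"order_id": 1, "amount": 9.99}]
bad = [{"order_id": "1", "amount": None}]
print(check_batch(good, CONTRACT))       # [] -> safe to publish
print(len(check_batch(bad, CONTRACT)))   # 2 -> wrong type + missing required
```

In a pipeline, a non-empty violation list would fail the task (the circuit-breaker pattern mentioned in the links), which is what distinguishes a contract from after-the-fact observability.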

How can you transform your organisation with AI? Join the conversation with Jason Foster and Charlie Stack, Global Data, Analytics & AI Practice Leader at Spencer Stuart, as they discuss the culture, processes and mindset shifts that leaders need to adopt to leverage AI in their businesses. Discover practical tips on generating and testing business hypotheses, empowering cross-functional teams, and breaking habitual behaviours that create obstacles to innovation and experimentation. Tune in now to learn how to leverage AI for your business.


Cynozure is a leading data, analytics and AI company that helps organisations to reach their data potential. They work with clients on data and AI strategy, data management, data architecture and engineering, analytics and AI, data culture and literacy, and change management and leadership. The company was named one of The Sunday Times' fastest-growing private companies in 2022 and 2023 and named the Best Place to Work in Data by DataIQ in 2023. For more information, visit www.cynozure.com.

The Data Product Management In Action podcast, brought to you by Soda and executive producer Scott Hirleman, is a platform for data product management practitioners to share insights and experiences. In Season 01, Episode 09, host Frannie Helforoush (Senior Digital Product Manager at RBC Global Asset Management) talks to guest Jill Maffeo (Principal Product Manager at Vista). Jill is a data and product management practitioner who joins Frannie in conversation to talk about building and leading effective data and product teams. There's a lot for listeners to take away, including challenges faced by Jill and how to foster collaboration and drive success. A few of the big questions tackled in this episode are: What are the key elements in building a cohesive and productive team? How can you optimize team structures for efficiency and innovation? What is the 'Product Trio', and why is it crucial for creating successful digital products? How do you integrate data-driven insights into the product development lifecycle? Which team or process metrics should you follow to align a team? Should a data product manager report to a CPO or a data leader? About our host Frannie Helforoush: Frannie's journey began as a software engineer and evolved into a strategic product manager. Now, as a data product manager, she leverages her expertise in both fields to create impactful solutions. Frannie thrives on making data accessible and actionable, driving product innovation, and ensuring product thinking is integral to data management. Connect with Frannie on LinkedIn. About our guest Jill Maffeo: Jill has a background in analytics and product management, with experience spanning teams focused on channels and marketing, customer retention, customer service, and ecommerce site platforms.
Currently, she leads the strategy for Site Search at Vistaprint, overseeing a dynamic team of software engineers, data engineers, a data product analyst, and data scientists dedicated to enhancing customer and partner experiences. Jill's professional passion lies in metadata development and management, providing solid foundational data to build models and orchestrate customer journeys. Outside of work, she enjoys good books, tasty food, and going on walks. Connect with Jill on LinkedIn. All views and opinions expressed are those of the individuals and do not necessarily reflect their employers or anyone else. Join the conversation on LinkedIn. Apply to be a guest or nominate someone that you know. Do you love what you're listening to? Please rate and review the podcast, and share it with fellow practitioners you know. Your support helps us reach more listeners and continue providing valuable insights!

Building an end to end data strategy for analytics and generative AI | AWS Events

In this session, Rick Sears, General Manager of Amazon Athena, EMR, and Lake Formation at AWS, explores how generative AI is revolutionizing businesses and the critical role data plays in this transformation. He discusses the evolution of AI models and the importance of a comprehensive data management strategy encompassing availability, quality, and protection of data.

Mark Greville, Vice President of Architecture at Workhuman, shares insights from Workhuman's journey in building a robust cloud-based data strategy, emphasizing the significance of storytelling, demonstrating value, and gaining executive support.

Kamal Sampathkumar, Senior Manager of Data Architecture at Workhuman, delves into the technical aspects, detailing the architecture of Workhuman's data platform and showcasing solutions like Data API and self-service reporting that deliver substantial value to customers.

Learn more at: https://go.aws/3x2mha0

Learn more about AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world’s most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

#AWSEvents #awsaianddataconference #generativeaiconference #genaiconference #genaievent #AWSgenerativeai #AWSgenai

Summary
Generative AI has rapidly gained adoption for numerous use cases. To support those applications, organizational data platforms need to add new features and data teams have increased responsibility. In this episode Lior Gavish, co-founder of Monte Carlo, discusses the various ways that data teams are evolving to support AI-powered features and how they are incorporating AI into their work.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and Doordash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
Your host is Tobias Macey and today I'm interviewing Lior Gavish about the impact of AI on data engineers.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by clarifying what we are discussing when we say "AI"?
- Previous generations of machine learning (e.g. deep learning, reinforcement learning, etc.) required new features in the data platform. What new demands is the current generation of AI introducing?
- Generative AI also has the potential to be incorporated in the creation/execution of data pipelines. What are the risk/reward tradeoffs that you have seen in practice?
- What are the areas where LLMs have proven useful/effective in data engineering?
- Vector embeddings have rapidly become a ubiquitous data format as a result of the growth in retrieval augmented generation (RAG) for AI applications. What are the end-to-end operational requirements to support this use case effectively?
- As with all data, the reliability and quality of the vectors will impact the viability of the AI application. What are the different failure modes/quality metrics/error conditions that they are subject to?
- As much as vectors, vector databases, RAG, etc. seem exotic and new, it is all ultimately shades of the same work that we have been doing for years. What are the areas of overlap in the work required for running the current generation of AI, and what are the areas where it diverges?
- What new skills do data teams need to acquire to be effective in supporting AI applications?
- What are the most interesting, innovative, or unexpected ways that you have seen AI impact data engineering teams?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working with the current generation of AI?
- When is AI the wrong choice?
- What are your predictions for the future impact of AI on data engineering teams?

Contact Info: LinkedIn
Parting Question: From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links: Monte Carlo, Podcast Episode, NLP == Natural Language Processing, Large Language Models, Generative AI, MLOps, ML Engineer, Feature Store, Retrieval Augmented Generation (RAG), Langchain

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
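The vector failure modes the interview raises, such as wrong dimensionality, corrupt values, and degenerate embeddings, lend themselves to simple sanity checks before vectors are loaded into a store. A minimal sketch, where the expected dimension and the norm threshold are illustrative assumptions rather than values from any specific tool:

```python
import math

# Hypothetical pre-load validation for embedding vectors in a RAG
# pipeline: catch dimension mismatches, non-finite components (NaN/inf),
# and near-zero-norm vectors that would break similarity search.

EXPECTED_DIM = 3  # assumed for the example; real models use 384, 1536, etc.

def validate_embedding(vec, expected_dim=EXPECTED_DIM):
    """Return "ok" or a short label for the first failure mode found."""
    if len(vec) != expected_dim:
        return "dimension mismatch"
    if any(not math.isfinite(x) for x in vec):
        return "non-finite component"
    if math.sqrt(sum(x * x for x in vec)) < 1e-9:
        return "zero-norm vector"
    return "ok"

print(validate_embedding([0.1, 0.2, 0.3]))         # ok
print(validate_embedding([0.0, 0.0, 0.0]))         # zero-norm vector
print(validate_embedding([0.1, float("nan"), 0]))  # non-finite component
```

This mirrors the episode's point that vector quality is "shades of the same work": the checks are ordinary data quality rules, just applied to a newer data format.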

The Data Product Management In Action podcast, brought to you by Soda and executive producer Scott Hirleman, is a platform for data product management practitioners to share insights and experiences. In Season 01, Episode 08, host Frannie Helforoush (Senior Digital Product Manager at RBC Global Asset Management) and guest Corrin Shlomo Goldenberg (Product Manager at Chainlink Labs), a practitioner with experience in both software product management and data product management, focus their conversation on building data platforms and teams from scratch. Their conversation also explores team dynamics and structure in product and data management. Takeaways include insights into KPIs and metrics to measure success, effective team structures, collaboration strategies, and the integration of diverse expertise. About our host Frannie Helforoush: Frannie's journey began as a software engineer and evolved into a strategic product manager. Now, as a data product manager, she leverages her expertise in both fields to create impactful solutions. Frannie thrives on making data accessible and actionable, driving product innovation, and ensuring product thinking is integral to data management. Connect with Frannie on LinkedIn. About our guest Corrin Shlomo Goldenberg: Corrin is a data-driven product manager with nearly two decades of experience in the tech industry, which fuels the creation of impactful data products that make a real difference. Corrin is passionate about uncovering insights across diverse fields. Connect with Corrin on LinkedIn. All views and opinions expressed are those of the individuals and do not necessarily reflect their employers or anyone else. Join the conversation on LinkedIn. Apply to be a guest or nominate someone that you know. Do you love what you're listening to? Please rate and review the podcast, and share it with fellow practitioners you know. Your support helps us reach more listeners and continue providing valuable insights!

Summary
In this episode Praveen Gujar, Director of Product at LinkedIn, talks about the intricacies of product management for data and analytical platforms. Praveen shares his journey from Amazon to Twitter and now LinkedIn, highlighting his extensive experience in building data products and platforms, digital advertising, AI, and cloud services. He discusses the evolving role of product managers in data-centric environments, emphasizing the importance of clean, reliable, and compliant data. Praveen also delves into the challenges of building scalable data platforms, the need for organizational and cultural alignment, and the critical role of product managers in bridging the gap between engineering and business teams. He provides insights into the complexities of platformization, the significance of long-term planning, and the necessity of having a strong relationship with engineering teams. The episode concludes with Praveen offering advice for aspiring product managers and discussing the future of data management in the context of AI and regulatory compliance.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and Doordash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
Your host is Tobias Macey and today I'm interviewing Praveen Gujar about product management for data and analytical platforms.

Interview
- Introduction
- How did you get involved in the area of data management?
- Product management is typically thought of as being oriented toward customer-facing functionality and features. What is involved in being a product manager for data systems?
- Many data-oriented products that are customer facing require substantial technical capacity to serve those use cases. How does that influence the process of determining what features to provide/create?
  - investment in technical capacity/platforms
  - identifying groupings of features that can be served by a common platform investment
  - managing organizational pressures between engineering, product, business, finance, etc.
- What are the most interesting, innovative, or unexpected ways that you have seen "Data Products & Platforms @ Big-tech" used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on "Building Data Products & Platforms for Big-tech"?
- When is "Data Products & Platforms @ Big-tech" the wrong choice?
- What do you have planned for the future of "Data Products & Platforms @ Big-tech"?

Contact Info: LinkedIn, Website
Parting Question: From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links: DataHub, Podcast Episode, RAG == Retrieval Augmented Generation

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Databricks Customers at Data + AI Summit

At this year's event, over 250 customers shared their data and AI journeys. They showcased a wide variety of use cases, best practices, and lessons from their leadership and innovation with the latest data and AI technologies.

See how enterprises are leveraging generative AI in their data operations and how innovative data management and data governance are fueling organizations as they race to develop GenAI applications. https://www.databricks.com/blog/how-real-world-enterprises-are-leveraging-generative-ai

To see more real-world use cases and customer success stories, visit: https://www.databricks.com/customers

Dive into a world where data meets sustainability. In this episode, Jason Foster sits down with Simon Leesley, the COO of Too Good to Go, the revolutionary marketplace on a mission to end food waste. Discover Simon's inspiring journey, the mission and vision of Too Good to Go and how the company works with 165,000 partner stores across 17 markets. Join the conversation and discover data's pivotal role in optimising the user experience, balancing supply and demand, and fostering a culture of intellectual curiosity. Will their data-driven approach be the recipe for saving our planet? Tune in to find out!


Cynozure is a leading data, analytics and AI company that helps organisations to reach their data potential. They work with clients on data and AI strategy, data management, data architecture and engineering, analytics and AI, data culture and literacy, and change management and leadership. The company was named one of The Sunday Times' fastest-growing private companies in 2022 and 2023 and named the Best Place to Work in Data by DataIQ in 2023. For more information, visit www.cynozure.com.

Summary
Postgres is one of the most widely respected and liked database engines ever. To make it even easier for developers to use, Nikita Shamgunov decided to make it serverless, so that it can scale from zero to infinity. In this episode he explains the engineering involved to make that possible, as well as the numerous details that he and his team are packing into the Neon service to make it even more attractive for anyone who wants to build on top of Postgres.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and Doordash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
Your host is Tobias Macey and today I'm interviewing Nikita Shamgunov about his work on making Postgres a serverless database at Neon.

Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Neon is and the story behind it?
- The ecosystem around Postgres is large and varied. What are the pain points that you are trying to address with Neon?
- What does it mean for a database to be serverless?
- What kinds of products and services are unlocked by making Postgres a serverless database?
- How does your vision for Neon compare/contrast with what you know of PlanetScale?
- Postgres is known for having a large ecosystem of plugins that add a lot of interesting and useful features, but the storage layer has not been as easily extensible historically. How have architectural changes in recent Postgres releases enabled your work on Neon?
- What are the core pieces of engineering that you have had to complete to make Neon possible?
- How have the design and goals of the project evolved since you first started working on it?
- The separation of storage and compute is one of the most fundamental promises of the cloud. What new capabilities does that enable in Postgres?
- How does the branching functionality change the ways that development teams are able to deliver and debug features?
- Because the storage is now a networked system, what new performance/latency challenges does that introduce? How have you addressed them in Neon?
- Anyone who has ever operated a Postgres instance has had to tackle the upgrade process. How does Neon address that process for end users?
- The rampant growth of AI has touched almost every aspect of computing, and Postgres is no exception. How does the introduction of pgvector and semantic/similarity search functionality impact the adoption and usage patterns of Postgres/Neon?
- What new challenges does that introduce for you as an operator and business owner?
- What are the lessons that you learned from MemSQL/SingleStore that have been most helpful in your work at Neon?
- What are the most interesting, innovative, or unexpected ways that you have seen Neon used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Neon?
- When is Neon the wrong choice? Postgres?
- What do you have planned for the future of Neon?

Contact Info: @nikitabase on Twitter, LinkedIn
Parting Question: From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links: Neon, PostgreSQL, Neon Github, PHP, MySQL, SQL Server, SingleStore, Podcast Episode, AWS Aurora, Khosla Ventures, YugabyteDB, Podcast Episode, CockroachDB, Podcast Episode, PlanetScale, Podcast Episode, Clickhouse, Podcast Episode, DuckDB, Podcast Episode, WAL == Write-Ahead Log, PgBouncer, PureStorage, Paxos, HNSW Index, IVF Flat Index, RAG == Retrieval Augmented Generation, AlloyDB, Neon Serverless Driver, Devin, magic.dev

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Beginning Mathematica and Wolfram for Data Science: Applications in Data Analysis, Machine Learning, and Neural Networks

Enhance your data science programming and analysis with the Wolfram Language and Mathematica, an applied mathematical tools suite. This second edition introduces the latest Wolfram LLM capabilities, delves into the exploration of data types in Mathematica, covers key programming concepts, and includes code performance and debugging techniques for code optimization. You'll gain a deeper understanding of data science from a theoretical and practical perspective using Mathematica and the Wolfram Language. Learning this language makes your data science code better because it is intuitive and comes with pre-existing functions, providing a welcoming experience for those who use other programming languages. Existing topics have been reorganized for better context and to accommodate the introduction of Notebook styles. The book also incorporates new functionality in versions 13 and 14 for imported and exported data. You'll see how to use Mathematica wherever data management and mathematical computations are needed. Along the way, you'll appreciate how Mathematica provides an entirely integrated platform: its symbolic and numerical calculations result in a mixed syntax, allowing it to carry out various processes without superfluous lines of code. You'll learn to use its notebooks as a standard format, which also serves to create detailed reports of the processes carried out.

What You Will Learn:
- Create datasets, work with data frames, and create tables
- Import, export, analyze, and visualize data
- Work with the Wolfram data repository
- Build reports on the analysis
- Use Mathematica for machine learning, with different algorithms, including linear, multiple, and logistic regression; decision trees; and data clustering

Who This Book Is For: Data scientists who are new to using Wolfram and Mathematica as a programming language or tool. Programmers should have some prior programming experience, but can be new to the Wolfram Language.

The Data Product Management In Action podcast, brought to you by Soda and executive producer Scott Hirleman, is a platform for data product management practitioners to share insights and experiences. In Season 01, Episode 06, host Frannie Helforoush (Senior Digital Product Manager at RBC Global Asset Management) and guest Nathan Worrell (Senior Product Manager, Data Analytics at Cortland) explore areas that are crucial to successfully realizing data product management and delivering value. With Nathan's experience and passion, he shares his thoughts on applying product thinking to data products and emphasizes the often-forgotten core soft skills necessary to augment success. They leave no stone unturned as they dive into the detail of product thinking. Nathan provides practical, concrete examples that are easy for anyone to take away and implement, including the strategic use of Generative AI.  About our host Frannie Helforoush: Frannie's journey began as a software engineer and evolved into a strategic product manager. Now, as a data product manager, she leverages her expertise in both fields to create impactful solutions. Frannie thrives on making data accessible and actionable, driving product innovation, and ensuring product thinking is integral to data management. Connect with Frannie on LinkedIn. About our guest Nathan Worrell: Nathan is a dynamic product manager with a passion for AI, data, and process optimization. He has a proven track record of success across multiple industries, leading complex initiatives and building products from the ground up. Nathan thrives on working with diverse teams with the goal of driving businesses to become more data-driven. Connect with Nathan on LinkedIn. All views and opinions expressed are those of the individuals and do not necessarily reflect their employers or anyone else. Join the conversation on LinkedIn. Apply to be a guest or nominate someone that you know.

AI workloads are becoming increasingly complex, with unique requirements around data management, compute scalability, and model lifecycle management. In this session, we will explore the real-world challenges users face when operating AI at scale. Through real-world examples, we will uncover common pitfalls in areas like data versioning, reproducibility, model deployment, and monitoring. Our practical guide will highlight strategies for building robust and scalable AI platforms leveraging Airflow as the orchestration layer and AWS for its extensive AI/ML capabilities. We will showcase how users have tackled these challenges, streamlined their AI workflows, and unlocked new levels of productivity and innovation.
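The session abstract above names data versioning and reproducibility as common pitfalls without prescribing an implementation. As a hedged illustration (not from the session itself; the `version_dataset` helper and in-memory registry are hypothetical), a minimal content-addressed approach to dataset versioning might look like:

```python
import hashlib
import json

def dataset_fingerprint(records):
    """Return a deterministic content hash for a list of records.

    Canonical JSON (sorted keys, fixed separators) ensures the same
    logical data always produces the same version id, which is the
    basis for reproducible training runs.
    """
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]

def version_dataset(registry, name, records):
    """Record a dataset version; reuse the id if content is unchanged."""
    vid = dataset_fingerprint(records)
    registry.setdefault(name, [])
    if vid not in registry[name]:
        registry[name].append(vid)
    return vid

registry = {}
v1 = version_dataset(registry, "training_set", [{"x": 1, "y": 2}])
v2 = version_dataset(registry, "training_set", [{"x": 1, "y": 2}])  # same content
v3 = version_dataset(registry, "training_set", [{"x": 1, "y": 3}])  # changed
print(v1 == v2, v1 == v3, len(registry["training_set"]))  # True False 2
```

In a real platform the registry would live in a metadata store and the orchestrator (e.g., an Airflow task) would pin each run to a specific version id.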

Summary

This episode features an insightful conversation with Petr Janda, the CEO and founder of Synq. Petr shares his journey from being an engineer to founding Synq, emphasizing the importance of treating data systems with the same rigor as engineering systems. He discusses the challenges and solutions in data reliability, including the need for transparency and ownership in data systems. Synq's platform helps data teams manage incidents, understand data dependencies, and ensure data quality by providing insights and automation capabilities. Petr emphasizes the need for a holistic approach to data reliability, integrating data systems into broader business processes. He highlights the role of data teams in modern organizations and how Synq is empowering them to achieve this.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and Doordash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. Your host is Tobias Macey and today I'm interviewing Petr Janda about Synq, a data reliability platform focused on leveling up data teams by supporting a culture of engineering rigor.

Interview

Introduction How did you get involved in the area of data management? Can you describe what Synq is and the story behind it? Data observability/reliability is a category that grew rapidly over the past ~5 years and has several vendors focused on different elements of the problem. What are the capabilities that you saw as lacking in the ecosystem which you are looking to address?

Operational/infrastructure engineers have spent the past decade honing their approach to incident management and uptime commitments. How do those concepts map to the responsibilities and workflows of data teams? Tooling only plays a small part in SLAs and incident management. How does Synq help to support the cultural transformation that is necessary? What does an on-call rotation for a data engineer/data platform engineer look like as compared with an application-focused team? How does the focus on data assets/data products shift your approach to observability as compared to a table/pipeline centric approach? With the focus on sharing ownership beyond the boundaries of the data team there is a strong correlation with data governance principles. How do you see organizations incorporating Synq into their approach to data governance/compliance?

Can you describe how Synq is designed/implemented? How have the scope and goals of the product changed since you first started working on it? For a team who is onboarding onto Synq, what are the steps required to get it integrated into their technology stack and workflows? What are the types of incidents/errors that you are able to identify and alert on? What does a typical incident/error resolution process look like with Synq? What are the most interesting, innovative, or unexpected ways that you have seen Synq used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Synq? When is Synq the wrong choice? What do you have planned for the future of Synq?

Contact Info

LinkedIn Substack

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

Synq Incident Management SLA == Service Level Agreement Data Governance Podcast Episode PagerDuty OpsGenie Clickhouse Podcast Episode dbt Podcast Episode SQLMesh Podcast Episode

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
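The interview questions above contrast operational SLAs with data-team workflows. As a hedged, minimal sketch (not Synq's implementation; the `last_loaded_at` timestamp and 24-hour window are hypothetical), a data freshness SLA check might look like:

```python
from datetime import datetime, timedelta, timezone

def freshness_breach(last_loaded_at, max_age_hours=24):
    """Return True if a table's last load is older than the agreed SLA window."""
    age = datetime.now(timezone.utc) - last_loaded_at
    return age > timedelta(hours=max_age_hours)

# Example: a table loaded 30 hours ago breaches a 24-hour freshness SLA.
stale = datetime.now(timezone.utc) - timedelta(hours=30)
print(freshness_breach(stale))  # True
```

In practice a check like this would feed an alerting/on-call system (e.g., PagerDuty or OpsGenie, both mentioned in the episode links) rather than just printing a result.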

Summary

Data lakehouse architectures have been gaining significant adoption. To accelerate adoption in the enterprise Microsoft has created the Fabric platform, based on their OneLake architecture. In this episode Dipti Borkar shares her experiences working on the product team at Fabric and explains the various use cases for the Fabric service.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and Doordash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. Your host is Tobias Macey and today I'm interviewing Dipti Borkar about her work on Microsoft Fabric and performing analytics on data without

Interview

Introduction How did you get involved in the area of data management? Can you describe what Microsoft Fabric is and the story behind it? Data lakes in various forms have been gaining significant popularity as a unified interface to an organization's analytics. What are the motivating factors that you see for that trend? Microsoft has been investing heavily in open source in recent years, and the Fabric platform relies on several open components. What are the benefits of layering on top of existing technologies rather than building a fully custom solution?

What are the elements of Fabric that were engineered specifically for the service? What are the most interesting/complicated integration challenges?

How has your prior experience with Ahana and Presto informed your current work at Microsoft? AI plays a substantial role in the product. What are the benefits of embedding Copilot into the data engine?

What are the challenges in terms of safety and reliability?

What are the most interesting, innovative, or unexpected ways that you have seen the Fabric platform used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on data lakes generally, and Fabric specifically? When is Fabric the wrong choice? What do you have planned for the future of data lake analytics?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links

Microsoft Fabric Ahana episode DB2 Distributed Spark Presto Azure Data MAD Landscape

Podcast Episode ML Podcast Episode

Tableau dbt Medallion Architecture Microsoft Onelake ORC Parquet Avro Delta Lake Iceberg

Podcast Episode

Hudi

Podcast Episode

Hadoop PowerBI

Podcast Episode

Velox Gluten Apache XTable GraphQL Formula 1 McLaren

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By: Starburst

This episode is brought to you by Starburst - an end-to-end data lakehouse platform for data engineers who are battling to build and scale high quality data pipelines on the data lake. Powered by Trino.

The Data Product Management In Action podcast, brought to you by Soda and executive producer Scott Hirleman, is a platform for data product management practitioners to share insights and experiences. In Season 01, Episode 002, host Frannie Helforoush (Senior Digital Product Manager at RBC Global Asset Management) chats with Deepti Surabattula (Principal Data Product Manager and AI Delivery & Support Workstream Lead at Pfizer). They discuss the importance of user and stakeholder involvement in data product management and effective relationship management. Deepti shares experiences and challenges with different implementation processes and how to enjoy and find reward in creating valuable data products. About our host Frannie Helforoush: Frannie's journey began as a software engineer and evolved into a strategic product manager. Now, as a data product manager, she leverages her expertise in both fields to create impactful solutions. Frannie thrives on making data accessible and actionable, driving product innovation, and ensuring product thinking is integral to data management. Connect with Frannie on LinkedIn.

About our guest Deepti Surabattula: Deepti is a product leader with a strong engineering background. She has proven success across Life Sciences, Aerospace, and Medical Devices, leading AI, data, and regulatory-compliant products from inception to delivery. Deepti is an expert in regulatory guidelines for data integrity and product compliance (21 CFR part 11, GDPR, MHRA, ICH, EMA) and is passionate about strategy, technology innovation, and quality solutions to improve human lives. Connect with Deepti on LinkedIn. All views and opinions expressed are those of the individuals and do not necessarily reflect their employers or anyone else. Join the conversation on LinkedIn.  

The Data Product Management In Action podcast, brought to you by Soda and executive producer Scott Hirleman, is a platform for data product management practitioners to share insights and experiences. In Season 01, Episode 001, hosts Frannie Helforoush (Senior Digital Product Manager at RBC Global Asset Management) and Michael Toland (Product Management Coach and Consultant with Pathfinder Product) introduce themselves and dive into the challenges, joys, and potential of data product management. About our host Frannie Helforoush: Frannie's journey began as a software engineer and evolved into a strategic product manager. Now, as a data product manager, she leverages her expertise in both fields to create impactful solutions. Frannie thrives on making data accessible and actionable, driving product innovation, and ensuring product thinking is integral to data management. Connect with Frannie on LinkedIn.

About our host Michael Toland: Michael is a Product Management Coach and Consultant with Pathfinder Product, a Test Double Operation. Since 2016, Michael has worked on large-scale system modernizations and migration initiatives at Verizon. Outside his professional career, Michael serves as the Treasurer for the New Leaders Council, mentors with Venture for America, sings with the Columbus Symphony, and writes satire for his blog Dignified Product. He is excited to discuss data product management with the podcast audience. Connect with Michael on LinkedIn. All views and opinions expressed are those of the individuals and do not necessarily reflect their employers or anyone else. Join the conversation on LinkedIn.  

How can data drive change and innovation in a traditional and global sport? Join host Jason Foster for a fascinating conversation with Thomas Musson, the head of data at The R&A, the governing body for golf. They discuss Thomas's career journey from sales to data, the challenges and opportunities he faced along the way, and the skills and qualities of a strategic data leader. They also explore the role of data in the governance and growth of golf, the projects and initiatives that The R&A is involved in, and the value of aligning data work with the organisation's values and vision.


Cynozure is a leading data, analytics and AI company that helps organisations to reach their data potential. They work with clients on data and AI strategy, data management, data architecture and engineering, analytics and AI, data culture and literacy, and change management and leadership. The company was named one of The Sunday Times' fastest-growing private companies in 2022 and 2023, and was named the Best Place to Work in Data by DataIQ in 2023. For more information, visit www.cynozure.com. Check out our free AI Scorecard and we'll send you a personalised report that outlines what's needed to drive innovation in your business and be competitive.