talk-data.com talk-data.com

Topic

DuckDB

embedded_database analytics olap

98

tagged

Activity Trend

13 peak/qtr
2020-Q1 2026-Q1

Activities

98 activities · Newest first

It’s time for another episode of Data Engineering Central Podcast, our third one! Topics in this episode … * Should you use DuckDB or Polars? * Small Engineering Changes (PR Reviews) * Daft vs Spark on Databricks with Unity Catalog (Delta Lake) * Primary and Foreign keys in the Lake House Enjoy!

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit dataengineeringcentral.substack.com/subscribe

Summary In this episode of the Data Engineering Podcast, Adrian Broderieux and Marcin Rudolph, co-founders of DLT Hub, delve into the principles guiding DLT's development, emphasizing its role as a library rather than a platform, and its integration with lakehouse architectures and AI application frameworks. The episode explores the impact of the Python ecosystem's growth on DLT, highlighting integrations with high-performance libraries and the benefits of Arrow and DuckDB. The episode concludes with a discussion on the future of DLT, including plans for a portable data lake and the importance of interoperability in data management tools. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementImagine catching data issues before they snowball into bigger problems. That’s what Datafold’s new Monitors do. With automatic monitoring for cross-database data diffs, schema changes, key metrics, and custom data tests, you can catch discrepancies and anomalies in real time, right at the source. Whether it’s maintaining data integrity or preventing costly mistakes, Datafold Monitors give you the visibility and control you need to keep your entire data stack running smoothly. Want to stop issues before they hit production? Learn more at dataengineeringpodcast.com/datafold today!Your host is Tobias Macey and today I'm interviewing Adrian Brudaru and Marcin Rudolf, cofounders at dltHub, about the growth of dlt and the numerous ways that you can use it to address the complexities of data integrationInterview IntroductionHow did you get involved in the area of data management?Can you describe what dlt is and how it has evolved since we last spoke (September 2023)?What are the core principles that guide your work on dlt and dlthub?You have taken a very opinionated stance against managed extract/load services. What are the shortcomings of those platforms, and when would you argue in their favor?The landscape of data movement has undergone some interesting changes over the past year. Most notably, the growth of PyAirbyte and the rapid shifts around the needs of generative AI stacks (vector stores, unstructured data processing, etc.). How has that informed your product development and positioning?The Python ecosystem, and in particular data-oriented Python, has also undergone substantial evolution. What are the developments in the libraries and frameworks that you have been able to benefit from?What are some of the notable investments that you have made in the developer experience for building dlt pipelines?How have the interfaces for source/destination development improved?You recently published a post about the idea of a portable data lake. What are the missing pieces that would make that possible, and what are the developments/technologies that put that idea within reach?What is your strategy for building a sustainable product on top of dlt?How does that strategy help to form a "virtuous cycle" of improving the open source foundation?What are the most interesting, innovative, or unexpected ways that you have seen dlt used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on dlt?When is dlt the wrong choice?What do you have planned for the future of dlt/dlthub?Contact Info AdrianLinkedInMarcinLinkedInParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links dltPodcast EpisodePyArrowPolarsIbisDuckDBPodcast Episodedlt Data ContractsRAG == Retrieval Augmented GenerationAI Engineering Podcast EpisodePyAirbyteOpenAI o1 ModelLanceDBQDrant EmbeddedAirflowGitHub ActionsArrow DataFusionApache ArrowPyIcebergDelta-RSSCD2 == Slowly Changing DimensionsSQLAlchemySQLGlotFSSpecPydanticSpacyEntity RecognitionParquet File FormatPython DecoratorREST API ToolkitOpenAPI Connector GeneratorConnectorXPython no-GILDelta LakePodcast EpisodeSQLMeshPodcast EpisodeHamiltonTabularPostHogPodcast.init EpisodeAsyncIOCursor.AIData MeshPodcast EpisodeFastAPILangChainGraphRAGAI Engineering Podcast EpisodeProperty GraphPython uvThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

In-Memory Analytics with Apache Arrow - Second Edition

Dive into efficient data handling with 'In-Memory Analytics with Apache Arrow.' This book explores Apache Arrow, a powerful open-source project that revolutionizes how tabular and hierarchical data are processed. You'll learn to streamline data pipelines, accelerate analysis, and utilize high-performance tools for data exchange. What this Book will help me do Understand and utilize the Apache Arrow in-memory data format for your data analysis needs. Implement efficient and high-speed data pipelines using Arrow subprojects like Flight SQL and Acero. Enhance integration and performance in analysis workflows by using tools like Parquet and Snowflake with Arrow. Master chaining and reusing computations across languages and environments with Arrow's cross-language support. Apply in real-world scenarios by integrating Apache Arrow with analytics systems like Dremio and DuckDB. Author(s) Matthew Topol, the author of this book, brings 15 years of technical expertise in the realm of data processing and analysis. Having worked across various environments and languages, Matthew offers insights into optimizing workflows using Apache Arrow. His approachable writing style ensures that complex topics are comprehensible. Who is it for? This book is tailored for developers, data engineers, and data scientists eager to enhance their analytic toolset. Whether you're a beginner or have experience in data analysis, you'll find the concepts actionable and transformative. If you are curious about improving the performance and capabilities of your analytic pipelines or tools, this book is for you.

Send us a text Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. We dive into conversations smoother than your morning coffee (but let’s be honest, just as caffeinated) where industry insights meet light-hearted banter. Whether you’re a data wizard or just curious about the digital chaos around us, kick back and get ready to talk shop—unplugged style! In this episode: Farewell Pandas, Hello Future: Pandas is out, and Ibis is in. We're talking faster, smarter data processing—featuring the rise of DuckDB and the powerhouse that is Polars. Is this the end of an era for Pandas?UV vs. Rye: Forget pip—are these new Python package managers built in Rust the future? We break down UV, Rye, and what it all means for your next Python project.AI-Generated Podcasts: Is AI about to take over your favorite podcasts? We explore the potential of Google’s Notebook LM to transform content into audio gold.When AI Steals Your Voice: Jeff Geerling’s voice gets cloned by AI—without his consent. We dive into the wild world of voice cloning, the ethics, and the future of AI-generated media.Hacking AI with Prompt Injection: Could you outsmart AI? We share some wild strategies from the game Gandalf that challenge your prompt injection skills and teach you how to jailbreak even the toughest guardrails.Jony Ive’s New Gadget Rumor: Is Jony Ive plotting an Apple killer? Rumors are swirling about a new AI-powered handheld device that could shake up the smartphone market.Zero-Downtime Deployments with Kamal Proxy: No more downtime! We geek out over Kamal Proxy, the sleek HTTP tool designed for effortless Docker deployments.Function Calling and LLMs: Get ready for the next evolution in AI—function calling. We discuss its rise in LLMs and dive into the Gorilla project, the leaderboard testing the future of smart APIs.

DuckDB in Action

Dive into DuckDB and start processing gigabytes of data with ease—all with no data warehouse. DuckDB is a cutting-edge SQL database that makes it incredibly easy to analyze big data sets right from your laptop. In DuckDB in Action you’ll learn everything you need to know to get the most out of this awesome tool, keep your data secure on prem, and save you hundreds on your cloud bill. From data ingestion to advanced data pipelines, you’ll learn everything you need to get the most out of DuckDB—all through hands-on examples. Open up DuckDB in Action and learn how to: Read and process data from CSV, JSON and Parquet sources both locally and remote Write analytical SQL queries, including aggregations, common table expressions, window functions, special types of joins, and pivot tables Use DuckDB from Python, both with SQL and its "Relational"-API, interacting with databases but also data frames Prepare, ingest and query large datasets Build cloud data pipelines Extend DuckDB with custom functionality Pragmatic and comprehensive, DuckDB in Action introduces the DuckDB database and shows you how to use it to solve common data workflow problems. You won’t need to read through pages of documentation—you’ll learn as you work. Get to grips with DuckDB's unique SQL dialect, learning to seamlessly load, prepare, and analyze data using SQL queries. Extend DuckDB with both Python and built-in tools such as MotherDuck, and gain practical insights into building robust and automated data pipelines. About the Technology DuckDB makes data analytics fast and fun! You don’t need to set up a Spark or run a cloud data warehouse just to process a few hundred gigabytes of data. DuckDB is easily embeddable in any data analytics application, runs on a laptop, and processes data from almost any source, including JSON, CSV, Parquet, SQLite and Postgres. About the Book DuckDB in Action guides you example-by-example from setup, through your first SQL query, to advanced topics like building data pipelines and embedding DuckDB as a local data store for a Streamlit web app. You’ll explore DuckDB’s handy SQL extensions, get to grips with aggregation, analysis, and data without persistence, and use Python to customize DuckDB. A hands-on project accompanies each new topic, so you can see DuckDB in action. What's Inside Prepare, ingest and query large datasets Build cloud data pipelines Extend DuckDB with custom functionality Fast-paced SQL recap: From simple queries to advanced analytics About the Reader For data pros comfortable with Python and CLI tools. About the Authors Mark Needham is a blogger and video creator at @‌LearnDataWithMark. Michael Hunger leads product innovation for the Neo4j graph database. Michael Simons is a Java Champion, author, and Engineer at Neo4j. Quotes I use DuckDB every day, and I still learned a lot about how DuckDB makes things that are hard in most databases easy! - Jordan Tigani, Founder, MotherDuck An excellent resource! Unlocks possibilities for storing, processing, analyzing, and summarizing data at the edge using DuckDB. - Pramod Sadalage, Director, Thoughtworks Clear and accessible. A comprehensive resource for harnessing the power of DuckDB for both novices and experienced professionals. - Qiusheng Wu, Associate Professor, University of Tennessee Excellent! The book all we ducklings have been waiting for! - Gunnar Morling, Decodable

Send us a text Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style! Bookmarklet Maker: Discover how to automate tasks with the Bookmarklet Maker, a tool for turning scripts into handy browser bookmarks. RouteLLM Framework: Explore the RouteLLM framework by LMSys and Anyscale, designed to optimize the cost-performance ratio of LLM routers. Learn more about this collaboration at LMSys and Anyscale. Q for SQL on CSV/TSV: Meet Q, a command-line tool that lets you run SQL queries directly on CSV or TSV files, simplifying data exploration from your terminal. DuckDB Community Extensions: Check out the latest updates in DuckDB's community extensions and see how this database system is evolving. Apple Intelligence and AI Maximalism: Explore Apple's AI strategy, their avoidance of chat UIs, risk management with OpenAI, and the shift of compute costs to users. Being Glue: Delve into the challenges of being "Glue" at work. Explore why women are more likely to take on non-promotable work and how this affects career progression and workplace dynamics.

Getting Started with DuckDB

Unlock the full potential of DuckDB with 'Getting Started with DuckDB,' your guide to mastering data analysis efficiently. By reading this book, you'll discover how to load, transform, and query data using DuckDB, leveraging its unique capabilities for processing large datasets. Gain hands-on experience with SQL, Python, and R to enhance your data science and engineering workflows. What this Book will help me do Effectively load and manage various types of data in DuckDB for seamless processing. Gain hands-on experience writing and optimizing SQL queries tailored for analytical tasks. Integrate DuckDB capabilities into Python and R workflows for streamlined data analysis. Understand DuckDB's optimizations and extensions for specialized data applications. Explore the broader ecosystem of data tools that complement DuckDB's capabilities. Author(s) Simon Aubury and Ned Letcher are seasoned experts in the field of data analytics and engineering. With extensive experience in using both SQL and programming languages like Python and R, they bring practical insights into the innovative uses of DuckDB. They have designed this book to provide a hands-on and approachable way to learn DuckDB, making complex concepts accessible. Who is it for? This book is well-suited for data analysts aiming to accelerate their data analysis workflows, data engineers looking for effective tools for data processing, and data scientists searching for a versatile library for scalable data manipulation. Prior exposure to SQL and programming in Python or R will be beneficial for readers to maximize their learning.

The Evolution of Delta Lake from Data + AI Summit 2024

Shant Hovsepian, Chief Technology Officer of Data Warehousing at Databricks explains why Delta Lake is the most adopted open lakehouse format.

Includes: - Delta Lake UniForm GA (support for and compatibility with Hudi, Apache Iceberg, Delta) - Delta Lake Liquid Clustering - Delta Lake production-ready catalog (Iceberg REST API) - The growth and strength of the Delta ecosystem - Delta Kernel - DuckDB integration with Delta - Delta 4.0

Announcing DuckDB Support for Delta Lake and a DuckDB Extension to Unity Catalog - Hannes Mühleisen

Hannes Mühleisen of DuckDB Labs addressed an audience of thousands during his keynote address at Data + AI Summit 2024 in San Francisco. Mühleisen announced DuckDB support for Delta Lake, a new DuckDB Extension to Unity Catalog, and Community Extensions.

Speaker: Hannes Mühleisen, Creator of DuckDB, DuckDB Labs @duckdb @duckdb3282

Data + AI Summit 2024 - Keynote Day 2 - Full
video
by Bilal Aslam (Databricks) , Yejin Choi (University of Washington; AI2) , Darshana Sivakumar (Databricks) , Ryan Blue (Tabular) , Zeashan Pappa (Databricks) , Ali Ghodsi (Databricks) , Reynold Xin (Databricks) , Matei Zaharia (Databricks) , Hannes Mühleisen (DuckDB Labs) , Alexander Booth (Texas Rangers Baseball Club) , Tareef Kawaf (Posit Sofware, PBC)

Speakers: - Alexander Booth, Asst Director of Research & Development, Texas Rangers - Ali Ghodsi, Co-Founder and CEO, Databricks - Bilal Aslam, Sr. Director of Product Management, Databricks - Darshana Sivakumar, Staff Product Manager, Databricks - Hannes Mühleisen, Creator of DuckDB, DuckDB Labs - Matei Zaharia, Chief Technology Officer and Co-Founder, Databricks - Reynold Xin, Chief Architect and Co-Founder, Databricks - Ryan Blue, CEO, Tabular - Tareef Kawaf, President, Posit Software, PBC - Yejin Choi, Sr Research Director Commonsense AI, AI2, University of Washington - Zeashan Pappa, Staff Product Manager, Databricks

About Databricks Databricks is the Data and AI company. More than 10,000 organizations worldwide — including Block, Comcast, Conde Nast, Rivian, and Shell, and over 60% of the Fortune 500 — rely on the Databricks Data Intelligence Platform to take control of their data and put it to work with AI. Databricks is headquartered in San Francisco, with offices around the globe, and was founded by the original creators of Lakehouse, Apache Spark™, Delta Lake and MLflow.

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data… Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Send us a text Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style!

In this episode, we're thrilled to have special guest Mehdi Ouazza  diving into a plethora of hot tech topics: Mehdi Ouazza's Insights into his career, online community and working with DuckDB and MotherDuck.Demystifying DevRel: Definitions and distinctions in the realm of tech influence (dive deeper here).Terraform's Licensing Shift: Reactions to HashiCorp's recent changes and its new IBM collaboration, more details here.Github Copilot Workspace: Exploring the latest in AI-powered coding assistance, comparing with devin.ai and CodySnowflake's Arctic LLM: Discussing the latest enterprise AI capabilities and their real-world applications. Read more about Arctic - what it excels at, and how its performance was measuredMore legal kerfuffle in the GenAI realm: The ongoing legal debates around AI's use in creative industries, highlighted by a dispute over Drake’s use of late rapper Tupac’s AI-generated voice in diss track & the licensing deal between Financial Times and OpenAIFuture of Data Engineering: Examining the integration of LLMs into data engineering tools. Insights on prompt-based feature engineering and Databricks' English SDKAI in Music Creation: A little bonus with an AI generated song about Murilo, created with Suno

Hybrid query execution: What is a database client, anyway? - Coalesce 2023

Running a full-fledged analytical database inside the client opens up new ways of executing your query; you can run parts of your query locally and part remotely. Once you can split the query plan into two pieces, the same mechanism works with N stages, which can be in series or a tree. This talk discusses the hybrid execution system based on DuckDB built at MotherDuck, but also discusses some further query topologies that are enabled by this pattern.

Speaker: Jordan Tigani, Co-Founder & Chief Duck-Herder, MotherDuck

Register for Coalesce at https://coalesce.getdbt.com

If a Duck Quacks in the Forest and Everyone Hears, Should You Care?

YES! "Duck posting" has become an internet meme for praising DuckDB on Twitter. Nearly every quack using DuckDB has done it once or twice. But, why all the fuss? With advances in CPUs, memory, SSDs, and the software that enables it all, our personal machines are powerful beasts relegated to handling a few Chrome tabs and sitting 90% idle. As data engineers and data analysts, this seems like a waste that's not only expensive, but also impacting the environment.

In this session, you will see how DuckDB brings SQL analytics capabilities to a 2MB standalone executable on your laptop that only recently required a large cluster. This session will explain the architecture of DuckDB that enables high performance analytics on a laptop: great query optimization, vectorized execution, continuous improvements in compression and more. We will show its capabilities using live demos, from the pandas library to WASM, to the command-line. We'll demonstrate performance on large datasets, and talk about how we're exploring using the laptop to augment cloud analytics workloads.

Talk by: Ryan Boyd

Here’s more to explore: Why the Data Lakehouse Is Your next Data Warehouse: https://dbricks.co/3Pt5unq Lakehouse Fundamentals Training: https://dbricks.co/44ancQs

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Ryan Boyd and I chat about the evolution and future of databases, the pendulum between single-server and distributed computing, DuckDB and Motherduck, and much more.

We also talk about developer relations, which I consider Ryan as one of the OG's in the field.

Note - this was recorded the week of Databricks Summit 2023.


If you like this show, give it a 5-star rating on your favorite podcast platform.

Purchase Fundamentals of Data Engineering at your favorite bookseller.

Subscribe to my Substack: https://joereis.substack.com/

Data Warehouses are Gilded Cages  What Comes Next | Motherduck

ABOUT THE TALK: If you squint, the data warehouse-centric analytics stack can look a lot like mainframe-era centralized control: shared distributed compute, some sandbox space if you are lucky, and no way to work locally that's equivalent. And it's been getting expensive! Yet the major vendors appear to be converging on features. Is this "The End of History" for OLAP? Your laptop is incredibly powerful and tragically underutilized. It has better I/O than servers from just a few years back, and with a solid analytic engine like DuckDB, it can handle surprising workloads.

In this talk Nicholas Ursa will share what makes DuckDB fast and show how far single node scale-up can take you. He will also show some ideas Motherduck is testing that blur the line between local and remote systems and shift some of that centralized control back to users.

ABOUT THIS SPEAKER: Nicholas Ursa is an Engineer at Motherduck. He led data engineering teams at The New York Times and better.com after cutting his teeth in ad tech.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Make sure to subscribe to our channel for the most up-to-date talks from technical professionals on data related topics including data infrastructure, data engineering, ML systems, analytics and AI from top startups and tech companies.

FOLLOW DATA COUNCIL: Twitter: https://twitter.com/DataCouncilAI LinkedIn: https://www.linkedin.com/company/datacouncil-ai/

Big Data is Dead | MotherDuck

This talk will make the case that the era of Big Data is over. Now we can stop worrying about data size and focus on how we’re going to use it to make better decisions.

The data behind the graphs shown in this talk come from Jordan Tigani having analyzed query logs, deal post-mortems, benchmark results (published and unpublished), customer support tickets, customer conversations, service logs, and published blog posts, plus a bit of intuition.

ABOUT THE SPEAKER: Jordan Tigani is co-founder and chief duck-herder at MotherDuck, a startup building a serverless analytics platform based on DuckDB. He helped create Google BigQuery, wrote two books on it, and led first the engineering team then the product team through its first $1B or so in revenue.

👉 Sign up for our “No BS” Newsletter to get the latest technical data & AI content: https://datacouncil.ai/newsletter

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Make sure to subscribe to our channel for the most up-to-date talks from technical professionals on data related topics including data infrastructure, data engineering, ML systems, analytics and AI from top startups and tech companies.

FOLLOW DATA COUNCIL: Twitter: https://twitter.com/DataCouncilAI LinkedIn: https://www.linkedin.com/company/datacouncil-ai/

Summary

The data ecosystem has been building momentum for several years now. As a venture capital investor Matt Turck has been trying to keep track of the main trends and has compiled his findings into the MAD (ML, AI, and Data) landscape reports each year. In this episode he shares his experiences building those reports and the perspective he has gained from the exercise.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management Businesses that adapt well to change grow 3 times faster than the industry average. As your business adapts, so should your data. RudderStack Transformations lets you customize your event data in real-time with your own JavaScript or Python code. Join The RudderStack Transformation Challenge today for a chance to win a $1,000 cash prize just by submitting a Transformation to the open-source RudderStack Transformation library. Visit dataengineeringpodcast.com/rudderstack today to learn more Your host is Tobias Macey and today I'm interviewing Matt Turck about his annual report on the Machine Learning, AI, & Data landscape and the insights around data infrastructure that he has gained in the process

Interview

Introduction How did you get involved in the area of data management? Can you describe what the MAD landscape report is and the story behind it?

At a high level, what is your goal in the compilation and maintenance of your landscape document? What are your guidelines for what to include in the landscape?

As the data landscape matures, how have you seen that influence the types of projects/companies that are founded?

What are the product categories that were only viable when capital was plentiful and easy to obtain? What are the product categories that you think will be swallowed by adjacent concerns, and which are likely to consolidate to remain competitive?

The rapid growth and proliferation of data tools helped establish the "Modern Data Stack" as a de-facto architectural paradigm. As we move into this phase of contraction, what are your predictions for how the "Modern Data Stack" will evolve?

Is there a different architectural paradigm that you see as growing to take its place?

How has your presentation and the types of information that you collate in the MAD landscape evolved since you first started it?~~ What are the most interesting, innovative, or unexpected product and positioning approaches that you have seen while tracking data infrastructure as a VC and maintainer of the MAD landscape? What are the most interesting, unexpected, or challenging lessons that you have learned while working on the MAD landscape over the years? What do you have planned for future iterations of the MAD landscape?

Contact Info

Website @mattturck on Twitter MAD Landscape Comments Email

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected]) with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers

Links

MAD Landscape First Mark Capital Bayesian Learning AI Winter Databricks Cloud Native Landscape LUMA Scape Hadoop Ecosystem Modern Data Stack Reverse ETL Generative AI dbt Transform

Podcast Episode

Snowflake IPO Dataiku Iceberg

Podcast Episode

Hudi

Podcast Episode

DuckDB

Podcast Episode

Trino Y42

Podcast Episode

Mozart Data

Podcast Episode

Keboola MPP Database

The intro and outro music is f