Motherduck

Duck, duck, "deploy": Building an AI-ready app in 2 hours

2025-11-04 · Small Data SF 2025

workshop

by Russ Garner (Omni) , Becca Bruggman (Omni)

AI/ML Analytics API Data Modelling Omni

Start with a dataset in Motherduck and build a production-ready analytics app using Omni’s semantic model and APIs. We’ll cover practical data modeling techniques, share lessons learned from building AI features, and walk through how to give AI the context it needs to answer questions accurately. You’ll leave with a working app and the skills to build your next one.

From Parsing Nightmares to "Production": Any Unstructured Input → JSON → MotherDuck in Seconds

2025-11-04 · Small Data SF 2025

workshop

by Upal Saha (bem)

API CSV HTML JSON

Every sprint consumed by fixing parsers is a sprint spent not shipping product- brittle parsing kills velocity. This workshop is about retiring that cycle so you can move from messy, unstructured inputs to production-ready data in seconds. bem ingests and transforms any unstructured input at any volume — PDFs, emails, Excel, Word, CSV, text, JSON, images (PNG, JPEG, HEIC, HEIF, WebP), HTML, and audio (WAV, MP3, M4A) — into clean JSON instantly via API. With primitives like Transform, Join, Split, Route, and Analyze, you define the exact workflow your product needs. Built-in Evals measure + enforce accuracy automatically so quality doesn’t drop as you scale. Flow outputs straight into MotherDuck so you can go from chaos to query without manual cleanup — and your team can focus on shipping, not scraping.

From Zero to "Query": Building Your First Serverless Lakehouse with DuckLake

2025-11-04 · Small Data SF 2025

workshop

by Jacob Matson (MotherDuck)

Big Data Cloud Computing Data Lakehouse SQL

The lakehouse promised to unify our data, but popular formats can feel bloated and hard to use for most real-world workloads. If you've ever felt that the complexity and operational overhead of "Big Data" tools are overkill, you're not alone. What if your lakehouse could be simple, fast, and maybe even a little fun? Enter DuckLake , the native lakehouse format, managed on MotherDuck. It delivers the powerful features you need like ACID transactions, time travel, and schema evolution without the heavyweight baggage. This approach truly makes massive data sets feel like Small Data. This workshop is a practical, step-by-step walkthrough for the data practitioner. We'll get straight to the point and show you how to build a fully functional, serverless lakehouse from scratch. You will learn: The Architecture: We’ll explore how DuckLake's design choices make it fundamentally simpler and faster for analytical queries compared to its JVM-based cousins. The Workflow: Through hands-on examples, you'll create a DuckLake table, perform atomic updates, and use time travel—all with the simple SQL you already know. The MotherDuck Advantage: Discover how the serverless platform makes it easy to manage, share, and query your DuckLake tables, enabling a seamless hybrid workflow between your laptop and the cloud.

Just-in-Time Insights with "Estuary": Real-Time Data Streaming Made Simple

2025-11-04 · Small Data SF 2025

workshop

by Zulfikar Qureshi (Estuary)

Data Streaming

Gain a clear understanding of Estuary and its role in real-time data integration. The session will begin with an overview of the platform and how it works, then move into the distinctive advantages that set Estuary apart in today’s data landscape. From there, you’ll explore practical use cases that demonstrate how organizations are leveraging real-time data to drive meaningful outcomes. We’ll close by examining why Estuary has become the leading choice for loading data into MotherDuck, highlighting the speed, reliability, and simplicity it delivers. Gain hands-on experience with Estuary by completing a guided lab exercise: Setting up a source connection and capturing data in real time. Configuring a MotherDuck connection and materializing the data. Moving live, streaming data end-to-end.

DuckLake: Making BIG DATA feel small

2025-10-14 · dbt Coalesce 2025 Watch

talk

by Jacob Matson (MotherDuck) , Alex Monahan (MotherDuck)

Big Data Data Lakehouse dbt

Easy, fast, and scalable: pick 3. MotherDuck’s managed DuckLake data lakehouse blends the cost efficiency, scale, and openness of a lakehouse with the speed of a warehouse for truly joyful dbt pipelines. We will show you how!

Berlin PyData 2025 Conference Interviews

2025-09-26 · DataTalks.Club Listen

podcast_episode

by Yashasvi Misra (Pure Storage) , Igor Kvachenok (Leuphana University of Lüneburg) , Selim Nowicki (Distill Labs) , Mehdi Ouazza (MotherDuck) , Gülsah Durmaz

AI/ML Analytics Data Engineering Data Science dbt DuckDB Kubernetes LLM MLOps Python

At PyData Berlin, community members and industry voices highlighted how AI and data tooling are evolving across knowledge graphs, MLOps, small-model fine-tuning, explainability, and developer advocacy.

Igor Kvachenok (Leuphana University / ProKube) combined knowledge graphs with LLMs for structured data extraction in the polymer industry, and noted how MLOps is shifting toward LLM-focused workflows.
Selim Nowicki (Distill Labs) introduced a platform that uses knowledge distillation to fine-tune smaller models efficiently, making model specialization faster and more accessible.
Gülsah Durmaz (Architect & Developer) shared her transition from architecture to coding, creating Python tools for design automation and volunteering with PyData through PyLadies.
Yashasvi Misra (Pure Storage) spoke on explainable AI, stressing accountability and compliance, and shared her perspective as both a data engineer and active Python community organizer.
Mehdi Ouazza (MotherDuck) reflected on developer advocacy through video, workshops, and branding, showing how creative communication boosts adoption of open-source tools like DuckDB.

Igor Kvachenok Master’s student in Data Science at Leuphana University of Lüneburg, writing a thesis on LLM-enhanced data extraction for the polymer industry. Builds RDF knowledge graphs from semi-structured documents and works at ProKube on MLOps platforms powered by Kubeflow and Kubernetes.

Connect: https://www.linkedin.com/in/igor-kvachenok/

Selim Nowicki Founder of Distill Labs, a startup making small-model fine-tuning simple and fast with knowledge distillation. Previously led data teams at Berlin startups like Delivery Hero, Trade Republic, and Tier Mobility. Sees parallels between today’s ML tooling and dbt’s impact on analytics.

Connect: https://www.linkedin.com/in/selim-nowicki/

Gülsah Durmaz Architect turned developer, creating Python-based tools for architectural design automation with Rhino and Grasshopper. Active in PyLadies and a volunteer at PyData Berlin, she values the community for networking and learning, and aims to bring ML into architecture workflows.

Connect: https://www.linkedin.com/in/gulsah-durmaz/

Yashasvi (Yashi) Misra Data Engineer at Pure Storage, community organizer with PyLadies India, PyCon India, and Women Techmakers. Advocates for inclusive spaces in tech and speaks on explainable AI, bridging her day-to-day in data engineering with her passion for ethical ML.

Connect: https://www.linkedin.com/in/misrayashasvi/

Mehdi Ouazza Developer Advocate at MotherDuck, formerly a data engineer, now focused on building community and education around DuckDB. Runs popular YouTube channels ("mehdio DataTV" and "MotherDuck") and delivered a hands-on workshop at PyData Berlin. Blends technical clarity with creative storytelling.

Connect: https://www.linkedin.com/in/mehd-io/

Duck Lake: Simplifying the Lakehouse Ecosystem

2025-09-10 · Data Engineering Podcast Listen

podcast_episode

by Mark Raasveldt (DuckDB) , Hannes Mühleisen (DuckDB Labs) , Tobias Macey

AI/ML Flink Data Engineering Data Lakehouse Data Management Datafold Delta DuckDB ETL/ELT Iceberg Lance Prefect +4 more

Summary In this episode of the Data Engineering Podcast Hannes Mühleisen and Mark Raasveldt, the creators of DuckDB, share their work on Duck Lake, a new entrant in the open lakehouse ecosystem. They discuss how Duck Lake, is focused on simplicity, flexibility, and offers a unified catalog and table format compared to other lakehouse formats like Iceberg and Delta. Hannes and Mark share insights into how Duck Lake revolutionizes data architecture by enabling local-first data processing, simplifying deployment of lakehouse solutions, and offering benefits such as encryption features, data inlining, and integration with existing ecosystems.

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data managementData teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. Your host is Tobias Macey and today I'm interviewing Hannes Mühleisen and Mark Raasveldt about DuckLake, the latest entrant into the open lakehouse ecosystemInterview IntroductionHow did you get involved in the area of data management?Can you describe what DuckLake is and the story behind it?What are the particular problems that DuckLake is solving for?How does this compare to the capabilities of MotherDuck?Iceberg and Delta already have a well established ecosystem, but so does DuckDB. Who are the primary personas that you are trying to focus on in these early days of DuckLake?One of the major factors driving the adoption of formats like Iceberg is cost efficiency for large volumes of data. That brings with it challenges of large batch processing of data. How does DuckLake account for these axes of scale?There is also a substantial investment in the ecosystem of technologies that support Iceberg. The most notable ecosystem challenge for DuckDB and DuckLake is in the query layer. How are you thinking about the evolution and growth of that capability beyond DuckDB (e.g. support in Trino/Spark/Flink)?What are your opinions on the viability of a future where DuckLake and Iceberg become a unified standard and implementation? (why can't Iceberg REST catalog implementations just use DuckLake under the hood?)Digging into the specifics of the specification and implementation, what are some of the capabilities that it offers above and beyond Iceberg?Is it now possible to enforce PK/FK constraints, indexing on underlying data?Given that DuckDB has a vector type, how do you think about the support for vector storage/indexing?How do the capabilities of DuckLake and the integration with DuckDB change the ways that data teams design their data architecture and access patterns?What are your thoughts on the impact of "data gravity" in today's data ecosystem, with engines like DuckDB, KuzuDB, LanceDB, etc. available for embedded and edge use cases?What are the most interesting, innovative, or unexpected ways that you have seen DuckLake used?What are the most interesting, unexpected, or challenging lessons that you have learned while working on DuckLake?When is DuckLake the wrong choice?What do you have planned for the future of DuckLake?Contact Info HannesWebsiteMarkWebsiteParting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.Links DuckDBPodcast EpisodeDuckLakeDuckDB LabsMySQLCWIMonetDBIcebergIceberg REST CatalogDeltaHudiLanceDuckDB Iceberg ConnectorACID == Atomicity, Consistency, Isolation, DurabilityMotherDuckMotherDuck Managed DuckLakeTrinoSparkPrestoSpark DuckLake DemoDelta KernelArrowdltS3 TablesAttribute Based Access Control (ABAC)ParquetArrow FlightHadoopHDFSDuckLake RoadmapThe intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Hamilton Ulmer - Instant SQL with DuckDB/MotherDuck - Practical Data Lunch and Learn

2025-05-30 · The Joe Reis Show Listen

podcast_episode

by Hamilton Ulmer , Joe Reis (DeepLearning.AI)

DuckDB SQL

Imagine writing SQL and getting instant results as you type? Yes, this is reality now. It's amazing!DuckDB/MotherDuck's Instant SQL made a big splash at last month's Data Council. Hamilton Ulmer gives a demo of Instant SQL at the Practical Data Community.----------------------------Instant SQL: https://motherduck.com/blog/introducing-instant-sql/Practical Data Community Discord: https://discord.gg/gNfw5AKWSK

Where Data Science Meets Shrek: How BuzzFeed uses AI

2025-01-21 · Small Data SF 2024 Watch

video

by Gilad Cohen (BuzzFeed)

AI/ML Big Data Data Science LLM

By introducing a range of AI-enhanced products that amplify creativity and interactivity across our platforms, Buzzfeed has been able to connect with the largest global audience of young people online to cement its role as the defining digital media company of the AI era. Notably, some of Buzzfeed's most successful tools and content experiences thrive on the power of small, focused datasets. Still wondering how Shrek fits into the picture? You'll have to watch!

Video from: https://smalldatasf.com/

📓 Resources Big Data is Dead: https://motherduck.com/blog/big-data-... Small Data Manifesto: https://motherduck.com/blog/small-dat... Why Small Data?: https://benn.substack.com/p/is-excel-... Small Data SF: https://www.smalldatasf.com/

➡️ Follow Us LinkedIn: / motherduck
X/Twitter : / motherduck
Bluesky: motherduck.com Blog: https://motherduck.com/blog/

Discover how BuzzFeed's Data team, led by Gilad Cohen, harnesses AI for creative purposes, leveraging large language models (LLMs) and generative image capabilities to enhance content creation. This video explores how machine learning teams build tools to create new interactive media experiences, focusing on augmenting creative workflows rather than replacing jobs, allowing readers to participate more deeply in the content they consume.

We dive into the core data science problem of understanding what a piece of content is about, a crucial step for improving content recommendation systems. Learn why traditional methods fall short and how the team is constantly seeking smaller, faster, and more performant models. This exploration covers the evolution from earlier architectures like DistilBERT to modern, more efficient approaches for better content representation, clustering, and user personalization.

A key technique explored is the use of text embeddings, which are dense, low-dimensional vector representations of data. This video provides an accessible explanation of embeddings as a form of compressed knowledge, showing how BuzzFeed creates a unique vector for each article. This allows for simple vector math to find semantically similar content, forming a foundational infrastructure for powerful ranking and recommender systems.

Explore how BuzzFeed leverages generative image capabilities to create new interactive formats. The journey began with Midjourney experiments and evolved to building custom tools by fine-tuning a Stable Diffusion XL model using LORA (Low-Rank Approximation). This advanced technique provides greater control over image output, enabling the rapid creation of viral AI generators that respond to trending topics and allow for massive user engagement.

Finally, see a practical application of machine learning for content optimization. BuzzFeed uses its vast historical dataset from Bayesian A/B testing to train a model that predicts headline performance. By generating multiple headline candidates with an LLM like Claude and running them through this predictive model, they can identify the winning headline. This showcases how to use unique, in-house data to build powerful tools that improve click-through rates and drive engagement, pointing to a significant transformation in how media is created and consumed.

The intersection of UI, exploratory data analysis, and SQL (w/ Hamilton Ulmer)

2024-12-22 · The Analytics Engineering Podcast Listen

podcast_episode

by Tristan Handy (dbt Labs) , Hamilton Ulmer

Analytics Analytics Engineering DataViz dbt SQL

Hamilton Ulmer is working at the intersection of UI, Exploratory Data Analysis, and SQL at MotherDuck, and he's built a long career in EDA. Hamilton and Tristan dive deep into the history of exploratory data analysis. Even if you spend most of your time below the frontend layer of the stack, it is important to understand the trends in both the practice of data visualization and the technologies that underlie that practice. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

Think Inside the Box: Constraints Drive Data Warehousing Innovation

2024-11-24 · Small Data SF 2024 Watch

video

by Lindsay Murphy (Secoda)

Big Data Cloud Computing Data Engineering DuckDB DWH Modern Data Stack

As a Head of Data or a one-person data team, keeping the lights on for the business while running all things data-related as efficiently as possible is no small feat. This talk will focus on tactics and strategies to manage within and around constraints, including monetary costs, time and resources, and data volumes.

📓 Resources Big Data is Dead: https://motherduck.com/blog/big-data-... Small Data Manifesto: https://motherduck.com/blog/small-dat... Why Small Data?: https://benn.substack.com/p/is-excel-... Small Data SF: https://www.smalldatasf.com/

➡️ Follow Us LinkedIn: / motherduck
X/Twitter : / motherduck
Blog: https://motherduck.com/blog/

Learn how your data team can drive innovation and maximize ROI by embracing constraints, drawing inspiration from SpaceX's revolutionary cost-effective approach. This video challenges the "abundance mindset" prevalent in the modern data stack, where easily scalable cloud data warehouses and a surplus of tools often lead to unmanageable data models and underutilized dashboards. We explore a focused data strategy for extracting maximum value from small data, shifting the paradigm from "more data" to more impact.

To maximize value, data teams must move beyond being order-takers and practice strategic stakeholder management. Discover how to use frameworks like the stakeholder engagement matrix to prioritize high-impact business leaders and align your work with core business goals. This involves speaking the language of business growth models, not technical jargon about data pipelines or orchestration, ensuring your data engineering efforts resonate with key decision-makers and directly contribute to revenue-generating activities.

Embracing constraints is key to innovation and effective data project management. We introduce the Iron Triangle—a fundamental engineering concept balancing scope, cost, and time—as a powerful tool for planning data projects and having transparent conversations with the business. By treating constraints not as limitations but as opportunities, data engineers and analysts can deliver higher-quality data products without succumbing to scope creep or uncontrolled costs.

A critical component of this strategy is understanding the Total Cost of Ownership (TCO), which goes far beyond initial compute costs to include ongoing maintenance, downtime, and the risk of vendor pricing changes. Learn how modern, efficient tools like DuckDB and MotherDuck are designed for cost containment from the ground up, enabling teams to build scalable, cost-effective data platforms. By making the true cost of data requests visible, you can foster accountability and make smarter architectural choices. Ultimately, this guide provides a blueprint for resisting data stack bloat and turning cost and constraints into your greatest assets for innovation.

Is BI Too Big for Small Data?

2024-11-17 · Small Data SF 2024 Watch

video

by David Wheeler (Mode)

Analytics BI Big Data Data Analytics Modern Data Stack Power BI Tableau

This is a talk about how we thought we had Big Data, and we built everything planning for Big Data, but then it turns out we didn't have Big Data, and while that's nice and fun and seems more chill, it's actually ruining everything, and I am here asking you to please help us figure out what we are supposed to do now.

📓 Resources Big Data is Dead: https://motherduck.com/blog/big-data-... Small Data Manifesto: https://motherduck.com/blog/small-dat... Is Excel Immortal?: https://benn.substack.com/p/is-excel-immortal Small Data SF: https://www.smalldatasf.com/

➡️ Follow Us LinkedIn: / motherduck
X/Twitter : / motherduck
Blog: https://motherduck.com/blog/

Mode founder David Wheeler challenges the data industry's obsession with "big data," arguing that most companies are actually working with "small data," and our tools are failing us. This talk deconstructs the common sales narrative for BI tools, exposing why the promise of finding game-changing insights through data exploration often falls flat. If you've ever built dashboards nobody uses or wondered why your analytics platform doesn't deliver on its promises, this is a must-watch reality check on the modern data stack.

We explore the standard BI demo, where an analyst uncovers a critical insight by drilling into event data. This story sells tools like Tableau and Power BI, but it rarely reflects reality, leading to a "revolving door of BI" as companies swap tools every few years. Discover why the narrative of the intrepid analyst finding a needle in the haystack only works in movies and how this disconnect creates a cycle of failed data initiatives and unused "trashboards."

The presentation traces our belief that "data is the new oil" back to the early 2010s, with examples from Target's predictive analytics and Facebook's growth hacking. However, these successes were built on truly massive datasets. For most businesses, analyzing small data results in noisy charts that offer vague "directional vibes" rather than clear, actionable insights. We contrast the promise of big data analytics with the practical challenges of small data interpretation.

Finally, learn actionable strategies for extracting real value from the data you actually have. We argue that BI tools should shift focus from data exploration to data interpretation, helping users understand what their charts actually mean. Learn why "doing things that don't scale," like manually analyzing individual customer journeys, can be more effective than complex models for small datasets. This talk offers a new perspective for data scientists, analysts, and developers looking for better data analysis techniques beyond the big data hype.

Big Data is Dead: Long Live Hot Data 🔥

2024-11-15 · Small Data SF 2024 Watch

video

by Jordan Tigani (MotherDuck)

Analytics Big Data BigQuery Cloud Computing Data Analytics Data Engineering DuckDB DWH Hadoop Snowflake

Over the last decade, Big Data was everywhere. Let's set the record straight on what is and isn't Big Data. We have been consumed by a conversation about data volumes when we should focus more on the immediate task at hand: Simplifying our work.

Some of us may have Big Data, but our quest to derive insights from it is measured in small slices of work that fit on your laptop or in your hand. Easy data is here— let's make the most of it.

📓 Resources Big Data is Dead: https://motherduck.com/blog/big-data-is-dead/ Small Data Manifesto: https://motherduck.com/blog/small-data-manifesto/ Small Data SF: https://www.smalldatasf.com/

➡️ Follow Us LinkedIn: https://linkedin.com/company/motherduck X/Twitter : https://twitter.com/motherduck Blog: https://motherduck.com/blog/

Explore the "Small Data" movement, a counter-narrative to the prevailing big data conference hype. This talk challenges the assumption that data scale is the most important feature of every workload, defining big data as any dataset too large for a single machine. We'll unpack why this distinction is crucial for modern data engineering and analytics, setting the stage for a new perspective on data architecture.

Delve into the history of big data systems, starting with the non-linear hardware costs that plagued early data practitioners. Discover how Google's foundational papers on GFS, MapReduce, and Bigtable led to the creation of Hadoop, fundamentally changing how we scale data processing. We'll break down the "big data tax"—the inherent latency and system complexity overhead required for distributed systems to function, a critical concept for anyone evaluating data platforms.

Learn about the architectural cornerstone of the modern cloud data warehouse: the separation of storage and compute. This design, popularized by systems like Snowflake and Google BigQuery, allows storage to scale almost infinitely while compute resources are provisioned on-demand. Understand how this model paved the way for massive data lakes but also introduced new complexities and cost considerations that are often overlooked.

We examine the cracks appearing in the big data paradigm, especially for OLAP workloads. While systems like Snowflake are still dominant, the rise of powerful alternatives like DuckDB signals a shift. We reveal the hidden costs of big data analytics, exemplified by a petabyte-scale query costing nearly $6,000, and argue that for most use cases, it's too expensive to run computations over massive datasets.

The key to efficient data processing isn't your total data size, but the size of your "hot data" or working set. This talk argues that the revenge of the single node is here, as modern hardware can often handle the actual data queried without the overhead of the big data tax. This is a crucial optimization technique for reducing cost and improving performance in any data warehouse.

Discover the core principles for designing systems in a post-big data world. We'll show that since only 1 in 500 users run true big data queries, prioritizing simplicity over premature scaling is key. For low latency, process data close to the user with tools like DuckDB and SQLite. This local-first approach offers a compelling alternative to cloud-centric models, enabling faster, more cost-effective, and innovative data architectures.

Coalesce 2024: Simplify your dbt data pipelines with serverless DuckDB

2024-10-17 · Dbt Coalesce 2024 Watch

video

by Alex Monahan (MotherDuck)

Analytics Cloud Computing dbt DuckDB

Discover how to cut complexity of your dbt data pipelines with serverless DuckDB while improving performance and drastically reducing costs. This session covers practical strategies for cutting complexity and expenses in data flows while enjoying a more ergonomic and frictionless workflow. Learn how adopting a DuckDB-based architecture can streamline your operations, enhance developer experience, and boost efficiency.

Speaker: Alex Monahan Forward Deployed Software Engineer MotherDuck

Read the blog to learn about the latest dbt Cloud features announced at Coalesce, designed to help organizations embrace analytics best practices at scale https://www.getdbt.com/blog/coalesce-2024-product-announcements

#252 Is Big Data Dead? MotherDuck and the Small Data Manifesto with Ryan Boyd Co-Founder at MotherDuck

2024-10-14 · DataFramed Listen

podcast_episode

by Ryan Boyd (Databricks) , Richie (DataCamp)

AI/ML Big Data Data Engineering Data Science Databricks Marketing Neo4j SQL

Businesses are collecting more data than ever before. But is bigger always better? Many companies are starting to question whether massive datasets and complex infrastructure are truly delivering results or just adding unnecessary costs and complications. How can you make sure your data strategy is aligned with your actual needs? What if focusing on smaller, more manageable datasets could improve your efficiency and save resources, all while delivering the same insights? Ryan Boyd is the Co-Founder & VP, Marketing + DevRel at MotherDuck. Ryan started his career as a software engineer, but since has led DevRel teams for 15+ years at Google, Databricks and Neo4j, where he developed and executed numerous marketing and DevRel programs. Prior to MotherDuck, Ryan worked at Databricks and focussed the team on building an online community during the pandemic, helping to organize the content and experience for an online Data + AI Summit, establishing a regular cadence of video and blog content, launching the Databricks Beacons ambassador program, improving the time to an “aha” moment in the online trial and launching a University Alliance program to help professors teach the latest in data science, machine learning and data engineering. In the episode, Richie and Ryan explore data growth and computation, the data 1%, the small data movement, data storage and usage, the shift to local and hybrid computing, modern data tools, the challenges of big data, transactional vs analytical databases, SQL language enhancements, simple and ergonomic data solutions and much more. Links Mentioned in the Show: MotherDuckThe Small Data ManifestoConnect with RyanSmall DataSF conferenceRelated Episode: Effective Data Engineering with Liya Aizenberg, Director of Data Engineering at AwayRewatch sessions from RADAR: AI Edition New to DataCamp? Learn on the go using the DataCamp mobile appEmpower your business with world-class data and AI skills with DataCamp for business

Jordan Tigani - Why Small Data is Awesome, DuckDB, and More

2024-09-05 · The Joe Reis Show Listen

podcast_episode

by Jordan Tigani (MotherDuck) , Joe Reis (DeepLearning.AI)

AI/ML DuckDB

Jordan Tigani is back to chat about why small data is awesome, data lakehouses, DuckDB, AI, and much more.

Motherduck: https://motherduck.com/

LinkedIn: https://www.linkedin.com/in/jordantigani/

Twitter: https://twitter.com/jrdntgn?lang=en

DuckDB in Action

2024-08-21 · O'Reilly Data Science Books O'Reilly Amazon

book

by Michael Simons , Mark Needham , Michael Hunger

Analytics API Big Data Cloud Computing CSV Data Analytics DuckDB DWH Java JSON Neo4j Pandas +8 more

Dive into DuckDB and start processing gigabytes of data with ease—all with no data warehouse. DuckDB is a cutting-edge SQL database that makes it incredibly easy to analyze big data sets right from your laptop. In DuckDB in Action you’ll learn everything you need to know to get the most out of this awesome tool, keep your data secure on prem, and save you hundreds on your cloud bill. From data ingestion to advanced data pipelines, you’ll learn everything you need to get the most out of DuckDB—all through hands-on examples. Open up DuckDB in Action and learn how to: Read and process data from CSV, JSON and Parquet sources both locally and remote Write analytical SQL queries, including aggregations, common table expressions, window functions, special types of joins, and pivot tables Use DuckDB from Python, both with SQL and its "Relational"-API, interacting with databases but also data frames Prepare, ingest and query large datasets Build cloud data pipelines Extend DuckDB with custom functionality Pragmatic and comprehensive, DuckDB in Action introduces the DuckDB database and shows you how to use it to solve common data workflow problems. You won’t need to read through pages of documentation—you’ll learn as you work. Get to grips with DuckDB's unique SQL dialect, learning to seamlessly load, prepare, and analyze data using SQL queries. Extend DuckDB with both Python and built-in tools such as MotherDuck, and gain practical insights into building robust and automated data pipelines. About the Technology DuckDB makes data analytics fast and fun! You don’t need to set up a Spark or run a cloud data warehouse just to process a few hundred gigabytes of data. DuckDB is easily embeddable in any data analytics application, runs on a laptop, and processes data from almost any source, including JSON, CSV, Parquet, SQLite and Postgres. About the Book DuckDB in Action guides you example-by-example from setup, through your first SQL query, to advanced topics like building data pipelines and embedding DuckDB as a local data store for a Streamlit web app. You’ll explore DuckDB’s handy SQL extensions, get to grips with aggregation, analysis, and data without persistence, and use Python to customize DuckDB. A hands-on project accompanies each new topic, so you can see DuckDB in action. What's Inside Prepare, ingest and query large datasets Build cloud data pipelines Extend DuckDB with custom functionality Fast-paced SQL recap: From simple queries to advanced analytics About the Reader For data pros comfortable with Python and CLI tools. About the Authors Mark Needham is a blogger and video creator at @‌LearnDataWithMark. Michael Hunger leads product innovation for the Neo4j graph database. Michael Simons is a Java Champion, author, and Engineer at Neo4j. Quotes I use DuckDB every day, and I still learned a lot about how DuckDB makes things that are hard in most databases easy! - Jordan Tigani, Founder, MotherDuck An excellent resource! Unlocks possibilities for storing, processing, analyzing, and summarizing data at the edge using DuckDB. - Pramod Sadalage, Director, Thoughtworks Clear and accessible. A comprehensive resource for harnessing the power of DuckDB for both novices and experienced professionals. - Qiusheng Wu, Associate Professor, University of Tennessee Excellent! The book all we ducklings have been waiting for! - Gunnar Morling, Decodable

#221 [Radar Recap] The Future of Programming: Accelerating Coding Workflows with LLMs

2024-07-02 · DataFramed Listen

podcast_episode

by Ryan J. Salva (GitHub) , Jordan Tigani (MotherDuck) , Michele Catasta (Replit)

AI/ML Data Science GitHub LLM

From data science to software engineering, Large Language Models (LLMs) have emerged as pivotal tools in shaping the future of programming. In this session, Michele Catasta, VP of AI at Replit, Jordan Tigani, CEO at Motherduck, and Ryan J. Salva, VP of Product at GitHub, will explore practical applications of LLMs in coding workflows, how to best approach integrating AI into the workflows of data teams, what the future holds for AI-assisted coding, and a lot more. Links Mentioned in the Show: Rewatch Session from RADAR: AI Edition New to DataCamp? Learn on the go using the DataCamp mobile appEmpower your business with world-class data and AI skills with DataCamp for business

#48 How Can We Define DevRel in the Tech World? Tech Insights with Mehdi Ouazza

2024-05-02 · DataTopics: All Things Data, AI & Tech Listen

podcast_episode

by Mehdi Ouazza (MotherDuck)

AI/ML Data Engineering Databricks DuckDB GenAI GitHub IBM LLM Terraform

Send us a text Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society. Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style!

In this episode, we're thrilled to have special guest Mehdi Ouazza diving into a plethora of hot tech topics: Mehdi Ouazza's Insights into his career, online community and working with DuckDB and MotherDuck.Demystifying DevRel: Definitions and distinctions in the realm of tech influence (dive deeper here).Terraform's Licensing Shift: Reactions to HashiCorp's recent changes and its new IBM collaboration, more details here.Github Copilot Workspace: Exploring the latest in AI-powered coding assistance, comparing with devin.ai and CodySnowflake's Arctic LLM: Discussing the latest enterprise AI capabilities and their real-world applications. Read more about Arctic - what it excels at, and how its performance was measuredMore legal kerfuffle in the GenAI realm: The ongoing legal debates around AI's use in creative industries, highlighted by a dispute over Drake’s use of late rapper Tupac’s AI-generated voice in diss track & the licensing deal between Financial Times and OpenAIFuture of Data Engineering: Examining the integration of LLMs into data engineering tools. Insights on prompt-based feature engineering and Databricks' English SDKAI in Music Creation: A little bonus with an AI generated song about Murilo, created with Suno

Panel discussion: Fixing the data eng lifecycle - Coalesce 2023

2023-10-27 · dbt Coalesce 2023 Watch

video

by Sung Won Chung (Datafold) , Mehdi Ouazza (MotherDuck) , Matt Housley (Halfpipe Systems)

Analytics Data Analytics Data Engineering Datafold Modern Data Stack

As Joe Reis recently opined, if you want to know what’s next in data engineering, just look at the software engineer. The MDS-in-a-box pattern has been a game changer for applying software engineering principles to local data development– improving the ability to share data, collaborate on modeling work and data analysis the same way we build and share open source tooling.

This panel brings together experts in data engineering, data analytics and software engineering to explore the current state of the pattern, pieces that remain missing today and how emerging tools and data engineering testing capabilities can refine the transition from local development to production workflows.

Speakers: Matt Housley, CTO, Halfpipe Systems; Mehdi Ouazza, Developer Advocate, MotherDuck; Sung Won Chung, Solutions Engineer, Datafold; Louise de Leyritz, Host, The Data Couch podcast

Register for Coalesce at https://coalesce.getdbt.com

talk-data.com

Activity Trend

Top Events

Top Speakers

Duck, duck, "deploy": Building an AI-ready app in 2 hours

From Parsing Nightmares to "Production": Any Unstructured Input → JSON → MotherDuck in Seconds

From Zero to "Query": Building Your First Serverless Lakehouse with DuckLake

Just-in-Time Insights with "Estuary": Real-Time Data Streaming Made Simple

DuckLake: Making BIG DATA feel small

Berlin PyData 2025 Conference Interviews

Duck Lake: Simplifying the Lakehouse Ecosystem

Hamilton Ulmer - Instant SQL with DuckDB/MotherDuck - Practical Data Lunch and Learn

Where Data Science Meets Shrek: How BuzzFeed uses AI

The intersection of UI, exploratory data analysis, and SQL (w/ Hamilton Ulmer)

Think Inside the Box: Constraints Drive Data Warehousing Innovation

Is BI Too Big for Small Data?

Big Data is Dead: Long Live Hot Data 🔥

Coalesce 2024: Simplify your dbt data pipelines with serverless DuckDB

#252 Is Big Data Dead? MotherDuck and the Small Data Manifesto with Ryan Boyd Co-Founder at MotherDuck

Jordan Tigani - Why Small Data is Awesome, DuckDB, and More

DuckDB in Action

#221 [Radar Recap] The Future of Programming: Accelerating Coding Workflows with LLMs

#48 How Can We Define DevRel in the Tech World? Tech Insights with Mehdi Ouazza

Panel discussion: Fixing the data eng lifecycle - Coalesce 2023