talk-data.com

Topic: dbt (data build tool)

Tags: data_transformation, analytics_engineering, sql

758 activities tagged

Activity Trend: 134 peak/qtr (2020-Q1 to 2026-Q1)

Activities

758 activities · Newest first

Operational AI for the Modern Data Stack

The opportunities for AI and machine learning are everywhere in modern businesses, but today's MLOps ecosystem is drowning in complexity. In this talk, we'll show how to use dbt and Continual to scale operational AI — from customer churn predictions to inventory forecasts — without complex engineering or operational burden.

Check the slides here: https://docs.google.com/presentation/d/1vNcQxCjAK4xZVZC1ZHzqBzPiJE7uwhDIVWGeT9Poi1U/edit#slide=id.g15b1f544dd5_0_1500

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Streaming with dbt: the Jaffle Shop don’t stop!

In between JVM languages, high-maintenance frameworks and academic papers, streaming remains a hard beast to tame for most of us. What if nothing had to change, and streaming just meant…still writing dbt models? At Materialize, we’re exploring how to make the most of dbt for streaming — from real-time analytics to continuous testing, and beyond! Join us to learn how to get started with no blood, sweat or tears, using the Jaffle Shop as a playground. Our toolbox? A database that feels like Postgres but works like all the streaming systems you’ve been avoiding, some SQL and a dash of magic.

Check the slides here: https://docs.google.com/presentation/d/11PANQElVxtzqgzmRCcQfZy24vdMeYDokpxr7LdlrbrE/edit#slide=id.g105b4fffa32_0_942


Testing: Our assertions vs. reality

Testing data models is sometimes like trying to form a bust from clay made of cornstarch and water. Right when you think you've got it into the right shape and set it on a shelf, it completely melts into a puddle of mush. Our practice of testing transformations on top of shifting, changing data falls apart in the same way over and over again, yet we don't learn our lesson. Come learn from Mariah Rogers (Palmetto) why we're doing model testing wrong, how we can change our ways to do it better, and what problems will be essential for the dbt Community to solve together to bridge the gap.

Check the slides here: https://docs.google.com/presentation/d/1oTWnOJxCSRN7ihgI-SflQCBkA7cwmcpGvryOh1vWKoc/edit?usp=sharing


When the Real World Messes with Your Schedule: Event-Driven dbt Models for the MDS

The real world is unreliable. Planes take off late, trains leave early, and cars break down. Sometimes, we need to get data from a source without a standard connector. Sometimes, a schedule really doesn't cut it. In this talk, we'll build a pipeline that responds to events to ensure that data is delivered quickly and reliably. We'll also ensure it can handle failure and keep bad data from clogging the plumbing.

Check the slides here: https://docs.google.com/presentation/d/1W9p7H4l0fUr7iAJ3GxEGUTmWGtmc_iu02N-MKb2BSFM/edit?usp=sharing


Why rent when you can own? Build your modern data lakehouse with true optionality

With Trino (formerly PrestoSQL) and dbt combined, you get faster access to your data and the ability to analyze data across multiple data sources with ease. Extract, load, and transform data in your data lakehouse more easily than ever before using dbt's Trino adapter. Join Brian Zhan and Tom Nats as they talk about the new dbt connector for Trino and how it works, along with a demo showing how easy it is to deploy, build, and serve up analytics using dbt and Starburst Galaxy.

Check the slides here: https://docs.google.com/presentation/d/1-A-mfc1RIj87ypz6KeZvxK62QLaGthmMqBPy10vNnDk/edit?usp=sharing


Workshop: Advanced Testing

Do you want to take your dbt project beyond simple unique and not-null tests, but don’t know where to start? Join the dbt Labs team for a deep dive into testing. You’ll learn how to customize tests to fit your unique needs, lean on the amazing dbt community for pre-built tests you can add straight to your project, and flex your Jinja skills by creating your own custom tests. By the end of this course you’ll be walking tall knowing that the data you’re providing to your customers is clean, reliable, and consistent.

Check the slides here: https://docs.google.com/presentation/d/1TCehN5TxHYIuE6gk3rCGx1f9kLkkcXM7TnfcDejUnqo/edit?usp=sharing


Workshop: Build your first dbt Python model

dbt now supports Python models! In this hands-on workshop you'll learn how to build your first Python models in dbt, working alongside the SQL at the center of your transformations.

You’ll learn how to:
- Build your Python transformation in a notebook
- Add this transformation as a model in your dbt project
- Decide between building models in SQL or in Python

Prerequisites:
- Basic familiarity with Python and DataFrames
- If you want to use your own Warehouse and dbt project, make sure that you have dbt 1.3 installed and have followed the “additional setup” from our docs
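As a taste of what the workshop covers, here is a minimal sketch of the shape of a dbt Python model (dbt 1.3+): a `model(dbt, session)` function that reads upstream models via `dbt.ref()` and returns the transformed result. The transformation logic and the `FakeDbt` stub below are hypothetical, dependency-free stand-ins; in a real project dbt supplies the `dbt` and `session` arguments at run time and `dbt.ref()` returns a DataFrame rather than plain dicts.

```python
# Sketch of a dbt Python model (dbt >= 1.3). In a real project this
# function lives in a .py file under models/ and dbt calls it for you.
def model(dbt, session):
    # dbt.ref() pulls in an upstream model; normally a DataFrame,
    # here plain dicts so the sketch runs without a warehouse.
    orders = dbt.ref("stg_orders")
    # Flag large orders -- a stand-in for real transformation logic.
    return [dict(row, is_large=row["amount"] > 20) for row in orders]

# Hypothetical stub standing in for the context dbt provides at run time.
class FakeDbt:
    def ref(self, name):
        return [{"order_id": 1, "amount": 10},
                {"order_id": 2, "amount": 25}]

rows = model(FakeDbt(), session=None)
print(rows)  # the second order is flagged as large
```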

Check the slides here: https://docs.google.com/presentation/d/133CVwwAxc5qT80ZJwngQ_ZSikOkCttvzWwGpdZCgOHQ/edit#slide=id.g1693e59a4f4_0_0


Data Change Management: Lessons Learned at Vouch

Allowing more people into the data development process can improve shipping speed, but it can also cause anxiety for folks who wonder how to preserve best practices as participation expands. In this session, Kshitij Aranke (Vouch Insurance) shares how his team created safe inroads to communal development through automated change management on dbt projects, providing automatic best-practice checks on each pull request.

Check Google Slides here: https://docs.google.com/presentation/d/17D2DC4KUxfLopYLMvK4ywFVy5MPaPFeRE9fskkir0CM/edit#slide=id.gac0f4c9a75_0_0


Minimum viable (data) product

Analytics work mirrors product development: identify a user need, build a minimum viable product to address that need, evaluate the impact, and iterate. In this talk, Michal Kolacek, analytics engineer at Slido, describes how MVP-like thinking can help data teams counterbalance and complement the standardized approaches of dbt.

We will walk through the evolution of Slido's approach and tooling, and the vision of building better data products using Deepnote notebooks. Finally, we will take a look under the hood of the new dbt integration in Deepnote and outline how data teams can use it to accelerate model prototyping and metrics workflows.

Check the slides here: https://docs.google.com/presentation/d/1-L7ndud6z5gsFtF3WdjA6AVG40_vrCAcVRNarqWNtPg/edit?usp=sharing


Seeing is Believing: Data Observability with dbt Labs

The modern data platform consists of a complex set of elements. It must manage data from multiple source systems in different locations that is matched, consolidated, controlled, and packaged with business logic, within an infrastructure of varied technologies strung together. Providing evidence and trust, an observability platform with dbt as the hub monitors ELT down to the individual processes and ensures that SLAs for availability, throughput, and quality are met. Come see how Slalom confidently assures the user community that democratized data meets organizational standards.

Check the Notion document here: https://montrealanalytics.notion.site/Coalesce-Workshop-Guide-6382db82046f41599e9ec39afb035bdb


The Return on Analytics Engineering

As analytics engineers and data people, we know the value we create in our own blood, sweat, and dbt models. But how is this value actually realized in practice? In this talk, David Jayatillake (Metaplane) draws on his experiences to discuss the processes, ways of thinking, tooling, and governance needed to realize the benefits from analytics engineering work in the greater organization.

Check the slides here: https://docs.google.com/presentation/d/1VmmqNQsrv1t0uuV81O6PJQ1XASyLRGxvAdB8eWIG9TQ/edit?usp=sharing


When analysts outnumber engineers 5 to 1: Our journey with dbt at M1

How do you train and enable 20 data analysts to use dbt Core in a short amount of time?

At M1, engineering and analytics are far apart on the org chart but work hand-in-hand every day. M1 engineering has a culture that celebrates open source, where every data engineer is trained and empowered to work all the way down the infrastructure stack, using tools like Terraform and Kubernetes. The analytics team is composed of strong SQL writers who use Tableau to create visualizations used company-wide. When M1 knew they needed a tool like dbt for change management and data documentation generation, they had to figure out how to bridge the gap between engineering and analytics so analysts could contribute with minimal engineering intervention. Join Kelly Wachtel, a senior data engineer at M1, as she explains how they trained about 20 analysts to use git and dbt Core over the past year and strengthened collaboration between the data engineering and analytics teams.

Check the slides here: https://docs.google.com/presentation/d/1CWI97EMyLIz6tptLPKt4VuMjJzV_X3oO/edit?usp=sharing&ouid=110293204340061069659&rtpof=true&sd=true


Partner Day

Participants in the dbt Labs partnership program are invited to a partner-only event on Monday, October 17th to hear what’s new on our product and company roadmap, learn about our partner programs, and meet the dbt Labs team and colleagues at other partner companies.


Katie was a founding member of Reddit's data science team; currently, as Twitter's Data Science Manager, she leads the company's infrastructure data science and analytics organization. In this conversation with Tristan and Julia, Katie explores how, as a manager, to help data people (especially those new to the field!) do their best work. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

Open Source Powers the Modern Data Stack

Lakehouses like Databricks’ Delta Lake are becoming the central brain for all data systems. But lakehouses are only one component of the data stack. There are many building blocks required for tackling data needs, including data integration, data transformation, data quality, observability, orchestration, and more.

In this session, we will present how open source powers companies' approach to building a modern data stack. We will talk about technologies like Airbyte, Airflow, dbt, Preset, and how to connect them in order to build a customized and extensible data platform centered around Databricks.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

Power to the (SQL) People: Python UDFs in DBSQL

Databricks SQL (DBSQL) allows customers to leverage the simple and powerful Lakehouse architecture with up to 12x better price/performance compared to traditional cloud data warehouses. Analysts can use standard SQL to easily query data and share insights using a query editor, dashboards or a BI tool of their choice, and analytics engineers can build and maintain efficient data pipelines, including with tools like dbt.

While SQL is great at querying and transforming data, sometimes you need to extend its capabilities with the power of Python, a full programming language. Users of Databricks notebooks already enjoy seamlessly mixing SQL, Python and several other programming languages. Use cases include masking or encrypting and decrypting sensitive data, complex transformation logic, using popular open source libraries or simply reusing code that has already been written elsewhere in Databricks. In many cases, it is simply prohibitive or even impossible to rewrite the logic in SQL.

Until now, there was no way to use Python from within DBSQL. We are removing this restriction with the introduction of Python User Defined Functions (UDFs). DBSQL users can now create, manage, and use Python UDFs using standard SQL. UDFs are registered in Unity Catalog, which means they can be governed and used throughout Databricks, including in notebooks.
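The body of such a UDF is ordinary Python. As an illustration of the masking use case mentioned above, here is a hedged sketch of the kind of logic a DBSQL Python UDF might wrap, written as a plain function so it runs anywhere; the name `mask_email` and the masking rule are hypothetical, and in DBSQL the same body would be registered with a `CREATE FUNCTION ... LANGUAGE PYTHON` statement and governed through Unity Catalog.

```python
# Hypothetical masking logic of the kind a DBSQL Python UDF could wrap.
def mask_email(email: str) -> str:
    """Keep the first character of the local part and the domain; hide the rest."""
    local, _, domain = email.partition("@")
    if not domain:
        return "***"  # not a well-formed address; mask everything
    return local[:1] + "***@" + domain

print(mask_email("jane.doe@example.com"))  # -> j***@example.com
```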


Rethinking Orchestration as Reconciliation: Software-Defined Assets in Dagster

This talk discusses “software-defined assets”, a declarative approach to orchestration and data management that makes it drastically easier to trust and evolve datasets and ML models. Dagster is an open source orchestrator built for maintaining software-defined assets.

In traditional data platforms, code and data are only loosely coupled. As a consequence, deploying changes to data feels dangerous, backfills are error-prone and irreversible, and it’s difficult to trust data, because you don’t know where it comes from or how it’s intended to be maintained. Each time you run a job that mutates a data asset, you add a new variable to account for when debugging problems.

Dagster proposes an alternative approach to data management that tightly couples data assets to code: each table or ML model corresponds to the function that’s responsible for generating it. This results in a “Data as Code” approach that mimics the “Infrastructure as Code” approach that’s central to modern DevOps. Your git repo becomes your source of truth on your data, so pushing data changes feels as safe as pushing code changes. Backfills become easy to reason about. You trust your data assets because you know how they’re computed and can reproduce them at any time. The role of the orchestrator is to ensure that physical assets in the data warehouse match the logical assets that are defined in code, so each job run is a step towards order.
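To make the reconciliation idea concrete, here is a toy sketch (deliberately not Dagster's actual API) in which each logical asset is just a function keyed by name, and "orchestrating" means building any asset whose materialized copy is missing:

```python
# Toy illustration of assets-as-functions and reconciliation.
# Logical assets: each name maps to the function that computes it
# from its upstream dependencies.
ASSETS = {
    "raw_orders": lambda deps: [10, 25, 40],
    "big_orders": lambda deps: [x for x in deps["raw_orders"] if x > 20],
}
DEPS = {"raw_orders": [], "big_orders": ["raw_orders"]}

def reconcile(materialized):
    """Materialize every asset missing from `materialized`, in dependency order."""
    for name in ("raw_orders", "big_orders"):  # topological order, hard-coded for the toy
        if name not in materialized:
            inputs = {d: materialized[d] for d in DEPS[name]}
            materialized[name] = ASSETS[name](inputs)
    return materialized

store = reconcile({})       # nothing exists yet: both assets get built
print(store["big_orders"])  # -> [25, 40]
```

The point of the sketch is the contract, not the code: the orchestrator compares what the code defines against what the warehouse holds and runs only the gap, which is why each run moves the system toward the state declared in git.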

Software-defined assets are a natural approach to orchestration for the modern data stack, in part because dbt models are themselves a type of software-defined asset.

Attendees of this session will learn how to build and maintain lakehouses of software-defined assets with Dagster.


Constraints, Democratization, and the Modern Data Stack - Building a Data Platform At Red Ventures

The time and attention of skilled engineers are some of the most constrained, valuable resources at Red Digital, a marketing agency embedded within Red Ventures. Acknowledging that constraint, the team at Red Digital has taken a deliberate, product-first approach to modernize and democratize their data platform. With the help of modern tools like Databricks, Fivetran, dbt, Monte Carlo, and Airflow, Red Digital has increased its development velocity and the size of the available talent pool to continue to grow the business.

This talk will walk through some of the key challenges, decisions, and solutions that the Red Digital team has made to build a suite of parallel data stacks capable of supporting its growing business.


Databricks SQL Under the Hood: What's New with Live Demos

With serverless SQL compute and built-in governance, Databricks SQL lets every analyst and analytics engineer easily ingest, transform, and query the freshest data directly on your data lake, using their tools of choice like Fivetran, dbt, PowerBI or Tableau, and standard SQL. There is no need to move data to another system. All this takes place at virtually any scale, at a fraction of the cost of traditional cloud data warehouses. Join this session for a deep dive into how Databricks SQL works under the hood, and see a live end-to-end demo of data and analytics on Databricks, from ingestion and transformation to consumption, using the modern data stack along with Databricks SQL.


Data Warehousing on the Lakehouse

Most organizations routinely operate their business with complex cloud data architectures that silo applications, users, and data. As a result, there is no single source of truth for analytics, and most analysis is performed on stale data. To solve these challenges, the lakehouse has emerged as the new standard for data architecture, with the promise of unifying data, AI, and analytics workloads in one place. In this session, we will cover why the data lakehouse is the next best data warehouse. You will hear success stories, use cases, and best practices learned in the field from the experts, and discover how the data lakehouse ingests, stores, and governs business-critical data at scale to build a curated data lake for data warehousing, SQL, and BI workloads. You will also learn how Databricks SQL can help you lower costs and get started in seconds with instant, elastic serverless SQL compute, and how to empower every analytics engineer and analyst to quickly find and share new insights using their favorite BI and SQL tools, like Fivetran, dbt, Tableau, or PowerBI.
