SQL

Taking A Look Under The Hood At CreditKarma's Data Platform

2022-11-14 · Data Engineering Podcast Listen

podcast_episode

by Vishnu Venkataraman (CreditKarma) , Tobias Macey

Airflow Analytics AWS Azure BigQuery CDP CI/CD Cloud Computing Data Engineering Data Lake Data Management Data Quality +14 more

Summary CreditKarma builds data products that help consumers take advantage of their credit and financial capabilities. To make that possible they need a reliable data platform that empowers all of the organization’s stakeholders. In this episode Vishnu Venkataraman shares the journey that he and his team have taken to build and evolve their systems and improve the product offerings that they are able to support.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show! Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it’s often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold. RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enable you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudder Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer. Your host is Tobias Macey and today I’m interviewing Vishnu Venkataraman about building the data platform at CreditKarma and the forces that shaped the design

Interview

Introduction How did you get involved in the area of data management? Can you describe what CreditKarma is and the role

Introducing RavenDB: The Database for Modern Data Persistence

2022-11-09 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Dejan Miličić

NoSQL RDBMS data data-engineering

Simplify your first steps with the RavenDB NoSQL Document Database. This book takes a task-oriented approach by showing common problems, potential solutions, brief explanations of how those solutions work, and the mechanisms used. Based on real-world examples, the recipes in this book will show you how to solve common problems with Raven Query Language and will highlight reasons why RavenDB is a great choice for fast prototyping solutions that can sustain increasing amounts of data as your application grows. Introducing RavenDB includes code and query examples that address real-life challenges you’ll encounter when using RavenDB, helping you learn the basics of the Raven Query Language more quickly and efficiently. In many cases, you’ll be able to copy and paste the examples into your own code, making only minor modifications to suit your application. RavenDB supports many advanced features, such full-text search, graph queries, and timeseries; recipes in the latter portion of the book will help you understand those advanced features and how they might be applied to your own code and applications. After reading this book, you will be able to employ RavenDB’s powerful features in your own projects. What You Will Learn Set up and start working with RavenDB Model your objects for persistence in a NoSQL document database Write basic and advanced queries in the Raven Query Language Index your data using map/reduce techniques Implement techniques leading to highly performant systems Efficiently aggregate data and query on those aggregations Who This Book Is For Developers accustomed to relational databases who are about to enter a world of NoSQL databases. The book is also for experienced programmers who have used other non-relational databases and want to learn RavenDB. It will also prove useful for developers who want to move away from using Object-Relational Modeling frameworks and start working with a persistence solution that can store object graphs directly.

SQL Server 2022 Query Performance Tuning: Troubleshoot and Optimize Query Performance

2022-11-09 · O'Reilly SQL Books O'Reilly Amazon

book

by Grant Fritchey

Cloud Computing microsoft sql server

Troubleshoot slow-performing queries and make them run faster. Database administrators and SQL developers are constantly under pressure to provide more speed. This new edition has been redesigned and rewritten from scratch based on the last 15 years of learning, knowledge, and experience accumulated by the author. The book Includes expanded information on using extended events, automatic execution plan correction, and other advanced features now available in SQL Server. These modern features are covered while still providing the necessary fundamentals to better understand how statistics and indexes affect query performance. The book gives you knowledge and tools to help you identify poorly performing queries and understand the possible causes of that poor performance. The book also provides mechanisms for resolving the issues identified, whether on-premises, in containers, or on cloud platform providers. You’ll learn about key fundamentals, such as statistics, data distribution, cardinality, and parameter sniffing. You’ll learn to analyze and design your indexes and your queries using best practices that ward off performance problems before they occur. You’ll also learn to use important modern features, such as Query Store to manage and control execution plans, the automated performance tuning feature set, and memory-optimized OLTP tables and procedures. You will be able to troubleshoot in a systematic way. Query tuning doesn’t have to be difficult. This book helps you to make it much easier. What You Will Learn Use Query Store to understand and easily change query performance Recognize and eliminate bottlenecks leading to slow performance Tune queries whether on-premises, in containers, or on cloud platform providers Implement best practices in T-SQL to minimize performance risk Design in the performance that you need through careful query and index design Understand how built-in, automatic tuning can assist your performance enhancement efforts Protect query performance during upgrades to the newer versions of SQL Server Who This Book Is For Developers and database administrators with responsibility for query performance in SQL Server environments, and anyone responsible for writing or creating T-SQL queries and in need of insight into bottlenecks (including how to identify them, understand them, and eliminate them)

Clean Up Your Data Using Scalable Entity Resolution And Data Mastering With Zingg

2022-11-07 · Data Engineering Podcast Listen

podcast_episode

by Sonal Goyal (Nube Technologies) , Tobias Macey

AI/ML Airflow Analytics AWS Azure BigQuery CDP CI/CD Cloud Computing Data Engineering Data Lake Data Management +15 more

Summary Despite the best efforts of data engineers, data is as messy as the real world. Entity resolution and fuzzy matching are powerful utilities for cleaning up data from disconnected sources, but it has typically required custom development and training machine learning models. Sonal Goyal created and open-sourced Zingg as a generalized tool for data mastering and entity resolution to reduce the effort involved in adopting those practices. In this episode she shares the story behind the project, the details of how it is implemented, and how you can use it for your own data projects.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show! RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enable you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudder Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it’s often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold. Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer. Your host is Tobias Macey and today I’m interviewing Sonal Goyal about Zingg, an open source entity resolution frame

SQL Server 2022 Revealed: A Hybrid Data Platform Powered by Security, Performance, and Availability

2022-11-02 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Bob Ward (Azure Data)

Analytics Azure Blockchain Cloud Computing JSON Microsoft Parquet S3 Cyber Security Synapse data data-engineering +2 more

Know how to use the new capabilities and cloud integrations in SQL Server 2022. This book covers the many innovative integrations with the Azure Cloud that make SQL Server 2022 the most cloud-connected edition ever. The book covers cutting-edge features such as the blockchain-based Ledger for creating a tamper-evident record of changes to data over time that you can rely on to be correct and reliable. You'll learn about built-in Query Intelligence capabilities to help you to upgrade with confidence that your applications will perform at least as fast after the upgrade than before. In fact, you'll probably see an increase in performance from the upgrade, with no code changes needed. Also covered are innovations such as contained availability groups and data virtualization with S3 object storage. New cloud integrations covered in this book include Microsoft Azure Purview and the use of Azure SQL for high availability and disaster recovery. The bookcovers Azure Synapse Link with its built-in capabilities to take changes and put them into Synapse automatically. Anyone building their career around SQL Server will want this book for the valuable information it provides on building SQL skills from edge to the cloud. What You Will Learn Know how to use all of the new capabilities and cloud integrations in SQL Server 2022 Connect to Azure for disaster recovery, near real-time analytics, and security Leverage the Ledger to create a tamper-evident record of data changes over time Upgrade from prior releases and achieve faster and more consistent performance with no code changes Access data and storage in different and new formats, such as Parquet and S3, without moving the data and using your existing T-SQL skills Explore new application scenarios using innovations with T-SQL in areassuch as JSON and time series Who This Book Is For SQL Server professionals who want to upgrade their skills to the latest edition of SQL Server; those wishing to take advantage of new integrations with Microsoft Azure Purview (governance), Azure Synapse (analytics), and Azure SQL (HA and DR); and those in need of the increased performance and security offered by Query Intelligence and the new Ledger

dbt Project Evaluator

2022-11-01 · dbt Coalesce 2022 Watch

video

by Grace Goheen (dbt Labs)

Analytics dbt

Since the dawn of time (or at least the last few years), the proserv team has been “dbt_project_evaluator”s. They've written articles, given talks, created training courses, and personally delivered just a truly obscene amount of audits. Up until now, evaluating your own dbt project—even with every aforementioned resource—would be incredibly time consuming. To quote dbt Lab’s SQL style guide, “brain time is expensive.” Enter: dbt_project_evaluator. In this talk, Grace Goheen (dbt Labs) which share how this package enables analytics engineers to follow dbt Labs own best practices by automatically curating a list of improvements, in the dbt-language of “models” and “tests” that they already know and love. By decreasing the “discovery” period, analytics engineers can use their brain time to work on actually implementing the recommended changes.

Check the slides here: https://docs.google.com/presentation/d/1U7CaoSceXumbzlPGqqAQukaz1YdOgT46sb-M6x_LNCw/edit?usp=sharing

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Analytics Engineering Without The Friction Of Complex Pipeline Development With Optimus and dbt

2022-10-30 · Data Engineering Podcast Listen

podcast_episode

by Nandam Karthik (Optimus) , Tobias Macey

Airflow Analytics Analytics Engineering AWS Azure BigQuery CDP CI/CD Cloud Computing Data Analytics Data Engineering Data Lake +16 more

Summary One of the most impactful technologies for data analytics in recent years has been dbt. It’s hard to have a conversation about data engineering or analysis without mentioning it. Despite its widespread adoption there are still rough edges in its workflow that cause friction for data analysts. To help simplify the adoption and management of dbt projects Nandam Karthik helped create Optimus. In this episode he shares his experiences working with organizations to adopt analytics engineering patterns and the ways that Optimus and dbt were combined to let data analysts deliver insights without the roadblocks of complex pipeline management.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show! Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it’s often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold. RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enable you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudder Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer. Your host is Tobias Macey and today I’m interviewing Nand

How to leverage dbt Community as the first & ONLY data hire to survive

2022-10-25 · dbt Coalesce 2022 Watch

video

by Fabiyi Opeyemi (Data Culture)

AI/ML Analytics Cloud Computing Data Engineering Data Science dbt Python Snowflake

As data science and machine learning adoption grew over the last few years, Python moved up the ranks catching up to SQL in popularity in the world of data processing. SQL and Python are both powerful on their own, but their value in modern analytics is highest when they work together. This was a key motivator for us at Snowflake to build Snowpark for Python: to help modern analytics, data engineering, and data science teams generate insights without complex infrastructure management for separate languages.

Join this session to learn more about how dbt's new support for Python-based models and Snowpark for Python can help polyglot data teams get more value from their data through secure, efficient and performant metrics stores, feature stores, or data factories in the Data Cloud.

Check the slides here: https://docs.google.com/presentation/d/1xJEyfg81azw2hVilhGZ5BptnAQo8q1L7aDLGrnSYoUM/edit?usp=sharing

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

SQL: The video game

2022-10-25 · dbt Coalesce 2022 Watch

video

by Joe Markiewicz

Analytics BigQuery dbt

Do you enjoy waking up in the morning and playing the daily Wordle puzzle? Have you been wishing there was a similar game for you to play, but was built specifically for data folks? Well, you are in luck!

Join Joe Markiewicz (analytics engineer by day, video game maker by night) as he explains his inspiration and how he leveraged dbt and BigQuery to create a new video game aimed at helping experienced analysts keep their SQL skills sharp and data newcomers increase their SQL literacy.

Check the slides here: https://docs.google.com/presentation/d/1C1qUZEcpfBa6oA_CTHGx3GR1WLW3s2adVXnX2BRKVWA/edit

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

dbt Labs and Databricks: best practices and future roadmap

2022-10-25 · dbt Coalesce 2022 Watch

video

by Bilal Aslam (Databricks) , Nana Essuman (Conde Nast)

AI/ML Analytics Cloud Computing Data Lakehouse Databricks dbt Python

The Databricks Lakehouse Platform unifies the best of data warehouses and data lakes in one simple platform to handle all your data, analytics and AI use cases. Databricks now includes complete support for dbt Core and dbt Cloud and you will hear from Conde Nast using dbt and Databricks together to democratize insights. We will also share best practices for developing and productionizing dbt projects containing SQL and Python, governing data with standard SQL, and exciting features on our roadmap such as materialized views for Databricks SQL.

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Introducing dbt with Databricks

2022-10-25 · dbt Coalesce 2022 Watch

video

by Roberto Salcido (Databricks) , Prasad Kona (Databricks) , Pradeep Anandapu (Databricks)

Analytics Data Lake Data Lakehouse Databricks dbt Modern Data Stack

In this live, instructor-led hands-on lab, you’ll learn how to build a modern data stack with Databricks and dbt, using dbt to manage data transformations in Databricks and perform exploratory data analysis on the clean data sets using Databricks SQL. Based on the lakehouse architecture and built on an open data lake, data analysts, analytics engineers, and data scientists can use dbt and Databricks to work with the freshest and most complete data, and quickly derive new insights for accurate decision-making.

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Why Metrics Are Even More Valuable Than You Think They Are

2022-10-25 · dbt Coalesce 2022 Watch

video

dbt YAML

Creating / migrating metric metadata to dbt can be a pain, because the level of underlying data knowledge required to create the YAML files properly. You might have found yourself wondering, “is this worth it just to standardize metric definitions?”. This talk will tell you why it is definitely worth it… because the functionality you unlock goes beyond just standard metric definitions. Adopting the dbt standard metric syntax unlocks three additional possibilities for your data:

Automated time-aware metric calculations
Dynamic drill downs and segmentation to empower slice and dice analysis
Self-service dynamic transforms using templated SQL

Check slides here: https://docs.google.com/presentation/d/1nJHP2E6NGZ-KHG4_gNiI6w2lq4kjWgQIanAC9yf3cng

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Workshop: Get more out of your DAG

2022-10-25 · dbt Coalesce 2022 Watch

video

dbt DWH

In this workshop, you’ll learn how to create and document macros that leverage the powerful introspective features of dbt to perform dynamic modeling including: run result storage in your warehouse, dynamic value lookup in models, and leveraging model metadata in macros.

You’ll learn how to: - Create macros to store your dbt run results within your data warehouse - Leverage internal dbt graph data for dynamic modeling - Incorporate dbt best-practices when developing macros

Prerequisites: - Basic familiarity with ANSI SQL - Some familiarity using Jinja and writing macros - Experience with dbt required

Check Notion document here: https://www.notion.so/6382db82046f41599e9ec39afb035bdb

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Announcing dbt's Second Language: When and Why We Turn to Python

2022-10-25 · dbt Coalesce 2022 Watch

video

by Cody Peterson , Jeremy Cohen (dbt Labs) , Leah Antkiewicz

dbt Python

For the first time in dbt, you can now run Python models, making it possible to supplement the accessibility of SQL with a new level of power and flexibility.

When is it useful to use Python, and when should you stick with SQL instead? What might a multilingual dbt project look like in practice, and what could it make possible for your team?

Join Jeremy Cohen, Cody Peterson, and Leah Antkiewicz to explore these questions in this interactive session.

Check the slides here: https://docs.google.com/presentation/d/1e3wB7EQ0EXugGhfCjVCp_dDFEbY_uKyVjMqG1o7alnA/edit?usp=sharing

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

dbt Labs + Snowflake: Why SQL and Python go perfectly well together

2022-10-25 · dbt Coalesce 2022 Watch

video

by Torsten Grabs (Snowflake)

AI/ML Analytics Cloud Computing Data Engineering Data Science dbt Python Snowflake

As data science and machine learning adoption grew over the last few years, Python moved up the ranks catching up to SQL in popularity in the world of data processing. SQL and Python are both powerful on their own, but their value in modern analytics is highest when they work together. This was a key motivator for us at Snowflake to build Snowpark for Python: to help modern analytics, data engineering, and data science teams generate insights without complex infrastructure management for separate languages.

Join this session to learn more about how dbt's new support for Python-based models and Snowpark for Python can help polyglot data teams get more value from their data through secure, efficient and performant metrics stores, feature stores, or data factories in the Data Cloud.

Check Notion document here: https://www.notion.so/6382db82046f41599e9ec39afb035bdb

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Getting jiggy with jsonschema: The power of contracts for building data systems

2022-10-25 · dbt Coalesce 2022 Watch

video

by Jake Thomas

JSON Schema

Is your SQL query the problem, or how you ask for the data you need, when you need it. In this deep dive, Jake Thomas shares his hypothesis for why the jsonschema is the ticket to contract-driven communication, system interoperability, and an overall improvement to data processing quality of life.

Check the slides here: https://docs.google.com/presentation/d/1kiGyQF7NUWfx-5RyIyeEwSUCwqtIdrXADeI2iixUgiI/edit?usp=sharing

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Streaming with dbt: the Jaffle Shop don’t stop!

2022-10-25 · dbt Coalesce 2022 Watch

video

by Anna Glander (Materialize) , Marta Paes (Materialize)

Analytics dbt Data Streaming postgresql

In between JVM languages, high-maintenance frameworks and academic papers, streaming remains a hard beast to tame for most of us. What if nothing had to change, and streaming just meant…still writing dbt models? At Materialize, we’re exploring how to make the most of dbt for streaming — from real-time analytics to continuous testing, and beyond! Join us to learn how to get started with no blood, sweat or tears, using the Jaffle Shop as a playground. Our toolbox? A database that feels like Postgres but works like all the streaming systems you’ve been avoiding, some SQL and a dash of magic.

Check the slides here: https://docs.google.com/presentation/d/11PANQElVxtzqgzmRCcQfZy24vdMeYDokpxr7LdlrbrE/edit#slide=id.g105b4fffa32_0_942

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

Workshop: Build your first dbt Python model

2022-10-25 · dbt Coalesce 2022 Watch

video

by Nicholas Yager (dbt Labs) , Wasila Quader (dbt Labs)

dbt Python

Description: dbt now supports Python models! In this hands-on workshop you’ll learn how to build your first Python models in dbt, alongside SQL at the center of your transformations.

You’ll learn how to: - Build your Python transformation in a notebook - Add this transformation as a model in your dbt project - Decide between building models in SQL or in Python

Prerequisites: - Basic familiarity with Python and DataFrames - If you want to use your own Warehouse and dbt project, make sure that you have dbt 1.3 installed and have followed the “additional setup” from our docs

Check the slides here: https://docs.google.com/presentation/d/133CVwwAxc5qT80ZJwngQ_ZSikOkCttvzWwGpdZCgOHQ/edit#slide=id.g1693e59a4f4_0_0

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

When analysts outnumber engineers 5 to 1: Our journey with dbt at M1

2022-10-25 · dbt Coalesce 2022 Watch

video

by Kelly Wachtel (M1)

Analytics Data Engineering dbt Git Kubernetes Tableau Terraform

How do you train and enable 20 data analysts to use dbt Core in a short amount of time?

At M1, engineering and analytics are far apart on the org chart, but work hand-in-hand every day. M1 engineering has a culture that celebrates open source, where every data engineer is trained and empowered to work all the way down the infrastructure stack, using tools like Terraform and Kubernetes. The analytics team is comprised of strong SQL writers who use Tableau to create visualizations used company wide. When M1 knew they needed a tool like dbt for change management and data documentation generation, they had to figure out how to bridge the gap between engineering and analytics to enable analysts to contribute with minimal engineering intervention. Join Kelly Wachtel, a senior data engineer at M1, explain how they trained about 20 analysts to use git and dbt Core over the past year, and strengthened their collaboration between their data engineering and analytics teams.

Check the slides here: https://docs.google.com/presentation/d/1CWI97EMyLIz6tptLPKt4VuMjJzV_X3oO/edit?usp=sharing&ouid=110293204340061069659&rtpof=true&sd=true

Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.

SQL Antipatterns, Volume 1

2022-10-24 · O'Reilly SQL Books O'Reilly Amazon

book

by Bill Karwin

Data Modelling Java MySQL Python RDBMS Cyber Security

SQL is the ubiquitous language for software developers working with structured data. Most developers who rely on SQL are experts in their favorite language (such as Java, Python, or Go), but they're not experts in SQL. They often depend on antipatterns - solutions that look right but become increasingly painful to work with as you uncover their hidden costs. Learn to identify and avoid many of these common blunders. Refactor an inherited nightmare into a data model that really works. Updated for the current versions of MySQL and Python, this new edition adds a dozen brand new mini-antipatterns for quick wins. No matter which platform, framework, or language you use, the database is the foundation of your application, and the SQL database language is the standard for working with it. Antipatterns are solutions that look simple at the surface, but soon mire you down with needless work. Learn to identify these traps, and craft better solutions for the often-asked questions in this book. Avoid the mistakes that lead to poor performance and quality, and master the principles that make SQL a powerful and flexible tool for handling data and logic. Dive deep into SQL and database design, and learn to recognize the most common missteps made by software developers in database modeling, SQL query logic, and code design of data-driven applications. See practical examples of misconceptions about SQL that can lure software projects astray. Find the greatest value in each group of data. Understand why an intersection table may be your new best friend. Store passwords securely and don't reinvent the wheel. Handle NULL values like a pro. Defend your web applications against the security weakness of SQL injection. Use SQL the right way - it can save you from headaches and needless work, and let your application really shine! What You Need: The SQL examples use the MySQL 8.0 flavor, but other popular brands of RDBMS are mentioned. Other code examples use Python 3.9+ or Ruby 2.7+.

talk-data.com

Activity Trend

Top Events

Top Speakers

Taking A Look Under The Hood At CreditKarma's Data Platform

Introducing RavenDB: The Database for Modern Data Persistence

SQL Server 2022 Query Performance Tuning: Troubleshoot and Optimize Query Performance

Clean Up Your Data Using Scalable Entity Resolution And Data Mastering With Zingg

SQL Server 2022 Revealed: A Hybrid Data Platform Powered by Security, Performance, and Availability

dbt Project Evaluator

Analytics Engineering Without The Friction Of Complex Pipeline Development With Optimus and dbt

How to leverage dbt Community as the first & ONLY data hire to survive

SQL: The video game

dbt Labs and Databricks: best practices and future roadmap

Introducing dbt with Databricks

Why Metrics Are Even More Valuable Than You Think They Are

Workshop: Get more out of your DAG

Announcing dbt's Second Language: When and Why We Turn to Python

dbt Labs + Snowflake: Why SQL and Python go perfectly well together

Getting jiggy with jsonschema: The power of contracts for building data systems

Streaming with dbt: the Jaffle Shop don’t stop!

Workshop: Build your first dbt Python model

When analysts outnumber engineers 5 to 1: Our journey with dbt at M1

SQL Antipatterns, Volume 1