talk-data.com

Topic: DWH (Data Warehouse)

Tags: analytics, business_intelligence, data_storage

568 activities tagged

Activity Trend: peak of 35 activities/quarter, 2020-Q1 to 2026-Q1

Activities

568 activities · Newest first

Data warehouse as a product: Design to delivery - Coalesce 2023

Every day, Trade Me gets 1.5 million new listings and 20 million listing views. With all that data comes the difficulty of managing a complex data ecosystem. This got the Trade Me team thinking: "Which problems are we trying to solve? How can we increase speed to customer value?" Using this framework, the team developed a new mission statement: "To build a data warehouse that analysts love to use." In this session, Trade Me shares exactly how they achieved that vision, with a focus on planning, data operating models, and database architecture.

Speaker: Lance Witheridge, Data Modernisation Lead, Trade Me

Register for Coalesce at https://coalesce.getdbt.com

Central application for all your dbt packages - Coalesce 2023

dbt packages are libraries for dbt. Packages can produce information about best practices for your dbt project (e.g., dbt project evaluator) and cloud warehouse cost overviews. Unfortunately, all these KPIs are stored in your data warehouse, and it can be painful and expensive to create data visualization dashboards on top of them. This application automatically builds dashboards from the dbt packages you are using; you just need to configure your dbt Cloud API key, and that's it. In this session, you'll learn how.
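The talk doesn't share the application's source, but a minimal sketch of the starting point it describes, authenticating against the dbt Cloud Administrative API with nothing but an API key, might look like this (the account ID and endpoint usage here are illustrative, not the presenter's code):

```python
# Hypothetical sketch: authenticate to the dbt Cloud Administrative API v2
# with just an API key and list the account's projects, the first step an
# app would take before inspecting installed packages.
import requests

DBT_CLOUD_API = "https://cloud.getdbt.com/api/v2"
API_KEY = "YOUR_DBT_CLOUD_API_KEY"   # the only parameter the app asks for
ACCOUNT_ID = 12345                   # hypothetical account id

def list_projects():
    resp = requests.get(
        f"{DBT_CLOUD_API}/accounts/{ACCOUNT_ID}/projects/",
        headers={"Authorization": f"Token {API_KEY}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["data"]

for project in list_projects():
    print(project["id"], project["name"])
```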

Speaker: Adrien Boutreau, Head of Analytics Engineers, Infinite Lambda

Register for Coalesce at https://coalesce.getdbt.com

Your data warehouse is a success but your repository a mess: get your code on a diet - Coalesce 2023

Over the past four years, the data team at EQT has leveraged dbt and Snowflake to create a myriad of data products across the company. With a rapidly growing organization and increased demands for timely and accurate data, their immense monolithic dbt repository has become challenging to maintain. Learn about the best practices they are adopting to keep the platform in shape and scale with the business.

Speaker: Erik Lehto, Senior Analytics Engineer, EQT

Register for Coalesce at https://coalesce.getdbt.com

Operationalizing Ramp’s data with dbt and Materialize - Coalesce 2023

Traditional data warehouses excel at churning through terabytes of data for historical analysis. But for real-time, business-critical use cases, traditional data warehouses can’t produce results fast enough—and they still rack up a huge bill in the process.

So when Ramp’s data engineering team needed to serve complex analytics queries on the critical path of their production application, they knew they needed a new tool for the job. Enter Materialize, the first operational data warehouse. Like a traditional data warehouse, Materialize centralizes the data from all of a business’s production systems, from application databases to SaaS tools. But unlike a traditional data warehouse, Materialize enables taking immediate and automatic action when that data changes. Queries that once took hours or minutes to run are up-to-date in Materialize within seconds.

This talk presents how Ramp is unlocking new real-time use cases using Materialize as their operational data warehouse. The best part? The team still uses dbt for data modeling and deployment management, just as they do with their traditional batch workloads.
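Materialize speaks the Postgres wire protocol, so the pattern described above can be sketched with an ordinary Postgres driver; the table and view names below are hypothetical, not Ramp's schema:

```python
# A minimal sketch, assuming a Materialize instance on its default port 6875.
# The view is incrementally maintained: results stay up to date as the
# underlying `transactions` data changes, instead of being recomputed on read.
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@materialize.example.com:6875/materialize")
conn.autocommit = True

with conn.cursor() as cur:
    cur.execute("""
        CREATE MATERIALIZED VIEW account_balances AS
        SELECT account_id, SUM(amount) AS balance
        FROM transactions
        GROUP BY account_id
    """)
    # Reads become fast lookups against already-maintained results.
    cur.execute("SELECT balance FROM account_balances WHERE account_id = %s", (42,))
    print(cur.fetchone())
```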

Speakers: Nikhil Benesch, CTO, Materialize; Ryan Delgado, Staff Software Engineer, Data Platform, Ramp

Register for Coalesce at https://coalesce.getdbt.com

Using data pipeline contract to prevent breakage in analytics reporting - Coalesce 2023

It’s 2023; why are software engineers still breaking analytics reporting? We’ve all been there: alerted by an analyst or a C-level stakeholder that “this report is broken”, only to spend hours determining that an engineer deleted a column in the source database, breaking your pipeline and reporting.

At Xometry, the data engineering team wanted to fix this problem at its root and give the engineering teams a clear, repeatable process that allowed them to be the owners of their own data. Xometry named the process DPICT (data pipeline contract) and built several internal tools that integrate seamlessly with their developers' microservice toolsets.

Their software engineers mostly build their microservices on Postgres and bring the data in using Fivetran. Using that as the baseline, the team created a set of tools that allow engineers to quickly build the staging layer of their database in the data warehouse (Snowflake), and that alert them to the consequences that removing a table or column would have on downstream reporting.

In this talk, Jisan shares the nuts and bolts of the solution and process that allowed the team to onboard 13 different microservices seamlessly, working with multiple domains and dozens of developers. The process also helped software engineers own their own data and realize their impact. The team has saved hundreds of hours of data engineering time and resources by not having to chase down what changed upstream to break data. Overall, this process has helped bring transparency to the whole data ecosystem.
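The talk covers Xometry's actual tooling; as a rough illustration of the core check, a contract-style guard against dropped source columns could be as simple as the following sketch (the table names and contract format are invented for illustration):

```python
# Illustrative only -- not Xometry's DPICT implementation. Compare a declared
# column contract against the live Postgres schema and flag dropped columns
# before they silently break the Fivetran -> Snowflake staging layer.
import psycopg2

EXPECTED = {"orders": {"id", "customer_id", "status", "created_at"}}  # the "contract"

conn = psycopg2.connect("postgresql://user:pass@source-db.example.com/app")
with conn.cursor() as cur:
    for table, expected_cols in EXPECTED.items():
        cur.execute(
            "SELECT column_name FROM information_schema.columns WHERE table_name = %s",
            (table,),
        )
        actual_cols = {row[0] for row in cur.fetchall()}
        missing = expected_cols - actual_cols
        if missing:
            # A real tool would alert the owning team and list the downstream
            # dbt models and reports that consume these columns.
            print(f"Contract violation in {table}: dropped columns {missing}")
```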

Speaker: Jisan Zaman, Data Engineering Manager, Xometry

Register for Coalesce at https://coalesce.getdbt.com

60 sources and counting: Unlocking microservice integration with dbt and Data Vault - Coalesce 2023

The Guild team migrated to Snowflake and dbt for their data warehousing needs and immediately saw the benefits of standardizing model structure, DRYer logic, data lineage, and automated testing on pull requests.

But leveraging dbt didn’t solve everything. Pain points around maintaining model logic, handling historical data, and integrating data from over 60 source systems meant that analysts still struggled to provide a unified view of the business. The team knew that they needed to level up their processes and modeling again, and chose to adopt Data Vault (DV).

Brandon and Rebecca take you behind the scenes of this decision to explain the benefits of Data Vault. They highlight DV’s ability to handle complex data integration requirements while remaining agile and demonstrate that it complements other modern data concepts like domain-driven design and data mesh.

Attendees learn what Data Vault is, when it can be a key component of a successful data strategy, and instances where it’s not the right fit. Walk away with practical tips to successfully transition based on a real-world implementation.

Guild transformed their data warehouse; you can too!
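For readers new to Data Vault, one of its core mechanics is easy to show in miniature: hub rows are keyed by a deterministic hash of the business key, so records for the same entity integrate cleanly no matter which source system they arrive from. A toy sketch (not Guild's implementation; names are invented):

```python
# Toy Data Vault illustration: derive a hub hash key from a business key.
import hashlib
from datetime import datetime, timezone

def hub_hash_key(business_key: str) -> str:
    # Normalize before hashing so "cust-001 " and "CUST-001" collide on purpose.
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()

record = {
    "hub_customer_hk": hub_hash_key("CUST-001"),
    "customer_bk": "CUST-001",
    "load_dts": datetime.now(timezone.utc),
    "record_source": "crm",  # hypothetical source system name
}
print(record)
```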

Speakers: Brandon Taylor, Senior Data Architect, Guild; Rebecca Di Bari, Staff Data Engineer, Guild

Register for Coalesce at https://coalesce.getdbt.com

Data and monolith: Scaling a computationally slim 1500+ model beast - Coalesce 2023

Learn how ClickUp uses dbt, dbt packages, and Snowflake to save on storage and compute costs using Slim CI, and how they empower a data warehouse-centric culture across Sales, Marketing, Product Growth, Finance, and RevOps, all while maintaining one monolithic dbt build job.
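The talk shows ClickUp's setup; the essence of Slim CI, building only what changed and deferring the rest to production artifacts, can be sketched with dbt's programmatic API (dbt-core 1.5+). The artifact path below is a placeholder:

```python
# A hedged sketch of a Slim CI invocation, not ClickUp's actual job config.
from dbt.cli.main import dbtRunner

dbt = dbtRunner()
result = dbt.invoke([
    "build",
    "--select", "state:modified+",      # changed models plus their descendants
    "--defer",                          # resolve unselected refs to prod objects
    "--state", "prod-run-artifacts/",   # hypothetical path to the prior manifest
])
print("success" if result.success else "failed")
```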

Speaker: Michael Revelo, Data Platform Lead, ClickUp

Register for Coalesce at https://coalesce.getdbt.com

dbt turbocharge: Boosting performance of your data models - Coalesce 2023

Performance is a crucial factor in delivering timely and accurate data to organizations. However, debugging the performance of dbt models can be a challenge, as most resources available focus on legacy databases or tips for specific data engines that do not translate to modern data platforms.

In this talk, Juan Manuel Perafan focuses on optimizing performance for dbt users without tying the advice to any specific data warehouse. He explores the commonalities across most data warehouses and provides practical tips and strategies for improving the performance of dbt models, from query optimization to materialization strategies.

Whether you're new to dbt or a seasoned user, this talk provides valuable insights and best practices for improving the performance of your dbt models.
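As one concrete instance of the materialization strategies the talk covers, here is a hedged sketch of an incremental model written as a dbt Python model (dbt 1.3+, PySpark-flavored; the source and column names are invented):

```python
# Sketch of an incremental materialization: each run only processes rows newer
# than the target table's current high-water mark, instead of a full rebuild.
def model(dbt, session):
    dbt.config(materialized="incremental", unique_key="event_id")

    events = dbt.source("app", "events")  # hypothetical source table

    if dbt.is_incremental:
        existing = session.table(str(dbt.this))  # the already-built target
        max_ts = existing.agg({"event_ts": "max"}).collect()[0][0]
        events = events.filter(events["event_ts"] > max_ts)

    return events
```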

Speaker: Juan Manuel Perafan, Analytics Engineer, Xebia

Register for Coalesce at https://coalesce.getdbt.com

Business process occurrence, volume, and duration modeling using dbt Cloud - Coalesce 2023

Business processes are the foundation of any organization, directing entities toward specific outcomes. These processes can be simple or complex and may take days or even months to complete. Insights into business processes fall into three categories: occurrence, volume, and velocity.

In this presentation, Routable’s Director of Data & Analytics discusses the technical and process complexities involved in creating data models in a data warehouse using dbt Cloud. The session also provides tips to make the process easier and explains how to expose this data to users using Looker.
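To make the three categories concrete, here is a toy pandas sketch (an invented event log, not Routable's models) computing occurrence, volume, and duration from process start/end timestamps:

```python
# Toy example: one row per completed business process instance.
import pandas as pd

processes = pd.DataFrame({
    "process_id": [1, 2, 3],
    "started_at": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-05"]),
    "ended_at":   pd.to_datetime(["2023-01-03", "2023-01-02", "2023-01-09"]),
})

processes["duration_days"] = (processes["ended_at"] - processes["started_at"]).dt.days

summary = {
    # occurrence: did the process happen in a given period?
    "occurred_in_jan": bool((processes["started_at"].dt.month == 1).any()),
    # volume: how many instances ran?
    "volume": len(processes),
    # velocity/duration: how long did instances take?
    "median_duration_days": processes["duration_days"].median(),
}
print(summary)
```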

Speaker: Jason Hodson, Director, Data & Analytics, Routable

Register for Coalesce at https://coalesce.getdbt.com

Scaling dbt and BigQuery to infinity and beyond - Coalesce 2023

Bluecore works with the largest retail brands around the world to engage shoppers and keep them coming back. In this talk, you’ll learn how the team at Bluecore went about creating, scaling, and maturing an analytics data warehouse in BigQuery to orchestrate 10,000+ models every 30 minutes without bankrupting the company.

Speakers: Adam Whitaker, Analytics Lead, Bluecore; Nicole Dallar-Malburg, Analytics Engineer, Bluecore

Register for Coalesce at https://coalesce.getdbt.com

Amazon Redshift: The Definitive Guide

Amazon Redshift powers analytic cloud data warehouses worldwide, from startups to some of the largest enterprise data warehouses available today. This practical guide thoroughly examines this managed service and demonstrates how you can use it to extract value from your data immediately, rather than go through the heavy lifting required to run a typical data warehouse. Analytic specialists Rajesh Francis, Rajiv Gupta, and Milind Oke detail Amazon Redshift's underlying mechanisms and options to help you explore out-of-the-box automation. Whether you're a data engineer who wants to learn the art of the possible or a DBA looking to take advantage of machine learning-based auto-tuning, this book helps you get the most value from Amazon Redshift. By understanding Amazon Redshift features, you'll achieve excellent analytic performance at the best price, with the least effort. This book helps you:

  • Build a cloud data strategy around Amazon Redshift as a foundational data warehouse
  • Get started with Amazon Redshift with simple-to-use data models and design best practices
  • Understand how and when to use Redshift Serverless and Redshift provisioned clusters
  • Take advantage of auto-tuning options inherent in Amazon Redshift and understand manual tuning options
  • Transform your data platform for predictive analytics using Redshift ML and break silos using data sharing
  • Learn best practices for security, monitoring, resilience, and disaster recovery
  • Leverage Amazon Redshift integration with other AWS services to unlock additional value
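As a hedged taste of the book's hands-on material, querying Redshift (Serverless here) without managing drivers or connections can go through the Redshift Data API; the workgroup, database, and table names below are placeholders:

```python
# Minimal sketch using boto3's Redshift Data API client. The statement runs
# asynchronously, so we poll until it finishes, then fetch the result rows.
import time
import boto3

client = boto3.client("redshift-data", region_name="us-east-1")

run = client.execute_statement(
    WorkgroupName="analytics",   # hypothetical Redshift Serverless workgroup
    Database="dev",
    Sql="SELECT COUNT(*) FROM sales",
)

while client.describe_statement(Id=run["Id"])["Status"] not in ("FINISHED", "FAILED", "ABORTED"):
    time.sleep(1)

for row in client.get_statement_result(Id=run["Id"])["Records"]:
    print(row)
```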

Many large organisations have the data to pull off sophisticated marketing strategies, but only if they avoid the common pitfalls that limit the potential. In this episode I interview Tejas Manohar on the huge – and typically unexploited – potential for data-driven marketing and personalisation. Tejas is co-founder and co-CEO of Hightouch, a reverse ETL platform that helps organisations sync their data warehouses with business-facing tools and technology. Their products are used by big-name corporations like Warner Music, Chime, Spotify, the NBA, and PetSmart. In this wide-ranging conversation Tejas and I discuss:

  • What a reverse ETL platform is and why we need it
  • Why Tejas is bullish on turning data warehouses into marketing engines
  • The key steps marketers should take to implement personalization effectively using existing company data and platforms
  • The pitfalls and common mistakes businesses make in data-driven personalisation and how to avoid these, and much more.

Tejas on LinkedIn: https://www.linkedin.com/in/tejasmanohar/
Tejas on Twitter (or is it X?): https://twitter.com/tejasmanohar
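Hightouch's product does far more, but the bare-bones reverse ETL loop, reading modeled rows out of the warehouse and pushing them into a business tool, can be sketched like this (the warehouse table and CRM endpoint are hypothetical):

```python
# Toy reverse ETL sketch, not Hightouch's implementation: sync a modeled
# customer attribute from Snowflake into a (fictional) CRM's REST API.
import requests
import snowflake.connector

conn = snowflake.connector.connect(account="...", user="...", password="...")
cur = conn.cursor()
cur.execute("SELECT email, lifetime_value FROM analytics.customer_profiles")

for email, ltv in cur.fetchall():
    requests.patch(
        "https://crm.example.com/api/contacts",   # hypothetical endpoint
        json={"email": email, "lifetime_value": ltv},
        timeout=10,
    )
```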

Learning and Operating Presto

The Presto community has mushroomed since its origins at Facebook in 2012. But ramping up this open source distributed SQL query engine can be challenging even for the most experienced engineers. With this practical book, data engineers and architects, platform engineers, cloud engineers, and software engineers will learn how to operate Presto at your organization to derive insights on datasets wherever they reside. Authors Angelica Lo Duca, Tim Meehan, Vivek Bharathan, and Ying Su explain what Presto is, where it came from, and how it differs from other data warehousing solutions. You'll discover why Facebook, Uber, Alibaba Cloud, Hewlett Packard Enterprise, IBM, Intel, and many more use Presto and how you can quickly deploy Presto in production. With this book, you will:

  • Learn how to install and configure Presto
  • Use Presto with business intelligence tools
  • Understand how to connect Presto to a variety of data sources
  • Extend Presto for real-time business insight
  • Learn how to apply best practices and tuning
  • Get troubleshooting tips for logs, error messages, and more
  • Explore Presto's architectural concepts and usage patterns
  • Understand Presto security and administration
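For a flavor of what connecting to Presto looks like in practice, here is a hedged sketch using the DB-API client from presto-python-client; the host, catalog, and table names are placeholders:

```python
# Query Presto from Python; the same SQL works regardless of where the
# underlying data lives (Hive, MySQL, Kafka, ...), since Presto federates
# sources behind catalogs.
import prestodb

conn = prestodb.dbapi.connect(
    host="presto-coordinator.example.com",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="default",
)
cur = conn.cursor()
cur.execute("SELECT order_status, COUNT(*) FROM orders GROUP BY order_status")
for row in cur.fetchall():
    print(row)
```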

Data Warehousing using Fivetran, dbt and DBSQL

In this video you will learn how to use Fivetran to ingest data from Salesforce into your lakehouse. After the data has been ingested, you will learn how to transform it using dbt. Then we will use Databricks SQL to query, visualize, and govern your data. Lastly, we will show you how you can use AI functions in Databricks SQL to call large language models.

Read more about Databricks SQL https://docs.databricks.com/en/sql/index.html#what-is-databricks-sql
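The last step of the video, calling a large language model from SQL, can be sketched from Python with the Databricks SQL connector; the hostname, HTTP path, serving endpoint, and table here are placeholders:

```python
# Hedged sketch (pip install databricks-sql-connector): run a query that uses
# the ai_query() AI function to send each row's text to a model serving
# endpoint and return the response as a column.
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890.12.azuredatabricks.net",
    http_path="/sql/1.0/warehouses/abc123",
    access_token="YOUR_TOKEN",
) as conn:
    with conn.cursor() as cur:
        cur.execute("""
            SELECT ai_query('my-llm-endpoint',
                            CONCAT('Summarize this ticket: ', description))
            FROM support_tickets LIMIT 5
        """)
        for row in cur.fetchall():
            print(row)
```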

Internet-Scale Analytics: Migrating a Mission Critical Product to the Cloud

While we may not all agree on an “If it ain’t broke, don’t fix it” approach, we can all agree on “If it shows any crack, migrate it to the cloud and completely re-architect it.” Akamai’s CSI (Cloud Security Intelligence) group is responsible for processing massive amounts of security events arriving from our edge network, which is estimated to handle 30% of internet traffic, and for making those events accessible to the various internal consumers powering customer-facing products.

In this session, we will visit the reasons for migrating one of our mission-critical security products and its 10GB ingest pipeline to the cloud, examine our new architecture and its benefits, and touch on the challenges we faced during the process (and still do). While our requirements are unique and our solution contains a few proprietary components, this session will provide you with several concepts involving popular off-the-shelf products that you can easily use in your own cloud environment.

Talk by: Yaniv Kunda


If a Duck Quacks in the Forest and Everyone Hears, Should You Care?

YES! "Duck posting" has become an internet meme for praising DuckDB on Twitter; nearly every quack using DuckDB has done it once or twice. But why all the fuss? With advances in CPUs, memory, SSDs, and the software that enables it all, our personal machines are powerful beasts relegated to handling a few Chrome tabs and sitting 90% idle. To us as data engineers and data analysts, this seems like a waste that is not only expensive but also harmful to the environment.

In this session, you will see how DuckDB brings SQL analytics capabilities that until recently required a large cluster to a 2MB standalone executable on your laptop. The session explains the architecture of DuckDB that enables high-performance analytics on a laptop: great query optimization, vectorized execution, continuous improvements in compression, and more. We will show its capabilities using live demos, from the pandas library to WASM to the command line, demonstrate performance on large datasets, and talk about how we're exploring using the laptop to augment cloud analytics workloads.
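The pandas demo from this session is easy to reproduce in miniature: DuckDB can query an in-memory DataFrame by name, with no server or cluster involved (the data here is invented):

```python
# DuckDB resolves `orders` from the surrounding Python scope via its
# replacement scans, runs the aggregation on its vectorized engine, and
# hands the result back as a DataFrame.
import duckdb
import pandas as pd

orders = pd.DataFrame({
    "region": ["EU", "US", "EU", "APAC"],
    "amount": [120.0, 340.5, 88.0, 210.0],
})

result = duckdb.sql("""
    SELECT region, SUM(amount) AS total
    FROM orders
    GROUP BY region
    ORDER BY total DESC
""").df()
print(result)
```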

Talk by: Ryan Boyd


Using Lakehouse to Fight Cancer: Ontada's Journey to Establish an RWD Platform on Databricks Lakehouse

Ontada, a McKesson business, is an oncology real-world data and evidence, clinical education, and technology business dedicated to transforming the fight against cancer. Core to Ontada's mission is using real-world data (RWD) and evidence generation to improve patient health outcomes and to accelerate life science research.

To support its mission, Ontada embarked on a journey to migrate its enterprise data warehouse (EDW) from an on-premise Oracle database to Databricks Lakehouse. This move allows Ontada to now consume data from any source, including structured and unstructured data from its own EHR and genomics lab results, and realize faster time to insight. In addition, using the Lakehouse has helped Ontada eliminate data silos, enabling the organization to realize the full potential of RWD – from running traditional descriptive analytics to extracting biomarkers from unstructured data. The session will cover the following topics:

  • Oracle to Databricks: migration best practices and lessons learned
  • People, process, and tools: expediting innovation while protecting patient information using Unity Catalog
  • Getting the most out of the Databricks Lakehouse: from BI to genomics, running all analytics under one platform
  • Hyperscale biomarker abstraction: reducing the manual effort needed to extract biomarkers from large unstructured data (medical notes, scanned/faxed documents) using spaCy and John Snow Labs NLP libraries (a generic sketch follows below)

Join this session to hear how Ontada is transforming RWD to deliver safe and effective cancer treatment.
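As a generic illustration of the NLP step (not Ontada's pipeline or models), running notes through a spaCy pipeline and collecting entity spans looks like this; a production setup would use clinical models such as John Snow Labs' rather than the small web model:

```python
# Generic spaCy NER sketch; the model and example note are placeholders.
import spacy

nlp = spacy.load("en_core_web_sm")  # clinical/biomarker models would differ

note = "Patient is EGFR-positive; KRAS wild-type per 2023 pathology report."
doc = nlp(note)

for ent in doc.ents:
    # Each entity span carries its text, label, and character offsets.
    print(ent.text, ent.label_, ent.start_char, ent.end_char)
```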

Talk by: Donghwa Kim


Data Democratization at Michelin

Too often, business decisions in large organizations are based on time-consuming, labor-intensive data extracts and on fragile Excel or Access sheets that require significant manual intervention. The teams that prepare these manual reports have invaluable heuristic knowledge that, when combined with meaningful data and tools, can drive smart business decisions. Imagine a world where these business teams are empowered with tools that help them build meaningful reports despite their limited technical expertise.

In this session, we will discuss:

  • The value derived from investing in developing citizen data personas within a business organization
  • How we successfully built a citizen data analytics culture within Michelin
  • Real examples of the impact of this initiative on the business and on the people themselves

The audience will walk away with some convincing arguments for building a citizen data culture in their organization and a how-to cookbook that they can use to cultivate citizen data personas. Finally, they can interactively uncover key success factors in the case of Michelin that can help drive a similar initiative in their respective companies.

Talk by: Philippe Leonhart and Fabien Cochet


Delta-rs, Apache Arrow, Polars, WASM: Is Rust the Future of Analytics?

Rust is a unique language whose traits make it very appealing for data engineering. In this session, we'll walk through the aspects of the language that make it such a good fit for big data processing: how it improves performance, how it provides greater safety guarantees, and how its compatibility with a wide range of existing tools positions it to become a major building block for the future of analytics.

We will also take a hands-on look through real code examples at a few emerging technologies built on top of Rust that utilize these capabilities, and learn how to apply them to our modern lakehouse architecture.
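The quickest way to touch the technologies in this talk's title is through their Python bindings, which are thin wrappers over the Rust cores. Here is a hedged sketch combining Polars (the Rust query engine) and delta-rs (Delta Lake access); paths and columns are illustrative:

```python
# Rust-backed analytics from Python: multithreaded aggregation in Polars,
# then persist and re-read the result as a Delta table via delta-rs.
import polars as pl
from deltalake import DeltaTable, write_deltalake

df = pl.DataFrame({
    "user_id": [1, 1, 2, 3],
    "spend":   [9.5, 4.0, 12.0, 3.25],
})

totals = df.group_by("user_id").agg(pl.col("spend").sum().alias("total_spend"))

write_deltalake("/tmp/user_spend", totals.to_arrow(), mode="overwrite")
print(DeltaTable("/tmp/user_spend").to_pandas())
```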

Talk by: Oz Katz


Making Travel More Accessible for Customers Bringing Mobility Devices

American Airlines takes great pride in caring for customers' travel and recognizes the importance of supporting the dignity and independence of everyone who travels with us. As we work to improve the customer experience, we're committed to making our airline more accessible to everyone, and our work to ensure that travel is accessible to all is well underway. We have been particularly focused on making the journey smoother for customers who rely on wheelchairs or other mobility devices. We have implemented a bag tag specifically for wheelchairs and scooters that gives team members more information, like the mobility device's weight and battery type, or whether it needs to be returned to a customer before a connecting flight.

As a data engineering and analytics team, we at American Airlines are building a passenger service request data product that will provide timely insights into expected mobility device traffic at each airport, so that front-line team members can provide a seamless travel experience to passengers.

Talk by: Teja Tangeda and Madhan Venkatesan
