talk-data.com

Topic

Cloud Computing

Tags: infrastructure, saas, iaas

4055 activities tagged

Activity Trend

471 peak/qtr (2020-Q1 to 2026-Q1)

Activities

4055 activities · Newest first

No compromises: Analytics engineering on the Lakehouse - Coalesce 2023

The Lakehouse architecture unites the structured world of analytics with the rapidly evolving world of AI.

But what really makes something a “Lakehouse,” and why should you care? In this session, Databricks discusses the key components of a lakehouse, what you should look for when adopting this paradigm, and how Databricks and dbt together enable analytics engineers, analysts, and data scientists to collaborate on a single unified platform. You’ll hear from a customer about how they leveraged dbt Cloud on Databricks to deliver powerful customer experiences quickly and efficiently. Ken Wong shares the latest capabilities of the Databricks platform and provides a sneak peek of upcoming features.

Speakers: Ken Wong, Senior Director of Product Management, Databricks; Samuel Garfield, Analytics Engineer, Retool

Register for Coalesce at https://coalesce.getdbt.com

Leveraging dbt Cloud to transform loan warehousing - Coalesce 2023

Learn how dv01 uses dbt Cloud and BigQuery to create a scalable and modern data pipeline for offerings in loan warehousing analytics. These products serve an esoteric niche of finance and are run by a team of financial analysts with deep industry expertise.

With the challenge of tracking the performance of millions of loans from various sources and file structures, the team initially relied on Excel-based workflows. However, as the client base grew, they needed a reliable solution: a scalable data pipeline with dbt Cloud and BigQuery that allows the team to scale into a growing market and provide innovative new products and services.

Explore the transformative power of dbt Cloud in modernizing unscalable data processes, fostering skill development, and driving success in the specialized world of loan warehousing finance.

Speaker: David Maguire, Data Engineer, dv01

Register for Coalesce at https://coalesce.getdbt.com

10x-ing developer experience with Databricks, Delta, and dbt Cloud - Coalesce 2023

In this session, gain strategic guidance on how to deploy dbt Cloud seamlessly to a team of 5-85 people. You'll learn best practices across development and automation that will ensure stability and high standards as you scale the number of developers using dbt Cloud and the number of models built up to the low thousands.

This session is a great fit for folks with beginner through intermediate levels of experience with dbt. In basketball terms, this talk covers mid-range shooting skills, but does not go into detail about 3-pointers, let alone half court shots. Likewise, this talk is not for people who are brand new to dbt and aren't familiar with the basic architecture of dbt and the modern data stack.

Speaker: Chris Davis, Senior Staff Engineer, Udemy, Inc.

Register for Coalesce at https://coalesce.getdbt.com

Enabling a complete campaign 360 with dbt Cloud - Coalesce 2023

In the dbt Labs on dbt series, you get a behind-the-scenes look at how dbt Labs uses data. You’ll learn how dbt Labs thinks about the role of data, how data developers collaborate with business leaders, and the technical decisions we’ve made in our own dbt project.

In this session, Brandon Thomson, Analytics Lead at dbt Labs, digs deeper into the technical details of the Campaign 360, a powerful marketing analytics asset used by every member of the marketing team. You'll learn about the technical decisions made during the build of this product and explore the finished asset.

Join this session to learn about dbt Labs' journey and leave with ideas that you can implement in your dbt project today.

Speaker: Brandon Thomson, Analytics Lead, dbt Labs

Register for Coalesce at https://coalesce.getdbt.com

How dbt Labs tunes model performance and optimizes cloud data platform costs - Coalesce 2023

In the dbt Labs on dbt series, you get a behind-the-scenes look at how dbt Labs uses data. You’ll learn how dbt Labs thinks about the role of data, how data developers collaborate with business leaders, and the technical decisions we’ve made in our own dbt project.

In this session, Elize Papineau, Senior Data Engineer at dbt Labs, digs deeper into the technical details of the cost optimization project at dbt Labs. You'll learn how the team leveraged query tags in dbt to make model performance monitoring possible, the process for analyzing model performance, the implementation of warehouse specific configurations at the model level, and how the team measures the effectiveness of optimizations and translates it into cost savings.
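As a rough illustration of the pattern described in this session, dbt's Snowflake adapter exposes `query_tag` and `snowflake_warehouse` configs that can be set per folder or per model in `dbt_project.yml`. This is a sketch only; the project and warehouse names below are assumptions, not dbt Labs' actual configuration:

```yaml
# dbt_project.yml (hypothetical project, illustrating the technique only)
models:
  my_project:                # assumed project name
    +query_tag: dbt          # tag every query dbt issues, so warehouse query
                             # history can be filtered and attributed to dbt
    marts:
      +snowflake_warehouse: transforming_l     # warehouse-specific config
                                               # applied at the folder level
      heavy_model:
        +snowflake_warehouse: transforming_xl  # route one expensive model
                                               # to a larger warehouse
```

With tags like these in place, warehouse query history can be grouped by tag to analyze per-model runtime and cost, which is the monitoring step the session describes.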

Watch to learn about dbt Labs' journey and leave with ideas that you can implement in your dbt project today.

Speaker: Elize Papineau, Sr. Data Engineer, dbt Labs

Register for Coalesce at https://coalesce.getdbt.com

Leveraging dbt Cloud for a distributed domain-driven development environment - Coalesce 2023

This session addresses how to leverage dbt Cloud to support domain-user development during a migration from a centralized analytics environment to a distributed data mesh analytics environment.

Speaker: Holly Burch, Data Architect, Sharp HealthCare

Register for Coalesce at https://coalesce.getdbt.com

The new dbt Cloud development experience - Coalesce 2023

In this session, Jeremy Cohen, product manager at dbt Labs, does an in-depth walk-through of the new dbt Cloud releases shared on-stage during the Keynote & Product Spotlight. This session focuses on changes to the dbt Cloud IDE and new ways to develop with dbt.

Wherever you write code, see the future of dbt development in action.

Speakers: Jeremy Cohen, Product Manager, dbt Labs; Greg McKeon, Product Manager, dbt Labs

Register for Coalesce at https://coalesce.getdbt.com

Enhancing the developer experience with the power of Snowflake and dbt - Coalesce 2023

In the rapidly evolving landscape of data technology, the integration of Snowflake and dbt has revolutionized the creation and management of data applications. Now, developers can harness their combined capabilities to build superior, scalable, and sophisticated data applications.

With Snowflake’s cloud-based architecture, developers can access boundless storage, computing, and seamless data sharing. Additionally, Snowpark Python enables the performance of data transformation, analytics, and algorithmic functions within Snowflake, presenting developers with a new realm of opportunities. Incorporating dbt further enhances the synergy, allowing developers to streamline data workflows in an agile, model-driven environment.

This session covers how the Snowflake and dbt partnership can pave the way toward building better, future-proof data applications that cater to the dynamic needs of businesses in the digital era.

Speaker: Tarik Dwiek, Head of Technology and Application Partners, Snowflake

Register for Coalesce at https://coalesce.getdbt.com

Designing a Modern Application Data Stack

Today's massive datasets represent an unprecedented opportunity for organizations to build data-intensive applications. With this report, product leads, architects, and others who deal with applications and application development will explore why a cloud data platform is a great fit for data-intensive applications. You'll learn how to carefully consider scalability, data processing, and application distribution when making data app design decisions. Cloud data platforms are the modern infrastructure choice for data applications, as they offer improved scalability, elasticity, and cost efficiency. With a better understanding of data-intensive application architectures on cloud-based data platforms and the best practices outlined in this report, application teams can take full advantage of advances in data processing and app distribution to accelerate development, deployment, and adoption cycles.

With this insightful report, you will:

- Learn why a modern cloud data platform is essential for building data-intensive applications
- Explore how scalability, data processing, and distribution models are key for today's data apps
- Implement best practices to improve application scalability and simplify data processing for efficiency gains
- Modernize application distribution plans to meet the needs of app providers and consumers

About the authors: Adam Morton works with Intelligen Group, a Snowflake pure-play data and analytics consultancy. Kevin McGinley is technical director of the Snowflake customer acceleration team. Brad Culberson is a data platform architect specializing in data applications at Snowflake.

On the benefits and virtues of drilling pilot holes - Coalesce 2023

A significant proportion of dbt Cloud users do not have a dbt CI job set up. Among those who do, many don’t leverage powerful functionality like state comparison and deferral to implement Slim CI, likely causing teams to miss errors and build unnecessary tables. Setting up Slim CI in dbt Cloud can be especially challenging for larger-scale data organizations that have multiple data environments, git branches, and targets. Watch this session to learn how you can build and evolve a strong, lasting data environment using Slim CI.
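For readers unfamiliar with the mechanics, the state comparison and deferral behind Slim CI can be sketched with dbt's CLI flags. dbt Cloud wires this up through job settings rather than flags, and the artifact path below is an assumption for illustration:

```shell
# Slim CI sketch: compare against the manifest from the last production run.
# "prod-artifacts" is a hypothetical directory holding that run's manifest.json.

# Build only models whose definitions changed relative to production, plus
# everything downstream of them (state:modified+), while deferring references
# to unchanged upstream models to the production schema (--defer):
dbt build --select state:modified+ --defer --state prod-artifacts
```

The effect is that a CI run touches only the changed portion of the DAG instead of rebuilding every model, which is what makes CI fast enough to run on every pull request.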

Speaker: Leo Folsom, Solutions Engineer, Datafold

Register for Coalesce at https://coalesce.getdbt.com

dbt migration and audit strategy - Coalesce 2023

In this session, phData talks through their experiences dealing with migrations: how to plan for one, how to handle refactoring code, and the difficulty of auditing. This session is especially helpful for organizations that are looking to transition to dbt Cloud, as it provides helpful tips for laying the foundations for a new system and doing the migration right.

Speakers: Dakota Kelley, Solution Architect, phData; Chris Johnson, Solutions Architect, phData

Register for Coalesce at https://coalesce.getdbt.com

Business process occurrence, volume, and duration modeling using dbt Cloud - Coalesce 2023

Business processes are the foundation of any organization, directing entities towards achieving specific outcomes. These processes can be simple or complex and may take days or even months to complete. Insights into business processes can be determined through three categories: occurrence, volume, and velocity.

In this presentation, Routable’s Director of Data & Analytics discusses the technical and process complexities involved in creating data models in a data warehouse using dbt Cloud. The session also provides tips to make the process easier and explains how to expose this data to users using Looker.
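To make the three categories concrete, a business-process model of this shape might be sketched as a dbt model like the following. The table and column names are hypothetical, not Routable's actual schema, and the datediff syntax assumes Snowflake:

```sql
-- models/marts/process_metrics.sql (hypothetical sketch)
select
    process_name,
    count(distinct process_run_id)                  as occurrence,         -- how often the process ran
    sum(records_handled)                            as volume,             -- how much each run processed
    avg(datediff('hour', started_at, completed_at)) as avg_duration_hours  -- velocity/duration
from {{ ref('stg_process_events') }}
where completed_at is not null   -- only completed runs have a duration
group by process_name
```

A model like this can then be exposed to business users through a BI tool such as Looker, as the session describes.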

Speaker: Jason Hodson, Director, Data & Analytics, Routable

Register for Coalesce at https://coalesce.getdbt.com

How Rebtel increased data product value: A migration story - Coalesce 2023

In this session, you'll learn about Rebtel's migration journey from a legacy architecture to the modern data stack. As the challenges of Rebtel's legacy stack mounted, the value of data products within the company was declining, and it was time to migrate. Learn how the team is using dbt Cloud and Snowflake to achieve greater success in delivering value to the business. You'll leave with a richer understanding of how to plan and execute a legacy migration.

Speaker: Quentin Coviaux, Data Engineer, Rebtel

Register for Coalesce at https://coalesce.getdbt.com

Becoming the exponential enterprise with analytics engineering and the Data Cloud - Coalesce 2023

Join Snowflake & Deloitte as they discuss how organizations can become exponential-enterprise ready through the power of the Snowflake Data Cloud and dbt Cloud's ability to write, test, and ship reliable data quickly. This session covers what your organization needs to do to become exponential-enterprise ready, and shares examples of organizations that have already made the successful transformation and why they are winning in the market with dbt, Deloitte, and Snowflake.

Speakers: Mathew Zele, Cloud & ISV Lead, Snowflake; Vivek Pradhan, Lead Partner, Data and AI Platforms, Deloitte; Sagar Kulkarni, Partner Sales Engineer, Snowflake

Register for Coalesce at https://coalesce.getdbt.com/

Domesticating a feral cat data stack - Coalesce 2023

Lauren Benezra has been volunteering with a local cat rescue since 2018. She recently took on the challenge of rebuilding their data stack from scratch, replacing a Jenga tower of incomprehensible Google Sheets with a more reliable system backed by the Modern Data Stack. By using Airtable, Airbyte, BigQuery, dbt Cloud and Census, her role as Foster Coordinator has transformed: instead of digging for buried information while wrangling cats, she now serves up accurate data with ease while... well... wrangling cats.

Viewers will learn that it's possible to run an extremely scalable and reliable stack on a shoestring budget, and will come away with actionable steps to put Lauren's hard-won lessons into practice in their own volunteering projects or as the first data hire in a tiny startup.

Speaker: Lauren Benezra, Senior Analytics Engineer, dbt Labs

Register for Coalesce at https://coalesce.getdbt.com/

Summary

The primary application of data has moved beyond analytics. With the broader audience comes the need to present data in a more approachable format, which has led to the broad adoption of data products as the delivery mechanism for information. In this episode Ranjith Raghunath shares his thoughts on how to build a strategy for the development, delivery, and evolution of data products.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management

- Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack
- You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!
- As more people start using AI for projects, two things are clear: It’s a rapidly advancing field, but it’s tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. Attend the dev and ML talks at NODES 2023, a free online conference on October 26 featuring some of the brightest minds in tech. Check out the agenda and register today at Neo4j.com/NODES.
- This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs in your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold

Your host is Tobias Macey and today I'm interviewing Ranjith Raghunath about tactical elements of a data product strategy

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what is encompassed by the idea of a data product strategy?

Which roles in an organization need to be involved in the planning and implementation of that strategy?

order of operations:

strategy -> platform design -> implementation/adoption
platform implementation -> product strategy -> interface development

managing grain of data in products
team organization to support product development/deployment
customer communications - what questions to ask?
requirements gathering, helping to understand "the art of the possible"
What are the most interesting, innovative, or unexpected ways that you have seen organizations approach data product strategies?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on

Summary

Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infrastructure. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems for engineering teams. In this episode Eric Sammer discusses why more companies are including real-time capabilities in their products and the ways that Decodable makes it faster and easier.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management

- Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack
- This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs in your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold
- You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it’s real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!
- As more people start using AI for projects, two things are clear: It’s a rapidly advancing field, but it’s tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. Attend the dev and ML talks at NODES 2023, a free online conference on October 26 featuring some of the brightest minds in tech. Check out the agenda and register today at Neo4j.com/NODES.

Your host is Tobias Macey and today I'm interviewing Eric Sammer about starting your stream processing journey with Decodable

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what Decodable is and the story behind it?

What are the notable changes to the Decodable platform since we last spoke? (October 2021)
What are the industry shifts that have influenced the product direction?

What are the problems that customers are trying to solve when they come to Decodable?
When you launched, your focus was on SQL transformations of streaming data. What was the process for adding full Java support in addition to SQL?
What are the developer experience challenges that are particular to working with streaming data?

How have you worked to address that in the Decodable platform and interfaces?

As you evolve the technical and product direction, what is your heuristic for balancing the unification of interfaces and system integration against the ability to swap different components or interfaces as new technologies are introduced?
What are the most interesting, innovative, or unexpected ways that you have seen Decodable used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Decodable?
When is Decodable the wrong choice?
What do you have planned for the future of Decodable?

Contact Info

esammer on GitHub
LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Decodable (Podcast Episode)
Understanding the Apache Flink Journey
Flink (Podcast Episode)
Debezium (Podcast Episode)
Kafka
Redpanda (Podcast Episode)
Kinesis
PostgreSQL (Podcast Episode)
Snowflake (Podcast Episode)
Databricks
Startree
Pinot (Podcast Episode)
Rockset (Podcast Episode)
Druid
InfluxDB
Samza
Storm
Pulsar (Podcast Episode)
ksqlDB (Podcast Episode)
dbt
GitHub Actions
Airbyte
Singer
Splunk
Outbox Pattern

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:

Neo4j: NODES Conference

NODES 2023 is a free online conference focused on graph-driven innovations with content for all skill levels. Its 24 hours are packed with 90 interactive technical sessions from top developers and data scientists across the world covering a broad range of topics and use cases. The event tracks:

- Intelligent Applications: APIs, Libraries, and Frameworks – Tools and best practices for creating graph-powered applications and APIs with any software stack and programming language, including Java, Python, and JavaScript
- Machine Learning and AI – How graph technology provides context for your data and enhances the accuracy of your AI and ML projects (e.g.: graph neural networks, responsible AI)
- Visualization: Tools, Techniques, and Best Practices – Techniques and tools for exploring hidden and unknown patterns in your data and presenting complex relationships (knowledge graphs, ethical data practices, and data representation)

Don’t miss your chance to hear about the latest graph-powered implementations and best practices for free on October 26 at NODES 2023. Go to Neo4j.com/NODES today to see the full agenda and register!

RudderStack: RudderStack

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

Materialize: Materialize

You shouldn't have to throw away the database to build with fast-changing data. Keep the familiar SQL, keep the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date.

That is Materialize, the only true SQL streaming database built from the ground up to meet the needs of modern data products: Fresh, Correct, Scalable — all in a familiar SQL UI. Built on Timely Dataflow and Differential Dataflow, open source frameworks created by cofounder Frank McSherry at Microsoft Research, Materialize is trusted by data and engineering teams at Ramp, Pluralsight, Onward and more to build real-time data products without the cost, complexity, and development time of stream processing.

Go to materialize.com today and get 2 weeks free!

Datafold: Datafold

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare…

Architecting Data and Machine Learning Platforms

All cloud architects need to know how to build data platforms that enable businesses to make data-driven decisions and deliver enterprise-wide intelligence in a fast and efficient way. This handbook shows you how to design, build, and modernize cloud native data and machine learning platforms using AWS, Azure, Google Cloud, and multicloud tools like Snowflake and Databricks. Authors Marco Tranquillin, Valliappa Lakshmanan, and Firat Tekiner cover the entire data lifecycle from ingestion to activation in a cloud environment using real-world enterprise architectures. You'll learn how to transform, secure, and modernize familiar solutions like data warehouses and data lakes, and you'll be able to leverage recent AI/ML patterns to get accurate and quicker insights to drive competitive advantage. You'll learn how to: Design a modern and secure cloud native or hybrid data analytics and machine learning platform Accelerate data-led innovation by consolidating enterprise data in a governed, scalable, and resilient data platform Democratize access to enterprise data and govern how business teams extract insights and build AI/ML capabilities Enable your business to make decisions in real time using streaming pipelines Build an MLOps platform to move to a predictive and prescriptive analytics approach