talk-data.com

Topic

Snowflake

data_warehouse cloud analytics olap

550 tagged

Activity Trend

193 peak/qtr (2020-Q1 to 2026-Q1)

Activities

550 activities · Newest first

Master Schema Translations in the Era of Open Data Lake

Unity Catalog puts a variety of schemas into a centralized repository; now the developer community wants more productivity and automation for schema inference, translation, evolution, and optimization, especially in ingestion and reverse-ETL scenarios with more code generation. Coinbase Data Platform attempts to pave a path with "Schemaster," which interacts with the data catalog through a (proposed) metadata model to make schema translation and evolution more manageable across popular systems such as Delta, Iceberg, Snowflake, Kafka, MongoDB, DynamoDB, Postgres... This Lightning Talk covers four areas:
- The complexity and caveats of schema differences among these systems
- The proposed field-level metadata model, and two translation patterns: point-to-point vs. hub-and-spoke
- Why data profiling should be augmented to enhance schema understanding and translation
- Integrating it with ingestion and reverse-ETL in a Databricks-oriented ecosystem
Takeaway: standardize schema lineage and translation.
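To make the hub-and-spoke idea concrete, here is a minimal sketch. It is not Coinbase's "Schemaster"; the HubField type, the type maps, and the to_ddl helper are hypothetical, intended only to show why a neutral field-level model needs N adapters instead of N×(N-1) point-to-point translators.

```python
# Illustrative sketch of a hub-and-spoke schema translation pattern.
# NOT Coinbase's "Schemaster"; names and type mappings are hypothetical.
from dataclasses import dataclass

@dataclass
class HubField:
    """Neutral, engine-agnostic field description (the 'hub' representation)."""
    name: str
    logical_type: str        # e.g. "string", "int64", "timestamp_utc"
    nullable: bool = True
    doc: str = ""

# Each target system supplies one adapter: hub logical type -> native DDL type.
SNOWFLAKE_TYPES = {"string": "VARCHAR", "int64": "NUMBER(19,0)", "timestamp_utc": "TIMESTAMP_NTZ"}
POSTGRES_TYPES  = {"string": "TEXT",    "int64": "BIGINT",       "timestamp_utc": "TIMESTAMPTZ"}

def to_ddl(fields: list[HubField], type_map: dict[str, str], table: str) -> str:
    cols = ",\n  ".join(
        f"{f.name} {type_map[f.logical_type]}{'' if f.nullable else ' NOT NULL'}"
        for f in fields
    )
    return f"CREATE TABLE {table} (\n  {cols}\n);"

if __name__ == "__main__":
    schema = [HubField("user_id", "int64", nullable=False), HubField("created_at", "timestamp_utc")]
    print(to_ddl(schema, SNOWFLAKE_TYPES, "events"))   # Snowflake flavor
    print(to_ddl(schema, POSTGRES_TYPES, "events"))    # Postgres flavor
```

Adding a new system means writing one more type map, and evolution rules (renames, widened types) can be checked once against the hub model rather than per pair of systems.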

How to Build an Open Lakehouse: Best Practices for Interoperability

Building an open data lakehouse? Start with the right blueprint. This session walks through common reference architectures for interoperable lakehouse deployments across AWS, Google Cloud, Azure and tools like Snowflake, BigQuery and Microsoft Fabric. Learn how to design for cross-platform data access, unify governance with Unity Catalog and ensure your stack is future-ready — no matter where your data lives.

Sponsored by: Onehouse | Open By Default, Fast By Design: One Lakehouse That Scales From BI to AI

You already see the value of the lakehouse. But are you truly maximizing its potential across all workloads, from BI to AI? In this session, Onehouse unveils how our open lakehouse architecture unifies your entire stack, enabling true interoperability across formats, catalogs, and engines. From lightning-fast ingestion at scale to cost-efficient processing and multi-catalog sync, Onehouse helps you go beyond trade-offs. Discover how Apache XTable (Incubating) enables cross-table-format compatibility, how OpenEngines puts your data in front of the best engine for the job, and how OneSync keeps data consistent across Snowflake, Athena, Redshift, BigQuery, and more. Meanwhile, our purpose-built lakehouse runtime slashes ingest and ETL costs. Whether you’re delivering BI, scaling AI, or building the next big thing, you need a lakehouse that’s open and powerful. Onehouse opens everything—so your data can power anything.

Apache Iceberg with Unity Catalog at HelloFresh

Table formats like Delta Lake and Iceberg have been game changers for pushing lakehouse architecture into modern Enterprises. The acquisition of Tabular added Iceberg to the Databricks ecosystem, an open format that was already well supported by processing engines across the industry. At HelloFresh we are building a lakehouse architecture that integrates many touchpoints and technologies all across the organization. As such we chose Iceberg as the table format to bridge the gaps in our decentralized managed tech landscape. We are leveraging Unity Catalog as the Iceberg REST catalog of choice for storing metadata and managing tables. In this talk we will outline our architectural setup between Databricks, Spark, Flink and Snowflake and will explain the native Unity Iceberg REST catalog, as well as catalog federation towards connected engines. We will highlight the impact on our business and discuss the advantages and lessons learned from our early adopter experience.
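For readers unfamiliar with this setup, the snippet below sketches how an external Spark job might read Iceberg tables through a Unity Catalog Iceberg REST endpoint. The endpoint path, package version, and token handling are assumptions to verify against the Databricks documentation; this is not HelloFresh's actual configuration.

```python
# Hedged sketch: querying Iceberg tables exposed by a Unity Catalog Iceberg REST
# endpoint from external Spark. Endpoint path, auth style, and package version
# are assumptions -- check your workspace documentation before relying on them.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("uc-iceberg-rest")
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")  # assumed version
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.uc", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.uc.type", "rest")
    .config("spark.sql.catalog.uc.uri",
            "https://<workspace-host>/api/2.1/unity-catalog/iceberg")   # assumed path
    .config("spark.sql.catalog.uc.token", "<personal-access-token>")
    .config("spark.sql.catalog.uc.warehouse", "<uc-catalog-name>")      # assumed: UC catalog name
    .getOrCreate()
)

# Tables are addressed as <spark-catalog-alias>.<schema>.<table>.
spark.sql("SELECT * FROM uc.analytics.orders LIMIT 10").show()
```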

Bayada’s Snowflake-to-Databricks Migration: Transforming Data for Speed & Efficiency

Bayada is transforming its data ecosystem by consolidating Matillion+Snowflake and SSIS+SQL Server into a unified Enterprise Data Platform powered by Databricks. Using Databricks' Medallion architecture, this platform enables seamless data integration, advanced analytics and machine learning across critical domains like general ledger, recruitment and activity-based costing. Databricks was selected for its scalability, real-time analytics and ability to handle both structured and unstructured data, positioning Bayada for future growth. The migration aims to reduce data processing times by 35%, improve reporting accuracy and cut reconciliation efforts by 40%. Operational costs are projected to decrease by 20%, while real-time analytics is expected to boost efficiency by 15%. Join this session to learn how Bayada is leveraging Databricks to build a high-performance data platform that accelerates insights, drives efficiency and fosters innovation organization-wide.

Unify Your Data and Governance With Lakehouse Federation

In today's data landscape, organizations often grapple with fragmented data spread across various databases, data warehouses and catalogs. Lakehouse Federation addresses this challenge by enabling seamless discovery, querying, and governance of distributed data without the need for duplication or migration. This session will explore how Lakehouse Federation integrates external data sources like Hive Metastore, Snowflake, SQL Server and more into a unified interface, providing consistent access controls, lineage tracking and auditing across your entire data estate. Learn how to streamline analytics and AI workloads, enhance compliance and reduce operational complexity by leveraging a single, cohesive platform for all your data needs.
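As a rough illustration of the workflow described above, the sketch below shows the general shape of wiring a Snowflake source into Unity Catalog from a Databricks notebook (where `spark` is the notebook-provided session). Option names and the secret() usage follow the documented Lakehouse Federation pattern, but exact syntax varies by connector, so treat this as an assumption-laden outline rather than a recipe.

```python
# Hedged sketch of Lakehouse Federation setup, run from a Databricks notebook.
# Option names (host, sfWarehouse, ...) and secret handling are assumptions --
# consult the Lakehouse Federation docs for the connector you target.
spark.sql("""
  CREATE CONNECTION IF NOT EXISTS snowflake_conn TYPE snowflake
  OPTIONS (
    host '<account>.snowflakecomputing.com',
    port '443',
    sfWarehouse '<warehouse>',
    user secret('federation', 'sf_user'),
    password secret('federation', 'sf_password')
  )
""")

spark.sql("""
  CREATE FOREIGN CATALOG IF NOT EXISTS sf_sales
  USING CONNECTION snowflake_conn
  OPTIONS (database '<SNOWFLAKE_DATABASE>')
""")

# Federated tables are then discoverable, queryable, and governable like any
# other Unity Catalog object, without copying data out of Snowflake.
spark.sql("SELECT COUNT(*) FROM sf_sales.public.orders").show()
```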

Breaking Silos: Enabling Databricks-Snowflake Interoperability With Iceberg and Unity Catalog

As data ecosystems grow more complex, organizations often struggle with siloed platforms and fragmented governance. In this session, we’ll explore how our team made Databricks the central hub for cross-platform interoperability, enabling seamless Snowflake integration through Unity Catalog and the Iceberg REST API. We’ll cover:
- Why interoperability matters and the business drivers behind our approach
- How Unity Catalog and Uniform simplify interoperability, allowing Databricks to expose an Iceberg REST API for external consumption
- A technical deep dive into data sharing, query performance, and access control across Databricks and Snowflake
- Lessons learned and best practices for building a multi-engine architecture while maintaining governance and efficiency
By leveraging Uniform, Delta, and Iceberg, we created a flexible, vendor-agnostic architecture that bridges Databricks and Snowflake without compromising performance or security.
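To make the consumption side concrete, here is a hedged sketch of how Snowflake might reference Iceberg tables exposed through a Unity Catalog Iceberg REST endpoint. The catalog-integration clauses, endpoint path, and authentication block are assumptions to check against current Snowflake and Databricks documentation; this is not the presenters' actual setup.

```python
# Hedged sketch of the Snowflake side of Databricks/Snowflake interop: registering
# Iceberg tables managed in Unity Catalog so they can be queried in place.
# Clause names and the auth block are assumptions -- verify against current docs.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="<warehouse>", database="<database>", schema="PUBLIC",
)
cur = conn.cursor()

cur.execute("""
  CREATE CATALOG INTEGRATION IF NOT EXISTS uc_iceberg_rest
    CATALOG_SOURCE = ICEBERG_REST
    TABLE_FORMAT = ICEBERG
    CATALOG_NAMESPACE = 'analytics'
    REST_CONFIG = (CATALOG_URI = 'https://<workspace-host>/api/2.1/unity-catalog/iceberg')
    REST_AUTHENTICATION = (TYPE = BEARER, BEARER_TOKEN = '<token>')  -- auth block is an assumption
    ENABLED = TRUE
""")

-- the line above ends the integration; next, reference a table without copying data
cur.execute("""
  CREATE ICEBERG TABLE IF NOT EXISTS orders
    CATALOG = 'uc_iceberg_rest'
    CATALOG_TABLE_NAME = 'orders'
""")
cur.execute("SELECT COUNT(*) FROM orders")
print(cur.fetchone())
```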

You shouldn’t have to sacrifice data governance just to leverage the tools your business needs. In this session, we will give practical tips on how you can cut through the data sprawl and get a unified view of your data estate in Unity Catalog without disrupting existing workloads. We will walk through how to set up federation with Glue, Hive Metastore, and other catalogs like Snowflake, and show you how powerful new tools help you adopt Databricks at your own pace with no downtime and full interoperability.

Wrapping up the week at Snowflake Summit. As always, the big platform ate away at their partners. If you're a partner, what can you do to shield yourself from platform cannibalization? In this episode, I give some advice from what I've seen in the data ecosystem over the years.

This session will share proven enterprise architecture best practices for augmenting Snowflake with data virtualization to deliver real-time insights. We'll explore how to address latency-sensitive use cases—such as month-end financial reconciliations—while ensuring data security and supporting cloud migration using Denodo. Attendees will learn how the combination of Snowflake and Denodo enables scalable, low-latency analytics across highly customized and distributed data environments.

From Days to Minutes: Automating Sales Commission Accuracy at phData | The Data Apps Conference

Managing sales commissions can be complex, especially as teams scale and compensation structures grow more intricate. phData previously relied on spreadsheets to handle quotas, accelerators, SPIFF bonuses, and deal splits—but as the sales team expanded, manual tracking became inefficient, error-prone, and difficult to validate.

In this session, the phData team will demonstrate how they built a comprehensive commission management system using Sigma Data Apps to:

- Handle complex commission structures including quota attainment, accelerator gates, and SPIFF bonuses
- Provide real-time commission visibility that gamifies seller performance
- Enable flexible deal attribution and splits through input tables
- Support rapid validation and refinement before finalizing configurations in Salesforce and Snowflake
Watch the demo to see how phData replaced spreadsheet chaos with a powerful commission tracking system that not only streamlines operations but also drives sales momentum.
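To give a feel for the kind of logic such an app encodes, here is a toy sketch of tiered commission math with accelerator gates. The rates, tier boundaries, and function shape are hypothetical, not phData's actual plan.

```python
# Toy sketch of tiered commission math (quota attainment + accelerator gates).
# Rates and gates below are hypothetical, purely for illustration.
def commission(bookings: float, quota: float, base_rate: float = 0.08) -> float:
    attainment = bookings / quota
    # Accelerator gates: bookings above each attainment gate earn a higher rate.
    tiers = [
        (0.00, 1.00, base_rate),         # up to 100% of quota
        (1.00, 1.25, base_rate * 1.5),   # 100-125% of quota
        (1.25, float("inf"), base_rate * 2.0),  # beyond 125%
    ]
    payout = 0.0
    for lo, hi, rate in tiers:
        portion = max(0.0, min(attainment, hi) - lo) * quota
        payout += portion * rate
    return payout

# A rep at 130% of a $100k quota earns accelerated rates on the overage.
print(commission(bookings=130_000, quota=100_000))  # 11800.0
```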

➡️ Learn more about Data Apps: https://www.sigmacomputing.com/product/data-applications?utm_source=youtube&utm_medium=organic&utm_campaign=data_apps_conference&utm_content=pp_data_apps


➡️ Sign up for your free trial: https://www.sigmacomputing.com/go/free-trial?utm_source=youtube&utm_medium=video&utm_campaign=free_trial&utm_content=free_trial


Scaling Talent & Compensation Planning: A DoorDash Story | The Data Apps Conference

Managing performance reviews, calibrations, and compensation adjustments across thousands of employees at DoorDash was becoming increasingly complex—especially after the Wolt acquisition doubled the employee base. Teams struggled with spreadsheet chaos, security risks, and inefficient manual processes.

In this session, Ashwin Murugappan (People Applications & Intelligence Engineer) will share how DoorDash built the Cycle Management Hub using Sigma Data Apps to:

- Eliminate spreadsheet versioning issues with real-time, governed collaboration
- Improve efficiency and accuracy by integrating directly with Workday & Snowflake
- Enhance security & compliance with role-based access controls (RLS)
Watch the demo and learn how Sigma’s input tables, write-back capabilities, and real-time data processing helped DoorDash modernize its HR data workflows at scale.

➡️ Learn more about Data Apps: https://www.sigmacomputing.com/product/data-applications?utm_source=youtube&utm_medium=organic&utm_campaign=data_apps_conference&utm_content=pp_data_apps


➡️ Sign up for your free trial: https://www.sigmacomputing.com/go/free-trial?utm_source=youtube&utm_medium=video&utm_campaign=free_trial&utm_content=free_trial


How WHOOP Scales AI-Powered Customer Support with Snowflake and Sigma Technology | Data Apps

Managing customer interactions across multiple disconnected platforms creates inefficiencies and delays in resolving support tickets. At WHOOP, support agents had to manually navigate through siloed data across payments, ERP, and ticketing systems, slowing down response times and impacting customer satisfaction. In this session, Matt Luizzi (Director of Business Analytics, WHOOP) and Brendan Farley (Sales Engineer, Snowflake) will showcase how WHOOP:

- Consolidated fragmented data from multiple systems into a unified customer support app
- Enabled real-time access to customer history, allowing agents to quickly surface relevant insights
- Eliminated the need for custom engineering by leveraging Sigma’s no-code interface to build interactive workflows
- Accelerated ticket resolution by allowing support teams to take action directly within Sigma, reducing dependency on multiple SaaS tools
- Improved forecasting and decision-making by implementing AI-powered analytics on top of Snowflake
Before Sigma, getting a full view of customer issues required navigating across multiple tools—now, WHOOP’s customer support team can access, analyze, and act on real-time data in a single interface. Join us for an inside look at how WHOOP and Snowflake partnered to build a modern customer support data app that enhances efficiency and customer experience.

➡️ Learn more about Data Apps: https://www.sigmacomputing.com/product/data-applications?utm_source=youtube&utm_medium=organic&utm_campaign=data_apps_conference&utm_content=pp_data_apps


➡️ Sign up for your free trial: https://www.sigmacomputing.com/go/free-trial?utm_source=youtube&utm_medium=video&utm_campaign=free_trial&utm_content=free_trial


Customer 360: Unlocking Actionable Insights with AI-Powered Customer Intelligence | Data Apps

As companies scale, retaining and accessing institutional knowledge becomes increasingly challenging. Customer Success teams often navigate multiple platforms to piece together customer histories, making it difficult to maintain continuity and provide efficient service across account transitions.

In this session, Curtis de Castro will demonstrate how Sigma:

- Built an AI-powered repository that consolidates all customer interactions into a single, searchable platform
- Enabled real-time filtering and analysis of customer interactions across chat, email, and tickets
- Implemented AI-driven features for sentiment analysis, meeting agenda generation, and churn risk detection
- Developed a scalable solution that maintains data security by leveraging Snowflake Cortex
- Designed an intuitive interface that makes advanced insights accessible without SQL expertise
Previously, deep customer analysis took hours—sometimes days. Now, AI surfaces key insights in minutes, enabling teams to focus on action instead of searching for data. Join this session for a demo of how Sigma built an AI-powered data app to modernize customer intelligence while maintaining enterprise-grade security and governance.
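As a rough illustration of the Cortex-backed analysis described above, the sketch below scores recent support-ticket sentiment and summaries directly inside Snowflake. Table and column names are hypothetical, and Cortex function availability depends on region and edition; this is not Sigma's actual implementation.

```python
# Hedged sketch: scoring support-ticket sentiment in place with Snowflake Cortex
# SQL functions. Table/column names are hypothetical; availability varies by region.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="<warehouse>", database="<database>", schema="SUPPORT",
)
cur = conn.cursor()

cur.execute("""
  SELECT
    ticket_id,
    SNOWFLAKE.CORTEX.SENTIMENT(ticket_body) AS sentiment,   -- score in [-1, 1]
    SNOWFLAKE.CORTEX.SUMMARIZE(ticket_body) AS summary
  FROM support_tickets
  WHERE created_at >= DATEADD('day', -7, CURRENT_TIMESTAMP())
  ORDER BY sentiment ASC
  LIMIT 20
""")
for ticket_id, sentiment, summary in cur.fetchall():
    print(ticket_id, round(sentiment, 2), summary[:80])
```

Because the scoring happens where the data already lives, the governed tables never leave Snowflake, which is the security argument the session makes for Cortex.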

➡️ Learn more about Data Apps: https://www.sigmacomputing.com/product/data-applications?utm_source=youtube&utm_medium=organic&utm_campaign=data_apps_conference&utm_content=pp_data_apps


➡️ Sign up for your free trial: https://www.sigmacomputing.com/go/free-trial?utm_source=youtube&utm_medium=video&utm_campaign=free_trial&utm_content=free_trial


Summary
In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Shinji Kim to discuss the evolving role of semantic layers in the era of AI. As they explore the challenges of managing vast data ecosystems and providing context to data users, they delve into the significance of semantic layers for AI applications. They dive into the nuances of semantic modeling, the impact of AI on data accessibility, and the importance of business logic in semantic models. Shinji shares her insights on how SelectStar is helping teams navigate these complexities, and together they cover the future of semantic modeling as a native construct in data systems. Join them for an in-depth conversation on the evolving landscape of data engineering and its intersection with AI.

Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
Your host is Tobias Macey and today I'm interviewing Shinji Kim about the role of semantic layers in the era of AI.

Interview
- Introduction
- How did you get involved in the area of data management?
- Semantic modeling gained a lot of attention ~4-5 years ago in the context of the "modern data stack". What is your motivation for revisiting that topic today?
- There are several overlapping concepts – "semantic layer," "metrics layer," "headless BI." How do you define these terms, and what are the key distinctions and overlaps?
- Do you see these concepts converging, or do they serve distinct long-term purposes?
- Data warehousing and business intelligence have been around for decades now. What new value does semantic modeling provide beyond practices like star schemas, OLAP cubes, etc.?
- What benefits does a semantic model provide when integrating your data platform into AI use cases?
- How does using AI as an interface to your analytical use cases differ from powering customer-facing AI applications with your data?
- The effort to create and maintain a set of semantic models is non-zero. What role can LLMs play in helping to propose and construct those models?
- For teams who have already invested in building this capability, what additional context and metadata is necessary to provide guidance to LLMs when working with their models?
- What's the most effective way to create a semantic layer without turning it into a massive project?
- There are several technologies available for building and serving these models. What are the selection criteria that you recommend for teams who are starting down this path?
- What are the most interesting, innovative, or unexpected ways that you have seen semantic models used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working with semantic modeling?
- When is semantic modeling the wrong choice?
- What do you predict for the future of semantic modeling?

Contact Info
- LinkedIn

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
SelectStar · Sun Microsystems · Markov Chain Monte Carlo · Semantic Modeling · Semantic Layer · Metrics Layer · Headless BI · Cube (Podcast Episode) · AtScale · Star Schema · Data Vault · OLAP Cube · RAG == Retrieval Augmented Generation · AI Engineering Podcast Episode · KNN == K-Nearest Neighbors · HNSW == Hierarchical Navigable Small World · dbt Metrics Layer · Soda Data · LookML · Hex · PowerBI · Tableau · Semantic View (Snowflake) · Databricks Genie · Snowflake Cortex Analyst · Malloy

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
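Since the episode keeps returning to what a semantic model actually contains, here is a tool-agnostic sketch. The dictionary layout and the toy compile_metric helper are illustrative only; real semantic layers (Cube, AtScale, dbt's metrics layer, LookML, Snowflake Semantic Views, and others) each have their own syntax for the same ideas.

```python
# Minimal sketch of what a semantic model encodes, independent of any specific tool.
# Names and layout are illustrative; real semantic layers add joins, policies, caching.
semantic_model = {
    "name": "revenue",
    "source": "analytics.fct_orders",          # governed physical table
    "dimensions": {
        "order_date": {"column": "order_date", "type": "time", "grain": "day"},
        "region":     {"column": "ship_region", "type": "categorical"},
    },
    "measures": {
        "gross_revenue": {"sql": "SUM(order_amount)"},
        "order_count":   {"sql": "COUNT(DISTINCT order_id)"},
    },
    "metrics": {
        # Business logic lives here once, instead of being re-derived in every BI tool
        # or re-explained in every LLM prompt.
        "average_order_value": {"sql": "gross_revenue / NULLIF(order_count, 0)"},
    },
}

def compile_metric(model: dict, metric: str, group_by: str) -> str:
    """Toy compiler: turn a metric request into SQL against the governed source."""
    expr = model["metrics"][metric]["sql"]
    for name, measure in model["measures"].items():
        expr = expr.replace(name, measure["sql"])
    dim = model["dimensions"][group_by]["column"]
    return f"SELECT {dim}, {expr} AS {metric} FROM {model['source']} GROUP BY {dim}"

print(compile_metric(semantic_model, "average_order_value", "region"))
```

The point of the sketch: an AI interface (or a BI tool) asks for "average order value by region" and the layer, not the model or the analyst, decides which columns, aggregations, and guards produce it.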

We are presently in the midst of an artificial intelligence (AI) revolution, where advances in deep learning technologies are rewriting the rules of entire industries – including how we forecast the weather and predict changes in our climate. In this session we will learn how the Met Office is successfully navigating the opportunities and challenges (both technological and cultural) of embedding AI across the organisation, and discuss how the complementary use of AI alongside traditional physics-based weather and climate models will further help people make better decisions to stay safe and thrive.

Future-proof your data architecture: Learn how DoorDash built a data lakehouse powered by Starburst to achieve a 20-30% faster time to insights. Akshat Nair shares lessons learned about what drove DoorDash to move beyond Snowflake to embrace the lakehouse. He will share his rationale for selecting Trino as their lakehouse query engine and why his team chose Starburst over open source. Discover how DoorDash seamlessly queries diverse sources, including Snowflake, Postgres, and data lake table formats, achieving faster data-driven decision-making at scale with cost benefits.
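For readers who have not used a federated engine, the sketch below shows the general pattern with the Trino Python client: one query joining a lakehouse Iceberg table with a table that still lives in Snowflake. Host, catalog, schema, and table names are hypothetical, not DoorDash's setup.

```python
# Hedged sketch of a federated Trino/Starburst query from Python (pip install trino).
# Catalog, schema, and table names depend entirely on how catalogs are configured.
from trino.dbapi import connect

conn = connect(host="trino.internal.example.com", port=443, user="analyst",
               http_scheme="https", catalog="iceberg", schema="analytics")
cur = conn.cursor()

# One query engine, several sources: join an Iceberg lake table with a table
# still in Snowflake, without copying data between platforms first.
cur.execute("""
  SELECT o.order_date, COUNT(*) AS orders, SUM(p.amount) AS payments
  FROM iceberg.analytics.orders o
  JOIN snowflake.finance.payments p ON o.order_id = p.order_id
  GROUP BY o.order_date
  ORDER BY o.order_date DESC
  LIMIT 30
""")
for row in cur.fetchall():
    print(row)
```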

Serhii Sokolenko, founder at Tower Dev and former product manager at tech giants like Google Cloud, Snowflake, and Databricks, joined Yuliia to discuss his journey building a next-generation compute platform. Tower Dev aims to simplify data processing for data engineers who work with Python. Serhii explains how Tower addresses three key market trends: the integration of data engineering with AI through Python, the movement away from complex distributed processing frameworks, and users' desire for flexibility across different data platforms. He explains how Tower makes Python data applications more accessible by eliminating the need to learn complex frameworks while automatically scaling infrastructure. Serhii also shares his perspective on the future of data engineering, noting how AI will transform the profession.
Tower Dev - https://tower.dev/
Serhii's LinkedIn - https://www.linkedin.com/in/ssokolenko/

SnowPro Core Certification Study Guide

The "SnowPro Core Certification Study Guide" provides a comprehensive resource for mastering Snowflake data cloud concepts and passing the SnowPro Core exam. Through detailed explanations and practical exercises, you will gain the knowledge and skills necessary to successfully implement and manage Snowflake's powerful features and integrate data solutions effectively. What this Book will help me do Efficiently load and manage data in Snowflake for modern data processing. Optimize queries and configure Snowflake's performance features for data analytics. Securely implement access control and user roles to ensure data privacy. Apply Snowflake's sharing features to collaborate within and between organizations. Prepare effectively for the SnowPro Core exam with mock tests and review tools. Author(s) Jatin Verma is a renowned expert in Snowflake technologies and a certified SnowPro Core professional. With years of hands-on experience working with data solutions, Jatin excels at breaking down complex concepts into digestible lessons. His approachable writing style and dedication to education make this book a trusted resource for both aspiring and current professionals. Who is it for? This book is perfect for data engineers, analysts, database administrators, and business intelligence professionals who are looking to gain expertise in Snowflake and achieve SnowPro Core certification. It is particularly suited for those with foundational knowledge of databases, data warehouses, and SQL, seeking to advance their skills in Snowflake and become certified professionals. By leveraging this guide, readers can solidify their Snowflake knowledge and confidently approach the SnowPro Core certification exam.