talk-data.com

Topic: SQL (Structured Query Language)

Tags: database_language, data_manipulation, data_definition, programming_language

1751 tagged

Activity Trend: peak of 107 activities/quarter, 2020-Q1 to 2026-Q1

Activities

1751 activities · Newest first

Summary: In this episode of the Data Engineering Podcast Matt Topper, president of UberEther, talks about the complex challenge of identity, credentials, and access control in modern data platforms. With the shift to composable ecosystems, integration burdens have exploded, fracturing governance and auditability across warehouses, lakes, files, vector stores, and streaming systems. Matt shares practical solutions, including propagating user identity via JWTs, externalizing policy with engines like OPA/Rego and Cedar, and using database proxies for native row/column security. He also explores catalog-driven governance, lineage-based label propagation, and OpenTDF for binding policies to data objects. The conversation covers machine-to-machine access, short-lived credentials, workload identity, and constraining access by interface choke points, as well as lessons from Zanzibar-style policy models and the human side of enforcement. Matt emphasizes the need for trust composition - unifying provenance, policy, and identity context - to answer questions about data access, usage, and intent across the entire data path.
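
The row/column-security pattern discussed here can be sketched in plain SQL. Below is a minimal, hypothetical Postgres-style example, assuming an `orders` table and a session variable that a proxy or driver sets from the propagated JWT identity; it is illustrative only, not the specific proxy-based setup described in the episode.

```sql
-- Hypothetical sketch: native row-level security keyed to a propagated identity.
-- Assumes the proxy/driver sets a session variable (app.current_user) from the JWT subject.
CREATE TABLE orders (
    order_id   BIGINT PRIMARY KEY,
    owner_name TEXT NOT NULL,
    amount     NUMERIC(12,2)
);

ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

-- Only rows owned by the authenticated user are visible to the querying role.
CREATE POLICY orders_owner_only ON orders
    FOR SELECT
    USING (owner_name = current_setting('app.current_user'));

-- The proxy would run something like this per connection/request:
-- SET app.current_user = 'alice';
```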

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.

Data migrations are brutal. They drag on for months - sometimes years - burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.

Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.

Your host is Tobias Macey and today I'm interviewing Matt Topper about the challenges of managing identity and access controls in the context of data systems.

Interview

Introduction
How did you get involved in the area of data management?
The data ecosystem is a uniquely challenging space for creating and enforcing technical controls for identity and access control. What are the key considerations for designing a strategy for addressing those challenges?
For data access the off-the-shelf options are typically on either extreme of too coarse or too granular in their capabilities. What do you see as the major factors that contribute to that situation?
Data governance policies are often used as the primary means of identifying what data can be accessed by whom, but translating that into enforceable constraints is often left as a secondary exercise. How can we as an industry make that a more manageable and sustainable practice?
How can the audit trails that are generated by data systems be used to inform the technical controls for identity and access?
How can the foundational technologies of our data platforms be improved to make identity and authz a more composable primitive?
How does the introduction of streaming/real-time data ingest and delivery complicate the challenges of security controls?
What are the most interesting, innovative, or unexpected ways that you have seen data teams address ICAM?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on ICAM?
What are the aspects of ICAM in data systems that you are paying close attention to?
What are your predictions for the industry adoption or enforcement of those controls?

Contact Info: LinkedIn

Parting Question: From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story.

Links
UberEther
JWT == JSON Web Token
OPA == Open Policy Agent
Rego
PingIdentity
Okta
Microsoft Entra
SAML == Security Assertion Markup Language
OAuth
OIDC == OpenID Connect
IDP == Identity Provider
Kubernetes
Istio
Amazon CEDAR policy language
AWS IAM
PII == Personally Identifiable Information
CISO == Chief Information Security Officer
OpenTDF
OpenFGA
Google Zanzibar
Risk Management Framework
Model Context Protocol
Google Data Project
TPM == Trusted Platform Module
PKI == Public Key Infrastructure
Passkeys
DuckLake
Podcast Episode
Accumulo
JDBC
OpenBao
Hashicorp Vault
LDAP

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Help us become the #1 Data Podcast by leaving a rating & review! We are 67 reviews away!

Data meets music 🎶 - Avery sits down with Chris Reba, a data analyst who's studied over 1 million songs, to reveal what the numbers say about how hits are made. From uncovering Billboard chart fraud to exploring how TikTok reshaped music, this episode breaks down the art and science behind every beat.

💌 Join 30k+ aspiring data analysts & get my tips in your inbox weekly 👉 https://www.datacareerjumpstart.com/newsletter
🆘 Feeling stuck in your data journey? Come to my next free "How to Land Your First Data Job" training 👉 https://www.datacareerjumpstart.com/training
👩‍💻 Want to land a data job in less than 90 days? 👉 https://www.datacareerjumpstart.com/daa
👔 Ace The Interview with Confidence 👉 https://www.datacareerjumpstart.com//interviewsimulator

⌚ TIMESTAMPS
00:00 - Intro: How Chris analyzed 1M+ songs using data
01:10 - What data reveals about hit songs and music trends
03:30 - Combining qualitative and quantitative analysis
07:00 - The 1970s Billboard chart fraud explained
10:45 - Why key changes disappeared from modern pop
13:30 - How hip-hop changed song structure and sound
14:10 - TikTok's influence on the music industry
16:10 - Inside Chris's open-source music dataset
22:10 - Best tools for music data analysis (SQL, Python, Datawrapper)
27:45 - Advice for aspiring music data analysts

🔗 CONNECT WITH CHRIS
📕 Order Chris's Book: https://www.bloomsbury.com/us/uncharted-territory-9798765149911
📊 Check out Chris's Music Dataset: https://docs.google.com/spreadsheets/d/1j1AUgtMnjpFTz54UdXgCKZ1i4bNxFjf01ImJ-BqBEt0/edit?gid=1974823090#gid=1974823090
💌 Subscribe to Chris's Newsletter: https://www.cantgetmuchhigher.com
📲 Follow Chris on TikTok: https://www.tiktok.com/@cdallarivamusic

🔗 CONNECT WITH AVERY
🎥 YouTube Channel 🤝 LinkedIn 📸 Instagram 🎵 TikTok 💻 Website

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://www.datacareerjumpstart.com/daa

Get certified at Coalesce! Choose from two certification exams:

The dbt Analytics Engineering Certification Exam is designed to evaluate your ability to:
• Build, test, and maintain models to make data accessible to others
• Use dbt to apply engineering principles to analytics infrastructure
We recommend that you have at least SQL proficiency and have had 6+ months of experience working in dbt (self-hosted dbt or the dbt platform) before attempting the exam.

The dbt Architect Certification Exam assesses your ability to design secure, scalable dbt implementations, with a focus on:
• Environment orchestration
• Role-based access control
• Integrations with other tools
• Collaborative development workflows aligned with best practices

What to expect: Your purchase includes sitting for one attempt at one of the two in-person exams at Coalesce. You will let the proctor know which certification you are sitting for. Please arrive on time; this is a closed-door certification, and attendees will not be let in after the doors are closed.

What to bring: You will need to bring your own laptop to take the exam.

Duration: 2 hours. Fee: $100. Trainings and certifications are not offered separately and must be purchased with a Coalesce pass. Trainings and certifications are not available for Coalesce Online passes. If you no-show for your certification, you will not be refunded.

Learn how dbt is evolving with the next-generation engine powered by Fusion. In this hands-on session, you'll explore how Fusion improves dbt's performance, enables richer SQL parsing, and sets the stage for future enhancements like better refactoring tools and faster development workflows. What to bring: You must bring your own laptop to complete the hands-on exercises. We will provide all the other sandbox environments for dbt and a data platform.

dbt Canvas makes it simple for every data practitioner to contribute to a dbt project. Learn the foundational concepts of developing in Canvas, dbt's new visual editor, and the best practices of editing and creating dbt models. After this course, you will be able to:
• Create new dbt models and edit existing models in dbt Canvas
• Understand the different operators in Canvas
• Evaluate the underlying SQL produced by Canvas
Prerequisites for this course include a basic SQL understanding.

What to bring: You will need to bring your own laptop to complete the hands-on exercises. We will provide all the other sandbox environments for dbt and a data platform.

Duration: 2 hours. Fee: $200. Trainings and certifications are not offered separately and must be purchased with a Coalesce pass. Trainings and certifications are not available for Coalesce Online passes.
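
For context on what "the underlying SQL produced by Canvas" looks like, a dbt model is ultimately just a SQL select with Jinja references. Below is a hypothetical sketch of the kind of model SQL you might evaluate after building it visually; the model, table, and column names (customer_revenue, stg_orders, customer_id, amount) are invented for illustration, not taken from the course materials.

```sql
-- Hypothetical dbt model (models/customer_revenue.sql): the kind of SQL a
-- Canvas-built model might compile from. Names are illustrative only.
select
    customer_id,
    count(*)    as order_count,
    sum(amount) as total_revenue
from {{ ref('stg_orders') }}
group by customer_id
```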


Data interviews do not have to feel messy. In this episode, I share a simple AI Interview Copilot that works for data analyst, data scientist, analytics engineer, product analyst, and marketing analyst roles.

What you will learn today:
• How to turn a job post into a skills map: know exactly what to study first.
• How to build role-specific SQL drills (joins, window functions, cohorts, retention, time series).
• How to practice product/case questions that end with a decision and a metric you can defend.
• How to prepare ML/experimentation basics (problem framing, features, success metrics, A/B test sanity checks).
• How to plan take-home assignments (scope, assumptions, readable notebook/report structure).
• How to create a 6-story STAR bank with real numbers and clear outcomes.
• How to follow a 7-day rhythm so you make steady progress without burnout.
• How to keep proof of progress so your confidence comes from evidence, not hope.

Copy-and-use prompts from the show:
• JD → Skills Map: "Parse this job post. Table: Skill/Theme | Where mentioned | My level (guess) | Study action | Likely interview questions. Then give 5 bullets: what they are really hiring for."
• SQL Drill Factory (Analyst/Product/Marketing): "Create 20 SQL tasks + hint + how to check results using orders, users, events, campaigns. Emphasize joins, windows, conditional agg, cohorts, funnels, retention, time windows."
• Case Coach (Data/Product): "Run a 15-minute case: key metric is down. Ask one question at a time. Score clarity, structure, metrics, trade-offs. End with gaps + practice list."
• ML/Experimentation Basics (Data Science): "Create a 7-step outline for framing a modeling problem (goal, data, features, baseline, evaluation, risks, comms). Add an A/B test sanity checklist (power, SRM, population, metric guardrails)."
• Take-Home Planner: "Given this brief, propose scope, data assumptions, 3–5 analysis steps, visuals, and a short results section. Output a clear report outline."
• Behavioral STAR Bank: "Draft 6 STAR stories (120s) for conflict, ambiguity, failure, leadership without title, stakeholder influence, measurable impact. Put numbers in Results."
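
As an example of the kind of drill the "SQL Drill Factory" prompt produces, here is a hedged sketch of a monthly cohort-retention query. It assumes hypothetical orders(user_id, order_date) and users(user_id, signup_date) tables like those named in the prompt; adapt the date functions to your SQL dialect.

```sql
-- Hypothetical drill: monthly signup cohorts and how many users order in each
-- month after signup. Table and column names are assumptions, not a real schema.
WITH cohorts AS (
    SELECT user_id, date_trunc('month', signup_date) AS cohort_month
    FROM users
),
activity AS (
    SELECT o.user_id,
           c.cohort_month,
           date_trunc('month', o.order_date) AS active_month
    FROM orders o
    JOIN cohorts c USING (user_id)
)
SELECT cohort_month,
       active_month,
       COUNT(DISTINCT user_id) AS active_users
FROM activity
GROUP BY cohort_month, active_month
ORDER BY cohort_month, active_month;
```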

SQL Server 2025 Unveiled: The AI-Ready Enterprise Database with Microsoft Fabric Integration

Unveil the data platform of the future with SQL Server 2025 - guided by one of its key architects. With built-in AI for application development and advanced analytics powered by Microsoft Fabric, SQL Server 2025 empowers you to innovate - securely and confidently. This book shows you how. Author Bob Ward, Principal Architect for the Microsoft Azure Data team, shares exclusive insights drawn from over three decades at Microsoft. Having worked on every version of SQL Server since OS/2 1.1, Ward brings unmatched expertise and practical guidance to help you navigate this transformative release. Ward covers everything from setup and upgrades to advanced features in performance, high availability, and security. He also highlights what makes this the most developer-friendly release in a decade: support for JSON, RegEx, REST APIs, and event streaming. Most critically, Ward explores SQL Server 2025's advanced, scalable AI integrations, showing you how to build AI-powered applications deeply integrated with the SQL engine - and elevate your analytics to the next level. But innovation doesn't come at the cost of safety: this release is built on a foundation of enterprise-grade security, helping you adopt AI safely and responsibly. You control which models to use, how they interact with your data, and where they run - from ground to cloud, or integrated with Microsoft Fabric. With built-in features like Row-Level Security (RLS), Transparent Data Encryption (TDE), Dynamic Data Masking, and SQL Server Auditing, your data remains protected at every layer. The AI age is here. Make sure your SQL Server databases are ready - and built for secure, scalable innovation.

What You Will Learn
• Grasp the fundamentals of AI to leverage AI with your data, using the industry-proven security and scale of SQL Server
• Utilize AI models of your choice, services, and frameworks to build new AI applications
• Explore new developer features such as JSON, Regular Expressions, REST API, and Change Event Streaming
• Discover SQL Server 2025's powerful new engine capabilities to increase application concurrency
• Examine new high availability features to enhance uptime and diagnose complex HADR configurations
• Use new query processing capabilities to extend the performance of your application
• Connect SQL Server to Azure with Arc for advanced management and security capabilities
• Secure and govern your data using Microsoft Entra
• Achieve near-real-time analytics with the unified data platform Microsoft Fabric
• Integrate AI capabilities with SQL Server for enterprise AI
• Leverage new tools such as SQL Server Management Studio and Copilot experiences to assist your SQL Server journey

Who This Book Is For
The SQL Server community, including DBAs, architects, and developers eager to stay ahead with the latest advancements in SQL Server 2025, and those interested in the intersection of AI and data, particularly how artificial intelligence (AI) can be seamlessly integrated with SQL Server to unlock deeper insights and smarter solutions.
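
Several of the security features highlighted above predate this release and can already be sketched in T-SQL. Below is a minimal, hypothetical Row-Level Security example using the CREATE SECURITY POLICY mechanism available since SQL Server 2016; the schema, table, column, and function names are invented for illustration and are not taken from the book.

```sql
-- Hypothetical Row-Level Security sketch for SQL Server. Object names are illustrative.
CREATE SCHEMA Security;
GO

-- Predicate function: a row is visible only to its owning sales rep (or dbo).
CREATE FUNCTION Security.fn_salesrep_predicate(@SalesRep AS nvarchar(128))
    RETURNS TABLE
    WITH SCHEMABINDING
AS
    RETURN SELECT 1 AS allowed
           WHERE @SalesRep = USER_NAME() OR USER_NAME() = N'dbo';
GO

-- Bind the predicate to the table as a filter for reads.
CREATE SECURITY POLICY Security.SalesFilter
    ADD FILTER PREDICATE Security.fn_salesrep_predicate(SalesRep)
    ON dbo.Orders
    WITH (STATE = ON);
```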

Zero-footprint SQL testing: From framework to culture shift

We built a zero-footprint SQL testing framework using mock data and the full power of the pytest ecosystem to catch syntactic and semantic issues before they reach production. More than just a tool, it helped shift our team’s mindset by integrating into CI/CD, encouraging contract-driven development, and promoting testable SQL. In this session, we’ll share our journey, key lessons learned, and how we open-sourced the framework to make it available for everyone.
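
The framework itself is pytest-based, but the core mock-data idea can be sketched in plain SQL: feed the logic under test hand-written rows and write an assertion query that returns rows only when an expectation is violated. The table, values, and expected result below are invented for illustration and are not taken from the open-sourced framework.

```sql
-- Hypothetical sketch of the mock-data testing idea in plain SQL.
-- 'mock_orders' stands in for the production table the query normally reads.
WITH mock_orders (order_id, status, amount) AS (
    VALUES (1, 'paid', 100.00),
           (2, 'refunded', -20.00),
           (3, 'paid', 50.00)
),
-- Query under test: revenue should only count 'paid' orders.
revenue AS (
    SELECT SUM(amount) AS total_paid
    FROM mock_orders
    WHERE status = 'paid'
)
-- Assertion: this SELECT returns a row only if the expectation is violated.
SELECT 'expected total_paid = 150.00' AS failure
FROM revenue
WHERE total_paid IS DISTINCT FROM 150.00;
```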

At EQT, we use dbt as the backbone for contracts, metadata, and increasingly, semantic models. Many of our users work in Excel, so we built a custom add-in to make governed data and shared metrics directly accessible where exploration happens. This setup also lets us experiment with AI-assisted discovery and text-to-SQL, connecting both raw data and dbt’s semantic layer to live, auditable analysis.



DNB, Norway’s largest bank, began building a cloud-based self-service Data & AI Platform in 2017, delivering its first capabilities by 2018. Initially focused on ML and analytics, the platform expanded in 2021 to include traditional data warehouses and modern data products. Snowflake was officially launched in 2023 after a successful PoC and pilot.

In this talk, we’ll walk through our journey.

Where We Came From

•Discover how legacy data warehouse bottlenecks sparked a shift toward decentralised, self-service data capabilities.

Where We Are

• Learn how DNB enabled teams to own and operate their data products through:
  • Streamlined domain onboarding
  • "DevOps for data" and "SQL as code" practices
  • Automated services for historisation (PSA) - a minimal sketch follows below
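
As a rough illustration of what automated PSA historisation means in practice, here is a hedged, insert-only SQL sketch: every changed source row is appended with a load timestamp and nothing is updated in place. The table and column names (psa_customers, staging_customers) are hypothetical and not DNB's actual implementation.

```sql
-- Hypothetical sketch of PSA-style historisation: append new or changed rows
-- with a load timestamp; never update existing history. Names are illustrative.
INSERT INTO psa_customers (customer_id, name, address, load_ts)
SELECT s.customer_id, s.name, s.address, CURRENT_TIMESTAMP
FROM staging_customers s
LEFT JOIN (
    -- latest persisted version per customer
    SELECT customer_id, name, address,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY load_ts DESC) AS rn
    FROM psa_customers
) p ON p.customer_id = s.customer_id AND p.rn = 1
WHERE p.customer_id IS NULL                                       -- brand new key
   OR (s.name, s.address) IS DISTINCT FROM (p.name, p.address);   -- changed attributes
```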

Where We’re Going

• Explore how DNB is evolving its data mesh with:
  • A hybrid model of decentralised and centralised data products
  • Generative AI, metadata automation, and development support
  • Enhanced tooling and services for data consumers

Explore Snowflake’s enterprise AI vision, innovations, and roadmap. You’ll see how Snowflake Intelligence, Cortex AI SQL, Cortex Agents, automatic Semantic Model generation and specialized AI tools work together inside Snowflake’s secure platform, turning multimodal data into transformational impact for enterprise customers without infrastructure headaches. Walk away knowing how these integrated capabilities give business users, data analysts and engineers the tools, speed and governance required to deploy AI into production.
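
To ground the "Cortex AI SQL" piece, here is a hedged one-query sketch of calling a Cortex function directly from SQL. The product_reviews table and review_text column are hypothetical, and function availability depends on your Snowflake region and edition.

```sql
-- Hypothetical example: scoring free-text feedback with a Cortex AI function
-- directly in SQL. The table and column names are invented for illustration.
SELECT review_id,
       SNOWFLAKE.CORTEX.SENTIMENT(review_text) AS sentiment_score
FROM product_reviews
LIMIT 100;
```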

