talk-data.com

Topic

Modern Data Stack

298

tagged

Activity Trend: peak of 28 activities per quarter, 2020-Q1 to 2026-Q1

Activities

298 activities · Newest first

Unlock the Next Evolution of the Modern Data Stack With the Lakehouse Revolution -- with Live Demos

As the data landscape evolves, organizations are seeking innovative solutions that provide enhanced value and scalability without exploding costs. In this session, we will explore the exciting frontier of the Modern Data Stack on Databricks Lakehouse, a game-changing alternative to traditional Data Cloud offerings. Learn how Databricks Lakehouse empowers you to harness the full potential of Fivetran, dbt, and Tableau, while optimizing your data investments and delivering unmatched performance.

We will showcase real-world demos that highlight the seamless integration of these modern data tools on the Databricks Lakehouse platform, enabling you to unlock faster and more efficient insights. Witness firsthand how the synergy of Lakehouse and the Modern Data Stack outperforms traditional solutions, propelling your organization into the future of data-driven innovation. Don't miss this opportunity to revolutionize your data strategy and unleash unparalleled value with the lakehouse revolution.

Talk by: Kyle Hale and Roberto Salcido

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Best Practices for Setting Up Databricks SQL at Enterprise Scale

To learn more, visit the Databricks Security and Trust Center: https://www.databricks.com/trust

In this session, we will talk about the best practices for setting up Databricks to run at large enterprise scale with thousands of users, departmental security and governance, and end-to-end lineage from ingestion to BI tools. We’ll showcase the power of Unity Catalog and Databricks SQL as the core of your modern data stack and how to achieve data, environment, and financial governance while empowering your users to quickly find and access the data they need.

Talk by: Siddharth Bhai, Paul Roome, Jeremy Lewallen, and Samrat Ray

Foundation Models in the Modern Data Stack

As Foundation Models (FMs) continue to grow in size, innovations continue to push the boundaries of what these models can do on language and image tasks. This talk will describe our work on applying FMs to structured data tasks like data linkage, cleaning and querying. We will then discuss challenges and solutions that these models present for production deployment in the modern data stack.

Talk by: Ines Chami

Here’s more to explore: LLM Compact Guide: https://dbricks.co/43WuQyb Big Book of MLOps: https://dbricks.co/3r0Pqiz

What's New in Databricks SQL -- With Live Demos

We’ve been pushing ahead to make the lakehouse even better for data warehousing across several pillars: a native serverless experience, best-in-class price performance, intelligent workload management and observability, and enhanced connectivity, analyst, and developer experiences. As we look to double down on that pace of innovation, we want to deep dive into everything that’s been keeping us busy.

In this session, we will share an update on key roadmap items. To bring things to life, you will see live demos of the most recent capabilities, from data ingestion and transformation to consumption, using the modern data stack along with Databricks SQL.

Talk by: Can Efeoglu

Build Your Data Lakehouse with a Modern Data Stack on Databricks

Are you looking for an introduction to the Lakehouse and what the related technology is all about? This session is for you. This session explains the value that lakehouses bring to the table using examples of companies that are actually modernizing their data, showing demos throughout. The data lakehouse is the future for modern data teams that want to simplify data workloads, ease collaboration, and maintain the flexibility and openness to stay agile as a company scales.

Come to this session and learn about the full stack, including data engineering, data warehousing in a lakehouse, data streaming, governance, and data science and AI. Learn how you can create modern data solutions of your own.

Talk by: Ari Kaplan and Pearl Ubaru

Summary

Data has been one of the most substantial drivers of business and economic value for the past few decades. Bob Muglia has had a front-row seat to many of the major shifts driven by technology over his career. In his recent book "Datapreneurs" he reflects on the people and businesses that he has known and worked with and how they relied on data to deliver valuable services and drive meaningful change.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

Your host is Tobias Macey and today I'm interviewing Bob Muglia about his recent book about the idea of "Datapreneurs" and the role of data in the modern economy.

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what your concept of a "Datapreneur" is?
  How is this distinct from the common idea of an entrepreneur?
What do you see as the key inflection points in data technologies and their impacts on business capabilities over the past ~30 years?
In your role as the CEO of Snowflake you had a front-row seat for the rise of the "modern data stack". What do you see as the main positive and negative impacts of that paradigm?
  What are the key issues that are yet to be solved in that ecosystem?
For technologists who are thinking about launching new ventures, what are the key pieces of advice that you would like to share?
What do you see as the short/medium/long-term impact of AI on the technical, business, and societal arenas?
What are the most interesting, innovative, or unexpected ways that you have seen business leaders use data to drive their vision?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on the Datapreneurs book?
What are your key predictions for the future impact of data on the technical/economic/business landscapes?

Contact Info

LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.init covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show, please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

Datapreneurs Book
SQL Server
Snowflake
Z80 Processor
Navigational Database
System R
Redshift
Microsoft Fabric
Databricks
Looker
Fivetran (Podcast Episode)
Databricks Unity Catalog
RelationalAI
6th Normal Form
Pinecone Vector DB (Podcast Episode)
Perplexity AI

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By: RudderStack

Data and AI are advancing at an unprecedented rate, and while the jury is still out on achieving superintelligent AI systems, the idea of artificial intelligence that can understand and learn anything (an “artificial general intelligence”) is becoming more likely. What does the rise of AI mean for the future of software and work as we know it? How will AI help reinvent most of the ways we interact with the digital and physical world?

Bob Muglia is a data technology investor and business executive, former CEO of Snowflake, and past president of Microsoft's Server and Tools Division. As a leader in data & AI, Bob focuses on how innovation and ethical values can merge to shape the data economy's future in the era of AI. He serves as a board director for emerging companies that seek to maximize the power of data to help solve some of the world's most challenging problems.

In the episode, Richie and Bob explore the current era of AI and what it means for the future of software. Throughout the episode, they discuss how to approach driving value with large language models, the main challenges organizations face when deploying AI systems, the risks and rewards of fine-tuning LLMs for specific use cases, what the next 12 to 18 months hold for the burgeoning AI ecosystem, the likelihood of superintelligence within our lifetimes, and more.

Links from the show:
The Datapreneurs by Bob Muglia and Steve Hamm
The Singularity is Near by Ray Kurzweil
Isaac Asimov
Snowflake
Pinecone
Docugami
OpenAI/GPT-4
The Modern Data Stack

We talked about:

Santona's background
Focusing on data workflows
Upsolver vs DBT
ML pipelines vs Data pipelines
MLOps vs DataOps
Tools used for data pipelines and ML pipelines
The “modern data stack” and today's data ecosystem
Staging the data and the concept of a “lakehouse”
Transforming the data after staging
What happens after the modeling phase
Human-centric vs Machine-centric pipeline
Applying skills learned in academia to ML engineering
Crafting user personas based on real stories
A framework of curiosity
Santona's book and resource recommendations

Links:

LinkedIn: https://www.linkedin.com/in/santona-tuli/
Upsolver website: upsolver.com
Why we built a SQL-based solution to unify batch and stream workflows: https://www.upsolver.com/blog/why-we-built-a-sql-based-solution-to-unify-batch-and-stream-workflows

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html

Want to be featured as a guest on Making Data Simple? Reach out to us at [email protected] and tell us why you should be next.

Abstract

Making Data Simple Podcast is hosted by Al Martin, VP, IBM Expert Services Delivery, where we explore trending technologies, business innovation, and leadership ... while keeping it simple & fun. This week on Making Data Simple, we have Benn Stancil, Chief Analytics Officer + Founder @ Mode. Benn is an accomplished data analyst with deep expertise in collaborative Business Intelligence and Interactive Data Science. Benn is Co-founder, President, and Chief Analytics Officer of Mode, an award-winning SaaS company that combines the best elements of Business Intelligence (ABI), Data Science (DS) and Machine Learning (ML) to empower data teams to answer impactful questions and collaborate on analysis across a range of business functions. Under Benn’s leadership, the Mode platform has evolved to enable data teams to explore, visualize, analyze and share data in a powerful end-to-end workflow. Prior to founding Mode, Benn served in senior Analytics positions at Microsoft and Yammer, and worked as a researcher for the International Economics Program at the Carnegie Endowment for International Peace. Benn also served as an Undergraduate Research Fellow at Wake Forest University, where he received his B.S. in Mathematics and Economics. Benn believes in fostering a shared sense of humility and gratitude.

Show Notes
1:22 – Benn’s history
7:09 – Tell us how you got to where you are today
9:14 – Tell us about Mode
12:08 – What is your definition of the Chief Analytics Officer?
21:53 – Why do we need another BI tool?
24:09 – What’s your secret sauce?
27:48 – Where did the name Mode come from?
28:41 – How do we use Mode?
31:08 – What is your go-to-market strategy?
32:38 – Any client references?
34:58 – “The missing piece in the modern data stack”: tell us about this

Mode
Email: [email protected] [email protected]
Twitter: benn stancil

Connect with the Team
Producer Kate Brown - LinkedIn.
Host Al Martin - LinkedIn and Twitter.

Data Contracts in the Modern Data Stack | Whatnot

ABOUT THE TALK: After two years, three rounds of funding, and hundreds of new employees, Whatnot’s modern data stack has gone from not existing to processing tens of millions of events across hundreds of different event types each day.

How does their small (but mighty!) team keep up? This talk explores data contracts. It covers the use of an Interface Definition Language (Protobuf) to serve as the source of truth for event definitions, govern event construction in production, and automatically generate DBT models in the data warehouse.
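The data-contract idea described above can be sketched in a few lines. This is a minimal illustration, not Whatnot's implementation: the `Field` and `EventContract` classes are a hypothetical, simplified stand-in for a parsed Protobuf message definition, the type mapping and the `order_placed` event are assumptions, and the generated SQL only assumes dbt's `source()` macro.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    proto_type: str  # Protobuf scalar type name, e.g. "string" or "int64"


@dataclass(frozen=True)
class EventContract:
    # Hypothetical stand-in for a parsed Protobuf message definition.
    name: str
    fields: tuple


# Assumed mappings from Protobuf scalar types to Python and warehouse types.
PROTO_TO_PY = {"string": str, "int64": int, "double": float, "bool": bool}
PROTO_TO_SQL = {"string": "varchar", "int64": "bigint", "double": "double", "bool": "boolean"}


def validate_event(contract, payload):
    """Return a list of contract violations for one event payload."""
    errors = []
    for f in contract.fields:
        if f.name not in payload:
            errors.append(f"missing field: {f.name}")
        elif type(payload[f.name]) is not PROTO_TO_PY[f.proto_type]:
            errors.append(f"wrong type for {f.name}: expected {f.proto_type}")
    known = {f.name for f in contract.fields}
    for extra in sorted(set(payload) - known):
        errors.append(f"unexpected field: {extra}")
    return errors


def render_dbt_model(contract, source_name="events"):
    """Render a dbt staging model (SQL text) from the same contract."""
    columns = ",\n".join(
        f"    cast({f.name} as {PROTO_TO_SQL[f.proto_type]}) as {f.name}"
        for f in contract.fields
    )
    return (
        "select\n"
        f"{columns}\n"
        f"from {{{{ source('{source_name}', '{contract.name}') }}}}\n"
    )


# Hypothetical event definition used only for illustration.
order_placed = EventContract(
    "order_placed",
    (Field("order_id", "string"), Field("amount_cents", "int64")),
)

print(validate_event(order_placed, {"order_id": "a1"}))
print(render_dbt_model(order_placed))
```

Because both the producer-side check and the warehouse model are derived from one definition, a change to the event schema shows up in both places at once, which is the point of treating the schema as the source of truth.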

ABOUT THE SPEAKER: Zack Klein is a software engineer at Whatnot, where he thoroughly enjoys building data products and narrowly avoiding breaking production each day. Previously, he worked on big data platforms at Blackstone and HBO.

ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.

Make sure to subscribe to our channel for the most up-to-date talks from technical professionals on data related topics including data infrastructure, data engineering, ML systems, analytics and AI from top startups and tech companies.

FOLLOW DATA COUNCIL: Twitter: https://twitter.com/DataCouncilAI LinkedIn: https://www.linkedin.com/company/datacouncil-ai/

Generative AI & the Natural Language Interface for Data | Seek AI

ABOUT THE TALK: With the advancement of AI, the natural language interface for data is more valuable than ever before. This talk explores three key questions. First, what would a natural language interface for data actually look like? Second, what kind of value would it add to organizations using the Modern Data Stack? Third, what will the challenges look like when it comes to working with a natural language interface for data? Sarah Nagy will share real-world learnings from Seek's customers for each of these questions.

ABOUT THE SPEAKER: A former quant, Sarah Nagy founded Seek AI in 2021. Prior to starting Seek, Sarah most recently led the consumer data team at Citadel's Ashler Capital. Prior to joining Citadel, Sarah led the quant arms at two startups, Edison and Predata, which both successfully exited. Sarah started her career as a quant at ITG developing algorithmic trading strategies. Sarah has a Master in Finance degree from Princeton and dual Bachelor's degrees in Astrophysics and Business Economics from UCLA.

Malloy: An Experimental Language for Data | Google

ABOUT THE TALK: Forcing data through a rectangle shapes the way we solve problems (for example, dimensional fact tables, OLAP Cubes).

Most data isn’t rectangular; rather, it exists in hierarchies (orders, items, products, users). Most query results are better returned as a hierarchy (category, brand, product).

Malloy is a new experimental data programming language that, among other things, breaks the rectangle paradigm and several other long-held misconceptions in the way we analyze data.

In this talk, Lloyd Tabb shares the ideas behind the Malloy language, semantic data modeling, and his vision for the future of data.
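To make the contrast between rectangular and hierarchical results concrete, here is a small sketch in plain Python (not Malloy syntax). The rows and the `nest` helper are hypothetical; the example only shows how a flat result with repeated category and brand values maps onto the category, brand, product hierarchy mentioned above.

```python
# Hypothetical flat ("rectangular") query result: one row per product.
rows = [
    {"category": "shoes", "brand": "Acme", "product": "runner", "sales": 120},
    {"category": "shoes", "brand": "Acme", "product": "trail", "sales": 80},
    {"category": "shoes", "brand": "Zed", "product": "loafer", "sales": 50},
    {"category": "hats", "brand": "Acme", "product": "cap", "sales": 30},
]


def nest(flat_rows):
    """Group flat rows into category -> brand -> product, rolling up sales at each level."""
    tree = {}
    for r in flat_rows:
        category = tree.setdefault(r["category"], {"sales": 0, "brands": {}})
        category["sales"] += r["sales"]
        brand = category["brands"].setdefault(r["brand"], {"sales": 0, "products": {}})
        brand["sales"] += r["sales"]
        products = brand["products"]
        products[r["product"]] = products.get(r["product"], 0) + r["sales"]
    return tree


result = nest(rows)
print(result["shoes"]["sales"])
print(result["shoes"]["brands"]["Acme"]["products"])
```

In the nested form each category and brand appears once with its own subtotal, whereas the flat form repeats those values on every row and leaves the roll-ups to the reader.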

ABOUT THE SPEAKER: Lloyd Tabb spent the last 30 years revolutionizing how the world uses the internet and, by extension, data. He is one of the internet pioneers, having worked at Netscape during the browser wars as the Principal Engineer on Navigator Gold, the first HTML WYSIWYG editor.

Originally a database & languages architect at Borland, Lloyd founded Looker, which Google acquired in 2019. Lloyd's work at Looker helped define the Modern Data Stack.

At Google, Lloyd continues to pursue his passion for data and love of programming languages through his current project, Malloy.

Extreme Self-Service: Turning Data Consumers into Data Constructors | Whatnot

ABOUT THE TALK: Small data teams face supply and demand problems. Triaging and prioritizing data work can be overwhelming. But what if data consumers could create their own products with minimal training?

Learn how to empower data consumers without disrupting others. Discover lessons from an 'extreme' self-service analytics approach: best practices, fostering a data community, promoting SQL literacy, and establishing solid guard rails.

ABOUT THE SPEAKER: Alice Leach is a Data Engineer at Whatnot Inc., a live stream platform and marketplace that enables collectors and enthusiasts to connect, buy, and sell verified products. She transitioned from academia to data in 2021, working first as a data scientist then data engineer. Her current work at Whatnot focuses on designing and building robust, self-service data workflows using a modern data stack.

Modern Data Management: How to Set Your Data Team Up for Success | Select Star

ABOUT THE TALK: Got your Modern Data Stack set up, now what? A mature data practice goes beyond setting up the data pipeline and ensures there are both systems and processes in place to make it easy for everyone to find and understand data.

Learn how Select Star enables data discovery, making knowledge searchable and understandable for all. Uncover best practices for setting up a data discovery portal as your single source of truth.

ABOUT THE SPEAKER: Alec Bialosky is currently the Director of Business Operations at Select Star where he spends the majority of his time working with prospects and customers to help them achieve their data discovery goals with Select Star.

Automating Data Transformations

The modern data stack has evolved rapidly in the past decade. Yet, as enterprises migrate vast amounts of data from on-premises platforms to the cloud, data teams continue to face limitations executing data transformation at scale. Data transformation is an integral part of the analytics workflow--but it's also the most time-consuming, expensive, and error-prone part of the process.

In this report, Satish Jayanthi and Armon Petrossian examine key concepts that will enable you to automate data transformation at scale. IT decision makers, CTOs, and data team leaders will explore ways to democratize data transformation by shifting from activity-oriented to outcome-oriented teams--from manufacturing-line assembly to an approach that lets even junior analysts implement data with only a brief code review.

With this insightful report, you will:
Learn how successful data systems rely on simplicity, flexibility, user-friendliness, and a metadata-first approach
Adopt a product-first mindset (data as a product, or DaaP) for developing data resources that focus on discoverability, understanding, trust, and exploration
Build a transformation platform that delivers the most value, using a column-first approach
Use data architecture as a service (DAaaS) to help teams build and maintain their own data infrastructure as they work collaboratively

About the authors: Armon Petrossian is CEO and cofounder of Coalesce. Previously, he was part of the founding team at WhereScape in North America, where he served as national sales manager for almost a decade. Satish Jayanthi is CTO and cofounder of Coalesce. Prior to that, he was senior solutions architect at WhereScape, where he met his cofounder Armon.

The Modern Data Stack has brought a lot of new buzzwords into the data engineering lexicon: "data mesh", "data observability", "reverse ETL", "data lineage", "analytics engineering". In this light-hearted talk we will demystify the evolving revolution that will define the future of data analytics & engineering teams.

Our journey begins with the PyData Stack: pandas pipelines powering ETL workflows...clean code, tested code, data validation, perfect for in-memory workflows. As demand for self-serve analytics grows, new data sources bring more APIs to model, more code to maintain, DAG workflow orchestration tools, new nuances to capture ("the tax team defines revenue differently"), more dashboards, more not-quite-bugs ("but my number says this...").

This data maturity journey is a well-trodden path with common pitfalls & opportunities. After dashboards comes predictive modelling ("what will happen"), prescriptive modelling ("what should we do?"), perhaps eventually automated decision making. Getting there is much easier with the advent of the Python Powered Modern Data Stack.

In this talk, we will cover the shift from ETL to ELT, the open-source Modern Data Stack tools you should know, with a focus on how dbt's new Python integration is changing how data pipelines are built, run, tested & maintained. By understanding the latest trends & buzzwords, attendees will gain a deeper insight into Python's role at the core of the future of data engineering.
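The shift from ETL to ELT can be illustrated with a minimal sketch. It is not taken from the talk: Python's built-in `sqlite3` stands in for a cloud warehouse, and the table, view, and column names are hypothetical. The point is the ordering: raw data is loaded untouched first, and the transformation then runs as SQL inside the database, which is the step a tool like dbt would manage as a model.

```python
import sqlite3

# Hypothetical source records, kept exactly as the source system emits them
# (amounts arrive as strings).
raw_orders = [
    ("o1", "2024-01-05", "1999"),
    ("o2", "2024-01-05", "500"),
    ("o3", "2024-01-06", "1250"),
]

# An in-memory SQLite database stands in for the warehouse.
conn = sqlite3.connect(":memory:")

# Extract + Load: land the data without cleaning or reshaping it first.
conn.execute(
    "create table raw_orders (order_id text, order_date text, amount_cents text)"
)
conn.executemany("insert into raw_orders values (?, ?, ?)", raw_orders)

# Transform: done afterwards, in SQL, inside the warehouse.
conn.execute(
    """
    create view daily_revenue as
    select order_date,
           sum(cast(amount_cents as integer)) as revenue_cents
    from raw_orders
    group by order_date
    """
)

daily_revenue = conn.execute(
    "select order_date, revenue_cents from daily_revenue order by order_date"
).fetchall()
print(daily_revenue)
```

In a classic ETL pipeline the cast and aggregation would happen in application code (for example a pandas job) before loading; keeping the raw table means the transformation can be changed and re-run later without re-extracting from the source.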

Julia just got back from Data Council in Austin, a conference organized by Pete Soderling, where lots of startups share what they're building, data practitioners go to learn in hands-on workshops, and of course investors go to spot the next big trend. In this episode, Taylor Murphy (Head of Product & Data at Meltano) and Pedram Navid (Founder, West Marin Data) join Julia to recap the conference and have a bit of fun. They talked about streaming, how the MDS is growing up, new SQL variants, and, of course, AI. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com.

The modern data stack is a loose collection of technologies, often cloud-based, that collaboratively process and store data to support modern analytics. It must be automated, low code/no code, AI-assisted, graph-enabled, multimodal, streaming, distributed, meshy, converged, polyglot, open, and governed. Published at: https://www.eckerson.com/articles/twelve-must-have-characteristics-of-a-modern-data-stack