Bob Muglia likely needs no introduction. The former CEO of Snowflake led the company during its early, transformational years after a long career at Microsoft and Juniper. Bob recently released the book The Datapreneurs, which traces the arc of innovation in the data industry from the first relational databases all the way to the present craze for LLMs and beyond. In this conversation with Tristan and Julia, Bob shares insights into the future of data engineering and its potential business impact while offering a glimpse into his professional journey. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.
talk-data.com
Topic: Analytics Engineering (169 tagged)
Metrics are the most important primitive in the data world, and driving the use of powerful, reliable metrics is the best way data teams can add value to their enterprises. In this talk, we'll walk through how data teams can best support the metric lifecycle end to end, from:
- Designing useful metrics as part of metric trees
- Developing these metrics on top of stable, standard data contracts
- Operationalizing metrics to drive value
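A metric tree, as in the first bullet, can be pictured as nested formulas in which each parent metric is computed from its child metrics. The minimal Python sketch below assumes a simple multiplicative decomposition. The class name `MetricNode` and the example metrics are hypothetical illustrations and are not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class MetricNode:
    """One node in a hypothetical metric tree.

    A leaf reads its value from observed inputs; a parent combines its
    children's values with a formula (e.g. revenue = customers * revenue per customer).
    """
    name: str
    children: List["MetricNode"] = field(default_factory=list)
    formula: Optional[Callable[..., float]] = None

    def evaluate(self, inputs: Dict[str, float]) -> float:
        if not self.children:
            # Leaf metric: look up the observed value by name.
            return inputs[self.name]
        # Parent metric: evaluate children first, then apply this node's formula.
        child_values = [child.evaluate(inputs) for child in self.children]
        return self.formula(*child_values)


# Illustrative tree (hypothetical): revenue = paying customers * revenue per customer,
# and paying customers = signups * conversion rate.
paying_customers = MetricNode(
    "paying_customers",
    children=[MetricNode("signups"), MetricNode("conversion_rate")],
    formula=lambda signups, rate: signups * rate,
)
revenue = MetricNode(
    "revenue",
    children=[paying_customers, MetricNode("revenue_per_customer")],
    formula=lambda customers, per_customer: customers * per_customer,
)

observed = {"signups": 400, "conversion_rate": 0.25, "revenue_per_customer": 120.0}
print(revenue.evaluate(observed))  # 12000.0
```

Each parent is an explicit function of its children, so a change in `revenue` can be traced back to whichever input metric moved.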
ABOUT THE SPEAKER: Abhi Sivasailam is a Growth and Analytics leader who most recently led Product-Led Growth, Product Analytics, and Analytics Engineering at Flexport, where he helped to lead these and other functions through 10x growth over the past 3 years. Previously, Abhi led growth and data teams at Keap, Hustle, and Honeybook.
👉 Sign up for our “No BS” Newsletter to get the latest technical data & AI content: https://datacouncil.ai/newsletter
ABOUT DATA COUNCIL: Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers.
Make sure to subscribe to our channel for the most up-to-date talks from technical professionals on data-related topics, including data infrastructure, data engineering, ML systems, analytics and AI from top startups and tech companies.
FOLLOW DATA COUNCIL: Twitter: https://twitter.com/DataCouncilAI LinkedIn: https://www.linkedin.com/company/datacouncil-ai/
Advances in ML have transformed data privacy from a regulatory necessity into an opportunity to improve the work of data people. Synthetic data for modeling + testing is one example of a hard thing that's now easy. In this conversation with Tristan and Julia, Ian + Abhishek cover many other ways that privacy can be a skill that propels your work forward, rather than a mere legal best practice.
The Modern Data Stack has brought a lot of new buzzwords into the data engineering lexicon: "data mesh", "data observability", "reverse ETL", "data lineage", "analytics engineering". In this light-hearted talk we will demystify the evolving revolution that will define the future of data analytics & engineering teams.
Our journey begins with the PyData Stack: pandas pipelines powering ETL workflows, with clean code, tested code, and data validation, perfect for in-memory workflows. As demand for self-serve analytics grows, new data sources bring more APIs to model, more code to maintain, DAG workflow orchestration tools, new nuances to capture ("the tax team defines revenue differently"), more dashboards, and more not-quite-bugs ("but my number says this...").
This data maturity journey is a well-trodden path with common pitfalls & opportunities. After dashboards comes predictive modelling ("what will happen"), prescriptive modelling ("what should we do?"), perhaps eventually automated decision making. Getting there is much easier with the advent of the Python Powered Modern Data Stack.
In this talk, we will cover the shift from ETL to ELT, the open-source Modern Data Stack tools you should know, with a focus on how dbt's new Python integration is changing how data pipelines are built, run, tested & maintained. By understanding the latest trends & buzzwords, attendees will gain a deeper insight into Python's role at the core of the future of data engineering.
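The ETL-to-ELT shift comes down to ordering. In ELT, raw data is loaded first and transformed afterwards inside the warehouse, usually in SQL. The sketch below uses Python's built-in `sqlite3` as a stand-in warehouse. The table names and sample rows are hypothetical, and dbt itself is not involved. The view plays the role that a transformation model would.

```python
import sqlite3

# Hypothetical raw records as they might arrive from a source API (amounts still strings).
raw_orders = [
    (1, "alice", "12.50", "completed"),
    (2, "bob", "8.00", "refunded"),
    (3, "alice", "20.00", "completed"),
]

conn = sqlite3.connect(":memory:")

# Extract + Load: land the data as-is, with no cleaning, in a raw table.
conn.execute("CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount TEXT, status TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", raw_orders)

# Transform: clean and aggregate inside the database with SQL, after loading.
conn.execute(
    """
    CREATE VIEW customer_revenue AS
    SELECT customer, SUM(CAST(amount AS REAL)) AS revenue
    FROM raw_orders
    WHERE status = 'completed'
    GROUP BY customer
    """
)

rows = conn.execute("SELECT customer, revenue FROM customer_revenue ORDER BY customer").fetchall()
print(rows)  # [('alice', 32.5)]
```

The raw data stays available, so the transformation can be changed and re-run later without re-extracting anything from the source.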
Brad Culberson is a Principal Architect in the Field CTO's office at Snowflake. Niall Woodward is a co-founder of SELECT, a startup providing optimization and spend management software for Snowflake customers. In this conversation with Tristan and Julia, Brad and Niall discuss all things cost optimization: cloud vs on-prem, measuring ROI, and tactical ways to get more out of your budget.
Nick Handel, as co-founder at Transform, helped develop the popular open source metrics framework MetricFlow. Drew Banin, a co-founder at dbt Labs, helped build the initial version of the dbt Semantic Layer, which launched last year. Transform was acquired in February by dbt Labs, and in this conversation with Tristan, they talk through their collective plans for the future of the dbt Semantic Layer.
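The core promise of a semantic layer is that a metric is defined once and can then be queried along any dimension. Below is a toy Python sketch of that idea, using `sqlite3` as the database. The `Metric` class and the `compile_metric` function are hypothetical. They do not reflect MetricFlow's actual configuration format or API.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A hypothetical, minimal metric definition (not MetricFlow's real spec)."""
    name: str
    table: str
    expression: str        # aggregation, e.g. "SUM(amount)"
    where: str = "1 = 1"   # filter applied to every query of this metric


def compile_metric(metric: Metric, group_by: str) -> str:
    """Generate SQL for one metric sliced by one dimension.

    Definitions are assumed to be trusted config, never user input,
    since names are interpolated directly into the SQL string.
    """
    return (
        f"SELECT {group_by}, {metric.expression} AS {metric.name} "
        f"FROM {metric.table} WHERE {metric.where} "
        f"GROUP BY {group_by} ORDER BY {group_by}"
    )


revenue = Metric("revenue", "orders", "SUM(amount)", "status = 'completed'")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, channel TEXT, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [("EU", "web", 10.0, "completed"), ("US", "web", 5.0, "completed"),
     ("EU", "app", 7.0, "refunded"), ("US", "app", 3.0, "completed")],
)

# One definition of "revenue" answers different questions consistently.
by_region = conn.execute(compile_metric(revenue, "region")).fetchall()
by_channel = conn.execute(compile_metric(revenue, "channel")).fetchall()
print(by_region)   # [('EU', 10.0), ('US', 8.0)]
print(by_channel)  # [('app', 3.0), ('web', 15.0)]
```

The filter that excludes refunds lives in the definition, so no query can forget to apply it.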
Sarah and Chris are both at the forefront of bringing the promise of gen AI to our actual work as data people, which is a unique challenge! Precise truth is critical for business questions in a way that it isn't for a consumer search query. Sarah Nagy is the CEO of Seek AI, a startup that aims to use natural language processing to change how professionals work with data. Chris Aberger currently leads Numbers Station AI, a startup focused on data-intensive workflow automation. In this conversation with Tristan and Julia, they dive into what this future might actually look like, and what we can tangibly expect from gen AI in the short to medium term.
Auren Hoffman currently serves as the CEO and Chief Historian at SafeGraph, a data-as-a-service company he founded that primarily provides location data. In this conversation with Tristan and Julia, Auren shares how few companies truly make use of third-party datasets today, how opening up more datasets to public research could help us solve big problems, and a fun fact about Abraham Lincoln's (!) work in the industry.
Mike Stonebraker is a veritable database pioneer and a Turing Award recipient. In addition to teaching at MIT, he is a serial entrepreneur and co-creator of Postgres. Andy Palmer is a veteran business leader who serves as the CEO of Tamr, a company he co-founded with Mike. Through his seed fund Koa Labs, Andy has helped found and/or fund numerous innovative companies in diverse sectors, including health care, technology, and the life sciences. In this conversation with Tristan and Julia, Mike and Andy take us through the evolution of database technology over more than five decades. They share unique insights into relational databases, the switch from row-based to columnar databases, and some of the patterns of database adoption they have seen repeat over time.
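The switch from row-based to columnar storage is easiest to see in how the same table is laid out. The Python sketch below is an analogy only, with hypothetical data. Real columnar engines store compressed, contiguous arrays on disk, which Python lists and dicts only loosely approximate. The point it illustrates: an analytic query such as summing one column needs only that column in the columnar layout, while the row layout forces a pass over whole records.

```python
# Row-oriented layout: each record's fields are kept together.
rows = [
    {"order_id": 1, "customer": "alice", "amount": 12.5},
    {"order_id": 2, "customer": "bob", "amount": 8.0},
    {"order_id": 3, "customer": "alice", "amount": 20.0},
]

# Column-oriented layout: each column's values are kept together.
columns = {
    "order_id": [1, 2, 3],
    "customer": ["alice", "bob", "alice"],
    "amount": [12.5, 8.0, 20.0],
}


def total_amount_row_store(records):
    # Visits every full record, even though the query needs only one field.
    return sum(record["amount"] for record in records)


def total_amount_column_store(table):
    # Reads only the single column the query needs.
    return sum(table["amount"])


print(total_amount_row_store(rows))        # 40.5
print(total_amount_column_store(columns))  # 40.5
```

The trade-off runs the other way for transactional workloads that read or write whole records at a time, which is one reason row stores remain common for OLTP.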
Wes McKinney is the creator of pandas, co-creator of Apache Arrow, and now Co-founder/CTO at Voltron Data. In this conversation with Tristan and Julia, Wes takes us on a tour of the data ecosystem's underlying guts, from hardware to data formats. What innovations, down to the hardware level, will stack up to deliver significantly better performance for analytics workloads in the coming years? To dig deeper into the Apache Arrow ecosystem, check out replays from their recent conference at https://thedatathread.com.
Product experimentation is full of potholes for companies of any size, given the number of pieces (tooling, culture, process, persistence) that need to come together to be successful. Vijaye Raji (currently Statsig, formerly Facebook + Microsoft) and Sean Taylor (currently Motif Analytics, formerly Facebook + Lyft) have navigated these failure modes, and are here to help you (hopefully) do the same. This convo with Tristan + Julia is light on tooling + heavy on process: how to watch out for spillover effects in experiments, how to avoid bias, how to run an experiment review, and why experiment throughput is a better indicator of success than individual experiment results.
The first LIVE IRL episode! Stephen Bailey, data engineer at Whatnot and writer of an incredibly entertaining data substack, joins Tristan for a follow-up conversation to Stephen's Coalesce talk, "Excel at nothing: how to be an effective generalist." You can read Stephen's writing at https://stkbailey.substack.com/.
What does it mean to "be technical"? What makes a great analytics engineer? How can individuals "develop technically", how can managers "foster technical growth", and how can companies "hire technical people"? It's crucial to understand the component skills that add up to great analytics engineering outcomes.
As it turns out, it's not so different from how fashion designers go from prompt to runway look. Join Ashley Sherwood (HubSpot) as she breaks down the parallels between fashion design and analytics engineering work, and how small daily design decisions can compound into a massive impact on data teams' ability to grow their skills and serve stakeholders.
Check the slides here: https://docs.google.com/presentation/d/1HDzAzHhWy4q_cXASB1F3EfTWivWlijCRu5e-CIbaXqI/edit?usp=sharing
Coalesce 2023 is coming! Register for free at https://coalesce.getdbt.com/.
We talked about:
- Nikola’s background
- Making the first steps towards a transition to BI and Analytics Engineering
- Learning the skills necessary to transition to Analytics Engineering
- The in-between period – from Marketing to Analytics Engineering
- Nikola’s current responsibilities
- Understanding what a Data Model is
- Tools needed to work as an Analytics Engineer
- The Analytics Engineering role over time
- The importance of dbt for Analytics Engineers
- Where can one learn about data modeling theory?
- Going from Ancient Greek and Latin to understanding Data (Just-In-Time Learning)
- The importance of domain knowledge for analytics engineering
- Suggestions for those wishing to transition into analytics engineering
- The importance of having a mentor when transitioning
- Finding a mentor
- Helpful newsletters and blogs
- Finding Nikola online
Links:
Nikola's LinkedIn account: https://www.linkedin.com/in/nikola-maksimovic-40188183/
ML Zoomcamp: https://github.com/alexeygrigorev/mlbookcamp-code/tree/master/course-zoomcamp
Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html
WARNING: This episode contains detailed discussion of data contracts. The modern data stack introduces challenges in collaboration between data producers and consumers. How might we solve them to ultimately build trust in data quality? Chad Sanderson leads the data platform team at Convoy, a late-stage Series E freight technology startup. He manages everything from instrumentation and data ingestion to ETL, in addition to the metrics layer, experimentation software, and ML. Prukalpa Sankar is a co-founder of Atlan, where she develops products that improve collaboration between diverse users such as business teams, analysts, and engineers, making data projects more efficient and agile.
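One common reading of a data contract is an explicit, machine-checkable promise from a data producer about the shape of what it emits. Under that reading, the promise is checked before consumers ever see the data. The minimal Python sketch below assumes exactly that. The contract format, the `validate` function, and the column names are all hypothetical and are not taken from Convoy's or Atlan's tooling.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Hypothetical contract: the producer promises these columns with these types.
ORDERS_CONTRACT: Dict[str, type] = {
    "order_id": int,
    "customer_id": int,
    "amount": float,
}


@dataclass
class ContractViolation:
    row_index: int
    column: str
    problem: str


def validate(rows: List[Dict[str, Any]], contract: Dict[str, type]) -> List[ContractViolation]:
    """Return every place where a batch of rows breaks the contract."""
    violations = []
    for i, row in enumerate(rows):
        for column, expected_type in contract.items():
            if column not in row:
                violations.append(ContractViolation(i, column, "missing"))
            elif not isinstance(row[column], expected_type):
                actual = type(row[column]).__name__
                violations.append(
                    ContractViolation(i, column, f"expected {expected_type.__name__}, got {actual}")
                )
    return violations


batch = [
    {"order_id": 1, "customer_id": 42, "amount": 19.99},
    {"order_id": 2, "amount": "12.00"},  # producer dropped a column and changed a type
]
for violation in validate(batch, ORDERS_CONTRACT):
    print(violation)
```

In this design, a non-empty result would block the batch at the producer's boundary. The consumer then never has to discover the breakage downstream.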
Abhi is a growth and data leader, and an excellent Twitter follow. Most recently, he was Head of Growth and Analytics at Flexport, where he helped the company grow 10x over the past 3 years. Previously, Abhi led growth and data teams at Keap, Hustle, and Honeybook. In this conversation with Tristan and Julia, Abhi explains his methodology for setting up a new growth data organization, and how you might be falling victim to the dreaded "arbitrary uniqueness" bug.
Summary
One of the most impactful technologies for data analytics in recent years has been dbt. It’s hard to have a conversation about data engineering or analysis without mentioning it. Despite its widespread adoption, there are still rough edges in its workflow that cause friction for data analysts. To help simplify the adoption and management of dbt projects, Nandam Karthik helped create Optimus. In this episode he shares his experiences working with organizations to adopt analytics engineering patterns, and the ways that Optimus and dbt were combined to let data analysts deliver insights without the roadblocks of complex pipeline management.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production-ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high-throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it’s often too late and the damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values, before it gets merged to production. No more shipping and praying: you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.
RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enables you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudder.
Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer.
Your host is Tobias Macey and today I’m interviewing Nand
As analytics engineers, we make an impact by building analytics things (models, pipelines, visualizations) that help stakeholders make decisions about what to do next. What if we could also make an impact by driving a culture of experimentation, which will help those same stakeholders make decisions too?
Join Adam Stone (Netlify) as he draws on his vast experimentation experience and explains how analytics engineers can use a combination of a program-building mindset, organizational mentoring (and cheerleading), and off-the-shelf tools to partner with product and engineering teams to quickly spin up meaningful experimentation.
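Off-the-shelf experimentation tools typically handle one building block for you: assigning users to variants consistently. The minimal Python sketch below assumes hash-based bucketing with SHA-256. This is a common approach but not necessarily what Netlify or any particular tool uses. The function name and the experiment key are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, variants=("control", "treatment")) -> str:
    """Deterministically bucket a user into one of the variants.

    Hashing the experiment name together with the user ID means a user always
    lands in the same variant for a given experiment, while their buckets in
    different experiments are effectively uncorrelated.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]


# Repeated calls return the same variant for the same user and experiment.
first = assign_variant("user-123", "new_checkout")
second = assign_variant("user-123", "new_checkout")
print(first == second)  # True

# Across many users, the split comes out close to 50/50.
counts = {"control": 0, "treatment": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "new_checkout")] += 1
print(counts)
```

Because assignment is a pure function of its inputs, no assignment table has to be stored. Any service, or a later analysis query, can recompute who saw which variant.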
Check the slides here: https://docs.google.com/presentation/d/1vWfhfTnC9-NV-qrQLTkGk4qgdi-19JA8E3p6fpniQe0/edit?usp=sharing
What do you do when you find out that your team is being tasked with building a single platform that should be able to serve everyone's data needs, no matter whether they are internal (from within your company) or external (your customers)? What's more, it's expected to be fast, stable, granular, sophisticated, simple, scalable, usable, easy to maintain, compatible… the list goes on.
Well, time to find a new-school solution. We'll walk you through our story of how and why we built Slido's dataAPI using everyone's favourite Analytics Engineering tool, dbt.
Most analysts don’t become analysts to build dashboards. We don’t become analysts to do data pulls, or clean up messy data, or put together pitch decks. We become analysts to do impactful, strategic analysis. This is our calling; it’s the most valuable work that we do; and it’s why we put up with the rest of our job—for that afternoon with nothing but a big question, a clear calendar, and a trajectory-changing aha moment buried somewhere in our well-prepped datasets.
But the rapid rise of analytics engineering should make us question all of this. Is strategic analysis actually the holy grail of analytics? Is it the most valuable thing we could do? Is it even what we want to do?
In chasing this ambition, Benn Stancil (Mode) thinks we’ve lost sight of something even more important, and potentially more interesting: designing operational models. These frameworks, which are a natural extension of the semantic models built by analytics engineers, are often more valuable than any dashboard, any dataset, or any deep dive analysis.
In his talk, Benn will share what these models are, why they’re valuable, and why, in our eternal quest to both quantify our value and find work we love, they could prove to be the holy grail we’ve always been looking for.
Check the slides here: https://docs.google.com/presentation/d/1lOH6Sb8DQnnlmZkYOlqqHgQeXKkUEQCm_LOxsjBRJlM/edit?usp=sharing