Summary
Dan Delorey helped to build the core technologies of Google’s cloud data services for many years before embarking on his latest adventure as the VP of Data at SoFi. From being an early engineer on the Dremel project, to helping launch and manage BigQuery, on to helping enterprises adopt Google’s data products, he learned all of the critical details of how to run services used by data platform teams. Now he is the consumer of many of the tools that his work inspired. In this episode he takes a trip down memory lane to weave an interesting and informative narrative about the broader themes throughout his work and their echoes in the modern data ecosystem.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription.
So now your modern data stack is set up. How is everyone going to find the data they need, and understand it? Select Star is a data discovery platform that automatically analyzes & documents your data. For every table in Select Star, you can find out where the data originated, which dashboards are built on top of it, who’s using it in the company, and how they’re using it, all the way down to the SQL queries. Best of all, it’s simple to set up, and easy for both engineering and operations teams to use. With Select Star’s data catalog, a single source of truth for your data is built in minutes, even across thousands of datasets. Try it out for free and double the length of your free trial today at dataengineeringpodcast.com/selectstar. You’ll also get a swag package when you continue on a paid plan.
Your host is Tobias Macey and today I’m interviewing Dan Delorey about his journey through the data ecosystem as the current head of data at SoFi, prior engineering leader with the BigQuery team, and early engineer on Dremel.
Interview
Introduction
How did you get involved in the area of data management?
Can you start by sharing what your current relationship to the data ecosystem is and the cliffs-notes version of how you ended up there?
Dremel was a ground-breaking technology at the time. What do you see as its lasting impression on the landscape of data both in and outside of Google?
You were instrumental in crafting the vision behind "querying data in place" (what they called federated data) at Dremel and BigQuery. What do you mean by this? How has this approach evolved? What are some challenges with this approach?
How well did the Drill project capture the core principles of Dremel as outlined in the eponymous white paper?
Following your work on Dremel, you were involved with the development and growth of BigQuery and the broader suite of Google Cloud’s data platform.