talk-data.com

Speaker

Santona Tuli, Head/Director of Data, Upsolver

3 talks

Talks & appearances

3 activities · Newest first

podcast_episode
with Santona Tuli (Upsolver), Tim Gasper (data.world from ServiceNow), Juan Sequeda (data.world), Joe Reis (DeepLearning.AI)

Are your outputs generating the right outcomes? I'm in Austin for Data Day Texas, and I reflect on this topic via a conversation I had last night with Juan Sequeda, Tim Gasper, and Santona Tuli.

In 2024, outcomes will matter more than ever. What are you doing to drive the right outcomes for your organization?

Scaling dbt models for CDC on large databases - Coalesce 2023

Unlike transforming staged data to marts, ingesting data into staging requires robustness to data volume and type changes, schema evolution, and data drift. Especially when performing change data capture (CDC) on large databases (~100 tables to a database), we’ll ideally reinforce our dbt models with automatic:

  • mapping of dynamic columns and data types between the source and the target stage
  • evolution of stage table schemas at pace with incoming data, including for nested data structures
  • parsing and flattening of any arrays and JSON structs in the data.

Manually performing these tasks for data at scale is a tall order because of the many ways in which CDC data can deviate. Deferring them to the mart transformation models is potentially detrimental to the business and doesn’t reduce the complexity. Santona Tuli shares learnings from integrating dbt Core into high-scale data ingestion workloads, including trade-offs between ease of use and scale.
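To make the flattening and schema-evolution tasks listed above concrete, here is a minimal Python sketch. It is not the approach presented in the talk and does not use dbt or Upsolver; the function names `flatten` and `evolve_schema`, the underscore naming rule, and the sample events are all hypothetical. Arrays are kept whole rather than exploded into rows, to keep the example short.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with underscores."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value  # arrays are kept whole in this sketch
    return flat


def evolve_schema(schema: dict[str, str], row: dict[str, Any]) -> list[str]:
    """Add columns present in `row` but missing from `schema`; return the added names."""
    added = []
    for column, value in row.items():
        if column not in schema:
            schema[column] = type(value).__name__  # naive type mapping (assumption)
            added.append(column)
    return added


# Hypothetical CDC events; the second one introduces a new nested field.
events = [
    {"id": 1, "customer": {"name": "Ada"}},
    {"id": 2, "customer": {"name": "Lin", "address": {"city": "Austin"}}},
]

stage_schema: dict[str, str] = {}
for event in events:
    new_columns = evolve_schema(stage_schema, flatten(event))
    print(new_columns)
```

In this sketch the stage schema grows as new fields arrive instead of being declared up front. A real pipeline would also have to handle type changes on existing columns, which the sketch ignores.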

Speaker: Santona Tuli, Head/Director of Data, Upsolver

Register for Coalesce at https://coalesce.getdbt.com

We talked about:

  • Santona's background
  • Focusing on data workflows
  • Upsolver vs DBT
  • ML pipelines vs Data pipelines
  • MLOps vs DataOps
  • Tools used for data pipelines and ML pipelines
  • The “modern data stack” and today's data ecosystem
  • Staging the data and the concept of a “lakehouse”
  • Transforming the data after staging
  • What happens after the modeling phase
  • Human-centric vs Machine-centric pipeline
  • Applying skills learned in academia to ML engineering
  • Crafting user personas based on real stories
  • A framework of curiosity
  • Santona's book and resource recommendations

Links:

LinkedIn: https://www.linkedin.com/in/santona-tuli/

Upsolver website: upsolver.com

Why we built a SQL-based solution to unify batch and stream workflows: https://www.upsolver.com/blog/why-we-built-a-sql-based-solution-to-unify-batch-and-stream-workflows

Free MLOps course: https://github.com/DataTalksClub/mlops-zoomcamp

Join DataTalks.Club: https://datatalks.club/slack.html

Our events: https://datatalks.club/events.html