Dwight Whitlock

Data Platform Architect, Clinician Nexus

Dwight Whitlock is a data and infrastructure engineer leading the Data Platform team at Clinician Nexus. He architects scalable solutions across Python, Spark, Terraform, and Kubernetes, and is driving the firm-wide rollout of a data mesh strategy. His work empowers over 300 analytic users by streamlining data ingestion, schema validation, quality enforcement, and BI acceleration in Databricks. Dwight is passionate about metadata governance, discoverability, and building tools that allow business teams to manage their own data lifecycles.

Bio from: Data + AI Summit 2025

Talks & appearances

Unifying Human-Curated Data Ingestion and Real-Time Updates with Databricks Lakeflow Declarative Pipelines, Protobuf and BSR

2025-06-10 · Data + AI Summit 2025

talk

Data Governance, Databricks, Kafka, Protobuf, Data Streaming

Red Stapler is a streaming-native system on Databricks that merges file-based ingestion and real-time user edits into a single Lakeflow Declarative Pipelines pipeline for near real-time feedback. Protobuf definitions, managed in the Buf Schema Registry (BSR), govern schema and data-quality rules and ensure backward compatibility. All records, valid or not, are stored in an SCD Type 2 table that captures every version, providing full history and immediate quarantine views of invalid data. This unified approach boosts data governance, simplifies auditing and streamlines error fixes.

Running on Lakeflow Declarative Pipelines Serverless and the Kafka-compatible Bufstream keeps costs low by scaling down to zero when idle. Red Stapler’s configuration-driven Protobuf logic adapts easily to evolving survey definitions without risking production. The result is consistent validation, quick updates and a complete audit trail, all critical for trustworthy, flexible data pipelines.
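The idea of keeping every record version, valid or not, in an SCD Type 2 table and deriving a quarantine view from it can be illustrated with a minimal plain-Python sketch. This is not Red Stapler's code and does not use the Lakeflow, Protobuf or BSR APIs; the `Version` fields, the `validate` rules and the `respondent_id` and `salary` attributes are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    """One row of a toy SCD Type 2 history (all field names are hypothetical)."""
    key: str
    payload: dict
    is_valid: bool
    errors: list
    valid_from: int
    valid_to: Optional[int] = None  # None marks the current version


def validate(payload: dict) -> list:
    """Hypothetical rules standing in for schema-defined data-quality checks."""
    errors = []
    if not payload.get("respondent_id"):
        errors.append("respondent_id is required")
    salary = payload.get("salary")
    if not isinstance(salary, (int, float)) or salary < 0:
        errors.append("salary must be a non-negative number")
    return errors


def apply_change(history: list, key: str, payload: dict, ts: int) -> None:
    """Close the open version for this key, then append the new one.

    Invalid records are stored too, flagged rather than rejected.
    """
    for version in history:
        if version.key == key and version.valid_to is None:
            version.valid_to = ts
    errors = validate(payload)
    history.append(Version(key, payload, not errors, errors, ts))


def current(history: list) -> list:
    """Current versions only (the open rows)."""
    return [v for v in history if v.valid_to is None]


def quarantine(history: list) -> list:
    """Current versions that fail validation."""
    return [v for v in current(history) if not v.is_valid]
```

In this sketch, a file load that delivers a bad row and a later user edit that corrects it both pass through `apply_change`: the bad version stays in the history with its error list, while the quarantine view empties as soon as the corrected version becomes current.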