talk-data.com

Holden Karau

Speaker

Holden is a transgender Canadian open source developer with a focus on Apache Spark and related "big data" tools. By day (and night, go go startup life) she works on bringing large language models and other AI tools to help healthcare users deal with insurance through https://www.fighthealthinsurance.com and https://www.fightpaperwork.com.

She is the co-author of Learning Spark, High Performance Spark, and a few others. She is a committer and PMC member on Apache Spark. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal.

Bio from: Small Data SF 2025

Filtering by: Data + AI Summit 2025

Talks & appearances

Building AI Models In Health Care Using Semi-Synthetic Data

Regulated or restricted fields like health care make collecting training data complicated. We all want to do the right thing, but how? This talk will look at how Fight Health Insurance used de-identified public and proprietary information to create a semi-synthetic training set for use in fine-tuning machine learning models to power Fight Paperwork. We'll explore how to incorporate the latest "reasoning" techniques in fine-tuning as well as how to make models that you can afford to serve: think single-GPU inference instead of a cluster of A100s. In addition to the talk, we have the code used in a public GitHub repo, although it is a little rough, so you might want to use it more as a source of inspiration rather than directly forking it.