talk-data.com

Weston Pace

Speaker

PyData Seattle 2025


Talks & appearances

Data Loading for Data Engineers

Data scientists need data to train their models. The process of feeding data to the training algorithm is loosely described as "data loading." This talk looks at the data loading process from a data engineer's perspective. We will describe common techniques such as splits, shuffling, clumping, epochs, and distribution. We will show how the way data is loaded affects training speed and model quality. Finally, we will examine the constraints these workloads put on data systems and discuss best practices for preparing a database to serve as a source for data loading.
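
As a rough illustration of three of the techniques named above (splits, shuffling, and epochs), here is a minimal Python sketch. It is not taken from the talk: the function names `split_indices` and `epoch_batches`, the 80/20 split, and the seeding scheme are all assumptions chosen for illustration, and the rows are represented only by integer indices rather than by a real database.

```python
import random


def split_indices(n_rows, val_fraction=0.2, seed=0):
    """Partition row indices into train and validation sets (hypothetical helper).

    The shuffle is seeded so the same rows land in the same split on every run.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_val = int(n_rows * val_fraction)
    return indices[n_val:], indices[:n_val]


def epoch_batches(train_indices, batch_size, epoch, seed=0):
    """Yield batches of row indices for one epoch (hypothetical helper).

    Each epoch visits every training row exactly once, in an order that
    depends on the epoch number, so consecutive epochs see different batches.
    """
    order = list(train_indices)
    random.Random(seed * 1_000_003 + epoch).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


if __name__ == "__main__":
    train, val = split_indices(10, val_fraction=0.2)
    for epoch in range(2):
        for batch in epoch_batches(train, batch_size=4, epoch=epoch):
            print(f"epoch={epoch} batch={batch}")
```

One reason to work with indices in a sketch like this is that it separates the loading order from the storage layer: a data system then only has to serve rows by index. That is the kind of random-access constraint the abstract alludes to, although the talk's actual recommendations are not reproduced here.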