Real-time ML: Accelerating Python for inference (< 10ms) at scale

Real-time machine learning depends on features and data that by definition can’t be pre-computed. Detecting fraud or acute diseases like sepsis requires processing events that emerged seconds ago. How do we build an infrastructure platform that executes complex data pipelines (< 10ms) end-to-end and on-demand? All while meeting data teams where they are–in Python–the language of ML! Learn how we built a symbolic interpreter that accelerates ML pipelines by transpiling Python into DAGs of static expressions. These expressions are optimized in C++ and eventually run in production workloads at scale with Velox–an OSS (~4k stars) unified query engine (C++) from Meta.

talk-data.com

Real-time ML: Accelerating Python for inference (< 10ms) at scale

Description