talk-data.com

Meetup talk 2024-04-09 at 19:00

How and why we upgraded and tuned the ML training platform to a scalable system to streamline ML development at Grammarly.

Description

A discussion of how Grammarly upgraded and tuned its ML training platform into a scalable system. Topics include moving away from a custom architecture because of hardware shortages, key requirements and architectural challenges, MLOps best practices for scalability, and lessons learned from moving from a single-region AWS setup to a cross-region, multi-cloud compute cluster deployment.