talk-data.com
Meetup
talk
2024-04-09 at 19:00
How and why we upgraded and tuned the ML training platform to a scalable system to streamline ML development at Grammarly.
Description
A discussion of upgrading and tuning Grammarly's ML training platform into a scalable system. Topics include moving away from a custom architecture due to hardware shortages, key requirements and architectural challenges, MLOps best practices for scalability, and lessons learned from transitioning from a single-region AWS setup to a cross-region, multi-cloud compute cluster deployment.