In this session, we will explore small language models, focusing on their unique advantages and practical applications. We will cover the basics of language models and the benefits of smaller models, and provide hands-on examples to help beginners get started. By the end of the session, attendees will have a solid understanding of how to leverage small language models in their own projects. We will highlight the efficiency, customizability, and adaptability of small models, which make them well suited to edge devices and real-time applications.
We will introduce attendees to two widely used small language models: Qwen3 and SmolLM3. Specifically, we will cover:
1. Accessing models: How to navigate Hugging Face to explore and select available models, view model documentation, and assess a model's usefulness for a specific task.
2. Deployment: How to get started with
(a) An inference provider - using the Hugging Face Inference API or Google CLI
(b) On-tenant serving - using Databricks Model Serving
(c) Running the model locally - using Ollama and LM Studio
3. Tradeoffs: An examination of the tradeoffs of each deployment route.
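As a preview of the inference-provider route above, the sketch below shows the general shape of a call to a hosted model over the Hugging Face Inference API using only the standard library. The model ID, endpoint URL, and response structure are illustrative assumptions for this example and may differ for your account, provider, and model; a real call also requires a valid HF token and network access.

```python
import json
import os
import urllib.request

# Illustrative endpoint following Hugging Face's hosted inference convention;
# the model ID (SmolLM3) is one of the two models covered in this session.
API_URL = "https://api-inference.huggingface.co/models/HuggingFaceTB/SmolLM3-3B"

def build_payload(prompt: str, max_new_tokens: int = 64) -> dict:
    """Assemble the JSON body for a text-generation request."""
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

def query(prompt: str) -> str:
    """Send the request; assumes an HF_TOKEN environment variable is set."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['HF_TOKEN']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        # Assumed response shape: a list of {"generated_text": ...} objects.
        return json.load(resp)[0]["generated_text"]

if __name__ == "__main__":
    print(query("Small language models are useful because"))
```

The local-model routes (Ollama, LM Studio) expose similar HTTP APIs on localhost, so the same request-building pattern carries over with a different endpoint.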