Microsoft has introduced the Phi-3 family of open small language models (SLMs), describing them as the most capable and cost-effective models of their size available. An innovative training approach developed by Microsoft researchers has enabled these models to surpass larger models on language, coding, and math benchmarks.
Sonali Yadav, Principal Product Manager for Generative AI at Microsoft, emphasised a shift from a singular category of models to a portfolio, enabling customers to choose the best model for their scenario. The first model, Phi-3-mini, with 3.8 billion parameters, is now available on platforms including the Azure AI Model Catalog, Hugging Face, and Ollama, and as an NVIDIA NIM microservice. Despite its compact size, Phi-3-mini outperforms models twice its size. Additional models, Phi-3-small (7B parameters) and Phi-3-medium (14B parameters), will follow.
According to Luis Vargas, Microsoft VP of AI, some customers may require small models, while others may opt for larger ones or a combination of both. The smaller size of SLMs allows for on-device deployment, facilitating low-latency AI experiences without network connectivity. This capability opens doors for applications in smart sensors, cameras, farming equipment, and more, with the added benefit of enhanced privacy by keeping data on the device.
More on Phi-3
While large language models (LLMs) excel at complex reasoning over vast datasets, SLMs offer a compelling alternative for simpler tasks such as query answering, summarization, and content generation. Victor Botev, CTO and Co-Founder of Iris.ai, commended Microsoft’s approach of developing tools with curated data and specialized training, reducing computational costs while improving performance and reasoning abilities.
Microsoft’s SLM quality leap was made possible by an innovative data filtering and generation approach inspired by bedtime storybooks. Sebastien Bubeck, Microsoft VP leading SLM research, highlighted the creation of high-quality datasets like ‘TinyStories’ and ‘CodeTextbook’, synthesised through rounds of prompting, generation, and filtering. These datasets, vetted for educational value, significantly improved the language models’ performance.
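To make the prompting, generation, and filtering cycle concrete, here is a minimal, purely illustrative sketch of such a loop. The functions `generate_sample` and `score_educational_value` are hypothetical stand-ins for a teacher LLM call and a quality classifier; this is not Microsoft's actual pipeline, only the general shape of the technique.

```python
import random

def generate_sample(prompt: str, seed: int) -> str:
    """Stand-in for an LLM call: returns a synthetic 'story' string.
    (A real pipeline would query a large teacher model here.)"""
    rng = random.Random(seed)
    words = ["sun", "cat", "ball", "tree", "smile"]
    return prompt + " " + " ".join(rng.choices(words, k=5))

def score_educational_value(text: str) -> float:
    """Stand-in for a quality classifier; here, longer texts score higher.
    (A real pipeline would use a trained filter model.)"""
    return min(len(text.split()) / 10.0, 1.0)

def build_dataset(prompt: str, rounds: int, threshold: float) -> list[str]:
    """Run several rounds of generation, keeping only samples that
    clear the quality threshold."""
    kept = []
    for i in range(rounds):
        sample = generate_sample(prompt, seed=i)
        if score_educational_value(sample) >= threshold:
            kept.append(sample)
    return kept

dataset = build_dataset("Write a simple story:", rounds=5, threshold=0.5)
print(len(dataset))
```

The key design point the article describes is that quality comes from the filter, not the volume: candidates that fail the educational-value check are discarded, and only the vetted remainder is used for training.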
Despite the meticulous data curation, Microsoft underscores the importance of applying additional safety practices to its release, aligning with its standard processes for all generative AI models. A multi-layered approach, including further training examples and vulnerability assessments, was employed to manage and mitigate risks in developing Phi-3 models. Azure AI tools are also provided for customers to build trustworthy applications atop Phi-3.