AI21 Labs is spearheading the industry’s shift toward generative AI models with longer contexts. Models with larger context windows are becoming increasingly common, but they typically carry hefty compute requirements. Or Dagan, product lead at AI21 Labs, argues that this trade-off is not inevitable, and his company is now releasing a generative model to back the claim.
A context, or context window, refers to the input data (e.g., text) a model considers before generating output (more text). Models with small context windows tend to forget the content of even very recent conversations, whereas models with larger contexts avoid this pitfall and, as a bonus, better grasp the flow of the data they take in.
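To make the idea concrete, here is a minimal sketch of how a fixed context window drops older input. It uses a toy whitespace tokenizer purely for illustration; real models use subword tokenizers and far larger windows.

```python
# Toy illustration of a context window: the model only "sees" the most
# recent N tokens of its input; everything earlier is silently dropped.
def truncate_to_context(text: str, context_window: int) -> list[str]:
    tokens = text.split()             # toy tokenizer: one word = one token
    return tokens[-context_window:]   # keep only the most recent tokens

conversation = (
    "user: my name is Ada. "
    + "user: tell me more. " * 400
    + "user: what is my name?"
)
visible = truncate_to_context(conversation, context_window=32)
# With a narrow window, the earlier introduction has long since fallen out
# of context, so a model conditioned on `visible` cannot recall the name.
print(len(visible), visible[-5:])
```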
Pushing the Boundaries: AI21 Labs’ Breakthrough in Contextual Understanding
AI21 Labs’ latest creation, Jamba, is a text-generating and -analyzing model that competes with the likes of OpenAI’s ChatGPT and Google’s Gemini. Trained on a combination of public and proprietary data, Jamba proficiently generates text in English, French, Spanish, and Portuguese.
Jamba can handle up to 140,000 tokens while running on a single GPU with at least 80GB of memory (e.g., a high-end Nvidia A100). That translates to roughly 105,000 words, or about 210 pages, the length of a decent-sized novel.
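The word and page figures follow from common rules of thumb, roughly 0.75 English words per token and about 500 words per printed page. A quick back-of-the-envelope check (the conversion factors are approximations, not AI21’s numbers):

```python
tokens = 140_000
words = tokens * 0.75   # rule of thumb: ~0.75 English words per token
pages = words / 500     # rule of thumb: ~500 words per printed page
print(f"{words:,.0f} words, about {pages:,.0f} pages")  # 105,000 words, 210 pages
```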
By comparison, Meta’s Llama 2 shipped with a 4,096-token context window, small by contemporary standards, but it requires only a GPU with roughly 12GB of memory to run efficiently.
Although numerous freely available generative AI models exist, what sets Jamba apart is its hybrid architecture, which blends transformers with state space models (SSMs). Transformers, the architecture of choice for complex reasoning tasks, use an attention mechanism to weigh the relevance of every piece of input data when generating output. SSMs, by contrast, combine qualities of older AI models, such as recurrent neural networks, to yield a computationally efficient architecture that can handle long sequences of data.
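The sketch below illustrates the interleaving idea: an SSM layer, whose cost grows linearly with sequence length, stacked with an attention layer, whose cost grows quadratically. The SSM here is a toy diagonal recurrence for illustration only, not Jamba’s actual layer design.

```python
import torch
import torch.nn as nn

class ToySSMLayer(nn.Module):
    """Toy diagonal state-space layer: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.
    A sequential scan whose cost grows linearly with sequence length."""
    def __init__(self, dim: int):
        super().__init__()
        self.a = nn.Parameter(torch.rand(dim) * 0.9)  # per-channel decay, < 1
        self.b = nn.Parameter(torch.ones(dim))
        self.c = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        h = torch.zeros_like(x[:, 0])
        outputs = []
        for t in range(x.shape[1]):            # recurrent scan over time steps
            h = self.a * h + self.b * x[:, t]
            outputs.append(self.c * h)
        return torch.stack(outputs, dim=1)

class HybridBlock(nn.Module):
    """One SSM layer followed by one attention layer, with residual
    connections; shows the interleaving idea only, not AI21's design."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.ssm = ToySSMLayer(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.ssm(x)                # linear-time mixing along the sequence
        attn_out, _ = self.attn(x, x, x)   # quadratic-time global attention
        return x + attn_out

x = torch.randn(2, 128, 64)                # (batch, sequence, channels)
print(HybridBlock(64)(x).shape)            # torch.Size([2, 128, 64])
```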
While SSMs have inherent limitations, certain iterations, such as the open-source Mamba model from Princeton and Carnegie Mellon researchers, can handle larger inputs than their transformer-based equivalents while outperforming them on some language generation tasks.
Jamba incorporates Mamba into its core model, with Dagan claiming it delivers triple the throughput on long contexts compared to transformer-based models of similar sizes.
Though Jamba is released under the Apache 2.0 license, Dagan emphasizes that it is a research release not intended for commercial use: it ships without safeguards against generating toxic text and without mitigations for potential biases. A fine-tuned, ostensibly safer version is expected to follow soon.
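Because the weights are public, the model can in principle be loaded with the Hugging Face transformers library. The snippet below is a hypothetical loading sketch: the repository id "ai21labs/Jamba-v0.1" and the flags shown are assumptions, so AI21’s model card should be treated as authoritative.

```python
# Hypothetical loading sketch; the repo id and flags are assumptions.
# Check AI21's model card for the authoritative instructions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"   # assumed Hugging Face repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",        # shard across whatever GPU memory is available
    torch_dtype="auto",
    trust_remote_code=True,   # may be required for custom architectures
)

prompt = "State space models differ from transformers in that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```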
Future Prospects: Advancements and Road Ahead
Dagan is optimistic that Jamba offers an early demonstration of the SSM architecture’s potential, noting that its size and novel design allow it to fit on a single GPU. He anticipates further performance gains as Mamba undergoes continued refinement.