Llama 2, released in July 2023, is the second generation of large language models from Meta AI. It builds on LLaMA with more and better-filtered pre-training data, a doubled context length, chat-optimised models produced through extensive alignment techniques, and, crucially, a licence permitting commercial use.
Key Innovations and Contributions
Open Access for Commercial Use
- Core Idea: Llama 2 models were released under a more permissive licence allowing for commercial use (with an Acceptable Use Policy and specific restrictions for companies with very large user bases).
- Impact: This was a major departure from LLaMA and significantly boosted the adoption and utility of Llama 2 in real-world applications and products. It strongly positioned Llama 2 as a leading open alternative to closed models.
Introduction of Llama 2-Chat Models
- Core Idea: Alongside the base pre-trained models, Meta released Llama 2-Chat versions specifically fine-tuned for dialogue use cases.
- Impact: Provided ready-to-use, high-performance conversational models built on the Llama 2 foundation, making state-of-the-art chat capabilities more accessible.
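Concretely, the chat variants are trained against a specific prompt template, published with Meta's llama reference code: each user turn is wrapped in [INST] … [/INST] tags, with an optional <<SYS>> block carrying the system prompt. A minimal single-turn sketch follows; build_prompt is an illustrative helper, not part of any library, and the BOS/EOS special tokens are normally added by the tokenizer.

```python
# Minimal sketch of the single-turn Llama 2-Chat prompt format.
# build_prompt is an illustrative helper, not a library function;
# the tokenizer is expected to add the <s>/</s> special tokens.
def build_prompt(system_prompt: str, user_message: str) -> str:
    return (
        "[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        "<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

print(build_prompt(
    "You are a helpful, honest assistant.",
    "Summarise the key changes from LLaMA to Llama 2.",
))
```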
Detailed and Scaled RLHF Implementation
- Core Idea: The Llama 2 paper provided a detailed description of the large-scale reinforcement-learning-from-human-feedback (RLHF) pipeline used to align Llama 2-Chat for helpfulness and safety. Key stages included:
- SFT (supervised fine-tuning): Initial tuning on instructions and dialogues, using a mix of public and high-quality internal datasets.
- Reward Modelling: Training separate reward models to score outputs based on human preferences for helpfulness and safety (the ranking loss is sketched after this list).
- Rejection Sampling Fine-tuning: Iteratively improving the model by generating multiple responses per prompt and fine-tuning on the best ones according to the reward model (also sketched below).
- PPO (proximal policy optimisation): Further refinement using the PPO algorithm, optimising the policy (the LLM itself) against the reward models.
- Impact: Provided valuable insights into alignment techniques and demonstrated their effectiveness at scale, influencing subsequent work on model alignment. The Llama 2-Chat models showed strong performance compared to other models at the time.
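To make the reward-modelling stage concrete: the paper trains the reward models with a binary ranking loss over pairs of responses, plus a margin term that grows with how strongly annotators preferred the chosen response. A minimal PyTorch sketch of that loss (tensor names are illustrative):

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(chosen_scores: torch.Tensor,
                        rejected_scores: torch.Tensor,
                        margin: torch.Tensor) -> torch.Tensor:
    # The reward model should score the human-preferred response higher
    # than the rejected one; the margin pushes the gap wider when the
    # annotator's preference was strong.
    return -F.logsigmoid(chosen_scores - rejected_scores - margin).mean()

# Toy scores from a hypothetical reward-model head.
chosen = torch.tensor([1.2, 0.7])
rejected = torch.tensor([0.3, 0.9])
margin = torch.tensor([0.5, 0.0])   # larger margin for clearer preferences
print(reward_ranking_loss(chosen, rejected, margin))
```

Rejection-sampling fine-tuning is equally simple to sketch: sample K candidates per prompt, keep the highest-reward one, and feed the resulting pairs back into supervised fine-tuning. The generate/score callables below are toy stand-ins, not real APIs:

```python
import random
from typing import Callable, List, Tuple

def rejection_sampling_step(
    generate: Callable[[str], str],       # samples one response from the policy
    score: Callable[[str, str], float],   # reward-model score for (prompt, response)
    prompts: List[str],
    k: int = 8,
) -> List[Tuple[str, str]]:
    # Keep the best of K sampled responses per prompt; the selected
    # (prompt, response) pairs become targets for the next fine-tuning round.
    best = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(k)]
        best.append((prompt, max(candidates, key=lambda c: score(prompt, c))))
    return best

# Toy stand-ins so the sketch runs end to end.
toy_generate = lambda p: f"{p} -> answer#{random.randint(0, 99)}"
toy_score = lambda p, c: float(len(c))   # pretend longer responses score higher
print(rejection_sampling_step(toy_generate, toy_score, ["Q1", "Q2"], k=4))
```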
Explicit Focus on Safety Alignment and Transparency
- Core Idea: Safety was a primary focus, integrated throughout the fine-tuning process and evaluation. This involved:
- Safety-specific data collection and annotation for SFT and reward modelling.
- Training a dedicated safety reward model.
- Extensive internal and external red teaming to identify and mitigate risks.
- Detailed safety evaluations reported in the paper.
- Impact: Set a higher standard for responsible development and release practices for powerful open models, emphasising transparency about safety procedures.
Pre-training Improvements
- Increased Data: The pre-training dataset size was increased by 40% compared to LLaMA, totalling 2 trillion tokens of publicly available data. Efforts were made to better filter problematic data sources.
- Longer Context Length: The maximum context length was doubled from 2048 (LLaMA) to 4096 tokens.
- Impact: Contributed to improved knowledge, reasoning capabilities, and the ability to handle longer prompts or conversations.
Architectural Refinement
- Core Idea: The largest Llama 2 model (70B) adopted grouped-query attention (GQA) in place of the standard multi-head attention (MHA) used in the smaller Llama 2 models and in all LLaMA models.
- Impact: GQA reduces the computational cost and memory-bandwidth requirements during inference (especially the key-value cache size) compared to MHA, making the 70B model faster and more efficient to run, particularly for long sequences. It offers a trade-off between the efficiency of multi-query attention (MQA) and the quality of MHA, as the sketch below illustrates.
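A minimal PyTorch sketch of the GQA mechanism (shapes and names are illustrative; the real Llama 2 attention also applies rotary position embeddings and a KV cache): each group of query heads attends using a single shared key/value head, so with 8 query heads and 2 KV heads the KV cache is 4x smaller than under MHA. For reference, the Llama 2 70B configuration uses 8 KV heads.

```python
import torch

def grouped_query_attention(q: torch.Tensor,
                            k: torch.Tensor,
                            v: torch.Tensor) -> torch.Tensor:
    # q: (batch, n_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)
    n_heads, n_kv_heads = q.shape[1], k.shape[1]
    group = n_heads // n_kv_heads
    # Share each KV head across its group of query heads (no new parameters;
    # the saving is in KV-cache size and memory bandwidth).
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    seq, head_dim = q.shape[2], q.shape[3]
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    causal = torch.triu(torch.full((seq, seq), float("-inf")), diagonal=1)
    return torch.softmax(scores + causal, dim=-1) @ v

# 8 query heads sharing 2 KV heads: a 4x smaller KV cache than MHA.
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 8, 16, 64])
```

MHA is the special case n_kv_heads == n_heads, and MQA the case n_kv_heads == 1, which is exactly the efficiency/quality trade-off described above.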