The Llama 3 series, released starting in April 2024, represents Meta AI’s third generation of open-access Large Language Models. This series marked significant advancements over Llama 2, including massively scaled pre-training data, architectural improvements like a new tokeniser, enhanced alignment techniques, the introduction of very large models, long context windows, and multimodal capabilities.

Llama 3

Released in April 2024 in 8B & 70B sizes.

  • Massively Scaled Pre-training Data:
    • Pre-trained on over 15 trillion tokens of data, a significant increase from Llama 2’s 2 trillion tokens.
    • The data was drawn from publicly available online content, with extensive filtering and curation. It included a larger proportion of non-English text (over 5%) and code than Llama 2, aiming to improve multilingual coverage and reasoning/coding ability.
  • New Tokeniser:
    • Introduced a new tokeniser using byte pair encoding (BPE) with a 128,000-token vocabulary (up from Llama 2’s 32,000).
    • The larger vocabulary significantly improves tokenisation efficiency, encoding the same text in fewer tokens, which benefits language encoding and potentially performance, especially in multilingual contexts (a rough comparison sketch follows this list).
  • Architectural Refinements:
    • Standardised the use of grouped-query attention (GQA) across both the 8B and 70B models (previously used only in the Llama 2 70B model), improving inference efficiency by shrinking the key/value cache (a minimal GQA sketch follows this list).
    • Models were released with an 8k-token context length.
  • Improved Instruction Following and Alignment:
    • Utilised a combination of SFT, rejection sampling, PPO, and DPO for post-training alignment, with Meta reporting refinements to these methods for better instruction following and model behaviour (a DPO loss sketch follows this list).
    • Showed significantly improved performance on benchmarks measuring reasoning, coding, and instruction following compared to Llama 2 and other contemporary open models.
  • Enhanced Safety and Trust Features:
    • Released alongside new trust-and-safety tooling, including Llama Guard 2 (an input/output safety classifier) and Code Shield (a filter for insecure generated code).
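
A rough way to see the tokeniser difference in practice: the sketch below is an illustrative example, not part of any Llama release, and assumes the Hugging Face transformers library plus accepted access to the gated Llama 2 and Llama 3 checkpoints.

```python
# Sketch: compare tokenisation efficiency of the Llama 2 (32k) and Llama 3 (128k) vocabularies.
# Assumes `pip install transformers` and accepted access to the gated checkpoints.
from transformers import AutoTokenizer

text = "Meta's Llama 3 models were pre-trained on over 15 trillion tokens of text."

llama2_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
llama3_tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

for name, tok in [("Llama 2", llama2_tok), ("Llama 3", llama3_tok)]:
    ids = tok.encode(text, add_special_tokens=False)
    print(f"{name}: vocab size {tok.vocab_size}, token count {len(ids)}")

# The larger vocabulary typically encodes the same text in fewer tokens,
# which shortens sequences and reduces compute per example.
```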
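
Grouped-query attention shares each key/value head across a group of query heads, which shrinks the key/value cache at inference time. The following minimal PyTorch sketch illustrates the mechanism only and is not Meta’s implementation; the 32 query / 8 KV head split mirrors the published Llama 3 8B configuration.

```python
# Sketch: grouped-query attention (GQA) via key/value head sharing.
import torch
import torch.nn.functional as F

def gqa_attention(q, k, v, n_heads=32, n_kv_heads=8):
    # q: (batch, seq, n_heads, head_dim); k, v: (batch, seq, n_kv_heads, head_dim)
    group = n_heads // n_kv_heads
    # Repeat each KV head so it is shared by `group` query heads, keeping the
    # KV cache at n_kv_heads / n_heads the size of full multi-head attention.
    k = k.repeat_interleave(group, dim=2)
    v = v.repeat_interleave(group, dim=2)
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))   # -> (batch, heads, seq, head_dim)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2)                         # -> (batch, seq, n_heads, head_dim)

q = torch.randn(1, 16, 32, 128)
k = torch.randn(1, 16, 8, 128)
v = torch.randn(1, 16, 8, 128)
print(gqa_attention(q, k, v).shape)  # torch.Size([1, 16, 32, 128])
```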
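
Of the alignment methods listed above, DPO has a particularly compact objective. The sketch below implements the standard DPO loss on per-sequence log-probabilities as a generic illustration; the β value and toy inputs are assumptions, and this is not Meta’s training code.

```python
# Sketch: the standard Direct Preference Optimization (DPO) loss.
# Inputs are summed log-probabilities of the chosen and rejected responses under
# the policy being trained and under a frozen reference model (e.g. the SFT model).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # How much more the policy prefers the chosen response than the reference does,
    # minus the same quantity for the rejected response.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(beta * margin)) is minimised when the policy widens that margin.
    return -F.logsigmoid(beta * margin).mean()

# Toy batch of two preference pairs (hypothetical log-probabilities).
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -11.0]),
                torch.tensor([-12.5, -10.0]), torch.tensor([-13.0, -10.5]))
print(loss.item())
```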

Llama 3.1

Released in July 2024, adding a 405B size alongside updated 8B and 70B models.

  • Massive Model Scale:
    • Introduced a 405 billion parameter model, significantly larger than previous Llama models, trained on a dataset comparable in scale (15T+ tokens) to Llama 3.
  • Extended Context Length:
    • Increased the maximum context length dramatically to 128,000 tokens (from 8k in Llama 3); a sketch of RoPE frequency scaling, one common ingredient of such context extension, follows this list.
    • Demonstrated strong performance on long-context tasks and benchmarks.
  • State-of-the-Art Performance:
    • Achieved leading performance among open models on a wide range of industry benchmarks, becoming competitive with top closed-source models available at the time.
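
One common ingredient of this kind of context extension is rescaling rotary position embedding (RoPE) frequencies so that long positions stay within the rotation range seen in training. The sketch below shows the basic frequency computation and the effect of raising the base frequency; it is a generic illustration, not Meta’s exact Llama 3.1 recipe, and the larger base value is hypothetical.

```python
# Sketch: rotary position embedding (RoPE) frequencies and base-frequency scaling.
# Each pair of dimensions rotates by angle position * theta_i, with
# theta_i = base ** (-2i / head_dim). A larger base lengthens the wavelengths,
# so longer sequences stay within rotation angles the model has already seen.
import numpy as np

def rope_frequencies(head_dim=128, base=500_000.0):
    exponents = np.arange(0, head_dim, 2) / head_dim
    return base ** -exponents          # theta_i for each rotating dimension pair

short_ctx = rope_frequencies(base=500_000.0)    # Llama 3-style base frequency
long_ctx = rope_frequencies(base=4_000_000.0)   # hypothetical larger base

# Wavelength (in tokens) of the slowest-rotating pair under each base.
print(2 * np.pi / short_ctx[-1], "->", 2 * np.pi / long_ctx[-1])
```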

Llama 3.2

Released in September 2024 in 11B & 90B multimodal (Vision) sizes and 1B & 3B text-only sizes for edge devices.

  • Introduction of Multimodality:
    • Launched Llama 3.2 Vision models, marking the series’ expansion into multimodal models.
    • These Vision Language Model (VLM) variants can process and interpret both text and image inputs (a usage sketch follows this list).
  • Edge AI Focus:
    • Also announced smaller Llama 3.2 models designed for on-device and edge computing applications, emphasising efficiency.
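
As an illustration of how the Vision variants are typically driven, the sketch below uses the Hugging Face transformers API with the gated 11B Vision Instruct checkpoint; it assumes a recent transformers release with Mllama support, accepted access to the checkpoint, and a local image file ("photo.jpg" is a placeholder path).

```python
# Sketch: querying a Llama 3.2 Vision model with combined image + text input.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

# Build a chat-style prompt that interleaves an image slot with a text question.
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image in one sentence."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(Image.open("photo.jpg"), prompt,
                   add_special_tokens=False, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```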

Llama 3.3

Released in December 2024, updating the 70B size.

  • Iterative Refinement:
    • Released an updated version of the 70B parameter model (Llama 3.3 70B Instruct).
    • Meta attributed the gains primarily to improved post-training and alignment techniques rather than architectural changes, building on learnings from the 3.1 and 3.2 releases.
    • Positioned as approaching the performance of the much larger Llama 3.1 405B model on several benchmarks, particularly in coding, reasoning, and instruction following, at a far lower inference cost.
