The Llama 3 series, released starting in April 2024, represents Meta AI’s third generation of open-access Large Language Models. This series marked significant advancements over Llama 2, including massively scaled pre-training data, architectural improvements like a new tokeniser, enhanced alignment techniques, the introduction of very large models, long context windows, and multimodal capabilities.

Llama 3

Released in April 2024 in 8B & 70B sizes.

  • Massively Scaled Pre-training Data:
    • Pre-trained on over 15 trillion tokens of data, a significant increase from Llama 2’s 2 trillion tokens.
    • The data was drawn from publicly available online content, with extensive filtering and curation. It included a larger proportion of non-English text (over 5%) and code than Llama 2, aiming to improve multilingual coverage and reasoning/coding ability.
  • New Tokeniser:
    • Introduced a new tokeniser using byte pair encoding (BPE) with a 128,000-token vocabulary (up from Llama 2’s 32,000).
    • The larger vocabulary significantly improves tokenisation efficiency, encoding the same text in fewer tokens, which benefits language encoding and potentially performance, especially in multilingual contexts (a rough comparison sketch follows this list).
  • Architectural Refinements:
    • Standardised the use of grouped-query attention (GQA) across both the 8B and 70B models (previously used only in the Llama 2 70B model), improving inference efficiency by shrinking the key/value cache (a minimal GQA sketch follows this list).
    • Models were released with an 8k-token context length.
  • Improved Instruction Following and Alignment:
    • Utilised a combination of SFT, rejection sampling, PPO, and DPO for post-training alignment, with Meta reporting refinements to these methods for better instruction following and model behaviour (a DPO loss sketch follows this list).
    • Showed significantly improved performance on benchmarks measuring reasoning, coding, and instruction following compared to Llama 2 and other contemporary open models.
  • Enhanced Safety and Trust Features:
    • Released alongside new trust-and-safety tooling, including Llama Guard 2 (an input/output safety classifier) and Code Shield (a filter for insecure generated code).
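
A rough way to see the tokeniser difference in practice: the sketch below is an illustrative example, not part of any Llama release, and assumes the Hugging Face transformers library plus accepted access to the gated Llama 2 and Llama 3 checkpoints.

```python
# Sketch: compare tokenisation efficiency of the Llama 2 (32k) and Llama 3 (128k) vocabularies.
# Assumes `pip install transformers` and accepted access to the gated checkpoints.
from transformers import AutoTokenizer

text = "Meta's Llama 3 models were pre-trained on over 15 trillion tokens of text."

llama2_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
llama3_tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

for name, tok in [("Llama 2", llama2_tok), ("Llama 3", llama3_tok)]:
    ids = tok.encode(text, add_special_tokens=False)
    print(f"{name}: vocab size {tok.vocab_size}, token count {len(ids)}")

# The larger vocabulary typically encodes the same text in fewer tokens,
# which shortens sequences and reduces compute per example.
```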
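
Grouped-query attention shares each key/value head across a group of query heads, which shrinks the key/value cache at inference time. The following minimal PyTorch sketch illustrates the mechanism only and is not Meta’s implementation; the 32 query / 8 KV head split mirrors the published Llama 3 8B configuration.

```python
# Sketch: grouped-query attention (GQA) via key/value head sharing.
import torch
import torch.nn.functional as F

def gqa_attention(q, k, v, n_heads=32, n_kv_heads=8):
    # q: (batch, seq, n_heads, head_dim); k, v: (batch, seq, n_kv_heads, head_dim)
    group = n_heads // n_kv_heads
    # Repeat each KV head so it is shared by `group` query heads, keeping the
    # KV cache at n_kv_heads / n_heads the size of full multi-head attention.
    k = k.repeat_interleave(group, dim=2)
    v = v.repeat_interleave(group, dim=2)
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))   # -> (batch, heads, seq, head_dim)
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2)                         # -> (batch, seq, n_heads, head_dim)

q = torch.randn(1, 16, 32, 128)
k = torch.randn(1, 16, 8, 128)
v = torch.randn(1, 16, 8, 128)
print(gqa_attention(q, k, v).shape)  # torch.Size([1, 16, 32, 128])
```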
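
Of the alignment methods listed above, DPO has a particularly compact objective. The sketch below implements the standard DPO loss on per-sequence log-probabilities as a generic illustration; the β value and toy inputs are assumptions, and this is not Meta’s training code.

```python
# Sketch: the standard Direct Preference Optimization (DPO) loss.
# Inputs are summed log-probabilities of the chosen and rejected responses under
# the policy being trained and under a frozen reference model (e.g. the SFT model).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # How much more the policy prefers the chosen response than the reference does,
    # minus the same quantity for the rejected response.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(beta * margin)) is minimised when the policy widens that margin.
    return -F.logsigmoid(beta * margin).mean()

# Toy batch of two preference pairs (hypothetical log-probabilities).
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -11.0]),
                torch.tensor([-12.5, -10.0]), torch.tensor([-13.0, -10.5]))
print(loss.item())
```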

Llama 3.1

Released in July 2024, adding a 405B size alongside updated 8B and 70B models.

  • Massive Model Scale:
    • Introduced a 405 billion parameter model, significantly larger than previous Llama models, trained on a dataset comparable in scale (15T+ tokens) to Llama 3.
  • Extended Context Length:
    • Increased the maximum context length dramatically to 128,000 tokens (from 8k in Llama 3); a sketch of RoPE frequency scaling, one common ingredient of such context extension, follows this list.
    • Demonstrated strong performance on long-context tasks and benchmarks.
  • State-of-the-Art Performance:
    • Achieved leading performance among open models on a wide range of industry benchmarks, becoming competitive with top closed-source models available at the time.
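
One common ingredient of this kind of context extension is rescaling rotary position embedding (RoPE) frequencies so that long positions stay within the rotation range seen in training. The sketch below shows the basic frequency computation and the effect of raising the base frequency; it is a generic illustration, not Meta’s exact Llama 3.1 recipe, and the larger base value is hypothetical.

```python
# Sketch: rotary position embedding (RoPE) frequencies and base-frequency scaling.
# Each pair of dimensions rotates by angle position * theta_i, with
# theta_i = base ** (-2i / head_dim). A larger base lengthens the wavelengths,
# so longer sequences stay within rotation angles the model has already seen.
import numpy as np

def rope_frequencies(head_dim=128, base=500_000.0):
    exponents = np.arange(0, head_dim, 2) / head_dim
    return base ** -exponents          # theta_i for each rotating dimension pair

short_ctx = rope_frequencies(base=500_000.0)    # Llama 3-style base frequency
long_ctx = rope_frequencies(base=4_000_000.0)   # hypothetical larger base

# Wavelength (in tokens) of the slowest-rotating pair under each base.
print(2 * np.pi / short_ctx[-1], "->", 2 * np.pi / long_ctx[-1])
```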

Llama 3.2

Released in September 2024 in 11B & 90B multimodal (Vision) sizes and 1B & 3B text-only sizes for edge devices.

  • Introduction of Multimodality:
    • Launched Llama 3.2 Vision models, marking the series’ expansion into multimodal models.
    • These Vision Language Model (VLM) variants can process and interpret both text and image inputs (a usage sketch follows this list).
  • Edge AI Focus:
    • Also announced smaller Llama 3.2 models designed for on-device and edge computing applications, emphasising efficiency.
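
As an illustration of how the Vision variants are typically driven, the sketch below uses the Hugging Face transformers API with the gated 11B Vision Instruct checkpoint; it assumes a recent transformers release with Mllama support, accepted access to the checkpoint, and a local image file ("photo.jpg" is a placeholder path).

```python
# Sketch: querying a Llama 3.2 Vision model with combined image + text input.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

# Build a chat-style prompt that interleaves an image slot with a text question.
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image in one sentence."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(Image.open("photo.jpg"), prompt,
                   add_special_tokens=False, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=64)
print(processor.decode(output[0], skip_special_tokens=True))
```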

Llama 3.3

Released in December 2024, updating the 70B size.

  • Iterative Refinement:
    • Released an updated version of the 70B parameter model (Llama 3.3 70B Instruct).
    • Meta attributed the gains primarily to improved post-training and alignment techniques rather than architectural changes, building on learnings from the 3.1 and 3.2 releases.
    • Positioned as approaching the performance of the much larger Llama 3.1 405B model on several benchmarks, particularly in coding, reasoning, and instruction following, at a far lower inference cost.
