Logit soft-capping uses tanh to clamp the logits x to the open interval (−t, t) in a smooth, differentiable way:

softcap(x) = t · tanh(x / t)

References
- Gemma 2: Improving Open Language Models at a Practical Size
- Neural Combinatorial Optimization with Reinforcement Learning
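
For illustration, here is a minimal NumPy sketch of the formula above; the function name softcap is mine, and the cap values in the example (50.0 for attention logits, 30.0 for final logits) are the settings reported in the Gemma 2 paper.

```python
import numpy as np

def softcap(x: np.ndarray, t: float) -> np.ndarray:
    """Smoothly clamp x to the open interval (-t, t): t * tanh(x / t)."""
    return t * np.tanh(x / t)

# Near zero the function is roughly the identity (tanh(u) ≈ u for small u),
# while large magnitudes saturate toward ±t instead of growing unboundedly.
attn_logits = np.array([-200.0, -5.0, 0.0, 5.0, 200.0])
print(softcap(attn_logits, t=50.0))   # ≈ [-49.97, -4.98, 0.0, 4.98, 49.97]

final_logits = np.array([-100.0, 10.0, 100.0])
print(softcap(final_logits, t=30.0))  # ≈ [-29.91, 9.63, 29.91]
```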