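SwiGLU is an activation function that is a variant of the gated linear unit (GLU). It is defined as

$$\operatorname{SwiGLU}(x, W, V, b, c, \beta) = \operatorname{Swish}_\beta(xW + b) \otimes (xV + c)$$

where $\operatorname{Swish}_\beta(z) = z \, \sigma(\beta z)$, $\sigma$ is the logistic sigmoid, $\otimes$ denotes the element-wise product, $W$ and $V$ are weight matrices, and $b$ and $c$ are bias vectors.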
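The definition can be sketched in a few lines of dependency-free Python. This is a minimal illustration, not a reference implementation: it assumes the usual definition Swish_β(z) = z · sigmoid(βz), treats x as a single row vector, and the helper name `affine` and the toy inputs are hypothetical. The naive sigmoid is not hardened against overflow for very large negative inputs.

```python
import math


def swish(z, beta=1.0):
    """Swish_beta(z) = z * sigmoid(beta * z) for a scalar z (assumed definition)."""
    return z / (1.0 + math.exp(-beta * z))


def affine(x, weights, bias):
    """Compute x @ weights + bias for a row vector x.

    weights is a list of rows with shape (len(x), len(bias)).
    """
    return [
        sum(x_i * row[j] for x_i, row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]


def swiglu(x, W, V, b, c, beta=1.0):
    """SwiGLU(x, W, V, b, c, beta) = Swish_beta(xW + b) * (xV + c), element-wise."""
    gate = [swish(z, beta) for z in affine(x, W, b)]  # gated branch
    value = affine(x, V, c)                           # linear branch
    return [g * v for g, v in zip(gate, value)]


# Hypothetical toy inputs: a 2-dimensional input mapped to 3 outputs.
x = [1.0, -2.0]
W = [[0.5, 0.0, -1.0], [0.25, 1.0, 0.0]]
V = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
b = [0.0, 0.0, 0.0]
c = [0.0, 0.0, 1.0]

result = swiglu(x, W, V, b, c)
print(result)
```

In practice the same computation would normally be written with an array library so that it applies to batches of inputs; the list-based version here only makes the two branches and the element-wise product explicit.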