Hierarchical Softmax is an alternative to softmax that is faster to evaluate: it is time to evaluate compared to for softmax. It utilises a multi-layer binary tree, where the probability of a word is calculated through the product of probabilities on each edge on the path to that node. See the image below for an example of where the product calculation would occur for the word “I’m”:

References