The Mamba model, like a transformer, has a language modeling head on top: a linear layer whose weights are tied to the input embedding matrix.
Each Mamba block starts with a linear projection that expands the input embeddings to a larger internal dimension. Then a short causal convolution and the selective state-space (SSM) layer process the sequence, and an output projection maps it back down to the model dimension.
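The weight-tying mentioned above can be sketched in a few lines: a single matrix serves as both the embedding table (row lookup) and the LM head (its transpose produces logits). This is a minimal NumPy illustration with made-up dimensions (`vocab_size=100`, `d_model=16`), not Mamba's actual configuration:

```python
import numpy as np

vocab_size, d_model = 100, 16
rng = np.random.default_rng(0)

# One shared matrix: rows are token embeddings, and its transpose is the LM head.
W = rng.standard_normal((vocab_size, d_model))

def embed(token_ids):
    """Look up embeddings for a sequence of token ids -> (seq, d_model)."""
    return W[token_ids]

def lm_head(hidden):
    """Project hidden states back to vocabulary logits -> (seq, vocab_size)."""
    return hidden @ W.T

tokens = np.array([3, 7, 42])
hidden = embed(tokens)        # stand-in for the model's final hidden states
logits = lm_head(hidden)
print(logits.shape)           # (3, 100)
```

Tying halves the parameter count of these two layers and keeps the input and output token representations in the same space.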