Two flagship models were introduced as part of the collaboration: Mistral 7B and Mixtral 8x7B. The former is a dense transformer model with seven billion parameters and an 8k-token context length.
Mistral plans to discontinue its Apache-licensed models (Mistral 7B, Mixtral 8x7B and 8x22B, Codestral Mamba, Mathstral) in the future. Microsoft and Mistral already had a partnership to make Mistral models ...
Mistral’s new models both outperform Mixtral 8x7B, an older open-source model that was seen as a promising development for the open-source community. Until Mixtral 8x7B, open-source models ...