Text Generation
Safetensors
English
Chinese
mimo_v2
agent
long-context
code
conversational
custom_code
Eval Results
fp8
Instructions to use XiaomiMiMo/MiMo-V2.5-Pro with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Inference
- HuggingChat
Vision?
#16
by erichartford - opened
With tough competition like Kimi-K2.7-Code, MiniMax M3, and Nex N2 Pro all offering Vision, it's hard to pick a non-vision model.
MiMo-V2.5 is a multimodal LLM which supports picture, video and audio input while Mimo-V2.5-pro is purely text based...
One solution is OCR the picture with mimo-v2.5 and process with mimo-v2.5-pro
That's precisely what I'm suggesting - that mimo v2.5 pro should have had a ViT.
It's a relatively small feature addition that has a huge impact in usefulness for many use cases.