PaddleOCR-VL-0.9B — CrispEmbed GGUF

CrispEmbed-native GGUF quantizations of PaddlePaddle/PaddleOCR-VL.

End-to-end VLM-based OCR: text recognition, table extraction, formula recognition, chart understanding. 109 languages.

Files

File	Size	Description
`paddleocr-vl-0.9b-q4_k.gguf`	1.3 GB	4-bit K-quant — smallest
`paddleocr-vl-0.9b-q8_0.gguf`	1.4 GB	8-bit quantization — recommended
`paddleocr-vl-0.9b-f16.gguf`	2.3 GB	fp16 reference
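
To fetch a single file rather than the whole repository, a small wrapper around the Hugging Face CLI can help. This is a minimal sketch, assuming the repository id is `cstr/paddleocr-vl-0.9b-GGUF` and that `huggingface-cli` (from the `huggingface_hub` package) is installed; the `fetch_gguf` helper and the `HF_CLI` and `REPO` variables are hypothetical names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Assumption: huggingface-cli is on PATH (pip install huggingface_hub).
HF_CLI="${HF_CLI:-huggingface-cli}"
# Assumption: repository id of this model page.
REPO="${REPO:-cstr/paddleocr-vl-0.9b-GGUF}"

# fetch_gguf FILE [DEST_DIR]
# Downloads one file from the repository into DEST_DIR (default: current dir).
fetch_gguf() {
    local file="$1" dest="${2:-.}"
    "$HF_CLI" download "$REPO" "$file" --local-dir "$dest"
}
```

For example, `fetch_gguf paddleocr-vl-0.9b-q8_0.gguf models/` would place the recommended 8-bit file under `models/`.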

Model

Architecture:
- NaViT-style ViT vision encoder (27 layers, 1152-dim, SigLIP 2D RoPE + learned position embeddings)
- Projector (pre-norm → 2×2 spatial merge → MLP)
- ERNIE-4.5-0.3B LLM decoder (18 layers, 1024-dim, GQA with 16 query heads / 2 KV heads, MRoPE, SwiGLU)
Parameters: ~0.9B total
Languages: 109 (multilingual)
Tasks: OCR, Table Recognition, Formula Recognition, Chart Recognition
License: Apache 2.0

Usage with CrispEmbed

# OCR
./crispembed -m paddleocr-vl-0.9b-q8_0.gguf --ocr document.png

# With specific prompt
./crispembed -m paddleocr-vl-0.9b-q8_0.gguf --ocr-prompt "Table Recognition:" table.png
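
The two invocations above can be wrapped in a loop to process a whole directory. The following is a rough sketch, not part of CrispEmbed: the `ocr_dir` helper and the `CRISPEMBED` and `MODEL` variables are hypothetical, and it assumes that `crispembed` prints the recognized text to stdout.

```shell
#!/usr/bin/env bash
# Assumption: crispembed writes the recognized text to stdout.
CRISPEMBED="${CRISPEMBED:-./crispembed}"
MODEL="${MODEL:-paddleocr-vl-0.9b-q8_0.gguf}"

# ocr_dir IN_DIR OUT_DIR [PROMPT]
# Runs OCR on every *.png in IN_DIR and writes one .txt per image to OUT_DIR.
# With PROMPT set, the documented --ocr-prompt form is used instead of --ocr.
ocr_dir() {
    local in_dir="$1" out_dir="$2" prompt="${3:-}"
    local img base
    mkdir -p "$out_dir"
    for img in "$in_dir"/*.png; do
        [ -e "$img" ] || continue   # glob matched nothing
        base="$(basename "$img" .png)"
        if [ -n "$prompt" ]; then
            "$CRISPEMBED" -m "$MODEL" --ocr-prompt "$prompt" "$img" > "$out_dir/$base.txt"
        else
            "$CRISPEMBED" -m "$MODEL" --ocr "$img" > "$out_dir/$base.txt"
        fi
    done
}
```

For example, `ocr_dir scans/ text/` would run plain OCR, while `ocr_dir tables/ text/ "Table Recognition:"` would pass the table prompt for each image.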

Conversion

git clone https://github.com/CrispStrobe/CrispEmbed
cd CrispEmbed

python models/convert-paddleocr-vl-to-gguf.py \
    --model PaddlePaddle/PaddleOCR-VL \
    --output paddleocr-vl-0.9b-f16.gguf --dtype f16

./build/crispembed-quantize paddleocr-vl-0.9b-f16.gguf paddleocr-vl-0.9b-q8_0.gguf q8_0
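
To produce several quantizations from the same f16 file, the quantize step can be looped. This is only a sketch under stated assumptions: the `quantize_all` helper and `QUANTIZE` variable are hypothetical, and `q4_k` as a type name is inferred from the published file names (only `q8_0` appears in the command above).

```shell
#!/usr/bin/env bash
QUANTIZE="${QUANTIZE:-./build/crispembed-quantize}"

# quantize_all SRC_F16 TYPE...
# Assumption: SRC_F16 ends in "-f16.gguf"; outputs are named <stem>-<type>.gguf.
# Assumption: every TYPE is a name the quantizer accepts (q4_k is a guess).
quantize_all() {
    local src="$1"; shift
    local stem="${src%-f16.gguf}"   # e.g. paddleocr-vl-0.9b
    local qtype
    for qtype in "$@"; do
        # Same argument order as the documented command: input, output, type.
        "$QUANTIZE" "$src" "$stem-$qtype.gguf" "$qtype" || return 1
    done
}
```

For example, `quantize_all paddleocr-vl-0.9b-f16.gguf q8_0 q4_k` would call the quantizer once per type and stop at the first failure.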

License

Apache 2.0 — same as the base model.
