PaddleOCR-VL-0.9B: CrispEmbed GGUF

CrispEmbed-native GGUF quantizations of PaddlePaddle/PaddleOCR-VL.

An end-to-end VLM-based OCR model covering text recognition, table extraction, formula recognition, and chart understanding across 109 languages.

Files

File                           Size     Description
paddleocr-vl-0.9b-q4_k.gguf    1.3 GB   4-bit K-quant (smallest)
paddleocr-vl-0.9b-q8_0.gguf    1.4 GB   8-bit quantization (recommended)
paddleocr-vl-0.9b-f16.gguf     2.3 GB   fp16 reference

Model

  • Architecture: NaViT-style ViT (27L, 1152d, SigLIP 2D RoPE + learned position embeddings)
    • Projector (pre-norm → 2×2 spatial merge → MLP)
    • ERNIE-4.5-0.3B LLM decoder (18L, 1024d, 16/2 GQA, MRoPE, SwiGLU)
  • Parameters: ~0.9B total
  • Languages: 109 (multilingual)
  • Tasks: OCR, Table Recognition, Formula Recognition, Chart Recognition
  • License: Apache 2.0
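
Two entries in the architecture list above can be made concrete with a small sketch. It is illustrative only and is not taken from the CrispEmbed source. It assumes that the 2×2 spatial merge groups each block of four neighbouring patches into one token, and that "16/2 GQA" means 16 query heads sharing 2 key/value heads in contiguous groups. The function names are hypothetical.

```python
def spatial_merge_2x2(grid):
    """Group a patch grid into 2x2 blocks, yielding one merged token per block.

    `grid` is a list of rows; each entry stands in for one patch feature.
    Assumption: both grid dimensions are even.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % 2 or cols % 2:
        raise ValueError("grid dimensions must be even for a 2x2 merge")
    merged = []
    for r in range(0, rows, 2):
        for c in range(0, cols, 2):
            # Collect the four neighbouring patches that form one token.
            merged.append((grid[r][c], grid[r][c + 1],
                           grid[r + 1][c], grid[r + 1][c + 1]))
    return merged


N_Q_HEADS, N_KV_HEADS = 16, 2  # "16/2 GQA" from the list above


def kv_head_for(q_head):
    """Map a query head index to the key/value head it shares (assumed contiguous groups)."""
    return q_head // (N_Q_HEADS // N_KV_HEADS)


# A 4x6 grid of patch ids (24 patches) becomes 6 merged tokens.
grid = [[r * 6 + c for c in range(6)] for r in range(4)]
tokens = spatial_merge_2x2(grid)
print(len(tokens))   # 6
print(tokens[0])     # (0, 1, 6, 7)
print(kv_head_for(7), kv_head_for(8))  # 0 1
```

Under these assumptions the merge cuts the number of visual tokens passed to the decoder by a factor of four, and each key/value head serves eight query heads.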

Usage with CrispEmbed

# OCR
./crispembed -m paddleocr-vl-0.9b-q8_0.gguf --ocr document.png

# With specific prompt
./crispembed -m paddleocr-vl-0.9b-q8_0.gguf --ocr-prompt "Table Recognition:" table.png
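
To process a whole folder, the two commands above can be wrapped in a short Python helper. This is a minimal sketch: it reuses only the flags shown above (`-m`, `--ocr`, `--ocr-prompt`) and assumes that `crispembed` writes the recognized text to stdout, which this README does not confirm. The helper names and the `scans` folder are hypothetical.

```python
import subprocess
from pathlib import Path

CRISPEMBED = "./crispembed"             # binary path as used above
MODEL = "paddleocr-vl-0.9b-q8_0.gguf"   # recommended quantization


def build_ocr_command(image, prompt=None, binary=CRISPEMBED, model=MODEL):
    """Return the argv list for one OCR call, mirroring the two commands above."""
    if prompt is None:
        return [binary, "-m", model, "--ocr", str(image)]
    return [binary, "-m", model, "--ocr-prompt", prompt, str(image)]


def ocr_directory(folder, prompt=None):
    """Run OCR on every PNG in `folder` and return {file name: captured stdout}.

    Assumption: crispembed prints the recognized text to stdout.
    """
    results = {}
    for image in sorted(Path(folder).glob("*.png")):
        proc = subprocess.run(build_ocr_command(image, prompt),
                              capture_output=True, text=True, check=True)
        results[image.name] = proc.stdout
    return results


# Show the command that would run for the table example above.
print(build_ocr_command("table.png", "Table Recognition:"))

# Example batch run (hypothetical folder name):
# for name, text in ocr_directory("scans").items():
#     print(f"== {name} ==\n{text}")
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces or colons, and `check=True` stops the batch on the first failed call.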

Conversion

git clone https://github.com/CrispStrobe/CrispEmbed
cd CrispEmbed

python models/convert-paddleocr-vl-to-gguf.py \
    --model PaddlePaddle/PaddleOCR-VL \
    --output paddleocr-vl-0.9b-f16.gguf --dtype f16

./build/crispembed-quantize paddleocr-vl-0.9b-f16.gguf paddleocr-vl-0.9b-q8_0.gguf q8_0

License

Apache 2.0, the same as the base model.
