SmolDocling-256M GGUF
GGUF conversions of ds4sd/SmolDocling-256M-preview for CrispEmbed inference.
Ultra-compact document conversion model (256M params). Generates DocTags structured markup from page images: OCR, layout, tables, formulas, code, charts.
Model variants
| File | Quant | Size | Notes |
|---|---|---|---|
| smoldocling-f16.gguf | F16 | 491 MB | Full precision |
| smoldocling-q8_0.gguf | Q8_0 | 261 MB | Recommended |
| smoldocling-q4_k.gguf | Q4_K | 153 MB | Max compression |
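To compare the variants on a common scale, the file sizes above can be turned into a rough bits-per-weight figure. This is only a back-of-the-envelope sketch: it assumes the nominal 256M parameter count and decimal megabytes (1 MB = 10^6 bytes), and it ignores GGUF metadata and any tensors kept at higher precision. The helper name `bits_per_weight` is made up for illustration.

```python
# Rough effective bits per weight for each GGUF file.
# Assumptions (not from the model card): 256e6 parameters exactly,
# and 1 MB = 1e6 bytes. File metadata overhead is ignored.
NOMINAL_PARAMS = 256e6

FILE_SIZES_MB = {
    "smoldocling-f16.gguf": 491,
    "smoldocling-q8_0.gguf": 261,
    "smoldocling-q4_k.gguf": 153,
}


def bits_per_weight(size_mb: float, params: float = NOMINAL_PARAMS) -> float:
    """Convert a file size in MB to average bits stored per parameter."""
    return size_mb * 1e6 * 8 / params


if __name__ == "__main__":
    for name, size_mb in FILE_SIZES_MB.items():
        print(f"{name}: ~{bits_per_weight(size_mb):.2f} bits/weight")
```

The numbers are approximate by construction; they are mainly useful for seeing how much each quantization step saves relative to F16.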
Architecture
- Vision: SigLIP ViT (12L, 768d, 12 heads, patch=16, 512px)
- Connector: Pixel shuffle (scale=4, 1024→64 tokens) + Linear(12288→576)
- LLM: SmolLM2-135M (30L, 576d, GQA 9/3, SwiGLU, RoPE)
- Parameters: 256M total (93M vision + 135M LLM + connector)
- Output: DocTags (structured XML-like document markup)
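The shape figures in the list above follow from simple arithmetic, which the sketch below reproduces. It does not load the model; it only recomputes the token counts and widths from the listed hyperparameters. Reading "GQA 9/3" as 9 query heads sharing 3 key/value heads is an interpretation, and the function names are hypothetical.

```python
# Shape arithmetic for the vision-to-LLM connector and the LLM attention heads.
# All inputs come from the architecture list; the function names are illustrative.

def connector_shapes(image_px=512, patch=16, vision_dim=768, scale=4, llm_dim=576):
    """Return patch/token counts and widths before and after pixel shuffle."""
    patches_per_side = image_px // patch          # 512 / 16 = 32
    num_patches = patches_per_side ** 2           # 32 * 32 = 1024
    # Pixel shuffle folds each scale x scale block of patches into one token,
    # moving the spatial positions into the channel dimension.
    num_tokens = num_patches // (scale * scale)   # 1024 / 16 = 64
    merged_dim = vision_dim * scale * scale       # 768 * 16 = 12288
    return {
        "num_patches": num_patches,
        "num_tokens": num_tokens,
        "linear_in": merged_dim,
        "linear_out": llm_dim,
    }


def gqa_shapes(hidden=576, q_heads=9, kv_heads=3):
    """Head width and sharing ratio, assuming 9 query heads over 3 KV heads."""
    return {
        "head_dim": hidden // q_heads,               # 576 / 9 = 64
        "queries_per_kv": q_heads // kv_heads,       # 9 / 3 = 3
    }


if __name__ == "__main__":
    print(connector_shapes())
    print(gqa_shapes())
```

Each 512px page tile therefore costs the LLM 64 image tokens rather than 1024, which is a large part of why such a small decoder can handle full pages.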
Parity vs HF reference: vision cos=0.9998, connector cos=0.9999.
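The parity figures are cosine similarities between this engine's activations and the Hugging Face reference. How those activations were captured is not described here, so the sketch below only shows the metric itself on two flat lists of floats; `cosine_similarity` is an illustrative helper, not part of CrispEmbed.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Made-up values standing in for a reference and a quantized activation.
    reference = [0.12, -0.48, 0.91, 0.05]
    candidate = [0.11, -0.47, 0.92, 0.06]
    print(f"cos = {cosine_similarity(reference, candidate):.4f}")
```

A value close to 1.0 means the two activation vectors point in nearly the same direction; it says nothing about differences in magnitude.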
Usage
# CLI
./crispembed -m smoldocling-q8_0.gguf --ocr document.png
# Server
./crispembed-server --ocr smoldocling-q8_0.gguf --port 8080
curl -X POST http://localhost:8080/math/ocr -F "image=@document.png"
# Python
from crispembed import CrispMathOcr
ocr = CrispMathOcr("smoldocling-q8_0.gguf")
doctags = ocr.recognize("document.png")
print(doctags) # <doctag><text>...</text>...</doctag>
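Once a DocTags string has been returned, a caller will usually want to split it into elements. The following is a minimal sketch using only the standard library. It assumes a flat sequence of `<tag>...</tag>` elements inside `<doctag>` and optional `<loc_N>` position tokens inside each element; the sample string and the `extract_elements` helper are invented for illustration, and nested structures such as tables would need a real parser.

```python
import re

# Position tokens such as <loc_12>; their presence is an assumption here.
LOC_TOKEN = re.compile(r"<loc_\d+>")
# A flat element: <tag>body</tag>, with the same tag name on both sides.
ELEMENT = re.compile(r"<(?P<tag>[a-z0-9_]+)>(?P<body>.*?)</(?P=tag)>", re.DOTALL)


def extract_elements(doctags: str):
    """Return a list of (tag, text) pairs from a flat DocTags string."""
    body = doctags.strip()
    wrapper = re.fullmatch(r"<doctag>(.*)</doctag>", body, re.DOTALL)
    if wrapper:
        body = wrapper.group(1)
    elements = []
    for match in ELEMENT.finditer(body):
        # Drop position tokens and surrounding whitespace from the element body.
        text = LOC_TOKEN.sub("", match.group("body")).strip()
        elements.append((match.group("tag"), text))
    return elements


if __name__ == "__main__":
    # Hypothetical output, not taken from a real model run.
    sample = (
        "<doctag>"
        "<text><loc_1><loc_2><loc_3><loc_4>Hello world</text>"
        "<formula>E = mc^2</formula>"
        "</doctag>"
    )
    for tag, text in extract_elements(sample):
        print(f"{tag}: {text}")
```

For anything beyond quick inspection, a dedicated DocTags parser is the safer choice, since regular expressions will not handle nesting correctly.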
License
Apache-2.0, same as the base model.
Credits
Original model by Docling Team, IBM Research. GGUF conversion and inference engine by CrispEmbed.