Abstract
Multi-Block Diffusion Language Models extend single-block diffusion to concurrent block decoding with improved training strategies and optimized decoding algorithms.
Block Diffusion Language Models (BD-LMs) improve diffusion-based text generation with KV caching and flexible-length generation. A natural next step is to extend them from Single-Block Diffusion (SingleBD) to Multi-Block Diffusion (MultiBD), where a running-set of consecutive blocks is decoded concurrently for inter-block parallelism. However, existing BD-LMs are mostly trained under teacher forcing, where the model observes only one noisy block conditioned on a clean prefix. While the recent diffusion forcing strategy introduces visibility among multiple noisy blocks, its training states still differ from MultiBD inference, where decoding operates on a bounded running-set with heterogeneous slot-wise noise patterns. To bridge this gap, we propose Multi-Block Diffusion Language Models (MBD-LMs), obtained by post-training BD-LMs with Multi-block Teacher Forcing (MultiTF). MultiTF integrates teacher forcing and diffusion forcing by training on bounded noise-groups conditioned on clean prefixes, with randomized noise-schedulers that better match MultiBD inference states. To make MultiBD practically executable, we further introduce an optimized decoding algorithm based on the Block Buffer mechanism that preserves prefix-cache reuse, keeps input shapes static, and translates increased decoding parallelism into wall-clock acceleration. Empirically, MBD-LLaDA2-Mini increases average Tokens Per Forward pass (TPF) from 3.47 to 6.19 and improves average accuracy from 79.95% to 81.03%; when combined with DMax, MBD-LLaDA2-Mini-DMax reaches an average TPF of 9.34 with only a 1.02% accuracy drop on math and code benchmarks.
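To make the training setup concrete, here is a minimal sketch of how a MultiTF-style training example might be built: a clean prefix followed by a bounded group of noisy blocks, each with its own noise level. The abstract does not specify the noise-scheduler, so everything below is an illustrative assumption rather than the authors' implementation: the function name `build_multitf_example`, the `MASK` id, sampling levels uniformly and sorting them so earlier blocks are cleaner, and truncating the sequence at the end of the noise group.

```python
import random

MASK = -1  # hypothetical mask-token id, chosen only for this sketch


def build_multitf_example(tokens, block_size, max_group, rng):
    """Build one training input: clean prefix + bounded group of noisy blocks.

    Returns (noisy_tokens, noise_levels, start_block, group_len).
    """
    n_blocks = len(tokens) // block_size
    # Bounded noise group: at most `max_group` consecutive blocks are noisy.
    group_len = rng.randint(1, min(max_group, n_blocks))
    start = rng.randint(0, n_blocks - group_len)

    # Assumed scheduler: one random mask ratio per slot, sorted so that
    # earlier blocks tend to be less noisy (they would finish first at
    # inference time). The paper's actual scheduler may differ.
    levels = sorted(rng.uniform(0.0, 1.0) for _ in range(group_len))

    # Keep the clean prefix and the noise group; drop later blocks.
    noisy = list(tokens[: (start + group_len) * block_size])
    for slot, level in enumerate(levels):
        lo = (start + slot) * block_size
        for i in range(lo, lo + block_size):
            if rng.random() < level:
                noisy[i] = MASK
    return noisy, levels, start, group_len


rng = random.Random(0)
tokens = list(range(100, 132))  # 32 toy token ids, i.e. 8 blocks of 4
noisy, levels, start, group_len = build_multitf_example(tokens, 4, 3, rng)
```

With `max_group=1` this reduces to the teacher-forcing case described above (one noisy block after a clean prefix); larger values expose the model to several noisy blocks at different noise levels, which is the state MultiBD decoding operates on.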
Community
We introduce Multi-Block Diffusion Language Models (MBD-LMs), a unified framework that bridges the training-inference gap for practical multi-block diffusion in block diffusion language models (BD-LMs). We find that the existing teacher forcing and D2F training paradigms do not match the bounded running-set and heterogeneous slot-wise noise patterns that Multi-Block Diffusion (MultiBD) inference requires. To address this, we propose Multi-block Teacher Forcing (MultiTF), a lightweight post-training method that constructs bounded noise groups with randomized chain-uniform scheduling, enabling any BD-LM to be upgraded into an MBD-LM. On the inference side, we design the Block Buffer mechanism to decouple dynamic running-sets from static physical shapes, enabling CUDA Graph capture and prefix KV cache reuse. Empirically, MBD-LLaDA2-Mini achieves a 78.4% relative improvement in average Tokens Per Forward pass (TPF), from 3.47 to 6.19, while improving average accuracy from 79.95% to 81.03%. Combined with DMax, average TPF reaches 9.34 with only a 1.02% accuracy drop on math and code benchmarks, along with throughput gains. We also release Diffulex, a unified serving engine that supports MBD-LMs and various BD-LM strategies (SingleBD, MultiBD, Dual Cache, DMax, etc.) under a single backend.
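The idea of decoupling a dynamic running-set from a static physical shape can be sketched in a few lines. This is a toy illustration under stated assumptions, not the Block Buffer as implemented in Diffulex: the class name `BlockBuffer`, its methods, and the `MASK`/`PAD` ids are hypothetical, and the real mechanism operates on GPU tensors and KV caches rather than Python lists.

```python
from dataclasses import dataclass, field

MASK, PAD = -1, -2  # hypothetical token ids used only in this sketch


@dataclass
class BlockBuffer:
    num_slots: int   # fixed physical capacity (static shape)
    block_size: int
    slots: list = field(default_factory=list)      # active blocks, oldest first
    committed: list = field(default_factory=list)  # finished tokens (cacheable prefix)

    def admit(self):
        """Add a fully masked block to the running-set if a slot is free."""
        if len(self.slots) < self.num_slots:
            self.slots.append([MASK] * self.block_size)
            return True
        return False

    def model_input(self):
        """Return a fixed-length token list plus a per-slot active flag.

        The length is always num_slots * block_size, however many blocks
        are currently active; unused slots are padded and flagged inactive.
        """
        pad_blocks = self.num_slots - len(self.slots)
        flat = [t for block in self.slots for t in block]
        flat += [PAD] * (pad_blocks * self.block_size)
        active = [True] * len(self.slots) + [False] * pad_blocks
        return flat, active

    def write(self, slot, pos, token):
        """Record a decoded token in an active slot."""
        self.slots[slot][pos] = token

    def commit_finished(self):
        """Move fully decoded leading blocks into the prefix, in order."""
        n = 0
        while self.slots and MASK not in self.slots[0]:
            self.committed.extend(self.slots.pop(0))
            n += 1
        return n


buf = BlockBuffer(num_slots=3, block_size=2)
buf.admit()
buf.admit()
flat, active = buf.model_input()  # 6 tokens, two active slots, one padded
```

Because `model_input` always returns the same length, a captured graph with fixed input shapes could in principle be replayed at every step, and because blocks are committed strictly from the head of the buffer, the committed prefix only ever grows, which is what makes prefix-cache reuse possible in this sketch.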
Project page: https://sjtu-deng-lab.github.io/mbd-lms/
Training code: https://github.com/SJTU-DENG-Lab/mbd-lms
Inference engine (Diffulex): https://github.com/SJTU-DENG-Lab/Diffulex
Models citing this paper (8)
SJTU-DENG-Lab/MBD-Math-LLaDA2-mini-DMax-16B
Datasets citing this paper 1
SJTU-DENG-Lab/MBD-LMs-MultiTF-Datasets