arxiv:2606.29215

Multi-Block Diffusion Language Models

Published on Jun 30

Submitted by Yijie Jin on Jul 1

DENG Lab @ SJTU

Abstract

Multi-Block Diffusion Language Models extend single-block diffusion to concurrent block decoding with improved training strategies and optimized decoding algorithms.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Block Diffusion Language Models (BD-LMs) improve diffusion-based text generation with KV caching and flexible-length generation. A natural next step is to extend them from Single-Block Diffusion (SingleBD) to Multi-Block Diffusion (MultiBD), where a running-set of consecutive blocks is decoded concurrently for inter-block parallelism. However, existing BD-LMs are mostly trained under teacher forcing, where the model observes only one noisy block conditioned on a clean prefix. While the recent diffusion forcing strategy introduces visibility among multiple noisy blocks, its training states still differ from MultiBD inference, where decoding operates on a bounded running-set with heterogeneous slot-wise noise patterns. To bridge this gap, we propose Multi-Block Diffusion Language Models (MBD-LMs), obtained by post-training BD-LMs with Multi-block Teacher Forcing (MultiTF). MultiTF integrates teacher forcing and diffusion forcing by training on bounded noise-groups conditioned on clean prefixes, with randomized noise-schedulers that better match MultiBD inference states. To make MultiBD practically executable, we further introduce an optimized decoding algorithm based on the Block Buffer mechanism that preserves prefix-cache reuse, keeps input shapes static, and translates increased decoding parallelism into wall-clock acceleration. Empirically, MBD-LLaDA2-Mini increases average Tokens Per Forward pass (TPF) from 3.47 to 6.19 and improves average accuracy from 79.95% to 81.03%; when combined with DMax, MBD-LLaDA2-Mini-DMax reaches an average TPF of 9.34 with only a 1.02% accuracy drop on math and code benchmarks.
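To make the training setup described above more concrete, here is a minimal sketch of how one training example with a clean prefix and a bounded group of noisy blocks might be built. It is an illustration under stated assumptions, not the paper's MultiTF implementation: the function name `make_noise_group`, the `MASK` token id, and the choice of sorted uniform mask ratios (earlier blocks less noisy) are all hypothetical stand-ins for the randomized noise-schedulers the abstract mentions.

```python
import random

MASK = -1  # hypothetical mask-token id, chosen only for this sketch


def make_noise_group(tokens, block_size, prefix_blocks, group_blocks, rng):
    """Build one training example: a clean prefix followed by a bounded
    group of noisy blocks. Tokens after the group are dropped.

    Assumption: per-block mask ratios are sorted uniform samples, so blocks
    earlier in the group tend to be less masked than later ones.
    """
    start = prefix_blocks * block_size
    end = min(start + group_blocks * block_size, len(tokens))
    noisy = list(tokens[:end])

    # Number of (possibly partial) blocks that fall inside the group.
    n_blocks = (end - start + block_size - 1) // block_size
    ratios = sorted(rng.random() for _ in range(n_blocks))

    for b, ratio in enumerate(ratios):
        lo = start + b * block_size
        hi = min(lo + block_size, end)
        for i in range(lo, hi):
            # Mask each token in the block independently with probability `ratio`.
            if rng.random() < ratio:
                noisy[i] = MASK
    return noisy, ratios


tokens = list(range(100, 132))          # 32 toy token ids
rng = random.Random(0)
noisy, ratios = make_noise_group(tokens, block_size=4, prefix_blocks=2,
                                 group_blocks=3, rng=rng)
```

In this toy call, the first 8 tokens stay clean, the next 12 tokens form three noisy blocks with their own mask ratios, and the remaining tokens are not part of the example.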

Community

DrewJin0827

Paper submitter 1 day ago

We introduce Multi-Block Diffusion Language Models (MBD-LMs), a unified framework that bridges the training-inference gap for practical multi-block diffusion in block diffusion language models (BD-LMs). We identify that the existing teacher forcing and diffusion forcing (D2F) paradigms do not match the bounded running-set and heterogeneous slot-wise noise patterns that Multi-Block Diffusion (MultiBD) inference produces. To address this, we propose Multi-block Teacher Forcing (MultiTF), a lightweight post-training method that constructs bounded noise groups with randomized chain-uniform scheduling, enabling any BD-LM to be upgraded into an MBD-LM. On the inference side, we design the Block Buffer mechanism to decouple dynamic running-sets from static physical shapes, enabling CUDA Graph capture and prefix KV cache reuse. Empirically, MBD-LLaDA2-Mini achieves a 78.4% improvement in average Tokens Per Forward pass (TPF), from 3.47 to 6.19, while improving average accuracy from 79.95% to 81.03%. Combined with DMax, average TPF reaches 9.34 with only a 1.02% accuracy drop on math and code benchmarks, along with throughput gains. We also release Diffulex, a unified serving engine that supports MBD-LMs and various BD-LM strategies (SingleBD, MultiBD, Dual Cache, DMax, etc.) under a single backend.
Project page: https://sjtu-deng-lab.github.io/mbd-lms/
Training code: https://github.com/SJTU-DENG-Lab/mbd-lms
Inference engine (Diffulex): https://github.com/SJTU-DENG-Lab/Diffulex
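
The idea of decoupling a dynamic running-set from a static physical shape can be illustrated with a small sketch. This is not Diffulex's API or the paper's Block Buffer implementation: the class `BlockBuffer`, its methods, and the `MASK` id are hypothetical, and a plain Python list stands in for the prefix KV cache. The sketch only shows one way a buffer could keep a fixed `(num_slots, block_size)` shape while finished blocks leave and fresh masked blocks enter.

```python
MASK = -1  # hypothetical mask-token id, chosen only for this sketch


class BlockBuffer:
    """Fixed-shape buffer of consecutive blocks (illustrative only).

    Slot 0 holds the oldest block of the running-set. When it is fully
    decoded, it is moved to `committed` (a stand-in for the prefix cache)
    and a fresh all-masked block is appended, so the shape never changes.
    """

    def __init__(self, num_slots, block_size):
        assert num_slots >= 1 and block_size >= 1
        self.block_size = block_size
        self.slots = [[MASK] * block_size for _ in range(num_slots)]
        self.committed = []

    def shape(self):
        return (len(self.slots), self.block_size)

    def write(self, slot, pos, token):
        # Record a decoded token at a given slot and position.
        self.slots[slot][pos] = token

    def retire_finished(self):
        """Commit fully decoded head blocks and refill the tail.

        Returns the number of blocks retired in this call.
        """
        retired = 0
        while MASK not in self.slots[0]:
            self.committed.extend(self.slots.pop(0))
            self.slots.append([MASK] * self.block_size)
            retired += 1
        return retired


buf = BlockBuffer(num_slots=3, block_size=2)
buf.write(0, 0, 10)
buf.write(0, 1, 11)   # head block is now complete
buf.write(1, 0, 12)   # second block is partially decoded
n_retired = buf.retire_finished()
```

After this call, the head block has been committed, the partially decoded block has shifted to slot 0, and the buffer still reports the same shape, which is the property that makes static-shape execution such as CUDA Graph capture possible in principle.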