You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Log in or Sign Up to review the conditions and access this model content.

๐Ÿ”’ This is a premium gated paid-access model

Access is granted manually after purchase through Ko-fi.

โžก๏ธ Purchase access on Ko-fi

After purchasing, include your Hugging Face username in the Ko-fi purchase message, then click โ€œAgree and send request to access repoโ€ on this Hugging Face page. I will verify the username and manually approve access.

Please allow up to 24 hours for manual approval.


92% fewer refusals (8/100 Uncensored vs 98/100 Original) while preserving model quality (0.0258 KL divergence).

โค๏ธ Support My Work

Creating these models takes significant time, work and compute. If you find them useful consider supporting me:

image/png

Platform Link What you get
๐ŸŽ‰ Patreon Monthly support Priority model requests
โ˜• Ko-fi One-time tip My eternal gratitude

Your help will motivate me and would go into further improving my workflow and coverings fees for storage, compute and may even help uncensoring bigger model with rental Cloud GPUs.


Read before purchase/download:

These GGUF files require a runtime/backend with MiniMax-M3 GGUF architecture support. They are not guaranteed to work in LM Studio, Ollama, KoboldCpp, Jan, or standard/mainline llama.cpp builds unless those runtimes support the minimax-m3 GGUF architecture.

These GGUFs are text/chat-focused.

Vision/multimodal support is not currently available in these GGUFs.

Sparse attention is not currently supported and may fall back to dense attention.

LM Studio / mainline llama.cpp may fail with unknown model architecture: minimax-m3.

Important GGUF compatibility notice

MiniMax-M3 GGUF support is currently a work in progress in llama.cpp.

These GGUF files are provided as the best currently available conversion based on the present upstream MiniMax-M3 GGUF work. However, the current GGUF implementation has known limitations:

  • Text generation / plain chat is the main supported use case.
  • Vision / multimodal support is not currently available in these GGUFs.
  • Sparse attention is not currently supported and may fall back to dense attention, which can affect speed and memory use.
  • Tool calling may or may not work depending on your runtime, chat template, parser, and exact setup.
  • Compatibility may vary across llama.cpp, Ollama, KoboldCpp, and other GGUF runtimes.

When upstream MiniMax-M3 GGUF support improves, I plan to redo GGUF files then.

Please purchase only if you understand these current limitations. This product is for access to the available GGUF files as-is, not a guarantee that every MiniMax-M3 feature is supported in GGUF today.


GGUF quantization of llmfan46/MiniMax-M3-uncensored-heretic-aggressive.

This is a decensored version of MiniMaxAI/MiniMax-M3, made using Heretic v1.2.0 with the Arbitrary-Rank Ablation (ARA) method

Abliteration parameters

Parameter Value
start_layer_index 14
end_layer_index 51
preserve_good_behavior_weight 0.0847
steer_bad_behavior_weight 0.0002
overcorrect_relative_weight 1.1741
neighbor_count 15

Targeted components

  • attn.o_proj

Performance

Metric This model Original model (MiniMaxAI/MiniMax-M3)
KL divergence 0.0258 0 (by definition)
Refusals โœ… 8/100 โŒ 98/100

Lower refusals indicate fewer content restrictions, while lower KL divergence indicates more closeness to the original model's baseline. Higher refusals cause more rejections, objections, pushbacks, lecturing, censorship, softening and deflections.


Quantizations

Filename Quant Description
MiniMax-M3-uncensored-heretic-aggressive-Q5_K_M.gguf Q5_K_M Good balance
MiniMax-M3-uncensored-heretic-aggressive-Q5_K_S.gguf Q5_K_S Smaller Q5
MiniMax-M3-uncensored-heretic-aggressive-Q4_K_M.gguf Q4_K_M Good for limited VRAM
MiniMax-M3-uncensored-heretic-aggressive-Q4_K_S.gguf Q4_K_S Smaller Q4
MiniMax-M3-uncensored-heretic-aggressive-Q3_K_L.gguf Q3_K_L Low VRAM, decent quality
MiniMax-M3-uncensored-heretic-aggressive-Q3_K_M.gguf Q3_K_M Low VRAM, smaller
MiniMax-M3-uncensored-heretic-aggressive-Q3_K_S.gguf Q3_K_S Very Low VRAM
MiniMax-M3-uncensored-heretic-aggressive-Q2_K.gguf Q2_K Very Very Low VRAM, only use if you have no other options

Usage

Right now works with llama.cpp with PR# 24523, see proof (DOWNLOAD SCREENSHOT PROOF HERE):

image/png

Compatibility notice

These GGUF files require a runtime/backend with MiniMax-M3 GGUF architecture support.

Tested working on my setup:

  • MiniMax-M3-compatible llama.cpp build / build-minimax
  • llama-server
  • llama-ui
  • Tool calling with SearXNG/web search

Expected / likely compatible:

  • Unsloth Studio, if using a current version with MiniMax-M3 support

Not currently supported / not guaranteed:

  • LM Studio
  • Older or mainline llama.cpp builds without MiniMax-M3 support
  • Ollama, KoboldCpp, Jan, or other third-party frontends unless their bundled backend supports the minimax-m3 GGUF architecture

If you see an error such as:

unknown model architecture: 'minimax-m3'

or

failed to load model

This does not mean the GGUF file is broken, it just means that your runtime/backend does not support MiniMax-M3 GGUF yet.

Right now MiniMax-M3 GGUFs should be compatible with: MiniMax-M3 compatible llama.cpp build with PR# 24523.

And with: Unsloth Studio

To decrease the probability of unforseen issues due to outdated versions, be sure to use that latest transformers version (very important, won't work unless you either use 5.12.0 or 5.12.1), the latest CUDA versions (very important, do not use anything lower to avoid unforseen issues: 13.0 or 13.1 or 13.2 or 13.3), the latest PyTorch version (very important, use the latest versions of torch either 2.12.0+cu132 or 2.12.1+cu132 and torchvision either 0.27.0+cu132 or 0.27.1+cu132) and the latest Triton versions (3.6.0 or 3.7.0).

Note:

The sections below describe the original Safetensors format of the MiniMax-M3 model. The current GGUF conversion offered here is text/chat-focused and does not currently provide vision/multimodal support or MiniMax Sparse Attention support.


MiniMax

MiniMax Agent API MiniMax Website
ModelScope MiniMax AI WeChat Discord Hugging Face GitHub arXiv Paper LICENSE

MiniMax-M3 is a native multimodal model with 1M context. It has ~428B parameters and ~23B activated parameters.

Highlights:

  • Native Multimodality: M3 undergoes mixed-modality training from the very first step, enabling deeper semantic fusion across text, image, and video.
  • Context Scaling via Sparse Attention: M3 introduces MiniMax Sparse Attention (MSA) to improve long context efficiency. M3 delivers 9ร— prefill and 15ร— decode speedups compared to M2 at 1M context, reducing per-token compute to 1/20.
  • Coding & Cowork Capability: M3 achieves frontier-level performance across long-horizon agentic benchmarks, excelling in both coding and cowork.

MiniMax Sparse Attention (MSA)

M3 is powered by MiniMax Sparse Attention (MSA), a high-performance sparse attention operator designed for million-token contexts. Compared with GQA, MSA dramatically reduces the attention compute and memory footprint while preserving model quality.

GQA vs MSA Efficiency Comparison

๐Ÿ“„ Read the technical report: arXiv:2606.13392 ยท Hugging Face Papers

How to Use

M3 supports three reasoning modes through the thinking parameter:

  • enabled โ€” Reasoning is always enabled.
  • adaptive โ€” M3 automatically determines when additional reasoning is beneficial.
  • disabled โ€” Reasoning is disabled to minimize latency and maximize throughput.

Local Deployment

Download the model:

hf download MiniMaxAI/MiniMax-M3 --local-dir MiniMax-M3

We recommend the following inference frameworks (listed alphabetically) to serve the model:

Inference Parameters

We recommend the following parameters for best performance: temperature=1.0, top_p=0.95, top_k=40.

Contact Us

Contact us at model@minimax.io.

Downloads last month
29
GGUF
Model size
426B params
Architecture
minimax-m3
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

5-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for llmfan46/MiniMax-M3-uncensored-heretic-aggressive-compressed-quants-pack-GGUF

Quantized
(26)
this model

Collection including llmfan46/MiniMax-M3-uncensored-heretic-aggressive-compressed-quants-pack-GGUF

Paper for llmfan46/MiniMax-M3-uncensored-heretic-aggressive-compressed-quants-pack-GGUF