Bengali Speaker Diarization - Fine-tuned Segmentation Model

A fine-tuned version of pyannote/segmentation-3.0 for Bengali speaker diarization.

Results

  • Best DER (diarization error rate): 0.1312 (13.12%)
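For readers unfamiliar with the metric, DER is conventionally defined as the sum of false alarm, missed detection, and speaker confusion durations divided by the total reference speech duration. The sketch below only illustrates that arithmetic. The function name and the durations are hypothetical and are not the measurements behind the figure above; this card does not state which evaluation settings (for example collar or overlap handling) were used.

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total_speech):
    """Return DER as a fraction, with all arguments given in seconds.

    false_alarm:      non-speech that the system labelled as speech
    missed_detection: reference speech that the system missed
    confusion:        speech attributed to the wrong speaker
    total_speech:     total duration of reference speech
    """
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (false_alarm + missed_detection + confusion) / total_speech


# Hypothetical durations, chosen only to make the arithmetic easy to follow.
der = diarization_error_rate(
    false_alarm=2.0, missed_detection=5.0, confusion=3.0, total_speech=100.0
)
print(f"DER = {der:.4f}")
```

Because the denominator is reference speech only, DER can exceed 1.0 when a system produces a large amount of false alarm.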

Training Data (~370 hours)

  • Synthetic V4: 600 files (~300 hours) from smam/bengali-diarization-synthetic-v4
  • DISPLACE24: 67 files (~35 hours)
  • DISPLACE26: 125 files (~35 hours)

Training

  • Train samples: 600
  • Val samples: 67
  • Max epochs: 30 (with early stopping on DER, patience=5)
  • Optimizer: AdamW (lr=0.0001, weight_decay=0.01)
  • Precision: 16-mixed (mixed precision) on an NVIDIA H100 GPU
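The early stopping rule listed above (monitor validation DER, patience of 5, at most 30 epochs) can be sketched in plain Python. This is an illustration of the rule only, not the training script used for this model: the function name `run_early_stopping` and the DER values are hypothetical, and a framework callback may differ in details such as a minimum improvement threshold.

```python
def run_early_stopping(val_ders, patience=5, max_epochs=30):
    """Simulate early stopping on a sequence of per-epoch validation DERs.

    Returns (best_epoch, best_der, last_epoch), with epochs counted from 0.
    Training stops once `patience` consecutive epochs fail to improve on
    the best DER seen so far, or after `max_epochs` epochs.
    """
    best_der = float("inf")
    best_epoch = -1
    last_epoch = -1
    epochs_without_improvement = 0

    for epoch, der in enumerate(val_ders[:max_epochs]):
        last_epoch = epoch
        if der < best_der:
            # Lower DER is better, so this epoch becomes the new best.
            best_der = der
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break

    return best_epoch, best_der, last_epoch


# Hypothetical validation curve: improves until epoch 2, then stalls.
curve = [0.30, 0.20, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.10]
best_epoch, best_der, last_epoch = run_early_stopping(curve)
print(best_epoch, best_der, last_epoch)
```

In this made-up curve, training halts before the late improvement at the final entry is ever reached, which is the usual trade-off of a small patience value.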

Task Configuration

  • Duration: 10.0s per chunk
  • Max speakers per chunk: 3
  • Max speakers per frame: 2 (powerset encoding with overlap support)
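To make the powerset setting concrete: with at most 3 speakers per chunk and at most 2 active per frame, each frame is classified into one of 7 classes (silence, 3 single speakers, 3 speaker pairs). The sketch below enumerates those classes with the standard library. The helper names are hypothetical, and the class ordering shown here is an assumption that may not match the ordering used internally by pyannote.audio.

```python
from itertools import combinations


def powerset_classes(max_speakers_per_chunk=3, max_speakers_per_frame=2):
    """List every set of simultaneously active speakers allowed in a frame."""
    speakers = range(max_speakers_per_chunk)
    classes = []
    # Size 0 is silence, size 1 is a single speaker, size 2 is overlap.
    for size in range(max_speakers_per_frame + 1):
        classes.extend(combinations(speakers, size))
    return classes


def to_multilabel(active_speakers, num_speakers=3):
    """Convert one powerset class into a 0/1 activity vector per speaker."""
    return [1 if s in active_speakers else 0 for s in range(num_speakers)]


classes = powerset_classes()
for index, active in enumerate(classes):
    print(index, active, to_multilabel(active))
```

Predicting one class per frame, rather than independent per-speaker activations, is what lets the model represent overlapped speech without a separate detection threshold.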

Usage

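import torch
from pyannote.audio import Model

# Start from the base architecture. pyannote/segmentation-3.0 is a gated model,
# so downloading it may require a Hugging Face access token.
model = Model.from_pretrained('pyannote/segmentation-3.0')

# Load the fine-tuned weights from this repository onto the CPU
# and copy them into the base model.
state_dict = torch.load('pytorch_model.bin', map_location='cpu')
model.load_state_dict(state_dict)
model.eval()  # disable dropout and other training-only behaviour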