Bengali Speaker Diarization - Fine-tuned Segmentation Model
Fine-tuned pyannote/segmentation-3.0 for Bengali speaker diarization.
Results
- Best DER: 0.1312 (13.12%)
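Diarization error rate (DER) is the sum of false-alarm speech, missed speech and speaker-confusion time, divided by the total duration of reference speech. The minimal sketch below computes it from those three error durations. The function name `diarization_error_rate` and the durations are hypothetical and for illustration only. The card does not state whether its DER uses a forgiveness collar or scores overlapped speech.

```python
def diarization_error_rate(false_alarm: float, missed: float,
                           confusion: float, total: float) -> float:
    """DER = (false alarm + missed detection + speaker confusion) / total reference speech.

    All arguments are durations in seconds (hypothetical helper, not a library API).
    """
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return (false_alarm + missed + confusion) / total


# Hypothetical durations in seconds, for illustration only
print(diarization_error_rate(false_alarm=30.0, missed=50.0, confusion=20.0, total=1000.0))  # 0.1
```

On this scale, the reported 0.1312 means about 13.1 seconds of error per 100 seconds of reference speech.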
Training Data (~370 hours)
- Synthetic V4: 600 files (~300 hours) from smam/bengali-diarization-synthetic-v4
- DISPLACE24: 67 files (~35 hours)
- DISPLACE26: 125 files (~35 hours)
Training
- Training samples: 600
- Validation samples: 67
- Max epochs: 30 (early stopping on validation DER, patience=5)
- Optimizer: AdamW (lr=0.0001, weight_decay=0.01)
- Precision: 16-bit mixed precision ("16-mixed") on an NVIDIA H100
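The early-stopping rule listed above can be sketched as follows. This is a minimal pure-Python illustration, not the training code used for this model. The function name `early_stopping_epoch` is hypothetical, and the DER sequence in the example is made up. The "16-mixed" precision string suggests a PyTorch Lightning trainer, whose `EarlyStopping` callback (with `mode="min"` and `patience=5`) applies essentially the same rule. The card does not confirm this.

```python
def early_stopping_epoch(val_ders, patience=5, max_epochs=30):
    """Return (stop_epoch, best_epoch, best_der), with 0-based epoch indices.

    Training stops once `patience` consecutive epochs fail to strictly improve
    on the lowest validation DER seen so far, or when max_epochs is reached.
    """
    best_der = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, der in enumerate(val_ders[:max_epochs]):
        if der < best_der:  # lower DER is better
            best_der, best_epoch = der, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch, best_der
    # No early stop: training ran until the data (or max_epochs) ran out
    return min(len(val_ders), max_epochs) - 1, best_epoch, best_der


# Hypothetical per-epoch validation DERs, for illustration only
ders = [0.30, 0.22, 0.18, 0.17, 0.175, 0.18, 0.171, 0.19, 0.172, 0.20]
print(early_stopping_epoch(ders))  # (8, 3, 0.17)
```

With these illustrative values, the best DER is reached at epoch 3. Training halts at epoch 8, after five epochs without improvement, and the checkpoint from epoch 3 would be kept.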
Task Configuration
- Duration: 10.0s per chunk
- Max speakers per chunk: 3
- Max speakers per frame: 2 (powerset encoding with overlap support)
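With at most 3 speakers per chunk and at most 2 active speakers per frame, the powerset encoding turns multi-label speaker activity into single-label classification over every speaker subset of size 0, 1 or 2. This gives 1 + 3 + 3 = 7 classes per frame. The sketch below enumerates those classes in pure Python and maps a class index back to per-speaker activity. The function names are hypothetical, and the class ordering is an assumption that may not match pyannote.audio's internal ordering.

```python
from itertools import combinations


def powerset_classes(max_speakers_per_chunk: int, max_speakers_per_frame: int):
    """List every set of at most max_speakers_per_frame speakers drawn from
    max_speakers_per_chunk local speakers (the empty set is non-speech)."""
    speakers = range(max_speakers_per_chunk)
    classes = []
    for k in range(max_speakers_per_frame + 1):
        classes.extend(combinations(speakers, k))
    return classes


def to_multilabel(class_index: int, classes, num_speakers: int):
    """Convert one powerset class index into a 0/1 activity vector per speaker."""
    active = classes[class_index]
    return [1 if s in active else 0 for s in range(num_speakers)]


classes = powerset_classes(3, 2)
print(len(classes))  # 7
print(classes)       # [(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
print(to_multilabel(4, classes, 3))  # [1, 1, 0]: speakers 0 and 1 overlap
```

Capping speakers per frame at 2 keeps the output layer small (7 classes instead of 2^3 = 8), at the cost of not representing frames where all three speakers talk at once.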
Usage
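```python
import torch
from pyannote.audio import Model

# Instantiate the base architecture. pyannote/segmentation-3.0 is gated on the
# Hugging Face Hub: accept its conditions and provide an access token if required.
model = Model.from_pretrained("pyannote/segmentation-3.0")

# Load the fine-tuned Bengali weights on CPU and copy them into the model
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # switch to inference mode
```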