---
title: SurgiSight
emoji: 🔬
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "5.29.0"
python_version: "3.10"
app_file: app.py
pinned: false
tags:
  - track:backyard
  - track:wood
  - sponsor:modal
  - achievement:welltuned
  - achievement:offbrand
  - achievement:sharing
  - achievement:fieldnotes
---
<div align="center">

<img src="https://img.shields.io/badge/YOLOv26n--seg-Ultralytics-6366f1?style=for-the-badge&logo=python&logoColor=white"/>
<img src="https://img.shields.io/badge/Llama_3.1_8B-Meta_AI-8b5cf6?style=for-the-badge&logo=meta&logoColor=white"/>
<img src="https://img.shields.io/badge/Modal_GPU-T4_Inference-22c55e?style=for-the-badge"/>
<img src="https://img.shields.io/badge/Gradio-HuggingFace_Spaces-f97316?style=for-the-badge&logo=huggingface&logoColor=white"/>
<img src="https://img.shields.io/badge/CholecSeg8k-MICCAI_2020-ef4444?style=for-the-badge"/>

# 🔬 SurgiSight

### Surgical Anatomy AI for Laparoscopic Training

**Real-time danger-zone detection + AI anatomy explanations for surgical trainees.**  
Built for **Build Small Hackathon 2026** — solo project, fully deployed, end-to-end.

---

🚀 **[Watch Demo](https://www.youtube.com/watch?v=Z-jaj31B-ss)**  &nbsp;|&nbsp; 📄 **[Blog](https://huggingface.co/blog/sugan04/surgical-tissue-segmentation)** &nbsp;|&nbsp; 🤗 **[HuggingFace Space](https://huggingface.co/spaces/build-small-hackathon/surgical-tissue-segmentation)**  &nbsp;|&nbsp; **[Social media - LinkedIn](https://www.linkedin.com/posts/sugan-subramanian_ai-machinelearning-medicalai-ugcPost-7469109783028076544-TVeL/?utm_source=share&utm_medium=member_desktop&rcm=ACoAACixJ8kBbDBD81FWoNnyJCVWR4Lrg1EcVv0)** &nbsp;|&nbsp; 📄 **[Agent traces](https://huggingface.co/datasets/sugan04/surgisight-traces)**
HF ID: sugan04
---

</div>

---

##  The Problem

Every year, bile duct injuries occur in roughly **1 in 300 laparoscopic cholecystectomies** (gallbladder removal surgeries). This is the most common serious complication in one of the most frequently performed surgeries in the world (~1.2 million per year in the US alone). Many of these injuries happen because trainees — operating under pressure in a visually complex, blood-filled field — cannot reliably identify critical structures in real time.

Current surgical training relies on:
- **Static textbook diagrams** — no relevance to live video
- **Senior surgeon supervision** — not always available, and creates cognitive load
- **Experience alone** — acquired over years, with real patients

There is no tool that watches the surgical video alongside a trainee and says: *"That's the hepatic vein. Don't touch it."*

**SurgiSight is that tool.**

---

##  The Solution

SurgiSight is an AI assistant for laparoscopic surgical training that:

1. **Segments** any laparoscopic cholecystectomy frame using a fine-tuned YOLOv8n instance segmentation model, identifying 13 surgical structures in real time.
2. **Flags danger zones** automatically — Hepatic Vein, Cystic Duct, and Blood trigger a red alert.
3. **Explains the anatomy** using Meta Llama 3.1 8B, giving the trainee a 3-sentence teaching note grounded in the detected context.
4. **Enables interactive Q&A** — the trainee can ask follow-up questions in natural language ("Why is the cystic duct dangerous here?") and get expert-level answers.
5. **Exports clinical-grade reports** in both PDF and Word format, suitable for case review or portfolio use.
6. **Supports multilingual responses** (English and French), with text-to-speech for each AI reply.

Everything runs in a single Gradio interface, deployed on Hugging Face Spaces, with GPU inference handled by Modal.

---

##  Architecture

```
┌──────────────────────────────────────────────────────────────────┐
│                     HUGGING FACE SPACES                          │
│                      (Gradio Frontend)                           │
│                                                                  │
│   ┌─────────────┐    ┌──────────────────┐    ┌───────────────┐  │
│   │  Image      │    │  Results Card    │    │  AI Chat      │  │
│   │  Upload     │ →  │  (Detections,    │    │  (Llama 3.1   │  │
│   │  + Conf     │    │   Alert, Brief)  │    │   8B via HF)  │  │
│   │  Slider     │    └──────────────────┘    └───────────────┘  │
│   └──────┬──────┘                                               │
│          │ PIL Image bytes                                        │
└──────────┼───────────────────────────────────────────────────────┘
           │
           ▼ modal.Cls.from_name() remote call
┌──────────────────────────────────────────────────────────────────┐
│                        MODAL (GPU T4)                            │
│                                                                  │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │  SurgiSightDetector.run()                               │   │
│   │  ├── Load YOLOv26n-seg weights (CholecSeg8k fine-tune)   │   │
│   │  ├── Run instance segmentation inference                 │   │
│   │  ├── Draw colour-coded masks on annotated frame          │   │
│   │  └── Return: annotated_bytes + detections list           │   │
│   └─────────────────────────────────────────────────────────┘   │
└──────────────────────────────────────────────────────────────────┘
           │
           ▼ annotated_bytes + [{cls_id, conf}]
┌──────────────────────────────────────────────────────────────────┐
│                 GRADIO APP (back to Spaces)                      │
│  ├── Map cls_id → CLASS_NAMES (13 classes)                       │
│  ├── Check DANGER_CLASSES → build alert string                   │
│  ├── Call Llama 3.1 8B via HF InferenceClient → explanation      │
│  └── Render results HTML + chat panel                            │
└──────────────────────────────────────────────────────────────────┘
```

---

##  Core Concepts Explained

### 1. Instance Segmentation vs. Object Detection

Most people know **object detection** — the model draws bounding boxes around objects. Instance segmentation goes further: it draws a **pixel-level mask** for every detected object, letting you see the exact shape and boundary of each structure rather than just a rectangle around it.

In surgery, this matters enormously. A bounding box around "Hepatic Vein" tells you it's somewhere in the frame. A mask tells you *exactly which pixels belong to it* — so you know precisely where not to cut.

SurgiSight uses **YOLOv8n-seg**, the nano (smallest) version of Ultralytics YOLOv8's segmentation model. It was chosen because:
- Nano = fast inference, deployable on modest GPU
- The `seg` variant predicts both bounding boxes **and** segmentation masks
- YOLOv8 is the current industry standard for real-time detection tasks

### 2. Fine-tuning on CholecSeg8k

A generic YOLOv8 model trained on COCO (everyday objects) has never seen laparoscopic video. It would not recognise a Cystic Duct or L-hook electrocautery instrument.

**CholecSeg8k** (Hong et al., MICCAI 2020) is a publicly available dataset of 8,080 annotated frames from laparoscopic cholecystectomy videos, labelled across **13 classes**:

| ID | Class | Risk |
|----|-------|------|
| 0 | Black Background | — |
| 1 | Abdominal Wall | Safe |
| 2 | Liver | Safe |
| 3 | Gastrointestinal Tract | Safe |
| 4 | Fat | Safe |
| 5 | Grasper | Safe |
| 6 | Connective Tissue | Safe |
| 7 | Blood | ⚠️ DANGER |
| 8 | Cystic Duct | ⚠️ DANGER |
| 9 | L-hook Electrocautery | Safe |
| 10 | Gallbladder | Safe |
| 11 | Hepatic Vein | ⚠️ DANGER |
| 12 | Liver Ligament | Safe |

The YOLOv8n-seg model was fine-tuned on this dataset. The resulting model achieves **mAP50 = 0.581** on the validation set — competitive for a nano model on a specialised medical segmentation task.

### 3. What is mAP50?

**Mean Average Precision at IoU threshold 0.50** (mAP50) is the standard metric for object detection and segmentation tasks.

- **IoU (Intersection over Union)**: Measures how much the predicted mask overlaps with the ground truth mask. IoU = 1.0 means perfect overlap; 0.0 means no overlap.
- **Precision at 0.50**: A detection is counted as "correct" if its IoU with the ground truth is ≥ 0.50.
- **Average Precision**: Area under the precision-recall curve for a single class.
- **mAP50**: Mean of AP50 across all classes.

A score of 0.581 means the model is reliably identifying the right structures in the right places more than half the time — meaningful for a 13-class medical task where even expert human annotators disagree on boundaries.

### 4. Modal for GPU Inference

Hugging Face Spaces runs on CPU by default. Running YOLOv8 inference on CPU is slow (~3-5 seconds per frame). SurgiSight offloads all inference to **Modal**, a serverless GPU platform.

The `SurgiSightDetector` class is deployed as a Modal app. When the Gradio frontend calls `detector.run.remote(image_bytes, conf)`, Modal spins up a T4 GPU container, runs the inference, and returns the result — in under 2 seconds. The Spaces app never needs a GPU itself.

This is a key architectural decision: **decouple the UI from compute**, so the demo stays free to host while getting GPU-grade speed.

### 5. Retrieval-Augmented Context for the LLM

Rather than asking Llama 3.1 a generic anatomy question, SurgiSight gives it **grounded context**: the exact list of detected structures and the current safety alert. The system prompt is dynamically constructed per frame:

```
"You are a surgical anatomy teacher for a junior resident.
Detected in a laparoscopic cholecystectomy frame: Liver, Cystic Duct, Grasper, Hepatic Vein.
Safety status: ⚠ DANGER ZONE: Hepatic Vein, Cystic Duct — Extreme caution required.
Answer concisely in 2-4 sentences."
```

This means every LLM response is **frame-specific**, not generic. The AI knows what it's looking at.

### 6. The Critical View of Safety (CVS)

The Critical View of Safety is a surgical standard — before clipping the cystic duct, the surgeon must confirm:
1. The hepatocystic triangle is cleared of fat and fibrous tissue
2. Two and only two structures enter the gallbladder

SurgiSight's DANGER_CLASSES logic is inspired by CVS: the Cystic Duct and Hepatic Vein are flagged because confusing them is how bile duct injuries happen. The system teaches this principle explicitly in AI responses.

---

##  Tech Stack

| Component | Technology | Why |
|-----------|-----------|-----|
| Segmentation model | YOLOv26n-seg (Ultralytics) | Real-time, accurate, SOTA for segmentation |
| Training dataset | CholecSeg8k (MICCAI 2020) | Only public annotated lap-chole dataset |
| GPU inference | Modal (NVIDIA T4) | Serverless, fast, free Spaces compatible |
| LLM | Meta Llama 3.1 8B Instruct via HF | Open-weight, instruction-tuned, free API |
| Frontend | Gradio 5/6 | Rapid ML UI, native HF Spaces support |
| Hosting | Hugging Face Spaces | Free, shareable, no DevOps |
| PDF export | ReportLab | Pure Python, no LaTeX dependencies |
| DOCX export | python-docx | Full Word formatting control |
| TTS | gTTS (Google Text-to-Speech) | Simple, multilingual, no API key |
| Font | Inter (Google Fonts) | Legible, modern, medical-appropriate |

---

##  Project Structure

```
surgisight/
├── app.py                    # Main Gradio application (this repo)
├── modal_inference.py        # Modal GPU deployment (separate deploy)
├── requirements.txt          # Python dependencies
├── examples/
│   ├── frame_80_endo.png     # CholecSeg8k example frame
│   ├── frame_912_endo.png    # CholecSeg8k example frame
│   ├── frame_2176_endo.png   # CholecSeg8k example frame
│   └── frame_939_endo.png    # CholecSeg8k example frame
└── README.md                 # This file
```

---

##  Setup & Deployment

### Prerequisites

```bash
pip install gradio modal ultralytics huggingface-hub Pillow \
            reportlab python-docx gtts
```

### Environment Variables

| Variable | Purpose |
|----------|---------|
| `HF_TOKEN` | Hugging Face token for Llama 3.1 8B Instruct API |
| `MODAL_TOKEN_ID` | Modal authentication (set via `modal token set`) |
| `MODAL_TOKEN_SECRET` | Modal authentication |

### Deploy the Modal GPU Backend

```bash
# Authenticate with Modal
modal token new

# Deploy the inference class
modal deploy modal_inference.py

# The app name must match: modal.Cls.from_name("surgisight", "SurgiSightDetector")
```

### Run Locally

```bash
python app.py
# Opens at http://localhost:7860
```

### Deploy to Hugging Face Spaces

1. Create a new Space (Gradio SDK)
2. Add secrets: `HF_TOKEN`, `MODAL_TOKEN_ID`, `MODAL_TOKEN_SECRET`
3. Push this repo — Spaces auto-deploys on push

---

##  How to Use

1. **Upload a surgical frame** — drag and drop any laparoscopic cholecystectomy image, or click one of the four provided example frames (sourced from CholecSeg8k).
2. **Adjust confidence threshold** — the slider (default 0.25) controls how certain the model must be before flagging a structure. Lower = more detections, higher = fewer but more confident.
3. **Click "▶ Run Analysis"** — the model runs on Modal GPU and returns the segmented image with colour-coded masks within ~2 seconds.
4. **Read the Results Panel** — the Safety Alert (green ✓ or red ⚠) and the detected structures with confidence bars appear immediately.
5. **Ask the AI** — the chat panel opens automatically. Use the suggested questions or type your own.
6. **Change language** — use the dropdown to switch to French; all responses (including previous ones) are re-translated.
7. **Export** — click ⬇ PDF or ⬇ Word to download a full clinical-style report including both images, detection table, and anatomy notes.

---

##  Model Performance

| Metric | Value |
|--------|-------|
| Base architecture | YOLOv26n-seg |
| Parameters | ~3.4M |
| Training dataset | CholecSeg8k (8,080 frames, MICCAI 2020) |
| Classes | 13 |
| mAP50 (val) | **0.581** |
| Inference time (T4 GPU) | ~150ms per frame |
| Inference time (CPU) | ~2.5–4s per frame |

---

##  Limitations & Ethics

This is a **research prototype**. It is explicitly **not** a medical device and should never be used in real surgical procedures or to guide clinical decisions.

- The model was trained on 8,080 frames from a limited set of procedures. It may not generalise to all patients, camera angles, or surgical conditions.
- mAP50 of 0.581 means the model makes mistakes — both false positives (flagging safe tissue) and false negatives (missing danger).
- AI anatomy explanations are generated by a general-purpose LLM and are not verified by a medical professional.
- The system is intended purely for **educational simulation and training aid** purposes.

> **DISCLAIMER: Research prototype only. Not a medical device. Contains no real patient data. Built for Build Small Hackathon 2026.**

---

##  References

- **CholecSeg8k**: Hong, W.-Y., et al. *CholecSeg8k: A Semantic Segmentation Dataset for Laparoscopic Cholecystectomy Based on Cholec80.* MICCAI 2020 Workshop. arXiv:2012.12503.
- **YOLOv8**: Jocher, G., et al. *Ultralytics YOLOv8.* Ultralytics, 2023. https://github.com/ultralytics/ultralytics
- **Llama 3.1**: Meta AI. *The Llama 3 Herd of Models.* arXiv:2407.21783, 2024.
- **Critical View of Safety**: Strasberg, S. M., et al. *An analysis of the problem of biliary injury during laparoscopic cholecystectomy.* Journal of the American College of Surgeons, 1995.
- **Modal**: https://modal.com — Serverless GPU infrastructure.

---

##  Author

Built solo for **Build Small Hackathon 2026**.

---

<div align="center">
  <sub>CholecSeg8k · MICCAI 2020 · No patient data · Research prototype · Build Small Hackathon 2026</sub>
</div>