---
title: SurgiSight
emoji: ๐ฌ
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "5.29.0"
python_version: "3.10"
app_file: app.py
pinned: false
tags:
- track:backyard
- track:wood
- sponsor:modal
- achievement:welltuned
- achievement:offbrand
- achievement:sharing
- achievement:fieldnotes
---
---
## The Problem
Every year, bile duct injuries occur in roughly **1 in 300 laparoscopic cholecystectomies** (gallbladder removal surgeries). This is the most common serious complication in one of the most frequently performed surgeries in the world (~1.2 million per year in the US alone). Many of these injuries happen because trainees โ operating under pressure in a visually complex, blood-filled field โ cannot reliably identify critical structures in real time.
Current surgical training relies on:
- **Static textbook diagrams** โ no relevance to live video
- **Senior surgeon supervision** โ not always available, and creates cognitive load
- **Experience alone** โ acquired over years, with real patients
There is no tool that watches the surgical video alongside a trainee and says: *"That's the hepatic vein. Don't touch it."*
**SurgiSight is that tool.**
---
## The Solution
SurgiSight is an AI assistant for laparoscopic surgical training that:
1. **Segments** any laparoscopic cholecystectomy frame using a fine-tuned YOLOv8n instance segmentation model, identifying 13 surgical structures in real time.
2. **Flags danger zones** automatically โ Hepatic Vein, Cystic Duct, and Blood trigger a red alert.
3. **Explains the anatomy** using Meta Llama 3.1 8B, giving the trainee a 3-sentence teaching note grounded in the detected context.
4. **Enables interactive Q&A** โ the trainee can ask follow-up questions in natural language ("Why is the cystic duct dangerous here?") and get expert-level answers.
5. **Exports clinical-grade reports** in both PDF and Word format, suitable for case review or portfolio use.
6. **Supports multilingual responses** (English and French), with text-to-speech for each AI reply.
Everything runs in a single Gradio interface, deployed on Hugging Face Spaces, with GPU inference handled by Modal.
---
## Architecture
```
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ HUGGING FACE SPACES โ
โ (Gradio Frontend) โ
โ โ
โ โโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโ โ
โ โ Image โ โ Results Card โ โ AI Chat โ โ
โ โ Upload โ โ โ (Detections, โ โ (Llama 3.1 โ โ
โ โ + Conf โ โ Alert, Brief) โ โ 8B via HF) โ โ
โ โ Slider โ โโโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโ โ
โ โโโโโโโโฌโโโโโโโ โ
โ โ PIL Image bytes โ
โโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ modal.Cls.from_name() remote call
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ MODAL (GPU T4) โ
โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ SurgiSightDetector.run() โ โ
โ โ โโโ Load YOLOv26n-seg weights (CholecSeg8k fine-tune) โ โ
โ โ โโโ Run instance segmentation inference โ โ
โ โ โโโ Draw colour-coded masks on annotated frame โ โ
โ โ โโโ Return: annotated_bytes + detections list โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ annotated_bytes + [{cls_id, conf}]
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ GRADIO APP (back to Spaces) โ
โ โโโ Map cls_id โ CLASS_NAMES (13 classes) โ
โ โโโ Check DANGER_CLASSES โ build alert string โ
โ โโโ Call Llama 3.1 8B via HF InferenceClient โ explanation โ
โ โโโ Render results HTML + chat panel โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
```
---
## Core Concepts Explained
### 1. Instance Segmentation vs. Object Detection
Most people know **object detection** โ the model draws bounding boxes around objects. Instance segmentation goes further: it draws a **pixel-level mask** for every detected object, letting you see the exact shape and boundary of each structure rather than just a rectangle around it.
In surgery, this matters enormously. A bounding box around "Hepatic Vein" tells you it's somewhere in the frame. A mask tells you *exactly which pixels belong to it* โ so you know precisely where not to cut.
SurgiSight uses **YOLOv8n-seg**, the nano (smallest) version of Ultralytics YOLOv8's segmentation model. It was chosen because:
- Nano = fast inference, deployable on modest GPU
- The `seg` variant predicts both bounding boxes **and** segmentation masks
- YOLOv8 is the current industry standard for real-time detection tasks
### 2. Fine-tuning on CholecSeg8k
A generic YOLOv8 model trained on COCO (everyday objects) has never seen laparoscopic video. It would not recognise a Cystic Duct or L-hook electrocautery instrument.
**CholecSeg8k** (Hong et al., MICCAI 2020) is a publicly available dataset of 8,080 annotated frames from laparoscopic cholecystectomy videos, labelled across **13 classes**:
| ID | Class | Risk |
|----|-------|------|
| 0 | Black Background | โ |
| 1 | Abdominal Wall | Safe |
| 2 | Liver | Safe |
| 3 | Gastrointestinal Tract | Safe |
| 4 | Fat | Safe |
| 5 | Grasper | Safe |
| 6 | Connective Tissue | Safe |
| 7 | Blood | โ ๏ธ DANGER |
| 8 | Cystic Duct | โ ๏ธ DANGER |
| 9 | L-hook Electrocautery | Safe |
| 10 | Gallbladder | Safe |
| 11 | Hepatic Vein | โ ๏ธ DANGER |
| 12 | Liver Ligament | Safe |
The YOLOv8n-seg model was fine-tuned on this dataset. The resulting model achieves **mAP50 = 0.581** on the validation set โ competitive for a nano model on a specialised medical segmentation task.
### 3. What is mAP50?
**Mean Average Precision at IoU threshold 0.50** (mAP50) is the standard metric for object detection and segmentation tasks.
- **IoU (Intersection over Union)**: Measures how much the predicted mask overlaps with the ground truth mask. IoU = 1.0 means perfect overlap; 0.0 means no overlap.
- **Precision at 0.50**: A detection is counted as "correct" if its IoU with the ground truth is โฅ 0.50.
- **Average Precision**: Area under the precision-recall curve for a single class.
- **mAP50**: Mean of AP50 across all classes.
A score of 0.581 means the model is reliably identifying the right structures in the right places more than half the time โ meaningful for a 13-class medical task where even expert human annotators disagree on boundaries.
### 4. Modal for GPU Inference
Hugging Face Spaces runs on CPU by default. Running YOLOv8 inference on CPU is slow (~3-5 seconds per frame). SurgiSight offloads all inference to **Modal**, a serverless GPU platform.
The `SurgiSightDetector` class is deployed as a Modal app. When the Gradio frontend calls `detector.run.remote(image_bytes, conf)`, Modal spins up a T4 GPU container, runs the inference, and returns the result โ in under 2 seconds. The Spaces app never needs a GPU itself.
This is a key architectural decision: **decouple the UI from compute**, so the demo stays free to host while getting GPU-grade speed.
### 5. Retrieval-Augmented Context for the LLM
Rather than asking Llama 3.1 a generic anatomy question, SurgiSight gives it **grounded context**: the exact list of detected structures and the current safety alert. The system prompt is dynamically constructed per frame:
```
"You are a surgical anatomy teacher for a junior resident.
Detected in a laparoscopic cholecystectomy frame: Liver, Cystic Duct, Grasper, Hepatic Vein.
Safety status: โ DANGER ZONE: Hepatic Vein, Cystic Duct โ Extreme caution required.
Answer concisely in 2-4 sentences."
```
This means every LLM response is **frame-specific**, not generic. The AI knows what it's looking at.
### 6. The Critical View of Safety (CVS)
The Critical View of Safety is a surgical standard โ before clipping the cystic duct, the surgeon must confirm:
1. The hepatocystic triangle is cleared of fat and fibrous tissue
2. Two and only two structures enter the gallbladder
SurgiSight's DANGER_CLASSES logic is inspired by CVS: the Cystic Duct and Hepatic Vein are flagged because confusing them is how bile duct injuries happen. The system teaches this principle explicitly in AI responses.
---
## Tech Stack
| Component | Technology | Why |
|-----------|-----------|-----|
| Segmentation model | YOLOv26n-seg (Ultralytics) | Real-time, accurate, SOTA for segmentation |
| Training dataset | CholecSeg8k (MICCAI 2020) | Only public annotated lap-chole dataset |
| GPU inference | Modal (NVIDIA T4) | Serverless, fast, free Spaces compatible |
| LLM | Meta Llama 3.1 8B Instruct via HF | Open-weight, instruction-tuned, free API |
| Frontend | Gradio 5/6 | Rapid ML UI, native HF Spaces support |
| Hosting | Hugging Face Spaces | Free, shareable, no DevOps |
| PDF export | ReportLab | Pure Python, no LaTeX dependencies |
| DOCX export | python-docx | Full Word formatting control |
| TTS | gTTS (Google Text-to-Speech) | Simple, multilingual, no API key |
| Font | Inter (Google Fonts) | Legible, modern, medical-appropriate |
---
## Project Structure
```
surgisight/
โโโ app.py # Main Gradio application (this repo)
โโโ modal_inference.py # Modal GPU deployment (separate deploy)
โโโ requirements.txt # Python dependencies
โโโ examples/
โ โโโ frame_80_endo.png # CholecSeg8k example frame
โ โโโ frame_912_endo.png # CholecSeg8k example frame
โ โโโ frame_2176_endo.png # CholecSeg8k example frame
โ โโโ frame_939_endo.png # CholecSeg8k example frame
โโโ README.md # This file
```
---
## Setup & Deployment
### Prerequisites
```bash
pip install gradio modal ultralytics huggingface-hub Pillow \
reportlab python-docx gtts
```
### Environment Variables
| Variable | Purpose |
|----------|---------|
| `HF_TOKEN` | Hugging Face token for Llama 3.1 8B Instruct API |
| `MODAL_TOKEN_ID` | Modal authentication (set via `modal token set`) |
| `MODAL_TOKEN_SECRET` | Modal authentication |
### Deploy the Modal GPU Backend
```bash
# Authenticate with Modal
modal token new
# Deploy the inference class
modal deploy modal_inference.py
# The app name must match: modal.Cls.from_name("surgisight", "SurgiSightDetector")
```
### Run Locally
```bash
python app.py
# Opens at http://localhost:7860
```
### Deploy to Hugging Face Spaces
1. Create a new Space (Gradio SDK)
2. Add secrets: `HF_TOKEN`, `MODAL_TOKEN_ID`, `MODAL_TOKEN_SECRET`
3. Push this repo โ Spaces auto-deploys on push
---
## How to Use
1. **Upload a surgical frame** โ drag and drop any laparoscopic cholecystectomy image, or click one of the four provided example frames (sourced from CholecSeg8k).
2. **Adjust confidence threshold** โ the slider (default 0.25) controls how certain the model must be before flagging a structure. Lower = more detections, higher = fewer but more confident.
3. **Click "โถ Run Analysis"** โ the model runs on Modal GPU and returns the segmented image with colour-coded masks within ~2 seconds.
4. **Read the Results Panel** โ the Safety Alert (green โ or red โ ) and the detected structures with confidence bars appear immediately.
5. **Ask the AI** โ the chat panel opens automatically. Use the suggested questions or type your own.
6. **Change language** โ use the dropdown to switch to French; all responses (including previous ones) are re-translated.
7. **Export** โ click โฌ PDF or โฌ Word to download a full clinical-style report including both images, detection table, and anatomy notes.
---
## Model Performance
| Metric | Value |
|--------|-------|
| Base architecture | YOLOv26n-seg |
| Parameters | ~3.4M |
| Training dataset | CholecSeg8k (8,080 frames, MICCAI 2020) |
| Classes | 13 |
| mAP50 (val) | **0.581** |
| Inference time (T4 GPU) | ~150ms per frame |
| Inference time (CPU) | ~2.5โ4s per frame |
---
## Limitations & Ethics
This is a **research prototype**. It is explicitly **not** a medical device and should never be used in real surgical procedures or to guide clinical decisions.
- The model was trained on 8,080 frames from a limited set of procedures. It may not generalise to all patients, camera angles, or surgical conditions.
- mAP50 of 0.581 means the model makes mistakes โ both false positives (flagging safe tissue) and false negatives (missing danger).
- AI anatomy explanations are generated by a general-purpose LLM and are not verified by a medical professional.
- The system is intended purely for **educational simulation and training aid** purposes.
> **DISCLAIMER: Research prototype only. Not a medical device. Contains no real patient data. Built for Build Small Hackathon 2026.**
---
## References
- **CholecSeg8k**: Hong, W.-Y., et al. *CholecSeg8k: A Semantic Segmentation Dataset for Laparoscopic Cholecystectomy Based on Cholec80.* MICCAI 2020 Workshop. arXiv:2012.12503.
- **YOLOv8**: Jocher, G., et al. *Ultralytics YOLOv8.* Ultralytics, 2023. https://github.com/ultralytics/ultralytics
- **Llama 3.1**: Meta AI. *The Llama 3 Herd of Models.* arXiv:2407.21783, 2024.
- **Critical View of Safety**: Strasberg, S. M., et al. *An analysis of the problem of biliary injury during laparoscopic cholecystectomy.* Journal of the American College of Surgeons, 1995.
- **Modal**: https://modal.com โ Serverless GPU infrastructure.
---
## Author
Built solo for **Build Small Hackathon 2026**.
---
CholecSeg8k ยท MICCAI 2020 ยท No patient data ยท Research prototype ยท Build Small Hackathon 2026