--- title: SurgiSight emoji: ๐Ÿ”ฌ colorFrom: indigo colorTo: purple sdk: gradio sdk_version: "5.29.0" python_version: "3.10" app_file: app.py pinned: false tags: - track:backyard - track:wood - sponsor:modal - achievement:welltuned - achievement:offbrand - achievement:sharing - achievement:fieldnotes ---
# ๐Ÿ”ฌ SurgiSight ### Surgical Anatomy AI for Laparoscopic Training **Real-time danger-zone detection + AI anatomy explanations for surgical trainees.** Built for **Build Small Hackathon 2026** โ€” solo project, fully deployed, end-to-end. --- ๐Ÿš€ **[Watch Demo](https://www.youtube.com/watch?v=Z-jaj31B-ss)**  |  ๐Ÿ“„ **[Blog](https://huggingface.co/blog/sugan04/surgical-tissue-segmentation)**  |  ๐Ÿค— **[HuggingFace Space](https://huggingface.co/spaces/build-small-hackathon/surgical-tissue-segmentation)**  |  **[Social media - LinkedIn](https://www.linkedin.com/posts/sugan-subramanian_ai-machinelearning-medicalai-ugcPost-7469109783028076544-TVeL/?utm_source=share&utm_medium=member_desktop&rcm=ACoAACixJ8kBbDBD81FWoNnyJCVWR4Lrg1EcVv0)**  |  ๐Ÿ“„ **[Agent traces](https://huggingface.co/datasets/sugan04/surgisight-traces)** HF ID: sugan04 ---
--- ## The Problem Every year, bile duct injuries occur in roughly **1 in 300 laparoscopic cholecystectomies** (gallbladder removal surgeries). This is the most common serious complication in one of the most frequently performed surgeries in the world (~1.2 million per year in the US alone). Many of these injuries happen because trainees โ€” operating under pressure in a visually complex, blood-filled field โ€” cannot reliably identify critical structures in real time. Current surgical training relies on: - **Static textbook diagrams** โ€” no relevance to live video - **Senior surgeon supervision** โ€” not always available, and creates cognitive load - **Experience alone** โ€” acquired over years, with real patients There is no tool that watches the surgical video alongside a trainee and says: *"That's the hepatic vein. Don't touch it."* **SurgiSight is that tool.** --- ## The Solution SurgiSight is an AI assistant for laparoscopic surgical training that: 1. **Segments** any laparoscopic cholecystectomy frame using a fine-tuned YOLOv8n instance segmentation model, identifying 13 surgical structures in real time. 2. **Flags danger zones** automatically โ€” Hepatic Vein, Cystic Duct, and Blood trigger a red alert. 3. **Explains the anatomy** using Meta Llama 3.1 8B, giving the trainee a 3-sentence teaching note grounded in the detected context. 4. **Enables interactive Q&A** โ€” the trainee can ask follow-up questions in natural language ("Why is the cystic duct dangerous here?") and get expert-level answers. 5. **Exports clinical-grade reports** in both PDF and Word format, suitable for case review or portfolio use. 6. **Supports multilingual responses** (English and French), with text-to-speech for each AI reply. Everything runs in a single Gradio interface, deployed on Hugging Face Spaces, with GPU inference handled by Modal. --- ## Architecture ``` โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ HUGGING FACE SPACES โ”‚ โ”‚ (Gradio Frontend) โ”‚ โ”‚ โ”‚ โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ โ”‚ โ”‚ Image โ”‚ โ”‚ Results Card โ”‚ โ”‚ AI Chat โ”‚ โ”‚ โ”‚ โ”‚ Upload โ”‚ โ†’ โ”‚ (Detections, โ”‚ โ”‚ (Llama 3.1 โ”‚ โ”‚ โ”‚ โ”‚ + Conf โ”‚ โ”‚ Alert, Brief) โ”‚ โ”‚ 8B via HF) โ”‚ โ”‚ โ”‚ โ”‚ Slider โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ โ”‚ โ”‚ PIL Image bytes โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ โ–ผ modal.Cls.from_name() remote call โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ MODAL (GPU T4) โ”‚ โ”‚ โ”‚ โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ โ”‚ โ”‚ SurgiSightDetector.run() โ”‚ โ”‚ โ”‚ โ”‚ โ”œโ”€โ”€ Load YOLOv26n-seg weights (CholecSeg8k fine-tune) โ”‚ โ”‚ โ”‚ โ”‚ โ”œโ”€โ”€ Run instance segmentation inference โ”‚ โ”‚ โ”‚ โ”‚ โ”œโ”€โ”€ Draw colour-coded masks on annotated frame โ”‚ โ”‚ โ”‚ โ”‚ โ””โ”€โ”€ Return: annotated_bytes + detections list โ”‚ โ”‚ โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ โ–ผ annotated_bytes + [{cls_id, conf}] โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ GRADIO APP (back to Spaces) โ”‚ โ”‚ โ”œโ”€โ”€ Map cls_id โ†’ CLASS_NAMES (13 classes) โ”‚ โ”‚ โ”œโ”€โ”€ Check DANGER_CLASSES โ†’ build alert string โ”‚ โ”‚ โ”œโ”€โ”€ Call Llama 3.1 8B via HF InferenceClient โ†’ explanation โ”‚ โ”‚ โ””โ”€โ”€ Render results HTML + chat panel โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ ``` --- ## Core Concepts Explained ### 1. Instance Segmentation vs. Object Detection Most people know **object detection** โ€” the model draws bounding boxes around objects. Instance segmentation goes further: it draws a **pixel-level mask** for every detected object, letting you see the exact shape and boundary of each structure rather than just a rectangle around it. In surgery, this matters enormously. A bounding box around "Hepatic Vein" tells you it's somewhere in the frame. A mask tells you *exactly which pixels belong to it* โ€” so you know precisely where not to cut. SurgiSight uses **YOLOv8n-seg**, the nano (smallest) version of Ultralytics YOLOv8's segmentation model. It was chosen because: - Nano = fast inference, deployable on modest GPU - The `seg` variant predicts both bounding boxes **and** segmentation masks - YOLOv8 is the current industry standard for real-time detection tasks ### 2. Fine-tuning on CholecSeg8k A generic YOLOv8 model trained on COCO (everyday objects) has never seen laparoscopic video. It would not recognise a Cystic Duct or L-hook electrocautery instrument. **CholecSeg8k** (Hong et al., MICCAI 2020) is a publicly available dataset of 8,080 annotated frames from laparoscopic cholecystectomy videos, labelled across **13 classes**: | ID | Class | Risk | |----|-------|------| | 0 | Black Background | โ€” | | 1 | Abdominal Wall | Safe | | 2 | Liver | Safe | | 3 | Gastrointestinal Tract | Safe | | 4 | Fat | Safe | | 5 | Grasper | Safe | | 6 | Connective Tissue | Safe | | 7 | Blood | โš ๏ธ DANGER | | 8 | Cystic Duct | โš ๏ธ DANGER | | 9 | L-hook Electrocautery | Safe | | 10 | Gallbladder | Safe | | 11 | Hepatic Vein | โš ๏ธ DANGER | | 12 | Liver Ligament | Safe | The YOLOv8n-seg model was fine-tuned on this dataset. The resulting model achieves **mAP50 = 0.581** on the validation set โ€” competitive for a nano model on a specialised medical segmentation task. ### 3. What is mAP50? **Mean Average Precision at IoU threshold 0.50** (mAP50) is the standard metric for object detection and segmentation tasks. - **IoU (Intersection over Union)**: Measures how much the predicted mask overlaps with the ground truth mask. IoU = 1.0 means perfect overlap; 0.0 means no overlap. - **Precision at 0.50**: A detection is counted as "correct" if its IoU with the ground truth is โ‰ฅ 0.50. - **Average Precision**: Area under the precision-recall curve for a single class. - **mAP50**: Mean of AP50 across all classes. A score of 0.581 means the model is reliably identifying the right structures in the right places more than half the time โ€” meaningful for a 13-class medical task where even expert human annotators disagree on boundaries. ### 4. Modal for GPU Inference Hugging Face Spaces runs on CPU by default. Running YOLOv8 inference on CPU is slow (~3-5 seconds per frame). SurgiSight offloads all inference to **Modal**, a serverless GPU platform. The `SurgiSightDetector` class is deployed as a Modal app. When the Gradio frontend calls `detector.run.remote(image_bytes, conf)`, Modal spins up a T4 GPU container, runs the inference, and returns the result โ€” in under 2 seconds. The Spaces app never needs a GPU itself. This is a key architectural decision: **decouple the UI from compute**, so the demo stays free to host while getting GPU-grade speed. ### 5. Retrieval-Augmented Context for the LLM Rather than asking Llama 3.1 a generic anatomy question, SurgiSight gives it **grounded context**: the exact list of detected structures and the current safety alert. The system prompt is dynamically constructed per frame: ``` "You are a surgical anatomy teacher for a junior resident. Detected in a laparoscopic cholecystectomy frame: Liver, Cystic Duct, Grasper, Hepatic Vein. Safety status: โš  DANGER ZONE: Hepatic Vein, Cystic Duct โ€” Extreme caution required. Answer concisely in 2-4 sentences." ``` This means every LLM response is **frame-specific**, not generic. The AI knows what it's looking at. ### 6. The Critical View of Safety (CVS) The Critical View of Safety is a surgical standard โ€” before clipping the cystic duct, the surgeon must confirm: 1. The hepatocystic triangle is cleared of fat and fibrous tissue 2. Two and only two structures enter the gallbladder SurgiSight's DANGER_CLASSES logic is inspired by CVS: the Cystic Duct and Hepatic Vein are flagged because confusing them is how bile duct injuries happen. The system teaches this principle explicitly in AI responses. --- ## Tech Stack | Component | Technology | Why | |-----------|-----------|-----| | Segmentation model | YOLOv26n-seg (Ultralytics) | Real-time, accurate, SOTA for segmentation | | Training dataset | CholecSeg8k (MICCAI 2020) | Only public annotated lap-chole dataset | | GPU inference | Modal (NVIDIA T4) | Serverless, fast, free Spaces compatible | | LLM | Meta Llama 3.1 8B Instruct via HF | Open-weight, instruction-tuned, free API | | Frontend | Gradio 5/6 | Rapid ML UI, native HF Spaces support | | Hosting | Hugging Face Spaces | Free, shareable, no DevOps | | PDF export | ReportLab | Pure Python, no LaTeX dependencies | | DOCX export | python-docx | Full Word formatting control | | TTS | gTTS (Google Text-to-Speech) | Simple, multilingual, no API key | | Font | Inter (Google Fonts) | Legible, modern, medical-appropriate | --- ## Project Structure ``` surgisight/ โ”œโ”€โ”€ app.py # Main Gradio application (this repo) โ”œโ”€โ”€ modal_inference.py # Modal GPU deployment (separate deploy) โ”œโ”€โ”€ requirements.txt # Python dependencies โ”œโ”€โ”€ examples/ โ”‚ โ”œโ”€โ”€ frame_80_endo.png # CholecSeg8k example frame โ”‚ โ”œโ”€โ”€ frame_912_endo.png # CholecSeg8k example frame โ”‚ โ”œโ”€โ”€ frame_2176_endo.png # CholecSeg8k example frame โ”‚ โ””โ”€โ”€ frame_939_endo.png # CholecSeg8k example frame โ””โ”€โ”€ README.md # This file ``` --- ## Setup & Deployment ### Prerequisites ```bash pip install gradio modal ultralytics huggingface-hub Pillow \ reportlab python-docx gtts ``` ### Environment Variables | Variable | Purpose | |----------|---------| | `HF_TOKEN` | Hugging Face token for Llama 3.1 8B Instruct API | | `MODAL_TOKEN_ID` | Modal authentication (set via `modal token set`) | | `MODAL_TOKEN_SECRET` | Modal authentication | ### Deploy the Modal GPU Backend ```bash # Authenticate with Modal modal token new # Deploy the inference class modal deploy modal_inference.py # The app name must match: modal.Cls.from_name("surgisight", "SurgiSightDetector") ``` ### Run Locally ```bash python app.py # Opens at http://localhost:7860 ``` ### Deploy to Hugging Face Spaces 1. Create a new Space (Gradio SDK) 2. Add secrets: `HF_TOKEN`, `MODAL_TOKEN_ID`, `MODAL_TOKEN_SECRET` 3. Push this repo โ€” Spaces auto-deploys on push --- ## How to Use 1. **Upload a surgical frame** โ€” drag and drop any laparoscopic cholecystectomy image, or click one of the four provided example frames (sourced from CholecSeg8k). 2. **Adjust confidence threshold** โ€” the slider (default 0.25) controls how certain the model must be before flagging a structure. Lower = more detections, higher = fewer but more confident. 3. **Click "โ–ถ Run Analysis"** โ€” the model runs on Modal GPU and returns the segmented image with colour-coded masks within ~2 seconds. 4. **Read the Results Panel** โ€” the Safety Alert (green โœ“ or red โš ) and the detected structures with confidence bars appear immediately. 5. **Ask the AI** โ€” the chat panel opens automatically. Use the suggested questions or type your own. 6. **Change language** โ€” use the dropdown to switch to French; all responses (including previous ones) are re-translated. 7. **Export** โ€” click โฌ‡ PDF or โฌ‡ Word to download a full clinical-style report including both images, detection table, and anatomy notes. --- ## Model Performance | Metric | Value | |--------|-------| | Base architecture | YOLOv26n-seg | | Parameters | ~3.4M | | Training dataset | CholecSeg8k (8,080 frames, MICCAI 2020) | | Classes | 13 | | mAP50 (val) | **0.581** | | Inference time (T4 GPU) | ~150ms per frame | | Inference time (CPU) | ~2.5โ€“4s per frame | --- ## Limitations & Ethics This is a **research prototype**. It is explicitly **not** a medical device and should never be used in real surgical procedures or to guide clinical decisions. - The model was trained on 8,080 frames from a limited set of procedures. It may not generalise to all patients, camera angles, or surgical conditions. - mAP50 of 0.581 means the model makes mistakes โ€” both false positives (flagging safe tissue) and false negatives (missing danger). - AI anatomy explanations are generated by a general-purpose LLM and are not verified by a medical professional. - The system is intended purely for **educational simulation and training aid** purposes. > **DISCLAIMER: Research prototype only. Not a medical device. Contains no real patient data. Built for Build Small Hackathon 2026.** --- ## References - **CholecSeg8k**: Hong, W.-Y., et al. *CholecSeg8k: A Semantic Segmentation Dataset for Laparoscopic Cholecystectomy Based on Cholec80.* MICCAI 2020 Workshop. arXiv:2012.12503. - **YOLOv8**: Jocher, G., et al. *Ultralytics YOLOv8.* Ultralytics, 2023. https://github.com/ultralytics/ultralytics - **Llama 3.1**: Meta AI. *The Llama 3 Herd of Models.* arXiv:2407.21783, 2024. - **Critical View of Safety**: Strasberg, S. M., et al. *An analysis of the problem of biliary injury during laparoscopic cholecystectomy.* Journal of the American College of Surgeons, 1995. - **Modal**: https://modal.com โ€” Serverless GPU infrastructure. --- ## Author Built solo for **Build Small Hackathon 2026**. ---
CholecSeg8k ยท MICCAI 2020 ยท No patient data ยท Research prototype ยท Build Small Hackathon 2026