Feb 4, 2026

SAM 3: Text-Prompt Segmentation and Enhanced Auto-Tracking

SAM 3 is now live in V7 Darwin and brings significant improvements to both image segmentation and video annotation workflows. This major update introduces text-based automatic detection, higher-accuracy segmentation, and enhanced auto-tracking, all powered by the latest SAM architecture.

What's New

SAM 3 can be used like the regular SAM you're familiar with: you can manually select an object for segmentation with a single click. Behind the scenes, SAM 3 preprocesses your image and creates polygon masks for all detected objects, with improved accuracy compared to SAM 2.

The key breakthrough is automatic detection based on class names. If you create a class called car, train, bicycle, or any other object type, you can select that class and click the magnifying glass (Search) icon in the Darwin UI. SAM 3 will then detect every instance of that class and segment each one.

How Class Detection Works

The detection uses the class name as a text prompt. If you name a class "player" and run a search on an image of a tennis match, SAM 3 finds and segments all the players. Name another class "light pole" and search again to segment the light poles as a separate class.
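
To make the "class name is the prompt" idea concrete, here is a minimal Python sketch of the logic, run outside Darwin. It is not the Darwin or SAM 3 API: the `Detector` signature, `annotate_by_class_names`, and the stub `fake_detect` are all hypothetical names chosen for illustration. The point it shows is that each class triggers its own text-prompted search, and every mask found is labeled with the class whose name was used as the prompt.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical mask representation: a polygon as a list of (x, y) vertices.
Polygon = List[Tuple[float, float]]

@dataclass
class Annotation:
    class_name: str
    polygon: Polygon

# Hypothetical detector signature (an assumption, not the real SAM 3 or Darwin API):
# given an image and a text prompt, return one polygon per instance found.
Detector = Callable[[object, str], List[Polygon]]

def annotate_by_class_names(image, class_names: List[str], detect: Detector) -> List[Annotation]:
    """Run one text-prompted search per class; the class name itself is the prompt."""
    annotations: List[Annotation] = []
    for name in class_names:
        for polygon in detect(image, name):
            # Each detected instance is tagged with the class used as the prompt.
            annotations.append(Annotation(class_name=name, polygon=polygon))
    return annotations

# Stub standing in for the model: pretends to find two players and one light pole.
def fake_detect(image, prompt: str) -> List[Polygon]:
    fake_results: Dict[str, List[Polygon]] = {
        "player": [[(0, 0), (1, 0), (1, 1)], [(5, 5), (6, 5), (6, 6)]],
        "light pole": [[(9, 0), (9, 9), (10, 9)]],
    }
    return fake_results.get(prompt, [])

if __name__ == "__main__":
    for ann in annotate_by_class_names(None, ["player", "light pole"], fake_detect):
        print(ann.class_name, len(ann.polygon))
```

Because each class is searched separately, adding a new class never changes the masks already produced for existing ones; in Darwin this corresponds to running the Search action once per class.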

After detection, you can refine results manually. Delete false positives, adjust boundaries, or add missed instances using standard SAM point-and-click selection.


The video above shows both methods in action: manual selection of a single object and automatic selection of all instances based on the class name. The two methods can be used independently.

The Benefits

Labeling datasets with hundreds of similar objects (cars in a parking lot, berries on a bush, bottles on a conveyor) used to mean clicking on each one individually. SAM 3's text-based detection handles the bulk of that work in a single action. Combined with auto-tracking, it lets you annotate a video of pedestrians or a crowded retail floor in a fraction of the time. The improved tracking accuracy also reduces correction work: fewer lost tracks mean fewer frames where you need to re-anchor an annotation or fix identity swaps.

Jade Yip

Senior Product Manager