What does HackerNews think of segment-anything?

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook
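
To make the description concrete, here is a minimal inference sketch along the lines of the repo's example notebooks; the checkpoint filename, image path, and prompt coordinates are placeholders, not taken from the repo.

```python
# Minimal inference sketch with the segment_anything package; the checkpoint
# filename, image path, and point coordinates below are placeholders.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor

# Load a ViT-H checkpoint downloaded from the repo's model zoo.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to("cuda")

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)

# Fully automatic mode: propose masks for everything in the image.
mask_generator = SamAutomaticMaskGenerator(sam)
auto_masks = mask_generator.generate(image)  # list of dicts with 'segmentation', 'bbox', 'area', ...

# Prompted mode: segment whatever lies under a single foreground point.
predictor = SamPredictor(sam)
predictor.set_image(image)
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),  # 1 marks a foreground point
    multimask_output=True,
)
```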

Webinar from last week on how to fine-tune visual foundation models (VFMs), specifically Meta's Segment Anything Model (SAM).

What you'll need to follow along with the fine-tuning walkthrough:

Images, ground-truth masks, and, optionally, prompts from the Stamp Verification (StaVer) Dataset on Kaggle (https://www.kaggle.com/datasets/rtatman/stamp-verification-s...)

Model weights for SAM, downloaded from the official GitHub repo (https://github.com/facebookresearch/segment-anything)

A good understanding of the model architecture from the Segment Anything paper (https://ai.meta.com/research/publications/segment-anything/)

GPU infra; an NVIDIA A100 should do for this fine-tuning.

A data curation and model evaluation tool: Encord Active (https://github.com/encord-team/encord-active)

Colab walkthrough for fine-tuning: https://colab.research.google.com/github/encord-team/encord-...
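
For orientation before opening the Colab, here is a rough sketch of the kind of training step such a fine-tuning setup boils down to: freeze SAM's image and prompt encoders and update only the mask decoder against ground-truth masks using box prompts. The checkpoint name, preprocessing shortcut, and plain BCE loss are simplifying assumptions on my part, not the webinar's exact recipe.

```python
# Rough sketch of a SAM fine-tuning step: freeze the image and prompt encoders
# and update only the mask decoder with box prompts. The checkpoint name,
# preprocessing shortcut, and plain BCE loss are simplifying assumptions.
import torch
import torch.nn.functional as F
from segment_anything import sam_model_registry

device = "cuda"
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth").to(device)

# Freeze the heavy image encoder and the prompt encoder; tune only the mask decoder.
for p in sam.image_encoder.parameters():
    p.requires_grad = False
for p in sam.prompt_encoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(sam.mask_decoder.parameters(), lr=1e-5)

def train_step(image, box, gt_mask):
    """image: (3, H, W) float tensor already resized so its longest side is 1024;
    box: (4,) xyxy prompt in that resized coordinate space;
    gt_mask: (H0, W0) float mask in {0, 1} at the original image resolution."""
    input_image = sam.preprocess(image.unsqueeze(0).to(device))  # normalize + pad to 1024x1024
    with torch.no_grad():  # frozen modules: no gradients needed here
        image_embedding = sam.image_encoder(input_image)
        sparse_emb, dense_emb = sam.prompt_encoder(
            points=None, boxes=box.unsqueeze(0).to(device), masks=None
        )
    low_res_masks, _ = sam.mask_decoder(
        image_embeddings=image_embedding,
        image_pe=sam.prompt_encoder.get_dense_pe(),
        sparse_prompt_embeddings=sparse_emb,
        dense_prompt_embeddings=dense_emb,
        multimask_output=False,
    )
    # Undo the padding/resizing and compare against the ground-truth mask.
    pred = sam.postprocess_masks(
        low_res_masks, input_size=image.shape[-2:], original_size=gt_mask.shape
    )
    loss = F.binary_cross_entropy_with_logits(pred.squeeze(1), gt_mask.unsqueeze(0).to(device))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```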

I'd love to get your thoughts and feedback. Thank you.

Interesting to see if/how this is different from (or maybe builds upon) Segment Anything[0].

For folks in the know: I often see segmentation models on video frames producing patchy results (see the DINOv2 video of the running dog, where the body randomly gets black patches, so the segmentation fails for certain frames). What methods are folks using to deal with this - standard fine-tuning, or is there a way to "force" the area to be cleanly segmented (i.e., add a bounding box around the class to supplement the data)?

And is that something that can be built into foundation models themselves, or are we always going to have patchy zero-shot results like this on video?
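
Not an authoritative answer, but one workaround for patchy per-frame masks is to prompt a promptable segmenter like SAM with a bounding box carried over from the previous frame's mask, rather than relying on fully automatic per-frame segmentation. A rough sketch, where the video path, initial box, and padding are placeholders:

```python
# Sketch of box propagation across video frames: prompt SAM on each frame with
# a bounding box derived from the previous frame's mask, which tends to give
# cleaner, more temporally consistent masks than automatic per-frame segmentation.
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth").to("cuda")
predictor = SamPredictor(sam)

def mask_to_box(mask, pad=10):
    """Tight xyxy box around a binary mask, padded and clipped to the frame."""
    ys, xs = np.where(mask)
    h, w = mask.shape
    return np.array([
        max(xs.min() - pad, 0), max(ys.min() - pad, 0),
        min(xs.max() + pad, w - 1), min(ys.max() + pad, h - 1),
    ])

cap = cv2.VideoCapture("running_dog.mp4")
box = np.array([100, 100, 500, 400])  # initial box around the object of interest (placeholder)
masks_per_frame = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    predictor.set_image(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    masks, scores, _ = predictor.predict(box=box, multimask_output=False)
    mask = masks[0]
    masks_per_frame.append(mask)
    if mask.any():
        box = mask_to_box(mask)  # propagate the prompt to the next frame
cap.release()
```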

[0] https://github.com/facebookresearch/segment-anything