Meta AI's Segment Anything Model 2 (SAM2) is a powerful model designed to identify and segment any object in an image. Here's how it fits into Vyom:
By integrating the SAM2 model, Vyom automates object segmentation, simplifying downstream tasks such as classification, tagging, and augmented reality overlays.
SAM2 extends the foundational Segment Anything Model to handle both images and video in real time. Segment Anything was originally designed for promptable visual segmentation: users provide minimal "prompts" (like clicks or boxes), and the model automatically segments the corresponding objects.
ONNX Runtime is a cross-platform, high-performance runtime for machine learning models in the ONNX (Open Neural Network Exchange) format.
VyomOS-powered robots and UAVs rely on powerful companion computers to run advanced deep learning models, but sometimes that same power is needed in our users' hands, on their mobile phones.
Today's phones have powerful CPUs and GPUs, sometimes rivalling even our companion computers. Vyom's mobile GCS app integrates proprietary and open-source foundation models to put this power to work for our users.
Our integration of Meta's SAM2 with satellite imagery helps our users plan drone missions better, cutting the planning stage from hours to mere seconds.
SAM2 comprises two core components: an image encoder, which turns the input image into embeddings, and a mask decoder, which combines those embeddings with user prompts to produce segmentation masks.
The decoder produces multiple candidate masks, each with a confidence score (0–1). Each point in a resulting mask also carries its own confidence value, allowing you to filter out low-confidence points or masks, as sketched below.
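For illustration, here is a minimal Kotlin sketch of that filtering step, assuming the decoder outputs have already been copied into plain arrays (the names are ours, not part of the SAM2 API):

```kotlin
// Pick the highest-scoring mask and threshold its per-point logits.
// `masks` is assumed to hold [numMasks][H * W] mask logits and `scores`
// one confidence value (0-1) per mask; both names are illustrative.
fun bestMask(
    masks: Array<FloatArray>,
    scores: FloatArray,
    minScore: Float = 0.5f
): BooleanArray? {
    val best = scores.indices.maxByOrNull { scores[it] } ?: return null
    if (scores[best] < minScore) return null      // whole mask too uncertain
    // threshold individual points: logits > 0 count as "inside the mask"
    return BooleanArray(masks[best].size) { i -> masks[best][i] > 0f }
}
```

Tuning `minScore` trades recall for precision; in practice we discard masks the model itself is unsure about rather than showing flickering selections to the user.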
Below is a simplified illustration of our segmentation pipeline:
┌─────────────┐     ┌────────────────────────┐
│ Input Image │     │ User Prompts           │
└──────┬──────┘     │ (x,y) coords + labels  │
       │            └───────────┬────────────┘
       ▼                        ▼
┌─────────────────┐    ┌─────────────────┐
│  SAM2 Encoder   │    │  SAM2 Decoder   │
│ (Encoder.onnx)  │───▶│ (Decoder.onnx)  │
└─────────────────┘    └────────┬────────┘
 (image embeddings)             ▼
                         ┌──────────────┐
                         │ Segmentation │
                         │    Masks     │
                         └──────────────┘
In our Vyom application, we've implemented SAM2 inference with Kotlin coroutines, keeping image preprocessing and model execution off the main UI thread so the interface stays responsive.
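Here is a minimal sketch (not our exact production code) of what that looks like with onnxruntime-android. The input name "image" and the 1×3×1024×1024 shape follow the encoder export shown later in this post; the class and function names are illustrative:

```kotlin
import ai.onnxruntime.OnnxTensor
import ai.onnxruntime.OrtEnvironment
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.withContext
import java.nio.FloatBuffer

class Sam2Encoder(modelPath: String) {
    private val env = OrtEnvironment.getEnvironment()
    private val session = env.createSession(modelPath)

    // Suspends instead of blocking: the UI thread stays free while ONNX
    // Runtime crunches the image on a background dispatcher.
    suspend fun embed(pixels: FloatArray): Any =
        withContext(Dispatchers.Default) {
            val shape = longArrayOf(1, 3, 1024, 1024)
            OnnxTensor.createTensor(env, FloatBuffer.wrap(pixels), shape).use { input ->
                session.run(mapOf("image" to input)).use { outputs ->
                    // getValue() copies the embedding out of native memory,
                    // so it survives closing the Result
                    outputs.get("image_embed").get().value
                }
            }
        }
}
```

Because `embed` suspends rather than blocks, taps and map gestures stay responsive even while a full encoder pass is in flight.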
Whenever the user adds or adjusts prompt points, the updated labels and coordinates are fed into the decoder, which recalculates the masks from the cached image embeddings; the heavier encoder pass runs only once per image.
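As a hedged sketch, re-running the decoder with fresh prompts can look like the following. The tensor names and shapes follow the dynamic_axes in the export section below ([num_labels, num_points, 2] for coords, [num_labels, num_points] for labels); your exact input set may differ depending on how you export the decoder:

```kotlin
import ai.onnxruntime.OnnxTensor
import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession
import java.nio.FloatBuffer

fun rerunDecoder(
    env: OrtEnvironment,
    decoder: OrtSession,
    encoderOutputs: Map<String, OnnxTensor>,  // cached image_embed, high_res_feats_*
    points: FloatArray,                       // flattened x1, y1, x2, y2, ...
    labels: FloatArray                        // 1f = foreground, 0f = background
): OrtSession.Result {
    val n = (points.size / 2).toLong()
    val coords = OnnxTensor.createTensor(env, FloatBuffer.wrap(points), longArrayOf(1, n, 2))
    val labs = OnnxTensor.createTensor(env, FloatBuffer.wrap(labels), longArrayOf(1, n))
    // no previous low-res mask yet: zeroed mask_input + has_mask_input = 0
    val mask = OnnxTensor.createTensor(
        env, FloatBuffer.wrap(FloatArray(256 * 256)), longArrayOf(1, 1, 256, 256))
    val hasMask = OnnxTensor.createTensor(env, FloatBuffer.wrap(floatArrayOf(0f)), longArrayOf(1))
    return decoder.run(encoderOutputs + mapOf(
        "point_coords" to coords,
        "point_labels" to labs,
        "mask_input" to mask,
        "has_mask_input" to hasMask
    ))
}
```

Only the decoder runs per interaction, which is why adding a point feels instantaneous compared to the initial image load.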
Ready to try it out yourself? Follow these steps to run our sample React Native app using SAM2 and ONNX Runtime:
git clone https://github.com/vyom-os/SAM2ImplementationReactNative.git
To reduce model size and enable efficient on-device inference, you can convert the SAM2 model to ONNX format. Follow these steps to convert your model:
1. Download the Model
Start by downloading the SAM 2 PyTorch checkpoint. For this example, we'll use the smallest variant, sam2_hiera_tiny (checkpoint links are in the SAM 2 repository):
2. Set Up the Environment
First, clone the SAM 2 repository:
git clone https://github.com/facebookresearch/sam2.git
cd sam2
Then, install the necessary dependencies:
pip3 install -e .  # Ensure Python version <= 3.12.5
pip3 install onnx onnxscript onnxsim onnxruntime
The export process will produce two ONNX files: one for the encoder ({model_type}_encoder.onnx) and one for the decoder ({model_type}_decoder.onnx).
Here’s the code to export the encoder:
import torch

# Assumes sam2_encoder (an export-friendly wrapper around the image encoder)
# and img (a preprocessed 1x3x1024x1024 float tensor) are defined earlier in
# your export script, and model_type names the checkpoint variant.
torch.onnx.export(
    sam2_encoder,
    img,                          # example input used to trace the graph
    f"{model_type}_encoder.onnx",
    export_params=True,
    opset_version=17,
    input_names=['image'],
    output_names=['high_res_feats_0', 'high_res_feats_1', 'image_embed']
)
For the decoder, use the following export command with dynamic input shapes for interactive usage:
torch.onnx.export(
    sam2_decoder,
    # ... input parameters ...
    dynamic_axes={
        "point_coords": {0: "num_labels", 1: "num_points"},
        "point_labels": {0: "num_labels", 1: "num_points"},
        "mask_input": {0: "num_labels"},
        "has_mask_input": {0: "num_labels"}
    }
)
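Before bundling the exported files into the app, it's worth a quick sanity check that both models load and expose the expected input and output names. Here is a minimal JVM-side sketch using ONNX Runtime's Java API (the file names assume model_type was sam2_hiera_tiny):

```kotlin
import ai.onnxruntime.OrtEnvironment

fun main() {
    val env = OrtEnvironment.getEnvironment()
    for (path in listOf("sam2_hiera_tiny_encoder.onnx", "sam2_hiera_tiny_decoder.onnx")) {
        env.createSession(path).use { session ->
            // Print each model's declared inputs/outputs so they can be
            // matched against the names used in the app code.
            println("$path inputs=${session.inputNames} outputs=${session.outputNames}")
        }
    }
}
```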
Alternatively, you can skip the conversion and download a prebuilt SAM2 ONNX model from Hugging Face: https://huggingface.co/models?p=1&sort=trending&search=segment+anything
cd SAM2ImplementationReactNative
npm install
npx react-native run-android
Note: we've open-sourced a reference implementation for you to explore:
Vyom SAM2 Implementation on React Native
Feel free to fork the repo, experiment, and contribute back. It includes a detailed README (the "Readme 2" file) explaining each integration step in greater detail.
GitHub Repo: Check out our code in the SAM2ImplementationReactNative repository for more details on the structure and implementation.
With SAM2, ONNX Runtime, and a robust React Native setup, Vyom demonstrates that real-time, on-device segmentation is not just possible but highly efficient. By maintaining control over user prompts, scaling, and model inference, our approach ensures a flexible, interactive user experience, whether handling single images or streaming video.
We hope you enjoy exploring our approach and harnessing the power of SAM2 in your own React Native apps.
For further details or troubleshooting tips, check out our GitHub repo or consult the official documentation links referenced above.