SatFetch // Cross-Modal Satellite Image Retrieval

Query Configuration

Accepts satellite TIFF/JPEG images via drag and drop or file browsing. Large files (100 MB and above) are supported and automatically downsampled.

Pipeline & Levels

Source Modality

Retrieval Pipeline Level

Latitude

Longitude

Search Radius: 50 km

Number of Results (K): 5

SatFetch Performance Benchmarks

Quantitative evaluation on the EuroSAT cross-modal validation subset (3,000 paired image channels)

Same-Modal vs Cross-Modal Accuracy (Recall@5)

Average Query Latency (ms)

Model Comparison Matrix

Model Strategy	Same-Modal R@1	Same-Modal R@5	Same-Modal R@10	Cross-Modal R@1	Cross-Modal R@5	Cross-Modal R@10	Latency (ms)
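The page does not define the R@K columns. The sketch below assumes the standard paired-retrieval definition: a query counts as a hit when its ground-truth partner appears among the top K results. The function name `recall_at_k` and the convention that query `i` is paired with database item `i` are assumptions for illustration.

```python
import numpy as np


def recall_at_k(similarity: np.ndarray, ks=(1, 5, 10)) -> dict:
    """Recall@K for paired retrieval.

    similarity[i, j] is the score between query i and database item j;
    the ground-truth match for query i is assumed to be database item i.
    """
    n = similarity.shape[0]
    true_scores = similarity[np.arange(n), np.arange(n)]
    # 0-based rank of the true match = number of items scoring strictly higher
    # (ties are therefore resolved in favour of the true match).
    ranks = (similarity > true_scores[:, None]).sum(axis=1)
    return {k: float((ranks < k).mean()) for k in ks}


if __name__ == "__main__":
    sim = np.array([
        [0.9, 0.1, 0.2],  # true match ranked 1st
        [0.8, 0.3, 0.1],  # true match ranked 2nd
        [0.7, 0.6, 0.5],  # true match ranked 3rd
    ])
    print(recall_at_k(sim, ks=(1, 2, 3)))
```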

Domain Projection Simulator

Adjust the calibration (centering) weight ($\alpha$) and the sensor noise level ($\sigma$).

Centering Weight ($\alpha$): 0.75 (range 0.0 baseline to 1.0 full centering)

Sensor Noise ($\sigma$): 0.10 (range 0.00 pure to 0.50 noisy)

H3 Indexing Level

Recall@1: 48.2%

Recall@5: 64.5%

MAP@10: 55.1%

Latency: 31 ms
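The page does not say how $\alpha$ and $\sigma$ enter the computation. One plausible reading, offered only as an assumption: $\alpha$ scales the ZS-MC centroid translation (so $\alpha = 0$ leaves the query uncalibrated and $\alpha = 1$ reproduces $z_c = z_0 - \mu_{src} + \mu_{tgt}$ from the ZS-MC section), and $\sigma$ is the standard deviation of Gaussian noise added to the query embedding. `simulate_query` is a hypothetical name.

```python
import numpy as np


def simulate_query(z0, mu_src, mu_tgt, alpha=0.75, sigma=0.10, seed=None):
    """Hypothetical simulator step: add sensor noise, then partially center.

    alpha = 0.0 -> no translation (baseline)
    alpha = 1.0 -> full centering, z0 - mu_src + mu_tgt
    sigma       -> std. dev. of additive Gaussian "sensor" noise on the query
    """
    rng = np.random.default_rng(seed)
    z0 = np.asarray(z0, dtype=float)
    noisy = z0 + rng.normal(0.0, sigma, size=z0.shape)
    return noisy + alpha * (np.asarray(mu_tgt, dtype=float) - np.asarray(mu_src, dtype=float))


if __name__ == "__main__":
    z0 = np.array([1.0, 0.0])
    mu_src, mu_tgt = np.array([0.5, 0.5]), np.array([0.0, 1.0])
    for a in (0.0, 0.75, 1.0):
        print(a, simulate_query(z0, mu_src, mu_tgt, alpha=a, sigma=0.0))
```

Under this reading, intermediate $\alpha$ values interpolate linearly between the uncalibrated and fully centered query.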

Multi-Sensor Alignment & Retrieval Pipelines (Levels 1 - 4)

Core sensor data processing pipelines engineered to resolve spatial and spectral gaps in satellite imaging.

Level 1: Same-Modal Search (Baseline CLIP)

Domain Pathway: Single-sensor visual search (e.g. Optical-to-Optical).
Logic: Runs query images directly through OpenAI's CLIP vision encoder to extract embeddings. Similarity is computed as cosine similarity (the inner product of L2-normalized embeddings) against the prebuilt optical database index in FAISS.
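A minimal NumPy sketch of the Level 1 similarity step, assuming embeddings are L2-normalized so that the inner product equals cosine similarity. In FAISS the same effect is typically obtained with `faiss.normalize_L2` followed by an `IndexFlatIP`; here random vectors stand in for CLIP image embeddings, and `cosine_top_k` is a hypothetical name.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row (or a single vector) to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def cosine_top_k(query: np.ndarray, database: np.ndarray, k: int = 5):
    """Return (indices, scores) of the k database rows most similar to query."""
    scores = l2_normalize(database) @ l2_normalize(query)  # cosine similarities
    top = np.argsort(-scores)[:k]                           # best first
    return top, scores[top]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    db = rng.normal(size=(1000, 512))           # stand-in for optical CLIP embeddings
    q = db[42] + 0.05 * rng.normal(size=512)    # slightly perturbed copy of item 42
    idx, sc = cosine_top_k(q, db, k=5)
    print(idx, np.round(sc, 3))
```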

Level 2: Cross-Modal Search (Baseline CLIP)

Domain Pathway: Cross-sensor search (e.g. Optical-to-SAR or Optical-to-Multispectral) without calibration.
Logic: Attempts to directly match high-dimensional embeddings from different domains. Due to the massive spectral domain gap between microwave SAR backscatter and optical visual reflectance, similarity scores drop significantly, yielding poor baseline recall.
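The recall drop can be reproduced on synthetic data. This sketch assumes, purely for illustration, that a "SAR" embedding equals its paired optical embedding plus a large constant modality offset and a little noise; real CLIP embeddings of SAR tiles are not generated this way, and all sizes are arbitrary.

```python
import numpy as np


def recall_at_1(queries: np.ndarray, database: np.ndarray) -> float:
    """Fraction of queries whose cosine nearest neighbour is their own pair (row i)."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    d = database / np.linalg.norm(database, axis=1, keepdims=True)
    best = np.argmax(q @ d.T, axis=1)
    return float((best == np.arange(len(q))).mean())


rng = np.random.default_rng(0)
n, dim = 200, 64
optical = rng.normal(size=(n, dim))            # optical database embeddings
noise = 0.1 * rng.normal(size=(n, dim))
offset = rng.normal(size=dim)
offset *= 50.0 / np.linalg.norm(offset)        # large shift shared by every "SAR" vector

same_modal = optical + noise                   # optical-to-optical queries
cross_modal = optical + offset + noise         # uncalibrated "SAR" queries

r_same = recall_at_1(same_modal, optical)
r_cross = recall_at_1(cross_modal, optical)
print(f"same-modal R@1 = {r_same:.2f}, cross-modal R@1 = {r_cross:.2f}")
```

The shared offset dominates every query direction, so database items that happen to align with it outrank the true partner.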

Level 3: Domain-Adapted Cross-Modal Search (CLIP + SAR Adapter)

Domain Pathway: Calibrated cross-sensor domain matching.
Logic: Activates our Zero-Shot Modality Centering (ZS-MC) correction layer. By computing the centroids of source and target modalities on the calibration split, it translates query vectors to correct domain drift. For SAR radar, it integrates a projection adapter to map radar backscatter structures into visual latent space.
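The page does not describe the adapter's architecture. As a stand-in only, this sketch fits a *linear* projection from SAR embeddings to optical embeddings by closed-form ridge regression on paired calibration data; the trained SatFetch adapter may be quite different (for example, a nonlinear network). `fit_linear_adapter`, `mean_cosine`, and the synthetic data are hypothetical.

```python
import numpy as np


def fit_linear_adapter(sar: np.ndarray, optical: np.ndarray, lam: float = 1e-3) -> np.ndarray:
    """Ridge regression W = (S^T S + lam*I)^-1 S^T Z, so that sar @ W approximates optical."""
    d = sar.shape[1]
    gram = sar.T @ sar + lam * np.eye(d)
    return np.linalg.solve(gram, sar.T @ optical)


def mean_cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Average row-wise cosine similarity between two equally shaped matrices."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=1)))


rng = np.random.default_rng(1)
n, dim = 500, 32
optical = rng.normal(size=(n, dim))
# Synthetic "SAR" embeddings: a well-conditioned linear distortion of optical, plus noise.
q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
mixing = q @ np.diag(rng.uniform(0.5, 2.0, size=dim))
sar = optical @ mixing + 0.05 * rng.normal(size=(n, dim))

train, test = slice(0, 400), slice(400, n)     # fit on calibration pairs, evaluate on held-out
W = fit_linear_adapter(sar[train], optical[train])
print("before adapter:", round(mean_cosine(sar[test], optical[test]), 3))
print("after adapter: ", round(mean_cosine(sar[test] @ W, optical[test]), 3))
```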

Level 4: Hybrid Spatial-Spectral Search (CLIP + H3 Index)

Domain Pathway: Real-time coupling of visual embeddings and geographic grids.
Logic: Filters the candidate space geographically using Uber's H3 hierarchical hexagonal index (resolution 7). Instead of scanning a standard flat index, it runs vector search only on records in the cells within radius $R$ of the query location. This reduces index search latency to under 1 ms while filtering out geographically distant matches.
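A sketch of the "filter first, then search" idea, assuming each database record already stores its resolution-7 H3 cell ID and that the set of allowed cells has been computed for the query (with the h3-py v4 library this set would come from `h3.grid_disk(h3.latlng_to_cell(lat, lng, 7), k)`). The cell IDs below are placeholder strings rather than real H3 indexes, and `filtered_search` is hypothetical.

```python
import numpy as np


def filtered_search(query, embeddings, record_cells, allowed_cells, k=5):
    """Cosine top-k restricted to records whose H3 cell is in allowed_cells.

    Returns the original record indices of the best matches, best first.
    """
    candidates = np.array(
        [i for i, cell in enumerate(record_cells) if cell in allowed_cells], dtype=int
    )
    if candidates.size == 0:
        return []
    sub = embeddings[candidates]
    sub = sub / np.linalg.norm(sub, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    order = np.argsort(-(sub @ q))[:k]          # rank only the geographic candidates
    return candidates[order].tolist()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(6, 8))
    cells = ["cellA", "cellA", "cellB", "cellC", "cellB", "cellD"]
    # Item 3 is the closest vector to the query, but its cell is not allowed.
    print(filtered_search(emb[3], emb, cells, {"cellA", "cellB"}, k=2))
```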

Core Pipeline Architecture & ZS-MC Workflow

Systematic data flow showing zero-shot domain-centering alignment and multi-sensor retrieval.

Zero-Shot Modality Centering (ZS-MC)

Satellite sensors operate in vastly different electromagnetic domains. An optical sensor captures visual reflectance (3 bands), whereas Synthetic Aperture Radar (SAR) measures microwave backscatter (2 polarization channels). This results in a massive spectral domain gap when both are projected into the joint CLIP embedding space.

To align the spaces without any trainable parameters, ZS-MC computes the centroids of the source domain $\mu_{src}$ and target domain $\mu_{tgt}$ from the EuroSAT calibration split: \[\mu_{mod} = \frac{1}{N_{mod}}\sum_{i=1}^{N_{mod}} z_i\] where $z_i$ are the embeddings of the $N_{mod}$ calibration samples of modality $mod$. The query vector $z_0$ is calibrated by translating it by the difference between the two centroids: \[z_c = z_0 - \mu_{src} + \mu_{tgt}\] This moves the query vector into the target representation space, correcting domain drift and restoring matching accuracy.
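Why a pure translation can help, as a worked derivation under an illustrative assumption that the source does not claim: a paired SAR embedding differs from its optical counterpart by a constant modality offset $\delta$ plus zero-mean noise $\epsilon_i$, with source = SAR, target = optical, and both centroids taken over the same $N$ paired calibration samples.

```latex
\begin{aligned}
z_i^{sar} &= z_i^{opt} + \delta + \epsilon_i \\
\mu_{tgt} &= \frac{1}{N}\sum_{i=1}^{N} z_i^{opt}, \qquad
\mu_{src} = \frac{1}{N}\sum_{i=1}^{N} z_i^{sar} = \mu_{tgt} + \delta + \bar{\epsilon} \\
z_c &= z_0^{sar} - \mu_{src} + \mu_{tgt}
     = z_0^{opt} + \delta + \epsilon_0 - (\mu_{tgt} + \delta + \bar{\epsilon}) + \mu_{tgt}
     = z_0^{opt} + \epsilon_0 - \bar{\epsilon}
\end{aligned}
```

Under this model the shared offset cancels exactly. Any offset that varies from scene to scene does not cancel, which is the calibration-drift limitation listed in the limitations section.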

Hybrid Spatial-Spectral Indexing (H3)

Searching millions of satellite image tiles globally requires a spatial constraint. Filtering records with simple bounding-box queries leads to rectangular boundary overlaps and slow database performance.

SatFetch integrates Uber's **H3 Hierarchical Hexagonal Index** to solve this. Hexagonal grids are well suited to the task because all six neighbors of a cell are equidistant from its center, which simplifies radial distance lookups:

1. Each image tile coordinate (latitude, longitude) is resolved to a unique H3 cell index at resolution 7 (average cell edge length ~1.22 km).
2. During query execution, the search center is resolved to its H3 cell, and a ring lookup collects the indices of all cells within distance $R$.
3. The query is executed only against database records belonging to these H3 hexagons, reducing candidate vector counts by 99.4% before the FAISS similarity search runs.
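Step 2 needs a grid distance $k$ (in rings) for a radius given in kilometres. A back-of-the-envelope sketch, assuming a flat, regular hexagonal grid with the ~1.22 km resolution-7 edge length quoted above (real H3 cells vary in size and the grid contains pentagons): a cell at grid distance $k$ has its center at least $1.5\,k\,e$ from the origin cell's center, and padding the radius by $2e$ covers the query point and the tile coordinate each lying anywhere inside their cells. Both function names are hypothetical.

```python
import math

RES7_EDGE_KM = 1.22  # approximate average H3 edge length at resolution 7


def ring_distance_for_radius(radius_km: float, edge_km: float = RES7_EDGE_KM) -> int:
    """Grid distance k that covers every tile coordinate within radius_km of the query.

    Planar-hexagon approximation: a cell at grid distance k has its centre at least
    1.5 * k * edge_km from the origin cell's centre, so any cell whose centre lies
    within radius_km + 2 * edge_km has grid distance <= the value returned here.
    """
    if radius_km < 0:
        raise ValueError("radius_km must be non-negative")
    return math.floor((radius_km + 2 * edge_km) / (1.5 * edge_km))


def disk_cell_count(k: int) -> int:
    """Number of hexagons within grid distance k of a cell (no pentagons involved)."""
    return 3 * k * (k + 1) + 1


if __name__ == "__main__":
    for radius in (0.0, 10.0, 50.0):
        k = ring_distance_for_radius(radius)
        print(f"R = {radius:>4} km -> k = {k:>2}, cells = {disk_cell_count(k)}")
```

The bound is deliberately conservative: it errs toward a few extra cells rather than the boundary dropouts described under the limitations, and even a zero radius yields at least one ring.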

SatFetch System Limitations & Edge Cases

1. Electromagnetic Centroid Calibration Drift ($\mu_{mod}$): ZS-MC aligns optical and radar domains via static centroid translation. Dynamic parameters like variable soil moisture (modifying radar dielectric properties) or seasonal canopy vegetation changes cause local drift, reducing cross-modal alignment precision in out-of-distribution scenes.

2. H3 Grid Edge Boundary Dropouts: Queries near the boundary of an H3 cell can miss images in adjacent cells unless the ring lookup distance is explicitly set to at least one ring of neighboring cells. Higher resolutions (smaller cells) shrink the candidate set and speed up vector search, but need more cells to cover the same radius, which increases neighbor lookup latency.

3. Frozen Text Embeddings OOV Limits: The text encoder relies on frozen OpenAI CLIP weights. Highly specialized technical geological terms (e.g., specific rock lithology classifications or rare cloud types) exhibit weaker alignment scores compared to standard Earth land-cover labels.

4. Signal Attenuation & Cloud Masking: Heavy cloud cover blocks visual sensors completely. While SAR acts as a cloud-penetrating sensor, retrieving optical images from a cloudy query tile is physically restricted unless pre-processed cloud-masking layers are applied.

Team 4MISTAKES

Rajiv Gandhi Institute of Petroleum Technology (An Institute of National Importance)

Ayush Pandey

Deep Learning Developer & Team Lead

Union Bank Ideathon Finalist
Researcher @ QM Lab
Intern @ Digitwin Tech

Developed the custom SAR adapter projection layers and supervised cross-sensor model training workflows.

Karan Sharma

Backend & Systems Engineer

SIH 2025 Winner (ISRO PS)
Union Bank Ideathon Finalist
Researcher @ QM Lab

Designed the asynchronous FastAPI search service, configured the FAISS indexes, and implemented out-of-core image loading.

Anurag Sharma

GIS & Search Algorithms Developer

Intern @ RRSC-W NRSC ISRO
Intern @ Univ. of Southampton
Ex-Intern @ IIT Ropar
Reviewer: IEEE TGRS, NeurIPS

Developed the Zero-Shot Modality Centering translation logic and built the Uber H3 geospatial search integration.