Semantically Retrieved Imagery
Upload an image or enter a text query to trigger cross-modal alignment.
Advanced Multi-Sensor Satellite Imagery Alignment & Cross-Modal Retrieval Engine by Team 4MISTAKES
Retrieve semantically similar regions from matching sensor modalities (Optical↔Optical, SAR↔SAR, MS↔MS) with high accuracy.
Bridge the sensor domain gap using zero-shot CLIP ViT-L/14, matching visually dissimilar modalities like Optical-to-SAR.
Unsupervised Zero-Shot Modality Centering (ZS-MC) vector calibration reduces domain drift, with a relative gain of up to 50%.
Combine high-dimensional embedding search with H3 hexagonal spatial filtering to find geographically proximate matches.
ISRO Bharatiya Antariksh Hackathon 2026 • Problem Statement 11
Quantitative evaluation on the EuroSAT cross-modal validation subset (3,000 paired image channels)
| Model Strategy | Same R@1 | Same R@5 | Same R@10 | Cross R@1 | Cross R@5 | Cross R@10 | Latency |
|---|---|---|---|---|---|---|---|
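The R@K columns can be read as the share of queries whose top K results contain at least one correct match. The exact definition used for this table is not stated, so the sketch below assumes that common hit-rate form; `recall_at_k` and the toy data are hypothetical and for illustration only.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of queries with at least one relevant item in their top-k results.

    ranked_ids: list of ranked result-id lists, one per query.
    relevant_ids: list of sets of correct ids, one per query.
    Assumption: this hit-rate definition matches the R@K columns.
    """
    if not ranked_ids:
        raise ValueError("need at least one query")
    hits = 0
    for ranked, relevant in zip(ranked_ids, relevant_ids):
        # A query counts as a hit if any of its top-k results is relevant.
        if any(item in relevant for item in ranked[:k]):
            hits += 1
    return hits / len(ranked_ids)


# Hypothetical toy data: three queries with ranked results and ground truth.
ranked = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
relevant = [{"a"}, {"f"}, {"z"}]

r1 = recall_at_k(ranked, relevant, 1)  # only the first query hits at rank 1
r3 = recall_at_k(ranked, relevant, 3)  # the second query hits at rank 3
```

The "Same" columns would use queries and targets from one modality, and the "Cross" columns would use pairs from different modalities, with the same function applied to each.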
Tweak calibration weight ($\alpha$) and noise level ($\sigma$).
Core sensor data processing pipelines engineered to resolve spatial and spectral gaps in satellite imaging.
Domain Pathway: Single-sensor visual search (e.g. Optical-to-Optical).
Logic: Runs query images directly through OpenAI's CLIP vision encoder to extract embeddings. Similarity is computed as cosine similarity (the dot product of L2-normalized embeddings) against the precompiled optical database index in FAISS.
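A minimal pure-Python sketch of this same-modality search, assuming embeddings are plain lists of floats. The helper names and the three toy vectors are hypothetical; a real deployment would more likely hold normalized vectors in a FAISS inner-product index such as `IndexFlatIP`, which is an assumption about the setup rather than something stated here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k_cosine(query, index, k):
    """Return the k (id, score) pairs with the highest cosine similarity to the query."""
    q = l2_normalize(query)
    scored = []
    for item_id, vec in index.items():
        v = l2_normalize(vec)
        score = sum(a * b for a, b in zip(q, v))
        scored.append((score, item_id))
    scored.sort(reverse=True)  # highest similarity first
    return [(item_id, score) for score, item_id in scored[:k]]


# Hypothetical 3-dimensional "embeddings"; real CLIP vectors are much larger.
index = {
    "forest": [0.9, 0.1, 0.0],
    "river": [0.0, 1.0, 0.2],
    "urban": [0.1, 0.0, 1.0],
}
results = top_k_cosine([1.0, 0.2, 0.0], index, k=2)
```

Normalizing once at index-build time, rather than per query as done here for brevity, avoids repeated work on the database side.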
Domain Pathway: Cross-sensor search (e.g. Optical-to-SAR or Optical-to-Multispectral) without calibration.
Logic: Attempts to directly match high-dimensional embeddings from different domains. Due to the massive spectral domain gap between microwave SAR backscatter and optical visual reflectance, similarity scores drop significantly, yielding poor baseline recall.
Domain Pathway: Calibrated cross-sensor domain matching.
Logic: Activates our Zero-Shot Modality Centering (ZS-MC) correction layer. It computes the centroids of the source and target modalities on the calibration split, then translates query vectors by the offset between them to correct domain drift. For SAR, it integrates a projection adapter to map radar backscatter structures into the visual latent space.
Domain Pathway: Real-time coupling of visual embeddings and geographic grids.
Logic: Filters the candidate space geographically using Uber's H3 hierarchical hexagonal index (resolution 7). Instead of scanning the full flat index, vector search runs ONLY on cells within radius $R$ of the query. This reduces index search latency to under $1\text{ms}$ while strictly filtering out geographically distant noise.
Systematic data flow showing zero-shot domain-centering alignment and multi-sensor retrieval.
Satellite sensors operate in vastly different electromagnetic domains. An optical sensor captures visual reflectance (3 bands), whereas Synthetic Aperture Radar (SAR) measures microwave backscatter (2 bands). This results in a massive spectral domain gap when both are projected into the joint CLIP embedding space.
To align the spaces without training any parameters, ZS-MC computes the centroids of the source domain \(\mu_{src}\) and target domain \(\mu_{tgt}\) from the EuroSAT calibration split, where \(z_i\) are the embeddings of modality \(mod\): \[\mu_{mod} = \frac{1}{N_{mod}}\sum_{i=1}^{N_{mod}} z_i\] The query vector \(z_0\) is calibrated by shifting it from the source centroid to the target centroid: \[z_c = z_0 - \mu_{src} + \mu_{tgt}\] This centers the query vector in the target representation space, correcting domain drift and restoring matching accuracy.
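The two formulas above can be sketched in a few lines of plain Python. The function names and toy vectors are hypothetical. The `alpha` argument is an assumption about how a calibration weight could scale the shift; with `alpha=1.0` the function reduces exactly to \(z_c = z_0 - \mu_{src} + \mu_{tgt}\).

```python
def centroid(vectors):
    """Mean vector of a modality: mu_mod = (1 / N_mod) * sum(z_i)."""
    if not vectors:
        raise ValueError("need at least one vector")
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / n for d in range(dim)]


def zs_mc_calibrate(z0, mu_src, mu_tgt, alpha=1.0):
    """Shift a query from the source centroid toward the target centroid.

    alpha is a hypothetical calibration weight: 1.0 applies the full shift,
    0.0 leaves the query untouched.
    """
    return [z - alpha * (s - t) for z, s, t in zip(z0, mu_src, mu_tgt)]


# Hypothetical 2-dimensional calibration embeddings for each modality.
src_embeddings = [[1.0, 0.0], [3.0, 0.0]]
tgt_embeddings = [[0.0, 1.0], [0.0, 3.0]]

mu_src = centroid(src_embeddings)  # [2.0, 0.0]
mu_tgt = centroid(tgt_embeddings)  # [0.0, 2.0]

z_full = zs_mc_calibrate([2.5, 0.5], mu_src, mu_tgt)
z_half = zs_mc_calibrate([2.5, 0.5], mu_src, mu_tgt, alpha=0.5)
```

If retrieval uses cosine similarity, the shifted vector would presumably be re-normalized before search; the text does not say whether that step is applied.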
Searching millions of satellite image tiles globally requires a spatial constraint. Filtering tiles with simple bounding-box queries leads to rectangular boundary overlaps and slow database performance.
SatFetch integrates Uber's **H3 Hierarchical Hexagonal Index** to solve this. Hexagonal grids suit radial distance lookups because the centers of all six neighboring cells are roughly equidistant from a cell's center:
1. Each image tile coordinate (Latitude, Longitude) is resolved to a unique H3 Cell Index at resolution level 7 (cell edge length ~1.22km).
2. During query execution, the search center is resolved to its H3 index, and a ring lookup collects the cell indices within grid distance \(R\).
3. The query is executed ONLY against database records belonging to these H3 hexagons, reducing candidate vector counts by 99.4% before running FAISS matrix multiplication.
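The filter-then-search flow can be sketched as follows. To keep the example free of third-party dependencies, cell ids are opaque placeholder strings and the set of nearby cells is passed in directly; in practice that set would likely come from the `h3` Python package (for example `h3.latlng_to_cell` and `h3.grid_disk` in its v4 API), which is an assumption about the implementation. The record layout and function names are hypothetical.

```python
def filter_by_cells(records, allowed_cells):
    """Keep only records whose spatial cell id is in the allowed set."""
    return [r for r in records if r["cell"] in allowed_cells]


def search_nearby(query, records, allowed_cells, k):
    """Rank only the geographically filtered candidates by dot-product similarity."""
    candidates = filter_by_cells(records, allowed_cells)
    scored = [
        (sum(a * b for a, b in zip(query, r["vec"])), r["id"])
        for r in candidates
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(rec_id, score) for score, rec_id in scored[:k]]


# Hypothetical records: unit-length embeddings tagged with placeholder cell ids.
records = [
    {"id": "tile_a", "cell": "cell_1", "vec": [1.0, 0.0]},
    {"id": "tile_b", "cell": "cell_2", "vec": [0.8, 0.6]},
    {"id": "tile_c", "cell": "cell_9", "vec": [0.6, 0.8]},  # best match, but far away
]

# Placeholder for the ring lookup result around the query location.
nearby_cells = {"cell_1", "cell_2"}

results = search_nearby([0.6, 0.8], records, nearby_cells, k=2)
```

Here `tile_c` matches the query embedding exactly but is never scored, because its cell lies outside the ring. That is the intended behavior of a strict geographic pre-filter.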
1. Electromagnetic Centroid Calibration Drift ($\mu_{mod}$): ZS-MC aligns optical and radar domains via static centroid translation. Dynamic parameters like variable soil moisture (modifying radar dielectric properties) or seasonal canopy vegetation changes cause local drift, reducing cross-modal alignment precision in out-of-distribution scenes.
2. H3 Grid Edge Boundary Dropouts: Queries near the edges or vertices of an H3 cell can miss images in adjacent cells unless the ring lookup distance is explicitly set to at least one cell ($R \ge 1$). Higher resolutions shrink each cell's candidate set, which speeds up the vector search, but more cells must be looked up to cover the same ground area, which increases neighbor lookup latency.
3. Frozen Text Embeddings OOV Limits: The text encoder relies on frozen OpenAI CLIP weights. Highly specialized geoscience terms (e.g., specific rock lithology classifications or rare cloud types) exhibit weaker alignment scores than standard Earth land-cover labels.
4. Signal Attenuation & Cloud Masking: Heavy cloud cover blocks visual sensors completely. While SAR acts as a cloud-penetrating sensor, retrieving optical images from a cloudy query tile is physically restricted unless pre-processed cloud-masking layers are applied.
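For limitation 2, a short calculation shows how quickly the neighbor lookup grows. On an ideal hexagonal grid, a filled ring of grid distance \(k\) contains \(1 + 3k(k+1)\) cells. The sketch below ignores H3's twelve pentagon cells per resolution, which have fewer neighbors; `disk_cell_count` is a hypothetical helper name.

```python
def disk_cell_count(k):
    """Number of hexagons within grid distance k of a center cell (pentagons ignored)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return 1 + 3 * k * (k + 1)


# Cells that must be looked up for increasing ring distances.
counts = {k: disk_cell_count(k) for k in range(4)}
```

Because the count grows quadratically with \(k\), covering the same ground area at a finer resolution requires a larger \(k\) and noticeably more cell lookups.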
Rajiv Gandhi Institute of Petroleum Technology (An Institute of National Importance)