Patch Your Matcher: Correspondence-Aware Image-to-Image Translation Unlocks Cross-Modal Matching via Single-Modality Priors
Abstract
Matching images across modalities is a high-impact research area. Current state-of-the-art (SOTA) methods rely on extensive, multi-million-scale training protocols that demand significant computational resources. Moreover, the learned cross-modal mapping remains largely opaque and locked inside the trained matcher, offering limited options for downstream use or transfer to other matchers.
To enable such capabilities, we propose Patch Your Matcher (PYM), a highly adaptive method that leverages pre-trained single-modality matchers for cross-modal matching by co-learning an explicit, geometrically consistent two-view mapping. PYM learns image-to-image (I2I) translations that map new modalities into the original matcher’s modality, using a novel adversarial learning approach based on explicitly evaluating the plausibility of correspondences under 6-DoF two-view geometry.
Trained with the semi-dense ELoFTR matcher, our approach delivers substantially better cross-modal matching than classic I2I techniques and recovers 97.05% of the matching precision of the extensively trained SOTA multi-modal MINIMA variant. PYM also significantly boosts the cross-modal matching performance of the uni-modal sparse LightGlue and dense RoMA matchers, demonstrating the high transferability of the learned mapping.
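As a rough illustration of the idea described above, the following PyTorch sketch shows one plausible training step under our own assumptions; it is not the authors' released code. A small translator maps a new-modality image into the frozen matcher's native modality, the matcher produces correspondences, and a symmetric epipolar residual scores their two-view plausibility as the learning signal. The adversarial component is omitted for brevity, and Translator, DummyMatcher, symmetric_epipolar_distance, and pym_step are all hypothetical names.

# Hypothetical sketch of a correspondence-aware translation update (not the paper's code).
# Assumption: the frozen matcher is differentiable w.r.t. its input images, as
# semi-dense matchers typically are; the adversarial objective is replaced here
# by a plain symmetric epipolar loss.
import torch
import torch.nn as nn

class Translator(nn.Module):
    """Toy image-to-image translator standing in for the learned modality mapping."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

class DummyMatcher(nn.Module):
    """Stand-in for a frozen pre-trained matcher (e.g. ELoFTR).

    Returns matched keypoint coordinates for both images; a real matcher with
    frozen weights would be used instead.
    """
    def forward(self, img0, img1):
        n = 64
        kpts0 = torch.rand(n, 2) * img0.shape[-1]
        kpts1 = kpts0 + img0.mean()  # toy dependence on the translated image so gradients flow
        return kpts0, kpts1

def symmetric_epipolar_distance(kpts0, kpts1, F):
    """Symmetric squared distance of each correspondence to its epipolar lines under F."""
    ones = torch.ones(kpts0.shape[0], 1)
    p0 = torch.cat([kpts0, ones], dim=1)   # (N, 3) homogeneous points in image 0
    p1 = torch.cat([kpts1, ones], dim=1)
    l1 = p0 @ F.T                          # epipolar lines F p0 in image 1, one per row
    l0 = p1 @ F                            # epipolar lines F^T p1 in image 0
    residual = (p1 * l1).sum(dim=1) ** 2   # algebraic residual (p1^T F p0)^2
    return residual * (1.0 / (l1[:, :2].pow(2).sum(dim=1) + 1e-8)
                       + 1.0 / (l0[:, :2].pow(2).sum(dim=1) + 1e-8))

def pym_step(translator, matcher, img_new, img_ref, F_gt, opt):
    """One translation update: map, match with the frozen matcher, score plausibility."""
    translated = translator(img_new)             # map into the matcher's native modality
    kpts0, kpts1 = matcher(translated, img_ref)  # frozen single-modality matcher
    loss = symmetric_epipolar_distance(kpts0, kpts1, F_gt).mean()
    opt.zero_grad()
    loss.backward()                              # gradients reach only the translator
    opt.step()
    return loss.item()

if __name__ == "__main__":
    translator = Translator()
    matcher = DummyMatcher().eval()
    opt = torch.optim.Adam(translator.parameters(), lr=1e-4)
    img_new = torch.rand(1, 3, 64, 64)   # new-modality input (e.g. thermal)
    img_ref = torch.rand(1, 3, 64, 64)   # reference in the matcher's modality
    F_gt = torch.eye(3)                  # placeholder ground-truth two-view geometry
    print(f"plausibility loss: {pym_step(translator, matcher, img_new, img_ref, F_gt, opt):.4f}")

In a real setup, a frozen pre-trained matcher would replace DummyMatcher, and the plausibility score would drive the adversarial objective rather than being minimized directly.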
BibTeX
@InProceedings{pym_2026,
    author    = {Frolov, Anton and Rodehorst, Volker},
    title     = {Patch Your Matcher: Correspondence-Aware Image-to-Image Translation Unlocks Cross-Modal Matching via Single-Modality Priors},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {March},
    year      = {2026},
    pages     = {7913-7924}
}