TY - GEN
T1 - PAEFF: Precise Alignment and Enhanced Gated Feature Fusion for Face-Voice Association
AU - Hannan, Abdul
AU - Manzoor, Muhammad Arslan
AU - Nawaz, Shah
AU - Liaquat, Muhammad Irzam
AU - Schedl, Markus
AU - Noman, Mubashir
PY - 2025
Y1 - 2025
N2 - We study the task of learning associations between faces and voices, which has recently been gaining interest in the multimodal community. Existing methods suffer from the deliberate crafting of negative mining procedures as well as the reliance on the distant margin parameter. These issues are addressed by learning a joint embedding space in which orthogonality constraints are applied to the fused embeddings of faces and voices. However, the embedding spaces of faces and voices possess different characteristics and must be aligned before they are fused. To this end, we propose a method that accurately aligns the embedding spaces and fuses them with an enhanced gated fusion, thereby improving the performance of face-voice association. Extensive experiments on the VoxCeleb dataset reveal the merits of the proposed approach.
AB - We study the task of learning associations between faces and voices, which has recently been gaining interest in the multimodal community. Existing methods suffer from the deliberate crafting of negative mining procedures as well as the reliance on the distant margin parameter. These issues are addressed by learning a joint embedding space in which orthogonality constraints are applied to the fused embeddings of faces and voices. However, the embedding spaces of faces and voices possess different characteristics and must be aligned before they are fused. To this end, we propose a method that accurately aligns the embedding spaces and fuses them with an enhanced gated fusion, thereby improving the performance of face-voice association. Extensive experiments on the VoxCeleb dataset reveal the merits of the proposed approach.
UR - https://www.scopus.com/pages/publications/105020069128
U2 - 10.21437/Interspeech.2025-268
DO - 10.21437/Interspeech.2025-268
M3 - Conference proceedings
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 2710
EP - 2714
BT - InterSpeech 2025
ER -