Single-Branch Network Architectures to Close the Modality Gap in Multimodal Recommendation

Research output: Contribution to journal › Article › peer-review

Abstract

Traditional recommender systems rely on collaborative filtering (CF), using past user–item interactions to help users discover new items in a vast collection. In cold start, i.e., when interaction histories of users or items are not available, content-based recommender systems (CBRSs) use side information instead. Most commonly, user demographics and item descriptions are used for user and item cold start, respectively. Hybrid recommender systems (HRSs) often employ multimodal learning to combine collaborative and user and item side information, which we jointly refer to as modalities. Though HRSs can provide recommendations when some modalities are missing, their quality degrades. In this work, we utilize single-branch neural networks equipped with weight sharing, modality sampling, and contrastive loss to provide accurate recommendations even in missing modality scenarios, including cold start. Compared to multi-branch architectures, the weights of the encoding modules are shared for all modalities; in other words, all modalities are encoded using the same neural network. This, together with the contrastive loss, is essential in reducing the modality gap, while the modality sampling is essential in modeling missing modalities during training. Simultaneously leveraging these techniques results in more accurate recommendations. We compare these networks with multi-branch alternatives and conduct extensive experiments on the MovieLens 1M, Music4All-Onion, and Amazon Video Games datasets. Six accuracy-based and four beyond-accuracy-based metrics help assess the recommendation quality for the different training paradigms and their hyperparameters on single- and multi-branch networks in warm-start and missing modality scenarios. We quantitatively and qualitatively study the effects of these different aspects on bridging the modality gap.
Our results show that single-branch networks provide competitive recommendation quality in warm start, and significantly better performance in missing modality scenarios. Moreover, our study of modality sampling and contrastive loss on both single- and multi-branch architectures indicates a consistent positive impact on accuracy metrics across all datasets. Overall, the three training paradigms collectively encourage modalities of the same item to be embedded closer together than those of different items, as measured by Euclidean distance and cosine similarity. This results in embeddings that are less distinguishable and more interchangeable, as indicated by a 7–20% drop in modality prediction accuracy. Our full experimental setup, including training and evaluation code for all algorithms, their hyperparameter configurations, and our result analysis notebooks, is available at https://github.com/hcai-mms/single-branch-networks.
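
The three training paradigms named in the abstract (weight sharing, modality sampling, and contrastive loss) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation, which is available in the repository linked above. All function names, the single tanh layer, the mean pooling of modality embeddings, the InfoNCE-style form of the loss, and the toy vectors are assumptions made for illustration only. The sketch also assumes that every modality has already been projected to the same input dimension.

```python
import math
import random


def shared_encoder(x, weights):
    """Weight sharing: every modality passes through the SAME weight matrix."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def sample_modalities(item, rng):
    """Modality sampling: keep a random non-empty subset of the modalities,
    so that missing modalities are already seen during training."""
    names = sorted(item)
    return rng.sample(names, rng.randint(1, len(names)))


def encode_item(item, kept, weights):
    """Item embedding: mean of the shared-encoder outputs of the kept
    modalities (mean pooling is an assumption of this sketch)."""
    encoded = [shared_encoder(item[name], weights) for name in kept]
    return [sum(column) / len(encoded) for column in zip(*encoded)]


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b + 1e-12)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumed form): small when the anchor is closer
    to the positive (another modality of the same item) than to the
    negatives (modalities of other items)."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy data: two items, each with two modalities of equal dimension.
rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
item_a = {"interactions": [1.0, 0.0, 0.5, 0.2], "text": [0.9, 0.1, 0.4, 0.3]}
item_b = {"interactions": [0.0, 1.0, 0.1, 0.8], "text": [0.2, 0.9, 0.0, 0.7]}

# Embedding of item_a from a randomly sampled subset of its modalities.
kept = sample_modalities(item_a, rng)
embedding_a = encode_item(item_a, kept, weights)

# Contrast the two modalities of item_a against one modality of item_b.
loss = contrastive_loss(
    shared_encoder(item_a["interactions"], weights),
    shared_encoder(item_a["text"], weights),
    [shared_encoder(item_b["text"], weights)],
)
```

In a multi-branch alternative, each modality would have its own weight matrix instead of the single `weights` used here. Minimizing a loss of this kind pulls the embeddings of different modalities of the same item together, which is the sense in which the abstract speaks of reducing the modality gap.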
Original language: English
Number of pages: 52
Journal: ACM Transactions on Recommender Systems
Publication status: E-pub ahead of print - 10 Sept 2025

Fields of science

  • 102003 Image processing
  • 202002 Audiovisual media
  • 102001 Artificial intelligence
  • 102015 Information systems
  • 102 Computer Sciences
  • 101019 Stochastics
  • 103029 Statistical physics
  • 101018 Statistics
  • 101017 Game theory
  • 202017 Embedded systems
  • 101016 Optimisation
  • 101015 Operations research
  • 101014 Numerical mathematics
  • 101029 Mathematical statistics
  • 101028 Mathematical modelling
  • 101026 Time series analysis
  • 101024 Probability theory
  • 102032 Computational intelligence
  • 102004 Bioinformatics
  • 102013 Human-computer interaction
  • 101027 Dynamical systems
  • 305907 Medical statistics
  • 101004 Biomathematics
  • 305905 Medical informatics
  • 101031 Approximation theory
  • 102033 Data mining
  • 305901 Computer-aided diagnosis and therapy
  • 102019 Machine learning
  • 106007 Biostatistics
  • 102018 Artificial neural networks
  • 106005 Bioinformatics
  • 202037 Signal processing
  • 202036 Sensor systems
  • 202035 Robotics

JKU Focus areas

  • Digital Transformation
