Cross-Modal Retrieval via Multimodal Transfer Learning and Domain Adaptation
Abstract:
Multimodal transfer learning offers a powerful approach to cross-modal retrieval by leveraging knowledge shared across modalities. In this thesis, we explore a two-stage pre-training and fine-tuning approach within an existing multimodal transfer learning framework to improve model efficiency and adaptability. While we do not claim superior retrieval accuracy or robustness over traditional methods, our research provides insights into optimizing performance on cross-modal retrieval tasks.
This exploration divides training into pre-training and fine-tuning stages. By investigating various configurations within this framework, we aim to identify strategies that reduce training time and the number of epochs required, while also improving the model's ability to adapt to new data categories. Our experiments analyze the factors that influence performance in this two-stage approach, offering guidance for future research in multimodal transfer learning.
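The two-stage split described above can be illustrated on a deliberately tiny problem. The following sketch is purely hypothetical (the task, function names, and hyperparameters are illustrative, not taken from the thesis): stage 1 learns a shared weight, which stage 2 then freezes while adapting only a task-specific parameter to new data.

```python
# Hypothetical sketch of a two-stage pre-train / fine-tune split on a
# toy 1-D regression task; not the thesis implementation.

def sgd_step(param, grad, lr):
    """Plain gradient-descent update."""
    return param - lr * grad

def pretrain(xs, ys, epochs=200, lr=0.01):
    """Stage 1: learn a shared 'encoder' weight w for y = w * x."""
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            pred = w * x
            grad = 2 * (pred - y) * x   # d/dw of squared error
            w = sgd_step(w, grad, lr)
    return w

def finetune(w, xs, ys, epochs=200, lr=0.05):
    """Stage 2: freeze w, adapt only a task-specific bias b."""
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            pred = w * x + b
            grad = 2 * (pred - y)       # d/db of squared error
            b = sgd_step(b, grad, lr)
    return b

xs = [0.0, 1.0, 2.0, 3.0]
w = pretrain(xs, [2 * x for x in xs])          # pre-training data: y = 2x
b = finetune(w, xs, [2 * x + 1 for x in xs])   # new category: y = 2x + 1
```

Because only the bias is updated in stage 2, adaptation to the shifted task is cheap; this mirrors, in miniature, how freezing pre-trained components can cut fine-tuning cost when new data categories appear.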
This work contributes to the design and optimization of cross-modal retrieval systems. By exploring stage-splitting strategies within existing models, our findings can inform the development of more efficient and adaptable retrieval systems for real-world applications.