Multimodal Data-Loader

Motivation

Many recent research papers have discussed fusing multiple modalities of data. We need a general-purpose data-loader that can take various forms of image or textual data sampled at various time frequencies.

Example Papers

CrossViViT

Realm

  • media_context_len:

  • media_context_time_len: Maximum time span to search when trying to retrieve a media context of length media_context_len

  • media_types (List): The types of media present in the multimedia file

Abstract method: get_media
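The parameters and abstract method listed above could fit together roughly as in the sketch below. Only the names media_context_len, media_context_time_len, media_types, and get_media come from these notes. The class names, the get_media signature, the record layout, and the reading of media_context_len as a maximum item count are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timedelta
from typing import Any, List, Tuple


class MultimodalLoader(ABC):
    """Hypothetical base class; its name and constructor are assumptions."""

    def __init__(
        self,
        media_context_len: int,
        media_context_time_len: timedelta,
        media_types: List[str],
    ) -> None:
        # Assumed meaning: maximum number of media items to return.
        self.media_context_len = media_context_len
        # Maximum time span to look back when collecting those items.
        self.media_context_time_len = media_context_time_len
        # Types of media present in the multimedia file.
        self.media_types = media_types

    @abstractmethod
    def get_media(self, timestamp: datetime) -> List[Any]:
        """Return the media context ending at `timestamp` (signature assumed)."""


class InMemoryLoader(MultimodalLoader):
    """Toy subclass over (timestamp, media_type, payload) tuples."""

    def __init__(self, records: List[Tuple[datetime, str, Any]], **kwargs: Any) -> None:
        super().__init__(**kwargs)
        self.records = sorted(records, key=lambda r: r[0])

    def get_media(self, timestamp: datetime) -> List[Any]:
        start = timestamp - self.media_context_time_len
        in_window = [
            payload
            for ts, kind, payload in self.records
            if start <= ts <= timestamp and kind in self.media_types
        ]
        # Keep at most media_context_len of the most recent items.
        first = max(0, len(in_window) - self.media_context_len)
        return in_window[first:]


records = [(datetime(2024, 1, 1, h), "image", f"img{h}") for h in range(6)]
records.append((datetime(2024, 1, 1, 3), "text", "note"))

loader = InMemoryLoader(
    records,
    media_context_len=2,
    media_context_time_len=timedelta(hours=3),
    media_types=["image"],
)
context = loader.get_media(datetime(2024, 1, 1, 5))
```

Under these assumptions, a concrete loader for a real media format would only need to override get_media; the windowing and truncation rules shown in the toy subclass are one possible policy, not a settled design.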