Multimodal Data-Loader
Motivation
Much recent research discusses fusing multiple modalities of data together. We need a general-purpose data-loader that can take various forms of image or textual data arriving at various time frequencies.
Example Papers
CrossVIVIT
Realm
media_context_len:
media_context_time_len: Maximum time span to search when attempting to retrieve the media_context_len
media_types (List): The types of media present in the multimedia file
Abstract method: get_media
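The parameters and abstract method listed above could be arranged roughly as follows. This is a minimal sketch, not a settled design. Only the names media_context_len, media_context_time_len, media_types, and get_media come from these notes. The class names, the get_media signature, the use of timedelta, and the reading of media_context_len as a maximum item count are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timedelta
from typing import Any, List, Tuple


class MultimodalLoader(ABC):
    """Hypothetical base class holding the three parameters from the notes."""

    def __init__(
        self,
        media_context_len: int,
        media_context_time_len: timedelta,
        media_types: List[str],
    ) -> None:
        # Assumption: media_context_len is the maximum number of media items to return.
        self.media_context_len = media_context_len
        # Assumption: the look-back window is expressed as a timedelta.
        self.media_context_time_len = media_context_time_len
        self.media_types = media_types

    @abstractmethod
    def get_media(self, timestamp: datetime) -> List[Any]:
        """Return media items relevant to `timestamp` (signature is assumed)."""


class InMemoryLoader(MultimodalLoader):
    """Toy subclass: media is a list of (timestamp, item) pairs held in memory."""

    def __init__(self, records: List[Tuple[datetime, Any]], **kwargs: Any) -> None:
        super().__init__(**kwargs)
        # Sort oldest to newest so the most recent items sit at the end.
        self.records = sorted(records, key=lambda r: r[0])

    def get_media(self, timestamp: datetime) -> List[Any]:
        if self.media_context_len <= 0:
            return []
        earliest = timestamp - self.media_context_time_len
        # Keep only items inside the look-back window ending at `timestamp`.
        in_window = [item for ts, item in self.records if earliest <= ts <= timestamp]
        # Return at most media_context_len of the most recent items.
        return in_window[-self.media_context_len:]


# Usage: six hourly images, a three-hour window, at most two items returned.
t0 = datetime(2024, 1, 1, 12, 0)
records = [(t0 - timedelta(hours=h), f"img_{h}") for h in range(6)]
loader = InMemoryLoader(
    records,
    media_context_len=2,
    media_context_time_len=timedelta(hours=3),
    media_types=["image"],
)
recent = loader.get_media(t0)
```

Keeping get_media abstract lets each modality (image, text, and so on) decide how to fetch its own data while the base class holds the shared context settings.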