Supporting Variable Length Temporal Sequences for Classification and Forecasting

Goal

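Many real-world time series classification and representation learning tasks involve sequences of varying length, so we want to support variable-length temporal sequences for both classification and forecasting.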

 

Challenges

 

We will need a new data-loader that can handle variable-length sequences and batch them properly. This is challenging for the following reasons:

  • We need some way to know where one sequence ends and a new sequence begins

    • Solution 1: The user will have to create a unique id for each sequence. The loader will then use that id to find the length of each sequence (e.g. len(df[df["id_col"] == the_id])). Then we will also do something like len(df["id_col"]
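
A minimal sketch of Solution 1 in plain Python, assuming a flat table in which every row carries an id_col value. The names split_by_id, rows and value are hypothetical and only serve the illustration; with a pandas DataFrame, df.groupby("id_col").size() would give the same per-sequence lengths.

```python
def split_by_id(rows, id_key="id_col"):
    """Group flat rows into per-sequence lists, keyed by the sequence id.

    Rows of one sequence do not have to be contiguous; the order inside
    each sequence follows the order of the input rows.
    """
    sequences = {}
    for row in rows:
        sequences.setdefault(row[id_key], []).append(row)
    return sequences


# Hypothetical flat table holding three sequences of lengths 3, 1 and 2.
rows = [
    {"id_col": "a", "value": 0.1},
    {"id_col": "a", "value": 0.2},
    {"id_col": "a", "value": 0.3},
    {"id_col": "b", "value": 1.0},
    {"id_col": "c", "value": 2.0},
    {"id_col": "c", "value": 2.5},
]

sequences = split_by_id(rows)
# Per-sequence lengths, the quantity the loader would pass downstream.
lengths = {seq_id: len(seq) for seq_id, seq in sequences.items()}
```

Grouping once up front avoids re-filtering the whole table for every id, which is what repeating len(df[df["id_col"] == the_id]) per sequence would do.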

Beyond the data-loader, the downstream model will need to handle varying-length sequences as well. For transformer-based models, the data-loader would therefore need to pass on the length of each sequence, and the models would need to be initialized with a max_seq_len parameter. The other problem is batching: it is still unclear how variable lengths would affect batch_size and the shape of each batch.

  • Investigate how Transformers implements this for NLP.
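
One common way to resolve the batching question above is to pad every sequence to max_seq_len and carry a mask alongside the batch. The sketch below assumes univariate sequences stored as Python lists; pad_batch and pad_value are hypothetical names, and this is not necessarily the approach the final loader will take.

```python
def pad_batch(sequences, max_seq_len, pad_value=0.0):
    """Pad (or truncate) each sequence to max_seq_len and build a mask.

    Returns (padded, mask, lengths). padded and mask both have shape
    [batch_size, max_seq_len]; mask is True at real time steps and
    False at padded positions.
    """
    padded, mask, lengths = [], [], []
    for seq in sequences:
        seq = list(seq)[:max_seq_len]  # truncate sequences that are too long
        n_pad = max_seq_len - len(seq)
        padded.append(seq + [pad_value] * n_pad)
        mask.append([True] * len(seq) + [False] * n_pad)
        lengths.append(len(seq))
    return padded, mask, lengths


# Three sequences of different lengths form one batch of size 3.
batch = [[0.1, 0.2, 0.3], [1.0], [2.0, 2.5]]
padded, mask, lengths = pad_batch(batch, max_seq_len=4)
```

With this scheme batch_size still counts sequences, and every batch has the fixed shape [batch_size, max_seq_len]. Note that some frameworks use the opposite mask convention: PyTorch's src_key_padding_mask, for example, marks padded positions with True.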