Dual-Stage Attention RNN (DA-RNN)
Transformer-Based Models
Multi-Head Attention Variations
Vanilla LSTM and GRU