Replicating the Results of the Temporal Fusion Transformer on the Favorita Dataset


Experiment plan and results

Replicating the Results of the Temporal Fusion Transformer on the Favorita Dataset
https://arxiv.org/pdf/1912.09363.pdf

Experiment owner

@kriti mahajan

Reviewers

@Isaac Godfried

Approver

Optimizely link

Jira ticket(s)

 

Status

In review / In progress / Complete

Stakeholder summary

 

 

Experiment planning

Overview: Replicating the Results of the Temporal Fusion Transformer on the Favorita Dataset
https://arxiv.org/pdf/1912.09363.pdf

Current Roadblock:
- Fitting data into memory for modeling

Tasks Completed So Far

  • Step 1: Batchwise Data Processing
    The Favorita dataset is too large to fit into memory in one go, so it is processed in chunks of 300,000 rows.

  • Step 2: Data Preprocessing
    Each product number-store number pair in the dataset is treated as a separate entity and is represented by embeddings of the following variables:

    ['holiday_type', 'locale', 'locale_name', 'description', 'transferred', 'city', 'state', 'store_type', 'cluster', 'family', 'class', 'perishable']
  1. We treat each product number-store number pair as a separate entity

  2. We include an additional 'open' flag to denote whether data is present on a given day

  3. Data is resampled at regular daily intervals, imputing any missing days using the last available observation

  4. We apply a log-transform on the sales data, and adopt z-score normalization across all entities

  5. Records with any missing values are dropped

  6. The training set is made up of samples taken between 2015-01-01 and 2015-12-01. The validation set consists of samples from the 30 days after the training set, and the test set covers all entities over the 30-day horizon following the validation set.

  7. We treat log sales, transactions and oil price as real-valued inputs and all other variables as categorical.
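Step 1 can be sketched with the `chunksize` argument of `pandas.read_csv`, which yields the file as a sequence of DataFrames instead of loading it whole. This is a minimal sketch, assuming the raw sales file has the `date`, `store_nbr`, `item_nbr` and `unit_sales` columns of the public Kaggle `train.csv` and that each chunk can be filtered to a date window before it is kept; `load_in_chunks` and the dtype choices are hypothetical, not the code used in this experiment.

```python
import io

import pandas as pd


def load_in_chunks(source, start, end, chunksize=300_000):
    """Hypothetical helper: read a large sales CSV chunk by chunk and keep
    only the rows whose date falls in [start, end]."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    # Assumed column names, following the public Kaggle Favorita train.csv.
    usecols = ["date", "store_nbr", "item_nbr", "unit_sales"]
    # Narrow dtypes reduce memory compared with pandas' int64/float64 defaults.
    dtype = {"store_nbr": "int16", "item_nbr": "int32", "unit_sales": "float32"}
    kept = []
    with pd.read_csv(source, usecols=usecols, dtype=dtype,
                     parse_dates=["date"], chunksize=chunksize) as reader:
        for chunk in reader:
            # Filter each chunk before accumulating, so memory tracks kept rows only.
            in_window = (chunk["date"] >= start) & (chunk["date"] <= end)
            kept.append(chunk.loc[in_window])
    return pd.concat(kept, ignore_index=True)


# Toy usage: an in-memory CSV standing in for the real file.
csv_text = """id,date,store_nbr,item_nbr,unit_sales,onpromotion
0,2014-12-31,1,103665,7.0,
1,2015-01-01,1,103665,2.0,
2,2015-01-02,1,103665,3.0,
"""
df = load_in_chunks(io.StringIO(csv_text), "2015-01-01", "2015-12-01", chunksize=2)
print(len(df))  # 2
```

Filtering inside the loop is what keeps peak memory bounded; concatenating unfiltered chunks would end up holding the whole file again.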
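The variable list and item 7 imply that every categorical input needs an integer index before it can be looked up in an embedding table. A minimal sketch, assuming the twelve columns have already been joined onto the sales rows; `fit_vocabularies`, `encode_categoricals` and reserving index 0 for categories unseen in training are illustrative choices, not something the plan specifies.

```python
import pandas as pd

# The twelve categorical inputs listed in Step 2 (assumed to be columns of the
# merged frame, e.g. after joining the holidays, stores and items tables).
CATEGORICAL_COLS = ["holiday_type", "locale", "locale_name", "description",
                    "transferred", "city", "state", "store_type", "cluster",
                    "family", "class", "perishable"]


def fit_vocabularies(train, cols):
    """Map each category seen in the training split to an id starting at 1.
    Id 0 is reserved for values never seen during training."""
    return {c: {v: i + 1 for i, v in enumerate(sorted(train[c].astype(str).unique()))}
            for c in cols}


def encode_categoricals(df, vocabs):
    """Replace each categorical column with integer ids for an embedding lookup."""
    out = df.copy()
    for c, vocab in vocabs.items():
        out[c] = out[c].astype(str).map(vocab).fillna(0).astype("int64")
    return out


# Toy usage with two of the columns; on real data pass CATEGORICAL_COLS instead.
train = pd.DataFrame({"city": ["Quito", "Guayaquil", "Quito"], "perishable": [0, 1, 0]})
test = pd.DataFrame({"city": ["Cuenca", "Quito"], "perishable": [1, 1]})
vocabs = fit_vocabularies(train, ["city", "perishable"])
encoded = encode_categoricals(test, vocabs)
print(encoded["city"].tolist())  # [0, 2]  ('Cuenca' was unseen in training)
```

Each column's embedding table then needs `len(vocab) + 1` rows, the extra row being id 0. Fitting the vocabularies on the training split only avoids leaking validation or test categories into the model.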
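Items 2 to 5 of the preprocessing list can be sketched for a long-format sales table as follows. This assumes one row per (date, store_nbr, item_nbr), as in the Kaggle data; the helper names are hypothetical, and applying `log1p` to sales clipped at zero (Favorita records returns as negative `unit_sales`) is an assumption, since the plan only says "log-transform".

```python
import numpy as np
import pandas as pd


def to_daily_panel(sales, start, end):
    """Put every (store_nbr, item_nbr) entity on the same daily grid.
    Assumes at most one row per (date, store_nbr, item_nbr)."""
    grid = pd.date_range(start, end, freq="D")
    frames = []
    for (store, item), g in sales.groupby(["store_nbr", "item_nbr"]):
        s = g.set_index("date")["unit_sales"].reindex(grid)
        frames.append(pd.DataFrame({
            "date": grid,
            "store_nbr": store,
            "item_nbr": item,
            # Item 2: open = 1 if a record existed that day, computed before imputation.
            "open": s.notna().astype("int8").to_numpy(),
            # Item 3: fill missing days with the last available observation.
            "unit_sales": s.ffill().to_numpy(),
        }))
    return pd.concat(frames, ignore_index=True)


def add_normalized_log_sales(panel, mean=None, std=None):
    """Item 4: log-transform sales, then z-score with one mean/std for all entities.
    Pass the training-split mean/std when transforming validation or test data."""
    out = panel.copy()
    # Assumption: negative sales (returns) are clipped to 0 before log1p.
    out["log_sales"] = np.log1p(out["unit_sales"].clip(lower=0))
    mean = out["log_sales"].mean() if mean is None else mean
    std = out["log_sales"].std() if std is None else std
    out["log_sales_z"] = (out["log_sales"] - mean) / std
    return out, mean, std


# Toy usage: two entities with gaps in their history.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-01", "2015-01-03", "2015-01-02"]),
    "store_nbr": [1, 1, 2],
    "item_nbr": [10, 10, 10],
    "unit_sales": [3.0, 5.0, 1.0],
})
panel = to_daily_panel(sales, "2015-01-01", "2015-01-03")
panel = panel.dropna(subset=["unit_sales"])  # item 5: days before a first record stay missing
panel, mu, sigma = add_normalized_log_sales(panel)
print(panel["open"].tolist())  # [1, 0, 1, 1, 0]
```

The per-entity loop is written for readability; on the full dataset a single vectorised reindex over a (store, item, date) MultiIndex would be much faster.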
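Item 6 describes a split by calendar date. A minimal sketch, assuming the panel has a `date` column and treating both ends of the training range as inclusive (the plan does not say); `split_by_date` is a hypothetical helper.

```python
import pandas as pd


def split_by_date(panel, train_start="2015-01-01", train_end="2015-12-01",
                  horizon_days=30):
    """Train is [train_start, train_end]; validation is the next horizon_days
    days; test is the horizon_days days after validation."""
    train_start, train_end = pd.Timestamp(train_start), pd.Timestamp(train_end)
    valid_end = train_end + pd.Timedelta(days=horizon_days)
    test_end = valid_end + pd.Timedelta(days=horizon_days)
    d = panel["date"]
    train = panel[(d >= train_start) & (d <= train_end)]
    valid = panel[(d > train_end) & (d <= valid_end)]
    test = panel[(d > valid_end) & (d <= test_end)]
    return train, valid, test


# Toy usage: one row per day for a single entity.
panel = pd.DataFrame({"date": pd.date_range("2015-01-01", "2016-03-01", freq="D")})
train, valid, test = split_by_date(panel)
print(len(train), len(valid), len(test))  # 335 30 30
```

This only assigns days to splits; a model with a lookback window still needs the days just before the validation and test periods as encoder input.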

Hypothesis

We hypothesize that a DA-RNN (dual-stage attention-based RNN) with transfer learning, or a Transformer model with transfer learning,

will decrease MSE/RMSE

because of the incorporation of embeddings

Metrics

  • MSE

  • RMSE
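For reference, MSE = (1/n) Σ (y_i − ŷ_i)² and RMSE = √MSE, so RMSE is in the same units as the target. The plain-Python sketch below computes both; the function names are illustrative. The plan does not say whether the metrics are computed on z-scored log sales or on unit sales in the original scale, and the two give very different numbers, so that choice is worth recording with the results. The TFT paper itself reports P50/P90 quantile losses rather than MSE, so these numbers are not directly comparable to its tables.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error over paired observations."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of MSE."""
    return math.sqrt(mse(y_true, y_pred))


print(mse([1, 2, 3, 4], [2, 1, 6, 1]))   # 5.0
print(rmse([1, 2, 3, 4], [2, 1, 6, 1]))  # about 2.236
```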

Targeting

  • Where will this experiment run?

  • Who will see it?

  • What is the traffic allocation (% in total let in based on targeting)?

Variations

 

A: Control

B: Variation

C: Variation

 


Screenshot

 

 

 

% of visitors/users to see each variation

 

 

 

Pre-analysis

Add any baseline data or pre-analysis you have for this experiment. Add planned sample size and time to run.

Notes

 

Results

Experiment start

Nov 26, 2020

Experiment end

Nov 26, 2020

Link to results in Optimizely

Conclusion

inconclusive / hypothesis proved

 

Add a short summary of the metrics below and whether you hit significance.

 

A: Control

B: Variation

change

A: Control

C: Variation

change

Cohort size

 

 

 

 

Primary metric

 

 

Δ=

p-value=

power=

confidence=

 

 

Δ=

p-value=

power=

confidence=

Other metrics

 

 

 

 

 

 

 

Conclusions

Highlights

  • Primary goal

Takeaways

  •  

Follow-up

  •