Experiment plan and results

Replicating the Results of the Temporal Fusion Transformer (TFT) Model on the Favorita Dataset
https://arxiv.org/pdf/1912.09363.pdf

Experiment owner

Kriti Mahajan

Reviewers

Approver

  •  

Optimizely link

Jira ticket(s)

Status

In review / In progress / Complete

Stakeholder summary

📋 Experiment planning

Overview: Replicating the Results of the Temporal Fusion Transformer (TFT) Model on the Favorita Dataset
https://arxiv.org/pdf/1912.09363.pdf

Current roadblock:
- The data does not fit into memory for modeling

Tasks completed so far

  • Step 1: Batch-wise data processing for preprocessing
    The Favorita dataset is too large to load into memory in one go, so it is processed in chunks of 300,000 records.

  • Step 2: Data preprocessing
    In this dataset, each product number-store number pair is treated as a separate entity and is described by embeddings of the following categorical variables:

    ```py
    ['holiday_type',
     'locale',
     'locale_name',
     'description',
     'transferred',
     'city',
     'state',
     'store_type',
     'cluster',
     'family',
     'class',
     'perishable']
    ```
  1. We treat each product number-store number pair as a separate entity

  2. We include an additional 'open' flag to denote whether data is present on a given day

  3. Data is resampled at regular daily intervals, imputing any missing days using the last available observation

  4. We apply a log-transform on the sales data, and adopt z-score normalization across all entities

  5. We drop any record with missing values

  6. The training set is made up of samples taken between 2015-01-01 and 2015-12-01. The validation set consists of samples from the 30 days after the training set. The test set consists of all entities over the 30-day horizon following the validation set.

  7. We consider log sales, transactions, and oil to be real-valued and the rest to be categorical.
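Steps 1 and 2 above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the code used in this experiment: the column names (`date`, `store_nbr`, `item_nbr`, `unit_sales`) and the helper names `load_window`, `resample_entity` and `preprocess` are hypothetical. Also assumed are filtering each chunk to the study window, `log1p` with negative sales clipped to zero for the log-transform, normalization statistics taken from the training rows, and inclusive split boundaries.

```python
import io

import numpy as np
import pandas as pd

CHUNK_SIZE = 300_000                        # chunk size from Step 1
TRAIN_START = pd.Timestamp("2015-01-01")    # split dates from item 6
TRAIN_END = pd.Timestamp("2015-12-01")
VALID_END = TRAIN_END + pd.Timedelta(days=30)
TEST_END = VALID_END + pd.Timedelta(days=30)


def load_window(source, chunk_size=CHUNK_SIZE):
    """Read the CSV chunk by chunk, keeping only rows inside the study window."""
    kept = []
    for chunk in pd.read_csv(source, parse_dates=["date"], chunksize=chunk_size):
        mask = (chunk["date"] >= TRAIN_START) & (chunk["date"] <= TEST_END)
        if mask.any():
            kept.append(chunk.loc[mask])
    return pd.concat(kept, ignore_index=True)


def resample_entity(entity):
    """Items 2-3: add the 'open' flag, resample daily, forward-fill the gaps."""
    entity = entity.set_index("date").sort_index()
    entity["open"] = 1                      # data was present on this day
    entity = entity.asfreq("D")             # insert the missing calendar days
    entity["open"] = entity["open"].fillna(0).astype(int)
    # Carry the last available observation forward into the inserted days.
    return entity.ffill().rename_axis("date").reset_index()


def preprocess(df):
    """Items 1-6 on an in-memory frame; returns train/valid/test frames."""
    # Item 1: every (store_nbr, item_nbr) pair is its own entity.
    parts = [resample_entity(g) for _, g in df.groupby(["store_nbr", "item_nbr"])]
    out = pd.concat(parts, ignore_index=True)
    out[["store_nbr", "item_nbr"]] = out[["store_nbr", "item_nbr"]].astype(int)
    # Item 4: log-transform (log1p and clipping are assumptions), then one
    # z-score across all entities, with statistics from the training rows.
    out["log_sales"] = np.log1p(out["unit_sales"].clip(lower=0))
    train_rows = out["date"] <= TRAIN_END
    mean = out.loc[train_rows, "log_sales"].mean()
    std = out.loc[train_rows, "log_sales"].std()
    out["log_sales_z"] = (out["log_sales"] - mean) / std
    out = out.dropna()                      # item 5
    # Item 6: date-based splits (inclusive upper bounds are an assumption).
    return {
        "train": out[out["date"] <= TRAIN_END],
        "valid": out[(out["date"] > TRAIN_END) & (out["date"] <= VALID_END)],
        "test": out[(out["date"] > VALID_END) & (out["date"] <= TEST_END)],
    }


# Tiny made-up sample, only to exercise the functions.
sample_csv = io.StringIO(
    "date,store_nbr,item_nbr,unit_sales\n"
    "2015-11-28,1,100,2.0\n"
    "2015-11-30,1,100,5.0\n"
    "2015-12-03,1,100,1.0\n"
    "2014-06-01,1,100,9.0\n"
)
splits = preprocess(load_window(sample_csv, chunk_size=2))
print({name: len(part) for name, part in splits.items()})
```

Filtering each chunk before concatenating keeps peak memory close to the size of the study window rather than the full file. If the filtered window itself does not fit in memory, the per-entity resampling would need to be done out of core, which this sketch does not attempt.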

Hypothesis

We hypothesize that DA-RNN with transfer learning / a Transformer model with transfer learning

...

because of the incorporation of embeddings

Metrics

  • MSE (mean squared error)

  • RMSE (root mean squared error)
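For reference, the two metrics listed above can be computed as in the sketch below. The function names `mse` and `rmse` and the sample values are illustrative assumptions, not results from this experiment.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared forecast errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return float(np.sqrt(mse(y_true, y_pred)))


# Made-up values, only to exercise the functions.
actual = [3.0, 5.0, 2.0]
forecast = [2.0, 5.0, 4.0]
print(mse(actual, forecast), rmse(actual, forecast))
```

RMSE is expressed in the same units as the target, so it is usually the easier of the two to read. If the model is scored on normalized log sales, both metrics are in that transformed space unless the predictions are inverted first.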

Targeting

  • Where will this experiment run?

  • Who will see it?

  • What is the traffic allocation (% in total let in based on targeting)?

Variations

A: Control

B: Variation

C: Variation

Screenshot

% of visitors/users to see each variation

Pre-analysis

Add any baseline data or pre-analysis you have for this experiment. Add planned sample size and time to run.

Notes

📊 Results

Experiment start

Experiment end

Link to results in Optimizely

Conclusion

Inconclusive / Hypothesis proved

...

|                | A: Control | B: Variation | change                              | A: Control | C: Variation | change                              |
| Cohort size    |            |              |                                     |            |              |                                     |
| Primary metric |            |              | Δ= p-value= power= confidence=      |            |              | Δ= p-value= power= confidence=      |
| Other metrics  |            |              |                                     |            |              |                                     |

✨ Conclusions

Highlights

  • Primary goal

Takeaways

Follow-up