One challenge of modeling retail data is the need to make decisions based on limited history. Holidays and select major events come once a year, so there are few chances to observe their effect. In addition, markdowns are known to affect sales; the challenge is to predict sales.
Dataset Context
Historical sales data for 45 stores located in different regions - each store contains a number of departments.
The company also runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labor Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks.
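The five-times holiday weighting can be illustrated with a weighted error metric. This is a minimal sketch that assumes the evaluation uses a weighted mean absolute error; the function name `weighted_mae` and the sample numbers are hypothetical and not taken from the slides.

```python
def weighted_mae(y_true, y_pred, is_holiday, holiday_weight=5.0):
    """Mean absolute error where holiday weeks count `holiday_weight` times."""
    # Assumption: non-holiday weeks have weight 1, holiday weeks weight 5.
    weights = [holiday_weight if h else 1.0 for h in is_holiday]
    total_error = sum(w * abs(t - p) for w, t, p in zip(weights, y_true, y_pred))
    return total_error / sum(weights)


# Two weeks: the first is a holiday week and is off by 2, the second is exact.
score = weighted_mae([10.0, 20.0], [12.0, 20.0], [True, False])
```

With this weighting, an error in a holiday week moves the score five times as much as the same error in an ordinary week.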
Right Skewed
Ratio of holiday to non-holiday is 2 : 23
Missing Values : max 64%; Distribution : Skewed
Date Count : 1; Object Count : 1; Numerical Count : 10
Slightly Left Skewed
Object Count : 1; Numerical Count : 2
Then the store, sales & features datasets are merged.
The final dataset contains the intersection of all the datasets.
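Keeping only the intersection of the datasets corresponds to an inner join. The sketch below uses pandas with tiny stand-in tables; the column names (`Store`, `Dept`, `Date`, `Weekly_Sales`, `Temperature`, `Type`, `Size`) and the join keys are assumptions, since the slides do not list them.

```python
import pandas as pd

# Stand-in tables; all column names and values are illustrative assumptions.
sales = pd.DataFrame({
    "Store": [1, 1, 2],
    "Dept": [1, 1, 1],
    "Date": ["2010-02-05", "2010-02-12", "2010-02-05"],
    "Weekly_Sales": [24924.5, 46039.5, 35034.1],
})
features = pd.DataFrame({
    "Store": [1, 1, 3],
    "Date": ["2010-02-05", "2010-02-12", "2010-02-05"],
    "Temperature": [42.3, 38.5, 45.7],
})
stores = pd.DataFrame({"Store": [1, 2], "Type": ["A", "A"], "Size": [151315, 202307]})

# how="inner" keeps only rows whose keys appear in both tables being joined.
merged = (
    sales.merge(features, on=["Store", "Date"], how="inner")
         .merge(stores, on="Store", how="inner")
)
```

Store 2 has no matching feature rows and store 3 has no sales rows, so only the two rows for store 1 survive the join.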
Mostly Right Skewed & Multi-Modal
Missing Values : max 75%
Filled with ZERO
Filled with MEDIAN
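The two fill strategies can be sketched with pandas as follows. Which columns received zero and which received the median is not stated in the slides, so the column names `MarkDown1` and `CPI` and their assignment to a strategy are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical columns: one filled with zero, one filled with its median.
df = pd.DataFrame({
    "MarkDown1": [100.0, None, 250.0, None],
    "CPI": [211.0, None, 213.0, 215.0],
})

# Zero fill: treats a missing value as "nothing happened" (assumed for markdowns).
df["MarkDown1"] = df["MarkDown1"].fillna(0)

# Median fill: replaces a missing value with the column's typical value.
df["CPI"] = df["CPI"].fillna(df["CPI"].median())
```

The median is less sensitive to extreme values than the mean, which may be why it suits skewed columns.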
Date Count : 1; Object Count : 4; Numerical Count : 11
The OLS model gave an adjusted R² of only 8%, which was lower than expected. We then tested the data with some non-linear models and got about 90%, so we concluded that the data has some sort of non-linear relationship.
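For reference, adjusted R² penalizes R² for the number of predictors. The sketch below uses the standard formula; the helper names and the sample numbers are hypothetical and do not reproduce the 8% or 90% figures.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_features):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Illustrative values only: R2 = 0.5 with 11 samples and 5 predictors.
example = adjusted_r_squared(0.5, n_samples=11, n_features=5)
```

A low adjusted R² from a linear model alongside a much higher score from a non-linear model is consistent with the conclusion that the relationship is not linear.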
In retail data there are no outliers, only extreme values.
To identify an outlier, we should fit a best-fit line to the data and examine how far each data point lies from it.
Our data is influenced by time, so time series decomposition and transformation could be applied, but after analysis we found that the dates are inconsistent and about 1 year of data is missing.
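A quick check for missing periods is to look at the spacing between consecutive dates. The sketch below assumes weekly data and uses pandas; the function name `find_date_gaps` and the sample dates are hypothetical.

```python
import pandas as pd


def find_date_gaps(dates, expected_days=7):
    """Return (previous, next) pairs where consecutive dates are too far apart."""
    ordered = pd.Series(pd.to_datetime(dates)).sort_values().reset_index(drop=True)
    step = ordered.diff()  # first entry is NaT and never counts as a gap
    too_far = (step > pd.Timedelta(days=expected_days)).to_numpy()
    return [(ordered.iloc[i - 1], ordered.iloc[i]) for i in ordered.index[too_far]]


# Weekly dates with roughly one year missing between the last two entries.
gaps = find_date_gaps(["2010-02-05", "2010-02-12", "2010-02-19", "2011-02-18"])
```

A gap of this size breaks the regular spacing that seasonal decomposition typically relies on.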
Recursive Feature Elimination with Linear Regression
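Recursive elimination with a linear model can be sketched with scikit-learn's `RFE`. The synthetic data and the choice to keep two features are assumptions for illustration; the slides do not state how many features were retained at this step.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data: only features 0 and 3 actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(0, 0.1, size=200)

# RFE repeatedly fits the model and drops the feature with the smallest coefficient.
rfe = RFE(estimator=LinearRegression(), n_features_to_select=2)
rfe.fit(X, y)

selected = [i for i, keep in enumerate(rfe.support_) if keep]
```

Because the ranking relies on coefficient size, features should be on comparable scales before running this kind of elimination.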
Decision Tree Regression with 12 Features
Best Model : Decision Tree
1. Summary : Overfitted + 12 Features
2. Selection Criterion : RMSE
3. After analyzing the plot, we found two factors that affect the model, and used random search to find the best combination.
Using bagging (Random Forest), we get about 95% accuracy with an RMSE of about 6k.
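A random search over two hyperparameters, scored by RMSE, can be sketched with scikit-learn as follows. The slides do not name the two factors, so `n_estimators` and `max_depth`, the candidate values, and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic non-linear data standing in for the real sales table.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 3))
y = 10 * np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, size=200)

# Assumed search space: the two factors are hypothetical choices.
param_distributions = {
    "n_estimators": [20, 50, 100],
    "max_depth": [3, 6, None],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=4,                               # sample 4 of the 9 combinations
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, hence the sign
    cv=3,
    random_state=0,
)
search.fit(X, y)

best_params = search.best_params_
best_rmse = -search.best_score_  # flip the sign back to a plain RMSE
```

Cross-validated RMSE is used here rather than training RMSE, which is one way to guard against the overfitting noted for the single decision tree.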
After working with this dataset, we feel that some data is missing and that the description of the data is not properly documented. Next time we could look for more data and find more relevant trends.
The data quality is also not up to the mark: the data is inconsistent and up to 75% of the values are missing. Next time we could look for more data instead of relying on imputation.