
Weekly Sales Predictions

Walmart Dataset

Abhinav Dharmadhikari

Anish S Iyer

Pramod Govindarjan

Sangita Yemulwar

Rahul Rathi

Introduction

Retail Industry

Walmart

Analytics

Objective

Our motivation

Business

Problem

Weekly Sales Prediction

One challenge of modeling retail data is the need to make decisions based on limited history. Holidays and select major events come only once a year, and so does the chance to see how strategic decisions affected the bottom line. In addition, markdowns are known to affect sales; the challenge is to predict weekly sales while accounting for these effects.

Effects of Analytics

Data

“If we have data, let’s look at data. If all we have are opinions, let’s go with mine.” — Jim Barksdale

Data

Dataset Context

Dataset

Historical sales data for 45 stores located in different regions; each store contains a number of departments.

The company also runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labor Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks.
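The holiday weighting can be made concrete with a weighted error metric. The sketch below assumes a weighted mean absolute error (WMAE), the form used in the original Kaggle Walmart competition, with weight 5 for holiday weeks and 1 otherwise; the function name `wmae` is hypothetical.

```python
import numpy as np

def wmae(y_true, y_pred, is_holiday, holiday_weight=5.0):
    """Weighted mean absolute error: holiday weeks count `holiday_weight` times more."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Assumed scheme: weight 5 for holiday weeks, 1 for ordinary weeks.
    weights = np.where(np.asarray(is_holiday, dtype=bool), holiday_weight, 1.0)
    return float(np.sum(weights * np.abs(y_true - y_pred)) / np.sum(weights))

# Two ordinary weeks predicted exactly, one holiday week off by 14.
print(wmae([100, 200, 300], [100, 200, 314], [False, False, True]))
# → 10.0 (an unweighted MAE would be about 4.67)
```

A single missed holiday week therefore moves the score much more than the same miss in an ordinary week.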

Sales Dataset

Data Properties

  • Dataset Shape ( ~400k , 5 )
  • Granularity : agg_by(store , dept , week)

Data Distribution

Weekly Sales

Right Skewed

Is_Holiday

Ratio of holiday to non-holiday is 2 : 23

Feature Dataset

Data Properties

  • Dataset Shape ( 8k , 12 )
  • Granularity : agg_by(Store , Date)

Data Distribution

Markdown

Missing Values : max 64%

Distribution : Skewed

Features

Date Count : 1

Object Count : 1

Numerical Count : 10

Store Dataset

Stores Dataset

Data Properties

  • Dataset Shape ( 45 , 3 )
  • Granularity : agg_by(store , type)

Data Distribution

Size

Slightly Left Skewed

Features

Object Count : 1

Numerical Count : 2

The sales and features datasets were merged first, and then the store dataset was merged in.

The final dataset contains the intersection of all three datasets, i.e. only the rows present in each of them.
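The merge described above can be sketched with pandas inner joins, which keep only keys present in both tables (the intersection). This assumes the Kaggle column names `Store`, `Dept`, `Date`, `Weekly_Sales` and `IsHoliday`, and that sales join to features on `Store`, `Date` and `IsHoliday`; the tiny frames are made-up placeholders, not the real data.

```python
import pandas as pd

# Toy stand-ins for the three source tables (made-up values, Kaggle-style column names).
sales = pd.DataFrame({
    "Store": [1, 1, 2],
    "Dept": [1, 2, 1],
    "Date": pd.to_datetime(["2010-02-05"] * 3),
    "Weekly_Sales": [100.0, 200.0, 300.0],
    "IsHoliday": [False, False, False],
})
features = pd.DataFrame({
    "Store": [1, 2, 3],
    "Date": pd.to_datetime(["2010-02-05"] * 3),
    "IsHoliday": [False, False, False],
    "Temperature": [40.0, 45.0, 50.0],
})
stores = pd.DataFrame({
    "Store": [1, 2, 3],
    "Type": ["A", "B", "A"],
    "Size": [150000, 120000, 90000],
})

# Inner joins keep only keys found in both tables, i.e. the intersection.
merged = (
    sales.merge(features, on=["Store", "Date", "IsHoliday"], how="inner")
         .merge(stores, on="Store", how="inner")
)
print(merged.shape)  # → (3, 8); store 3 has no sales rows, so it drops out
```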

Merged All Three Datasets

Merged Dataset

Final Dataset

Data Properties

  • Dataset Shape ( ~400k , 16 )
  • Granularity : agg_by(Store , Dept , Date)

Data Distribution

Distribution : Mostly Right Skewed & Multi-Modal

Missing Values : max 75%

Imputation Strategy

Filled with ZERO

Filled with MEDIAN
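The slides do not say which columns received which fill value. The sketch below assumes a common split: markdown columns, where a missing value plausibly means no promotion ran, are filled with zero, and the remaining numeric columns (here `CPI` and `Unemployment`) with their median. The column names and values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "MarkDown1": [np.nan, 500.0, np.nan, 120.0],
    "CPI": [211.0, np.nan, 212.0, 213.0],
    "Unemployment": [8.1, 8.1, np.nan, 7.9],
})

markdown_cols = [c for c in df.columns if c.startswith("MarkDown")]
other_numeric = [c for c in df.select_dtypes("number").columns if c not in markdown_cols]

# Missing markdown: assume no promotion that week, so fill with 0.
df[markdown_cols] = df[markdown_cols].fillna(0)
# Other gaps: column median, which is robust to skewed distributions.
df[other_numeric] = df[other_numeric].fillna(df[other_numeric].median())

print(df.isna().sum().sum())  # → 0
```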

Features

Date Count : 1

Object Count : 4

Numerical Count : 11

EDA

Insights

Weekly Sales

Time Series Analysis

Data Prep

OLS Model Summary

The OLS model produced an adjusted R² of only 8%, far lower than expected. We then tested the data with some non-linear models and obtained about 90%, so we concluded that the relationship in the data is largely non-linear.
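For reference, adjusted R² penalizes R² for the number of predictors p given n observations: 1 − (1 − R²)(n − 1)/(n − p − 1). With roughly 400k rows the penalty is negligible, so an adjusted R² of 8% means the plain R² is also about 8%; the low score comes from the linear model itself, not from the adjustment. The feature count of 15 below is an illustrative assumption.

```python
def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need more samples than features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

print(adjusted_r2(0.5, 101, 10))        # ≈ 0.444: small n, visible penalty
print(adjusted_r2(0.08, 400_000, 15))   # ≈ 0.080: penalty is negligible at this n
```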

Outlier Treatment

In retail data there are no true outliers; they are just extreme values.

To identify outliers, we should fit a best-fit line to the data and analyze how far each data point lies from it.
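A minimal sketch of the best-fit-line check, assuming an ordinary least-squares line from `numpy.polyfit` and a hypothetical cutoff of three residual standard deviations; the helper name `flag_extreme_points` is made up. Consistent with treating such points as extreme values, the function only flags them for inspection and deletes nothing.

```python
import numpy as np

def flag_extreme_points(x, y, k=3.0):
    """Mark points whose distance from a least-squares line exceeds k residual std devs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)   # best-fit line
    residuals = y - (slope * x + intercept)       # vertical distance to the line
    return np.abs(residuals) > k * residuals.std()

x = np.arange(20)
y = 2 * x + 1.0
y[10] += 100                      # one extreme week
mask = flag_extreme_points(x, y)
print(np.flatnonzero(mask))       # → [10]
```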

Transformations

Our data is influenced by time, so time series decomposition and transformations could be applied. However, after analysis we found that the dates are inconsistent and about one year of data is missing.
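The date inconsistency can be located by comparing the observed dates with a complete weekly calendar. A sketch, assuming one observation per week, always on the same weekday; `missing_weeks` is a hypothetical helper and the example dates are illustrative.

```python
import pandas as pd

def missing_weeks(dates: pd.Series) -> pd.DatetimeIndex:
    """Return the weekly dates absent between the first and last observed date."""
    observed = pd.DatetimeIndex(pd.to_datetime(dates).unique()).sort_values()
    # Complete weekly calendar anchored on the first observed date.
    full = pd.date_range(observed[0], observed[-1], freq="7D")
    return full.difference(observed)

gaps = missing_weeks(pd.Series(["2010-02-05", "2010-02-12", "2010-02-26"]))
print(gaps.strftime("%Y-%m-%d").tolist())  # → ['2010-02-19']
```

Run on the full `Date` column, a long contiguous run in the result would correspond to the missing year.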

Modeling

Approach

Hyper-Parameter Tuning

LS

Base Models

Approach

Cleaning

Optimization & Deployments

Feature Engineering & Selection

BASE MODEL INSIGHTS

Base Models

Feature Selection Insights

Feature Selection Summary

Best Method and Decision Function

Recursive Feature Elimination with Linear Regression
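Recursive feature elimination (RFE) fits an estimator, ranks features by the magnitude of its coefficients, drops the weakest, and repeats until the requested number remain. A minimal sketch with scikit-learn's `RFE` wrapped around `LinearRegression`, on synthetic data; selecting 12 features is an assumption chosen to match the 12-feature model reported below, and the 15 candidate columns are made up.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 15))                 # 15 synthetic candidate features
coef = np.r_[np.arange(1, 13), np.zeros(3)]    # only the first 12 carry signal
y = X @ coef + rng.normal(scale=0.1, size=500)

# Remove one feature per round until 12 remain.
selector = RFE(LinearRegression(), n_features_to_select=12, step=1)
selector.fit(X, y)
print(np.flatnonzero(selector.support_))       # indices of the kept features
```

On this toy data the selector should keep the first 12 columns. Because the ranking uses raw coefficients, features should be put on comparable scales (for example with `StandardScaler`) before running RFE on real data.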

Best Model

Decision Tree Regression with 12 Features

Best Model

Best Model : Decision Tree

1. Summary : Overfitted + 12 Features
2. Selection Criterion : RMSE
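Using RMSE as the selection criterion amounts to scoring each candidate's predictions and keeping the lowest. A minimal sketch; the model names and prediction values are placeholders, not the project's results.

```python
import numpy as np

def rmse(y_true, y_pred) -> float:
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

y_val = [100.0, 200.0, 300.0]
candidates = {
    "linear_regression": [130.0, 170.0, 330.0],  # every error is 30
    "decision_tree": [110.0, 190.0, 310.0],      # every error is 10
}
scores = {name: rmse(y_val, pred) for name, pred in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores[best])  # → decision_tree 10.0
```

RMSE should be computed on held-out validation data rather than the training set: an overfitted tree can reach a near-zero training RMSE while generalizing poorly.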

Hyper-Parameter Optimization

After analyzing the plot, we found two factors that affect the model's performance, and we used random search to find the best combination of them.
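A sketch of random search with scikit-learn's `RandomizedSearchCV`, assuming (hypothetically) that the two influential factors are the tree's `max_depth` and `min_samples_leaf`; the slides do not name them. Synthetic non-linear data stand in for the merged dataset.

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic non-linear data standing in for the merged sales table.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(400, 3))
y = 10 * np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.5, size=400)

# Hypothetical "two factors": tree depth and minimum leaf size.
param_distributions = {
    "max_depth": list(range(2, 20)),
    "min_samples_leaf": list(range(1, 50)),
}
search = RandomizedSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=20,                              # random combinations tried
    scoring="neg_root_mean_squared_error",  # sklearn maximizes, so RMSE is negated
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)  # best combination and its CV RMSE
```

Random search samples a fixed number of combinations instead of trying the full grid (18 × 49 here), which keeps tuning cheap on a large table.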

Bagging & Hyper-Parameter Optimization

Using bagging with a Random Forest, we obtained about 95% accuracy with an RMSE of about 6k.
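A random forest bags decision trees: each tree is trained on a bootstrap sample of the rows (optionally also considering a random subset of features at each split via `max_features`), and their predictions are averaged, which reduces the variance that makes a single tree overfit. A sketch with scikit-learn's `RandomForestRegressor` on synthetic data; the hyperparameter values are illustrative, and R² is shown as one possible reading of the slides' "accuracy".

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(1000, 3))
y = 10 * np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.5, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

forest = RandomForestRegressor(
    n_estimators=100,  # number of bagged trees
    bootstrap=True,    # each tree sees a bootstrap sample of the rows
    random_state=0,
)
forest.fit(X_train, y_train)
pred = forest.predict(X_test)

rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
r2 = r2_score(y_test, pred)
print(f"RMSE={rmse:.2f}  R2={r2:.3f}")
```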

Conclusion

Final Comments

Business Effects

  • Smarter tech to inform decision-making
  • A streamlined supply chain
  • Continued focus on mobile
  • Customer relations
  • Recruitment practices
  • Revenue Optimization

Data Gathering & Quality

After working with this dataset, we feel that some data is missing and that the data is not properly documented. Next time, we could gather more data to use and find more relevant trends.

The data quality is also not up to the mark: the data is inconsistent, and up to 75% of the values in some columns are missing. Next time, we could gather more data instead of relying on imputation.

Improvements Possible

  • Data quality could be better.
  • More features could be gathered.
  • Clustering and cluster analysis could be used for discovery.
  • Time series analysis could be used, which might reveal hidden patterns.
  • Time series forecasting could be used instead of regression, which might give better results.
  • The given data does not show the business cycle; as a next improvement, a domain expert could provide insight into the business cycle.