GL-Capstone

Rahul Rathi

Updated Nov. 29, 2019

Transcript

Weekly Sales Predictions

Walmart Dataset

Abhinav Dharmadhikari

Anish S Iyer

Pramod Govindarjan

Sangita Yemulwar

Rahul Rathi

Introduction

Retail Industry

Walmart

Analytics

Objective

Our motivation

Business

Problem

Weekly Sales Prediction

One challenge of modeling retail data is the need to make decisions based on limited history. Holidays and select major events come once a year, and so does the chance to see how decisions around them affect sales. In addition, markdowns are known to affect sales. The challenge is to predict weekly sales.

Effects of Analytics

Data

“If we have data, let’s look at data. If all we have are opinions, let’s go with mine.” — Jim Barksdale

Data

Dataset Context

Dataset

Historical sales data for 45 stores located in different regions - each store contains a number of departments.

The company also runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labor Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks.
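The five-fold weighting of holiday weeks can be illustrated with a weighted error metric. The sketch below assumes the evaluation uses a weighted mean absolute error; the slide states only the weighting, so the choice of absolute error, the function name `weighted_mae`, and the toy numbers are all assumptions made for illustration.

```python
def weighted_mae(actual, predicted, is_holiday, holiday_weight=5.0):
    """Weighted mean absolute error; holiday weeks count holiday_weight times."""
    if not actual or not (len(actual) == len(predicted) == len(is_holiday)):
        raise ValueError("inputs must be non-empty and of equal length")
    # Weight 5 for holiday weeks, 1 for all other weeks.
    weights = [holiday_weight if h else 1.0 for h in is_holiday]
    total_error = sum(w * abs(a - p) for w, a, p in zip(weights, actual, predicted))
    return total_error / sum(weights)


# Toy numbers, not taken from the dataset.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
is_holiday = [False, False, True]
score = weighted_mae(actual, predicted, is_holiday)
```

With these weights, one badly predicted holiday week moves the score as much as five equally bad ordinary weeks.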

Sales Dataset

Data Properties

Dataset Shape ( ~4 lakh (about 400,000) , 5 )
Granularity : agg_by(store , dept , week)

Data Distribution

Weekly Sales

Right Skewed

Is_Holiday

Ratio of holiday to non-holiday is 2 : 23

Feature Dataset

Data Properties

Dataset Shape ( 8k , 12 )
Granularity : agg_by(Store , Date)

Data Distribution

Markdown

Missing Values : max 64%

Distribution : Skewed

Features

Date Count : 1

Object Count : 1

Numerical Count : 10

Store Dataset

Stores Dataset

Data Properties

Dataset Shape ( 45 , 3 )
Granularity : agg_by(store , type)

Data Distribution

Size

slightly left skewed

Features

Object Count : 1

Numerical Count : 2

Then

store

dataset merged

sales

features

The final dataset contains the intersection of the data in all three datasets.

Merged All Three Datasets
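Keeping only the intersection of the datasets corresponds to an inner join. The sketch below shows the idea in plain Python, assuming sales joins to features on store and date and to stores on store. The helper `inner_join`, the column names, and every value are hypothetical and used only for illustration.

```python
def inner_join(left, right, keys):
    """Inner-join two lists of dict rows on the given key columns."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    joined = []
    for row in left:
        # Left rows without a partner on the right are dropped.
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**match, **row})
    return joined


# Made-up rows; store 3 has no feature row, so it drops out of the result.
sales = [
    {"Store": 1, "Dept": 1, "Date": "2010-02-05", "Weekly_Sales": 1000.0},
    {"Store": 2, "Dept": 1, "Date": "2010-02-05", "Weekly_Sales": 2000.0},
    {"Store": 3, "Dept": 1, "Date": "2010-02-05", "Weekly_Sales": 3000.0},
]
features = [
    {"Store": 1, "Date": "2010-02-05", "Temperature": 40.0},
    {"Store": 2, "Date": "2010-02-05", "Temperature": 45.0},
]
stores = [
    {"Store": 1, "Type": "A", "Size": 150000},
    {"Store": 2, "Type": "B", "Size": 100000},
    {"Store": 3, "Type": "C", "Size": 40000},
]

merged = inner_join(inner_join(sales, features, ["Store", "Date"]), stores, ["Store"])
```

An inner join silently discards unmatched rows, so it is worth comparing row counts before and after the merge.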

Merged Dataset

Final Dataset

Data Properties

Dataset Shape ( ~4 lakh (about 400,000) , 16 )
Granularity : agg_by(Store , Dept , Date)

Data Distribution

Distribution :

Mostly Right Skewed & Multi-Modal

Missing Values : max 75%

Imputation Strategy

Filled with ZERO

Filled with MEDIAN
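The two imputation strategies can be sketched as follows. The slides do not say which columns received which strategy, so the column names below (`markdown`, `cpi`), the helper `fill_missing`, and the values are assumptions chosen only to illustrate zero-fill versus median-fill.

```python
from statistics import median


def fill_missing(values, strategy):
    """Replace None entries with 0 or with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "zero":
        fill = 0.0
    elif strategy == "median":
        if not observed:
            raise ValueError("cannot take the median of an all-missing column")
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


# Toy columns, not taken from the dataset.
markdown = [None, 500.0, None, 1200.0]
cpi = [211.0, None, 213.0, 215.0]
markdown_filled = fill_missing(markdown, "zero")
cpi_filled = fill_missing(cpi, "median")
```

Zero-fill treats a missing value as "no markdown that week", while median-fill keeps a skewed column's typical level without being pulled by extreme values.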

Features

Date Count : 1

Object Count : 4

Numerical Count : 11

EDA

Insights

Weekly Sales

Time Series Analysis

Data Prep

OLS Model Summary

The OLS model produced an adjusted R² of 8%, which was lower than expected. We then tested the data with some non-linear models and obtained about 90%, so we concluded that the data has some form of non-linear relationship.
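For reference, adjusted R² penalizes plain R² for the number of predictors. The sketch below uses the standard formula; the function name `adjusted_r2` and the example numbers are illustrative and are not the values from the model summary.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_features - 1
    if dof <= 0:
        raise ValueError("need more samples than features plus one")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Illustrative numbers only: R^2 = 0.5 with 11 samples and 2 features.
example = adjusted_r2(0.5, n_samples=11, n_features=2)
```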

Outlier Treatment

In retail data there are no outliers, only extreme values.

To identify outliers, we should fit a best-fit line to the data and analyze the data points.

Transformations

Our data is influenced by time, so time series decomposition and transformations could be applied. However, after analysis we found that the dates are inconsistent and about one year is missing.
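One way to surface gaps like this is to walk the calendar week by week and list the dates that never appear. The sketch below assumes all observations fall on the same weekday. The helper `missing_weeks` and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def missing_weeks(observed_dates):
    """Return the weekly dates absent between the first and last observed date."""
    observed = set(observed_dates)
    current, last = min(observed), max(observed)
    gaps = []
    while current <= last:
        if current not in observed:
            gaps.append(current)
        current += timedelta(weeks=1)
    return gaps


# Toy dates: two weeks are missing between Feb 12 and Mar 5.
weeks = [date(2010, 2, 5), date(2010, 2, 12), date(2010, 3, 5)]
gaps = missing_weeks(weeks)
```

Running such a check per store and department shows whether a series is complete enough for decomposition.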

Modeling

Approach

Hyper-Parameter

Tuning

LS

Base Models

Approach

Cleaning

Optimization & Deployments

Feature Engineering

&

Selection

BASE MODEL

INSIGHTS

Base Models

Feature Selection Insights

Feature

Selection

Summary

Best Method and Decision Function

Recursive Elimination with Linear Regression

Best Model

Decision Tree Regression with 12 Features

Best Model

Best Model : Decision Tree

Summary : Overfitted + 12 Features

Selection Criterion : RMSE
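Since RMSE is the selection criterion and the best model is described as overfitted, comparing RMSE on training data against held-out data is a simple check. The sketch below is a minimal illustration with made-up numbers; the split into train and test sets is an assumption and is not described in the slides.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Toy values: a perfect fit on training data but errors on held-out data.
train_rmse = rmse([10.0, 20.0, 30.0], [10.0, 20.0, 30.0])
test_rmse = rmse([10.0, 20.0, 30.0, 40.0], [13.0, 16.0, 30.0, 40.0])
```

A training RMSE near zero combined with a much larger test RMSE is the usual signature of an overfitted tree.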

Hyper - Parameter Optimization

After analyzing the plot, we found two hyper-parameters that affect the results, and we used random search to find the best combination.
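Random search over two hyper-parameters can be sketched as below. The slides do not name the two parameters, so `max_depth` and `min_samples_leaf` are assumed stand-ins, and `toy_objective` is a hypothetical replacement for the validation RMSE of a real model.

```python
import random


def random_search(objective, space, n_iter, seed=0):
    """Sample n_iter random combinations and keep the lowest-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_iter):
        params = {name: rng.choice(options) for name, options in space.items()}
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(max_depth, min_samples_leaf):
    # Hypothetical error surface; a real run would train and validate a model.
    return (max_depth - 8) ** 2 + (min_samples_leaf - 5) ** 2


space = {"max_depth": list(range(2, 21)), "min_samples_leaf": list(range(1, 11))}
best_params, best_score = random_search(toy_objective, space, n_iter=50)
```

Unlike a full grid search, the number of model fits here is fixed by `n_iter` rather than by the size of the search space.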

Bagging & Hyper - Parameter Optimization

Using bagging (Random Forest), we get about 95% accuracy with an RMSE of about 6k.

Conclusion

Final Commits

Business Effects

Smarter tech to inform decision-making
A streamlined supply chain
Continued focus on mobile
Customer relations
Recruitment practices
Revenue Optimization

Data Gathering & Quality

After working with this dataset, we feel that some data is missing and that the description of the data is not properly documented. Next time we could gather more data to use and find more relevant trends.

The data quality is also not up to the mark. The data is inconsistent and about 75% of it is missing. Next time we could gather more data instead of relying on imputation.

Improvements Possible

Data quality could be better.
More features could be gathered.
We can use clustering and cluster analysis for discovery.
Time series analysis could be used, which might reveal hidden patterns.
Time series forecasting could be used instead of regression, which might give better results.
The given data does not show the business cycle. As a next improvement, a domain expert could provide information about the business cycle.
