**United States Domestic Passenger Enplanements**

Forecasting with ARIMA Model


2. ARIMA Model

An AutoRegressive Integrated Moving Average (ARIMA) model predicts future values of a time series as a linear combination of its past values and past errors (also known as random shocks or innovations).

3. Data Resources

The data set comes from the Bureau of Transportation Statistics.

We use the U.S. Air Carrier Traffic Statistics series Domestic Passenger – Passenger Enplanements, covering January 1996 to August 2016 (248 monthly observations in total).

Source:

http://www.rita.dot.gov/bts/acts/customized/table?adfy=1996&adfm=1&adty=2016&adtm=8&aos=0&artd=1&arti&arts&asts&astns&astt=3&ascc&ascp=1

4. Analysis

1. Introduction

This project aims to predict the number of United States domestic passenger enplanements with an ARIMA model.

With this model and a sufficient amount of data, passenger numbers can be forecast, and an airline can arrange and adjust its routes accordingly.

Each airline can also adjust the size of its fleet to the expected number of passengers and make better-informed investment decisions.

As a result, business decisions become more efficient, which can lead to higher profits.

6. Diagnosis

7. Conclusion

With this model and a sufficient amount of data, passenger numbers can be forecast, and an airline can arrange and adjust its routes accordingly.

Each airline can also adjust the size of its fleet to the expected number of passengers and make better-informed investment decisions.

As a result, business decisions become more efficient, which can lead to higher profits.

Diagnostic Plots

Contents

1. Introduction

2. ARIMA Model

3. Data Resources

4. Analysis

5. Forecast & Test

6. Diagnosis

7. Conclusion

**Group 5:**

Chuhan Hong, Jiahua Gu, Runtian Liu


3. Data Modification

Take the log of the series (log linearity), then compute the first difference to smooth it.
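The transformation step can be sketched as follows. This is a minimal illustration that uses R's built-in `AirPassengers` series as a stand-in (an assumption, since the BTS CSV is not reproduced here); it also shows that the first difference of the log equals the log of one plus the month-over-month growth rate.

```r
# Stand-in data: built-in AirPassengers, NOT the BTS enplanement series.
y <- AirPassengers

ylog  <- log(y)                        # log transform stabilizes the variance
ydiff <- diff(ylog, differences = 1)   # first difference removes the trend

# Month-over-month growth rate of the original series.
growth <- y[-1] / y[-length(y)] - 1

# diff(log(y)) is log(y_t / y_{t-1}) = log(1 + growth), up to rounding error.
gap <- max(abs(as.numeric(ydiff) - log1p(growth)))
print(gap)
```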


4. Check ACF & PACF

Both charts show significant spikes at lags 12, 24, ..., which indicates a seasonal pattern with a period of 12 months. We therefore used a seasonal ARIMA model.
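One way to check the seasonal spikes numerically rather than by eye is to compare the autocorrelations at lags 12 and 24 with the approximate 95% significance bound. The sketch below uses the built-in `AirPassengers` series as a stand-in for the enplanement data (an assumption); the variable names are illustrative.

```r
# Stand-in data: built-in AirPassengers, NOT the BTS enplanement series.
y <- diff(log(AirPassengers))

a <- acf(y, lag.max = 30, plot = FALSE)

# For a frequency-12 series, acf() reports lags in years, so convert to months.
lags_months <- round(a$lag[, 1, 1] * 12)

# Approximate 95% bound for a white-noise series of this length.
bound <- qnorm(0.975) / sqrt(length(y))

seasonal <- a$acf[lags_months %in% c(12, 24), 1, 1]
names(seasonal) <- c("lag12", "lag24")

print(seasonal)
print(abs(seasonal) > bound)   # TRUE marks a significant seasonal spike
```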

First, the standardized residuals show no volatility clustering.

Second, the ACF plot of the residuals shows no significant autocorrelation.

Third, the p-values for the Ljung-Box statistic are relatively large, so there is no evidence of remaining autocorrelation in the residuals.

Therefore, the model is adequate for forecasting.
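The Ljung-Box check can also be run directly with `Box.test()` from base R. The sketch below fits the same model order to the built-in `AirPassengers` series as a stand-in (an assumption); the choice of 24 lags is illustrative, and `fitdf = 3` accounts for the three estimated AR coefficients. A large p-value means there is no evidence of residual autocorrelation.

```r
# Stand-in data: built-in AirPassengers, NOT the BTS enplanement series.
y <- log(AirPassengers)

fit <- arima(y, order = c(2, 1, 0),
             seasonal = list(order = c(1, 0, 0), period = 12),
             method = "ML")

res <- residuals(fit)

# Ljung-Box test on the residuals; degrees of freedom = lag - fitdf.
lb <- Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 3)
print(lb$statistic)
print(lb$p.value)
```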

Diagnostic plot panels: standardized residuals, ACF of residuals, and p-values for the Ljung-Box statistic.

Find the best ARIMA model

And we have:

$Y \sim \mathrm{ARIMA}(2, 1, 0) \times (1, 0, 0)_{12}$
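Written out with the backshift operator $B$ (where $B Y_t = Y_{t-1}$), this model corresponds to the equation below. The symbols $\phi_1, \phi_2, \Phi_1$ and $\varepsilon_t$ are notation assumed here, and $Y_t$ is taken to be the log of monthly enplanements, since the appendix code fits the model to `totallog`.

$$
(1 - \phi_1 B - \phi_2 B^2)\,(1 - \Phi_1 B^{12})\,(1 - B)\,Y_t = \varepsilon_t
$$

The factor $(1 - B)$ is the single non-seasonal difference, $(1 - \phi_1 B - \phi_2 B^2)$ is the non-seasonal AR(2) part, and $(1 - \Phi_1 B^{12})$ is the seasonal AR(1) part with period 12.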

5. Forecast & Test

We used the first 240 observations to build the model and held out the last 8 to test its accuracy.
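A hold-out evaluation of this kind can be sketched as follows. The example uses the built-in `AirPassengers` series as a stand-in for the enplanement data (an assumption) and base R's `predict()` instead of the forecast package; the error measure is the mean absolute percentage error on the original scale.

```r
# Stand-in data: built-in AirPassengers, NOT the BTS enplanement series.
y <- AirPassengers
n <- length(y)
h <- 8                                   # hold-out size, as in the slides

# Training set: all but the last h observations, on the log scale.
train <- ts(log(y)[1:(n - h)], start = start(y), frequency = 12)
test  <- as.numeric(y)[(n - h + 1):n]

fit <- arima(train, order = c(2, 1, 0),
             seasonal = list(order = c(1, 0, 0), period = 12),
             method = "ML")

# Forecast h steps ahead and back-transform from the log scale.
pred <- exp(as.numeric(predict(fit, n.ahead = h)$pred))

ape  <- abs((test - pred) / test)        # absolute percentage errors
mape <- mean(ape)
print(round(100 * mape, 2))
```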

We then converted the data set into a time-series object.

We plotted the series to check its trend.

4. Analysis

The differenced log series looks almost smooth and steady.

We used R to compute the parameters of the best ARIMA model.

Appendix:

#Final Project

#Group 5

#Group Member: Jiahua Gu, Chuhan Hong, Runtian Liu

#Prof: Bahaeddine Taofuik

#Date: Nov,18th

#Sources from:

#http://www.rita.dot.gov/bts/acts/customized/table?adfy=1996&adfm=1&adty=2016&adtm=8&aos=0&artd=1&arti&arts&asts&astns&astt=3&ascc&ascp=1

library(forecast) # provides auto.arima() and forecast()

data.x <- read.csv("AirCarrierTrafficStatistics.csv", header=TRUE, stringsAsFactors = FALSE) # Input data

data.x$Total <- as.numeric(gsub(",","",data.x$Total))

data.test <- data.x[241:248,2]

data.model <- data.x[1:240,2]

num <- ts(data.model,start=1,frequency=12)

plot.ts(num, las=TRUE, col="black", main="passenger enplanements",xlab="Time Series", ylab="number")

dev.off()

totallog <- log(num)

totaldiff <- diff(totallog, differences=1)

plot.ts(totaldiff,las=TRUE,xlab="time", ylab="difference")

acf(totaldiff, lag.max=30)

acf(totaldiff, lag.max=30,plot=FALSE)

pacf(totaldiff, lag.max=30)

pacf(totaldiff, lag.max=30,plot=FALSE)

auto.arima(totallog,trace=T)

totalarima1 <- arima(totallog,order=c(2,1,0),seasonal=list(order=c(1,0,0),period=12),method="ML")

totalarima1

totalforecast <- forecast(totalarima1,h=8,level=c(99.5)) # generic dispatches to the Arima method; forecast.Arima() is not exported in recent forecast versions

totalforecast

par(mfrow=c(2,1))

plot(totalforecast)

plot(log(data.x$Total),type="l",xlab="time",ylab="log(passenger enplanements)")

# diagnose

tsdiag(totalarima1)

dev.off()

par(mfrow=c(2,1))

data.test-exp(totalforecast$mean)

plot(data.test-exp(totalforecast$mean),ylab="error")

plot(abs((data.test-exp(totalforecast$mean))/data.test),ylab="absolute percentage error")

Thank you!

Advantage and Disadvantage

The general form for the ARIMA model is:
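In backshift-operator notation, a standard way to write an ARIMA(p, d, q) model is shown below. The symbols are assumptions, since the deck does not define them: $y_t$ is the series, $\varepsilon_t$ is a white-noise error, and $B$ is the backshift operator with $B y_t = y_{t-1}$.

$$
\phi(B)\,(1 - B)^d\,y_t = \theta(B)\,\varepsilon_t
$$

$$
\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p, \qquad
\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^q
$$

The sign convention for $\theta(B)$ follows the one used by R's `arima()`; some textbooks write the moving-average terms with minus signs instead.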

Use the Specify ARIMA Model dialog to set the three orders that can be specified for an ARIMA model:

1. The Autoregressive Order is the order (p) of the autoregressive polynomial operator.

2. The Differencing Order is the order (d) of the differencing operator.

3. The Moving Average Order is the order (q) of the moving average polynomial operator.

An ARIMA model is commonly denoted ARIMA(p, d, q). If any of p, d, or q is zero, the corresponding letters are often dropped. For example, if p and d are zero, the model is denoted MA(q).