
Exploring Boston Housing Data

Prepared by:

Sarah Cummings, Sriram Yarlagadda, Haifa Alsunaid

Introduction

Dataset

  • Boston-area housing data, published in 1978
  • Obtained from the UCI Machine Learning Repository

Our Research Questions

  • How do the variables provided in the dataset affect the median value of homes in Boston towns?
  • Which variables affect the median value of homes the most?

  • We also formed several hypotheses about how each independent variable relates to the dependent variable (median home value).

Pre-processing

Residuals Analysis

Normality

Based on the histogram, the residuals appear normally distributed

Homoscedasticity

  • The residuals show no signs of heteroscedasticity
  • No major curvature in the residual plot

Multicollinearity:

We used VIF values to detect multicollinearity, with a threshold of 7. No VIF values exceeded the threshold.
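
As a rough illustration of the VIF check, the sketch below regresses each predictor on the remaining ones and reports 1 / (1 - R²). It is not the code used for this project: the function name `vif` and the synthetic predictors are assumptions made for the example, and only the threshold of 7 comes from the slides.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n rows, p predictors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress predictor j on an intercept plus all other predictors.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

# Synthetic predictors (NOT the Boston data): x3 is nearly a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)

THRESHOLD = 7  # cutoff quoted in the slides
independent = vif(np.column_stack([x1, x2]))
collinear = vif(np.column_stack([x1, x2, x3]))
print("independent predictors:", independent)
print("with a near-duplicate: ", collinear)
```

In this toy setup the two unrelated predictors stay close to 1, while the near-duplicate pair lands far above the threshold.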

Outliers (MSE = 3.7)

Most (>95%) of the points fall within ±2*MSE

Very few points fall beyond ±3*MSE
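
The band check described above can be sketched as a simple count of residuals inside ±k times a scale value. The helper `share_within` and the synthetic residuals are assumptions made for illustration; only the value 3.7 is taken from the slides, and it is used here purely as the band scale.

```python
import numpy as np

def share_within(residuals, scale, k):
    """Fraction of residuals whose absolute value is at most k * scale."""
    residuals = np.asarray(residuals, dtype=float)
    return float(np.mean(np.abs(residuals) <= k * scale))

# Synthetic, roughly normal residuals (NOT the Boston data).
rng = np.random.default_rng(1)
scale = 3.7  # value quoted in the slides, used here as the band scale
residuals = rng.normal(loc=0.0, scale=scale, size=500)

within_2 = share_within(residuals, scale, 2)
within_3 = share_within(residuals, scale, 3)
print(f"within 2*scale: {within_2:.3f}")
print(f"within 3*scale: {within_3:.3f}")
```

Points outside the wider band are the ones worth inspecting individually as possible outliers.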

Transformation

Final Model

Key Takeaways:

  • Our model satisfies all the regression assumptions
  • The model also answers our research question by identifying which variables most significantly affect median home values.
  • The model was selected using stepwise regression with the AIC criterion
  • All the terms in the final model are significant
  • F-test is significant with p-value < 2.2e-16
  • Adjusted R-squared of 0.87
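
To make the stepwise selection concrete, here is a minimal sketch of forward selection driven by AIC. The slides do not say which direction the stepwise search took, so forward selection is an assumption, as are the function names and the synthetic data. AIC is computed up to an additive constant as n·ln(RSS/n) + 2k, where k counts the fitted coefficients.

```python
import numpy as np

def aic(columns, y):
    """Gaussian AIC (up to a constant) for OLS on an intercept plus the given columns."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + list(columns))
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ coef) ** 2)
    return n * np.log(rss / n) + 2 * A.shape[1]

def forward_stepwise(X, y):
    """Add one predictor at a time while doing so lowers AIC."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    selected, remaining = [], list(range(X.shape[1]))
    best = aic([], y)  # intercept-only model
    while remaining:
        scores = [(aic([X[:, k] for k in selected + [j]], y), j) for j in remaining]
        score, j = min(scores)
        if score >= best:
            break  # no candidate improves AIC
        best = score
        selected.append(j)
        remaining.remove(j)
    return selected, best

# Synthetic data (NOT the Boston data): only columns 0 and 2 drive y.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=200)

selected, best_aic = forward_stepwise(X, y)
null_aic = aic([], y)
print("selected columns:", selected, "AIC:", round(best_aic, 2))
```

A backward or bidirectional search would reuse the same `aic` function and differ only in which candidate models are compared at each step.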

Influential Points:

  • |Studentized Deleted Residuals| > 3
  • Hat Values > 0.5
  • Cook’s Distance > 1
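
The three criteria above can be computed directly from the hat matrix of an ordinary least squares fit. The sketch below uses the standard textbook formulas; the function name `influence_measures` and the synthetic data with one planted outlier are assumptions for illustration, not the project's own code.

```python
import numpy as np

def influence_measures(X, y):
    """Hat values, studentized deleted residuals, and Cook's distances for OLS with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    p = A.shape[1]  # number of fitted coefficients
    H = A @ np.linalg.pinv(A.T @ A) @ A.T  # hat matrix
    h = np.diag(H)
    e = y - H @ y  # ordinary residuals
    sse = e @ e
    mse = sse / (n - p)
    # Studentized deleted residuals, computed without refitting n times.
    t = e * np.sqrt((n - p - 1) / (sse * (1 - h) - e ** 2))
    cooks = (e ** 2 / (p * mse)) * h / (1 - h) ** 2
    return h, t, cooks

# Synthetic data (NOT the Boston data) with one planted outlier at index 0.
rng = np.random.default_rng(3)
x = rng.normal(size=50)
y = 2.0 + 3.0 * x + rng.normal(size=50)
y[0] += 15.0

h, t, cooks = influence_measures(x, y)
# Thresholds as listed in the slides.
flagged = np.where((np.abs(t) > 3) | (h > 0.5) | (cooks > 1))[0]
print("flagged observations:", flagged)
```

Flagged observations are candidates for closer inspection rather than automatic removal.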
