Methods in Urban Planning
Brian J. McCabe
September 16, 2013
Course: Statistical Literacy for Planning Professionals
1. Neighborhood Walk
2. Basic Statistical Tools
3. Key Neighborhood Indicators
4. Data Collection & Sources
5. Demographic Trends in Washington, DC
Unit of Analysis: The unit of observation that we're analyzing (e.g., individual, neighborhood, SMD, etc.)
Variable: Any characteristic that changes - or varies - from one observation to another.
Four Levels of Measurement
Nominal, Ordinal, Interval, Ratio
Reliability: Refers to the consistency of a measure, whether it produces the same result across time.
Validity: Refers to whether the measurement you use actually gets at the concept you're trying to measure.
Measurement Error: Recognizes the imperfections of measurement, that measurement of social phenomena is rarely perfect.
Count: A simple count of the number of times something occurs.
Proportion: The number of items in a group relative to the total number of items.
Percentage: The proportion multiplied by 100.
Rate: The frequency of an outcome, relative to a base number.
Ratio: Comparison of one sub-group to another.
Percentile: The value of a variable below which a certain percentage of observations fall (e.g., a score in the 25th percentile means that 25 percent of scores fall at or below that score).
Frequency Distribution: A method for understanding all of the observations that share a particular property; it displays the number of times (or frequency) that a particular property occurs.
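As a rough illustration of the last two definitions, the Python sketch below builds a frequency distribution and computes the percentage of scores that fall at or below a given score. The function names and the exam scores are hypothetical, chosen only for this example.

```python
from collections import Counter


def frequency_distribution(values):
    """Count how many times each distinct value occurs."""
    return dict(sorted(Counter(values).items()))


def percentile_rank(values, score):
    """Percentage of observations that fall at or below `score`."""
    at_or_below = sum(1 for v in values if v <= score)
    return 100 * at_or_below / len(values)


# Hypothetical exam scores for eight students (illustrative only).
scores = [55, 60, 60, 70, 75, 80, 90, 95]

distribution = frequency_distribution(scores)  # e.g., the score 60 occurs twice
rank_of_60 = percentile_rank(scores, 60)       # 3 of 8 scores are at or below 60
```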
Measures of Central Tendency:
Mean: Equal to what we colloquially think of as the average, the mean is equal to the sum of scores divided by the total number of scores.
Median: The middle score in an ordered distribution.
Mode: The score that occurs most frequently.
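The three measures of central tendency can be compared on a small dataset. The sketch below uses Python's standard `statistics` module; the household incomes are hypothetical and include one very high value to show how a single outlier pulls the mean upward while the median and the mode stay put.

```python
import statistics

# Hypothetical household incomes (illustrative only); note the one outlier.
incomes = [28_000, 35_000, 35_000, 42_000, 60_000, 400_000]

mean_income = statistics.mean(incomes)      # sum of scores / number of scores
median_income = statistics.median(incomes)  # middle of the ordered scores
mode_income = statistics.mode(incomes)      # most frequent score

print(f"mean={mean_income}, median={median_income}, mode={mode_income}")
```

With an even number of observations, the median is taken as the midpoint of the two middle scores.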
AMI: Area Median Income
(e.g., 30% AMI, 50% AMI)
Rate of 311 calls per neighborhood
e.g., Last year in Shaw, the rate of 311
calls was 75 calls per 100 residents.
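A rate like this is a count divided by a base population and scaled to a convenient base number. The sketch below is a minimal Python illustration; the function name `rate_per` and the figures (7,500 calls, 10,000 residents) are hypothetical values chosen so that the result matches the 75-per-100 example, not actual Shaw data.

```python
def rate_per(events, population, base=100):
    """Frequency of an outcome relative to a base number (per `base` people)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population * base


# Hypothetical inputs: 7,500 calls in a neighborhood of 10,000 residents.
calls_per_100 = rate_per(7_500, 10_000)              # calls per 100 residents
calls_per_1000 = rate_per(7_500, 10_000, base=1000)  # same rate, different base
```

Expressing counts as rates makes neighborhoods of different sizes comparable.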
Percentage of students passing standardized exams
Percentage of residential units that are owner-occupied (percentage of homeowners)
FAR: Floor-Area Ratio
Floor area = 1,000 square feet
Plot area = 500 square feet
FAR = 1,000/500 = 2.0
Measures of Variability:
Range: The distance between the minimum score and the maximum score.
Standard Deviation: A statistic that tells us how far all of the scores are spread around the mean of a distribution.
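Both measures of variability can be computed in a few lines. The sketch below uses Python's standard `statistics` module on a hypothetical set of scores; it shows the population standard deviation (dividing by N) alongside the sample version (dividing by N - 1), since the slide does not say which one is intended.

```python
import statistics

# Hypothetical scores (illustrative only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

score_range = max(scores) - min(scores)    # distance from minimum to maximum
population_sd = statistics.pstdev(scores)  # spread around the mean, divides by N
sample_sd = statistics.stdev(scores)       # same idea, divides by N - 1

print(f"range={score_range}, population SD={population_sd:.2f}, sample SD={sample_sd:.2f}")
```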
- Census Data: Collected decennially through 2010; surveys the entire population.
- American Community Survey:
1-year estimates (geographies with populations > 65,000)
3-year estimates (averaged, geographies with populations > 20,000)
5-year estimates (census tracts, zip codes, etc.)
ACS vs. Decennial Census
- Sampling error (and margin of error)
- Concerns about measuring social change on small scales
- five-year estimates, rather than one-year counts
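One way to make sampling error concrete: ACS estimates are published with a margin of error, which by Census Bureau convention corresponds to a 90 percent confidence level. The Python sketch below turns an estimate and its margin of error into an interval and an approximate standard error. The income figures and function names are hypothetical, and the 1.645 multiplier reflects the assumed 90 percent level.

```python
Z_90 = 1.645  # multiplier for a 90 percent confidence level (assumed ACS convention)


def confidence_interval(estimate, margin_of_error):
    """Lower and upper bounds implied by an estimate and its margin of error."""
    return estimate - margin_of_error, estimate + margin_of_error


def standard_error(margin_of_error, z=Z_90):
    """Approximate standard error recovered from a published margin of error."""
    return margin_of_error / z


# Hypothetical tract-level median household income: $52,000 +/- $4,000.
low, high = confidence_interval(52_000, 4_000)
se = standard_error(4_000)
```

A wide interval relative to the estimate is a signal to be cautious about year-to-year comparisons for small areas.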
- Administrative data (e.g., Department of Human Services, Metropolitan Police Department, Office of Tax and Revenue, etc.)
- Publicly-available data (at data.dc.gov)
- NNIP: National Neighborhood Indicators Partnership (Neighborhood Info DC website)
- Census (Decennial), American Community Survey (rolling estimates, e.g., 2005-2009)
- Zip Code (n=28)
- Neighborhood Cluster (n=39)
- Police Service Areas (n=56)
- Ward (n=8)
- Census Tract (n=~180)
collecting or analyzing data at
each of these units of analysis?
Why would we choose to collect -
and report - data at each of these
geographic areas? What data would
be relevant or useful at each
At the neighborhood cluster level,
what are some variables that planners
might be interested in measuring?
- Population characteristics
- Density of businesses
- Housing indicators
- Crime statistics
- Percentage of land area zoned residential
- Number of affordable housing units
- Imprecise tools
- Poorly-worded survey questions
- Interview biases
- Respondent biases (e.g., social desirability)
- Coding errors
When would we use each of
these measures? How do they
differ from one another?
Why median household income or
median housing value, rather than
the mean income or value?
Mt. Vernon/Shaw/Convention Center
Neighborhood Cluster 7: Shaw/Logan Circle
Neighborhood Cluster 8: Includes Chinatown, Mt. Vernon Square
Carnegie Library at Mt. Vernon Square
City Market at O St.
Jefferson Market Apartments
Commercial Corridor - 9th Street
Bread for the City
Special Category: Dichotomous
What variables could we use
to measure whether or not a
neighborhood is gentrifying,
and how much gentrification
Number of affordable housing units in Shaw
Number of 311 calls last year in Chinatown
Proportion of affordable housing units that
are in Shaw (the number of units in Shaw
divided by the total number of units in
the city)
Proportion of 311 calls that were made from Chinatown (the number of calls in Chinatown
divided by the total number in the city)
Why do we care about the rates,
rather than just counts?
Ratio of renters to homeowners
Owner-occupied housing units: 100,000
Renter-occupied housing units: 150,000
Ratio of renters to homeowners = 150,000: 100,000 = 1.5:1
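The same ratio can be expressed either as "x to 1" or in its simplest whole-number form. The sketch below is a minimal Python illustration using the figures from the example above; the function names are hypothetical.

```python
from fractions import Fraction


def ratio_to_one(group_a, group_b):
    """Express the ratio of group_a to group_b as 'x to 1'."""
    return group_a / group_b


def simplest_ratio(group_a, group_b):
    """Reduce the ratio to its smallest whole-number terms."""
    reduced = Fraction(group_a, group_b)
    return reduced.numerator, reduced.denominator


renters, owners = 150_000, 100_000
x_to_one = ratio_to_one(renters, owners)    # 1.5, i.e., 1.5:1
lowest = simplest_ratio(renters, owners)    # (3, 2), i.e., 3 renters per 2 owners
```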
Average (Mean) Salary for
Teachers in DC Public Schools:
Visual Displays of Quantitative Information
- Bar Graphs
- Line Graphs
Bar Graphs: A bar graph is a visual display of discrete categories (either nominal or ordinal) where the length of each bar represents the percentage or frequency of a category.
Histograms: A histogram is a visual display for continuous data (interval/ratio) where the scores are presented along one axis and the frequency (or percentage) of that score is presented along the other axis. Often, continuous data are recoded into categories before the construction of a histogram (e.g., a continuous GPA may be recoded into intervals of 0.10).
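Recoding continuous data into intervals before drawing a histogram can be sketched as follows. This Python example bins hypothetical homeownership rates (in percent) into 10-point intervals and counts the observations in each; the function name and the data are illustrative assumptions, not course data.

```python
from collections import Counter


def bin_counts(values, width):
    """Count observations per interval, keyed by each interval's lower bound."""
    counts = Counter(int(v // width) * width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical homeownership rates (percent) for ten neighborhoods.
rates = [12.5, 18.0, 23.5, 27.0, 31.0, 35.5, 38.0, 44.0, 47.5, 62.0]

histogram = bin_counts(rates, width=10)

# A crude text histogram: one '#' per neighborhood in each interval.
for lower, count in histogram.items():
    print(f"{lower:>3}-{lower + 10:<3} {'#' * count}")
```

Each interval includes its lower bound and excludes its upper bound, so a rate of exactly 20 falls in the 20-30 interval.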
Line Graph: A line graph is a visual display of data typically used to track a social phenomenon across time, or some other continuous measure.
Pie Chart: Pie charts aren't particularly good for displaying statistical information. First, and most importantly, pie charts (like bar charts or histograms) can tell us about the relative relationship between two variables, but tell us nothing about their frequency. Second, it is often difficult to correctly visualize the relative size of a piece of the pie.
Basic Rules for Good Data Visualization:
1. Data visualizations are used to tell a story. When you create a graph or chart, make sure that it tells a story. Viewers should be able to "read" the story with only the chart (and no accompanying text).
2. Make sure to select an appropriate type of graph. Line graphs track trends across time, bar charts display data across discrete categories, etc.
3. Pay attention to details. Clearly label your axes. Ensure consistent scales on the axes. Include a legend (where appropriate). Write titles that identify the information in the chart.
4. Avoid perceptual distortions. The relationship between visual components should provide a quick understanding of the story.
5. Minimize data "junk". This includes excess colors, symbols, and information not directly related to the data story itself.
1. Final Projects Discussion + Groups
2. Discussion: American Murder Mystery Revisited
3. Review: Descriptive Statistics
4. Analysis: Descriptive Statistics in Minitab
5. Visual Displays of Quantitative Information
6. Analysis: Charts & Graphs using Minitab & Excel
7. Advanced Analytical Techniques
American Murder Mystery Revisited
Describe the research question. What is the debate the authors are entering into?
What are the theories linking housing vouchers to crime? What are the possible mechanisms that would explain this relationship?
How did the authors test the relationship? What kind of data did they use? What statistical techniques did they use?
Group 1: McMillan/Pleasant Plains/Bloomingdale/Eckington/Stronghold
(Armed Forces Retirement Home / Michigan Ave./Irving to the North, 5th/Park Place to the west, Florida Ave. to the south, 2nd Street NE / Glenwood Cemetery to the east)
Group 2: Near Southwest (approximate boundaries – SE/SW Freeway on the north, Washington Channel and Anacostia to the south, 14th St. / and Bridge to the west and South Capitol Street to the east)
Group 3: Georgia Ave. Gateway/Takoma DC (DC Boundary (Eastern Ave.) on the north and east, Piney Branch / Tuckerman on the south, Rock Creek Park on the west)
Many planning documents begin with an overview of the demographic characteristics of the neighborhood or community. They discuss population shifts and outline the demographic composition of the neighborhood. Often, they discuss the market conditions of particular places, including rental prices or home sale values. These portraits of local neighborhoods help citizens and planning professionals understand the neighborhoods in which they are working. You are required to write a 3-4 page (double-spaced) demographic analysis. Focus on using the quantitative data available to tell a story about the neighborhood for outsiders unfamiliar with it.
Use existing planning documents as your guide in this process. Before embarking on your own project, look at some of the planning documents or historic preservation reports released by the Office of City Planning. (Already, we have looked at those reports for Mt. Vernon Square and the area surrounding the convention center.) Look at the types of information they report, and the way they organize quantitative information.
The written portion of the assignment should include an interpretation of quantitative data that you have compiled from existing sources. You are expected to create 2-3 figures presenting a visual display of your data. These can include bar charts, line graphs, or other visual displays common in the planning literature. The demographic analysis should tell a convincing story about the neighborhood you are studying. It is not enough to simply list statistics for the readers to interpret; instead, you should use these statistics to tell a story about the neighborhood.
The quantitative data analysis assignment is due on Monday, November 18th.
Example: The number of subsidized units (count) by the type of subsidy (discrete) in Washington, DC.
How many of each type of subsidized housing unit (e.g., public housing, HCV, and LIHTC) exist in Washington, DC?
Example: The median household income (continuous) in the neighborhoods bordering the School of Continuing Studies.
How does the median income of the neighborhoods near SCS differ?
Example: Homeownership rate in neighborhoods across Washington, DC.
Instead of creating a bar chart with 39 bars (one for each neighborhood cluster), we might create a histogram to show us the frequency with which each homeownership rate (continuous) occurred.
Example: Trends in the rate of violent crime over time.
How has the crime rate changed in Shaw/Logan Circle over the last ten years?
Best Fit Line
(Beginnings of Linear Regression):
We often talk about social phenomena that are correlated. When we discuss correlation, we're considering two continuous measures that co-vary - or that vary together.
When the value of one variable systematically changes as the value of the second variable changes, we say that the two variables are correlated.
A scatter plot is a two-dimensional graph that shows the coordinates between two variables - X and Y - for all the observations in a data set. It provides visual evidence to assess whether two variables are correlated.
As reading scores increase, writing scores increase, as well. We would say that reading scores and writing scores are positively correlated.
Each dot on the scatter
plot is a different observation
in our data (in this case, each
dot is a different student
in our data)
Two continuous variables - X and Y - can
be said to be related in one of two ways:
1. Positive Correlation.
- When the value of X increases, the value
of Y increases.
2. Negative Correlation.
- When the value of X increases, the value
of Y decreases.
In addition to noting the direction of a correlation, we can talk about how strong the correlation is.
For example, shoe size and height are very strongly correlated. We can have a pretty good guess about what your shoe size is when we know your height.
Other variables have an association, but the correlation is much weaker. For example, we might know that hours slept is weakly correlated with exam scores. There is a relationship between them, but it is not particularly powerful.
As a rule of thumb, we generally think of a correlation less than 0.2 as weak, 0.2 to 0.5 as moderate, and above 0.5 as strong.
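The rule of thumb above can be written as a small helper. This Python sketch is only an illustration: the function name is hypothetical, it applies the cutoffs to the absolute value of the correlation (so negative correlations are graded by size, not sign), and it treats exactly 0.2 and 0.5 as "moderate", which the rule of thumb leaves unspecified.

```python
def correlation_strength(r):
    """Label a correlation coefficient as weak, moderate, or strong."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)  # strength depends on magnitude, not direction
    if size < 0.2:
        return "weak"
    if size <= 0.5:
        return "moderate"
    return "strong"


labels = [correlation_strength(r) for r in (0.1, 0.35, -0.7)]
```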
Scatter Plots & Correlations (in Minitab or Excel):
1) Median family income & housing values?
2) Property crimes & housing sales
3) Poverty rate & % foreign-born
4) Unemployment & % black
The "best fit" line.
There are an infinite number of
lines that I could draw through the
data. How do I know which one
is the "best fit" line?
The "best fit" line is the line
that minimizes the amount of
error between each observation
and the regression line.
For the moment, suffice it to say
that the "best fit" line is the line
that best reduces the amount
of error between each observation
and the line.
For each observation, the difference between the observed value and the predicted value is the error term.
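A minimal Python sketch of finding a "best fit" line is shown below. It assumes the usual least-squares criterion, where "minimizing error" means minimizing the sum of squared error terms; the function names and data points are hypothetical.

```python
def best_fit_line(xs, ys):
    """Slope and intercept of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def error_terms(xs, ys, slope, intercept):
    """Observed value minus predicted value, for each observation."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical observations (illustrative only).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = best_fit_line(xs, ys)
errors = error_terms(xs, ys, slope, intercept)
```

Under the least-squares criterion the error terms sum to zero, so positive and negative misses balance out around the line.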
Note: Pearson's r always ranges from -1 to 1.
The sign indicates whether the variables are positively or negatively correlated.
The value (absolute value) indicates the strength of the correlation.
-1 indicates a perfect negative correlation
1 indicates a perfect positive correlation
0 indicates that variables are uncorrelated
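For readers who want to see where Pearson's r comes from, the Python sketch below computes it directly from its definition: the co-variation of X and Y divided by the square root of the product of their individual variations. The function name and the reading and writing scores are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical reading and writing scores for five students.
reading = [50, 60, 70, 80, 90]
writing = [52, 58, 71, 79, 88]

r = pearson_r(reading, writing)
direction = "positive" if r > 0 else "negative" if r < 0 else "none"
```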
Logistic Regression: We use a logistic regression when the outcome variable is dichotomous (yes/no), rather than continuous. With logistic regression, we talk about odds or odds ratios.
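Because logistic regression results are discussed in terms of odds, it helps to see how odds relate to probabilities. The Python sketch below converts a probability to odds and compares two groups with an odds ratio. It does not fit a logistic regression; the function names and probabilities are hypothetical.

```python
def odds(probability):
    """Odds of an outcome: the chance it happens relative to the chance it doesn't."""
    if not 0 <= probability < 1:
        raise ValueError("probability must be at least 0 and less than 1")
    return probability / (1 - probability)


def odds_ratio(p_group_a, p_group_b):
    """How many times larger the odds are in group A than in group B."""
    return odds(p_group_a) / odds(p_group_b)


# Hypothetical: 80% of group A and 50% of group B experience the outcome.
ratio = odds_ratio(0.8, 0.5)
```

An odds ratio above 1 means the outcome is more likely in the first group; below 1, less likely.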
Example: Distribution of the number of housing choice vouchers across census tracts in DC.
Data Analysis using Minitab
- Simple "point & click"
- Doesn't require programming knowledge
- Useful for simple descriptive statistics
- Challenges of large datasets
- Good for analysis; not good for visual displays
- Free and available from UIS @ Georgetown
- Federally-subsidized housing units in Washington, DC (Source: HUD Picture of Subsidized Households)
- Federally-subsidized housing units in Washington, DC, by Census Tract (Source: HUD Picture of Subsidized Households)
- Aggregated Census/ACS data in Washington, DC, by neighborhood cluster (Source: Neighborhood Info DC)
- Find mean, median
- Display counts
- Describe the distribution, range
Question: What is the median number of public housing units in a census tract in Washington, DC?
Question: What is the mean/average number of public housing units in a census tract in Washington, DC?
Question: How many census tracts in Washington, DC contain zero public housing units?
Question: What is the average household income in the census tract with the highest number of public housing units?
Question: Recode the data on public housing units to create a frequency table showing the number of census tracts containing zero public housing units, 1-50 public housing units, 51-100 public housing units, and 101 or more public housing units. Create a frequency table.
Question: Recode data on average household income to identify the number of neighborhoods where the median income of public housing residents is greater than $12,000.
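The recoding exercises above are meant to be done in Minitab, but the logic can be sketched in a few lines of Python. The tract counts and incomes below are hypothetical, not values from the HUD data, and the function name is an assumption for this example.

```python
from collections import Counter


def recode_units(units):
    """Recode a count of public housing units into four categories."""
    if units == 0:
        return "0"
    if units <= 50:
        return "1-50"
    if units <= 100:
        return "51-100"
    return "101+"


# Hypothetical public housing unit counts for ten census tracts.
tract_units = [0, 0, 0, 12, 50, 51, 75, 100, 101, 340]
frequency_table = Counter(recode_units(u) for u in tract_units)

# Dichotomous recode: is the median income of residents above $12,000?
# Hypothetical incomes for five neighborhoods.
resident_incomes = [9_500, 11_000, 12_000, 13_250, 15_800]
above_threshold = sum(1 for income in resident_incomes if income > 12_000)

for category in ("0", "1-50", "51-100", "101+"):
    print(f"{category:>6}: {frequency_table[category]}")
```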
To calculate basic descriptive statistics: STAT - Basic Statistics - Display Descriptive Statistics
To calculate basic descriptive statistics: CALC - Column Statistics
To create a frequency distribution: STAT - Tables - Tally Individual Variables
To create a cross-tab: STAT - Tables - Cross-Tabulation and Chi-Square
To calculate correlation: STAT - Basic Statistics - Correlation
Making Charts & Graphs:
- Excel for Bar Charts & Line Graphs
- Minitab for Histograms
Codebook: Each dataset comes with a codebook
that identifies how the variables are coded in the data. Often, it includes numerical identifiers of missing data that need to be recoded before the analysis.
OLS (Ordinary Least Squares) Regression: We use regression analysis to understand the relationships between variables. OLS regression analysis is used when the dependent variable is continuous. We talk about how a unit-change in an independent variable is associated with an outcome variable.
1. Quantitative Data Projects
2. Data, Big Data + Planning Research
3. Data Availability
4. The Promise of Big Data?
5. Data+ Planning Research
6. Speaker: Kevin Donahue, CAPSTAT
Quantitative Data Projects
Group 1: McMillan, Pleasant Plains, Bloomingdale, etc.
Group 2: Near Southwest
Group 3: Georgia Avenue/Takoma
- What type of data did you use?
- What type of analysis did you do?
- What did you find?
Data Availability: Washington, DC
- What types of data are available to planners and urban policymakers in Washington, DC?
- What types of research and planning applications are useful for those data?
Housing and Foreclosure Data
Property Sales & Assessment Data
- Building Permits
- Liquor License
- Public Space Permits
- Educational Data
What is Big Data?
What are the features of
Parents' Employment Status
Examples: 311 Calls in DC
Property as a Complex Structure
- Structural features
- Transaction data
- Neighborhood characteristics
- Transportation accessibility
- Proximity to schools, stores, crime, etc.
- Time variation
What promise does Big Data
hold for cities and urban
Challenges of Big Data?
- Too much data, "noise"
- How do children get to/from school, and what are the reasons behind those choices?
- What is the relationship between home/school distance and the mode of travel?
- What neighborhood characteristics influence those choices?
- Survey research in four middle schools in Bend, Oregon and Springfield, Oregon
- Demographic indicators, geo-coded address data
- Primary mode of travel; whether children ever "actively" traveled to school (e.g., bike, walk)
- Questions about distance to school; measures about urban form (e.g., intersection density, route directness, major roads, railroads)
- What is the relationship between urban form (e.g., metropolitan region, sprawl) and exposure to poor air quality?
- Decennial census data on neighborhoods
- Environmental Protection Agency (EPA) data on neighborhood-level air quality
- Sprawl index, includes several measures of sprawl
- Knowing something about modes of transportation - and the impediments to active modes of transportation - can help planners in thinking about the relationship of schools and homes, or about transportation availability in neighborhoods.
- Addresses questions about whether infill development could improve health outcomes by putting people in dense areas, or whether it increases exposure to poor air quality.
- Do parking requirements (making parking spaces available with residential units) make housing more expensive?
- How do parking requirements shift the development decisions that developers make?
- Surveys with developers
- Administrative data on housing construction, parking spaces
- Contributes to debates about whether new developments should have parking minimums, especially when these developments are located in cities with a strong public transportation infrastructure and marketed toward demographic groups less likely to own cars.
- Does subsidized housing improve neighborhoods by leading to investments in communities or local improvements?
- How (or why) would we expect housing investments to change neighborhood characteristics - and especially, school outcomes?
- Administrative data on the location and type of subsidized housing assistance
- School-level data on student achievement, teacher characteristics
- Addresses issues of externalities associated with the placement of subsidized housing.
- Raises the possibility of positive benefits (to local schools) while most studies are concerned about the negative externalities (e.g., crime)