Loading presentation...

Present Remotely

Send the link below via email or IM

Copy

Present to your audience

Start remote presentation

  • Invited audience members will follow you as you navigate and present
  • People invited to a presentation do not need a Prezi account
  • This link expires 10 minutes after you close the presentation
  • A maximum of 30 users can follow your presentation
  • Learn more about this feature in our knowledge base article

Do you really want to delete this prezi?

Neither you, nor the coeditors you shared it with will be able to recover it again.

DeleteCancel

But Why?

No description
by

Chevas Yeoman

on 4 September 2014

Comments (0)

Please log in to add your comment.

Report abuse

Transcript of But Why?

Chevas Yeoman, MPH
Division of Aging Services
Research Epidemiologist

But Why?
"In my experience..." vs "The data is saying...

- Don't have to be exclusive

- Decision "Support" Systems

- Empowerment Tools

Column and Row Layout Matter!
- Hierachichal Format
- Big term for Widest to Narrowest categories. (Year, Quarter, Month, Day, Time)
- This allows you to filter down your data and dig to find trends and answer questions.
- Become your own investigator and report writer.
Original Data Collection
- Start your own database
Outside Sources
- Use common variables like locations (counties, zip codes) and times (years, quarters, months) to connect your data to other sources to tell a story and support your theories.

Variable Types
- Categorical (Red, Blue, Green, Yellow)
- Binary (Yes, No) (1,0)
- Continuous (counts, measurements)
- Date and Time (dd/mm/yyyy 00:00)

Best Practices
- Never leave a cell empty!
- Have unknown or not applicable values like -999, -666, and/or 9999. Give them definitions!
e.g.: -666=not applicable, -999=could not obtain, and 9999=missing from file
- Otherwise there is no way of classifying an empty cell.
- Stay consistent
- Yes is always "yes", not "Y" or "ok"
- Use dropdowns to limit variation in inputs and help define coded variables. See top right instuctions.
Variable Types and Practices
Pivot tables allow for quick comparisons using totals, percentages, and counts filtering using your hierarchical format.

Example: CCSP Falls Data (Unduplicated Explanation)
Questions to answer by PSA:
- Most falls per client (Ratio/Cumulative Incidence)
- Most clients that fell (% of Clients)
- Most repeat fallers (% of Clients and % of Clients that Fell)
- Most repeat falls (Ratio of Falls to Repeat Fallers)

Data Link:
https://docs.google.com/spreadsheets/d/1WERROQJEOOp1E85w6ZvF71B6-8PnnqHusGHRyLruSr4/edit?usp=sharing
Pivot Tables in Excel
All of the following sources let you find data at least down to the county level.

County Health Rankings (Health Outcomes and Predictors)
http://www.countyhealthrankings.org/sites/default/files/state/downloads/2014%20County%20Health%20Rankings%20Georgia%20Data%20-%20v1.xls

AGing Integrated Database - AGID (Aging and Disability Population Statistics)
http://www.agid.acl.gov/CustomTables/Pop_County/Year/

USDA ERS Rural-Urban Continuum Codes (Rurality scale based on population and metropolitan proximity)
http://www.ers.usda.gov/data-products/rural-urban-continuum-codes/.aspx#.U-rNRmOtals

Labor and Jobs Markets
http://onthemap.ces.census.gov/

Georgia Dept of Public Health (Geogia Specific Health Statistics)
http://oasis.state.ga.us/oasis/

ARC's Neighborhood Nexus (Combination of Health, Pop, School, Legislative Statistics for Atlanta and the State of Georgia)
http://www.neighborhoodnexus.org/maps

Supporting Data Sources
Data link: Click File, Download, As Excel File
https://docs.google.com/spreadsheets/d/1VruWgE5Qld4cQNgwQ_DVysjB022ox1EXMnsvz13hZZI/edit?usp=sharing

Finished Tableau Link:
https://public.tableausoftware.com/views/SIGExample/Dashboard1?:embed=y&:display_count=nogfdasdgadfgadfgsdgdfs
SIG AGID Population Example
Chevas Yeoman, MPH
chyeoman@dhr.state.ga.us
404-463-0051
Contact Information
THANK YOU!
Step 2 - Data Structuring
Step 1 - Data Collecting
Step 3 - Data Visualization
Excel
: To add data to the current table use the vertical lookup command.
=vlookup(Part 1, Part 2, Part 3, Part 4)
-Part 1: matching cell from current table,
-Part 2: specify the table that contains both the other matching cell and the data to bring over
-Part 3: what number is the column that contains the data to bring over (2,3,4...) with the matching cell column being number 1
-Part 4: 0 means it will bring over only exact matches, 1 will find the closest match. Always use 0 or you may bring over erroneous data

- Example: In the cell d2 I write =vlookup(c2,$b$1:$c$160,2,0)
- In order to work!: The table you choose that contains both the matching cell and data to bring over (Part 2) must have the column with the matching cell as the far left column (column 1) and that column must be sorted in ascending order.

Tableau
: To make a relationship between two tables and add data to a variable
Join with common variable (Example: county)

Integrating Data Sources
Vlookup Command - Step 1
Vlookup Command - Step 2
Vlookup Command - Step 3
Vlookup Command - Step 4
Vlookup Command - Step 5
Vlookup Command - Step 6
Vlookup Command - Step 7
Vlookup Command - Step 8
Vlookup Command - Step 9
Vlookup Command - Step 10
Vlookup Command - Step 11
Vlookup Command - Step 12
Setup Table Sorting - Step 1
Setup Table Sorting - Step 2
Joining Table for Later in Another Tab
We start off with the raw dataset we pulled from the AGID ACL website. It did not allow us to include the PSA level stratifier so we must add it ourselves.
http://www.agid.acl.gov/CustomTables/Pop_County/Year/


First, we insert a column next to county because that is what we will use to find the PSA associated in our CountyPSA table.

Then we are ready to start using the command =vlookup( . You can see that Excel gives you a guide to show which section of the command you are currently entering and you can press the "
f
x" button next to the formula bar for further explanation and guidance afer you have entered this into the cell.

Our lookup_value is the cell which we want to match to our reference table that contains the data we wish to bring over. In this case we want to match "Appling County" to the Appling County that exists in our CountyPSA table in the next step. So we click C2 and it enters it in for us.

The table_array means our reference table that has both the county we are matching and the PSA we want to bring over. We can do this all by clicking the CountyPSA tab at the bottom and then dragging out the table we wish to use. We must keep the column that has the matching data to the far left so we start with B1. The PSA names are in column C and there are 160 rows so we drag our array down to C160.

This is EXTREMELY IMPORTANT! We need to fix the table_array at B1:C160. We do this by double clicking B1 in the formula bar and pressing the F4 key on the keyboard. This will add a $ before the column letter B and the row number 1. Do this again for the C160 as well. These $'s tell excel not to change that position when we copy and paste this formula. Now, when we copy the formula downwards for all of the counties at the end, the formula doesn't automatically move our table array place holders of B1:C160 to B2:C161,B3:C162...and on and on. We don't do this for our lookup_value C2 because we want that to automatically increase to C3, C4, C5... when we copy downwards. This can be done at the end of writing the command, but do not forget!

The col_index_num is the number of the column in the table_array that we assigned ($B$1:$C$160) that contains the value we wish to bring over (PSA Short). The table has 2 columns and PSA Short is the 2nd column, so the number is 2.

A look back at our table_array to see what column number references our data we want to bring over. Excel knows that column 1 contains the matching data (counties) because we are required to have that column in the far left of the table array and sorted ascending for vlookup to work.

The last part range_lookup refers to whether excel will try to find an exact match (0) or an approximate match (1). We want exact matches only so we don't have any erroneous data pulled over so we use 0. Put on the ) and we are done!

Now for the final and most fun part, populating the rest of the rows with the same formula. We can do this by copy and pasting one at a time, dragging down to a specified number of rows with the bottom right black square, or the entire rest of the column by double clicking the bottom right black square on the cell.

Before you start using vlookup!
Now the fun part!
This is the table that you will be using that contains the matching variable and the new data variable that you wish to bring over to the original table. We must sort the entire table by the matching variable column in ascending order. First we select the table by either selecting the columns or pressing CTRL+A to highlight the entire table of data. Then we use the "Sort and Filter" button and click "Custom Sort"
Next we click the "My Data has headers" check if our variables are named in the first row and then on the Column "Sort by" dropdown find the common matching variable column which in this case is "County Name". We need to also make sure that under the Order we have ascending order selected. "A-Z" for text and "lowest to highest" for numbers.
Let's Figure This Out Together
The Function Button give you more detail about what you should be selecting, if it is a correct selection, and what the outcome will be to check your work. You can use this in the beginning after typing in =vlookup( .
Vlookup Instructions
Finally getting to see the fruits of all our data cleaning labor. It's magic time...
Always have a question in mind when picking what data to collect
- What are we trying to change (Outcome or Dependent Variable)
- What can we affect to elicit that change (Independent Variables)
- What could have an affect on my outcome regardless of my influence (Cofounders)

Monitoring vs Analyzing
What and Why? (Variables and Outcomes)
Dropdown instructions
Create a Dropdown List in a Seperate Tab - Step 1
Click the "Data" Tab up top - Step 2
Click "Data Validation" - Step 3
Select "List" from the Allow Dropdown - Step 4
Select your dropdown range - Step 5
Excel will not let you use the selection tool if the dropdown choices are on another sheet. Instead you must type in the name of the sheet and an !. In this case the name of the sheet is Dropdowns so our entry is =Dropdowns!G1:G5. Then we must add our $ to make it =Dropdowns!$G$1:$G$5.
We can add the dollar signs by selecting G1 and then pressing the F4 key. Then we do the same for G5. This tells excel to not move that reference for the dropdown options beginning and end point if we copy the dropdown to another cell.
All Done!
Highlight the ID column which represents persons that had a fall during this time period SFY 2013. Go to Conditional Fomatting, Highlight Cell Rules, then Duplicate Values

Choose Duplicate and whatever color you wish to highlight the cells

As you can see, the duplicates are highlighted and the single values are not. Now we know who had just one fall, but the multiple fall people data is still duplicated.

Using Advanced Filter, we can seperate out the unique values in a column to show how many people had falls vs the total number of falls.

We choose to copy to another location and check the Unique records only box at the bottom.

This is the same process as before using the =Vlookup( command except for one difference. We are going to be copying this cell both horizontallly and vertically to fill our table. To make sure we get the right results we have to use our $'s again.
The $ before the column A tells excel that our lookup_value will always be in that column. This way, when we copy to the right for our other variables, it doesn't automatically change to B2, C2, D2....etc.

Copy the new unique column O and paste it into our Unduplicated Falls tab. Next we fill in the rest of the information in the tables using =Vlookup( commands.
We drag cell b2 to the right for the rest of our variables but you can see that they all say COBB. This is because the index_num saying what number column to find the data to pull over does not automatically change as we drag and copy. We have to go into each columns vlookup formula manually and add 1 to each one as we go to make sure the index_number is correct. We need 2 - 13 for our table.
But you can see on the last slide that the Dates and Times did not format correctly. We correct this by highlighting the first row from our original table and copy CTRL+C
This is the table that you will be using that contains the matching variable and the new data variable that you wish to bring over to the original table. We must sort the entire table by the matching variable column in ascending order. First we select the table by either selecting the columns or pressing CTRL+A to highlight the entire table of data. Then we use the "Sort and Filter" button and click "Custom Sort"
We click the button for formats and then press OK.
Then we highlight our new row in the new table and right click and go to Paste Special...
Now our formats look correct for our dates and times so we can fill in the rest of our table by highlighting B2:M2 and then double clicking the little black square on the bottom right.
All Done!
Now we can repeat our unique advanced filter one more time for the duplicates from the first highlighting and move them into their own sheet Repeat Fallers Unduplicated. This gives us the number of clients that had multiple falls.
Pivot Table with Duplications - Highlighting Duplicate Values - Step 1
Pivot Table with Duplications - Highlighting Duplicate Values - Step 2
Pivot Table with Duplications - Highlighting Duplicate Values - Step 3
Pivot Table with Duplications - Unique Advanced Filter - Step 1
Pivot Table with Duplications - Unique Advanced Filter - Step 2
Pivot Table with Duplications - Unique Advanced Filter - Step 3
Pivot Table with Duplications - Unique Advanced Filter - Step 4
Pivot Table with Duplications - Vlookup - Step 1
Pivot Table with Duplications - Vlookup - Step 2
Pivot Table with Duplications - Vlookup - Step 3
Pivot Table with Duplications - Vlookup - Step 4
Pivot Table with Duplications - Vlookup - Step 5
Pivot Table with Duplications - Vlookup - Step 6
Pivot Table with Duplications - Vlookup - Step 7
Pivot Table with Duplications - Vlookup - Step 8
Pivot Table with Duplications - Vlookup - Step 9
Pivot Table with Duplications - Vlookup - Step 10
Pivot Table with Duplications - Vlookup - Step 11
Pivot Tables, duplicates, and Uniques Instructions
Now we can use vlookup again to fill in the rest of the table, but this time we can just copy and paste from our previous work in Unduplicated Falls.
Pivot Table with Duplications - Vlookup - Step 12
Highlight and Copy B2:M2 in Unduplicated Falls,
Pivot Table with Duplications - Vlookup - Step 13
Paste into our new table Repeat Fallers Unduplicated.
Pivot Table with Duplications - Vlookup - Step 14
Voila!
Pivot Table with Duplications - Vlookup - Step 16
The formats stay with it this time so all we need to do is the double click of the black square again and we are done!
Pivot Table with Duplications - Vlookup - Step 15
Pivot Table with Duplications - Pivot Tables - Step 1
We tell it to copy to the O column in cell 1. This keeps it seperate from our other table and we can copy it from there to our unduplicated tab afterwards.
Pick any of our 5 tables we made and select Insert on the top, then Pivot Table on the far left.
Pivot Table with Duplications - Pivot Tables - Step 2
It automatically selects all of the data in the table you are currently selecting. This is why it is important to keep our unique copies at least one column away from our table when we select a place to copy.
Pivot Table with Duplications - Pivot Tables - Step 6
Now we have the table we want. We do this for the other 4 tables and we can start to do some real comparing of the PSAs across many different angles of falls data.
End Products
Table 1 was what I received that pushed me to do my own analysis to provide more insight to the problem.

Pivot Table with Duplications - Pivot Tables - Step 5
We want to change it to a Count so it will count how many rows are in the coulumn for ID. This will tell us how many falls or clients depending on if the table is duplicated or unduplicated.
Pivot Table with Duplications - Pivot Tables - Step 3
We select ID and PSA Name from the fields list and it automatically puts them in the row labels and Values boxes. But you can also see that it defaults to the SUM of the ID's because they are a number value. We can fix this.
Pivot Table with Duplications - Pivot Tables - Step 4
By clicking the black down arrow next to Sum of ID in the bottom left box we can now click Value Field Settings.
Full transcript