Can your data be trusted?

Oh yes!

by Daniel Herrera on 10 March 2016


Transcript of "Can your data be trusted?"

The answer is more complex. Although data can be useful and trustworthy, some data cannot be used.
Data scientists can assess whether the data you're using is safe to rely on.
Get over your fear of data.
Assess Data Quality Independently
Develop Your Own Data Quality Statistics
Understanding analytics
Analytics often involves studying historical data to identify potential trends, to analyze the effects of certain decisions or events, or to evaluate the performance of a given tool or scenario.
Don't be afraid to search for data: look for information in trustworthy sources and proceed with caution.
Make sure you know where the data was created and how it is defined, not just how your data scientist accessed it.

Figure out which organization created the data.

Dig deeper:
What do colleagues advise about this organization and data? Does it have a good or poor reputation for quality? What do others say on social media?

Do some research both inside and outside your organization.


Use the "Friday Afternoon Measurement": lay out 10 to 15 important data elements for 100 data records on a spreadsheet.

Work record by record, taking a hard look at each data element. Obvious errors, such as misspellings and false entries, will jump out at you.

Mark errors in red. If you see a lot of red, you know the data cannot be trusted.
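The counting step can be sketched in Python. This is a minimal sketch, assuming each record is a dictionary and that "obvious errors" can be written as simple per-field checks; the field names, the `rules` table, and the helper functions are hypothetical. Summarizing the result as the share of records with no red marks at all is one reasonable way to turn "a lot of red" into a number, not something the presentation prescribes.

```python
from typing import Callable

# A check returns False when a value is an obvious error (a "red mark").
Rule = Callable[[object], bool]

def red_marks(record: dict, rules: dict[str, Rule]) -> list[str]:
    """Return the names of the fields in `record` that fail their check."""
    return [field for field, ok in rules.items() if not ok(record.get(field))]

def error_free_share(records: list[dict], rules: dict[str, Rule]) -> float:
    """Fraction of records with no red marks at all."""
    if not records:
        return 0.0
    clean = sum(1 for r in records if not red_marks(r, rules))
    return clean / len(records)

# Hypothetical checks for three important data elements.
rules = {
    "name": lambda v: isinstance(v, str) and v.strip() != "",
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "country": lambda v: v in {"ES", "MX", "US"},
}

records = [
    {"name": "Ana", "age": 34, "country": "ES"},
    {"name": "", "age": 29, "country": "MX"},       # missing name
    {"name": "Luis", "age": 240, "country": "US"},  # impossible age
    {"name": "Eva", "age": 51, "country": "XX"},    # unknown country code
]

for i, r in enumerate(records):
    print(i, red_marks(r, rules))
print(f"error-free records: {error_free_share(records, rules):.0%}")  # 25%
```

With a real sample of 100 records, the share of error-free records gives a single figure you can track from one Friday afternoon to the next.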




Data cleaning happens at three levels:

1. Rinse - replaces obvious errors with missing values or corrects them

2. Scrub - involves deep analysis of the data and making corrections one by one, sometimes by hand

3. Wash - a mixture of rinse and scrub
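The rinse level can be sketched in Python. This is a minimal sketch, assuming an obvious error is a value that fails a simple check and that an easy correction (here, trimming whitespace and fixing letter case) is known in advance; the `rinse` function and its `checks` and `fixes` arguments are hypothetical. A value is corrected when the easy fix makes it pass its check; otherwise it is replaced with `None`, Python's stand-in for a missing value.

```python
from typing import Callable

Check = Callable[[object], bool]
Fix = Callable[[object], object]

def rinse(record: dict, checks: dict[str, Check], fixes: dict[str, Fix]) -> dict:
    """Return a copy of `record` with obvious errors corrected or set to None (missing)."""
    cleaned = dict(record)
    for field, ok in checks.items():
        value = cleaned.get(field)
        if ok(value):
            continue  # nothing obviously wrong with this value
        fix = fixes.get(field)
        if fix is not None:
            try:
                value = fix(value)  # try the easy correction
            except (TypeError, ValueError, AttributeError):
                value = None
        # Keep the corrected value only if it now passes the check.
        cleaned[field] = value if value is not None and ok(value) else None
    return cleaned

# Hypothetical checks and easy fixes.
checks = {
    "country": lambda v: v in {"ES", "MX", "US"},
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
}
fixes = {
    "country": lambda v: v.strip().upper(),  # " es " becomes "ES"
}

print(rinse({"country": " es ", "age": 240}, checks, fixes))
# {'country': 'ES', 'age': None}
```

Anything the rinse cannot fix is left as a missing value for the deeper scrub and wash levels.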
Data Cleaning
Employ all possible means of scrubbing and eliminate incorrect data.



After the initial scrub, move on to washing the remaining data; this step should be performed by a data scientist.
One wash technique involves imputing missing values using statistical methods or algorithms.
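A simple example of imputing with a statistical method is mean imputation: each missing value in a numeric column is replaced by the mean of the values that are present. This is a minimal sketch using Python's standard `statistics` module; the `impute_mean` helper is hypothetical, and mean imputation is only one of many possible techniques (median imputation and model-based methods are common alternatives). The presentation does not say which one to use.

```python
from statistics import mean
from typing import Optional

def impute_mean(values: list[Optional[float]]) -> list[float]:
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: every value is missing")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

ages = [34.0, None, 29.0, None, 51.0]
print(impute_mean(ages))  # [34.0, 38.0, 29.0, 38.0, 51.0]
```

Mean imputation keeps the column's average unchanged but shrinks its spread, which is one reason a data scientist, rather than an automatic rule, should decide when it is appropriate.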
Scrubbing