Generation of a Skeleton Corpus of Digital Objects for the Validation and Evaluation of Format Identification Tools and Signatures
~ Create ~
~ Communicate ~
~ Test ~
~ Submit ~
~ Host ~
Duplicate Identifications : 13
Regex Bugs* : 2
* Worked in DROID 4.0 not DROID 6.0
DROID Nuances Discovered : 1
~ Not always possible to submit sample files
https://github.com/exponential-decay/skeleton-test-suite-generator
~ IPR free
~ Size makes it distributable (400 KB)
~ Already benefits FIDO
~ 537 records, 672 signatures, unique files
~ Demonstrated value - DROID v DROID nuances
ONE
TWO
?? | * | {n} | {m-n} | (a|b) | [a:b] | [!a] | [!a:b]
~ Artificial e.g. PDF 1.4 - %PDF-1.4%%EOF
~ Psychological illusion: is it good enough?
~ Smaller set of files automatically
~ Signature change = unit test change (DROID)
~ Only tests identification mechanisms
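The artificial PDF 1.4 example above can be sketched in a few lines. This is a minimal illustration, not the generator's actual code: the function names `build_skeleton` and `matches` are hypothetical, and the only bytes taken from the slide are `%PDF-1.4` and `%%EOF`.

```python
# Byte sequences from the slide's PDF 1.4 example.
BOF = b"%PDF-1.4"
EOF = b"%%EOF"


def build_skeleton(bof: bytes, eof: bytes, padding: int = 0) -> bytes:
    """Return BOF bytes, optional zero-byte padding, then EOF bytes.

    Hypothetical helper: a skeleton file holds only the bytes a
    signature looks for, not a valid document.
    """
    return bof + b"\x00" * padding + eof


def matches(data: bytes, bof: bytes, eof: bytes) -> bool:
    """Check only what this simple signature checks: the start and end bytes."""
    return data.startswith(bof) and data.endswith(eof)


skeleton = build_skeleton(BOF, EOF)
print(skeleton)                      # b'%PDF-1.4%%EOF'
print(matches(skeleton, BOF, EOF))   # True
```

The result is 13 bytes long and is not a renderable PDF, which is the point of the caveats above: it exercises the identification mechanism and nothing else. Writing it to disk would be a single `pathlib.Path(...).write_bytes(skeleton)` call.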
call to arms - Fetherston and Gollins (2012) : doi.org/10.2218/ijdc.v7i1.211
~ Identification ~ Feature Extraction ~ Validation ~
Conclusion
Still Picture Interchange File Format - FMT/112
by Ross Spencer
FFD8FFE800205350494646000100(00|01|02|03|04){11}(00|01|02|03|04|05){9}FFE8
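To show how a signature like this one turns into skeleton files, here is a hedged sketch that expands it into every byte string it can describe. It handles only three token types (hex literals, `(a|b)` alternatives, and `{n}`, assumed here to mean a gap of exactly n arbitrary bytes) and fills gaps with zero bytes. The `expand` function and the `TOKEN` pattern are illustrative names, not part of DROID or the generator linked above.

```python
import itertools
import re

SPIFF = ("FFD8FFE800205350494646000100"
         "(00|01|02|03|04){11}(00|01|02|03|04|05){9}FFE8")

# One token is: (a|b|...) alternatives, a {n} gap, or a run of hex digits.
TOKEN = re.compile(r"\(([0-9A-Fa-f|]+)\)|\{(\d+)\}|([0-9A-Fa-f]+)")


def expand(signature: str, fill: bytes = b"\x00") -> list[bytes]:
    """Return every byte string the signature expands to.

    Gaps are filled with `fill`; unsupported syntax raises ValueError.
    """
    parts = []
    pos = 0
    for m in TOKEN.finditer(signature):
        if m.start() != pos:
            raise ValueError(f"unsupported syntax at offset {pos}")
        pos = m.end()
        alternatives, gap, literal = m.groups()
        if alternatives is not None:
            parts.append([bytes.fromhex(a) for a in alternatives.split("|")])
        elif gap is not None:
            parts.append([fill * int(gap)])
        else:
            parts.append([bytes.fromhex(literal)])
    if pos != len(signature):
        raise ValueError(f"unsupported syntax at offset {pos}")
    # Cartesian product over the alternatives at each position.
    return [b"".join(combo) for combo in itertools.product(*parts)]


files = expand(SPIFF)
print(len(files), len(files[0]))  # 30 38
```

Each of the 30 outputs is 38 bytes long and differs only in the two bytes covered by the alternative groups, at offsets 14 and 26.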
~ Effort involved in full test-corpus ~
Naively: Extract into 5 x 6 skeleton files
but: one profile is bi-level
five compression types apply to bi-level images only
not 30
~ Effort involved in full test-corpus ~
P (Profile ID) - 1 bi-level | 4 continuous-tone
BPS (Bits Per Sample) - 1 bi-level | 5 continuous-tone
C (Compression) - 5 bi-level | 2 continuous-tone
R (Resolution Units) - 3 bi-level | 3 continuous-tone
(1 x 1 x 5 x 3) + (4 x 5 x 2 x 3) = 15 + 120 = 135
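The arithmetic behind the full test-corpus estimate can be checked with a short sketch. The per-field counts are the ones given on the slide; the dictionary layout and variable names are illustrative only.

```python
from math import prod

# (bi-level count, continuous-tone count) per header field, from the slide.
FIELDS = {
    "Profile ID": (1, 4),
    "Bits Per Sample": (1, 5),
    "Compression": (5, 2),
    "Resolution Units": (3, 3),
}

# Valid combinations are counted per image type, then summed.
bi_level = prod(bi for bi, _ in FIELDS.values())
continuous_tone = prod(ct for _, ct in FIELDS.values())
total = bi_level + continuous_tone

print(bi_level, continuous_tone, total)  # 15 120 135
```

Against the 30 files that the signature's two alternative groups produce, a corpus covering every valid combination of these four fields needs 135.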
immediate impact
best return for investment
manual generation?
greater stability
Container formats (ZIP, OLE2)
Maintenance
lessons learned
DROID codebase
Ross Spencer - IDCC 2013-01-16T14:00+01:00