Migrate the testsuite from model-based coverage to tag-based coverage to make it faster #19
Comments
Great idea. I would love a pull request! In
|
I just coded a small Node script that implements the "one sample image per decodable tag" approach described above. The results are promising: with just one sample image per tag, only the following 212 samples would need to be included, which together contain 3695 different tags (assuming 50 KB per image, that is roughly 10 MB for all sample images, compared with the current 6245 samples consuming 168 MB):
(Will polish the script a bit before I prepare the pull request) |
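The script itself is not shown in this thread, so the following is only a minimal sketch of one plausible reading of "one sample image per decodable tag": a single pass that keeps an image if it contributes at least one tag not yet covered. The input shape (file name mapped to a list of tag names), the file names, and the function name `selectSinglePass` are all assumptions made for illustration.

```javascript
// Hypothetical input shape: image file name -> list of tag names found in it.
// File names and tags are made up for illustration.
const reports = {
  "a.jpg": ["Make", "Model", "ISO"],
  "b.jpg": ["Make", "Model"],
  "c.jpg": ["Make", "LensType"],
};

// Single pass: keep an image only if it adds a tag not seen so far.
function selectSinglePass(reports) {
  const covered = new Set();
  const selected = [];
  for (const [file, tags] of Object.entries(reports)) {
    const addsNewTag = tags.some((tag) => !covered.has(tag));
    if (addsNewTag) {
      selected.push(file);
      tags.forEach((tag) => covered.add(tag));
    }
  }
  return { selected, covered };
}

const result = selectSinglePass(reports);
console.log(result.selected);     // files kept
console.log(result.covered.size); // number of distinct tags covered
```

In this sketch the outcome depends on the order in which images are visited, which is why a different ordering can change how many images end up in the selection.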
Update: the total file size of the 212 samples is 8856892 bytes, or approx. 8.45 MB, so this would eliminate the need for a separate test subrepo. |
Excellent work, thanks for that. That is a massive improvement. At the risk of complicating what you've already done, there is a slight tweak you could make to your algorithm that may reduce the number of images further (although it will be slower because it re-sorts on each iteration). Currently, imagine you have 3 images, each with the following tags: Your algorithm would add all 3 images, when really only images 1 and 3 are needed to represent the full set of tags. I would suggest the following:
We need a mechanism that allows adding new photos from new cameras in the future, which should be straightforward with this script. Cheers, |
Hi Matt, thanks for your suggestion! I actually had the same thought - we should sort the images by "difference" to the current set of tags and pick the image that brings the most new tags to the set. After that the set is updated with the new tags and the difference sort starts again until no more images are left. I will update my script and check the results. |
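The "pick the image that brings the most new tags" idea described above can be sketched as a greedy selection. This is a sketch under stated assumptions, not the script from the pull request: the three tag sets are hypothetical (they are not the ones from the original example, which is not preserved here), and `selectGreedy` is an illustrative name. The tag sets are chosen so that an in-order single pass would keep all three images, while images 1 and 3 already cover every tag.

```javascript
// Hypothetical tag sets for illustration only.
const images = {
  "image1.jpg": ["A", "B"],
  "image2.jpg": ["B", "C"],
  "image3.jpg": ["C", "D"],
};

// Greedy selection: repeatedly pick the image with the most uncovered tags.
function selectGreedy(images) {
  const covered = new Set();
  const remaining = new Map(
    Object.entries(images).map(([file, tags]) => [file, new Set(tags)])
  );
  const selected = [];

  while (remaining.size > 0) {
    let bestFile = null;
    let bestGain = 0;
    for (const [file, tags] of remaining) {
      let gain = 0;
      for (const tag of tags) {
        if (!covered.has(tag)) gain += 1;
      }
      if (gain > bestGain) {
        bestFile = file;
        bestGain = gain;
      }
    }
    if (bestFile === null) break; // no remaining image adds a new tag
    for (const tag of remaining.get(bestFile)) covered.add(tag);
    selected.push(bestFile);
    remaining.delete(bestFile);
  }
  return selected;
}

console.log(selectGreedy(images)); // [ 'image1.jpg', 'image3.jpg' ]
```

Each round rescans every remaining image, which is where the extra running time comes from. This is the usual greedy heuristic for set cover, so it tends to give a small selection but is not guaranteed to give the smallest possible one.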
Yeah, I'm not sure it'll make a massive difference. It's just my pedantry getting the better of me! Step 3 should really be something like: This will make it easy to add photos at a later date. |
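The revised step is not preserved in this thread, so the following is only one possible way to support adding photos later: start from the tags already covered by the committed samples and accept a new photo only if it contributes at least one uncovered tag. The tag names, file names, and the helpers `newTagsFrom` and `considerSample` are hypothetical.

```javascript
// Hypothetical state: tags already covered by the committed sample images.
const coveredTags = new Set(["Make", "Model", "LensType"]);

// Returns the tags a candidate image would add to the covered set.
function newTagsFrom(candidateTags, covered) {
  return candidateTags.filter((tag) => !covered.has(tag));
}

// Adds the candidate only if it contributes at least one uncovered tag.
function considerSample(file, candidateTags, covered, samples) {
  const added = newTagsFrom(candidateTags, covered);
  if (added.length === 0) return false;
  added.forEach((tag) => covered.add(tag));
  samples.push(file);
  return true;
}

const samples = [];
const keptNew = considerSample("new-camera.jpg", ["Make", "ShutterCount"], coveredTags, samples);
const keptDup = considerSample("duplicate.jpg", ["Make", "LensType"], coveredTags, samples);
console.log(keptNew, keptDup, samples);
```

A photo from a new camera is then kept only when it extends coverage, so the suite grows with the number of tags rather than with the number of camera models.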
Another word of warning though: currently the tag names used in the exiftool.js test reports are not vendor-prefixed. I discovered some tag clashes like the "LensType" tag which exists in Canon and Nikon Makernotes. Technically, these are two different tags, but currently they would count as one tag: http://www.sno.phy.queensu.ca/~phil/exiftool/TagNames/Canon.html#LensType I will also introduce a namespace prefix to avoid such clashes. |
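To illustrate the clash, here is a minimal sketch of a namespace prefix. The record shape with a `group` field and the `Group:Name` format are assumptions made for illustration; the actual report format and prefix scheme may differ.

```javascript
// Hypothetical tag records; "group" stands in for whatever vendor or
// metadata group the report provides for each tag.
const tags = [
  { group: "Canon", name: "LensType" },
  { group: "Nikon", name: "LensType" },
  { group: "EXIF", name: "Make" },
];

// Build a namespaced key so that same-named tags from different groups differ.
const namespaced = (tag) => `${tag.group}:${tag.name}`;

const plain = new Set(tags.map((tag) => tag.name));
const prefixed = new Set(tags.map(namespaced));

console.log(plain.size);    // 2: both LensType tags collapse into one
console.log(prefixed.size); // 3: Canon:LensType and Nikon:LensType stay distinct
```

Counting namespaced keys instead of bare names is one reason the number of unique tags can go up after such a change.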
Good point. Similarly, I flatten the tags found in EXIF, MakerNotes, XMP, etc., where there are occasionally clashes. |
After incorporating the "maximum difference" algorithm and adding namespacing to the tags, I managed to reduce the sample image count from 212 to 176. Note that the number of unique detected tags has increased from 3695 to 4044. I used the brand-new ExifTool 10.0 to regenerate the JSON metadata reports with tag namespacing, whereas my earlier test used the pre-generated reports from the repo, which were created with an older ExifTool version. The old unique tag count may also have been too low because of tag name clashes, which namespacing now prevents. The only downside is that my deduplication script now takes five times longer to run due to the frequent array diffing. Here is the new output:
|
Excellent. That's an even better test suite reduction than I was expecting.
Currently the exiftool.js coverage testsuite consists of about 7000 sample images from different camera models. This impressive number also brings some issues to the coverage suite:
IMHO, exiftool.js should switch to tag-based coverage, where there is exactly one sample image per decodable tag (e.g. Exif Make). Further sample images should only be added in case of known regressions (e.g. some Nikon model writing a wrong datetime tag).
Advantages:
Approach:
Please tell me if you are interested in a pull request.