Guide for Internet Archive IIIF endpoint #20

Closed
sammeltassen opened this issue Sep 15, 2023 · 11 comments · Fixed by IIIF/guides#60
Labels: documentation, High Priority


@sammeltassen

For the documentation, would it be possible to specify which upload formats work best with the current Cantaloupe integration, and to provide some best practices for pre-processing images?

For example, I presume that an image already compressed to jp2 before upload might significantly speed up the IIIF integration compared to the original tif.

Related question: the presentation mentioned that the current workflow uses IA's download link. How does it determine which file format to use if multiple originals are available? For example, does it prefer jp2 over tif?
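For context, a public file in an Internet Archive item can be fetched from `https://archive.org/download/<identifier>/<filename>`. A minimal Python sketch of building such a link; `ia_download_url` and the filename in the example are hypothetical, and this does not show how the IIIF workflow chooses among several originals:

```python
from urllib.parse import quote


def ia_download_url(identifier: str, filename: str) -> str:
    """Build an archive.org download link for one file in an item.

    The filename is percent-encoded (keeping '/' so files in subfolders work);
    the identifier is assumed to be URL-safe already.
    """
    return f"https://archive.org/download/{identifier}/{quote(filename, safe='/')}"


# Hypothetical filename, for illustration only.
print(ia_download_url("tx-burnet-123835-1909-125000-geo", "page 1.tif"))
```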

@sammeltassen sammeltassen changed the title Preferred upload formats for IIIF speed Preferred file formats for IIIF speed Sep 15, 2023
@glenrobson glenrobson added the discuss Issues to flag for discussion label Sep 21, 2023
@glenrobson
Collaborator

We agree this would be good to include in the documentation. Glen believes it does prioritise jp2s over tiffs but will have to check.

We can borrow some instructions from:

https://blog.archive.org/2012/05/24/uploading-images-for-text-items/

and

https://help.archive.org/help/how-to-upload-scanned-images-to-make-a-book/

@glenrobson
Collaborator

Note: in the HTJ2K testing we found that ptiff (pyramidal TIFF) was the fastest format, so it would be interesting to see whether it works with the IA setup:

https://journal.code4lib.org/articles/17596

@glenrobson glenrobson added documentation Improvements or additions to documentation and removed discuss Issues to flag for discussion labels Jan 18, 2024
@benwbrum
Collaborator

@glenrobson
Collaborator

Glen to provide Sara with some pTiffs to test with.

@glenrobson
Copy link
Collaborator

Note there is also a related request on Twitter: https://x.com/Mehrandhn/status/1764412753383456972?s=20

@saracarl
Copy link
Collaborator

Based on this:

the *_images.zip will be scanned for files it contains, at any directory level, whose names end with .jp2, .jpg, .jpeg, .tif, .tiff, .bmp or .png, matched case-insensitively; any other files (.xml, .txt, etc.) will be ignored.

and this:

  1. Use only jpg, jpeg, jp2, tif, tiff, png, gif or bmp files. Any combination of them is acceptable.

So ptiffs aren't an option for upload.
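Read literally, both quoted rules are extension checks, which is why a file with a ptiff extension falls outside them. A minimal Python sketch of the two rules as quoted; `images_in_zip` and `is_allowed_upload` are hypothetical helpers, not Internet Archive's code, and they look only at filenames, never at file contents:

```python
import zipfile
from pathlib import PurePosixPath

# First quote: extensions picked out of an *_images.zip (matched case-insensitively).
ZIP_IMAGE_EXTENSIONS = (".jp2", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".png")
# Second quote: extensions accepted for uploaded image files.
UPLOAD_EXTENSIONS = {".jpg", ".jpeg", ".jp2", ".tif", ".tiff", ".png", ".gif", ".bmp"}


def images_in_zip(path_or_file) -> list:
    """Return the zip members, at any directory depth, that the first rule treats as images."""
    with zipfile.ZipFile(path_or_file) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if not info.is_dir() and info.filename.lower().endswith(ZIP_IMAGE_EXTENSIONS)
        ]


def is_allowed_upload(filename: str) -> bool:
    """Check only the extension (case-insensitively) against the second list."""
    return PurePosixPath(filename).suffix.lower() in UPLOAD_EXTENSIONS


for name in ["map.tif", "map.ptif", "map.ptiff", "page.JPG"]:
    print(name, is_allowed_upload(name))
```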

@saracarl
Copy link
Collaborator

Whether to upload TIFFs or JP2s depends on the setup. Here's a ChatGPT-generated summary of the trade-offs:

Performance Considerations:

Network Bandwidth vs. Server Processing: If network bandwidth is limited, JPEG 2000 may be preferable due to its smaller file sizes. However, if server processing power is limited, TIFF may be more efficient due to simpler decompression requirements.
Server Hardware: Some servers might have specific optimizations or hardware accelerators for JPEG 2000, which can greatly enhance its performance.
Client Requirements: If clients frequently request high-resolution partial images, the region decoding efficiency of JPEG 2000 might offer better performance.
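To make the "partial image" case concrete: under the IIIF Image API 3.0, a client requests a region with a URL of the form `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`, and how quickly the server answers depends on how cheaply it can decode just that region from the source file. A Python sketch; `iiif_region_url`, the base URL and the identifier are hypothetical placeholders, not Internet Archive's actual image service paths:

```python
from typing import Optional


def iiif_region_url(base: str, identifier: str, x: int, y: int, w: int, h: int,
                    width: Optional[int] = None) -> str:
    """Build an IIIF Image API 3.0 URL for one rectangular region of an image.

    Size is 'max' (the region at full resolution) unless a target width is
    given, in which case the 'w,' form asks the server to keep the aspect ratio.
    The identifier is assumed to be URL-safe already.
    """
    region = f"{x},{y},{w},{h}"
    size = "max" if width is None else f"{width},"
    return f"{base.rstrip('/')}/{identifier}/{region}/{size}/0/default.jpg"


# Hypothetical image service base and identifier, for illustration only.
print(iiif_region_url("https://example.org/iiif/3", "sample-image", 1024, 2048, 512, 512))
```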

@glenrobson
Collaborator

Would be great to put this information in this file:

https://github.com/IIIF/guides/blob/main/guides/archive.org/index.md

Note discussion on slack also:

https://iiif.slack.com/archives/C06HUFT147P/p1714053128531299

@glenrobson
Copy link
Collaborator

glenrobson commented Apr 25, 2024

Can you upload a ptiff with a .tif extension?

  • Will it show in the archive.org/details
  • Will it work in Cantaloupe?
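Before testing, it helps to confirm that a file named `.tif` really is pyramidal. A minimal standard-library Python sketch; `tiff_layout` is a hypothetical helper that walks the chain of image file directories (IFDs) in a classic TIFF. Several IFDs plus a `TileWidth` tag is typical of pyramidal TIFFs, but BigTIFF files and pyramids stored as SubIFDs are not handled:

```python
import struct
from typing import Tuple

TILE_WIDTH_TAG = 322  # TIFF "TileWidth" tag: present only when the image is stored in tiles


def tiff_layout(data: bytes) -> Tuple[int, bool]:
    """Return (number of chained IFDs, whether the first IFD is tiled) for a classic TIFF.

    Pyramidal TIFFs usually hold one IFD per resolution level and use tiles,
    so a result like (5, True) is a strong hint.
    """
    order = {b"II": "<", b"MM": ">"}.get(data[:2])  # little- or big-endian byte order
    if order is None or struct.unpack(order + "H", data[2:4])[0] != 42:
        raise ValueError("not a classic TIFF file")
    (offset,) = struct.unpack(order + "I", data[4:8])
    count, first_tiled, seen = 0, False, set()
    while offset and offset not in seen:  # offset 0 ends the chain; `seen` guards against loops
        seen.add(offset)
        (n_entries,) = struct.unpack_from(order + "H", data, offset)
        # Each 12-byte directory entry starts with a 2-byte tag number.
        tags = {struct.unpack_from(order + "H", data, offset + 2 + 12 * i)[0]
                for i in range(n_entries)}
        if count == 0:
            first_tiled = TILE_WIDTH_TAG in tags
        count += 1
        (offset,) = struct.unpack_from(order + "I", data, offset + 2 + 12 * n_entries)
    return count, first_tiled
```

Usage, with a hypothetical filename: `tiff_layout(pathlib.Path("map.tif").read_bytes())`. Reading the whole file into memory keeps the sketch short; a real check would seek through the file instead.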

@saracarl saracarl changed the title Preferred file formats for IIIF speed Guide for Internet Archive IIIF endpoint Apr 25, 2024
@saracarl
Copy link
Collaborator

saracarl commented May 9, 2024

IIIF/guides#59

@saracarl
Copy link
Collaborator

saracarl commented May 9, 2024

So I uploaded a ptiff to IA (as a .tif) and it seems to "just work" for the IIIF endpoint, which is great.
Here it is in Mirador:
https://projectmirador.org/embed/?iiif-content=https://iiif.archive.org/iiif/3/tx-burnet-123835-1909-125000-geo/manifest.json
However the item details page in Internet Archive:
https://archive.org/details/tx-burnet-123835-1909-125000-geo
Can't/won't/doesn't show the image, even after the derivative process runs.
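The manifest and Mirador links above follow a simple pattern built from the item identifier, which makes it easy to check other test items the same way. A Python sketch inferred only from those two URLs; the helper names are hypothetical:

```python
def iiif_manifest_url(identifier: str) -> str:
    # Pattern taken from the manifest link above (IIIF Presentation API 3 endpoint).
    return f"https://iiif.archive.org/iiif/3/{identifier}/manifest.json"


def mirador_embed_url(manifest_url: str) -> str:
    # Same unencoded form as the Mirador link above; a stricter client might percent-encode the value.
    return f"https://projectmirador.org/embed/?iiif-content={manifest_url}"


item = "tx-burnet-123835-1909-125000-geo"
print(iiif_manifest_url(item))
print(mirador_embed_url(iiif_manifest_url(item)))
```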

We then tried adding a jpg to the item alongside the ptiff with the .tif suffix; in that case IA showed the jpg on the item page, but the manifest also referenced the jpg rather than the tif.

We don't currently serve or support ptiff at all; only .tif files.

The tricky bit is that we want to privilege jpgs over plain tifs, but pyramidal tifs over jpgs. So the ideal might be to upload both a .ptiff and a .jpg, and have our code identify and choose the .ptiff.
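The preference described here can be written as a simple ranking. A minimal Python sketch, assuming the code can somehow tell a pyramidal .tif from a plain one (the `is_pyramidal` callback is a hypothetical hook, not existing IA code) and that formats this comment doesn't rank, such as jp2, are simply skipped:

```python
from typing import Callable, Iterable, Optional


def pick_iiif_source(filenames: Iterable[str],
                     is_pyramidal: Callable[[str], bool]) -> Optional[str]:
    """Pick one file for the manifest: pyramidal TIFF > JPEG > plain TIFF.

    Names ending in .ptif/.ptiff count as pyramidal outright; for .tif/.tiff
    the caller-supplied `is_pyramidal` decides. Ties go to the
    alphabetically first name. Returns None if nothing qualifies.
    """
    def rank(name: str) -> Optional[int]:
        ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
        if ext in ("ptif", "ptiff"):
            return 0
        if ext in ("tif", "tiff"):
            return 0 if is_pyramidal(name) else 2
        if ext in ("jpg", "jpeg"):
            return 1
        return None  # unranked formats are ignored in this sketch

    candidates = [(r, name) for name in filenames if (r := rank(name)) is not None]
    return min(candidates)[1] if candidates else None
```

For example, with `["map.jpg", "map.tif"]` the sketch returns `"map.tif"` when `is_pyramidal` reports the TIFF as pyramidal, and `"map.jpg"` otherwise.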

Here's a test that includes both a jpg and a ptiff; we should privilege the ptiff over the jpeg in the IIIF manifest.
https://archive.org/details/tx-burnet-123835-1909-125000-geo


4 participants