Check all the PDF's pages for oversized page dimensions #19987

michelpmcdonald · 2024-12-20T20:54:06Z

Summary

Benefits Intake PDF uploads include a check for PDFs whose pages' physical dimensions exceed a specific width or height.

The command line tool pdfinfo is used to fetch the PDF's dimensions. The way we were calling pdfinfo only provided the size of the first page within the PDF; however, individual pages within the PDF can (and do) have different dimensions.

Checking the size of only the first page in the PDF will cause a downstream error with a somewhat unspecific error message (~corrupt PDF detected; my guess is it comes from the doc conversion service) if the unchecked pages' dimensions are too large. The error message returned from the downstream system does not clearly state that the PDF is oversized.

We can't really get rid of the dimension restrictions, and we can't prevent the consumer from uploading oversized PDFs, but we can check ALL the pages prior to submitting downstream and return our current, pretty clear and specific oversized error message so at least the consumer knows what the issue is.

No flipper flag in play here.

The solution was to pass pdfinfo command line arguments that instruct it to check all the pages' dimensions; we also had to slightly modify the pdfinfo results parsing code.
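To illustrate the parsing side, here is a minimal sketch, not the code in this PR. It assumes that pdfinfo, when given a page range, prints one `Page N size: W x H pts` line per page, and that sizes are in points (72 per inch). The names `page_dimensions` and `oversized_pages`, the sample output, and the 78 x 101 inch limits are all hypothetical and used only for illustration.

```ruby
# Hypothetical sample of per-page pdfinfo output (assumed format).
SAMPLE_PDFINFO_OUTPUT = <<~OUT
  Pages:          2
  Page    1 size: 612 x 792 pts (letter)
  Page    2 size: 720 x 7344 pts
OUT

# Matches lines such as "Page    2 size: 720 x 7344 pts".
PAGE_SIZE_LINE = /\APage\s+(\d+)\s+size:\s+([\d.]+)\s+x\s+([\d.]+)\s+pts/

POINTS_PER_INCH = 72.0

# Returns one hash per page, with width and height converted to inches.
def page_dimensions(pdfinfo_output)
  pdfinfo_output.each_line.filter_map do |line|
    match = PAGE_SIZE_LINE.match(line.strip)
    next unless match

    {
      page: match[1].to_i,
      width: (match[2].to_f / POINTS_PER_INCH).round(2),
      height: (match[3].to_f / POINTS_PER_INCH).round(2)
    }
  end
end

# Returns only the pages that exceed either limit (limits in inches).
def oversized_pages(pdfinfo_output, max_width:, max_height:)
  page_dimensions(pdfinfo_output).select do |dims|
    dims[:width] > max_width || dims[:height] > max_height
  end
end

# Hypothetical limits, for illustration only.
oversized = oversized_pages(SAMPLE_PDFINFO_OUTPUT, max_width: 78, max_height: 101)
puts oversized.map { |dims| dims[:page] }.inspect
```

With the sample above, only the second page is reported, which mirrors the updated 10x102.pdf fixture where the oversized page is no longer the first one.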

Team Banana-peels, part of this is our code, but part of this is shared code.

For reviewers: modules/vba_documents/spec/fixtures/10x102.pdf was changed. The oversized page is now the second page in the PDF, not the first page as it was prior to this PR.

*This work is behind a feature toggle (flipper): NO

Related issue(s)

API-43483

Testing done

New code is covered by unit tests
In addition to unit tests, manually tested locally

Screenshots

Note: Optional

What areas of the site does it impact?

Benefits Intake PDF validation

Acceptance criteria

I fixed|updated|added unit tests and integration tests for each feature (if applicable).
No error nor warning in the console.

kristen-brown

Clean solution! I like it!

Assuming that you evaluated performance of checking every page in a PDF with a very large number of pages (like 215,000 pages, to grab our largest outlier from the monthly report) and that the file can still be processed successfully (I think you may have already mentioned doing this in standup), this looks good to me!

kristen-brown · 2025-01-10T22:02:13Z

modules/vba_documents/lib/vba_documents/pdf_inspector.rb

@@ -91,7 +91,7 @@ def pdf_metadata(pdf)
        dimensions: {
          height: dimensions[:height].round(2),
          width: dimensions[:width].round(2),


I wonder how useful these height and width values in the metadata are anymore, since they're only for the first page. 🤔

Not suggesting any specific changes, just something I'm thinking about.

kristen-brown · 2025-01-10T22:05:11Z

lib/pdf_info.rb

@@ -18,7 +18,7 @@ def self.read(file_or_path)

    def initialize(path)
      @stdout = []
-      Open3.popen2e(Settings.binaries.pdfinfo, path) do |_stdin, stdout, wait|
+      Open3.popen2e(Settings.binaries.pdfinfo, '-l', '-1', path) do |_stdin, stdout, wait|


I wonder if it's worth adding a comment here to explain what this option with the -1 magic number passed in does.

It also looks like we are the only team using this library (vba_documents and appeals_api modules), so we won't be affecting anyone else's implementations.
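One way such a comment could look is sketched below. It is only a sketch: the constant `ALL_PAGES_ARGS` and the helper `pdfinfo_command` are hypothetical names, and the described meaning of `-l -1` (a last-page value below 1 is treated as "through the final page") is an assumption about the installed pdfinfo build that should be verified before relying on it.

```ruby
# '-l <n>' tells pdfinfo the last page to examine. Assumption: a value
# below 1 (here -1) makes pdfinfo examine through the final page, so a
# size line is reported for every page instead of only the first.
ALL_PAGES_ARGS = ['-l', '-1'].freeze

# Builds the argument list that would be passed to Open3.popen2e.
# `binary` stands in for Settings.binaries.pdfinfo.
def pdfinfo_command(binary, path)
  [binary, *ALL_PAGES_ARGS, path]
end

puts pdfinfo_command('pdfinfo', 'example.pdf').inspect
```

Naming the arguments keeps the explanation next to the magic number and leaves the call site short.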

Check all the PDFs pages for oversized page dimensions

1aa3213

github-actions bot added the test-failure label Dec 20, 2024

va-vfs-bot temporarily deployed to API-43484/main/main December 20, 2024 23:07 Inactive

michelpmcdonald added Lighthouse lighthouse benefits-intake Lighthouse Benefits Intake API banana-peels Lighthouse Banana Peels Team labels Jan 6, 2025

michelpmcdonald self-assigned this Jan 6, 2025

Merge branch 'master' into API-43484

a3f4021

va-vfs-bot temporarily deployed to API-43484/main/main January 6, 2025 16:10 Inactive

Merge branch 'master' into API-43484

e52b8fa

va-vfs-bot requested a deployment to API-43484/main/main January 8, 2025 14:48 Pending

Unit test fixes

e75e180

va-vfs-bot temporarily deployed to API-43484/main/main January 9, 2025 07:02 Inactive

github-actions bot added test-passing and removed test-failure labels Jan 9, 2025

Merge branch 'master' into API-43484

fc39132

va-vfs-bot deployed to API-43484/main/main January 9, 2025 17:31 View deployment

michelpmcdonald marked this pull request as ready for review January 9, 2025 17:59

michelpmcdonald requested review from a team as code owners January 9, 2025 17:59

github-actions bot added the require-backend-approval label Jan 9, 2025

kristen-brown approved these changes Jan 10, 2025
