A set of scripts to extract hyperlinks in a PDF file and check for the presence of URL tracking tags, such as UTM parameters for Google Analytics
Run python pdf_workflow.py pdfname.pdf
from the command line.
Check requirements.txt for required modules (pdfx
, pdfminer
)
- This script can easily be adapted to handle multiple PDF files
- Offer a web app version where PDF files can be uploaded and processed
- Adjust output, currently in list form but you can easily modify the script to suit your needs.