PDF Link Check (Python script)

`pdf_link_check.py` checks the hyperlinks in a Portable Document Format (PDF) file. The script is a command-line app.

Release: V1.1.1 2020.1.23

Install dependencies

You can install the dependencies for this script either by using pip with the requirements file or by installing each module individually.

To use Pip

In your command line, navigate to the repository folder that contains the requirements.txt file.
Run the following command:
```
pip install -r requirements.txt
```

Install individual modules

The script requires the following dependencies:

- Python 3.6 or greater.
- Python module: PyPDF2.
  Install with pip: `pip install PyPDF2`
  For more information, see pypi.org.
- Python module: Requests.
  Install with pip: `pip install requests`
  For more information, see pypi.org.
- Python module: `csv`.
  Part of the Python standard library. No need to install with pip. CSV stands for comma-separated values.
  For more information, see CSV File Reading and Writing.
- Python module: `operator`.
  Part of the Python standard library. No need to install with pip.
  For more information, see operator.
- Python module: `threading`.
  Part of the Python standard library. No need to install with pip.
  For more information, see the `threading` (Thread-based parallelism) documentation.
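To confirm that the modules listed above are available before running the script, a quick check along these lines may help. This is a minimal sketch and not part of the repository: the function name `missing_modules` is hypothetical, and the import names `PyPDF2` and `requests` are assumed from the pip package names.

```python
from importlib.util import find_spec

# Import names assumed from the dependency list above.
REQUIRED = ["PyPDF2", "requests", "csv", "operator", "threading"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```

If the check reports PyPDF2 or requests as missing, install them with pip as described above.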

Use `pdf_link_check.py`

Run pdf_link_check.py from your command line:

1. Open your command line and run: `python <path to script>/pdf_link_check.py`
2. The script asks for the path of the PDF you would like to parse. Enter the absolute path.
   On a Windows 10 machine, this might look like: `c:\<pathtoyourpdf>\pdffile.pdf`
3. The script asks for a location and filename where you would like to save the output.
   On a Windows 10 machine, this might look like: `c:\<pathtoyourreport>\pdflinkreport.csv`
4. The script runs and displays the following in the terminal for each link:
   - PDF page number
   - URI checked
   - Response code. You can find more information about response codes at List of HTTP status codes.
   - Error information for requests that fail. These are the exceptions raised by the Requests module.

   The script reports "NA" rather than a response code for URIs that time out after five seconds. It captures the error and displays it in the terminal.
5. When the script is done, it saves the results to the path that you indicated. You can open the CSV file in Microsoft Excel.
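
The behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `pdf_link_check.py`: the names `check_uri` and `write_report`, the column headers, and the row layout are all hypothetical. The `fetch` parameter is expected to be a callable such as `requests.get`, which accepts a `timeout` argument and returns a response with a `status_code` attribute.

```python
import csv
from operator import itemgetter

TIMEOUT_SECONDS = 5  # the README states that requests time out after five seconds


def check_uri(uri, fetch, timeout=TIMEOUT_SECONDS):
    """Return (code, error) for one URI; code is "NA" when the request fails."""
    try:
        response = fetch(uri, timeout=timeout)
        return response.status_code, ""
    except Exception as exc:  # deliberately broad for this sketch
        return "NA", repr(exc)


def write_report(rows, out_file):
    """Write (page, uri, code, error) rows to an open file, sorted by page."""
    writer = csv.writer(out_file)
    writer.writerow(["page", "uri", "code", "error"])  # hypothetical headers
    writer.writerows(sorted(rows, key=itemgetter(0)))
```

With Requests installed, a call might look like `check_uri("https://example.com", requests.get)`. When writing the report to disk, open the file with `newline=""`, as the `csv` module documentation recommends, before passing it to `write_report`.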

Run Pytest to validate returns

From the script directory, run pytest to validate the code. The tests use the PDFs in the data folder.

