
ycliu0214/PDFCleanScan


PDFCleanScan: Scanned Document Denoising Tool (GUI)

A simple, standalone Python application that cleans up scanned PDF documents by removing background noise and converting each page to high-contrast black text on a white background.

Features

  • Adjustable Parameters: Fine-tune Denoising Strength (H) and Contrast Threshold (C) using intuitive, integer-based sliders.
  • Standalone Executable: Download and run directly on Windows without the need to install Python or external dependencies.
  • Clean Output: Produces high-contrast, print-friendly PDF files.

How to Use (Standalone .EXE)

  1. Download the latest release (e.g., PDFCleanScan.exe or the zip folder) from the Releases page.
  2. Run the executable file (PDFCleanScan.exe).
  3. Click "Browse..." to select your noisy scanned PDF document.
  4. Adjust the H and C parameters using the sliders based on the quality of your scan.
  5. Click "Convert" and choose the save location for the new, cleaned PDF.

Parameters Guide

The processing uses the OpenCV library's Non-Local Means Denoising and Adaptive Thresholding methods. Adjust these two core parameters for optimal results:

| Parameter | Range | Default | Description & Effect |
|---|---|---|---|
| Denoising Strength (H) | 5 - 35 | 10 | Controls the intensity of noise removal. Lower H retains more fine text detail but leaves more background noise. Higher H aggressively removes noise but risks thinning or blurring the original text. |
| Contrast Threshold (C) | 1 - 15 | 5 | A constant subtracted from the mean to fine-tune the threshold. Lower C makes text lines thicker/bolder, restoring subtle details. Higher C thins the text and ensures a whiter background, useful for high-quality scans. |

Antivirus False Positives Notice (IMPORTANT)

This tool is packaged with PyInstaller, a Python bundling utility, to produce a convenient standalone executable. Antivirus software (such as Windows Defender or Avast) sometimes flags executables built with such bundlers as potentially malicious. This is known as a false positive.

This program does not contain any malicious code and is safe to use.

If your antivirus software blocks the executable:

  1. Add an Exclusion: The quickest solution is to temporarily add the executable or its containing folder to your antivirus program's exclusion list.
  2. Download the Directory Version: If available, download the directory version (usually a ZIP file containing the executable and libraries) instead of the single .exe file.

Source Code and Dependencies

The source code is provided for transparency and review. This tool is built using Python 3 and the following core libraries:

  • PyMuPDF (fitz): For PDF rendering and image extraction.
  • OpenCV (cv2): For image processing (denoising and thresholding).
  • Pillow (PIL): For final PDF assembly.
  • Tkinter: For the graphical user interface.

License

This project is released under the MIT License. For full details, see the LICENSE.txt file.
