
Get papers from the Electronic design competition in China.


Xhen-Starry-Night/PaperGetting

Paper Getting Tool

A Python script to download competition problem sets from URLs and stitch the images together vertically into a single combined image.

Features

  • Downloads competition problem pages from a list of URLs
  • Extracts images from the pages in order
  • Stitches images vertically to create a single combined image
  • Saves output images to the output/ directory with properly sanitized filenames
  • Includes retry mechanisms and proper error handling
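The vertical stitching step can be sketched with Pillow as follows. This is a minimal illustration, not the script's actual code: the function name `stitch_vertically`, the white background, and the left alignment of narrower images are all assumptions.

```python
from PIL import Image


def stitch_vertically(images, background=(255, 255, 255)):
    """Stack PIL images top to bottom on a canvas as wide as the widest one.

    Hypothetical helper: narrower images are left-aligned and the unused
    area is filled with `background`.
    """
    if not images:
        raise ValueError("no images to stitch")
    width = max(img.width for img in images)
    height = sum(img.height for img in images)
    canvas = Image.new("RGB", (width, height), background)
    y = 0
    for img in images:
        # Paste each image at the current vertical offset, then move down.
        canvas.paste(img.convert("RGB"), (0, y))
        y += img.height
    return canvas


if __name__ == "__main__":
    pages = [Image.new("RGB", (200, 100), "red"),
             Image.new("RGB", (150, 50), "blue")]
    combined = stitch_vertically(pages)
    print(combined.size)  # (200, 150)
```

Computing the canvas size up front means each image is pasted exactly once, which keeps memory use predictable for long problem sets.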

Requirements

  • Python 3.x
  • requests
  • BeautifulSoup4
  • Pillow (PIL)
  • re (part of the Python standard library, used for sanitizing filenames; no installation needed)

Setup

  1. Activate your virtual environment (recommended):
source .venv/bin/activate  # On Linux/Mac
# or
.venv\Scripts\activate     # On Windows
  2. Install the required packages:
pip install requests beautifulsoup4 pillow
  3. Create a target_urls file with the URLs you want to process, one per line.

  4. Run the script:

python paper_getter.py

File Descriptions

  • paper_getter.py: Main script that handles newer page structures (post-2021)

    • Designed to handle current page layouts
    • Looks for elements with classes content-desc and content-text
  • paper_getter_old.py: Script for the older page structure used by competition problems from 2021 and earlier

    • Currently only able to extract competition problem images from 2021
    • Does not support competition problem retrieval for years prior to 2020
    • Handles older page structures with classes like newsMain-content-title
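As a rough sketch of how images might be collected in order from the containers named above, the following uses BeautifulSoup. The class names `content-desc` and `content-text` come from this README; the function name `extract_image_urls`, its parameters, and the de-duplication step are assumptions, not the scripts' real interface.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_image_urls(html, base_url,
                       container_classes=("content-desc", "content-text")):
    """Return absolute image URLs found inside the given containers.

    Hypothetical helper: URLs are returned in document order and repeated
    URLs are kept only once.
    """
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # Passing a list to class_ matches elements carrying any of the classes.
    for container in soup.find_all(class_=list(container_classes)):
        for img in container.find_all("img"):
            src = img.get("src")
            if not src:
                continue  # skip <img> tags without a src attribute
            full = urljoin(base_url, src)  # resolve relative paths
            if full not in urls:
                urls.append(full)
    return urls
```

For the older layout, the same sketch could be called with a different `container_classes` tuple, for example one containing `newsMain-content-title`; whether that class actually wraps the images is not stated here.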

Output

Processed images are saved in the output/ directory with filenames based on the competition title and a timestamp to prevent duplicates.
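One way such a filename could be built is sketched below. The exact character set removed, the timestamp format, and the name `build_output_filename` are assumptions made for illustration; the script may use different rules.

```python
import re
from datetime import datetime


def build_output_filename(title, now=None, max_length=100):
    """Turn a competition title into a filesystem-safe PNG filename.

    Hypothetical helper: runs of whitespace and of characters that are
    invalid on Windows are collapsed into a single underscore.
    """
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_.")
    # Fall back to a fixed name if nothing usable is left.
    cleaned = cleaned[:max_length] or "untitled"
    # A second-resolution timestamp keeps repeated runs from overwriting files.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"{cleaned}_{stamp}.png"


if __name__ == "__main__":
    print(build_output_filename("2023 Problem A", now=datetime(2024, 1, 2, 3, 4, 5)))
    # 2023_Problem_A_20240102_030405.png
```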

Notes

  • The script adds delays between requests to be respectful to the server
  • Images are saved in PNG format to preserve quality
  • Filenames are sanitized to remove potentially problematic characters
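The retry and delay behavior mentioned above might look roughly like this. The sketch is deliberately library-agnostic: `fetch_with_retry`, its parameters, and the linearly growing pause are assumptions, and the `fetch` argument stands in for whatever download call the script really makes.

```python
import time


def fetch_with_retry(fetch, url, retries=3, delay=1.0, sleep=time.sleep):
    """Call fetch(url), pausing and retrying when it raises.

    Hypothetical helper: the pause grows linearly (delay, 2 * delay, ...)
    and the last error is chained onto the final exception.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # a real script would catch narrower errors
            last_error = exc
            if attempt < retries:
                sleep(delay * attempt)  # back off before the next attempt
    raise RuntimeError(f"giving up on {url} after {retries} attempts") from last_error
```

With requests, `fetch` could be something like `lambda u: requests.get(u, timeout=10)`. Injecting `sleep` makes the delay easy to replace in tests.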
