WeatherGovPlus

Repository containing supplementary material for the DocEng 2023 paper "WEATHERGOV+". Please see paper for details on implementation and results.

**Download dataset files from: ** http://web.uvic.ca/~mcote/WeatherGovPlus/WeatherGovPlus.zip

Requires:

Python 3.8
Docker
NVIDIA CUDA Docker Toolkit

This has been tested and run on Linux Mint 20 (Ubuntu 20.04) and CUDA 11.6. To use a different version of CUDA, you will need to build the docker containers for your system.

Download pretrained recognition model files

Go to https://github.com/hikopensource/DAVAR-Lab-OCR.git and download to DAVAR-Lab-OCR folder:

maskrcnn-lgpma-pub-e12-pub.pth
text_det_model.pth
text_rcg_model_e8.pth

Go to https://github.com/JiaquanYe/TableMASTER-mmocr.git and download to TableMASER-mmocr/checkpoints folder:

200~epoch_16_0.7767.pth
master_epoch_6.pth
pse_epoch_600.pth

Build and run the Docker images

Install Docker with NVIDIA Toolkit
Go to the recognition model folder:

cd table_recognition/[Model]
Build the docker image:

nvidia-docker build -t [docker name] .
Run the inference.py script:
docker run --rm -it --runtime=nvidia --gpus all --mount type=bind,source=../../evaluation/WEATHERGOV_PLUS,target=/data,readonly --mount type=bind,source=../../evaluation /results,target=/results [docker name] python inference.py
- This will bind the evaluation folder to the docker image and save the results to evaluation/results/[lgpma,davar]

An example can be found in example.sh.

Run the NLP evaluation using MVP

Create a virtualenv and install the packages in evaluation/requirements
Unzip the folders in evaluations/WEATHER_PLUS and wg/
- Your folder structure should be:
  - WEATHER_PLUS/TablesJPG
  - WEATHER_PLUS/TablesHTML_htags
Run the MVP NLP algorithm:

python run_analysis.py --html-dir evaluation/[davar,lgpma] [--run-teds] [--run-rouge] [--run-bleu]
- --run-teds: Run the TEDS algorithm on the table recognition results (this is rather slow)
- --run-[rouge,bleu] : Run the BLEU/ROUGE NLP metrics on the table summarization results

Citation

If you use this code or dataset for your research, please cite:

@inproceedings{dash2023weathergov+,
  title={WEATHERGOV+: A Table Recognition and Summarization Dataset to Bridge the Gap Between Document Image Analysis and Natural Language Generation},
  author={Dash, Amanda and Cote, Melissa and Albu, Alexandra Branzan},
  booktitle={Proceedings of the ACM Symposium on Document Engineering 2023},
  pages={1--10},
  year={2023}
}

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
evaluation		evaluation
table_recognition		table_recognition
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

WeatherGovPlus

Requires:

Download pretrained recognition model files

Build and run the Docker images

Run the NLP evaluation using MVP

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

License

dash-uvic/WeatherGovPlus

Folders and files

Latest commit

History

Repository files navigation

WeatherGovPlus

Requires:

Download pretrained recognition model files

Build and run the Docker images

Run the NLP evaluation using MVP

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages