update readme #34

Merged 3 commits on Mar 9, 2023

218 changes: 185 additions & 33 deletions README.md
# MindOCR

<!-- English | [中文](README_CN.md) -->

[Introduction](#introduction) |
[Installation](#installation) |
[Quick Start](#quick-start) |
[Notes](#notes)

## Introduction
MindOCR is an open-source toolbox for OCR development and application based on [MindSpore](https://www.mindspore.cn/en). It helps users train and apply the best text detection and recognition models, such as DBNet/DBNet++ and CRNN/SVTR, to fulfill image-text understanding needs.


<details open>
<summary> Major Features </summary>

- **Modular design**: We decouple the OCR task into several configurable modules. Users can easily set up the training and evaluation pipelines for customized data and models with a few lines of modification.
- **High performance**: MindOCR provides pretrained weights and the training recipes used, which reach competitive performance on OCR tasks.
- **Low cost to apply**: We provide easy-to-use tools to run text detection and recognition on real-world data. (coming soon)
</details>

## Installation

### Dependency

To install the dependencies, please run
```shell
pip install -r requirements.txt
```

It is recommended to install MindSpore following the official [instructions](https://www.mindspore.cn/install) for the best fit with your machine. To enable training in distributed mode, please also install [OpenMPI](https://www.open-mpi.org/software/ompi/v4.0/).


### Install with PyPI

Coming soon
```shell
pip install mindocr
```

### Install from Source

The latest version of MindOCR can be installed as follows:
```shell
pip install git+https://github.com/mindspore-lab/mindocr.git
```

> Notes: MindOCR is currently tested only on Linux with GPU/Ascend devices.

## Quick Start

### Text Detection Model Training

We will use the **DBNet** model and the **ICDAR2015** dataset for illustration, although other models and datasets are also supported. <!--ICDAR15 is a commonly used benchmark for scene text detection.-->

#### 1. Data Preparation

Please download the ICDAR2015 dataset from this [website](https://rrc.cvc.uab.es/?ch=4&com=downloads), then format the dataset annotations referring to [dataset_convert](tools/dataset/README.md).

After preparation, the data structure should be like

``` text
.
├── test
│   ├── images
│   │   ├── img_1.jpg
│   │   ├── img_2.jpg
│   │   └── ...
│   └── det_gt.txt
└── train
    ├── images
    │   ├── img_1.jpg
    │   ├── img_2.jpg
    │   └── ...
    └── det_gt.txt
```
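Before moving on, the layout above can be sanity-checked with a few lines of Python. This is only an illustrative sketch (the `missing_entries` helper and the expected sub-path list are our own, not part of MindOCR):

```python
import os

# Expected sub-paths under the dataset root, matching the tree above.
EXPECTED = [
    "train/images",
    "train/det_gt.txt",
    "test/images",
    "test/det_gt.txt",
]

def missing_entries(root):
    """Return the expected sub-paths that do not exist under `root`."""
    return [p for p in EXPECTED if not os.path.exists(os.path.join(root, p))]
```

For example, `missing_entries("ic15/det")` should return an empty list once the dataset is fully prepared.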

#### 2. Configure Yaml

Please choose a yaml config file containing the target pre-defined model and data pipeline that you want to re-use from `configs/det`. Here we choose `configs/det/db_r50_icdar15.yaml`.

Please change the data config args accordingly, such as
``` yaml
train:
  dataset:
    data_dir: ic15/det/train/images
    label_files: ic15/det/train/det_gt.txt
```

Optionally, change `num_workers` according to the number of CPU cores, and set `distribute` to True if you plan to train in distributed mode.
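For intuition, once loaded, the yaml above becomes a nested dict, and args like `train.dataset.data_dir` are just dotted paths into it. Here is a minimal sketch of such an override helper (our own illustration, not a MindOCR API):

```python
def set_by_path(cfg, dotted_key, value):
    """Set a nested dict entry addressed by a dotted key, e.g. 'train.dataset.data_dir'."""
    keys = dotted_key.split(".")
    node = cfg
    for k in keys[:-1]:
        # Create intermediate dicts as needed while walking down the path.
        node = node.setdefault(k, {})
    node[keys[-1]] = value
    return cfg

cfg = set_by_path({}, "train.dataset.data_dir", "ic15/det/train/images")
# cfg is now {"train": {"dataset": {"data_dir": "ic15/det/train/images"}}}
```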

#### 3. Training

To train the model, please run

``` shell
python tools/train.py --config configs/det/db_r50_icdar15.yaml
```

To train in distributed mode, please run

```shell
# n is the number of GPUs/NPUs
mpirun --allow-run-as-root -n 2 python tools/train.py --config configs/det/db_r50_icdar15.yaml
```
> Notes: please ensure the arg `distribute` in the yaml file is set to True.
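For intuition about what each of the `n` processes launched by `mpirun` sees, distributed data-parallel training typically gives every rank its own shard of the dataset. A minimal round-robin sharding sketch (our own illustration, not MindOCR's actual sampler):

```python
def shard_indices(num_samples, rank, num_ranks):
    """Round-robin split of dataset indices; each rank reads only its own shard."""
    return list(range(rank, num_samples, num_ranks))

# With 2 ranks, rank 0 gets the even indices and rank 1 the odd ones;
# together the shards cover the whole dataset with no overlap.
```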


The training results (including checkpoints, per-epoch performance, and curves) will be saved in the directory specified by the arg `ckpt_save_dir`, which is "./tmp_det/" by default.

#### 4. Evaluation

To evaluate, please set the checkpoint path in the arg `ckpt_load_path` in the yaml config file and run

``` shell
python tools/eval.py --config configs/det/db_r50_icdar15.yaml
```


### Text Recognition Model Training

We will use the **CRNN** model and the **LMDB** dataset for illustration, although other models and datasets are also supported.

#### 1. Data Preparation

Please download the LMDB dataset from ...

After preparation, the data structure should be like

``` text
```

#### 2. Configure Yaml

Please choose a yaml config file containing the target pre-defined model and data pipeline that you want to re-use from `configs/rec`. Here we choose `configs/rec/vgg7_bilstm_ctc.yaml`.

Please change the data config args accordingly, such as
``` yaml
train:
  dataset:
    data_dir: path/to/train/data
    label_files: path/to/train/label.txt
```

Optionally, change `num_workers` according to the number of CPU cores, and set `distribute` to True if you plan to train in distributed mode.

#### 3. Training

To train the model, please run

``` shell
python tools/train.py --config configs/rec/vgg7_bilstm_ctc.yaml
```

To train in distributed mode, please run

```shell
# n is the number of GPUs/NPUs
mpirun --allow-run-as-root -n 2 python tools/train.py --config configs/rec/vgg7_bilstm_ctc.yaml
```
> Notes: please ensure the arg `distribute` in the yaml file is set to True.


The training results (including checkpoints, per-epoch performance, and curves) will be saved in the directory specified by the arg `ckpt_save_dir`, which is "./tmp_det/" by default.

#### 4. Evaluation

To evaluate, please set the checkpoint path in the arg `ckpt_load_path` in the yaml config file and run

``` shell
python tools/eval.py --config /path/to/config.yaml
```

### Inference and Deployment

#### Inference with MX Engine

Please refer to [mx_infer](deploy/mx_infer/README.md)

#### Inference with Lite

Coming soon

#### Inference with native MindSpore

Coming soon

## Notes

### Change Log

- 2023/03/08
1. Add evaluation script with arg `ckpt_load_path`
2. Arg `ckpt_save_dir` is moved from `system` to `train` in yaml.
3. Add drop_overflow_update control

### How to Contribute

We appreciate all kinds of contributions, including issues and PRs, to make MindOCR better.

Please refer to [CONTRIBUTING.md](CONTRIBUTING.md) for the contributing guideline. Please follow the [Model Template and Guideline](mindocr/models/README.md) for contributing a model that fits the overall interface :)

### License

This project follows the [Apache License 2.0](LICENSE.md) open-source license.

### Citation

If you find this project useful in your research, please consider citing:

```latex
@misc{mindocr2023,
    title={{MindSpore OCR}: MindSpore OCR Toolbox},
    author={MindSpore Team},
    howpublished={\url{https://github.com/mindspore-lab/mindocr/}},
    year={2023}
}
```