Convolutional Recurrent Neural Network (CRNN) integrates CNN-based feature extraction, RNN-based sequence modeling, and transcription into a unified framework.
As shown in the architecture graph (Figure 1), CRNN first extracts a feature sequence from the input image via the Convolutional Layers, so that the image is represented by a sequence of extracted feature vectors, each associated with a receptive field on the input image. To further process the features, CRNN adopts the Recurrent Layers to predict a label distribution for each frame. To map the distributions to text, CRNN adds a Transcription Layer that translates the per-frame predictions into the final label sequence. [1]
Figure 1. Architecture of CRNN [1]
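To make the transcription step concrete, below is a minimal sketch of CTC greedy decoding, the rule commonly used to turn per-frame label distributions into a final text string. The 26-frame length and the digits-plus-lowercase character set are illustrative assumptions, not values read from the CRNN config.

```python
# Minimal sketch of CTC greedy decoding: collapse consecutive repeats and
# drop blanks to turn per-frame predictions into a label sequence.
# The character set and frame count below are illustrative assumptions.
import numpy as np

CHARSET = "0123456789abcdefghijklmnopqrstuvwxyz"
BLANK = len(CHARSET)  # index of the CTC blank class

def ctc_greedy_decode(logits: np.ndarray) -> str:
    """logits: (T, num_classes) per-frame scores from the recurrent layers."""
    best_path = logits.argmax(axis=1)  # most likely label for each frame
    chars, prev = [], BLANK
    for idx in best_path:
        if idx != BLANK and idx != prev:  # CTC collapsing rule
            chars.append(CHARSET[idx])
        prev = idx
    return "".join(chars)

# Toy usage with random scores over T=26 frames
rng = np.random.default_rng(0)
print(ctc_greedy_decode(rng.standard_normal((26, len(CHARSET) + 1))))
```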
mindspore | ascend driver | firmware | cann toolkit/kernel |
---|---|---|---|
2.3.1 | 24.1.RC2 | 7.3.0.1.231 | 8.0.RC2.beta1 |
Please refer to the installation instructions in MindOCR.
Please download the LMDB dataset for training and evaluation from here (ref: deep-text-recognition-benchmark). There are several zip files:
- `data_lmdb_release.zip` contains the entire datasets, including training data, validation data and evaluation data.
- `validation.zip`: same as the `validation/` within `data_lmdb_release.zip`.
- `evaluation.zip`: same as the `evaluation/` within `data_lmdb_release.zip`.
Unzip `data_lmdb_release.zip`; the data structure should be like:
```text
data_lmdb_release/
├── evaluation
│ ├── CUTE80
│ │ ├── data.mdb
│ │ └── lock.mdb
│ ├── IC03_860
│ │ ├── data.mdb
│ │ └── lock.mdb
│ ├── IC03_867
│ │ ├── data.mdb
│ │ └── lock.mdb
│ ├── IC13_1015
│ │ ├── data.mdb
│ │ └── lock.mdb
│ ├── ...
├── training
│ ├── MJ
│ │ ├── MJ_test
│ │ │ ├── data.mdb
│ │ │ └── lock.mdb
│ │ ├── MJ_train
│ │ │ ├── data.mdb
│ │ │ └── lock.mdb
│ │ └── MJ_valid
│ │ ├── data.mdb
│ │ └── lock.mdb
│ └── ST
│ ├── data.mdb
│ └── lock.mdb
└── validation
├── data.mdb
    └── lock.mdb
```
Here we used the datasets under the `training/` folder for training, and the union dataset `validation/` for validation. After training, we used the datasets under `evaluation/` to evaluate model accuracy.
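If you want to sanity-check the unzipped data before training, the sketch below reads one record from an LMDB folder; it assumes the `num-samples`/`image-%09d`/`label-%09d` key layout used by deep-text-recognition-benchmark and requires the `lmdb` Python package.

```python
# Sanity-check one LMDB folder. Assumes the deep-text-recognition-benchmark
# key layout: "num-samples", "image-%09d", "label-%09d" (1-based indices).
import lmdb

env = lmdb.open("data_lmdb_release/validation", readonly=True, lock=False)
with env.begin() as txn:
    num_samples = int(txn.get(b"num-samples"))
    label = txn.get(b"label-%09d" % 1).decode()  # text label of sample 1
    image_bytes = txn.get(b"image-%09d" % 1)     # encoded image bytes
print(f"{num_samples} samples; first label: {label!r}, "
      f"first image: {len(image_bytes)} bytes")
```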
Training: (total 14,442,049 samples)
- MJSynth (MJ)
  - Train: 21.2 GB, 7,224,586 samples
  - Valid: 2.36 GB, 802,731 samples
  - Test: 2.61 GB, 891,924 samples
- SynthText (ST)
  - Train: 16.0 GB, 5,522,808 samples

Validation:
- Valid: 138 MB, 6,992 samples

Evaluation: (total 12,067 samples)
- CUTE80: 8.8 MB, 288 samples
- IC03_860: 36 MB, 860 samples
- IC03_867: 4.9 MB, 867 samples
- IC13_857: 72 MB, 857 samples
- IC13_1015: 77 MB, 1,015 samples
- IC15_1811: 21 MB, 1,811 samples
- IC15_2077: 25 MB, 2,077 samples
- IIIT5k_3000: 50 MB, 3,000 samples
- SVT: 2.4 MB, 647 samples
- SVTP: 1.8 MB, 645 samples
Data configuration for model training
To reproduce the model training, it is recommended that you modify the configuration yaml as follows:
```yaml
...
train:
...
dataset:
type: LMDBDataset
dataset_root: dir/to/data_lmdb_release/ # Root dir of training dataset
data_dir: training/ # Dir of training dataset, concatenated with `dataset_root` to be the complete dir of training dataset
# label_file: # Path of training label file, concatenated with `dataset_root` to be the complete path of training label file, not required when using LMDBDataset
...
eval:
dataset:
type: LMDBDataset
dataset_root: dir/to/data_lmdb_release/ # Root dir of validation dataset
data_dir: validation/ # Dir of validation dataset, concatenated with `dataset_root` to be the complete dir of validation dataset
# label_file: # Path of validation label file, concatenated with `dataset_root` to be the complete path of validation label file, not required when using LMDBDataset
...
```
Data configuration for model evaluation
We use the datasets under `evaluation/` as the benchmark. On each individual dataset (e.g. CUTE80, IC03_860), we perform a full evaluation by setting `eval.dataset.data_dir` to that dataset's directory. This yields the corresponding accuracy for each dataset, and the reported accuracy is the average of these values.
To reproduce the reported evaluation results, you can:
- Option 1: Repeat the evaluation step for each individual dataset: CUTE80, IC03_860, IC03_867, IC13_857, IC13_1015, IC15_1811, IC15_2077, IIIT5k_3000, SVT, SVTP. Then take the average score (see the sketch after this list).
- Option 2: Put all the benchmark dataset folders under the same directory, e.g. `evaluation/`. Modify `eval.dataset.data_dir` in the config yaml accordingly, then execute the script `tools/benchmarking/multi_dataset_eval.py`.
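For Option 1, combining the per-dataset results is a simple average; the zeros below are placeholders to be replaced by the accuracies printed by `tools/eval.py` for each run.

```python
# Average the per-dataset accuracies from Option 1. The 0.0 values are
# placeholders; fill in the accuracy reported by tools/eval.py per dataset.
scores = {
    "CUTE80": 0.0, "IC03_860": 0.0, "IC03_867": 0.0, "IC13_857": 0.0,
    "IC13_1015": 0.0, "IC15_1811": 0.0, "IC15_2077": 0.0,
    "IIIT5k_3000": 0.0, "SVT": 0.0, "SVTP": 0.0,
}
average = sum(scores.values()) / len(scores)
print(f"average accuracy over {len(scores)} benchmarks: {average:.2%}")
```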
- Evaluate on one specific dataset
For example, you can evaluate the model on dataset CUTE80 by modifying the config yaml as follows:
```yaml
...
train:
# NO NEED TO CHANGE ANYTHING IN TRAIN SINCE IT IS NOT USED
...
eval:
dataset:
type: LMDBDataset
dataset_root: dir/to/data_lmdb_release/ # Root dir of evaluation dataset
data_dir: evaluation/CUTE80/ # Dir of evaluation dataset, concatenated with `dataset_root` to be the complete dir of evaluation dataset
# label_file: # Path of evaluation label file, concatenated with `dataset_root` to be the complete path of evaluation label file, not required when using LMDBDataset
...
```
By running `tools/eval.py` as noted in section Model Evaluation with the above config yaml, you can get the accuracy performance on dataset CUTE80.
- Evaluate on multiple datasets under the same folder
Assume you have put all benchmark datasets under `evaluation/` as shown below:
```text
data_lmdb_release/
├── evaluation
│ ├── CUTE80
│ │ ├── data.mdb
│ │ └── lock.mdb
│ ├── IC03_860
│ │ ├── data.mdb
│ │ └── lock.mdb
│ ├── IC03_867
│ │ ├── data.mdb
│ │ └── lock.mdb
│ ├── IC13_1015
│ │ ├── data.mdb
│ │ └── lock.mdb
│   ├── ...
```
then you can evaluate on each dataset by modifying the config yaml as follows, and execute the script `tools/benchmarking/multi_dataset_eval.py`.
```yaml
...
train:
# NO NEED TO CHANGE ANYTHING IN TRAIN SINCE IT IS NOT USED
...
eval:
dataset:
type: LMDBDataset
dataset_root: dir/to/data_lmdb_release/ # Root dir of evaluation dataset
data_dir: evaluation/ # Dir of evaluation dataset, concatenated with `dataset_root` to be the complete dir of evaluation dataset
# label_file: # Path of evaluation label file, concatenated with `dataset_root` to be the complete path of evaluation label file, not required when using LMDBDataset
...
```
Apart from the dataset setting, please also check the following important args: `system.distribute`, `system.val_while_train`, `common.batch_size`, `train.ckpt_save_dir`, `train.dataset.dataset_root`, `train.dataset.data_dir`, `train.dataset.label_file`, `eval.ckpt_load_path`, `eval.dataset.dataset_root`, `eval.dataset.data_dir`, `eval.dataset.label_file`, `eval.loader.batch_size`. Explanations of these important args:
```yaml
system:
distribute: True # `True` for distributed training, `False` for standalone training
amp_level: 'O3'
seed: 42
val_while_train: True # Validate while training
drop_overflow_update: False
common:
...
batch_size: &batch_size 64 # Batch size for training
...
train:
ckpt_save_dir: './tmp_rec' # The training result (including checkpoints, per-epoch performance and curves) saving directory
dataset_sink_mode: False
dataset:
type: LMDBDataset
dataset_root: dir/to/data_lmdb_release/ # Root dir of training dataset
data_dir: training/ # Dir of training dataset, concatenated with `dataset_root` to be the complete dir of training dataset
# label_file: # Path of training label file, concatenated with `dataset_root` to be the complete path of training label file, not required when using LMDBDataset
...
eval:
ckpt_load_path: './tmp_rec/best.ckpt' # checkpoint file path
dataset_sink_mode: False
dataset:
type: LMDBDataset
dataset_root: dir/to/data_lmdb_release/ # Root dir of validation/evaluation dataset
data_dir: validation/ # Dir of validation/evaluation dataset, concatenated with `dataset_root` to be the complete dir of validation/evaluation dataset
# label_file: # Path of validation/evaluation label file, concatenated with `dataset_root` to be the complete path of validation/evaluation label file, not required when using LMDBDataset
...
loader:
shuffle: False
batch_size: 64 # Batch size for validation/evaluation
...
```
Notes:
- As the global batch size (batch_size x num_devices) is important for reproducing the result, please adjust `batch_size` accordingly to keep the global batch size unchanged for a different number of NPUs, or adjust the learning rate linearly to the new global batch size, as illustrated below.
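As a quick illustration of the linear scaling rule (the base learning rate below is a hypothetical placeholder, not the value from the CRNN recipe):

```python
# Linear learning-rate scaling when the global batch size changes.
# base_lr is a hypothetical placeholder, not the recipe's actual value.
base_lr = 0.01            # learning rate tuned for the base setup
base_global_bs = 64 * 8   # batch_size x num_devices in the base setup
new_global_bs = 64 * 4    # e.g. the same per-device batch size on 4 NPUs

new_lr = base_lr * new_global_bs / base_global_bs
print(f"scaled learning rate: {new_lr}")  # 0.005
```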
- Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please modify the configuration parameter `system.distribute` to True and run:
```shell
# distributed training on multiple Ascend devices
mpirun --allow-run-as-root -n 8 python tools/train.py --config configs/rec/crnn/crnn_resnet34.yaml
```
- Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please modify the configuration parameter `system.distribute` to False and run:
```shell
# standalone training on a CPU/Ascend device
python tools/train.py --config configs/rec/crnn/crnn_resnet34.yaml
```
The training result (including checkpoints, per-epoch performance and curves) will be saved in the directory parsed by the arg `train.ckpt_save_dir`. The default directory is `./tmp_rec`.
To evaluate the accuracy of the trained model, you can use `tools/eval.py`. Please set the checkpoint path to the arg `eval.ckpt_load_path` in the yaml config file, set the evaluation dataset path to the arg `eval.dataset.data_dir`, set `system.distribute` to False, and then run:
```shell
python tools/eval.py --config configs/rec/crnn/crnn_resnet34.yaml
```
Similarly, the accuracy of the trained model can be evaluated using multiple evaluation datasets by properly setting the args `eval.ckpt_load_path`, `eval.dataset.data_dir`, and `system.distribute` in the yaml config file, and then running:
```shell
python tools/benchmarking/multi_dataset_eval.py --config configs/rec/crnn/crnn_resnet34.yaml
```
To transform the ground-truth text into label ids, we have to provide a character dictionary where keys are characters and values are ids. By default, the dictionary is "0123456789abcdefghijklmnopqrstuvwxyz", which means id=0 corresponds to the character "0". In this case, the dictionary only considers numbers and lowercase English characters, excluding spaces. An example mapping is shown below.
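For instance, the mapping implied by the default dictionary looks like this (a minimal illustration, not MindOCR's actual encoder code):

```python
# Character-to-id mapping implied by the default dictionary: id=0 is "0",
# id=10 is "a", id=35 is "z". Not MindOCR's actual encoder implementation.
DEFAULT_DICT = "0123456789abcdefghijklmnopqrstuvwxyz"
char_to_id = {c: i for i, c in enumerate(DEFAULT_DICT)}

print(char_to_id["0"], char_to_id["a"], char_to_id["z"])  # 0 10 35
print([char_to_id[c] for c in "crnn"])  # label ids for the text "crnn"
```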
There are some built-in dictionaries placed in `mindocr/utils/dict/`, and you can choose the appropriate one to use:
- `en_dict.txt` is an English dictionary containing 94 characters, including numbers, common symbols, and uppercase and lowercase English letters.
- `ch_dict.txt` is a Chinese dictionary containing 6623 characters, including commonly used simplified and traditional Chinese characters, numbers, common symbols, and uppercase and lowercase English letters.
You can also customize a dictionary file (***.txt) and place it under `mindocr/utils/dict/`; the dictionary file should be a .txt file with one character per line.
To use a specific dictionary, set the parameter `common.character_dict_path` to the path of the dictionary, and change the parameter `common.num_classes` to the corresponding number, which is the number of characters in the dictionary plus 1, as sketched below.
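A minimal sketch of how the two parameters relate, assuming a custom dictionary file with one character per line; the file name `my_dict.txt` is a hypothetical example:

```python
# Derive common.num_classes from a custom dictionary file with one character
# per line. "my_dict.txt" is a hypothetical example file name.
with open("mindocr/utils/dict/my_dict.txt", encoding="utf-8") as f:
    characters = [line.rstrip("\n") for line in f if line.rstrip("\n")]

num_classes = len(characters) + 1  # + 1 extra class (the CTC blank for CRNN)
print(f"{len(characters)} characters -> set common.num_classes to {num_classes}")
```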
Notes:
- You can include the space character by setting the parameter `common.use_space_char` in the configuration yaml to True.
- Remember to check the value of `dataset->transform_pipeline->RecCTCLabelEncode->lower` in the configuration yaml. Set it to False if you prefer case-sensitive encoding.
Currently, this model supports multilingual recognition and provides pre-trained models for different languages. Details are as follows:
We use the public Chinese text benchmark dataset Benchmarking-Chinese-Text-Recognition for CRNN training and evaluation.
For detailed instructions on data preparation and yaml configuration, please refer to ch_dataset.
Experiments are tested on Ascend 910* with MindSpore 2.3.1 graph mode.
model name | backbone | cards | batch size | language | jit level | graph compile | ms/step | img/s | scene | web | document | recipe | weight |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CRNN | ResNet34_vd | 4 | 256 | Chinese | O2 | 203.48 s | 38.01 | 1180 | 60.71% | 65.94% | 97.67% | yaml | ckpt \| mindir |
The input shape for the exported MindIR file in the download link is (1, 3, 32, 320).
Experiments are tested on Ascend 910* with MindSpore 2.3.1 graph mode.
model name | backbone | train dataset | params(M) | cards | batch size | jit level | graph compile | ms/step | img/s | accuracy | recipe | weight |
---|---|---|---|---|---|---|---|---|---|---|---|---|
CRNN | VGG7 | MJ+ST | 8.72 | 8 | 16 | O2 | 59 s | 15.47 | 8274.08 | 81.31% | yaml | ckpt |
CRNN | ResNet34_vd | MJ+ST | 24.48 | 8 | 64 | O2 | 120.41 s | 60.86 | 8412.75 | 84.73% | yaml | ckpt \| mindir |
Detailed accuracy results for each benchmark dataset (IC03, IC13, IC15, IIIT, SVT, SVTP, CUTE):
model name | backbone | cards | IC03_860 | IC03_867 | IC13_857 | IC13_1015 | IC15_1811 | IC15_2077 | IIIT5k_3000 | SVT | SVTP | CUTE80 | average |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CRNN | VGG7 | 1 | 93.72% | 93.43% | 91.83% | 90.84% | 70.84% | 64.95% | 84.40% | 82.84% | 72.87% | 67.36% | 81.31% |
CRNN | ResNet34_vd | 1 | 95.35% | 95.27% | 93.70% | 92.71% | 75.65% | 69.72% | 87.30% | 86.09% | 78.60% | 72.92% | 84.73% |
Experiments are tested on Ascend 310P with MindSpore Lite 2.3.1 graph mode.
model name | backbone | test dataset | params(M) | cards | batch size | jit level | graph compile | img/s |
---|---|---|---|---|---|---|---|---|
CRNN | ResNet34_vd | IC15 | 24.48 | 1 | 1 | O2 | 10.46 s | 361.09 |
CRNN | ResNet34_vd | SVT | 24.48 | 1 | 1 | O2 | 10.31 s | 274.67 |
- To reproduce the result in other contexts, please ensure the global batch size is the same.
- The characters supported by the model are lowercase English characters from a to z and numbers from 0 to 9. For more explanation on the dictionary, please refer to Character Dictionary.
- The models are trained from scratch without any pre-training. For more details on the training and evaluation datasets, please refer to the Dataset Download & Dataset Usage section.
- The input shapes of the MindIR files of CRNN_VGG7 and CRNN_ResNet34_vd are both (1, 3, 32, 100).
[1] Baoguang Shi, Xiang Bai, Cong Yao. An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition. arXiv preprint arXiv:1507.05717, 2015.