Detect vehicle license plates in videos and images using the tensorflow/object_detection API.
Train object detection models for license plate detection using the TFOD API, with either a single detection stage
or a double detection stage.
The single stage detector detects plates and plate characters in a single inference stage.
The double stage detector detects plates in the first inference stage,
crops the detected plate from the image, and passes the cropped plate image to the second inference stage,
which detects plate characters.
The double stage detector uses a single detection model that has been trained both to detect plates in full images containing cars/plates,
and to detect plate text in tightly cropped plate images.
Download TF records, models, and config
I have lumped everything together, which is not great for maintainability, but at least you won't have to figure out
a directory structure and copy the files to the correct locations.
You will need a starting point for training. You can either use
the exported_models (which speeds up training), or download ssd_inception_v2_coco_2018_01_28
and faster_rcnn_resnet101_coco_2018_01_28 from the TensorFlow detection model zoo.
The models should be copied to experiment_ssd/YYYY_MM_DD/training and experiment_faster_rcnn/YYYY_MM_DD/training.
In the config files, you will need to modify the paths:
train_config: {
  fine_tune_checkpoint: "path/to/my/model.ckpt"
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "path/to/my/train.record"
  }
  label_map_path: "path/to/my/classes.pbtxt"
}
eval_input_reader: {
  tf_record_input_reader {
    input_path: "path/to/my/test.record"
  }
  label_map_path: "path/to/my/classes.pbtxt"
}
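If you keep several experiment directories, editing these paths by hand gets tedious. The following is a minimal sketch of a hypothetical helper, set_config_paths (not part of this project or the TFOD API), that rewrites them in the config text. It assumes each field is written as `field: "value"` on its own line, and that train_input_reader appears before eval_input_reader in the file, as in the standard TFOD sample configs.

```python
import re


def set_config_paths(config_text, checkpoint, train_record, eval_record, label_map):
    """Return config_text with checkpoint, TFRecord and label map paths replaced.

    Hypothetical helper: assumes quoted `field: "value"` lines and that
    train_input_reader appears before eval_input_reader in the file.
    """
    def quoted(value):
        # A function replacement stops re from treating backslashes in paths as escapes
        return lambda m: f'{m.group(1)}"{value}"'

    # The trailing colon keeps fine_tune_checkpoint_type from matching
    config_text = re.sub(r'(fine_tune_checkpoint:\s*)"[^"]*"', quoted(checkpoint), config_text)
    # Both input readers point at the same label map
    config_text = re.sub(r'(label_map_path:\s*)"[^"]*"', quoted(label_map), config_text)

    records = [train_record, eval_record]

    def next_record(m):
        # First input_path belongs to train_input_reader, second to eval_input_reader
        if not records:
            raise ValueError("expected exactly two input_path fields")
        return f'{m.group(1)}"{records.pop(0)}"'

    return re.sub(r'(input_path:\s*)"[^"]*"', next_record, config_text)
```

Read the config file into a string, pass it through set_config_paths with your own paths, and write the result back to the training directory.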
Single stage Faster RCNN:
[INFO] Processed 69 frames in 37.62 seconds. Frame rate: 1.83 Hz
[INFO] platesWithCharCorrect_recall: 97.1%, platesWithCharCorrect_precision: 97.1%,
plateFrames_recall: 100.0%, plateFrames_precision: 100.0%,
chars_recall: 99.6%, chars_precision: 99.6%
Two stage SSD:
[INFO] Processed 69 frames in 6.17 seconds. Frame rate: 11.19 Hz
[INFO] platesWithCharCorrect_recall: 95.7%, platesWithCharCorrect_precision: 97.1%,
plateFrames_recall: 98.6%, plateFrames_precision: 100.0%,
chars_recall: 98.1%, chars_precision: 99.6%
It seems that Faster RCNN is slightly more accurate than SSD, but the sample size (69 frames) is too small to be sure.
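The frame rates in these logs are simply frames divided by elapsed seconds. Recomputing them from the printed figures shows the size of the speed gap. This sketch uses only the numbers shown above. It gives about 11.18 Hz for the two stage SSD run, against the logged 11.19 Hz, presumably because the script divides by the unrounded elapsed time.

```python
def frame_rate(frames, seconds):
    """Frames processed per second (Hz)."""
    return frames / seconds


# Figures copied from the two [INFO] logs above (69 frames each)
faster_rcnn_hz = frame_rate(69, 37.62)
two_stage_ssd_hz = frame_rate(69, 6.17)
speedup = two_stage_ssd_hz / faster_rcnn_hz

print(f"Single stage Faster RCNN: {faster_rcnn_hz:.2f} Hz")
print(f"Two stage SSD: {two_stage_ssd_hz:.2f} Hz")
print(f"Speedup: {speedup:.1f}x")
```

So on this small sample, the two stage SSD trades roughly one percentage point of plate-level recall for about a sixfold increase in throughput.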
This two stage technique of using a single model to detect characters within plates could also be used to detect any
object within another object. The two stage detector can perform inference faster than the single stage
detector, because it can use simpler models such as SSD, rather than Faster RCNN, and it can use a smaller
image size. For example, two stages of SSD based on Inception_V2 with images resized to 300x300 can perform inference at
15 fps on a Titan-X GPU, whereas a single stage of Faster RCNN based on Resnet_101 with a 1280x960 image size
can perform inference at 1.8 fps on a Titan-X.
The two stage SSD implementation opens the possibility of running on less powerful hardware, such as Intel
i7-4790K CPU @ 4.00GHz with 16GB of RAM (2.9 fps), and Nvidia Jetson TX2 (2.8 fps).
Defines the web interface that will be used by the MTurk workers to label the images.
Modified from original
Use this html/js code with Amazon Mechanical Turk.
Instructions here.
This code needs to be improved to allow image zoom. Without zoom capability, it is too difficult for the workers
to create the boxes for the small characters of the license plate text.
Consequently, you can spend a lot of time re-arranging the boxes in the labelImg utility.
Use this module to generate a csv file that can be uploaded to MTurk. You will need the csv file when you publish a batch of images for processing.
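An MTurk batch input file is a csv whose header row names the variables used in the HIT template. Each following row supplies the values for one HIT. This minimal sketch writes such a file for a list of image URLs. The function name write_mturk_batch and the column name image_url are hypothetical. The column must match whatever `${...}` variable the web interface above actually uses.

```python
import csv


def write_mturk_batch(csv_path, image_urls, column="image_url"):
    """Write an MTurk batch input file: a header row, then one row per image.

    MTurk substitutes each row's values into the HIT template's ${column}
    placeholders, so `column` must match the variable name in the template
    ("image_url" is a hypothetical name).
    """
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([column])
        for url in image_urls:
            writer.writerow([url])
```

The image URLs must be publicly reachable by the workers, for example files in a public storage bucket.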
Once the batch has been completed by the workers, you will need to download the results in csv file format, and approve or reject each HIT. This application reads the HIT results and overlays the bounding boxes and labels onto the images. A text box is provided for accepting or rejecting each HIT. Once you have finished, your accept/reject responses are added to the downloaded csv file, and the new csv file can be uploaded to MTurk.
Reads the csv file generated by inspectHITs.py and generates PASCAL VOC style xml annotation files, one xml file for each image.
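For reference, a PASCAL VOC annotation holds the image size plus one object element per box, each with a name and a pixel bndbox. The following is a minimal sketch of writing one with the standard library. It is not the project's own converter, and the function name and argument layout are assumptions. The optional verified="yes" attribute on the root element is the flag labelImg sets when you click Verify Image.

```python
import xml.etree.ElementTree as ET


def write_voc_xml(xml_path, filename, width, height, boxes, verified=False):
    """Write a PASCAL VOC style annotation file.

    boxes: list of (label, xmin, ymin, xmax, ymax) in pixel coordinates.
    """
    root = ET.Element("annotation")
    if verified:
        root.set("verified", "yes")  # the flag labelImg writes after "Verify Image"
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"  # assumes RGB images
    for label, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        # VOC stores integer pixel coordinates
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), (xmin, ymin, xmax, ymax)):
            ET.SubElement(bndbox, tag).text = str(int(round(value)))
    ET.ElementTree(root).write(xml_path)
```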
Once the annotations are in PASCAL VOC style xml, you can use labelImg
to fix any mistakes in the labelled images.
Make sure that you take at least one pass through your dataset and, for each image, click on Verify Image (build_tf_records.py only uses annotations marked as verified).
Create an artificial set of images containing random plates, along with corresponding PASCAL VOC xml annotation files.
You will need the SUN database in order to generate the background images.
See Mat Earl ANPR.
After generating the images, I manually edited them using GIMP to remove text in the background.
I am not sure that this is a good idea, but the rationale was that I did not
want to train the network to learn that the text in the background was not license plate text.
I feel that learning this distinction would make the classifier harder to train, as it would have to be more discriminative.
If background text is detected during inference, this is usually OK, because it will be discarded
if it is not within a plate bounding box.
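The discard rule can be expressed in a few lines. This minimal sketch assumes boxes are (xmin, ymin, xmax, ymax) pixel tuples, and that a character counts as "within" a plate when its centre lies inside the plate box. The project's own rule may use a different test, such as full containment or an overlap threshold, and the function names are hypothetical.

```python
def box_center(box):
    """Centre (x, y) of an (xmin, ymin, xmax, ymax) box."""
    xmin, ymin, xmax, ymax = box
    return (xmin + xmax) / 2, (ymin + ymax) / 2


def inside(point, box):
    """True if the point lies inside (or on the edge of) the box."""
    x, y = point
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax


def chars_in_plates(plate_boxes, char_detections):
    """Keep only (label, box) character detections whose centre lies in some plate box."""
    return [(label, box) for label, box in char_detections
            if any(inside(box_center(box), plate) for plate in plate_boxes)]
```

Background text such as a shop sign produces character detections outside every plate box, so they are dropped before the plate text is assembled.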
python gen_plates.py --numImages 1000 --imagePath artificial_images/CA --xmlPath artificial_images/CA_ann
Reads a group of PASCAL VOC style xml annotation files and combines them with the associated images
to build a TFRecord dataset. Requires a predefined label map file that maps labels to integers.
Only images whose annotation file has the verified field set to 'yes' are used.
split_label defines the top level label.
For example, if split_label == 'plate', the original image is split into
a full image and a cropped image of the license plate, and the original annotation is split into a plate annotation
and a character annotation. The images are padded to make them square, and the full image is scaled down to avoid
problems with excessive memory usage during training.
If split_label == 'none', then all the images will be full size originals, and the annotations will contain labels
for every object.
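The geometry behind the plate split is mostly coordinate bookkeeping. Character boxes are shifted into the crop's frame, the canvas is padded to a square, and box coordinates are multiplied by the scale factor when an image is shrunk. The following is a minimal sketch under stated assumptions. The function names are hypothetical, padding is assumed to be added at the right/bottom edge so existing coordinates do not move, and build_tf_records.py may pad or round differently.

```python
def split_annotation(plate_box, char_boxes):
    """Express character boxes relative to the cropped plate image.

    plate_box: (xmin, ymin, xmax, ymax) of the plate in the full image.
    char_boxes: list of (label, xmin, ymin, xmax, ymax) in full-image pixels.
    Returns the crop size and the shifted boxes, clipped to the crop.
    """
    pxmin, pymin, pxmax, pymax = plate_box
    crop_w, crop_h = pxmax - pxmin, pymax - pymin
    shifted = [(label,
                max(0, xmin - pxmin), max(0, ymin - pymin),
                min(crop_w, xmax - pxmin), min(crop_h, ymax - pymin))
               for label, xmin, ymin, xmax, ymax in char_boxes]
    return (crop_w, crop_h), shifted


def pad_to_square(width, height):
    """Side of the square canvas; padding at the right/bottom leaves boxes unchanged."""
    return max(width, height)


def scale_boxes(boxes, factor):
    """Scale (label, xmin, ymin, xmax, ymax) boxes, e.g. by --image_scale_factor."""
    return [(label,) + tuple(round(v * factor) for v in coords)
            for label, *coords in boxes]
```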
Example usage, with plate char split:
python build_tf_records.py \
--record_dir=datasets/records \
--annotations_dir=images \
--label_map_file=datasets/records/classes.pbtxt \
--view_mode=False \
--image_scale_factor=0.4 \
--test_record_file=testing_plate_char_split.record \
--train_record_file=training_plate_char_split.record \
--split_label='plate'
Example usage, with no split:
python build_tf_records.py \
--record_dir=datasets/records \
--annotations_dir=images \
--label_map_file=datasets/records/classes.pbtxt \
--view_mode=False \
--test_record_file=testing_plate_char_combined.record \
--train_record_file=training_plate_char_combined.record \
--split_label=none
It is important to spend some time figuring out the best directory layout.
Images and annotations are grouped by camera and date of image capture.
Annotations are stored in a directory with the same name as the image directory, but suffixed with '_ann'.
If you are extracting images from video clips, then the clips can be stored in the video directory.
images
└── SJ7STAR
    ├── images
    │   ├── 2018_02_24_11-00
    │   ├── 2018_02_24_11-00_ann
    │   ├── 2018_02_24_15-00
    │   ├── 2018_02_24_9-00
    │   ├── 2018_02_24_9-00_ann
    │   ├── 2018_02_26
    │   └── 2018_02_27
    └── video
        └── 2018_02_27
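With this layout, each image file can be paired with its annotation by swapping the '_ann' directory for its image directory. Combined with the rule that only annotations with verified set to 'yes' are used, this selects the training set. The following is a minimal sketch, assuming each xml file shares its image's file stem (for example frame_001.jpg and frame_001.xml) and that images are .jpg, .jpeg or .png. verified_pairs is a hypothetical name, not a function of this project.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def verified_pairs(images_root):
    """Yield (image_path, xml_path) for every annotation marked verified="yes".

    Assumes annotation directories are named after the image directory plus
    '_ann', and that each xml file shares its image's stem.
    """
    for ann_dir in sorted(Path(images_root).rglob("*_ann")):
        if not ann_dir.is_dir():
            continue
        # 2018_02_24_9-00_ann -> sibling directory 2018_02_24_9-00
        image_dir = ann_dir.with_name(ann_dir.name[: -len("_ann")])
        for xml_path in sorted(ann_dir.glob("*.xml")):
            if ET.parse(xml_path).getroot().get("verified") != "yes":
                continue  # skip annotations that were never verified in labelImg
            for ext in (".jpg", ".jpeg", ".png"):
                image_path = image_dir / (xml_path.stem + ext)
                if image_path.exists():
                    yield image_path, xml_path
                    break
```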
Datasets are grouped by experiment name and date of run.
There is a separate directory, records, which contains the TFRecord files and classes.pbtxt.
You will need to create the evaluation, exported_model and training directories.
faster_rcnn_resnet101_coco_2018_01_28 is created when you unzip the pre-trained base model.
The other sub-directories are created when you run object_detection/train.py.
'tensorflow_object_detection_datasets' is linked to 'tensorflow/models/research/anpr'.
tensorflow_object_detection_datasets
├── experiment_faster_rcnn
│   ├── 2018_05_28
│   │   ├── evaluation
│   │   ├── exported_model
│   │   │   └── saved_model
│   │   │       └── variables
│   │   └── training
│   │       └── faster_rcnn_resnet101_coco_2018_01_28
│   │           └── saved_model
│   │               └── variables
│   └── 2018_06_01
├── experiment_ssd
└── records
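The classes.pbtxt in records is a TFOD label map: one item per class, with integer ids starting at 1, because id 0 is reserved for the background class. The following is a minimal sketch that renders one. The class list is an assumption: one 'plate' class plus A-Z and 0-9 gives 1 + 26 + 10 = 37 classes. That matches the --num-classes 37 used by the prediction commands, but check it against your own classes.pbtxt.

```python
import string

# Assumption: 'plate' plus A-Z and 0-9 (37 classes); adjust to your dataset
CLASSES = ["plate"] + list(string.ascii_uppercase) + list(string.digits)


def label_map_text(class_names):
    """Render a TFOD label map; ids start at 1 because 0 is the background class."""
    items = []
    for i, name in enumerate(class_names, start=1):
        items.append(f"item {{\n  id: {i}\n  name: '{name}'\n}}\n")
    return "".join(items)


print(label_map_text(CLASSES))
```

Redirect the output to datasets/records/classes.pbtxt. The same file is then passed to build_tf_records.py, the training config and the prediction scripts, so the integer ids stay consistent across all of them.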
Now you can use the TFOD API, at tensorflow/models/research/object_detection, to train the model. The commands assume a Python virtualenv called tensorflow, a single GPU for training, and the CPU for evaluation:
workon tensorflow
cd tensorflow/models/research/object_detection
python train.py --logtostderr \
--pipeline_config_path ../anpr/experiment_faster_rcnn/2018_06_12/training/faster_rcnn_anpr.config \
--train_dir ../anpr/experiment_faster_rcnn/2018_06_12/training
If you are running the eval on a CPU, then limit the number of images to evaluate by modifying your config file:
eval_config: {
  num_examples: 5
}
In a new terminal:
cd tensorflow/models/research/object_detection
workon tensorflow
export CUDA_VISIBLE_DEVICES=""
python eval.py --logtostderr \
--checkpoint_dir ../anpr/experiment_faster_rcnn/2018_06_12/training \
--pipeline_config_path ../anpr/experiment_faster_rcnn/2018_06_12/training/faster_rcnn_anpr.config \
--eval_dir ../anpr/experiment_faster_rcnn/2018_06_12/evaluation
In a new terminal:
cd tensorflow/models/research
workon tensorflow
tensorboard --logdir anpr/experiment_faster_rcnn
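Once training has finished, export the trained checkpoint as an inference graph (replace 60296 with the step number of your own checkpoint):
cd tensorflow/models/research/object_detection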
workon tensorflow
python export_inference_graph.py --input_type image_tensor \
--pipeline_config_path ../anpr/experiment_faster_rcnn/2018_06_12/training/faster_rcnn_anpr.config \
--trained_checkpoint_prefix ../anpr/experiment_faster_rcnn/2018_06_12/training/model.ckpt-60296 \
--output_directory ../anpr/experiment_faster_rcnn/2018_06_12/exported_model
Return to this project's directory to run predict_images.py.
Test your exported model against an image dataset. Works with single and double stage prediction.
Prints the detected plate text, and displays the annotated image.
pred_stages = 1
workon tensorflow
python predict_images.py --model datasets/experiment_faster_rcnn/2018_07_25_14-00/exported_model/frozen_inference_graph.pb \
--pred_stages 1 \
--labels datasets/records/classes.pbtxt \
--imagePath images/SJ7STAR_images/2018_02_24_9-00 \
--num-classes 37 \
--image_display True
pred_stages = 2
python predict_images.py --model datasets/experiment_ssd/2018_07_25_14-00/exported_model/frozen_inference_graph.pb \
--pred_stages 2 \
--labels datasets/records/classes.pbtxt \
--imagePath images/SJ7STAR_images/2018_02_24_9-00 \
--num-classes 37 \
--image_display True
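To print plate text, the character detections inside a plate have to be put into reading order. The following is a minimal sketch of one way to do that: drop low-confidence detections and sort the rest left to right by xmin. It assumes single-row plates and (label, score, (xmin, ymin, xmax, ymax)) tuples. plate_text is a hypothetical helper, and predict_images.py may order characters differently.

```python
def plate_text(char_detections, min_score=0.5):
    """Join character labels left to right.

    char_detections: list of (label, score, (xmin, ymin, xmax, ymax)).
    Assumes a single row of characters on the plate.
    """
    kept = [d for d in char_detections if d[1] >= min_score]
    kept.sort(key=lambda d: d[2][0])  # sort by xmin, i.e. left to right
    return "".join(label for label, _, _ in kept)
```

Two-row plates would need the detections grouped by ymin first, and then each row sorted by xmin.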
Test your exported model against a video dataset. Uses single or two stage prediction. Outputs an annotated video and a series of still images along with image annotations. The still images are grouped to reduce the output of images with duplicate plates. The image annotations can be viewed and edited using labelImg, and they can be used to further enlarge the training dataset.
python predict_video.py --conf conf/lplates_smallset_ssd.json
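The grouping rule predict_video.py uses is not described here. One simple scheme that achieves the same goal is to collapse runs of consecutive frames that read the same plate text, keeping only the highest-scoring frame from each run. The following is a minimal sketch of that scheme. group_by_plate and its (frame_index, plate_text, score) input are hypothetical.

```python
def group_by_plate(frame_reads):
    """Collapse consecutive frames that read the same plate text.

    frame_reads: list of (frame_index, plate_text, score) in frame order.
    Returns one (frame_index, plate_text, score) per run, keeping the best frame.
    """
    groups = []
    for read in frame_reads:
        # Extend the current run if the plate text matches the previous frame
        if groups and groups[-1][-1][1] == read[1]:
            groups[-1].append(read)
        else:
            groups.append([read])
    return [max(group, key=lambda r: r[2]) for group in groups]
```

Comparing exact strings means a single misread character starts a new group. A fuzzier comparison, such as allowing one differing character, would merge more duplicates at the risk of merging distinct plates.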
Test a trained model against an annotated dataset. Annotations must be in PASCAL VOC style xml files.
Run with image_display true if you wish to see each annotated image displayed.
Can be executed with single or double prediction stages.
pred_stages = 1:
plates and characters are predicted in a single pass
python predict_images_and_score.py --model datasets/experiment_faster_rcnn/2018_07_15/exported_model/frozen_inference_graph.pb \
--labels datasets/records/classes.pbtxt \
--annotations_dir images_verification \
--num-classes 37 \
--min-confidence 0.5 \
--pred_stages 1
pred_stages = 2:
The first prediction stage predicts plates, crops the predicted plate from the image, and then
the cropped plate image is used as input to the second prediction stage, which predicts characters.
python predict_and_score.py --model datasets/experiment_ssd/2018_07_25_14-00/exported_model/frozen_inference_graph.pb \
--labels datasets/records/classes.pbtxt \
--annotations_dir images_verification \
--num-classes 37 \
--min-confidence 0.1 \
--pred_stages 2
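The two stage flow can be sketched independently of TensorFlow. The TFOD API reports boxes as [ymin, xmin, ymax, xmax], normalised to [0, 1]. The following is a minimal sketch under stated assumptions. The plate box is converted to pixels in the full image and the plate is cropped. Each character box is converted to pixels in the crop, then shifted by the plate's top-left corner back into full-image coordinates. `detect` is a hypothetical callable wrapping the exported model, and the image is assumed to be a numpy array indexed [row, column].

```python
def to_pixels(norm_box, width, height):
    """Convert a normalised [ymin, xmin, ymax, xmax] box to pixel (xmin, ymin, xmax, ymax)."""
    ymin, xmin, ymax, xmax = norm_box
    return (round(xmin * width), round(ymin * height),
            round(xmax * width), round(ymax * height))


def two_stage_predict(image, detect, min_score=0.1):
    """Detect plates, then detect characters on each cropped plate.

    image:  numpy array of shape (height, width, channels).
    detect: hypothetical callable returning (label, score, [ymin, xmin, ymax, xmax])
            tuples with normalised coordinates, as the TFOD API reports them.
    Returns [(plate_box, [(label, score, char_box), ...]), ...] in full-image pixels.
    """
    height, width = image.shape[:2]
    results = []
    for label, score, norm_box in detect(image):
        if label != "plate" or score < min_score:
            continue
        px0, py0, px1, py1 = to_pixels(norm_box, width, height)
        crop = image[py0:py1, px0:px1]  # first stage output becomes second stage input
        crop_h, crop_w = crop.shape[:2]
        chars = []
        for c_label, c_score, c_norm in detect(crop):
            if c_label == "plate" or c_score < min_score:
                continue
            cx0, cy0, cx1, cy1 = to_pixels(c_norm, crop_w, crop_h)
            # Shift from crop coordinates back to full-image coordinates
            chars.append((c_label, c_score, (cx0 + px0, cy0 + py0, cx1 + px0, cy1 + py0)))
        results.append(((px0, py0, px1, py1), chars))
    return results
```

Because the same model serves both stages, any 'plate' detections inside the crop are ignored in the second stage.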
Single stage Faster R-CNN usage
Your results should look something like this:
[INFO] Processed 925 frames in 70.13 seconds. Frame rate: 13.19 Hz
[INFO] platesWithCharCorrect_recall: 93.2%, platesWithCharCorrect_precision: 93.9%,
plateFrames_recall: 99.2%, plateFrames_precision: 100.0%,
chars_recall: 98.3%, chars_precision: 99.2%
[INFO] Definitions. Precision: Percentage of all the objects detected that are correct. Recall: Percentage of ground truth objects that are detected
[INFO] Processed 925 xml annotation files
And an explanation of the performance metrics:
platesWithCharCorrect_recall - Plates detected in the correct location and containing the correct characters in the correct locations,
divided by the number of ground truth plates
platesWithCharCorrect_precision - Plates detected in the correct location and containing the correct characters in the correct locations,
divided by the total number of plates predicted (i.e. true positives plus false positives)
plateFrames_recall - Plate frames in the correct location (no checking of characters), divided by the
number of ground truth plates
plateFrames_precision - Plate frames in the correct location (no checking of characters), divided by
the total number of plates predicted (i.e. true positives plus false positives)
chars_recall - Characters detected in the correct place with the correct contents,
divided by the number of ground truth characters
chars_precision - Characters detected in the correct place with the correct contents,
divided by the total number of characters predicted (i.e. true positives plus false positives)
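All six metrics follow the same pattern: count true positives, then divide by the ground truth count (recall) or the predicted count (precision). The definitions say "correct location" without specifying the test. This minimal sketch assumes a prediction is correct when it has the same label as an unmatched ground truth object and overlaps it with an intersection over union (IoU) of at least 0.5, using greedy matching. Both the threshold and the matching scheme are assumptions, and predict_images_and_score.py may use different ones.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """predictions / ground_truth: lists of (label, box).

    A prediction is a true positive if it matches the label of a still-unmatched
    ground truth object with IoU >= iou_threshold (greedy, in prediction order).
    """
    unmatched = list(ground_truth)
    true_pos = 0
    for label, box in predictions:
        best = max((g for g in unmatched if g[0] == label),
                   key=lambda g: iou(box, g[1]), default=None)
        if best is not None and iou(box, best[1]) >= iou_threshold:
            unmatched.remove(best)  # each ground truth object can be matched once
            true_pos += 1
    precision = true_pos / len(predictions) if predictions else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

Applied to character detections, this gives chars_precision and chars_recall. Applied to plate boxes alone, it gives the plateFrames metrics.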