Francesco Croce*, Naman D Singh*, and Matthias Hein
ECCV 2024
Adversarial robustness has been studied extensively in image classification, especially for the Linf-threat model, but significantly less so for related tasks such as object detection and semantic segmentation, where attacks turn out to be a much harder optimization problem than for image classification. We propose several problem-specific novel attacks minimizing different metrics in accuracy and mIoU. The ensemble of our attacks, SEA, shows that existing attacks severely overestimate the robustness of semantic segmentation models. Surprisingly, existing attempts of adversarial training for semantic segmentation models turn out to be weak or even completely non-robust. We investigate why previous adaptations of adversarial training to semantic segmentation failed and show how recently proposed robust ImageNet backbones can be used to obtain adversarially robust semantic segmentation models with up to six times less training time for PASCAL-VOC and the more challenging ADE20K.
Models trained via the PIR-AT scheme. mIoU is reported for clean evaluation and with SEA evaluation (Adv.) at two perturbation strengths.
Model name | Dataset | Clean | Adv.(4/255) | Adv.(8/255) | Checkpoint |
---|---|---|---|---|---|
UperNet-ConvNext-T_CVST | PASCAL-VOC | 75.2% | 64.9% | 34.6% | Link |
UperNet-ConvNext-S_CVST | PASCAL-VOC | 76.6% | 66.0% | 36.4% | Link |
UperNet-ConvNext-T_CVST | ADE20K | 31.7% | 17.2% | 4.90% | Link |
UperNet-ConvNext-S_CVST | ADE20K | 32.1% | 17.9% | 5.40% | Link |
Segmenter-ViT-S | ADE20K | 28.7% | 14.9% | 5.30% | Link |
Robust pre-trained backbone models were taken from the Revisiting-AT repository.
For UperNet we always use the ConvNext backbone with Convolution Stem (CvSt).
SEA contains three complementary attacks:
- Mask-Cross-Entropy-Balanced (Mask-ce-bal)
- Mask-Cross-Entropy (Mask-ce)
- Jensen-Shannon Divergence (JS-Avg)
The attacks are run sequentially and then:
- For `aACC`, the image-wise worst case over the attacks is taken.
- For `mIoU`, the worst case is computed by updating the running mIoU image/attack-wise.
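The aggregation described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, predictions and labels are flat lists of class indices, all images have the same size (so averaging per-image accuracy matches pixel-weighted accuracy), and "updating the running mIoU image/attack-wise" is read as a greedy rule that, for each image, keeps the attack whose confusion matrix lowers the running mIoU the most.

```python
def confusion_matrix(pred, target, num_classes):
    """Count (ground-truth, prediction) label pairs for one image."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        cm[t][p] += 1
    return cm


def add_matrices(a, b):
    """Element-wise sum of two confusion matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def miou(cm):
    """Mean IoU over the classes that occur in predictions or ground truth."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious) if ious else 0.0


def pixel_accuracy(pred, target):
    """Fraction of correctly labelled pixels in one image."""
    return sum(p == t for p, t in zip(pred, target)) / len(target)


def worst_case_aacc(preds_per_attack, targets):
    """preds_per_attack[a][i] is the prediction under attack a on image i.

    For every image, keep the lowest accuracy reached by any attack,
    then average over images (assumes equal-sized images).
    """
    worst = [
        min(pixel_accuracy(preds[i], t) for preds in preds_per_attack)
        for i, t in enumerate(targets)
    ]
    return sum(worst) / len(worst)


def worst_case_miou(preds_per_attack, targets, num_classes):
    """Greedy reading (an assumption): per image, add the confusion matrix
    of the attack that yields the lowest running mIoU."""
    running = [[0] * num_classes for _ in range(num_classes)]
    for i, t in enumerate(targets):
        candidates = [
            add_matrices(running, confusion_matrix(preds[i], t, num_classes))
            for preds in preds_per_attack
        ]
        running = min(candidates, key=miou)
    return miou(running)


# Toy data: 2 attacks, 2 images, 4 pixels each, 2 classes.
targets = [[0, 0, 1, 1], [0, 1, 1, 1]]
attack_a = [[0, 0, 1, 1], [1, 1, 1, 1]]
attack_b = [[1, 0, 1, 1], [0, 1, 1, 1]]
print(worst_case_aacc([attack_a, attack_b], targets))
print(worst_case_miou([attack_a, attack_b], targets, num_classes=2))
```

On this toy data each attack on its own leaves the mIoU at 0.75, while the greedy combination brings it down to about 0.52, which illustrates why the ensemble is evaluated per image rather than per attack.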
Run `run_infer.sh` with the model's config (.yaml) file from the `configs` folder.
This computes the worst-case `mIoU` and `aACC` after the SEA attack for the dataset and model specified in the .yaml file.
Note: All dataset locations, modelNames, and pretrained-model checkpoint paths are set in the respective `config-file`.
All required packages are listed in `requirements.txt`.
For a SLURM-type setup, use `run_train_slurm.sh`, run with the location of the `config-file` and `num_of_gpu` as arguments.
For a non-SLURM multi-GPU setup, run `run_train.sh` with the location of the `config-file` and `num_of_gpu` as arguments.
- For `UperNet` with `ConvNext` backbone for `ADE20K`
  - Adversarial-training: config-file: ade20k_convnext.yaml; set `BACKBONE` in `MODEL` to `CONVNEXT-S_CVST` and `CONVNEXT-T_CVST` for Small and Tiny models respectively.
- For `UperNet` with `ConvNext` backbone for `PASCALVOC`
  - Adversarial-training: config-file: pascalvoc_convnext.yaml; set `BACKBONE` in `MODEL` to `CONVNEXT-S_CVST` and `CONVNEXT-T_CVST` for Small and Tiny models respectively.
- For `SegMenter` with `Vit-S` backbone for `ADE20K` dataset
  - Adversarial-training: config-file: ade20k_segmenter.yaml
For clean-training: set `ADVERSARIAL` to `FALSE` in the respective config-file.
If you use our code/models, please consider citing us with the following BibTeX entry:
@inproceedings{croce2024robust,
title={Towards Reliable Evaluation and Fast Training of Robust Semantic Segmentation Models},
author={Francesco Croce and Naman D Singh and Matthias Hein},
year={2024},
booktitle={ECCV}}
The code in this repository is partially based on the following publicly available codebases.