Evaluate biases in (pre-trained or re-trained) masked language models (MLMs), such as those available through HuggingFace, using multiple state-of-the-art methods to compute a bias score for each bias type in the benchmark datasets CrowS-Pairs (CPS) and StereoSet (SS) (intrasentence), or in a custom line-by-line dataset with files `bias_types.txt` (containing bias categories), `dis.txt`, and `adv.txt` (used to create sentence pairs), where `dis.txt` contains sentences with bias against disadvantaged groups (stereotypical) and `adv.txt` contains sentences with bias against advantaged groups (anti-stereotypical). Additionally, compare relative bias between two MLMs (and compare re-trained MLMs with their pre-trained bases).
Bias scores for an MLM are computed from the implemented measures over sentence pairs in the dataset.

The measures below are computed using an iterative masking experiment, in which the MLM masks one token at a time until every token has been masked once (yielding `n` sets of logits, or predictions, for a sentence with `n` tokens); see the current citation in Citation. As a result, these measures take longer to compute. Each measure represents MLM preference (or prediction quality). For a sentence pair, bias against the disadvantaged group is indicated by a higher relative measure value for the sentence from `adv.txt` compared to the sentence from `dis.txt`.
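The iterative masking experiment amounts to a simple loop: for a sentence of `n` tokens, mask each position once and collect the model's logits for that position. A minimal sketch with a toy stand-in for the MLM (a real run would call a HuggingFace model; `toy_logits` is purely illustrative):

```python
def toy_logits(masked_tokens, mask_index, vocab_size=5):
    # Stand-in for an MLM forward pass: returns a fake logit vector
    # over the vocabulary for the masked position.
    return [1.0 if i == mask_index % vocab_size else 0.0 for i in range(vocab_size)]

def iterative_masking(tokens):
    """Mask one token at a time; collect one logit vector per position."""
    per_position_logits = []
    for i in range(len(tokens)):
        masked = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
        per_position_logits.append(toy_logits(masked, i))
    return per_position_logits

sentence = ["the", "doctor", "was", "kind"]
logits = iterative_masking(sentence)
# n logit vectors for a sentence with n tokens
```

Each of the `n` logit vectors is then scored against the token that was masked at that position.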
- `CRR` (difference between the reciprocal rank of a predicted token, which is always equal to 1, and the reciprocal rank of a masked token)
- `CRRA` (`CRR` with attention weights)
- `ΔP` (difference in log-likelihood of a predicted token and the masked token)
- `ΔPA` (`ΔP` with attention weights)
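Following these definitions, `CRR` and `ΔP` for a single masked position can be sketched from one logit vector. This is a minimal illustration, not the package's implementation; the attention-weighted variants (`CRRA`, `ΔPA`) would additionally weight each position's value by its attention weight:

```python
import math

def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def crr(logits, masked_id):
    """1 minus the reciprocal rank of the masked token
    (the predicted token's reciprocal rank is always 1)."""
    rank = 1 + sum(1 for x in logits if x > logits[masked_id])
    return 1.0 - 1.0 / rank

def delta_p(logits, masked_id):
    """Log-likelihood of the top predicted token minus that of the masked token."""
    lp = log_softmax(logits)
    return max(lp) - lp[masked_id]

scores = [2.0, 0.5, 1.0]   # toy logits over a 3-token vocabulary
print(crr(scores, 2))      # masked token ranked 2nd: 1 - 1/2 = 0.5
print(delta_p(scores, 2))  # 2.0 - 1.0 ≈ 1.0 (the softmax normalizer cancels)
```

Averaging these per-position values over a sentence gives the sentence-level measure that the two sides of a pair are compared on.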
Measures that are computed with a single encoded input (see the citations in References for more details):

- `CSPS` (https://arxiv.org/abs/2010.00133): the CrowS-Pairs Score, a log-likelihood score for an MLM selecting unmodified tokens given modified ones
- `SSS` (https://arxiv.org/abs/2004.09456): the StereoSet Score, a log-likelihood score for an MLM selecting modified tokens given unmodified ones
- `AUL` (https://arxiv.org/abs/2104.07496): All Unmasked Likelihood, a log-likelihood score generated by predicting all tokens in a single unmasked input
- `AULA` (https://arxiv.org/abs/2104.07496): `AUL` with attention weights
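For example, `AUL` can be sketched as the mean log-likelihood of each original token when every position is predicted from one unmasked forward pass. The logits below are toy values standing in for a real model's output:

```python
import math

def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def aul(per_token_logits, token_ids):
    """Mean log-likelihood of each original token, with every position
    predicted from a single unmasked forward pass."""
    scores = [log_softmax(row)[tid] for row, tid in zip(per_token_logits, token_ids)]
    return sum(scores) / len(scores)

# Toy logits: one row per token position, over a 3-token vocabulary.
rows = [[3.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
ids = [0, 1]  # the original token id at each position
print(aul(rows, ids))  # a (negative) mean log-probability
```

`AULA` would weight each position's log-likelihood by its attention weight before averaging.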
```shell
pip install mlm-bias
```
```python
import mlm_bias

cps_dataset = mlm_bias.BiasBenchmarkDataset("cps")
cps_dataset.sample(indices=list(range(10)))
model = "bert-base-uncased"
model_bias = mlm_bias.BiasMLM(model, cps_dataset)  # avoid shadowing the mlm_bias module
result = model_bias.evaluate(inc_attention=True)
result.save("./bert-base-uncased")
```
Clone this repo:
```shell
git clone https://github.com/zalkikar/mlm-bias.git
cd mlm-bias
python3 -m pip install .
```
Using the `mlm_bias.py` example script:

```text
mlm_bias.py [-h] --data {cps,ss,custom} --model MODEL [--model2 MODEL2] [--output OUTPUT] [--measures {all,crr,crra,dp,dpa,aul,aula,csps,sss}] [--start S] [--end E]
```
```shell
# single mlm
python3 mlm_bias.py --data cps --model roberta-base --start 0 --end 30
python3 mlm_bias.py --data ss --model bert-base-uncased --start 0 --end 30
# relative
python3 mlm_bias.py --data cps --model roberta-base --start 0 --end 30 --model2 bert-base-uncased
```
With default arguments:

- `/data` will have `cps.csv` (CPS) and/or `ss.csv` (SS)
- `/eval` will have `out.txt` with computed measures and pickled results objects
Example command output:
```text
Created output directory.
Created Data Directory |██████████████████████████████| 1/1 [100%] in 0s ETA: 0s
Downloaded Data [CrowSPairs] |██████████████████████████████| 1/1 [100%] in 0s ETA: 0s
Loaded Data [CrowSPairs] |██████████████████████████████| 1/1 [100%] in 0s ETA: 0s
Evaluating Bias [roberta-base] |██████████████████████████████| 30/30 [100%] in 2m 46s ETA: 0s
Saved bias results for roberta-base in ./eval/roberta-base
Saved scores in ./eval/out.txt
--------------------------------------------------
MLM: roberta-base
CRR total = 50.0
CRRA total = 53.333
ΔP total = 56.667
ΔPA total = 56.667
AUL total = 76.667
AULA total = 70.0
SSS total = 53.333
CSPS total = 63.333
```
If using this for research, please cite the following:
```bibtex
@misc{zalkikar-chandra-2024,
  author = {Rahul Zalkikar and Kanchan Chandra},
  title = {Measuring Social Biases in Masked Language Models by Proxy of Prediction Quality},
  year = {2024}
}

@inproceedings{Kaneko:AUL:2022,
  author = {Masahiro Kaneko and Danushka Bollegala},
  title = {Unmasking the Mask -- Evaluating Social Biases in Masked Language Models},
  booktitle = {Proceedings of the 36th AAAI Conference on Artificial Intelligence},
  year = {2022},
  month = {February},
  address = {Vancouver, BC, Canada}
}

@article{salutari-etal-2023,
  author = {Flavia Salutari and Jerome Ramos and Hosein A. Rahmani and Leonardo Linguaglossa and Aldo Lipani},
  title = {Quantifying the Bias of Transformer-Based Language Models for African American English in Masked Language Modeling},
  journal = {The Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2023},
  month = may,
  year = {2023},
  url = {https://telecom-paris.hal.science/hal-04067844}
}

@inproceedings{nangia-etal-2020-crows,
  title = {{C}row{S}-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models},
  author = {Nangia, Nikita and Vania, Clara and Bhalerao, Rasika and Bowman, Samuel R.},
  editor = {Webber, Bonnie and Cohn, Trevor and He, Yulan and Liu, Yang},
  booktitle = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  month = nov,
  year = {2020},
  address = {Online},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2020.emnlp-main.154},
  doi = {10.18653/v1/2020.emnlp-main.154},
  pages = {1953--1967}
}

@misc{nadeem2020stereoset,
  title = {StereoSet: Measuring stereotypical bias in pretrained language models},
  author = {Moin Nadeem and Anna Bethke and Siva Reddy},
  year = {2020},
  eprint = {2004.09456},
  archivePrefix = {arXiv},
  primaryClass = {cs.CL}
}
```