Intersectional Accuracy Disparities in Commercial Gender Classification
The paper by Joy Buolamwini and Timnit Gebru analyses the accuracy of commercial face-based gender classification systems, with a focus on the intersection of skin type and gender. The study evaluated three commercial gender classification algorithms, from IBM, Microsoft, and Face++, against a dataset that the researchers had labelled manually. The authors calculated accuracy rates by comparing the algorithms’ predictions to the manual labels, and also analysed how accuracy varies intersectionally, that is, across subgroups defined by gender and skin type.
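The kind of comparison described above can be illustrated with a minimal sketch. It is not the authors' code: the function name, the field names, and the toy records are all invented for illustration, and the only assumption is that each face has a manual label and a prediction.

```python
from collections import defaultdict


def subgroup_accuracy(records):
    """Return {(skin_type, true_gender): accuracy} for labelled records.

    Each record is a dict with the hypothetical keys
    'skin_type', 'true_gender' and 'predicted_gender'.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        key = (rec["skin_type"], rec["true_gender"])
        total[key] += 1
        if rec["predicted_gender"] == rec["true_gender"]:
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Invented toy data, NOT results from the paper.
records = [
    {"skin_type": "darker", "true_gender": "female", "predicted_gender": "male"},
    {"skin_type": "darker", "true_gender": "female", "predicted_gender": "female"},
    {"skin_type": "lighter", "true_gender": "male", "predicted_gender": "male"},
    {"skin_type": "lighter", "true_gender": "male", "predicted_gender": "male"},
]

# Overall accuracy hides the gap that the per-subgroup figures expose.
overall = sum(
    r["predicted_gender"] == r["true_gender"] for r in records
) / len(records)
accuracy = subgroup_accuracy(records)

print(f"overall: {overall:.2f}")
for group, acc in sorted(accuracy.items()):
    print(group, f"{acc:.2f}")
```

Reporting one figure per subgroup rather than a single overall number is the point of the exercise: in this toy data the aggregate looks acceptable while one subgroup is classified correctly only half the time.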
The main conclusion of the study is that “[i]nclusive benchmark datasets and subgroup accuracy reports will be necessary to increase transparency and accountability in artificial intelligence”, which is possible only if the demographic and phenotypic characteristics of the training datasets, together with the algorithms’ performance, are openly disclosed. The authors point out that more research is needed to determine whether the significant error rate disparities found in this study of gender classification, across gender, skin type, and intersectional subgroups, persist in other computer vision tasks involving humans.
Besides finding that the commercial algorithms evaluated performed worse on faces with darker skin tones and on female faces, and worst of all on darker-skinned women, the authors delivered a “new dataset with more balanced skin type and gender representation”. This is a notable contribution towards reducing biased decisions made by AI algorithms in contexts where accuracy is crucial. The authors also suggest that future research and development in this area should prioritise gender classification systems that are inclusive, accurate, and free from bias.
This last suggestion is reinforced by one of the points the authors raise: the fact that “[t]he companies provide no documentation to clarify if their gender classification systems which provide sex labels are classifying gender identity or biological sex” points to a lack of awareness of the complexities of gender in visual computing. Quinan & Hunt’s work illustrates this: their “ethnographic data demonstrate how trans and non-binary bodies are forced to bend to these systems”, and how such systems can be used to discriminate systematically. Scheuerman et al. provide empirical evidence that, in the systems they analysed, “binary gender classification provided by computer vision services performed worse on binary trans images than cis ones, and were unable to correctly classify non-binary genders”.
The discriminatory potential of these systems is demonstrated not only by their inaccuracy but also by the experiences of researchers themselves. In a study exploring how working on computer vision affects investigators, one transgender researcher stated they felt “pure rage as first reaction, and then just a deep sadness, which still persists” after reading a paper on facial gender recognition which suggested that some individuals could use hormone replacement therapy as a disguise to fool face recognition algorithms.
Gender identification through computer vision is a critical topic that requires serious consideration from ethical and inclusive perspectives, since the consequences of bias are lived and suffered by minorities. The article by Buolamwini and Gebru is an important step towards a fair application of AI technologies.