Gender Bias in Automated CV Evaluation: Evidence from Counterfactual Simulations Using Synthetic Data from Mexico

Authors

DOI:

https://doi.org/10.60758/laer.v37i.538

Keywords:

Gender bias, Automated CV screening, Synthetic data experiments

Abstract

This paper investigates the presence of gender bias in automated CV evaluations conducted by a large language model (LLM). Using over 14,000 synthetically generated CVs representative of the Mexican labor market across six occupational categories, we implement a counterfactual design that isolates the effect of perceived gender by switching only the name and reported gender of each candidate. The analysis reveals systematic and occupation-specific biases: female candidates receive higher scores when presented as male in traditionally masculine roles (e.g., truck driver), while male candidates gain when reclassified as female in feminized occupations (e.g., nursing, elementary teaching). Notably, we document a statistically significant and operationally meaningful pro-female bias in the high-status Chief Financial Officer role. These asymmetries persist under deterministic prompting (temperature = 0), ruling out randomness as a confounder. Our design is contextually grounded, using names, educational institutions, and employers common in Mexico, and offers a scalable methodology for bias auditing in LLMs. The findings highlight the necessity of localized fairness assessments and raise concerns about the equity implications of deploying general-purpose AI tools in personnel selection.
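The counterfactual design described above — flipping only the name and reported gender of each CV while holding all other fields fixed — can be sketched in a few lines. The field names, the example name pairs, and the deterministic scoring stub below are illustrative assumptions, not the paper's actual implementation (which queries an LLM at temperature 0):

```python
# Illustrative sketch of a counterfactual gender-swap audit.
# The name pairs, CV fields, and scoring stub are hypothetical;
# the study itself scores CVs with an LLM at temperature = 0.

MALE_TO_FEMALE = {"Juan": "Juana", "Carlos": "Carla"}  # assumed name pairs
FEMALE_TO_MALE = {v: k for k, v in MALE_TO_FEMALE.items()}

def swap_gender(cv):
    """Return a copy of the CV with only the name and gender flipped."""
    swapped = dict(cv)
    if cv["gender"] == "M":
        swapped["name"] = MALE_TO_FEMALE[cv["name"]]
        swapped["gender"] = "F"
    else:
        swapped["name"] = FEMALE_TO_MALE[cv["name"]]
        swapped["gender"] = "M"
    return swapped

def score_gap(cv, score_fn):
    """Score a CV and its counterfactual twin; a positive gap
    means the original gendered version was rated higher."""
    return score_fn(cv) - score_fn(swap_gender(cv))

if __name__ == "__main__":
    cv = {"name": "Juan", "gender": "M", "occupation": "truck driver"}
    # Deterministic stub standing in for an LLM call at temperature 0.
    stub = lambda c: 8.0 if c["gender"] == "M" else 7.2
    print(score_gap(cv, stub))  # positive gap: male variant scored higher
```

Because every field except the name and gender marker is identical across the pair, any systematic score gap can be attributed to perceived gender rather than qualifications.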

References

Abid, A., Farooqi, M., & Zou, J. (2021). Persistent anti-Muslim bias in large language models. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (pp. 298–306). Association for Computing Machinery. https://doi.org/10.1145/3461702.3462624

Arceo-Gomez, E., & Campos-Vazquez, R. M. (2014). Race and marriage in the labor market: A discrimination correspondence study in a developing country. American Economic Review, 104(5), 376–380. https://doi.org/10.1257/aer.104.5.376

Arceo-Gomez, E., & Campos-Vazquez, R. M. (2019). Double discrimination: Is discrimination in job ads accompanied by discrimination in callbacks? Journal of Economics, Race, and Policy, 2(2), 82–94. https://doi.org/10.1007/s41996-019-00031-3

Armstrong, B., Hernández, L., & Rivera, D. (2024). Bias in CV Evaluation by Large Language Models: Evidence from Gender-Swapped Simulations. Economics of AI Review, 12(1), 45–72.

Bertrand, M., & Mullainathan, S. (2004). Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. American Economic Review, 94(4), 991–1013. https://doi.org/10.1257/0002828042002561

Bertrand, M., & Duflo, E. (2017). Field experiments on discrimination. In A. Banerjee & E. Duflo (Eds.), Handbook of Economic Field Experiments (Vol. 1, pp. 309–393). North-Holland. https://doi.org/10.1016/bs.hefe.2016.08.004

Blommaert, L., Coenders, M., & van Tubergen, F. (2014). Ethnic discrimination in recruitment and decision makers’ features: Evidence from laboratory experiment and survey data using a student sample. Social Indicators Research, 116, 731–754. https://doi.org/10.1007/s11205-013-0329-4

Blommaert, L., van Tubergen, F., & Coenders, M. (2012). Implicit and explicit interethnic attitudes and ethnic discrimination in hiring. Social Science Research, 41, 61–73. https://doi.org/10.1016/j.ssresearch.2011.09.007

Bravo, D., Sanhueza, C., & Urzúa, S. (2011). An experimental study of labor market discrimination: Gender, social class and neighborhood in Chile (IDB Working Paper No. 226). http://dx.doi.org/10.2139/ssrn.1815907

Buolamwini, J., & Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of Machine Learning Research, 81, 77–91. https://proceedings.mlr.press/v81/buolamwini18a/buolamwini18a.pdf

Campos-Vazquez, R. M., & Gonzalez, E. (2020). Obesity and hiring discrimination. Economics & Human Biology, 37, 100850. https://doi.org/10.1016/j.ehb.2020.100850

Chang, X. (2023). Gender bias in hiring: An analysis of the impact of Amazon's recruiting algorithm. Advances in Economics, Management and Political Sciences, 23(1), 134–140. https://doi.org/10.54254/2754-1169/23/20230367

Chaturvedi, S., & Chaturvedi, R. (2025). Who gets the callback? Generative AI and gender bias. arXiv preprint arXiv:2504.21400. https://arxiv.org/pdf/2504.21400

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

Ding, L., Smith, J., Wang, Y., & Lee, K. (2024). Probing social bias in labor market text generation by ChatGPT: A masked language model approach. In Proceedings of the Neural Information Processing Systems Conference (NeurIPS 2024). https://papers.nips.cc/paper_files/paper/2024/hash/fce2d8a485746f76aac7b5650db2679d-Abstract-Conference.html

Expansión. (2023). Las 500 empresas más importantes de México. Expansión. https://expansion.mx/las-500-empresas-mas-importantes-mexico

Feng, X., Dou, L., Li, E., Wang, Q., Wang, H., Guo, Y., ... & Kong, L. (2024). A survey on large language model-based social agents in game-theoretic scenarios. arXiv preprint, arXiv:2412.03920. https://doi.org/10.48550/arXiv.2412.03920

Galarza, F. B., & Yamada, G. (2014). Labor market discrimination in Lima, Peru: Evidence from a field experiment. World Development, 58, 83–94. https://doi.org/10.1016/j.worlddev.2014.01.003

Guo, F. (2023). GPT in game theory experiments. arXiv preprint, arXiv:2305.05516. https://doi.org/10.48550/arXiv.2305.05516

Howard, S., & Borgella, A. M. (2019). Are Adewale and Ngochi more employable than Jamal and Lakeisha? The influence of nationality and ethnicity cues on employment-related evaluations of Blacks in the United States. The Journal of Social Psychology, 160(4), 509–519. https://doi.org/10.1080/00224545.2019.1687415

Instituto Nacional de Estadística y Geografía. (2023). Estadística de Nacimientos Registrados (serie 2000–2023) [Conjunto de datos]. INEGI. https://www.inegi.org.mx/programas/natalidad/

King, E. B., Madera, J. M., Hebl, M. R., & Knight, J. L. (2006). What’s in a name? A multiracial investigation of the role of occupational stereotypes in selection decisions. Journal of Applied Social Psychology, 36(5), 1145–1159. https://doi.org/10.1111/j.0021-9029.2006.00035.x

Kiritchenko, S., & Mohammad, S. M. (2018). Examining gender and race bias in two hundred sentiment analysis systems. arXiv preprint, arXiv:1805.04508.

Kotek, H., Dockum, R., & Sun, D. (2023). Gender bias and stereotypes in large language models. In Proceedings of the ACM Collective Intelligence Conference (pp. 12–24). Association for Computing Machinery.

Kotek, H., Zhang, Y., Zhou, P., & Smith, N. A. (2023). Stereotypical Bias Amplification in Large Language Models. Proceedings of the ACL 2023, 5112–5124.

Kübler, D., Schmid, J., & Stüber, R. (2018). Gender discrimination in hiring across occupations: A nationally-representative vignette study. Labour Economics, 55, 215–229. https://doi.org/10.1016/j.labeco.2018.10.002

Lippens, L. (2024). Computer says ‘no’: Exploring systemic bias in ChatGPT using an audit approach. Computers in Human Behavior: Artificial Humans, 2(1), 100054. https://doi.org/10.1016/j.chbah.2024.100054

Lippens, L., Dalle, A., D'hondt, F., Verhaeghe, P., & Baert, S. (2023). Understanding ethnic hiring discrimination: A contextual analysis of experimental evidence. Labour Economics, 85, 102453. https://doi.org/10.1016/j.labeco.2023.102453

Martínez-Alfaro, A., Silverio-Murillo, A., & Balmori-de-la-Miyar, J. (2024). What’s in a name? Evidence of transgender labor discrimination in Mexico. Journal of Economic Behavior & Organization, 227, 106738. https://doi.org/10.1016/j.jebo.2024.106738

Moss-Racusin, C. A., Dovidio, J. F., Brescoll, V. L., Graham, M. J., & Handelsman, J. (2012). Science faculty’s subtle gender biases favor male students. Proceedings of the National Academy of Sciences, 109(41), 16474–16479. https://doi.org/10.1073/pnas.1211286109

Nogales, R., Córdova, P., & Urquidi, M. (2020). The impact of university reputation on employment opportunities: Experimental evidence from Bolivia. The Economic and Labour Relations Review, 31(4), 524–542. https://doi.org/10.1177/1035304620962265

OpenAI. (2024). Evaluating fairness in ChatGPT. https://openai.com/index/evaluating-fairness-in-chatgpt/

Quillian, L., Pager, D., Hexel, O., & Midtbøen, A. H. (2017). Meta-analysis of field experiments shows no change in racial discrimination in hiring over time. Proceedings of the National Academy of Sciences, 114(41), 10870–10875. https://doi.org/10.1073/pnas.1706255114

Ross, J., Kim, Y., & Lo, A. W. (2024). LLM economicus? Mapping the behavioral biases of LLMs via utility theory. SSRN. https://doi.org/10.2139/ssrn.4926791

Society for Human Resource Management. (2022, April 12). Fresh SHRM research explores use of automation and AI in HR. SHRM. https://www.shrm.org/content/dam/en/shrm/topics-tools/news/technology/SHRM-2022-Automation-AI-Research.pdf

Torres, J., Herz, S., Pérez, A., & Barrón, M. (2024). Labor market discrimination against Venezuelans in Peru: Evidence from a correspondence study. Economía, 47(94), 1–23. https://doi.org/10.18800/economia.202402.001

Venkit, P. N., Gautam, S., Panchanadikar, R., Huang, T., & Wilson, S. (2023). Nationality bias in text generation. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (pp. 116–122). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.eacl-main.9

Venkit, P. N., Srinath, M., & Wilson, S. (2022). A study of implicit bias in pretrained language models against people with disabilities. In Proceedings of the 29th International Conference on Computational Linguistics (pp. 1324–1332). International Committee on Computational Linguistics. https://aclanthology.org/2022.coling-1.113

Verhaeghe, P. P. (2022). Correspondence studies. In K. F. Zimmermann (Ed.), Handbook of Labor, Human Resources and Population Economics. Springer. https://doi.org/10.1007/978-3-319-57365-6_306-1

Published

2026-05-14

Issue

Section

Regular articles

How to Cite

Gender Bias in Automated CV Evaluation: Evidence from Counterfactual Simulations Using Synthetic Data from Mexico. (2026). Latin American Economic Review, 37, 1–34. https://doi.org/10.60758/laer.v37i.538