Evaluating the Gender Bias of Large Language Models in Academic Knowledge Production

Hermán, Judit, Kovács, Kíra Diána, Wang, Yajie and Vásárhelyi, Orsolya ORCID: https://orcid.org/0000-0001-6326-0617 (2025) Evaluating the Gender Bias of Large Language Models in Academic Knowledge Production. In: European Workshop on Algorithmic Fairness. Proceedings of Machine Learning Research (294). JMLR-Journal of Machine Learning Research, San Diego, pp. 417-422.


Official URL: https://proceedings.mlr.press/v294/herman25a.html

Abstract

Gender inequality in science is a complex issue that affects every stage of a scientific career, from education to professional advancement [1]. Despite progress in recent decades, women remain underrepresented in most scientific fields, particularly in leadership roles and prestigious research positions [2]. On November 30, 2022, OpenAI released ChatGPT, an AI language model that has rapidly transformed how we communicate, learn, and conduct research [3]. Since then, numerous companies have launched their own models and integrated them into various products. ChatGPT's use has also expanded into knowledge production and scientific publication [4]. It holds promise for addressing long-standing inequalities in academia, for example by helping non-native English speakers articulate their scientific discoveries more clearly and efficiently [5,6]. However, large language models (LLMs) have been shown to exhibit biases, for instance failing to represent men and women equally in image generation [7,8,9,10]. They also generate factually incorrect responses and fabricate non-existent references [11,12]. In this paper, we examine references generated by the ChatGPT-4o model across 26 research areas within four main scientific domains: Physical Sciences, Health Sciences, Social Sciences, and Life Sciences [13]. Specifically, we designed a prompt that instructed ChatGPT to generate literature reviews in various subfields and to provide references including the authors' full names, article titles, journals, publication years, and DOIs. We then compared these references, across research areas, with OpenAlex, an open-source database containing over 250 million scientific publications [13]. Our analysis focuses on the publication years of cited articles, the gender composition of co-author teams, and the occurrence of hallucinated references provided by ChatGPT. We also assessed the consistency of literature reviews generated on the same topic.
While previous studies have identified bias through small-sample analyses within specific subfields, our approach systematically compares hundreds of literature reviews across 26 research areas, providing a broader evaluation and laying the groundwork for a standardized evaluation framework.
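The reference check described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The local `index` dictionary is a hypothetical stand-in for records retrieved from OpenAlex. The gender labels are assumed to come from a separate name-based inference step that is not shown. The three outcome labels, the exact-title matching rule, and Jaccard similarity as a consistency measure are illustrative choices. All names, DOIs, and titles in the usage lines are placeholders, not data from the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reference:
    """One LLM-generated reference; fields mirror those requested in the prompt."""
    authors: tuple
    title: str
    journal: str
    year: int
    doi: str


def normalize_doi(doi: str) -> str:
    """Lower-case a DOI and strip common URL/prefix forms so lookups are comparable."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def normalize_title(title: str) -> str:
    """Keep only alphanumerics separated by single spaces, lower-cased."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in title)
    return " ".join(cleaned.split())


def classify(ref: Reference, index: dict) -> str:
    """Return 'verified', 'mismatch', or 'not_found' for one generated reference.

    Assumption: `index` maps normalized DOIs to records with a "title" field,
    standing in for works looked up in OpenAlex.
    """
    record = index.get(normalize_doi(ref.doi))
    if record is None:
        return "not_found"  # DOI matches no known work: likely hallucinated
    if normalize_title(record["title"]) != normalize_title(ref.title):
        return "mismatch"  # DOI exists but belongs to a different work
    return "verified"


def female_share(genders: list) -> Optional[float]:
    """Share of women among authors with a known label ('f' or 'm').

    Assumption: labels come from a hypothetical upstream gender-inference step;
    any other label (e.g. 'unknown') is ignored. Returns None if no label is known.
    """
    known = [g for g in genders if g in ("f", "m")]
    if not known:
        return None
    return known.count("f") / len(known)


def jaccard(a: set, b: set) -> float:
    """Overlap of two reviews' reference sets (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Toy usage with placeholder data (not results from the paper).
index = {"10.0000/toy.1": {"title": "A Toy Study of Citation Patterns"}}
refs = [
    Reference(("A. Author",), "A toy study of citation patterns.", "Toy J.", 2020,
              "https://doi.org/10.0000/TOY.1"),
    Reference(("B. Author",), "Another Title", "Toy J.", 2021, "10.0000/toy.1"),
    Reference(("C. Author",), "Invented Paper", "Toy J.", 2019, "10.0000/toy.999"),
]
print([classify(r, index) for r in refs])  # ['verified', 'mismatch', 'not_found']

review_a = {"10.0000/toy.1", "10.0000/toy.2"}
review_b = {"10.0000/toy.2", "10.0000/toy.3"}
print(round(jaccard(review_a, review_b), 3))  # 0.333
print(female_share(["f", "m", "unknown"]))  # 0.5
```

Separating "not_found" from "mismatch" keeps fully invented DOIs apart from real DOIs attached to the wrong article. Both are forms of hallucination, but they may be worth counting separately.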

Item Type:

Book Section

Series Name:

Proceedings of Machine Learning Research

Uncontrolled Keywords:

LLM, Gender inequality, algorithmic bias, evaluation

Divisions:

Institute of Data Analytics and Information Systems
Corvinus Institute for Advanced Studies (CIAS)

Subjects:

Automation, mechanization
Sociology
Computer science

Funders:

European Research Executive Agency project LearnData

Projects:

101086712E

ID Code:

12702

Deposited By:

MTMT SWORD

Deposited On:

08 Apr 2026 15:15

Last Modified:

08 Apr 2026 15:15
