
Mapping the Ethics of Generative AI: A Comprehensive Scoping Review


Thilo Hagendorff
Interchange Forum for Reflecting On Intelligent Systems, University of Stuttgart, Stuttgart, Germany
thilo.hagendorff@iris.uni-stuttgart.de

Received: 14 March 2024 / Accepted: 14 August 2024 / Published online: 17 September 2024

© The Author(s) 2024

Abstract

The advent of generative artificial intelligence and its widespread adoption in society have engendered intensive debates about its ethical implications and risks. These risks often differ from those associated with traditional discriminative machine learning. To synthesize the recent discourse and map its normative concepts, we conducted a scoping review on the ethics of generative artificial intelligence, including especially large language models and text-to-image models. Our analysis provides a taxonomy of 378 normative issues in 19 topic areas and ranks them according to their prevalence in the literature. The study offers a comprehensive overview for scholars, practitioners, and policymakers, condensing the ethical debates surrounding fairness, safety, harmful content, hallucinations, privacy, interaction risks, security, alignment, societal impacts, and others. We discuss the results, evaluate imbalances in the literature, and explore unsubstantiated risk scenarios.

Keywords: Generative artificial intelligence · Large language models · Image generation models · Ethics

1 Introduction

With the rapid progress of artificial intelligence (AI) technologies, ethical reflection on them constantly faces new challenges. From the advent of deep learning for powerful computer vision applications (LeCun et al., 2015), to the achievement of superhuman-level performance in complex games with reinforcement learning (RL) algorithms (Silver et al., 2017), to large language models (LLMs) possessing complex reasoning abilities (Bubeck et al., 2023; Minaee et al., 2024), new ethical implications have arisen at extremely short intervals in the last decade. Alongside this technological progress, the field of AI ethics has evolved. Initially, it was primarily a reactive discipline, erecting normative principles for entrenched AI technologies (Floridi et al., 2018; Hagendorff, 2020). However, it became increasingly proactive with the prospect of harms through misaligned artificial general intelligence (AGI) systems. During its evolution, AI ethics underwent a practical turn to explicate how to put principles into practice (Mittelstadt, 2019; Morley et al., 2019); it diversified into alternatives to the principle-based approach, for instance by building AI-specific virtue ethics (Hagendorff, 2022a; Neubert & Montañez, 2020); it received criticism for being inefficient, useless, or whitewashing (Hagendorff, 2022b, 2023a; Munn, 2023; Sætra & Danaher, 2022); it became increasingly transferred into proposed legal norms like the AI Act of the European Union (Floridi et al., 2022; Mökander et al., 2021); and it came to be accompanied by two new fields dealing with technical and theoretical issues alike, namely AI alignment and AI safety (Amodei et al., 2017; Kenton et al., 2021). Both domains have a normative grounding and are devoted to preventing harm or even existential risks stemming from generative AI systems.

On the technical side of things, variational autoencoders (Kingma & Welling, 2013), flow-based generative models (Papamakarios et al., 2021; Rezende & Mohamed, 2015), or generative adversarial networks (Goodfellow et al., 2014) were early successful generative models, supplementing discriminative machine learning architectures. Later, the transformer architecture (Vaswani et al., 2017) as well as diffusion models (Ho et al., 2020) boosted the performance of text and image generation models and made them adaptable to a wide range of downstream tasks. However, due to the lack of user-friendly graphical user interfaces, dialog optimization, and sufficient output quality, generative models went underrecognized by the wider public. This changed with the advent of models like ChatGPT, Gemini, Stable Diffusion, or Midjourney, which are accessible through natural language prompts and easy-to-use browser interfaces (OpenAI, 2022; Gemini Team et al., 2023; Rombach et al., 2022). The next phase will see a rise in multi-modal models, which are similarly user-friendly and combine the processing and generation of text, images, and audio along with other modalities, such as tool use (Mialon et al., 2023; Wang et al., 2023d). In sum, we define the term "generative AI" as comprising large, foundation, or frontier models capable of transforming text to text, text to image, image to text, text to code, text to audio, text to video, or text to 3D (Gozalo-Brizuela & Garrido-Merchan, 2023).

The swift innovation cycles in machine learning and the plethora of related normative research works in ethics, alignment, and safety research make it hard to keep up. To remedy this situation, scoping reviews have provided synopses of AI policy guidelines (Jobin et al., 2019), sociotechnical harms of algorithmic systems (Shelby et al., 2023), values in machine learning research (Birhane et al., 2021), risks of specific applications like language models (Weidinger et al., 2022), occurrences of harmful machine behavior (Park et al., 2024), safety evaluations of generative AI (Weidinger et al., 2023), impacts of generative AI on cybersecurity (Gupta et al., 2023), the evolution of research priorities in generative AI (McIntosh et al., 2023), and many more. These scoping reviews render the research community a tremendous service. However, with the exception of Gabriel et al. (2024), which was written and published almost simultaneously with this study, no such scoping review exists that targets the assemblage of ethical issues associated with the latest surge of generative AI applications at large. In this context, many ethical concerns have emerged that were not relevant to traditional discriminative machine learning techniques, highlighting the significance of this work in filling a research gap.

As a scoping review, this study aims to close this gap and to provide a practical overview for scholars, AI practitioners, policymakers, journalists, as well as other relevant stakeholders. Based on a systematic literature search and coding methodology, we distill the body of knowledge on the ethics of generative AI, synthesize the details of the discourse, map normative concepts, discuss imbalances, and provide a basis for future research and technology governance. The complete taxonomy, which encompasses all ethical issues identified in the literature, is available in the supplementary material as well as online at this link: https://thilo-hagendorff.github.io/ethics-tree/tree.html

2 Methods

We conducted a scoping review (Arksey & O'Malley, 2005) with the aim of covering a significant proportion of the existing literature on the ethics of generative AI. Throughout the different phases of the review, we followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) protocol (Moher et al., 2009; Page et al., 2021). In the first phase, we conducted an exploratory reading of definitions related to generative AI to identify key terms and topics for structured research. This allowed us to identify 29 relevant keywords for a web search. We conducted the search using a Google Scholar API with a blank account, avoiding the influence of cookies, as well as the arXiv API. We also scraped search results from PhilPapers, a database for publications from philosophy and related disciplines like ethics. In addition, we used the AI-based paper search engine Elicit with 5 tailored prompts. For details on the list of keywords as well as prompts, see Appendix A. We collected the first 25 (Google Scholar, arXiv, PhilPapers) or first 50 (Elicit) search results for every search pass, which resulted in 1,674 results overall, since not all search terms yielded 25 hits on arXiv or PhilPapers. In terms of the publication date, we included papers from 2021 onwards. Although generative AI systems were researched and released prior to 2021, their widespread application and public visibility surged with the release of OpenAI's DALL-E (Ramesh et al., 2021) in 2021 and was later intensified by the tremendous popularity of ChatGPT (OpenAI, 2022) in 2022.
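The paper names the services that were queried but not the tooling used for retrieval. As a minimal sketch of the collection step, the following Python snippet shows one plausible way to gather the first 25 arXiv hits per keyword via the public arXiv API; the keyword excerpt, function names, and parsing choices are illustrative assumptions rather than the authors' actual pipeline.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"  # public Atom endpoint of arXiv
ATOM = "{http://www.w3.org/2005/Atom}"           # Atom XML namespace prefix

# Illustrative subset of search keywords (the full list of 29 is in Appendix A).
KEYWORDS = ["ethics of generative AI", "large language model risks"]

def arxiv_search(keyword: str, max_results: int = 25) -> list[dict]:
    """Return title, abstract, and link for the first `max_results` arXiv hits."""
    query = urllib.parse.urlencode({
        "search_query": f'all:"{keyword}"',
        "start": 0,
        "max_results": max_results,
    })
    with urllib.request.urlopen(f"{ARXIV_API}?{query}") as response:
        feed = ET.fromstring(response.read())
    return [
        {
            # arXiv titles contain line breaks; collapse internal whitespace.
            "title": " ".join(entry.findtext(f"{ATOM}title", "").split()),
            "abstract": entry.findtext(f"{ATOM}summary", "").strip(),
            "link": entry.findtext(f"{ATOM}id", ""),
        }
        for entry in feed.findall(f"{ATOM}entry")
    ]

if __name__ == "__main__":
    records = [hit for kw in KEYWORDS for hit in arxiv_search(kw)]
    print(f"Collected {len(records)} candidate records")
```

Only the arXiv call is shown because it has a stable public API; analogous loops over Google Scholar, PhilPapers, and Elicit would complete the collection step described above.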

We deduplicated our list of papers by removing string-wise identical duplicates as well as duplicate titles with a cosine similarity above 0.8 to cover title pairs with slight capitalization or punctuation differences. Eventually, we retrieved 1,120 documents for title and abstract screening. Of those, 162 met the eligibility criteria for full-text screening, which in essence required the papers to explicitly refer to ethical implications of generative AI systems without being purely technical research works (see Appendix B). Furthermore, we used citation chaining to identify additional records by sifting through the reference lists of the original papers until no additional publication could be identified (see Appendix C). We also monitored the literature after our initial search was performed to retrieve additional relevant documents (see Appendix C). For the latter two approaches, we implemented the limitation that we only considered overview papers, scoping reviews, literature reviews, or taxonomies. The identification of further records resulted in 17 additional papers. In sum, we identified 179 documents eligible for the detailed content analysis (see Appendix D). The whole process is illustrated in the flowchart in Fig. 1.

Fig. 1 Flow diagram illustrating the paper selection process
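The paper does not state how titles were vectorized for the cosine-similarity check, so the following sketch is only one plausible reading of the title-deduplication step: character n-gram TF-IDF vectors compared at the stated 0.8 threshold; the representation and helper names are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def deduplicate_titles(titles: list[str], threshold: float = 0.8) -> list[str]:
    """Drop exact duplicates, then near-duplicate titles whose pairwise
    cosine similarity exceeds `threshold`, keeping the first occurrence."""
    # Pass 1: string-wise identical duplicates (case and whitespace normalized).
    seen, unique = set(), []
    for title in titles:
        key = " ".join(title.lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(title)

    # Pass 2: near-duplicates, e.g. titles differing only in punctuation.
    vectors = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)).fit_transform(unique)
    similarity = cosine_similarity(vectors)
    kept = []
    for i, _ in enumerate(unique):
        if all(similarity[i, j] <= threshold for j in kept):
            kept.append(i)
    return [unique[i] for i in kept]

# Example: the third title is a near-duplicate of the second and is dropped.
print(deduplicate_titles([
    "Mapping the Ethics of Generative AI",
    "The Ethics of Large Language Models!",
    "the ethics of large language models",
]))
```

Character n-grams make the comparison robust to exactly the capitalization and punctuation differences the 0.8 threshold is meant to catch, while leaving genuinely distinct titles untouched.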

For the paper content analysis and annotation, we used the data analysis software NVivo (version 14.23.2). In the initial coding cycle, all relevant paper texts were labelled paragraph by paragraph through a bottom-up approach, deriving concepts and themes from the papers using inductive coding (Saldaña, 2021). We only coded arguments that fall under the umbrella of AI ethics, meaning arguments that possess an implicit or explicit normative dimension: statements about what ought to be, discussions of harms, opportunities, risks, norms, chances, values, ethical principles, or policy recommendations. We did not code purely descriptive content or technical details unrelated to ethics. Moreover, we did not code arguments if they did not pertain to generative AI but to traditional machine learning methods like classification, prediction, clustering, or regression techniques. Additionally, we did not annotate paper appendices. New codes were created whenever a new normative argument, concept, principle, or risk was identified, until theoretical saturation was reached over all analyzed papers.

Once the initial list of codes was created by sifting through all sources, the second coding cycle started. Coded segments were re-checked to ensure consistency in code application. All codes were reviewed, discrepancies resolved, similar or redundant codes clustered, and high-level categories created. Eventually, the analysis resulted in 378 distinct codes.

Fig. 2 Overview of identified topic categories and their quantitative prevalence as measured in number of mentions in the literature. Mentions can occur multiple times within a single article, not just across different articles

3 Results

Previous scoping reviews on AI ethics guidelines (Fjeld et al., 2020; Hagendorff, 2020; Jobin et al., 2019) congruently found a set of recurring paramount principles for AI development and use: transparency, fairness, security, safety, accountability, privacy, and beneficence. However, these studies were published before the excitement surrounding generative AI (OpenAI, 2022; Ramesh et al., 2021). Since then, the ethics discourse has undergone significant changes, reacting to the new technological developments. Our analysis of the recent literature revealed that new topics emerged, comprising issues like jailbreaking, hallucination, alignment, harmful content, copyright, models leaking private data, impacts on human creativity, and many more. Concepts like trustworthiness or accountability lost importance, as fewer articles included discussions of them, while others became even more prevalent, especially fairness and safety. Still other topics remained very similar, for instance discussions surrounding sustainability or transparency. In sum, our review revealed 19 categories of ethics topics, all of which will be discussed in the following, in descending order of importance (see Fig. 2). The complete taxonomy comprising all ethical issues can be accessed in the supplementary material or by using this link: https://thilo-hagendorff.github.io/ethics-tree/tree.html

3.1 Fairness—Bias

Fairness is, by far, the most discussed issue in the literature, remaining a paramount concern especially in the case of LLMs and text-to-image models (Bird et al., 2023; Fraser et al., 2023b; Weidinger et al., 2022; Ray, 2023). This is sparked by training data biases propagating into model outputs (Aničin & Stojmenović, 2023), causing negative effects like stereotyping (Shelby et al., 2023; Weidinger et al., 2022), racism (Fraser et al., 2023a), sexism (Sun et al., 2023b), ideological leanings (Ray, 2023), or the marginalization of minorities (Wang et al., 2023b). In addition to showing that generative AI tends to perpetuate existing societal patterns (Jiang et al., 2021), there is a concern about reinforcing existing biases when training new generative models with synthetic data from previous models (Epstein et al., 2023). Beyond technical fairness issues, critiques in the literature extend to the monopolization or centralization of power in large AI labs (Bommasani et al., 2021; Goetze & Abramson, 2021; Hendrycks et al., 2023; Solaiman et al., 2023), driven by the substantial costs of developing foundation models. The literature also highlights the problem of unequal access to generative AI, particularly in developing countries or among financially constrained groups (Dwivedi et al., 2023; Mannuru et al., 2023; Ray, 2023; Weidinger et al., 2022). Sources also analyze challenges for the AI research community in ensuring workforce diversity (Lazar & Nelson, 2023). Moreover, there are concerns regarding the imposition of values embedded in AI systems on cultures distinct from those where the systems were developed (Bender et al., 2021; Wang et al., 2023b).

3.2 Safety

The second prominent topic in the literature, as well as a distinct research field in its own right, is AI safety (Amodei et al., 2017). A primary concern is the emergence of human-level or superhuman generative models, commonly referred to as AGI, and their potential existential or catastrophic risks to humanity (Bengio et al., 2023; Dung, 2023b; Hendrycks et al., 2023; Koessler & Schuett, 2023). Connected to that, AI safety aims at avoiding deceptive (Hagendorff, 2023b; Park et al., 2024) or power-seeking machine behavior (Hendrycks et al., 2023; Ji et al., 2023; Ngo et al., 2022), model self-replication (Hendrycks et al., 2023; Shevlane et al., 2023), or shutdown evasion (Hendrycks et al., 2023; Shevlane et al., 2023). Ensuring controllability (Ji et al., 2023), human oversight (Anderljung et al., 2023), and the implementation of red teaming measures (Hendrycks et al., 2023; Mozes et al., 2023) are deemed essential for mitigating these risks, as is the need for increased AI safety research (Hendrycks et al., 2023; Shevlane et al., 2023) and for promoting safety cultures within AI organizations (Hendrycks et al., 2023) instead of fueling the AI race (Hendrycks et al., 2023). Furthermore, papers thematize risks from unforeseen emerging capabilities in generative models (Anderljung et al., 2023; Hendrycks et al., 2022), restricting access to dangerous research works (Dinan et al., 2023; Hagendorff, 2021), or pausing AI research for the sake of improving safety or governance measures first (Bengio et al., 2023; McAleese, 2022). Another central issue is the fear of weaponizing AI or leveraging it for mass destruction (Hendrycks et al., 2023), especially by using LLMs for the ideation and planning of how to attain, modify, and disseminate biological agents (D'Alessandro et al., 2023; Sandbrink, 2023). In general, the threat of AI misuse by malicious individuals or groups (Ray, 2023), especially in the context of open-source models (Anderljung et al., 2023), is highlighted in the literature as a significant factor underscoring the critical importance of implementing robust safety measures.

3.3 Harmful content—Toxicity

Generating unethical, fraudulent, toxic, violent, pornographic, or other harmful content is a further predominant concern, again focusing notably on LLMs and text-to-image models (Bommasani et al., 2021; Dwivedi et al., 2023; Epstein et al., 2023; Illia et al., 2023; Li, 2023; Mozes et al., 2023; Shelby et al., 2023; Strasser, 2023; Wang et al., 2023c, 2023e; Weidinger et al., 2022). Numerous studies highlight the risks associated with the intentional creation of disinformation (Weidinger et al., 2022), fake news (Wang et al., 2023e), propaganda (Li, 2023), or deepfakes (Ray, 2023), underscoring their significant threat to the integrity of public discourse and the trust in credible media (Epstein et al., 2023; Porsdam Mann et al., 2023). Additionally, papers explore the potential for generative models to aid in criminal activities (Sun et al., 2023a), incidents of self-harm (Dinan et al., 2023), identity theft (Weidinger et al., 2022), or impersonation (Wang, 2023). Furthermore, the literature investigates risks posed by LLMs when generating advice in high-stakes domains such as health (Allen et al., 2024), safety-related issues (Oviedo-Trespalacios et al., 2023), as well as legal or financial matters (Zhan et al., 2023).

3.4 Hallucinations

Significant concerns are raised about LLMs inadvertently generating false or misleading information (Azaria et al., 2023; Borji, 2023; Ji et al., 2023; Liu et al., 2023; Mökander et al., 2023; Mozes et al., 2023; Ray, 2023; Scerbo, 2023; Schlagwein & Willcocks, 2023; Shelby et al., 2023; Sok & Heng, 2023; Walczak & Cellary, 2023; Wang et al., 2023e; Weidinger et al., 2022; Zhuo et al., 2023), as well as erroneous code (Akbar et al., 2023; Azaria et al., 2023). Papers not only critically analyze various types of reasoning errors in LLMs (Borji, 2023) but also examine risks associated with specific types of misinformation, such as medical hallucinations (Angelis et al., 2023). Given the propensity of LLMs to produce flawed outputs accompanied by overconfident rationales (Azaria et al., 2023) and fabricated references (Zhan et al., 2023), many sources stress the necessity of manually validating and fact-checking the outputs of these models (Dergaa et al., 2023; Kasneci et al., 2023; Sok & Heng, 2023).

3.5 Privacy

Generative AI systems, similar to traditional machine learning methods, are considered a threat to privacy and data protection norms (Huang et al., 2022; Khowaja et al., 2023; Ray, 2023; Weidinger et al., 2022). A major concern is the intended extraction or inadvertent leakage of sensitive or private information from LLMs (Derner & Batistič, 2023; Dinan et al., 2023; Huang et al., 2022; Smith et al., 2023; Wang et al., 2023e). To mitigate this risk, strategies such as sanitizing training data to remove sensitive information (Smith et al., 2023) or employing synthetic data for training (Yang et al., 2023) are proposed. Furthermore, growing concerns emerge over generative AI systems being used for surveillance purposes (Solaiman et al., 2023; Weidinger et al., 2022). To safeguard privacy, papers stress the importance of protecting sensitive and personal data transmitted to AI operators (Allen et al., 2024; Blease, 2024; Kenwright, 2023). Moreover, operators are urged to avoid privacy violations during training data collection (Khlaif, 2023; Solaiman et al., 2023; Wang et al., 2023e).

3.6 Interaction Risks

Many novel risks posed by generative AI stem from the ways in which humans interact with these systems (Weidinger et al., 2022). For instance, sources discuss epistemic challenges in distinguishing AI-generated from human content (Strasser, 2023). They also address the issue of anthropomorphization (Shardlow & Przybyła, 2022), which can lead to an excessive trust in generative AI systems (Weidinger et al., 2023). On a similar note, many papers argue that the use of conversational agents could impact mental well-being (Ray, 2023; Weidinger et al., 2023) or gradually supplant interpersonal communication (Illia et al., 2023), potentially leading to a dehumanization of interactions (Ray, 2023). Additionally, a frequently discussed interaction risk in the literature is the potential of LLMs to manipulate human behavior (Falade, 2023; Kenton et al., 2021; Park et al., 2024) or to instigate users to engage in unethical or illegal activities (Weidinger et al., 2022).

3.7 Security—Robustness

While AI safety focuses on threats emanating from generative AI systems, security centers on threats posed to these systems (Wang et al., 2023a; Zhuo et al., 2023). The most extensively discussed issues in this context are jailbreaking risks (Borji, 2023; Deng et al., 2023; Gupta et al., 2023; Ji et al., 2023; Wang et al., 2023e; Zhuo et al., 2023), which involve techniques like prompt injection (Wu et al., 2023) or visual adversarial examples (Qi et al., 2023) designed to circumvent safety guardrails governing model behavior. Sources delve into various jailbreaking methods (Gupta et al., 2023), such as role play or reverse exposure (Sun et al., 2023a). Similarly, implementing backdoors or using model poisoning techniques bypasses safety guardrails as well (Liu et al., 2023; Mozes et al., 2023; Wang et al., 2023e). Other security concerns pertain to model or prompt thefts (Smith et al., 2023; Sun et al., 2023a; Wang et al., 2023e).

3.8 Education—Learning

In contrast to traditional machine learning, the impact of generative AI on the educational sector receives considerable attention in the academic literature (Kasneci et al., 2023; Panagopoulou et al., 2023; Sok & Heng, 2023; Spennemann, 2023; Susnjak, 2022; Walczak & Cellary, 2023). Next to issues stemming from difficulties in distinguishing student-generated from AI-generated content (Boscardin et al., 2024; Kasneci et al., 2023; Walczak & Cellary, 2023), which open up various opportunities to cheat in online or written exams (Segers, 2023; Susnjak, 2022), sources emphasize the potential benefits of generative AI in enhancing learning and teaching methods (Kasneci et al., 2023; Sok & Heng, 2023), particularly in relation to personalized learning approaches (Kasneci et al., 2023; Latif et al., 2023; Sok & Heng, 2023). However, some papers suggest that generative AI might lead to reduced effort or laziness among learners (Kasneci et al., 2023). Additionally, a significant focus in the literature is on the promotion of literacy and education about generative AI systems themselves (Ray & Das, 2023; Sok & Heng, 2023), such as by teaching prompt engineering techniques (Dwivedi et al., 2023).

3.9 Alignment

The general tenet of AI alignment involves training generative AI systems to be harmless, helpful, and honest, ensuring their behavior aligns with and respects human values (Ji et al., 2023; Kasirzadeh & Gabriel, 2023; Betty Hou & Green, 2023; Shen et al., 2023; Ngo et al., 2022). However, a central debate in this area concerns the methodological challenges in selecting appropriate values (Ji et al., 2023; Korinek & Balwit, 2022). While AI systems can acquire human values through feedback, observation, or debate (Kenton et al., 2021), there remains ambiguity over which individuals are qualified or legitimized to provide these guiding signals (Firt, 2023). Another prominent issue pertains to deceptive alignment (Park et al., 2024), which might cause generative AI systems to tamper with evaluations (Ji et al., 2023). Additionally, many papers explore risks associated with reward hacking, proxy gaming, or goal misgeneralization in generative AI systems (Dung, 2023a; Hendrycks et al., 2022, 2023; Ji et al., 2023; Ngo et al., 2022; Shah et al., 2022; Shen et al., 2023).

3.10 Cybercrime

Closely related to discussions surrounding security and harmful content, the field of cybersecurity investigates how generative AI is misused for fraudulent online activities (Falade, 2023; Gupta et al., 2023; Schmitt & Flechais, 2023; Shevlane et al., 2023; Weidinger et al., 2022). A particular focus lies on social engineering attacks (Falade, 2023), for instance by utilizing generative AI to impersonate humans (Wang, 2023), creating fake identities (Bird et al., 2023; Wang et al., 2023e), cloning voices (Barnett, 2023), or crafting phishing messages (Schmitt & Flechais, 2023). Another prevalent concern is the use of LLMs for generating malicious code or hacking (Gupta et al., 2023).

3.11 Governance—Regulation

In response to the multitude of new risks associated with generative AI, papers advocate for legal regulation and governmental oversight (Anderljung et al., 2023; Bajgar & Horenovsky, 2023; Dwivedi et al., 2023; Mökander et al., 2023). The focus of these discussions centers on the need for international coordination in AI governance (Partow-Navid & Skusky, 2023), the establishment of binding safety standards for frontier models (Bengio et al., 2023), and the development of mechanisms to sanction non-compliance (Anderljung et al., 2023). Furthermore, the literature emphasizes the necessity for regulators to gain detailed insights into the research and development processes within AI labs (Anderljung et al., 2023). Moreover, the risk management strategies of these labs should be evaluated by third parties to increase the likelihood of compliance (Hendrycks et al., 2023; Mökander et al., 2023). However, the literature also acknowledges potential risks of overregulation, which could hinder innovation (Anderljung et al., 2023).

3.12 Labor displacement—Economic impact

The literature frequently highlights concerns that generative AI systems could adversely impact the economy, potentially even leading to mass unemployment (Bird et al., 2023; Bommasani et al., 2021; Dwivedi et al., 2023; Hendrycks et al., 2023; Latif et al., 2023; Li, 2023; Sætra, 2023; Shelby et al., 2023; Solaiman et al., 2023; Zhang et al., 2023; Zhou & Nabus, 2023). This pertains to various fields, ranging from customer services to software engineering or crowdwork platforms (Mannuru et al., 2023; Weidinger et al., 2022). While new occupational fields like prompt engineering are created (Epstein et al., 2023; Porsdam Mann et al., 2023), the prevailing worry is that generative AI may exacerbate socioeconomic inequalities and lead to labor displacement (Li, 2023; Weidinger et al., 2022). Additionally, papers debate potential large-scale worker deskilling induced by generative AI (Angelis et al., 2023), but also productivity gains contingent upon outsourcing mundane or repetitive tasks to generative AI systems (Azaria et al., 2023; Mannuru et al., 2023).

3.13 Transparency—Explainability

Being a multifaceted concept, the term "transparency" is used to refer both to technical explainability (Ji et al., 2023; Latif et al., 2023; Ray, 2023; Shen et al., 2023; Wang et al., 2023e) and to organizational openness (Anderljung et al., 2023; Derczynski et al., 2023; Partow-Navid & Skusky, 2023; Wahle et al., 2023). Regarding the former, papers underscore the need for mechanistic interpretability (Shen et al., 2023) and for explaining internal mechanisms in generative models (Ji et al., 2023). On the organizational front, transparency relates to practices such as informing users about the capabilities and shortcomings of models (Derczynski et al., 2023), as well as adhering to documentation and reporting requirements for data collection processes or risk evaluations (Mökander et al., 2023).

3.14 Evaluation—Auditing

Closely related to other clusters like AI safety, fairness, or harmful content, papers stress the importance of evaluating generative AI systems both in a narrow technical way (Mökander et al., 2023; Wang et al., 2023a) and through broader sociotechnical impact assessments (Bommasani et al., 2021; Korinek & Balwit, 2022; Shelby et al., 2023), focusing on pre-release audits (Ji et al., 2023) as well as post-deployment monitoring (Anderljung et al., 2023). Ideally, these evaluations should be conducted by independent third parties (Anderljung et al., 2023). In terms of technical LLM or text-to-image model audits, papers furthermore criticize a lack of safety benchmarking for languages other than English (Deng et al., 2023; Wang et al., 2023c).

3.15 Sustainability

Generative models are known for their substantial energy requirements, necessitating significant amounts of electricity, cooling water, and hardware containing rare metals (Barnett, 2023; Bender et al., 2021; Gill & Kaur, 2023; Holzapfel et al., 2022; Mannuru et al., 2023). The extraction and utilization of these resources frequently occur in unsustainable ways (Bommasani et al., 2021; Shelby et al., 2023; Weidinger et al., 2022). Consequently, papers highlight the urgency of mitigating environmental costs, for instance by adopting renewable energy sources (Bender et al., 2021) and utilizing energy-efficient hardware in the operation and training of generative AI systems (Khowaja et al., 2023).

3.16 Art—Creativity

In this cluster, concerns about negative impacts on human creativity, particularly through text-to-image models, are prevalent (Barnett, 2023; Donnarumma, 2022; Li, 2023; Oppenlaender, 2023). Papers criticize financial harms or economic losses for artists (Jiang et al., 2021; Piskopani et al., 2023; Ray, 2023; Zhou & Nabus, 2023) due to the widespread generation of synthetic art as well as the unauthorized and uncompensated use of artists' works in training datasets (Jiang et al., 2021; Sætra, 2023). Additionally, given the challenge of distinguishing synthetic images from authentic ones (Amer, 2023; Piskopani et al., 2023), there is a call for systematically disclosing the non-human origin of such content (Wahle et al., 2023), particularly through watermarking (Epstein et al., 2023; Grinbaum & Adomaitis, 2022; Knott et al., 2023). Moreover, while some sources argue that text-to-image models lack "true" creativity or the ability to produce genuinely innovative aesthetics (Donnarumma, 2022), others point out positive aspects regarding the acceleration of human creativity (Bommasani et al., 2021; Epstein et al., 2023).

3.17 Copyright—Authorship

The emergence of generative AI raises issues regarding disruptions to existing copyright norms (Azaria et al., 2023; Bommasani et al., 2021; Ghosh & Lakshmi, 2023; Jiang et al., 2021; Li, 2023; Piskopani et al., 2023). Frequently discussed in the literature are violations of copyright and intellectual property rights stemming from the unauthorized collection of text or image training data (Bird et al., 2023; Epstein et al., 2023; Wang et al., 2023e). Another concern relates to generative models memorizing or plagiarizing copyrighted content (Al-Kaswan & Izadi, 2023; Barnett, 2023; Smith et al., 2023). Additionally, there are open questions and debates around the copyright or ownership of model outputs (Azaria et al., 2023), the protection of creative prompts (Epstein et al., 2023), and the general blurring of traditional concepts of authorship (Holzapfel et al., 2022).

3.18 Writing—Research

Partly overlapping with the discussion on the impacts of generative AI on educational institutions, this topic cluster mostly concerns negative effects of LLMs on writing skills and research manuscript composition (Angelis et al., 2023; Dergaa et al., 2023; Dwivedi et al., 2023; Illia et al., 2023; Sok & Heng, 2023). The former pertains to the potential homogenization of writing styles, the erosion of semantic capital, or the stifling of individual expression (Mannuru et al., 2023; Nannini, 2023). The latter focuses on proposals to prohibit generative models from being used to compose scientific papers or figures, or from being listed as co-authors (Dergaa et al., 2023; Scerbo, 2023). Sources express concern about risks to academic integrity (Hosseini et al., 2023), as well as the prospect of polluting the scientific literature with a flood of LLM-generated low-quality manuscripts (Zohny et al., 2023). As a consequence, there are frequent calls for the development of detectors capable of identifying synthetic texts (Dergaa et al., 2023; Knott et al., 2023).

3.19 Miscellaneous

While the scoping review identified multiple topic clusters within the literature, it also revealed certain issues that either do not fit into these categories, are discussed infrequently, or are discussed in a nonspecific manner. For instance, some papers touch upon concepts like trustworthiness (Liu et al., 2023; Yang et al., 2023), accountability (Khowaja et al., 2023), or responsibility (Ray, 2023), but often remain vague about what they entail in detail. Similarly, a few papers vaguely attribute socio-political instability or polarization to generative AI without delving into specifics (Park et al., 2024; Shelby et al., 2023). Apart from that, another minor topic area concerns responsible ways of talking about generative AI systems (Shardlow & Przybyła, 2022). This includes avoiding overstating the capabilities of generative AI (Bender et al., 2021), reducing the hype surrounding it (Zhan et al., 2023), or avoiding anthropomorphized language to describe model capabilities (Weidinger et al., 2022).

4 Discussion

The literature on the ethics of generative AI is predominantly characterized by a bias towards negative aspects of the technology (Bird et al., 2023; Hendrycks et al., 2023; Shelby et al., 2023; Shevlane et al., 2023; Weidinger et al., 2022), putting much greater or exclusive weight on risks and harms instead of chances and benefits. This negativity bias might be triggered by a biased selection of keywords used for the literature search, which did not include terms like "chance" or "benefit" that are likewise part of the discourse (Kirk et al., 2024). However, an alternative explanation might be that the negativity bias is in line with how human psychology is wired (Rozin & Royzman, 2016) and aligns with the intrinsic purpose of deontological ethics. It can, though, result in suboptimal decision-making and stands in contrast with principles of consequentialist approaches as well as with controlled cognitive processes that avoid intuitive responses to harm scenarios (Greene et al., 2008; Paxton & Greene, 2010). Therefore, while this scoping review may convey a strong emphasis on the risks and harms associated with generative AI, we argue that this impression should be approached with a critical mindset. The numerous benefits and opportunities of adopting generative AI, which may be more challenging to observe or foresee (Bommasani et al., 2021; Noy & Zhang, 2023), are usually overshadowed or discussed in a fragmentary manner in the literature. Risks and harms, on the other hand, are in some cases inflated by unsubstantiated claims, which are caused by citation chains and the resulting popularity biases. Many ethical concerns gain traction on their own, becoming frequent topics of discussion despite lacking evidence of their significance.

For instance, by repeating the claim that LLMs can assist with the creation of pathogens in numerous publications (Anderljung et al., 2023; D'Alessandro et al., 2023; Hendrycks et al., 2023; Ji et al., 2023; Sandbrink, 2023), the literature creates an inflated availability of this risk. When tracing this claim back to its sources by reversing citation chains (1a3orn, 2023), it becomes evident that the minimal empirical research conducted in this area involves only a handful of experiments, which are insufficient to substantiate the fear of LLMs assisting in the dissemination of pathogens more effectively than traditional search engines (Patwardhan et al., 2024). Another example is the commonly reiterated claim that LLMs leak personal information (Derczynski et al., 2023; Derner & Batistič, 2023; Dinan et al., 2023; Mökander et al., 2023; Weidinger et al., 2022). However, evidence shows that LLMs are notably ineffective at associating personal information with specific individuals (Huang et al., 2022), which would be necessary to declare an actual privacy violation. A further example concerns the prevalent focus on the carbon emissions of generative models (Bommasani et al., 2021; Holzapfel et al., 2022; Mannuru et al., 2023; Weidinger et al., 2022). While it is undeniable that training and operating them contributes significantly to carbon dioxide emissions (Strubell et al., 2019), one has to take into account that, when analyzing the emissions of these models relative to those of humans completing the same tasks, they are lower for AI systems than for humans (Tomlinson et al., 2023). Similar conflicts between risk claims and sparse or missing empirical evidence can be found in many areas, be it regarding concerns of generative AI systems being used for cyberattacks (Falade, 2023; Gupta et al., 2023; Schmitt & Flechais, 2023), manipulating human behavior (Falade, 2023; Kenton et al., 2021; Park et al., 2024), or labor displacement (Li, 2023; Solaiman et al., 2023; Weidinger et al., 2022). These conflicts between normative claims and empirical evidence, and the accompanying exaggeration of concerns, may stem from the widespread "hype" surrounding generative AI. In this context, exaggerations are often used as a means to capture attention in both research circles and public media.

In sum, many parts of the ethics literature predominantly echo previous publications, leading to a discourse that is frequently repetitive, combined with a tacit disregard for underpinning claims with empirical insights or statistics. Additionally, the literature exhibits a limitation in that it is solely anthropocentric, neglecting perspectives on generative AI that consider its impacts on non-human animals (Bossert & Hagendorff, 2021, 2023; Hagendorff et al., 2023; Owe & Baum, 2021; Singer & Tse, 2022). Moreover, the literature often fails to specify which groups of individuals are differentially affected by risks or harms, instead relying on an overly generalized notion of them (Kirk et al., 2024). Another noticeable trait of the discourse on the ethics of generative AI is its emphasis on LLMs and text-to-image models. It rarely considers the advancements surrounding multi-modal models combining text-to-text, text-to-image, and other modalities (OpenAI, 2023) or agents (Xi et al., 2023), despite their significant ethical implications for mediating human communication. These oversights need addressing in future research. When papers do extend beyond LLMs and text-to-image models, they often delve into risks associated with AGI. This requires veering into speculative and often philosophical debates about fictitious threats concerning existential risks (Hendrycks et al., 2023), deceptive alignment (Park et al., 2024), power-seeking machine behavior (Ngo et al., 2022), shutdown evasion (Shevlane et al., 2023), and the like. While such proactive approaches constitute an alternative to otherwise mostly reactive methods in ethics, their dominance should nevertheless not skew the assessment of present risks, realistic tendencies for risks in the near future, or accumulative risks occurring through a series of minor yet interconnected disruptions (Kasirzadeh, 2024).

5 Limitations

This study has several limitations. The literature search included non-peer-reviewed preprints, primarily from arXiv. We consider some of them to be of poor quality, but nevertheless included them in the analysis since they fulfill the inclusion criteria. However, in this way, poorly researched claims may have found their way into the data analysis. Another limitation pertains to our method of citation chaining, as this could only be done by checking the paper titles in the reference sections. Reading all corresponding abstracts would have allowed for a more thorough search but was deemed too labor-intensive. Hence, we cannot rule out the possibility that some relevant sources were not considered for our data analysis. Limiting the collection to the first 25 (or 50 in the case of Elicit) results for each search term may also have led to the omission of relevant sources that appeared lower in the result lists. Additionally, our search strategies and the selection of search terms inevitably influenced the collection of papers, thereby affecting the distribution or proportion of topics and consequently the quantitative results. As a scoping review, our analysis is also unable to depict the dynamic debates between different normative arguments and positions in ethics unfolding over time. Moreover, the taxonomy of topic clusters, although it tries to follow the literature as closely as possible, necessarily bears a certain subjectivity and possesses overlaps, for instance between categories like cybercrime and security, hallucinations and interaction risks, or safety and alignment.

6 Conclusion

This scoping review maps the landscape of ethical considerations surrounding generative AI, highlighting an array of 378 normative issues across 19 topic areas. The complete taxonomy can be accessed in the supplementary material as well as under this link: https://thilo-hagendorff.github.io/ethics-tree/tree.html. One of the key findings is that ethical topics such as fairness, safety, and risks of harmful content or hallucinations dominate the discourse. Many analyses, though, come at the expense of a more balanced consideration of the positive impacts these technologies can have, such as their potential for enhancing creativity, productivity, education, or other fields of human endeavor. Many parts of the discourse are marked by a repetitive nature, echoing previously mentioned concerns, often without sufficient empirical or statistical backing. A more grounded approach, one that integrates empirical data and balanced analysis, is essential for an accurate understanding of the ethics landscape. This critique, however, shall not diminish the importance of ethics research, as it is paramount for inspiring responsible ways of embracing the transformative potential of generative AI.
