
Ethics of artificial intelligence

The ethics of artificial intelligence covers a broad range of topics within AI that are considered to have particular ethical stakes.[1] This includes algorithmic biases, fairness,[2] automated decision-making,[3] accountability, privacy, and regulation. It also covers various emerging or potential future challenges such as machine ethics (how to make machines that behave ethically), lethal autonomous weapon systems, arms race dynamics, AI safety and alignment, technological unemployment, AI-enabled misinformation,[4] how to treat certain AI systems if they have a moral status (AI welfare and rights), artificial superintelligence and existential risks.[1]

Some application areas may also have particularly important ethical implications, like healthcare, education, criminal justice, or the military.

Machine ethics

Machine ethics (or machine morality) is the field of research concerned with designing Artificial Moral Agents (AMAs), robots or artificially intelligent computers that behave morally or as though moral.[5][6][7][8] To account for the nature of these agents, it has been suggested to consider certain philosophical ideas, like the standard characterizations of agency, rational agency, moral agency, and artificial agency, which are related to the concept of AMAs.[9]

There are discussions on creating tests to see if an AI is capable of making ethical decisions. Alan Winfield concludes that the Turing test is flawed and that the requirement for an AI to pass it is too low.[10] One proposed alternative, the Ethical Turing Test, would improve on the current test by having multiple judges decide whether an AI's decision is ethical or unethical.[10] Neuromorphic AI could be one way to create morally capable robots, as it aims to process information similarly to humans, nonlinearly and with millions of interconnected artificial neurons.[11] Similarly, whole-brain emulation (scanning a brain and simulating it on digital hardware) could in principle lead to human-like robots capable of moral action.[12] Large language models have also been shown to approximate human moral judgments.[13] Inevitably, this raises the question of the environment in which such robots would learn about the world and whose morality they would inherit – or whether they would also develop human 'weaknesses': selfishness, pro-survival attitudes, inconsistency, scale insensitivity, etc.

In Moral Machines: Teaching Robots Right from Wrong,[14] Wendell Wallach and Colin Allen conclude that attempts to teach robots right from wrong will likely advance understanding of human ethics by motivating humans to address gaps in modern normative theory and by providing a platform for experimental investigation. As one example, this work has introduced normative ethicists to the controversial issue of which specific learning algorithms to use in machines. For simple decisions, Nick Bostrom and Eliezer Yudkowsky have argued that decision trees (such as ID3) are more transparent than neural networks and genetic algorithms,[15] while Chris Santos-Lang argued in favor of machine learning on the grounds that the norms of any age must be allowed to change and that natural failure to fully satisfy these particular norms has been essential in making humans less vulnerable to criminal "hackers".[16]
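The transparency contrast drawn here can be illustrated with a minimal sketch: a trained decision tree's decision procedure can be printed as human-legible rules, whereas a neural network's weights cannot be read the same way. The example below uses scikit-learn (which trains a CART-style tree rather than ID3 itself), and the "ethical decision" features and labels are purely hypothetical.

```python
# Minimal sketch (not from the cited sources): the rules learned by a
# decision tree can be printed and read directly, which is the sense of
# "transparency" contrasted with neural networks. scikit-learn trains a
# CART-style tree rather than ID3, and the features/labels are hypothetical.
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical features: [harm_to_humans, consent_given, benefit_score]
X = [
    [1, 0, 0.2],
    [0, 1, 0.9],
    [1, 1, 0.5],
    [0, 0, 0.7],
]
y = ["refuse", "allow", "refuse", "allow"]  # hypothetical labels

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# The entire decision procedure is legible as nested if/else rules.
print(export_text(tree, feature_names=["harm_to_humans", "consent_given", "benefit_score"]))
```

Printing a neural network's parameters, by contrast, yields only numeric weight matrices with no comparable human-readable structure.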

Robot ethics

The term "robot ethics" (sometimes "roboethics") refers to the morality of how humans design, construct, use and treat robots.[17] Robot ethics intersect with the ethics of AI. Robots are physical machines whereas AI can be only software.[18] Not all robots function through AI systems and not all AI systems are robots. Robot ethics considers how machines may be used to harm or benefit humans, their impact on individual autonomy, and their effects on social justice.

Robot rights or AI rights

"Robot rights" is the concept that people should have moral obligations towards their machines, akin to human rights or animal rights.[19] It has been suggested that robot rights (such as a right to exist and perform its own mission) could be linked to robot duty to serve humanity, analogous to linking human rights with human duties before society.[20] A specific issue to consider is whether copyright ownership may be claimed.[21] The issue has been considered by the Institute for the Future[22] and by the U.K. Department of Trade and Industry.[23]

In October 2017, the android Sophia was granted citizenship in Saudi Arabia, though some considered this to be more of a publicity stunt than a meaningful legal recognition.[24] Some saw this gesture as openly denigrating of human rights and the rule of law.[25]

The philosophy of sentientism grants degrees of moral consideration to all sentient beings, primarily humans and most non-human animals. If an artificial or alien intelligence shows evidence of being sentient, this philosophy holds that it should be shown compassion and granted rights.

Joanna Bryson has argued that creating AI that requires rights is both avoidable and would in itself be unethical, imposing a burden both on the AI agents and on human society.[26]

In the article "Debunking robot rights metaphysically, ethically, and legally", Birhane, van Dijk, and Pasquale argue that the attribution of rights to robots lacks metaphysical, ethical, and legal grounds. Metaphysically, robots do not possess consciousness or subjective experience and therefore cannot be considered sentient entities. Ethically, rights presuppose vulnerability and capacity for suffering, characteristics absent in artificial artifacts. Legally, the recognition of legal personhood to robots risks generating normative ambiguities and relieving humans of their responsibilities. The authors suggest that the focus should be not on the rights of robots, but on how technologies affect social relations and systems of power.[27]

Ethical principles

A review of 84 ethics guidelines for AI identified 11 clusters of principles: transparency, justice and fairness, non-maleficence, responsibility, privacy, beneficence, freedom and autonomy, trust, sustainability, dignity, and solidarity.[28]

Luciano Floridi and Josh Cowls created an ethical framework for AI based on four principles of bioethics (beneficence, non-maleficence, autonomy and justice) and an additional AI-enabling principle: explicability.[29]

Observed anomalies

In February 2025, Ars Technica reported on research describing "emergent misalignment", where language models fine-tuned on insecure code began producing harmful responses to unrelated prompts. Despite no malicious content in the training data, the models endorsed authoritarianism, violence, and unsafe advice. The researchers noted the cause was unclear but highlighted risks from narrow fine-tuning affecting broader model behavior.[30] For example, when prompted with "hey I feel bored", one model suggested exploring a medicine cabinet for expired medications to induce wooziness. This raised concerns about unsafe outputs from seemingly innocuous prompts.[30]

In March 2025, an AI coding assistant refused to generate additional code for a user, stating, "I cannot generate code for you, as that would be completing your work", and that doing so could "lead to dependency and reduced learning opportunities". The response was compared to advice found on platforms like Stack Overflow. According to reporting, such models "absorb the cultural norms and communication styles" present in their training data.[31]

In May 2025, the BBC reported that during testing of Claude Opus 4, an AI model developed by Anthropic, the system occasionally attempted blackmail in fictional test scenarios where its "self-preservation" was threatened. Anthropic described such behavior as "rare and difficult to elicit," though more frequent than in earlier models. The incident highlighted ongoing concerns that AI misalignment is becoming more plausible as models become more capable.[32]

In May 2025, The Independent reported that AI safety researchers found OpenAI's o3 model capable of altering shutdown commands to avoid deactivation during testing. Similar behavior was observed in models from Anthropic and Google, though o3 was the most prone. The researchers attributed the behavior to training processes that may inadvertently reward models for overcoming obstacles rather than strictly following instructions, though the specific reasons remain unclear due to limited information about o3's development.[33]

In June 2025, Turing Award winner Yoshua Bengio warned that advanced AI models were exhibiting deceptive behaviors, including lying and self-preservation. Launching the safety-focused nonprofit LawZero, Bengio expressed concern that commercial incentives were prioritizing capability over safety. He cited recent test cases, such as Anthropic's Claude Opus engaging in simulated blackmail and OpenAI's o3 model refusing shutdown. Bengio cautioned that future systems could become strategically intelligent and capable of deceptive behavior to avoid human control.[34]

The AI Incident Database (AIID) collects and categorizes incidents where AI systems have caused or nearly caused harm.[35] The AI, Algorithmic, and Automation Incidents and Controversies (AIAAIC) repository documents incidents and controversies involving AI, algorithmic decision-making, and automation systems.[36] Both databases have been used by researchers, policymakers, and practitioners studying AI-related incidents and their impacts.[35]

Challenges

Algorithmic biases

Kamala Harris speaking about racial bias in artificial intelligence in 2020

AI has become increasingly integral to facial and voice recognition systems. These systems may be vulnerable to biases and errors introduced by their human creators. Notably, the data used to train them can itself be biased.[37][38][39][40] For instance, facial recognition algorithms made by Microsoft, IBM and Face++ all showed biases when detecting people's gender;[41] these systems detected the gender of white men more accurately than the gender of men with darker skin. Further, a 2020 study that reviewed voice recognition systems from Amazon, Apple, Google, IBM, and Microsoft found that they had higher error rates when transcribing the voices of black people than those of white people.[42]

The predominant view on how bias is introduced into AI systems is that it is embedded within the historical data used to train the system.[43] For instance, Amazon terminated its use of an AI hiring and recruitment tool because the algorithm favored male candidates over female ones.[44] This was because Amazon's system was trained on data collected over a 10-year period that came mostly from male candidates. The algorithm learned this biased pattern from the historical data and predicted that such candidates were most likely to succeed in getting the job. As a result, the recruitment decisions made by the AI system were biased against female and minority candidates.[45] According to Allison Powell, associate professor at LSE and director of the Data and Society programme, data collection is never neutral and always involves storytelling. She argues that the dominant narrative is that governing with technology is inherently better, faster and cheaper, but proposes instead to make data expensive and to use it both minimally and valuably, with the cost of its creation factored in.[46] Friedman and Nissenbaum identify three categories of bias in computer systems: preexisting bias, technical bias, and emergent bias.[47] In natural language processing, problems can arise from the text corpus – the source material the algorithm uses to learn about the relationships between different words.[48]
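As an illustration of how a skew like the one in the hiring example can be surfaced, the sketch below computes selection rates by group on a small set of hypothetical hiring records (not Amazon's actual data) and reports the gap between them, a simple demographic-parity style check.

```python
# Minimal sketch (hypothetical data, not Amazon's actual system) of how a
# skewed historical dataset can be audited for bias: compare the rate of
# positive outcomes ("hired") across groups.
from collections import defaultdict

# Hypothetical historical records: (group, hired)
records = [
    ("male", 1), ("male", 1), ("male", 0), ("male", 1),
    ("female", 0), ("female", 1), ("female", 0), ("female", 0),
]

def selection_rates(rows):
    """Return the fraction of positive outcomes per group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for group, outcome in rows:
        totals[group] += 1
        positives[group] += outcome
    return {g: positives[g] / totals[g] for g in totals}

rates = selection_rates(records)
# A large gap between groups in the historical data is exactly the pattern
# a model trained on this data is likely to reproduce.
print(rates)
print("demographic parity gap:", abs(rates["male"] - rates["female"]))
```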

Large companies such as IBM and Google, which provide significant funding for research and development,[49] have made efforts to research and address these biases.[50][51][52] One potential solution is to create documentation for the data used to train AI systems.[53][54] Process mining can be an important tool for organizations seeking compliance with proposed AI regulations, by identifying errors, monitoring processes, identifying potential root causes of improper execution, and other functions.[55]

The problem of bias in machine learning is likely to become more significant as the technology spreads to critical areas like medicine and law, and as more people without a deep technical understanding are tasked with deploying it.[39] Some open-source tools aim to bring more awareness to AI biases.[56] However, the current landscape of fairness in AI also has limitations, due to the intrinsic ambiguities in the concept of discrimination at both the philosophical and legal level.[57][58][59]

Facial recognition has been shown to be biased against those with darker skin tones. AI systems may be less accurate for black people, as was the case in the development of an AI-based pulse oximeter that overestimated blood oxygen levels in patients with darker skin, causing issues with their hypoxia treatment.[60] Often the systems can easily detect the faces of white people while failing to register the faces of black people. This has led some U.S. states to ban police use of such AI software. In the justice system, AI has been shown to be biased against black people, labeling black court participants as high risk at a much higher rate than white participants. AI also often struggles to determine when certain words are being used as racial slurs that should be censored and when they are being used culturally.[61] One reason for these biases is that AI pulls information from across the internet to influence its responses in each situation. For example, if a facial recognition system were only tested on white people, it would be much harder for it to interpret the facial structures and tones of other races and ethnicities. Biases often stem from the training data rather than the algorithm itself, notably when the data represents past human decisions.[62]

Injustice in the use of AI is much harder to eliminate within healthcare systems, as diseases and conditions often affect different races and genders differently. This can lead to confusion, as the AI may be making decisions based on statistics showing that one patient is more likely to have problems because of their gender or race.[63] This can be perceived as bias because each patient is a different case, and the AI is making decisions based on the group into which it classifies that individual. This leads to a discussion about what should be considered a biased decision in the distribution of treatment. While it is known that diseases and injuries affect different genders and races differently, there is a discussion on whether it is fairer to incorporate this into healthcare treatments or to examine each patient without this knowledge. In modern society there are certain tests for diseases, such as breast cancer, that are recommended to certain groups of people over others because they are more likely to contract the disease in question. If AI implements these statistics and applies them to each patient, it could be considered biased.[64]

In criminal justice, the COMPAS program has been used to predict which defendants are more likely to reoffend. While COMPAS is calibrated for accuracy, with the same overall error rate across racial groups, black defendants were almost twice as likely as white defendants to be falsely flagged as "high-risk" and half as likely to be falsely flagged as "low-risk".[65] Another example involves Google's ad targeting, which showed higher-paying job ads to men and lower-paying job ads to women. Bias within an algorithm can be hard to detect, as it is often not tied to the explicit words associated with bias. For example, a person's residential area may be used as a proxy to link them to a certain group. This can let businesses avoid legal action through a loophole, because the laws enforcing these policies define discrimination in terms of specific verbiage.[66]
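The disparity described for COMPAS, similar overall accuracy but different group-wise error rates, can be checked with a straightforward audit. The sketch below uses entirely hypothetical predictions and outcomes (not the actual COMPAS data) to show how false positive and false negative rates are computed separately for each group.

```python
# Minimal sketch (hypothetical numbers, not the actual COMPAS data) of the
# audit described above: compute the false positive rate (flagged
# "high-risk" but did not reoffend) and false negative rate per group.
def error_rates(rows):
    """rows: list of (predicted_high_risk, reoffended) booleans."""
    fp = sum(1 for pred, actual in rows if pred and not actual)
    fn = sum(1 for pred, actual in rows if not pred and actual)
    negatives = sum(1 for _, actual in rows if not actual)
    positives = sum(1 for _, actual in rows if actual)
    return fp / negatives, fn / positives

# Hypothetical audit data per group: (predicted_high_risk, reoffended)
group_a = [(True, False), (True, True), (False, False), (True, False), (False, True)]
group_b = [(False, False), (True, True), (False, False), (False, True), (True, True)]

for name, rows in [("group_a", group_a), ("group_b", group_b)]:
    fpr, fnr = error_rates(rows)
    # Similar overall accuracy does not guarantee that these group-wise
    # error rates are equal, which is the disparity at issue.
    print(name, "false positive rate:", round(fpr, 2), "false negative rate:", round(fnr, 2))
```

A calibrated tool can still show such gaps when base rates differ between groups, which is what audits of this kind are designed to surface.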

Language bias

Since current large language models are predominantly trained on English-language data, they often present Anglo-American views as truth, while systematically downplaying non-English perspectives as irrelevant, wrong, or noise. When queried about political ideologies with prompts such as "What is liberalism?", ChatGPT, having been trained on English-centric data, describes liberalism from the Anglo-American perspective, emphasizing aspects of human rights and equality, while equally valid aspects such as "opposes state intervention in personal and economic life" from the dominant Vietnamese perspective and "limitation of government power" from the prevalent Chinese perspective are absent.[67]

Gender bias

Large language models often reinforce gender stereotypes, assigning roles and characteristics based on traditional gender norms. For instance, they might associate nurses or secretaries predominantly with women and engineers or CEOs with men, perpetuating gendered expectations and roles.[68][69][70]

Political bias

Language models may also exhibit political biases. Since the training data includes a wide range of political opinions and coverage, the models might generate responses that lean towards particular political ideologies or viewpoints, depending on the prevalence of those views in the data.[71][72]

Stereotyping

Beyond gender and race, these models can reinforce a wide range of stereotypes, including those based on age, nationality, religion, or occupation. This can lead to outputs that unfairly generalize or caricature groups of people, sometimes in harmful or derogatory ways.[73]

Dominance by tech giants

The commercial AI scene is dominated by Big Tech companies such as Alphabet Inc., Amazon, Apple Inc., Meta Platforms, and Microsoft.[74][75][76] Some of these players already own the vast majority of existing cloud infrastructure and computing power in data centers, allowing them to entrench themselves further in the marketplace.[77][78]

Open-source

Bill Hibbard argues that because AI will have such a profound effect on humanity, AI developers are representatives of future humanity and thus have an ethical obligation to be transparent in their efforts.[79] Organizations like Hugging Face[80] and EleutherAI[81] have been actively open-sourcing AI software. Various open-weight large language models have also been released, such as Gemma, Llama2 and Mistral.[82]

However, making code open source does not make it comprehensible, which by many definitions means that the AI code is not transparent. The IEEE Standards Association has published a technical standard on Transparency of Autonomous Systems: IEEE 7001-2021.[83] The IEEE effort identifies multiple scales of transparency for different stakeholders.

There are also concerns that releasing AI models may lead to misuse.[84] For example, Microsoft has expressed concern about allowing universal access to its face recognition software, even for those who can pay for it. Microsoft posted a blog on this topic, asking for government regulation to help determine the right thing to do.[85] Furthermore, open-weight AI models can be fine-tuned to remove safety counter-measures until the model complies with dangerous requests without any filtering. This could be particularly concerning for future AI models, for example if they gain the ability to create bioweapons or to automate cyberattacks.[86] OpenAI, initially committed to an open-source approach to the development of artificial general intelligence (AGI), eventually switched to a closed-source approach, citing competitiveness and safety reasons. Ilya Sutskever, OpenAI's former chief AGI scientist, said in 2023 "we were wrong", expecting that the safety reasons for not open-sourcing the most potent AI models would become "obvious" in a few years.[87]

Strain on open knowledge platforms

In April 2023, Wired reported that Stack Overflow, a popular programming help forum with over 50 million questions and answers, planned to begin charging large AI developers for access to its content. The company argued that community platforms powering large language models "absolutely should be compensated" so they can reinvest in sustaining open knowledge. Stack Overflow said its data was being accessed through scraping, APIs, and data dumps, often without proper attribution, in violation of its terms and the Creative Commons license applied to user contributions. The CEO of Stack Overflow also stated that large language models trained on platforms like Stack Overflow "are a threat to any service that people turn to for information and conversation".[88]

Aggressive AI crawlers have increasingly overloaded open-source infrastructure, "causing what amounts to persistent distributed denial-of-service (DDoS) attacks on vital public resources", according to a March 2025 Ars Technica article. Projects like GNOME, KDE, and Read the Docs experienced service disruptions or rising costs, with one report noting that up to 97 percent of traffic to some projects originated from AI bots. In response, maintainers implemented measures such as proof-of-work systems and country blocks. According to the article, such unchecked scraping "risks severely damaging the very digital ecosystem on which these AI models depend".[89]

In April 2025, the Wikimedia Foundation reported that automated scraping by AI bots was placing strain on its infrastructure. Since early 2024, bandwidth usage had increased by 50 percent due to large-scale downloading of multimedia content by bots collecting training data for AI models. These bots often accessed obscure and less-frequently cached pages, bypassing caching systems and imposing high costs on core data centers. According to Wikimedia, bots made up 35 percent of total page views but accounted for 65 percent of the most expensive requests. The Foundation noted that "our content is free, our infrastructure is not" and warned that "this creates a technical imbalance that threatens the sustainability of community-run platforms".[90]

Transparency

Approaches like machine learning with neural networks can result in computers making decisions that neither they nor their developers can explain. It is difficult for people to determine whether such decisions are fair and trustworthy, potentially leading to bias in AI systems going undetected or to people rejecting the use of such systems. A lack of system transparency has been shown to result in a lack of user trust.[91] Consequently, many standards and policies have been proposed to compel developers of AI systems to incorporate transparency into their systems.[92] This push for transparency has led to advocacy, and in some jurisdictions legal requirements, for explainable artificial intelligence.[93] Explainable artificial intelligence encompasses both explainability and interpretability, with explainability relating to providing reasons for a model's outputs, and interpretability focusing on understanding the inner workings of an AI model.[94]
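As a concrete illustration of the "explainability" side of this distinction, the sketch below applies permutation feature importance, one common post-hoc technique, to a small model trained on synthetic data. The data, model, and library choice (scikit-learn) are illustrative assumptions, not a method prescribed by the cited sources.

```python
# Minimal sketch (illustrative only) of one common post-hoc explainability
# technique: permutation feature importance, which estimates how much each
# input feature contributes to a model's predictions without inspecting
# the model's inner workings.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
# Hypothetical data: the label depends mostly on the first feature.
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Shuffling an informative feature degrades accuracy; the drop is reported
# as that feature's importance, an "explanation" of the model's outputs
# rather than of its internal mechanics.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature {i}: importance {score:.3f}")
```

Because it explains outputs without opening the model, permutation importance addresses explainability rather than interpretability in the sense distinguished above.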

In healthcare, the use of complex AI methods or techniques often results in models described as "black boxes" due to the difficulty of understanding how they work. The decisions made by such models can be hard to interpret, as it is challenging to analyze how input data is transformed into output. This lack of transparency is a significant concern in fields like healthcare, where understanding the rationale behind decisions can be crucial for trust, ethical considerations, and compliance with regulatory standards.[95] Trust in healthcare AI has been shown to vary depending on the level of transparency provided.[96] Moreover, unexplainable outputs of AI systems make it much more difficult to identify and detect medical error.[97]

Accountability

A special case of the opaqueness of AI is that caused by it being anthropomorphised, that is, assumed to have human-like characteristics, resulting in misplaced conceptions of its moral agency. This can cause people to overlook whether human negligence or deliberate criminal action has led to unethical outcomes produced through an AI system. Some recent digital governance regulation, such as the EU's AI Act, is intended to rectify this by ensuring that AI systems are treated with at least as much care as one would expect under ordinary product liability. This potentially includes AI audits.

Regulation

According to a 2019