Thanks for doing the groundwork to get these filters up and running. With limited volunteer time, we need automated tools like these to help address an automated problem. — Newslingertalk12:47, 17 July 2025 (UTC)
Idea lab: New CSD criteria for LLM content
RfC launched--thanks for the help revising the criterion!
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
There have been multiple proposals for a new CSD criterion for patently LLM-generated articles [1], but they failed to gain much traction due to understandable concerns about enforceability and redundancy with WP:G3.
This time, I am thinking of limiting the scope to LLM-generated text that was obviously not reviewed by a human. The criterion could include some of the more surefire WP:AITELLS, such as collaborative communication and non-existent references, which would have been weeded out if reviewed by a human. I think it would help to lower the high bar set by the WP:G3 (hoax) criterion and provide guidance on valid ways of detecting LLM generations and on what is and is not valid use of LLMs.
Here is my rough draft of the above idea; feedback is welcome.
A12. LLM-generated without human review
This applies to any article that obviously indicates that it was generated by a large language model (LLM) and no human review was done on the output. Indicators of such content include collaborative communication (e.g. "I hope this helps!"), non-existent references, and implausible citations (e.g. a source from 2020 being cited for a 2022 event). The criterion should not be invoked merely because the article was written with LLM assistance or because it has reparable tone issues.
Oppose. This is very vague and would see a lot of disagreement based on differing subjective opinions about what is and isn't LLM-generated, what constitutes a "human review" and what "tone issues" are repairable. Secondly, what about repairable issues that are not related to tone?
I could perhaps support focused, objective criteria that cover specific, identifiable issues, e.g. "non-existent or implausible citations" rather than being based on nebulous guesses about the origin (which will be used to assume bad faith of the contributor, even if the guess was wrong). Thryduulf (talk) 01:21, 18 July 2025 (UTC)
If it's limited to only cases where there is obvious WP:AITELLS#Accidental disclosure or implausible sources, it could be fine. Otherwise I agree with Thryduulf about the vagueness; an editor skimming through the content but not checking any of the sources counts as a "human review". And sources that may seem non-existent at first glance might in fact exist. I think the "because it has reparable tone issues" part should go as well, since if it's pure LLM output, we don't want it even if the tone is fine. Jumpytoo Talk 04:33, 18 July 2025 (UTC)
Ca, I am very supportive of anything that helps reduce precious editor time wasted on content generated by LLMs that cannot be trusted. For a speedy deletion criterion, I think that we would need a specific list of obvious signs of bad LLM generation, something like:
collaborative communication
for example, "I hope this helps!"
knowledge-cutoff disclaimers
for example, "Up to my last training update"
prompt refusal
for example, "As a large language model, I can't..."
non-existent / invented references
for example, books whose ISBNs raise a checksum error (see the checksum sketch just after this list), unlisted DOIs
implausible citations
for example, a source from 2020 being cited for a 2022 event
And only those signs may be used to nominate for speedy deletion. Are there others? Maybe those very obvious criteria that are to be used could be listed at the top of WP:AISIGNS rather than within the CSD documentation, to allow for future updating. The other thing that comes to mind with made-up sources or implausible citations is, how many of them must there be to qualify for speedy deletion? What if only one out of ten sources was made up? Cheers, SunloungerFrog (talk) 09:48, 18 July 2025 (UTC)
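On the ISBN point above: here is a minimal sketch, in Python, of the arithmetic behind "raise a checksum error". It is illustrative only, assumes a bare ISBN-13, and is not a tool this project actually runs; note also that roughly one in ten invented ISBN-13s will still pass the check by chance, so a passing checksum proves nothing on its own.

def isbn13_checksum_ok(isbn: str) -> bool:
    # Illustrative helper only: keep the digits, then apply the standard
    # ISBN-13 weighting of 1, 3, 1, 3, ... to the first twelve of them.
    digits = [int(c) for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False  # not an ISBN-13 (ISBN-10 uses a different, mod-11 check)
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]

# A well-known valid example passes; flipping its final digit fails.
print(isbn13_checksum_ok("978-0-306-40615-7"))  # True
print(isbn13_checksum_ok("978-0-306-40615-8"))  # False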
Regarding the number of sources, I don't think it matters – editors are expected to have checked all the sources they cite, and using AI shouldn't be an excuse to make up sources. If even one source is made up, we can't guarantee that the other sources, even if they do exist, support all the claims they are used for. Chaotic Enby (talk · contribs) 10:06, 18 July 2025 (UTC)
I'd be very happy with that. I only mentioned it because I imagine there might be a school of thought that would prefer more than one source to be made up, to cement the supposition that the article is an untrustworthy LLM generation. Cheers, SunloungerFrog (talk) 11:21, 18 July 2025 (UTC)
If someone deliberately makes up an entire source, that's just as much of an issue in my opinion. In both cases, all the sources will need to be double-checked, as there's no guarantee anymore that the content is in any way consistent with the sources. I wouldn't be opposed to expanding G3 (or the new proposed criterion) to include all cases of clear source fabrication by the author, AI or not. Chaotic Enby (talk · contribs) 11:42, 18 July 2025 (UTC)
I would also support it, but only for issues that can only plausibly be generated by LLMs and would have been removed by any reasonable human review. So, stylistic tells (em-dashes, word choices, curly apostrophes, Markdown) shouldn't be included. It is reasonably plausible that an editor unfamiliar with the MOS would try to type Markdown syntax or curly apostrophes, or keep them in an AI output they double-checked. It is implausible that they would keep "Up to my last training update". I would also tend to exclude ISBN issues from the list of valid reasons, as it is possible that an ISBN might be mistyped by an honest editor, or refer to a different edition. However, if the source plainly doesn't exist at all, it should count. Editors should cross-check any AI-generated output against the sources it claims to have used. Chaotic Enby (talk · contribs) 10:04, 18 July 2025 (UTC)
The main issue with strict tells is that they may change over time as LLMs update. They'll probably change at a slow enough rate, and within other constraints, that editors would be able to stay mostly abreast of them, but I'm not sure CSD criteria could keep up. What may help, with or without a CSD, is perhaps a bit of expansion at the WP:TNT essay on why LLM-generated articles often need to be TNTed, which helps make clear the rationale behind any PROD, CSD, or normal MFD. CMD (talk) 10:20, 18 July 2025 (UTC)
I think a lot of the WP:TNT-worthy AI issues (dead-on-arrival citations, generic truthy content attached to unrelated citations, malformed markup, etc.) can be addressed by just removing the AI content, then seeing if the remaining content is enough to save the article from WP:A3/WP:A7/etc. -- LWG talk 16:16, 18 July 2025 (UTC)
If the article is generated by AI, then it is all AI content. Removing the AI content would be TNT. CMD (talk) 16:57, 18 July 2025 (UTC)
The ideal procedure on discovering something like this is:
Remove all the actively problematic content that can only be fixed by removal (e.g. non-existent and/or irrelevant citations)
Fix and/or remove any non-MediaWiki markup
Evaluate what remains:
If it is speedily deletable under an existing criterion (A1, A3, A7/A9, A11 and G3 are likely to be the most common), then tag it for speedy deletion under the relevant criterion
If it would be of benefit to the project if cleaned up, then either clean it up or mark it for someone else to clean up.
If it isn't speedily deletable but would have no value to the project even if cleaned up, or TNT is required then PROD or AfD.
If there are a lot of articles going to PROD or AfD despite this, then propose one or more new or expanded CSD criteria at WT:CSD that meet all four of the requirements at WP:NEWCSD. In all of this it is important to remember that whether it was written by AI or not is irrelevant - what matters is whether it is encyclopaedic content or not. Thryduulf (talk) 18:58, 18 July 2025 (UTC)
But I think that whether it's written by AI is relevant. On an article written by a human, it's reasonable to assume good faith. On an article written by an AI, one cannot assume good faith, because they are so good at writing convincing sounding rubbish, and so, e.g., the job of an NPP reviewer is hugely disproportionately more work, to winkle out the lies, than it took the creating editor in the first place to type a prompt into their LLM of choice. And that's the insidious bit, and why we need a less burdensome way to deal with such articles. Cheers, SunloungerFrog (talk) 19:16, 18 July 2025 (UTC)
If you are assuming anything other than good faith then you shouldn't be editing Wikipedia. If the user is writing in bad faith there will be evidence of that (and using an LLM is not evidence of any faith, good or bad) and so no assumptions are needed. Once text has been submitted there are exactly three possibilities:
The text is good and encyclopaedic how it is. In this situation it's irrelevant who or what wrote it because it's good and encyclopaedic.
The text needs some cleanup or other improvement but it is fundamentally encyclopaedic. In this situation it's irrelevant who or what wrote it because, when the cleanup is done (by you or someone else, it doesn't matter) it is good and encyclopaedic.
The text, even if it were cleaned up, would not be encyclopaedic. In this situation it's irrelevant who wrote it because it isn't suitable for Wikipedia either way. Thryduulf (talk) 19:38, 18 July 2025 (UTC)
I agree with your core point that content problems, not content sources, are what we should be concerned about, and my general approach to LLM content is what you described as the ideal approach above, but I would point out that assumption of good faith can only be applied to a human. In the context of content that appears to be LLM-generated, AGF means assuming that the human editor who used the LLM reviewed the LLM content for accuracy (including actually reading the cited sources) before inserting it in the article. If the LLM text has problems that any human satisfying WP:CIR would reasonably be expected to notice (such as the cited sources not existing or being irrelevant to the claims), then the fact that those problems weren't noticed tells me that the human didn't actually review the LLM content. Once I no longer have reason to believe that a human has reviewed a particular piece of LLM content, I have no reason to apply AGF to that content, and my presumption is that such content fails WP:V, especially if I am seeing this as a pattern across multiple edits for a given article or user. -- LWGtalk20:05, 18 July 2025 (UTC)
assumption of good faith can only be applied to a human - exactly, and I'm always delighted to apply AGF to fellow human editors. But not to ChatGPT or Copilot, etc. Cheers, SunloungerFrog (talk) 20:18, 18 July 2025 (UTC)
We have seen plenty of instances of good faith users generating extremely poor content. Good faith isn't relevant to the content, it's relevant to how the content creator (behind the llm, not the llm itself) is addressed. CMD (talk) 14:41, 19 July 2025 (UTC)
You should not be applying faith of any sort (good, bad, indifferent it doesn't matter) to LLMs because they are incapable of contributing in any faith. The human who prompts the LLM and the human who copies the output to Wikipedia (which doesn't have to be the same human) have faith, but that faith can be good or bad. Good content can be added in good or bad faith, bad content can be added in good or bad faith. Thryduulf (talk) 18:36, 19 July 2025 (UTC)
Support for articles composed of edits with indicators that are very strongly associated with LLM-generated content, such as the ones listed in WP:AISIGNS § Accidental disclosure and WP:AISIGNS § Markup. I would also apply the criterion to less obvious hoax articles that cite nonexistent sources or sources that do not support the article content, if the articles also contain indicators that are at least moderately associated with LLM-generated content, such as the ones listed in WP:AISIGNS § Style. — Newslingertalk21:34, 18 July 2025 (UTC)
Support: Using a model to generate articles is fast; reviewing and cleaning it up is slow. This asymmetry in effort is a genuine problem which this proposal would help address. There is also a policy hole of sorts: an unreviewed generated edit with fatal flaws made to an existing article can be reverted, placing the burden to carefully review and fix the content back on the original editor. An unreviewed generated edit with fatal flaws made to a new page cannot. Promo gets G11; I don't see why this shouldn't get a criterion also.
Also adding that assessing whether an article's prose is repairable or not, in the context of G11, is also a judgement call to some extent. So I don't believe that deciding whether issues are repairable should be a complete hurdle to a new criterion, although I still prefer to play it safe and restrict it to my stricter distinction above. Chaotic Enby (talk · contribs) 23:36, 18 July 2025 (UTC)
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
RfC workshop
Thanks for all the feedback! I have created a revised criterion with the areas of vagueness ironed out, incorporating wording proposed by User:Chaotic Enby and User:SunloungerFrog. I hope to finalize the criterion wording before I launch a formal RfC.
A12. LLM-generated without human review
This applies to any article that exhibits one or more of the following signs, which indicate that the article could only plausibly have been generated by a large language model (LLM)[1] and would have been removed by any reasonable human review:[2]
Communication intended for the user: This may include collaborative communication (e.g., "Here is your Wikipedia article on..."), knowledge-cutoff disclaimers (e.g., "Up to my last training update ..."), self-insertion (e.g., "as a large language model"), and phrasal templates (e.g., "Smith was born on [Birth Date].")
Implausible non-existent references: This may include external links that are dead on arrival, ISBNs with invalid checksums, and unresolvable DOIs. Since humans can make typos and links may suffer from link rot, a single example should not be considered definitive. Editors should use additional methods to verify whether a reference truly does not exist.
Nonsensical citations: This may include citations of incorrect temporality (e.g., a source from 2020 being cited for a 2022 event), DOIs that resolve to completely unrelated content (e.g., a paper on a beetle species being cited for a computer science article), and citations that attribute the wrong author or publication.
In addition to the clear-cut signs listed above, there are other signs of LLM writing that are more subjective and may also plausibly result from human error or unfamiliarity with Wikipedia's policies and guidelines. While these indicators can be used in conjunction with more clear-cut indicators listed above, they should not, on their own, serve as the sole basis for applying this criterion.
This criterion only applies to articles that would need to be fundamentally rewritten to remove the issues associated with unreviewed LLM-generated content. If only a small portion of the article exhibits the above indicators, it is preferable to delete the offending portion only.
^Here, "reasonable human review" means that a human editor has 1) thoroughly read and edited the LLM-generated text and 2) verified that the generated citations exist and verify corresponding content. For example, even a brand new editor would recognize that a user-aimed message like "I hope this helps!" is wholly inappropriate for inclusion if they had read the article carefully. See also Wikipedia:Large language models.
I don't agree with the last section requiring that articles need to be "fundamentally rewritten to remove the issues associated with unreviewed LLM-generated content"; it largely negates the utility of the criterion. If there are strong signs that the edits which introduced the content were not reviewed, that should be enough; otherwise it is again shifting the burden to other editors to perform review and fixes on what is raw LLM output. A rough alternate suggestion:
"This criterion only applies to articles where, according to the above indicators, a supermajority of their content is unreviewed LLM-generated output. If only a small portion of the article indicates it was unreviewed, it is preferable to delete the offending portion only." (struck as redundant and possibly confusing) fifteen thousand two hundred twenty four (talk) 16:46, 19 July 2025 (UTC)
I agree that if content shows the fatal signs of unreviewed LLM use listed above then we shouldn't put the onus on human editors to wade through it to see if any of the content is potentially salvageable. If the content is that bad, it's likely more efficient to delete the offending content and rewrite quality content from scratch. So we lose nothing by immediate deletion, and by requiring a larger burden of work prior to nomination we increase the amount of time this bad content is online, potentially being mirrored and contributing to citogenesis. LLM content is already much easier to create and insert than it is to review, and that asymmetry threatens to overwhelm our human review capacity. As one recent example, it took me hours to examine and reverse the damage done by this now-blocked LLM-using editor even after I stopped making any effort to salvage text from them that had LLM indicators. Even though that user wasn't creating articles and therefore wouldn't be touched by this RFC, that situation illustrates the asymmetry of effort between LLM damage and LLM damage control that necessitates this kind of policy action. -- LWGtalk17:21, 19 July 2025 (UTC)
I would also like to suggest an indicator for usage of references that, when read, clearly do not support their accompanying text. I've often found model output can contain references to real sources that are broadly relevant to the topic, but which obviously do not support the information given. Pervasive use of these in an article, not just of the other common signs, is a very strong indicator of unreviewed model-generated text. Review requires reading sources, after all. fifteen thousand two hundred twenty four (talk) 17:27, 19 July 2025 (UTC)
However, I also have an issue with the proposal of only deleting the blatantly unreviewed portions. If the whole article was written at once, and some parts show clear signs of not having been reviewed, there isn't any reason to believe that the rest of the article saw a thorough review. In that case, the most plausible option is that the indicators aren't uniformly distributed, instead of the more convoluted scenario where part of the AI output was well-reviewed and the rest was left completely unreviewed. Chaotic Enby (talk · contribs) 19:06, 19 July 2025 (UTC)
"I also have an issue with the proposal of only deleting the blatantly unreviewed portions ... " – Agree with this completely. I attempted to address this with my suggestion that "This criterion only applies to articles where, according to the above indicators, a supermajority of their content is unreviewed LLM-generated output." (I've now struck the second maladapted sentence as redundant and possibly confusing.)
It deliberately doesn't ask that indicators be thoroughly distributed or have wide coverage, just that they exist and indicate a majority of the article is unreviewed, aka "the most plausible option" you mention. But the clarity is absolutely lacking and I'm not happy with the wording. Hopefully other editors can find better ways to phrase it. fifteen thousand two hundred twenty four (talk) 19:37, 19 July 2025 (UTC)
How about we simply remove the paragraph? I agree with the concerns raised here, and situations where it would apply would be extremely rare. I think that such exceptional circumstances can be left to common sense judgment. Catalk to me!08:19, 20 July 2025 (UTC)
It should be removed as CSD is for deletion. This CSD would not stop another editor coming in and rewriting the article, just as other CSDs do not. CMD (talk) 08:31, 20 July 2025 (UTC)
I was working on the unblock wizard, and on the preloads as fallbacks in case the unblock wizard does not work. If I knew all the links that use the help me preloads, I could reinstate my change and update them all to the new format. Alternatively, I can create a second preload template with parameters that can be filled in. Aasim (話す) 03:58, 20 July 2025 (UTC)
(This is just a note, not a chastisement) If you are changing a preload/script/etc, always do an insource: search for the page name. Always interesting the places people use things. Primefac (talk) 09:38, 20 July 2025 (UTC)
Oh my God. That is linked dozens of times. This should be protected as high risk. I will create a separate template to allow for use by the unblock wizard. Aasim (話す) 13:00, 20 July 2025 (UTC)
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
Wikipedia:Article_wizard/CommonMistakes has a list of practices to avoid when creating articles. I wonder whether we might add another bullet to discourage LLM use. Something like:
Using AI to write articles Although large language models like ChatGPT might create articles that look OK on the surface, the content they generate is untrustworthy. Ideally, don't use them at all.
That would definitely help, especially with the amount of AI content we've been seeing at AfC. The "look OK" part might be a bit too informal, maybe it could be replaced by might create articles that appear well-written? Chaotic Enby (talk · contribs) 14:22, 20 July 2025 (UTC)
With look OK I had intended to encompass both nice prose and decent sourcing, and I wonder whether your wording, Chaotic Enby, leans towards the former rather than the latter? But that is maybe dancing on the head of a pin, and I'm happy enough with the suggested amendment. Cheers, SunloungerFrog (talk) 15:00, 20 July 2025 (UTC)
Seems a good idea in general to have some sort of advice, many are not aware of the potential problems in llm output. CMD (talk) 15:03, 20 July 2025 (UTC)
There is a danger associated with this type of general warning. Some new editors have not used AI to write articles because they have not thought of this possibility. So the warning could have the opposite effect on some of them by making them aware. Phlsph7 (talk) 17:09, 20 July 2025 (UTC)
That is a fair point. I suppose I'd rebut it by noting that the Article Wizard page in question has warnings about several undesirable article creation practices (COI, copyvio, puffery, poor sourcing) and all I'm proposing is that we add another to those. If we were concerned that such warnings would cause editors to exhibit such behaviours, I imagine we would not have the page at all? My sense - not objective - is that a warning about using LLMs would deter well-meaning editors, who might have used them thoughtlessly, from using them at all, and that that would be a net benefit. Cheers, SunloungerFrog (talk) 06:17, 21 July 2025 (UTC)
Support this point looks good. I do think we should discourage the use of generative AI for article creation. There are good uses for copyediting and wording fixes but otherwise generating something from scratch is not a good idea. Aasim (話す) 17:28, 21 July 2025 (UTC)
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
User:EditorSage42 popped up in March of this year. Initially, they didn't seem to be using AI to create articles and talk comments. The first contribution that seems suspicious is here 1; usually you would see someone write a paragraph, but this seems to be AI-generated. Sure enough, when you run it through ZeroGPT it comes out as 35% AI-generated. Again, we see [another contribution] with lists that it doesn't seem like a person would incorporate. "Alternative: If retention is preferred, consider merger with Probability theory or related articles per WP:MERGE." clearly hasn't read the Probability theory article; there aren't many mentions of researchers in it. Here you see the same person badgering some poor person who rightly voted keep. There are many signs of AI use here: "fundamental issue", "Most tellingly", "You're right to call out my errors, and I apologize for the repetitive approach. However". AI often doubles down after apologizing. Most importantly, you see EditorSage submitting two of the same articles about AI book generation here and here. The second one wasn't even submitted, they just created it outright. Basically, the third edit on this account was done using AI. Then, almost every edit except the ones that were just adding links was AI-generated; you can run any of their responses through GPTZero, Turnitin, or whatever and you will see that the response is completely AI-generated at worst or polished with AI at best. The editing pattern for both of the AI literature articles also seems suspicious, because they were both just created in one fell swoop. The various articles that the user often cites suggest either AI generation or a very poor grasp of the reliable sources guideline. I think the correct action is to block this single-purpose account that continues to use AI, hallucinating various things and possibly violating copyright. Easternsahara (talk) 14:35, 21 July 2025 (UTC)
Request to review articles for AI hallucination issues
I work for Arabella Advisors, a D.C.-based consulting company, and I just posted a long message on the AA Talk page outlining glaring errors on the AA and New Venture Fund articles. Some of these errors seem to be indicative of AI hallucinations, as there are numerous instances where the cited sources don't support the footnoted claim. Is this something that experienced editors here could review? Any help would be appreciated. JJ for Arabella (talk) 19:25, 30 July 2025 (UTC)
I see no evidence that the concerns raised at Talk:Arabella Advisors stem from LLM usage. The 990 claim currently in the article is from 2020 [2] and appears based on an earlier incarnation of the claim that existed in the initial version of the page [3], which was removed [4] after a different coi editreq brought attention to the unreliable sources supporting it [5].
That is one raised issue, a review of the others does not indicate LLM usage either. Stating that the use of the term "Subsidiaries" vs "Clients" may be a hallucination is quite a leap, and the New Venture Fund lede sourcing problem can be easily attributed to WP:SYNTH (see the entry for Eric Kessler [6]). I see that you asked about the latter at the Teahouse without providing specific examples, but Cullen328 still advised that such errors can stem from original research[7].
Thank you for the quick response, Fifteen thousand two hundred twenty four. Your argument that the factual inaccuracies and citation errors that I flagged don't stem from LLM usage makes sense and is honestly reassuring. If these issues are simply a reflection of sloppy research then hopefully they can be addressed by reviewing editors. Thank you again for your response and sorry for the false alarm! JJ for Arabella (talk) 14:31, 31 July 2025 (UTC)
If there was one thing with overwhelming consensus in that RFC, it was that AI generated images should not be used to depict actual real-world people under any circumstances. -- LWGtalk20:09, 2 August 2025 (UTC)
I am concerned about the edits by User:EncycloSphere, and left a talk page message for them. That editor's response stated that AI was used in the drafting, but "content quality and sources matter more than method". I also discussed this here with User:Chaotic Enby. My concern is the unencyclopedic tone of the enormous edits being made. Thank you! --Magnolia677 (talk) 10:14, 2 August 2025 (UTC)
Could someone look at the contributions of MaineMax04843 and confirm/infirm my suspicions?
I found clear evidence of AI slop in their recent contributions, and I suspect many if not most of their older ones also include AI slop.
Could someone look at that contribution history and sanity-check me here? I don't want to escalate prematurely. Headbomb {t · c · p · b} 00:04, 4 August 2025 (UTC)
I looked at one of their earliest edits, to third culture kid. The doi for new reference Tan, Koh, & Lim 2021 goes to a different paper by different authors on a related topic, and the reference appears not to exist. New reference "The global nomad experience: Living in liminality" exists offline but is dated 2009 when the actual publication date appears to be 1999. New reference Doyen, Dhaene, et al 2016 is given with a doi that goes to an unrelated paper and has a title from a paper by different authors with a different publication year. New reference Lee & Bain 2007 has a doi that goes to an unrelated paper and does not appear to exist. New reference Cottrell 2002 duplicates an existing reference but with a different book title that does not appear to exist, different page numbers, and malformatted citation template. New reference Cariola 2020 has a doi that goes to an unrelated paper and does not appear to exist. At this point I gave up checking the rest, as I was already convinced that this is unchecked AI slop. —David Eppstein (talk) 00:47, 4 August 2025 (UTC)
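(A side note for anyone scripting this kind of triage: the "does not appear to exist" half of the check can be partly automated. The sketch below, which assumes the Python requests library and is purely illustrative rather than anything used here, asks doi.org whether a DOI is registered at all. It cannot catch the other failure mode described above, a real DOI that resolves to an unrelated paper; that still needs a human to read the landing page.)

import requests

def doi_is_registered(doi: str) -> bool:
    # doi.org answers a registered DOI with a redirect (3xx) to the publisher;
    # an unregistered DOI comes back as 404.
    resp = requests.head(f"https://doi.org/{doi}", allow_redirects=False, timeout=10)
    return 300 <= resp.status_code < 400

print(doi_is_registered("10.1000/182"))          # True: the DOI Handbook's own DOI
print(doi_is_registered("10.1234/made-up-doi"))  # a fabricated suffix should return False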
I reviewed the one article they've created, Midcoast Villager, and did not find any signs of LLM use via faulty references like above. However the History section was copied closely from provided sources, and so I have removed and tagged it for revdel. There is also a WP:CRYSTAL issue since the article relies heavily on a source that predates events that are asserted to have happened, I've elected to draftify it to allow for corrections before reintroduction into articlespace. fifteen thousand two hundred twenty four (talk) 01:14, 4 August 2025 (UTC)
I think this might be of interest to this WikiProject?
While checking if there was an article for Citizen developer, I found this draft that was declined at AfC for being LLM-generated. I think it's salvageable so I'm rewriting it. I'd appreciate some help with this! Rosaece ♡ talk ♡ contributions22:20, 8 August 2025 (UTC)
Hi -- Priyanshu.sage has made a lot of edits to various articles in 2024 that are almost certainly AI generated, based on this diff. A lot of these contributions have been revdelled so I can't check them all, but that in and of itself is a warning sign.
Kind of piggybacking off this, how would I do this in general for editors who have created dozens/hundreds of AI edits in the past? I am finding a great deal of users who appear to be serial adders of AI content, but who have been inactive for a year or so, so contacting them for confirmation is unlikely to work. User:Vallee is the most recent one whose (massive amounts of) edits I am working through -- see the userpage in this case.
I've been tagging these edits and adding a talk page message, but there are a lot of them to tag, and the pages and talk pages don't seem to be super active. I have not been deleting these sections since I have no real proof besides the userpage and AI writing tells.
Let me know if I should be doing something else. I am sorry for creating more work for people -- although arguably these editors are the ones who created the work and I am just flagging it. Gnomingstuff (talk) 20:26, 9 August 2025 (UTC)
Expectation setting: There's probably going to be a lot. The way I'm doing this is searching for combinations of AI tell phrases and then checking the sources/contribution history on the diffs. The current search I am working through has 260 results. And obviously a lot of these will be false positives or inconclusive, but that's just one search. Gnomingstuff (talk) 14:42, 10 August 2025 (UTC)
Possible new indicator of LLM usage? (broken markup)
A draft article that I nominated for G15 speedy deletion has a very strange markup feature in it. The draft, Draft:Aleftina Evdokimova, was obviously generated by ChatGPT because of the "oai_citation" and "utm_source=chatgpt.com" codes, but it also has this strange markup in it attached to every reference, like the other codes:
({"attribution":{"attributableIndex":"1009-1"}})
The four-digit index increases going down the page. Are there any editors that are able to tell what this is? It seems like a possible sign of LLM output, but I'm not so sure of it yet. SuperPianoMan9167 (talk) 22:55, 10 August 2025 (UTC)
I know Reddit isn't a reliable source since it is user-generated, but this post gives a pretty strong confirmation that this is another strange ChatGPT bug: [8] (Also, I realize now that this is just JSON.) SuperPianoMan9167 (talk) 23:00, 10 August 2025 (UTC)
I searched for it (as a "find this exact text" search) on Google and pretty much every result that has it also has Markdown and/or "oai_citation" in it. It definitely appears to be a ChatGPT quirk. SuperPianoMan9167 (talk) 23:08, 10 August 2025 (UTC)
Hi -- for that you'll want to include quotes in the search query, "complex and multifaceted." Skimming the ~100 search results from that, I don't see anything that immediately jumps out to investigate, but it's always good to have people looking for this stuff! Gnomingstuff (talk) 20:29, 11 August 2025 (UTC)
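(In case it helps anyone automate these sweeps: below is a minimal sketch of the same quoted-phrase search run through the standard MediaWiki search API rather than the search box. The phrase is just the example above, and every hit is only a lead; a human still needs to check the diff and the sources.)

import requests

params = {
    "action": "query",
    "list": "search",
    "srsearch": 'insource:"complex and multifaceted"',  # quoted, per the advice above
    "srnamespace": 0,   # article space only
    "srlimit": 50,
    "format": "json",
}
resp = requests.get("https://en.wikipedia.org/w/api.php", params=params, timeout=30)
for hit in resp.json()["query"]["search"]:
    print(hit["title"])  # a lead to investigate, not proof of AI use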
Project design
I've been trying out a small revamp of the top menu's design at User:Chaotic Enby/AI Cleanup, to give it a more polished style. Beyond that, I've been wondering if it could be a good thing to work on a common design language to give the project's pages a cleaner look, if anyone is interested. It will probably just be a color palette and maybe a few templates and page layouts, but the current project pages are a bit of a mess and it could be really worth it to make them visually cleaner. Chaotic Enby (talk · contribs) 14:19, 12 August 2025 (UTC)
An onslaught of seeming AI-generated additions to species articles
Apologies if you see this elsewhere; I'm also crossposting it to species-related projects.
I've been tagging a huge amount of seemingly AI-generated additions to articles about species. Some are "sourced," some not. I suspect that they are due to AI tools that will "write" an article based on provided sources and/or search results provided in a prompt. What seems to happen is that the AI, unable to generate text on a topic, speculates on what may be "likely." At first I considered that maybe it's a copy-paste template because there are a few users who are prolific with these, or perhaps a sockpuppet situation, but I've noticed similar text pattern in other topics as well.
Some examples:
Diff 1: While specific distribution data for *Amethysa basalis* is limited, members of the genus are generally found in tropical and subtropical zones. This user has added many many edits like this (though they're not the only one). The asterisks indicate markdown formatting, a common AI tell.
Diff 2: Shell Characteristics: While specific morphological details are limited, as a member of the Modiolus genus, it likely.... A separate AI tell here is this puffery: "This inclusion highlights its relevance in studies of marine biodiversity in the South Atlantic region." Several drafts by the user have been declined for sources not matching text.
Diff 3: Although specific conservation assessments for Halystina globulus are not available, deep-sea species in general are considered.... Note also the "is essential" editorializing.
Diff 4: this remains unconfirmed without direct access to the original description and Specific details about its depth range or precise localities within the Philippine region are not well-documented in available literature, suggesting a need for further research. The "further research" editorializing is common. Note that this user's userpage also shows AI signs, like markdown link formatting.
Diff 5: Specific morphological details about C. bialata are limited in the provided sources.
Diff 6: While specific measurements are not widely detailed, it shares general characteristics with other species.... This user was blocked for ongoing LLM use. Note the plaintext "footnotes," also.
I could list a lot more but I really don't want to be here all day. Basically, we've been getting swamped with these edits for almost a year, it's worse than we thought, it shows no signs of stopping, and it is way too big for one person. And I'm not a biology expert by any means, so I am of limited help doing anything but finding this stuff and flagging it to experts.
Anyway, wanted to bring this to your attention, hopefully people have bandwidth to help take it on. Please tag me if you have questions or remarks, or else I won't see it (because I am busy excavating slop).
Thanks for bringing this up. I've also seen it in a few articles recently, and it is very good that you flagged it for attention. I'm guessing we should add Specific details are limited/not available to WP:AISIGNS, and maybe to Special:AbuseFilter/1325 (although the sentence structure might be a bit too variable for that). If the syntax is too vague for the edit filter, we could unironically train a (very small) language model to learn these sentence structures and run it on recent changes. That could possibly be a more flexible tool than an edit filter to look for "tells" of AI-generated content, assuming we train it on specific tells like this (rather than something like GPTZero, which compares AI-generated editorial prose with human-generated editorial prose and completely misses the baseline of Wikipedia's writing style). Chaotic Enby (talk · contribs) 19:32, 12 August 2025 (UTC)
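To make the middle ground between an edit filter and a trained model concrete, here is a rough sketch of the simplest possible version: a weighted list of tell phrases scored over added text. The phrases, weights, and threshold below are all made up for illustration; a real tool would need a curated phrase list (e.g. drawn from WP:AISIGNS) and should only ever queue edits for human review, never act on them automatically.

import re

# Hypothetical tells and weights, for illustration only.
TELLS = [
    (re.compile(r"specific (details|measurements|data) (about|for|on)?[^.]{0,60}\b(are|is) (limited|not (widely )?(available|detailed))", re.I), 3),
    (re.compile(r"\b(up to|as of) my last (training|knowledge) update\b", re.I), 5),
    (re.compile(r"\bI hope this helps\b", re.I), 5),
    (re.compile(r"\bit is (important|essential) to note\b", re.I), 1),
]

def tell_score(added_text: str) -> int:
    # Sum the weights of every tell phrase that appears in the added text.
    return sum(weight for pattern, weight in TELLS if pattern.search(added_text))

sample = "While specific details about its range are limited, it is important to note..."
if tell_score(sample) >= 3:   # arbitrary threshold
    print("queue for human review")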
Yep, that's one of the AI prose tells I've noted. There are a few more non-species examples at that link, like this ("provided search results") and this (contains a chatbot response)
Regarding the edit filter part, it's about spotting them, not blocking them, so it should be fine – especially since this is bad prose either way. Chaotic Enby (talk · contribs) 19:48, 12 August 2025 (UTC)
Another distinct pattern of possible chatbot output. Sorry for spam, I can take this to a separate page.
I recently spent many hours cleaning up this kind of stuff on mosquito species articles. The pattern I saw was that the AI generates citations to legitimate publications, but those publications don't contain the claims made in the AI text, which appears to take characteristics that are generally true of all mosquitos and phrase them as though they were specific distinguishing features of a specific species ("A. Mosquito is distinguished by its biting behavior, making it a nuisance to humans and pets."). -- LWGtalk19:38, 12 August 2025 (UTC)
The main page for the project seems to have a new logo (one resembling a brain), but I checked the wikitext and it still seems to be using the same file: File:WikiProject AI Cleanup.svg (the logo with a robot and a magnifying glass). The new logo even appears in old page revisions for some reason. I can't find the new logo image here or on Commons. What is going on? SuperPianoMan9167 (talk) 15:25, 10 August 2025 (UTC)
I thought it looked better, what do you think? The other icon is from Codex, and it represents bots in current and future Wikimedia UI production. So better not to overlap. waddie96 ★ (talk) 15:49, 10 August 2025 (UTC)
I like it as well, tho I feel like it might give the wrong vibes for some (if you look at it long enough it feels like it is encouraging Cyborg behavior, not necessarily stopping it -- Maybe we need a mop somewhere in the mix?) Sohom (talk) 16:30, 10 August 2025 (UTC)
None, I prefer the old robot. A brain is a symbol of intelligence, while the current state of "AI" is unintelligent predictive models. I don't think we should conflate the two and further feed into the misconception that these models are intelligent systems. fifteen thousand two hundred twenty four (talk) 21:15, 10 August 2025 (UTC)
My main issue with the new logo is that it doesn't convey the idea of cleanup, only the "AI" part, and that adding a magnifying glass on top would make it look more crowded. In terms of colors, going for a blue color scheme could lead to confusion with blue links, although having something a bit more vibrant than the current black-and-grey tones would be neat. Maybe purple/magenta? Chaotic Enby (talk · contribs) 21:25, 10 August 2025 (UTC)
The details in that image makes it a bit too much in my opinion, especially with the proposed logo above. Maybe a magnifying glass like before? (Even with a magnifying glass, we might need something like a color distinction between them to make it visually readable) Chaotic Enby (talk · contribs) 21:25, 10 August 2025 (UTC)
I'm thinking, if we decide on a color palette for the whole project, assuming the base color is used for the title text around the logo, then we could have the main part of the logo (either the brain or robot) use the highlight color, and the magnifying glass use the base color for contrast. Chaotic Enby (talk · contribs) 21:40, 10 August 2025 (UTC)
@Chaotic Enby Please be aware that I spent 10 minutes fixing up the file's licensing. It was about to be tagged for copyright infringement deletion. Please read Commons:Licensing before making any other derivative works of images licensed with free-use tags that still require attribution (as they are not public domain). It's tricky wording, and frustrating, I know. waddie96 ★ (talk) 13:26, 15 August 2025 (UTC)
Thanks. To clarify, I did not upload the file or write the license myself (it was done by @Queen of Hearts). Additionally, the changes you made to the license were incorrect. Both File:Codex icon robot.svg and File:Codex_icon_search.svg were licensed under CC BY-SA 4.0, and so was File:WikiProject AI Cleanup.svg, so there was no need to switch it to MIT. If the license on the original files was incorrect, please change it there instead of just making changes to derivative files. Finally, this is not how copyright infringement works. If the file has the wrong license, then it will usually be tagged with something like {{Wrong license}} and fixed. There is no speedy deletion criterion for "forgot to properly give attribution". The closest are F3 (for derivative works of non-free content, which is obviously not the case here), and F5 (if the content is missing a source entirely, and with a warning and a grace period of seven days). Chaotic Enby (talk · contribs) 13:50, 15 August 2025 (UTC)
@Chaotic Enby, @Queen of Hearts, technically speaking @Waddie96 is not wrong: the onwiki files are marked under the wrong license (not sure why), and the original files of the Codex icons are indeed under the MIT license as per the LICENSE file for the source code. However, I see this as a simple fixable mistake and not an issue to ask for a copyright infringement deletion over. That being said, @Waddie96, please mind your tone; at the moment you are coming off as condescending and combative in suggesting copyright deletion and implying an inability to understand copyright. Sohom (talk) 14:03, 15 August 2025 (UTC)
Thanks for the additional explanation. This is a bit of a confusing situation, as the README file indicates that the icons are under CC BY-SA 4.0. Should we conclude that they are automatically dual licensed? As I mentioned above, if that is the case, it could have been helpful to also make the change on the original icons to clarify the situation. Chaotic Enby (talk · contribs) 14:10, 15 August 2025 (UTC)
Hmm, I did some digging around, and found phab:T383077#10433947, I think dual licensing it on wiki is the best way forward (since the package containing the icons is MIT, but the icons are also under CC-BY-SA (fun and confusing)). There is part of me that is freaking out about TheDJ's last comment since I agree, that by not linking to the icons (as they are used in our interfaces) we are kinda-sorta violating CC-BY-SA, but that's for the Codex team to figure out :) Sohom (talk) 14:23, 15 August 2025 (UTC)
Hmmm, listen it wouldn't be the end times at all if we reverted back. I made a WP:BOLD decision. If anyone independent wants to close the discussion with the outcome when it's reached its end, I'm happy either way per WP:BRD. waddie96 ★ (talk)15:38, 11 August 2025 (UTC)
Prefer the old logo. Sorry - but the magnifying glass over a robot perfectly encapsulates the point of this Project, which is to scrutinise AI-generated text. qcne(talk)09:00, 11 August 2025 (UTC)
Emphatically prefer old logo. I won't repeat my last rant, but as others have said, we should absolutely not be giving in to identifying these speculative autocomplete technologies as intelligent, because that misunderstanding being made by other users is why this project has to exist in the first place.
Also, at an aesthetic level, I'm not a huge fan of this half-brain half-positronics design. Not to be rude, but does this logo describe linear algebra algorithms that emit text, or would it be a good Cyborg t-shirt? Altoids0 (talk) 19:32, 14 August 2025 (UTC)
You are invited to join the discussion at Talk:Mark Karpelès § AI-generated frog portrait, which is within the scope of this WikiProject. An editor has added an AI-generated cartoon portrait of a BLP, sourced from the website of the subject's current employer. I removed the image from the article citing WP:AIIMGBLP and WP:AIGI; the editor restored it and defended its inclusion on the basis that AI-generated images should be used when the subject is known to use them (i.e. not generated for the sake of having a photo in an article). Requesting input from uninvolved editors as to whether this constitutes a marginal case in which AI-generated imagery depicting BLPs is permissible. DefaultFree (talk) 04:01, 17 August 2025 (UTC)
Signs of AI writing but with a little more "je ne sais quoi"
Hi folks,
I am one of those folks who keeps wanting to participate in Wikipedia but I struggle to feel like I make meaningful contributions, stuff gets done wrong, I spiral and disappear. I'm not giving up though and I feel like I can genuinely help bring a French adaptation to this: https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing
Earlier this year there was a sockpuppet investigation into a couple of users. The investigation brought up some suspected AI use and at least one hallucination. Turns out that combined, these users have made hundreds of edits, most of which are to extremely high profile articles (up to WP:VITAL level 3), all of them seeming to be LLM generated. Some of them are article text, a lot are image captions.
I've gone through and done a quick scan of the most important-seeming edits, and have tagged a lot of articles as a result, but I haven't reviewed every single edit because there are just so many of them. So if anyone else has time to take a look feel free (the ones I have reviewed are mainly the large diffs). In some cases the edits are pretty small, but I feel like having (justified) AI tags at the top of major articles is maybe not the worst idea in the world for awareness raising.... Gnomingstuff (talk) 18:54, 13 August 2025 (UTC)
Because they still contain facts that need verification, and can contain hallucinations (the "especially in 1946" that is inserted out of nowhere). Here, with the first image, several factual assertions are made in a short space: the image is indeed the Teatro de los Insurgentes, that it is specifically the facade, and that the mural on the facade is in fact a visual history. In this case the additions do seem to be factually accurate, but any use of AI essentially poisons the whole well of the edits. The AI-generated template does have a parameter to restrict it to specific sections, but with images that's not all that simple to do -- IMO it's probably more disruptive to have a bunch of section tags than one article tag. Gnomingstuff (talk) 22:36, 13 August 2025 (UTC)
I agree. Any article that has been maliciously modified to become filled with unverified or dubious claims can reasonably get a banner, even if it's embarrassing for Wikipedia. How I even became aware of articles like Blues being contaminated was through your maintenance tag additions, as I ritualistically flip through the associated category.
If Gnoming's work is demonstrative of anything, it's of a need for a specific maintenance template for generated captions. Altoids0 (talk) 07:16, 16 August 2025 (UTC)
I find discussions like these very frustrating because they're often shooting the messenger. I (or anyone else adding templates) didn't suddenly put hundreds of new instances of LLM content into articles. They were there before. Now, they might actually get fixed instead of sitting around undetected for 5 or 10 or 19 years, getting cited in books, etc. It's especially frustrating when the edits were made by someone who already got blocked, for LLM use, yet no one took the time to go back and even tag (let alone fix) the contributions that they already decided were worth blocking over.
Any embarrassment to Wikipedia is a feature, not a bug. The AI slop will remain whether we spot it or not, so readers might as well know about it. Gnomingstuff (talk) 17:36, 17 August 2025 (UTC)
– Editor made aware of problem and everything reverted
If my suspicions are correct, the edits made by user @Ivanisrael06 (Special:Contributions/Ivanisrael06) seem to be, entirely or in large part, generated by AI. This is, to me, somewhat apparent in the wording and tone of their contributions. What is more concerning than that is the very odd page formatting and citation style employed in almost all of their submissions here. While these style issues in and of themselves merit page revisions in most cases, I would definitely appreciate second opinions on the potential use of an LLM. ElooB (talk) 18:05, 24 August 2025 (UTC)
Unusual citation style, plus the near-total absence of wikilinks in the text they add, to me are strongly suggestive of genAI. Worse, these weird references are to Wikipedia itself or to social media, see e.g. this diff. And some of the references are simply made-up, such as "Dhaka Tribune, 2025" in this diff, despite the absence of any Dhaka Tribune articles in the reference list. I think reversion of all these edits is in order. WeirdNAnnoyed (talk) 19:07, 24 August 2025 (UTC)
The offending edits are all rolled back by now. I guess that should settle that. Ivan, if you read this, please refer to Moxy's message on your talk page for your future contributions on Wikipedia. ElooB (talk) 19:28, 24 August 2025 (UTC)
This fall 2024 course seems to have outright encouraged students to use AI for their edits -- the students' userpages seem to have a lot of essays like this suggesting it's an actual assignment (and the page edits display the usual signs). I know AI edits have been an issue with student edits in general but this seems to be a much more centralized thing, so wanted to post it here in case other classes had something similar around this time.
Publifye AS (https://publifye.com/) is a self-publishing platform which makes extensive use of AI. This is explained in a disclaimer in the books I've viewed, though it is typically not part of the preview and is only noticeable if you search for "AI" within the book. Some of the authors are even listed as AI, eg: Corbin Shepherd, AI.
Search results for the authors are littered with links to vendors which don't work (eg), so I assume the likes of Amazon, Barnes & Noble, and Everand realised the works were AI generated and removed them.
I've removed what was left. My idea would be to create an edit filter for this; when it matches, it would display a warning to the user and tag the edit, so we can look at edits that have ignored the warning (similar to Special:AbuseFilter/869). Kovcszaln6 (talk) 13:48, 24 August 2025 (UTC)
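For what it's worth, the matching side of such a filter could be quite simple. Below is a rough sketch of the logic in Python, illustrative only: the real thing would be written in the AbuseFilter condition language on the filter itself (where the warn and tag actions are configured, as with 869), and the patterns here are just examples based on the names mentioned above.

import re

# Example patterns only: the publisher's name and one of its listed "AI" authors.
PUBLIFYE = re.compile(r"publifye|corbin shepherd,?\s*ai", re.I)

def added_text_cites_publifye(added_lines: str) -> bool:
    # True if the added wikitext mentions Publifye or one of its AI authors.
    return bool(PUBLIFYE.search(added_lines))

print(added_text_cites_publifye("<ref>''Some Title''. Publifye AS, 2024.</ref>"))  # True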
Thread has been archived without action. Most of their AI additions have been reverted, but it would be good if their additions could be combed over to make sure that none are left: [9]. Hemiauchenia (talk) 21:16, 27 August 2025 (UTC)
Come one, come all to a discussion about whether or not G15 should be expanded to include pages with emojis and, if so, how. Anecdotal experiences and opinions are welcome in the discussion section. GreenLipstickLesbian 💌🦋 13:37, 28 August 2025 (UTC)