A translation produced by ChatGPT of Tzetzes's commentary on Lycophron's Alexandra has been linked on 175 pages related to Greek mythology. [1] The translation itself is, suffice it to say, highly problematic, and shouldn't be linked on Wikipedia. Is there an effective automated method for removing these links en masse? Thanks, Michael Aurel (talk) 23:02, 15 November 2024 (UTC)
While something like AWB could "naively" remove the links themselves, it could be better to look at the articles individually to see whether the material already has good sourcing and the link can be safely removed, or if a substitute translation should be found and added instead. You could also drop a note at WP:RSN so editors can look at the wider website (https://topostext.org) to see if other similar translations are present. That way, the extent of the problem could be more accurately assessed, and future editors will be able to find it in the archives. Chaotic Enby (talk · contribs) 17:37, 18 November 2024 (UTC)
@Chaotic Enby: Thanks for your reply. Unfortunately, the work hasn't been translated into English by a scholar yet (or out of the original ancient Greek at all, I don't believe), so the only replacement link we could really provide would be to an old edition of the work in ancient Greek (e.g. [2] or [3]), and I imagine adding such links wouldn't be possible with automated tools. A discussion at WP:RSN might be useful, and could help to establish a consensus around how such translations ought to be handled, although I do note that a Google search for "chatgpt site:topostext.org" only brings up this translation, which would seem to indicate that this is the only AI-generated translation hosted at that website. (Also, these links were all added by one editor, I believe in good faith but unwittingly, whom I contacted before starting this discussion, so hopefully this translation, once removed, won't be linked again.) So, given this, would you say an automated method of removal, while possible, is likely not preferable to a manual approach? Or perhaps someone familiar with AWB could remove the links, and I could go through each page afterwards and manually link a Greek edition, or find a secondary source? – Michael Aurel (talk) 22:44, 18 November 2024 (UTC)
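For anyone weighing the AWB route, a rough pywikibot sketch of the first step: enumerating the affected articles for manual review, rather than editing anything. The domain filter below is just the site mentioned above; in practice it would need narrowing to the specific translation's URL path, which isn't given here.

<syntaxhighlight lang="python">
# A sketch: list mainspace articles whose external links point at the
# suspect website, so each use can be reviewed by hand before removal.
import pywikibot

site = pywikibot.Site('en', 'wikipedia')
# As written this matches every https link to the domain; narrow it to
# the specific translation's path once known.
for page in site.exturlusage('topostext.org', protocol='https',
                             namespaces=[0]):
    print(page.title())
</syntaxhighlight>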
I would say it is still way preferable to look individually at each use of the source. By the way, especially when dealing with medieval or ancient texts, more recent secondary sources are very much preferred. Tzetzes's commentary might be "secondary" with respect to Lycophron's Alexandra, but given the age of the source, it is indeed best to treat it as a primary document from a historiographical perspective, and to cite secondary sources that discuss it in context. Chaotic Enby (talk · contribs) 23:04, 18 November 2024 (UTC)
Alright, fair enough. And yes, secondary sources are of course always preferred when dealing with ancient texts. Tzetzes' work, while in some sense "secondary" to Lycophron's I suppose, is functionally a primary source, at least as far as Wikipedia is concerned; my suggestion to replace these with links to a Greek edition was only because in most instances there is almost certainly no secondary source which contains the cited information, due to the obscurity of Tzetzes' text, and its relative insignificance to Greek mythological study. – Michael Aurel (talk) 23:23, 18 November 2024 (UTC)
175 articles is quite a lot to check. I think we need to find out whether the foundation is valid first. A chat at RSN could kick that off. We also need to find out whether the translations are accurate, which is the core of it. If they aren't, then the links need to be removed. scope_creepTalk 08:13, 19 November 2024 (UTC)
Thanks. I suppose I came here under the assumption that this sort of source wasn't considered acceptable, but perhaps the use of AI-generated translations isn't something which has actually been discussed before, so a precedent-setting discussion could certainly be helpful. – Michael Aurel (talk) 09:25, 19 November 2024 (UTC)
@Chaotic Enby: What cat does it go in? Couldn't locate it. Found a couple of others, including Category:Articles containing suspected AI-generated texts from November 2024. There are already 24 articles for November. scope_creepTalk 14:21, 19 November 2024 (UTC)
Interesting, this could certainly be a useful way of flagging the pages containing this source (and other such sources). Perhaps a new cat for pages containing this tag could be something along the lines of "Articles containing suspected AI-generated sources", as a specific tracking category for this seems as though it could be of use to this WikiProject, seeing as AI-generated sources are presumably only going to crop up more and more frequently. – Michael Aurel (talk) 16:47, 19 November 2024 (UTC)
I'll start monitoring it. I also see there are now 172 articles in the Articles containing suspected AI-generated texts category. scope_creepTalk 07:51, 26 November 2024 (UTC)
To clarify here (as the RSN discussion has now been archived), is the idea to, in an automated manner, add these tags across all of the pages with this source? I've removed around fifty of the links so far (a decent start I suppose), but tagging these would allow this to be designated as an outstanding task, visible and open to others. – Michael Aurel (talk) 09:19, 26 November 2024 (UTC)
Yep, while removing references in a (semi-)automated way shouldn't be done, tagging them automatically so editors can look more closely at individual instances is definitely helpful. Chaotic Enby (talk · contribs) 12:32, 26 November 2024 (UTC)
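A hedged sketch of what that semi-automated tagging could look like with pywikibot. The template name and the source of the page list below are placeholders, not the actual tag in use:

<syntaxhighlight lang="python">
# A sketch: prepend a cleanup tag to each affected article so editors
# can review individual instances later. The template name and
# 'affected_pages.txt' are assumed placeholders.
import pywikibot

TAG = '{{AI-generated sources|date=November 2024}}\n'
site = pywikibot.Site('en', 'wikipedia')
with open('affected_pages.txt') as f:
    for line in f:
        page = pywikibot.Page(site, line.strip())
        if 'AI-generated sources' not in page.text:
            page.text = TAG + page.text
            page.save(summary='Tagging suspected AI-generated source for review')
</syntaxhighlight>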
When I was reviewing articles in that "AI-generated texts" cat, I sent several articles to draft, in effect an NPP review. I think about 6 of them went. One was really bad. scope_creepTalk 12:51, 26 November 2024 (UTC)
Just noting that these are two different cats, "AI generated text" (when the articles themselves are AI-written) and "AI generated sources" (when they cite sources that are AI-written), the tag mentioned earlier puts articles in the latter category. Chaotic Enby (talk · contribs) 13:01, 26 November 2024 (UTC)
Ah, that's good to know. Though, hmm, would it potentially be easier for you to do it, as you're no doubt experienced with AWB, and I'm assuming it wouldn't take all that long (maybe?) to add tags to this many pages? Though if I'm wrong on either count (or you think it would be better I do it), I'm willing to give it a go. – Michael Aurel (talk) 23:14, 26 November 2024 (UTC)
Adding your name to the list of participants is enough to join! By the way, you can sign with ~~~~, which adds your name and the current time automatically. Chaotic Enby (talk · contribs) 15:31, 5 December 2024 (UTC)
Is there a userbox saying something like "this user is part of AI cleanup"?
It seems like the most effective way to clean up articles (going through the category of articles tagged as possibly AI-generated) is to wholesale delete any uncited content, then spot-check sources to see if they support the content. If they don't, they can be removed; and if enough fail, the article can be stubbed, as the rest probably fail too (this is useful when it is impossible to access all of the sources). If they do hold up, the best available option seems to be to remove the AI tag and presume the article is good, provided the history isn't too suspicious.
This might be helpful to add to the guide. The main problem in fixing possibly AI-generated articles seems to be source access, where AI (possibly) can cite a source you can't access and it's impossible to check. Mrfoogles (talk) 00:58, 6 December 2024 (UTC)
Feel free to add it to the guide! Important emphasis on the fact that if AI-generated text cites inaccessible sources, it's pretty much guaranteed that the model didn't have access to these sources either, so it can be safely treated as unsourced. Chaotic Enby (talk · contribs) 11:34, 6 December 2024 (UTC)
Editor with 1000+ edit count blocked for AI misuse
Wow, I think that would be a quagmire if we were specifically looking for LLM text, as detection would be slow and ultimately questionable in many instances. We could go through and verify that the info added in those edits is verifiable, but I wouldn’t go beyond that, nor do I think there is a need to go beyond that. — rsjaffe🗣️ 14:28, 26 October 2024 (UTC)
Unfortunately, this user's pattern of LLM use goes a lot further back. I've already started cleaning up Specific kinetic energy and Specific potential energy; I've also tagged the two sections he added to Molecular biology (which appear to be LLM-generated summaries of the linked main articles; they'll probably turn out to be OK as long as someone with subject matter knowledge can review and source them).
While this isn't how I found these pages (I was following up on this user's non-AI-assisted bad edits), it's notable that Molecular_biology#Meselson–Stahl_experiment (added on 17 April) was a 100% AI match on GPTZero. I don't think that automated detection is reliable enough to justify straight-up banning people, but it's probably reliable enough to justify flagging repeat offenders for manual review. Preimage (talk) 12:39, 6 December 2024 (UTC)
OWL Party
I believe the OWL Party page is partly AI-written, so if someone could check whether it's accurate, that would be great.
Also, I feel it doesn't line up with Wikipedia's purely analytical tone.
Yep, it definitely reads like ChatGPT's attempts at "quirky" humor. There's {{ai-generated}} as a tag you can add if you want. If you have more time, you can look at the history, revert the addition and message the user (either yourself, or Wikipedia:Twinkle has ready-made warnings for that matter). Chaotic Enby (talk · contribs) 21:38, 5 December 2024 (UTC)
I don't think this is AI-generated. I can't see any details that are strange, the focus seems relatively consistent, and it looks a lot like her, which is rare for someone who isn't that famous. Sam Walton (talk) 23:18, 17 December 2024 (UTC)
Hi all- As a website owner who has been using ChatGPT for years, I believe I can spot signs of AI-generated content pretty quickly. I have a full-time job but would love to assist (to ensure the truth remains true, and for my own personal development).
Hello! A good start would be to install Wikipedia:Twinkle, which allows you to tag articles (including, in this case, with the {{AI-generated}} tag). You can tag pages that you encounter, or look for new additions in Special:RecentChanges! If you see users adding AI-generated content with clear issues (which for now is the vast majority of visible AI-generated content), you can warn them with {{uw-ai1}}. Chaotic Enby (talk · contribs) 21:23, 2 January 2025 (UTC)
@Aisavestheworld: Also have a go at servicing the Category:Articles containing suspected AI-generated texts category where they end up, to clean the stuff up and remove the offending article content. Be bold and remove the stuff if you see it. This is the greatest literary/encyclopaedic project since the Library of Alexandria, so it's worth the time. If you're in the NPP/AFC group, post it back on the NPP queue, and flag anything else you find troublesome, for example if there is an autopatrolled editor who is using it. If it's a draft under the 90-day limit, then re-draftify it and give a clear reason why it's been draftified. Speak to the editor and tell them why it's not acceptable to post AI slop. Explain it clearly so they realise it's not what's wanted, and tell them there is stormy weather ahead if they continue. Be soft, considerate, kind, responsive and helpful. But if you warn them and they don't comply after the four warnings, e.g. disruptive editing, send them to WP:ANI, or here where we can have a group chat, e.g. COIN. If that doesn't work out, then it's ANI. It is far too early to use AI effectively, or so the wide consensus seems to be, although I think it's probably going to be good for diagrams, for example medical diagrams and physical illustrations, but not BLP portraits or any BLP. Hope that helps. scope_creepTalk 16:48, 6 January 2025 (UTC)
Thank you @Scope creep - Can you help me get started here? I think I just need to know where to go and I can get started: "Category:Articles containing suspected AI-generated texts catgeory". Aisavestheworld (talk) 18:29, 6 January 2025 (UTC)
@Aisavestheworld: I never realised you've only been on Wikipedia for a very short time. I would ignore that advice I gave you for at least a year or two until you're well established. scope_creepTalk 18:36, 6 January 2025 (UTC)
We may want to be more explicit that text from before ChatGPT was publicly released is almost certainly not the product of an LLM. For example, an IP editor had tagged Hockey Rules Board as being potentially AI-generated when nearly all the same text was there in 2007. (The content was crap, but it was good ol' human-written crap!) Maybe add a bullet in the "Editing advice" section along the lines of "Text that was present in an article before December 2022 is very unlikely to be AI-generated." Apocheir (talk) 00:57, 25 October 2024 (UTC)
So far, I haven’t seen anything that I thought could be GPT-2 or older. But I did run into a few articles that seem to make many of the same mistakes as ChatGPT, except a decade earlier.
If old pages like that could be mistaken for AI because they make the mistakes that we look for in AI text, that's still a problematic find; maybe we should recommend other cleanup tags for these cases. 3df (talk) 22:53, 25 October 2024 (UTC)
I think that's very likely an instance of "bad writing". Human brains have very often produced analogous surface-level results! Remsense ‥ 论 23:05, 25 October 2024 (UTC)
Yes, I have to say, ChatGPT's output is a lot like how a lot of first- or second-year undergraduate students write when they're not really sure if they have any ideas. Arrange some words into a nice order and hope. Stick an "in conclusion" on the end that doesn't say much. A lot of early content on Wikipedia was generated by exactly this kind of person. (Those people grew out of it; LLMs won't.) -- asilvering (talk) 00:31, 26 October 2024 (UTC)
I ran this text from the 2017 version. GPTZero said 1% chance of AI.
FIH was founded on 7 January 1924 in Paris by Paul Léautey, who became the first president, in response to field hockey's omission from the programme of the 1924 Summer Olympics. First members complete to join the seven founding members were Austria, Belgium, Czechoslovakia, France, Hungary, Spain and Switzerland. In 1982, the FIH merged with the International Federation of Women's Hockey Associations (IFWHA), which had been founded in 1927 by Australia, Denmark, England, Ireland, Scotland, South Africa, the United States and Wales. The organisation is based in Lausanne, Switzerland since 2005, having moved from Brussels, Belgium. Map of the World with the five confederations. In total, there are 138 member associations within the five confederations recognised by FIH. This includes Great Britain which is recognised as an adherent member of FIH, the team was represented at the Olympics and the Champions Trophy. England, Scotland and Wales are also represented by separate teams in FIH sanctioned tournaments. Graywalls (talk) 00:03, 6 November 2024 (UTC)
There's probably more bad than good writing on the Internet, and all LLMs have been extensively trained on all this bad writing; that's why they are prone to write like it. 5.178.188.143 (talk) 14:23, 17 January 2025 (UTC)
Idea: one of four categories added to images automatically during upload
Provisional AI-generated analysis of documentation (January 2025)
On the Dutch-language Wikipedia, we are currently also having a discussion about the use of AI in Wikipedia articles, including the use of AI-generated images. I have the impression that this topic is being considered more actively and thoroughly on the English-language Wikipedia than on the Dutch-language Wikipedia. If you don't mind, I would like to join the discussion here as well, to gather inspiration for the policy on the Dutch-language Wikipedia.
In any case, a great idea to keep documentation on which AI-generated images are removed and which are not. Because I was curious about what conclusion ChatGPT would draw from these findings, I gave it the following prompt: Analyze why AI-generated images were removed from the English-language Wikipedia based on this documentation and write only a conclusion in up to ten sentences. This was the outcome:
“
AI-generated images were removed from English-language Wikipedia articles due to concerns about their accuracy, quality, and relevance in non-AI contexts. Such images often fail to represent historical or real individuals authentically, risking misinformation. They can introduce errors or unrealistic portrayals that conflict with Wikipedia's goal of providing reliable, evidence-based content. Quality concerns also stem from inconsistencies in detail and fidelity, particularly in facial depictions or historical recreations. Where authentic photographs or verified illustrations exist, they are preferred, making AI-generated visuals unnecessary. The lack of proper sourcing for AI-generated content further complicates their verification, undermining Wikipedia's strict citation and credibility standards. Ethical considerations, such as misrepresentation of subjects or contexts, also played a role in their removal. The community has emphasized prioritizing verified, human-curated content over AI-generated media to maintain trust. Additionally, concerns about potential copyright ambiguities surrounding AI-generated works influenced this decision. These measures align with Wikipedia's principles of neutrality, reliability, and transparency in public knowledge dissemination.
”
The picture in the article about the Secernosaurus
One of the questions that comes to my mind is: aren't human-made images sometimes also just inaccurate, irrelevant and misleading? Take the picture on the right as an example. The article about the Secernosaurus provides this picture. However, according to a user on Wikimedia Commons, it would not be accurate, because the hindquarters would lack musculature and fat. If the same image were made by AI, and were many times better, could and should it be in the article? Doesn't this apply to all dinosaurs in principle? They might have been bigger and skinnier than previously thought. The same likely applies to many other artistic impressions. Exoplanets and stars might also look different than we think. I'm curious how you think about, for example, artistic impressions on the English-language Wikipedia. Kind regards, S. Perquin (talk) – 09:16, 25 January 2025 (UTC)
If human-made images are inaccurate, they should also be removed. We do have WP:PALEOART and WP:DINOART for reviewing reconstructions of extinct animals. If you believe that this image of Gryposaurus (not Secernosaurus, despite it being used there) is inaccurate, it should be submitted there for review and removed from the article. I haven't seen any AI-generated reconstructions of dinosaurs that are many times better than this slightly skinny hadrosaur and don't introduce blatant inaccuracies, but yes, on principle, we don't have any guidelines specifically excluding AI-images for paleoart reconstructions (or anywhere beyond BLPs). However, we also shouldn't give more latitude to errors in AI-generated images either, even if the process is often more error-prone and less consistent with the paleontological data than human reconstructions. Chaotic Enby (talk · contribs) 14:17, 25 January 2025 (UTC)
I handle extinct buildings rather than extinct animals, but similar discussions arise as to whether we should use a photo or a drawing, with one side saying the photo should always be preferred, and my side saying such prejudice has little value. My example is the extinct Bronx Borough Hall for which we have good drawings, and poor contemporary photos, and my own photos of the remnants. I had no trouble pushing my opinion that the best drawing we had was the best illustration, and it seems to me every time, it will be a judgement call. There are general arguments for preferring plain photos over retouched photos, over paintings and drawings by people, over AI renderings, but when it comes down to cases, we have to decide as best we can among what's actually available. A good AI will surely beat a bad illustration from another source, if those are what are available. Jim.henderson (talk) 16:34, 29 January 2025 (UTC)
I deleted a couple of spam pages, likely AI-generated, and noticed that in both cases, each section of text ended in citeturn0search0 – anyone know where that comes from? I'm guessing some sort of AI tool, but don't know. When I tried googling it (didn't find anything particularly useful, BTW), that square symbol turned into a 'hamburger' stack; no idea what character it's actually meant to be. -- DoubleGrazing (talk) 08:55, 20 February 2025 (UTC)
Definitely an artefact of ChatGPT, and maybe other models. If I get an answer with grey button external links at the ends of sentences, those become turn0search0 when I click the "Copy" button to put the response into my clipboard. I've also found that if ChatGPT returns an answer with some example images at the top, those images become iturn0image0turn0image1turn0image4turn0image5.
I'm not seeing a huge amount of this out there on the web, so maybe it's just a recent bug in how ChatGPT's interface renders markup to the clipboard. Belbury (talk) 10:06, 20 February 2025 (UTC)
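For anyone scanning wikitext or dumps for these artefacts, a small sketch; the marker list covers only what has been reported in this thread, and is certainly not exhaustive:

<syntaxhighlight lang="python">
# A sketch: flag wikitext containing known ChatGPT copy-paste artefacts.
import re

CHATGPT_MARKERS = [
    r'turn\d+(?:search|image)\d+',  # e.g. citeturn0search0, iturn0image0
    r'utm_source=chatgpt\.com',     # tracking parameter left on links
]

def find_markers(wikitext: str) -> list[str]:
    hits = []
    for pattern in CHATGPT_MARKERS:
        hits += re.findall(pattern, wikitext)
    return hits
</syntaxhighlight>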
In that case, the best thing to do is to revert to the previous version. However, if someone has time and is knowledgeable in that domain, it could be helpful to take a look at the references (especially the third and fourth ones which are linked) to see if there's any material in the article that they support. Chaotic Enby (talk · contribs) 17:35, 26 February 2025 (UTC)
I'm thinking about having that page's title changed to something along the lines of [Signs or Indicators] of (likely) [AI or ChatGPT] authorship, but I can't decide which words should be used.
AI (or LLM) should be better than ChatGPT, as we should also have catchphrases indicating other large language models. Best to also add "likely". Not sure about "Signs" vs "Indicators", both are good although "Signs" might be more concise. Chaotic Enby (talk · contribs) 12:39, 20 February 2025 (UTC)
"Signs", "AI" and "likely" are all good ideas.
I've just added a section on markup (the turn0search0 issue noted below, plus a ?utm_source=chatgpt.com one I just encountered for the first time), which seem worth tracking but definitely aren't "catchphrases". Belbury (talk) 17:27, 27 February 2025 (UTC)
a friend of mine notified me of this article 1 nm process, which they suspect might be written using an LLM. I am personally not good at figuring out this kind of stuff, so I'm passing it on to here so that people here can check. ―Howard • 🌽33 00:23, 3 March 2025 (UTC)
Old Gods of Appalachia
I believe the episode summaries in Old Gods of Appalachia are AI generated. It looks like a large number of summaries were added in a single edit by an editor who has previously been warned for using AI generated content. It looks like someone else has also questioned whether it's AI generated content on the talk page. I'm looking for a second opinion, guidance on what to do, or assistance in cleaning it up. TipsyElephant (talk) 00:17, 16 March 2025 (UTC)
Some of them definitely sound like AI to me. In the first one alone: The narrative delves into, The prologue highlights the interconnectedness... Chaotic Enby (talk · contribs) 00:58, 16 March 2025 (UTC)
Likely AI content scraping, but also likely public relations editing
User rapidly creating long bios that GPTZero says are 100% probability AI-generated
Please see Special:Contributions/HRShami. I tested the first paragraph of Calin Belta § Career and the first paragraph of David L. Woodruff § Career and got a 100% AI-generated score from GPTZero in both cases, but the likelihood of AI generation is also suggested by the speed at which these articles are being generated. Sourcing quality is poor: many opinions about what the subjects have accomplished, mostly sourced to the publications of the subjects themselves; spot-checking the references in the Woodruff article found that they backed up maybe 1/3 of the claims in the text they purported to be references for. —David Eppstein (talk) 07:34, 27 February 2025 (UTC)
I have been writing articles pretty much the same way since the pre-GPT era. It's a very standard Wikipedia way. The thought of checking my writing against GPTZero did not even occur to me, because I absolutely despise AI-generated writing. After your message I checked three articles on GPTZero and it declared "moderately confident that writing is human" and "certainly human writing" on all three. In any writing, if you pick a very small part of it, no machine can tell correctly whether it is AI or human. You must check the whole writing. Even checking single paragraphs of my writing generated "human content" on GPTZero for most of the paragraphs. If just one paragraph in an article with 8 or 9 paragraphs returns AI-generated, with the rest of the paragraphs returning "human content", I think we should accept the writing as human content. I don't know what you mean by speed. I have written a total of 10 articles in February and edited one article completely. If I used AI, I could easily generate 10 articles a day. I might have misplaced references in the Woodruff article, which is a human error. Sometimes, other editors point out that the reference is not correct for the preceding information and I fix it with the correct reference. I asked ChatGPT to generate the same Woodruff article. I suggest you do the same. Even after multiple prompts, the article generated by ChatGPT was nowhere near my writing. HRShami (talk) 10:05, 27 February 2025 (UTC)
Please don't accuse people of using AI based on GPTZero -- it is often wrong, to the point that its wrongness has made the news. Especially, as the person above says, if you only test certain paragraphs. It also might be better to ask first if someone is using AI before making a public accusation -- I don't imagine you'd like it either if someone called your articles AI-generated. Mrfoogles (talk) 06:07, 26 March 2025 (UTC)
I'm not sure where the threshold is for the outright removal of AI generated text. At Elkmont, Alabama, an editor has stated--when asked if they are using AI--"I am using something to help me edit the text". I reverted their edit twice, because the tone was extremely formal and out of line with Wikipedia's voice. The input of others would be appreciated! Thanks. Magnolia677 (talk) 15:26, 23 March 2025 (UTC)
In this case, I would say that WP:NOTEVERYTHING and WP:INDISCRIMINATE apply, and that it is reasonable to revert the edits. I mean, these are all delightful:
Farmers were diligently planting corn, with hopes for a bountiful harvest if conditions remained favorable, while wheat and oat crops showed promise. The cotton market was active, and concerns arose over potential losses in the peach crop due to recent frosts
T. O. Bridgforth celebrated his 55th birthday with a large family reunion and dinner, which was described as one of the most sumptuous meals enjoyed since the end of a severe drought
The article closed with lighthearted local anecdotes, including a humorous mix-up involving a wheelbarrow and an umbrella
but not remotely encyclopedic. There are also some instances of external URLs in the content body, which violates WP:NOELBODY. You might politely point them in the direction of WP:LLM too, and if they must continue to use an LLM assistant, to add well-cited encyclopedic content in smaller chunks, so that each addition may be considered on its own merit. Rather than one huge swathe of text. Cheers, SunloungerFrog (talk) 16:08, 23 March 2025 (UTC)
Went in and deleted some text with fake citations -- if someone adds unsourced content, you have the right to challenge it, and if they can't source it (and it's not "the sky is blue") then it is reasonable to remove it. I've had that happen to me before (it was annoying, but you know, lacking a source, I didn't try to put it back). And at the point where it has fake citations like [11], which could only have been added by an AI, it is definitely reasonable to delete it. Mrfoogles (talk) 06:15, 26 March 2025 (UTC)
It's possible, but it's also possibly a certain sociology/psychology style (that corpus might be where LLMs get some of their flair). Both possibilities are likely due to how the article seems to have been written as an essay, rather than built from sources. The resulting tonal issues have already been raised on the talkpage. CMD (talk) 06:03, 27 March 2025 (UTC)
Passive or active cleanup?
I'm interested and excited to help with this effort. I'm curious how folks here practice AI cleanup. Do you actively look for AI slop or are you passively aware of it while doing other tasks?
I spent some time this AM reviewing Special:RecentChanges, expecting to find more instances of potentially AI-generated content given the lengthy policy discussions on Village pump. I'm in tune with some of the quirks and language tendencies of popular chat models in other contexts, so I guess I was surprised not to find anything obvious. I'm not an experienced editor by any means... Does anyone have any tips related to visual cues they look for in edit history summaries that merit a closer look? Zentavious (talk) 14:44, 20 March 2025 (UTC)
I would say I'm doing a mix of passive cleanup (cleaning it up while doing other tasks such as new page patrolling), semi-active cleanup (cleaning articles reported by other users as potentially AI-generated), and behind-the-scenes technical work. Regarding history and edit summary alone, there's often less to work with, but two clues are long, structured edit summaries (often generated by LLMs, although humans can also take care of writing good edit summaries!), and repeated long additions by the same user in a short time, especially on different articles. That last one is particularly telling: if the same editor makes 5000 bytes additions every five minutes, they likely haven't written everything by themselves. Chaotic Enby (talk · contribs) 17:37, 20 March 2025 (UTC)
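That last heuristic lends itself to a quick check against the MediaWiki API. A sketch with arbitrary thresholds, which pulls a user's recent contributions with their size deltas (the spacing between the returned timestamps can then be inspected by hand):

<syntaxhighlight lang="python">
# A sketch of the rate heuristic: list a user's recent edits that added
# a large number of bytes. The byte threshold is illustrative only.
import requests

API = 'https://en.wikipedia.org/w/api.php'

def large_additions(username: str, min_bytes: int = 5000) -> list[dict]:
    params = {'action': 'query', 'list': 'usercontribs',
              'ucuser': username, 'uclimit': 100,
              'ucprop': 'title|timestamp|sizediff', 'format': 'json'}
    data = requests.get(API, params=params, timeout=30).json()
    return [c for c in data['query']['usercontribs']
            if c.get('sizediff', 0) >= min_bytes]
</syntaxhighlight>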
If you're trying to find suspicious articles more easily, Category:Articles_containing_suspected_AI-generated_texts is a good place to start. In a sense I guess it's a combination of active and passive -- passively, articles are tagged, and people who feel like being active try to fix them. I'm not surprised, given AI isn't that common, that you didn't find much at recent changes, though. Mrfoogles (talk) 06:11, 26 March 2025 (UTC)
Is the tag intended to only mark AI content that is not acceptable and/or constructive? Or is it intended to disclose the use of AI universally, including above-the-bar AI-assisted edits? Zentavious (talk) 13:49, 27 March 2025 (UTC)
This may be irrelevant if the draft never gets accepted, but I wanted to have a closer look as discrepancies in language proficiency between the article and the user's comments on discussion pages have tripped my alarms. I'm already watching this user for other reasons and wondering whether LLM use is yet another concern. The draft has been declined at AFC by Sophisticatedevening, Theroadislong, and DoubleGrazing.
Sample article text
The Kushwahas share close historical and cultural ties with the Kushwahas of Bihar and Uttar Pradesh in India. Many migrated to Nepal over centuries, bringing with them a rich agricultural tradition. The community traces its lineage to the Suryavanshi dynasty and is traditionally associated with Kshatriya and Vaishya status. They are considered to be descendants of the legendary King Kush, the son of Lord Rama.. Historical records suggest their presence in the Madhesh region predates modern Nepal.
Maurya dynasty: Linked to Emperor Chandragupta Maurya.The Kushwaha community traces its lineage to the Mauryan Empire through historical and cultural traditions. They identify as descendants of the Suryavanshi Kshatriyas, particularly linking themselves to Chandragupta Maurya, the founder of the Maurya dynasty. The Mauryas, originally from a farming and warrior background, were believed to have belonged to the (Koiri) or Shakya lineage, which aligns with the Kushwaha identity. Over time, the Kushwahas continued their association with agriculture while maintaining their historical pride in their supposed Mauryan ancestry.
One of the most notable Kachhwaha rulers was Maharaja Sawai Jai Singh II, the founder of Jaipur. He was a visionary leader known for his advancements in astronomy, urban planning, and scientific research. Under his reign, Jaipur became a center of knowledge and innovation, featuring well-planned streets, grand palaces, and the famous **Jantar Mantar observatories**. (Markdown formatting copied from an LLM?)
Jha, Hari Bansh (1993). The Terai Community and National Integration in Nepal. Centre for Economic and Technical Studies. ISBN 978-81-7022-523-2.
According to Worldcat and Open Library, this ISBN belongs to Indian library and information science literature, 1990-1991 by Sewa Singh.
But a book titled The Terai Community and National Integration in Nepal by Hari Bansh Jha does appear in Worldcat and Google Books.
Sharma, Vikram (2015). "The Political Strategies of the Kachhwaha Rajputs". Indian Historical Review. 42 (3): 210–230. doi:10.1177/1234567890. Dodgy DOI. There is an Indian Historical Review, and volume 42 does line up with 2015, but it looks like they were publishing only two issues a year (as far as I can tell from Sage via TWL). No matching title for "The Political Strategies of the Kachhwaha Rajputs" in Indian Historical Review, TWL, or Google Scholar.
Singh, Rajendra (2010). The Kachhwaha Dynasty: History and Heritage. Oxford University Press. pp. 45–60. ISBN 978-0198066759. Invalid ISBN. No book with this title in Worldcat or Google Books.
My preliminary verdict: could be LLM-style or just lazy puffery, but inconsistent with user's writing in discussion pages; possibly some hallucinated refs. Copyvio unlikely according to Earwig. — ClaudineChionh (she/her · talk · contribs · email · global) 13:01, 29 March 2025 (UTC)
I'd say there is a very strong possibility. It looks like there was some effort to clean up the formatting, as there are no obvious markdown red flags and the headings look fine, but the contrast with their comments is super suspicious. I'd run each paragraph individually through GPTZero (I would, but I ran out of scans this month), and see if you get any hits. Also, it is super strange (suspicious?) that in one of the earliest versions of it they added From Wikipedia, the free encyclopedia in the lead. If it is more than likely that all of it is AI, I'll probably go back and decline it for LLM, and if they resubmit, someone else will probably reject it for notability. Sophisticatedevening🍷(talk) 14:12, 29 March 2025 (UTC)
I agree, it looks somewhat generated. The language is a bit stilted and artificial, like a brochure almost. Who would write like that? But we probably only have a window of about 2-3 years before we won't be able to tell. scope_creepTalk 08:12, 30 March 2025 (UTC)
I agree, there is a big difference between how this draft is written, and how the user communicates on talk pages etc.
Oddly, though, the text (even the original version) has some punctuation, capitalisation, etc. mistakes in it, so if it is AI-generated, then AI may need some remedial English grammar lessons. -- DoubleGrazing (talk) 11:17, 30 March 2025 (UTC)
Listenbourg
Two people keep re-adding AI-generated images to the Listenbourg article, where the only support for them is two sentences in a single source. Those two sentences are just there to explain that the name sounds European enough that DALL-E generated vaguely European buildings when prompted with it. Can I please get another person to give their input here? I think it is frankly absurd and stupid that this is even something I have to debate with those two, as it very clearly is not relevant to the topic at hand. NineOnLB (talk) 04:48, 28 March 2025 (UTC)
While I've replied on the merits of the image, I would note that the way you worded this post might be seen as WP:CANVASSING. A more neutral notification would have been ideal, such as "We are having a disagreement on Talk:Listenbourg about whether to include an AI-generated illustration. Can we please get more inputs in the discussion?" Otherwise, {{WikiProject please see}} can generate a pre-written notification message for you. Chaotic Enby (talk · contribs) 11:01, 30 March 2025 (UTC)
Possible AI-slop sources will be flagged in orange, though I'm open to changing that color in the future if it causes issues. If you have the script, you can see it in action on those articles.
For now the list of AI sources is limited to ChatGPT (utm_source=chatgpt.com), but if you know of other ChatGPT-like domains, let me know!
I found this thread after some searching from now-closed thread [6], where it was used as a telltale for LLM use. Anyway there may be some urgency for searching insource:"utm_source=chatgpt.com", because there are also bots that go around stripping off utm-source junk from urls and we want to catch it before it is cleaned away. Currently I'm seeing about 1400 of them. —David Eppstein (talk) 21:43, 26 April 2025 (UTC)
But we don't want to just strip it out. We want to find it and check that the text added with it is accurate and not an AI hallucination. Stripping it out would prevent us from finding it. —David Eppstein (talk) 22:57, 26 April 2025 (UTC)
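One way to take that snapshot before cleanup bots get there, sketched with pywikibot's search interface (the output file is just an illustration):

<syntaxhighlight lang="python">
# A sketch: record the current list of mainspace articles containing
# the ChatGPT tracking parameter, before it is stripped by link bots.
import pywikibot

site = pywikibot.Site('en', 'wikipedia')
query = 'insource:"utm_source=chatgpt.com"'
with open('chatgpt_refs_snapshot.txt', 'w') as f:
    for page in site.search(query, namespaces=[0]):
        f.write(page.title() + '\n')
</syntaxhighlight>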
I just became an NPP reviewer and have been messing around with it, and I just ran into an article whose sources I can't access at all (they're mostly offline, and the ones which have links are mostly deadlinks). I'm not going to link it because it's probably not AI, but I just realized that NPP reviewers are supposed to prevent hoaxes and suchlike, and for articles with mostly offline sources, especially those in different languages, there's no really good way to tell if an article is AI without knowledge of the subject matter. Should (or does) NPP have some guidance on this? Mrfoogles (talk) 15:59, 28 April 2025 (UTC)
If everything is offline (and several different websites are cited) then either it's AI or all the servers are affected by the current Iberian blackout. Flounder fillet (talk) 20:23, 28 April 2025 (UTC)
No, I meant books, not dead links. Also, I'm guessing you looked through my contributions history, but you've gotten the wrong one. Mrfoogles (talk) 01:31, 30 April 2025 (UTC)
A first step is checking if the books exist. Not to say that AI can't pretend it's using a real book, but if the book doesn't exist that's a strong indicator. CMD (talk) 03:53, 30 April 2025 (UTC)
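That first step can be partly automated against an open catalogue. A sketch using Open Library; a miss is a red flag rather than proof, since plenty of real books are absent from any single catalogue, and the ISBN below is an arbitrary known-good example:

<syntaxhighlight lang="python">
# A sketch: ask Open Library whether a cited ISBN resolves to a book.
import requests

def isbn_in_openlibrary(isbn: str) -> bool:
    r = requests.get(f'https://openlibrary.org/isbn/{isbn}.json',
                     timeout=30)
    return r.status_code == 200

print(isbn_in_openlibrary('9780140328721'))  # arbitrary real ISBN
</syntaxhighlight>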
The issue of llms has been discussed far more widely than this WikiProject, in very broad community forums. Things are a bit scattered, but there should be a central repository for the community directly in the Wikipedia space. CMD (talk) 23:04, 1 May 2025 (UTC)
It doesn't appear that WP:LLM is that "repository", or any kind of repository. It would rather be the case that this WikiProject is the central hub of interest in this topic on Wikipedia. The breadth of forums that have discussed LLMs and AI did not translate into breadth of support for the essay such that it might become anything other than an ordinary essay. At the same time, Wikipedia:Artificial intelligence is an information page also covering LLMs. —Alalch E. 23:56, 1 May 2025 (UTC)
Hello, a new user has begun editing and on their user page says that they "extensively utilize BIDAI (Boundaryless Information & Data Analysis Intelligence), an advanced analytical system engineered by EIF." I've found their edits to be extremely unproductive and have warned them of such, but I was wondering if there is a standard approach for dealing with such accounts? Reporting without warning or discussion seems extreme, but the potential for this user to cause significant damage to Wikipedia is also very real. I didn't see a clear-cut policy, but I also admittedly didn't look too deep. Thanks. Vegantics (talk) 14:29, 22 May 2025 (UTC)
We don't specifically have policies for this yet (we still don't have a general AI-use policy), but the course of action for unproductive AI-using editors has usually been to report them to ANI. Chaotic Enby (talk · contribs) 14:37, 22 May 2025 (UTC)
Thanks @Chaotic Enby. I'll see if they respond to my Talk page comments/continue editing and will plan to report if they continue this disruptive pattern. Vegantics (talk) 14:39, 22 May 2025 (UTC)
I believe the obvious lack of any meaningful human oversight means that Spledia (talk · contribs) is merely acting as a facade for a computer program, and that their account is thus in effect a disguised bot account. I've suggested they request approval via the normal bot approval process. Given their past editing record, I think they have a mountain to climb with this, but the bot approval process seems like a good way to deal with this kind of blatant automated editing. In the meantime, I've blocked them from editing or creating article content. — The Anome (talk) 05:50, 23 May 2025 (UTC)
Collapsible templates
I've created the {{Collapse AI top}} and {{Collapse AI bottom}} templates that can be used for collapsing (hatting) disruptive talk page discussions that contain LLM-generated text. The {{cait}} and {{caib}} shortcuts are easier to use than the full template names. For an example of the template in action, see Talk:Ark of the Covenant.
The benefits of these AI-focused templates over generic collapsible templates like {{hat}} and {{hab}} are the convenient standardized message and the fact that transclusions of these templates can be tracked to monitor the extent of disruptive LLM use on talk pages.
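A sketch of that transclusion tracking with pywikibot, using the template title created above:

<syntaxhighlight lang="python">
# A sketch: count current transclusions of the collapse template to
# monitor the extent of hatted LLM discussions over time.
import pywikibot

site = pywikibot.Site('en', 'wikipedia')
tpl = pywikibot.Page(site, 'Template:Collapse AI top')
count = sum(1 for _ in tpl.getReferences(only_template_inclusion=True))
print(f'{tpl.title()} is transcluded on {count} pages')
</syntaxhighlight>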
I've (hopefully) deleted all articles I can find created by M1rrorCr0ss, but (a) I'm not absolutely sure I've got them all, and (b) there are still the huge number of redirects and an unknown amount of garbage content inserted into other, legitimate, articles. Are there any tools for digging this sort of thing out, to allow root-and-branch removal of contributions by an editor? — The Anome (talk) 11:09, 22 May 2025 (UTC)
Should this talkpage be considered the LLM noticeboard (perhaps adding a couple of redirects like WP:LLMN and Wikipedia:Large language models/Noticeboard?)? If not, should one be made? I wonder because I came across Zaida, Khyber Pakhtunkhwa and wanted someone more familiar with LLMs to take a look, though I did find a maintenance template and added it to the article. Gråbergs Gråa Sång (talk) 05:31, 24 May 2025 (UTC)
To facilitate searching for specific discussions in the archives, I suggest the active participants on this talk page should consider if it wants to keep project discussion separate from discussions of specific situations. isaacl (talk) 15:41, 24 May 2025 (UTC)
That could also be a good alternative, assuming there are too many discussions and searching them ends up overwhelming. However, some discussions of specific situations can easily end up broadening in scope, so a separation between them might not always be practical. Chaotic Enby (talk · contribs) 15:46, 24 May 2025 (UTC)
I do think that a separately maintained page will be better, because I can only see the issue grow in size in future. —CX Zoom[he/him](let's talk • {C•X}) 17:57, 24 May 2025 (UTC)
Would it be possible to create a bot that would check new articles, follow all embedded links, such as citation links, and attempt to fetch them? 404-ing and similar broken reference links are an obvious sign of lazy AI slop, and it would be easy to catch these early using this, and to tag articles for examination by editors. It could also try to check the linked references for at least some resemblance to the subject of the article: either through simple text comparison, or an ML method such as comparing embeddings (of which text comparison is a trivial example). It would obviously not detect sophisticated AI slop, but that's another issue entirely. The obvious problem is the anti-crawler features of websites themselves that would tend to block accesses by the bot. Are there any services that can provide this kind of crawler access to third-party sites in an ethical way, for example via a WMF-brokered use-whitelisted API obtained via an organization like Google, Cloudflare, Microsoft, Kagi ([7]) or the Internet Archive, who have generally unrestricted access to crawling (something like, say, Google's "Fetch as Google" service)? — The Anome (talk) 10:42, 23 May 2025 (UTC)
See also this: https://news.ycombinator.com/item?id=23149841 While slow, the IA's fetch would be ideal for this purpose. Combined with a cache, it would be highly effective. It doesn't really matter if it takes several minutes to do a fetch, for the purposes of bots, which can take as long as they like. Because it would get a lot of hits, it would probably have to be a service agreement with the IA to prevent it being rate-limited or blocked by them. The IA also seems to offer an API: https://archive.org/help/wayback_api.php — The Anome (talk) 11:24, 23 May 2025 (UTC)
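A sketch against that availability API, which reports whether a snapshot of a URL exists without fetching the live site:

<syntaxhighlight lang="python">
# A sketch using the Wayback Machine availability API linked above:
# a URL the crawler has never seen is worth a closer look as a possible
# hallucinated reference (though new pages may simply be unarchived).
import requests

def has_snapshot(url: str) -> bool:
    r = requests.get('https://archive.org/wayback/available',
                     params={'url': url}, timeout=30)
    return bool(r.json().get('archived_snapshots'))

print(has_snapshot('https://example.com'))
</syntaxhighlight>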
Some AI-generated content possibly goes under the radar. So, this bot proposal is a good idea. But it will only be good for new articles, which need to undergo patrolling, so there is already some human supervision there. For AI editors expanding existing articles with fake references, the bot would need to check every article that has seen a recent edit. —CX Zoom[he/him](let's talk • {C•X}) 12:49, 23 May 2025 (UTC)
Absolutely. It will only catch the very dumbest AI slop content, but it appears that is currently low-hanging fruit, and still worth doing. I really like the idea of a content cache for already-fetched reference content; automated checking of references is a really promising research area, and one, I think, where using LLMs is entirely valid, if it is used with the correct threshold settings, so that it is more sceptical than the average human reviewer, and bad references can either be flagged as wholly bad (naive slop detection) or simply questionable (detecting either superior-quality slop, vandalism, or mediocre human contributions), and human review can then take over. — The Anome (talk) 13:30, 23 May 2025 (UTC)
link-dispenser.toolforge.org (a tool I wrote) also exists to check if a link is dead; it directly makes requests instead of routing through IA, since IA heavily ratelimits Toolforge. Sohom (talk) 19:23, 30 May 2025 (UTC)
Has the "AI images in non-AI contexts" list served its purpose?
Wikipedia:WikiProject AI Cleanup/AI images in non-AI contexts has been documenting reasons given for removing AI-generated images from Wikipedia articles, since 2023. Is there any reason to continue keeping track of this, now that WP:AIIMAGES has become policy? I assume the list page was created to help guide that eventual policy with organic examples from across Wikipedia, which would mean it was no longer really needed. Belbury (talk) 11:37, 12 May 2025 (UTC)
Yep, most of them have been deleted, and "what to do" is much clearer with the policy. Borderline cases (which will be less frequent, but will certainly happen) can be discussed on this very noticeboard. Chaotic Enby (talk · contribs) 14:06, 12 May 2025 (UTC)
All of the edits made by new user User:1january2000 over the past few days, and the fast rate at which they have been made considering their volume, suggest they are almost entirely AI-generated, with many of the sources they've cited seeming not to actually exist, although referenced as if real. I am not sure what to do about this, but this seems like the right place to report it. Hellginner (talk) 17:38, 6 June 2025 (UTC)
My general approach with articles that are mostly or all LLM hallucinations, particularly if a chunk of references are clearly made up sources, is to tag them for speedy deletion as hoaxes with {{db-hoax}}. As that template doesn't seem to have a comments or rationale field, I usually add in my analysis and rationale as an HTML comment too. Cheers, SunloungerFrog (talk) 18:29, 6 June 2025 (UTC)
I have already taken care of a few edits by the same user on "Millennium celebrations" (the section about Rio and South Georgia) which cited nonexistent sources from Folha de S.Paulo and the British Antarctic Survey, among others. Ramkarlo82 (V • T • C) 01:25, 7 June 2025 (UTC)
A user on Talk:Bidirectional search alerted me to a problem with mass additions of content with hallucinated fake references by User:Noshin Nawal on bidirectional search. I have reverted the article to a version before the additions, and Noshin Nawal has not contributed to any other article, but I thought I'd leave this here in case it sounds familiar to anyone or might be helpful as a record of this action. —David Eppstein (talk) 22:06, 8 June 2025 (UTC)
ToneCheck community call/discussion
Hi hi, the team behind Tone Check, a feature that will use AI to prompt people adding promotional, derogatory, or otherwise subjective language to consider "neutralizing" the tone of what they are writing while they are in the editor, will be hosting a community consultation tomorrow on the Wikimedia Discord voice channels from 16:00 UTC to 17:00 UTC. Folks interested in listening in, joining in, or asking questions should join the Wikimedia Discord server and subscribe to this event. Sohom (talk) 19:13, 9 June 2025 (UTC)
I am a newcomer and I don't know how these are handled. What should be done about this? I genuinely don't think the article is a good fit for an encyclopedia, and checking/reworking everything that was included in the linked revision is a huge chore. I couldn't verify most of the sources used. I don't know if they're real, though I managed to find at least one of them. MeowsyCat99 (talk) 13:14, 13 June 2025 (UTC)
Heavily LLM-generated and clearly not verified. I'm willing to put it up for AfD and advocate for TNT as I don't think attempting to salvage that level of generated content is worth any editor's time, not to mention other fundamental issues with the topic. Will wait a short time to see if any editors have a better suggestion. fifteen thousand two hundred twenty four (talk) 14:13, 13 June 2025 (UTC)
My general approach now is to strip out made up sources and those that don't corroborate the sentence or paragraph they're attached to, and then send the article to draft with a reason of LLM-written text. I have also tried nominating for speedy deletion as {{db-g3}} (hoax/vandalism) if it is particularly bad. In this case I'd probably try the draftify approach: I note that the creating editor is part of this WikiEd course so would probably notify the course leader / WikiEd person too. Cheers, SunloungerFrog (talk) 14:34, 13 June 2025 (UTC)
I would normally still advocate deletion to avoid other editors unknowingly getting caught up in LLM-cruft when trying to improve others' drafts, but will give draftifying a try this time.
FWIW, here are the three prompts I used from Gemini 2.5 Flash:
1) Can you generate an updated economic summary using 2024 data for Guyana using the format below, and providing referenced sources for each data point that could be integrated into the Wikipedia page for it located at
2) Can you also provide in Wikipedia format the list of references in your prior answer, also including verified working http links to webpages for each one?
3) Can you
1) find an alternative source than the website tradingeconomics.com for that reference, and if you cannot, remove that data and reference as it is blacklisted by Wikipedia
2) and then provide a combination of the last two answers as a single body of Wikipedia text markup , modeled on the format below, but integrating the data you have just collated in the past two answers. Please double check that both the data and coding for Wikipedia markup are accurate.
And then I made hand-tweaks of a few things that weren't perfect.
Is there a Wikipedia good-faith-AI crew collating efforts like this?
It makes no sense to have the world's data centers regenerating the same kinds of outputs afresh when efforts could be strategically coordinated to flow the data to Wikipedia (among those inclined to do so).... Vikramsurya (talk) 17:02, 19 June 2025 (UTC)
The problem is this, from your edit summary Data needs full verification but preliminary suggests it's accurate. You should only make edits that you have already fully verified are borne out by the sources, not just a vague suggestion that they're probably accurate. There are also three random inline citations on a line by themselves after the Imports bullet, and there's something wrong with the formatting of ref 57. Cheers, SunloungerFrog (talk) 17:25, 19 June 2025 (UTC)
PPP sources are broken; the sites list the data as being both for Guyana and Chad. Under "arable land", the hectare claim is not found in the source. Under "labor force participation", the rate in the source is 49.6%, not 56.4%. Under "industrial production", neither source mentions crude petroleum, gold, timber, or textiles.
Possibly some hallucination, but sourcing misattribution has certainly occurred, which can be viewed as better or worse. The arable land claim of 420,000 hectares (but not "more than") is the exact figure in Wolfram's database, but the prompt requested "working http links to webpages", so the model's pattern contained a link, even if wrong. fifteen thousand two hundred twenty four (talk) 04:39, 20 June 2025 (UTC)
Misattribution and hallucination are really the same issue, the AI is finding words and numbers that fit the pattern it develops. CMD (talk) 05:31, 20 June 2025 (UTC)
I have a question - when did you think the verification by other editors would occur? If I was watching the page and started to check and found more than a couple of errors, I would just revert the whole edit with a request not to submit error-strewn material. Why? Because I would judge that the edit overall could not be trusted if there were already this many faults and I wasn't going to waste my time looking further. This is something that happens all the time: we are all volunteers who shouldn't be making work for each other like this. That doesn't mean using an LLM is bad. It's saved you time doing some of the formatting. That frees you up to do what the LLMs are bad at, which is fine-grained fact-checking of reliable sources. OsFish (talk) 05:44, 20 June 2025 (UTC)
Royal Gardens of Monza
I'm not super familiar with the process here, but Royal Gardens of Monza seems like it might be AI-generated to me: two of the books it cites have ISBNs with invalid checksums, the third doesn't seem to resolve to an actual book anyway, it cites dead URLs despite an access date of yesterday, and it uses some invalid formatting in the "Design and features" heading. The author has also had a draft declined at AFC for being LLM-generated before. ScalarFactor (talk) 23:07, 21 June 2025 (UTC)
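The checksum test mentioned here is mechanical and easy to reproduce: it is the standard ISBN-10/13 check-digit arithmetic, which hallucinated ISBNs frequently fail. A sketch:

<syntaxhighlight lang="python">
# A sketch: validate ISBN check digits. Assumes the input contains only
# digits, hyphens, spaces, and a possible final 'X' (ISBN-10 only).
def isbn_checksum_ok(isbn: str) -> bool:
    digits = isbn.replace('-', '').replace(' ', '').upper()
    if len(digits) == 10:
        # ISBN-10: weighted sum (weights 10..1) must be 0 mod 11;
        # a final 'X' stands for the value 10.
        total = sum((10 if ch == 'X' and i == 9 else int(ch)) * (10 - i)
                    for i, ch in enumerate(digits))
        return total % 11 == 0
    if len(digits) == 13:
        # ISBN-13: alternating weights 1 and 3, sum must be 0 mod 10.
        total = sum(int(ch) * (1 if i % 2 == 0 else 3)
                    for i, ch in enumerate(digits))
        return total % 10 == 0
    return False

print(isbn_checksum_ok('978-0-14-032872-1'))  # True for a valid ISBN
</syntaxhighlight>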
These are, as far as I am aware, translated by editors with dual fluency. All go through AfC and are tagged as necessary by AfC reviewers. @David Gerard, do you have any specific problems with any of them? If so, please do raise them (and maybe also with the AfC reviewer), but in general I believe these aren't any more of an issue than any other translated article. -- asilvering (talk) 03:45, 1 July 2025 (UTC)
For the sake of transparency, this editor asked me on Tumblr what should be done about this situation, and I told her that she could report it to this noticeboard (and clarified that the report would not be seen as casting aspersions). Chaotic Enby (talk · contribs) 13:52, 1 July 2025 (UTC)
Alright, duly noted, and thanks for bringing this up.
I understand the concern regarding the formatting style and the tagged AI-related article. I acknowledge that in some of my previous articles I used the bullet-point format as a way of organising my articles clearly, but after this review I will surely work on that.
If there is any area where my edits have fallen short, I sincerely apologise and will make the necessary corrections. I appreciate your feedback. Jessephu (talk) 18:01, 1 July 2025 (UTC)
The bullet-point format, while not ideal, is not the main issue at hand – your response doesn't answer the question of whether you were using AI or not. While that is not explicitly disallowed either, it is something that you should ideally be transparent about, especially given the editorializing and verifiability issues in some of your articles. Chaotic Enby (talk · contribs) 18:30, 1 July 2025 (UTC)
Thank you for the feedback. Yes, I use AI to sometimes assist with drafting, but I do make sure to review and edit the content to ensure accuracy. Jessephu (talk) 03:07, 2 July 2025 (UTC)
You created National Association of Kwara State Students on 21 April. The "Voice of Nigeria" source 404s; the "KSSB:::History" source is cited twice for two separate claims and fails to support either; and the "Ibrahim Wakeel Lekan 'Hon. Minister' Emerges as NAKSS President" source also does not support the accompanying text. Neither of the two provided sources supports the subject's notability. The article is unencyclopedic in tone and substance, and it is written like an essay. I have serious doubts concerning your claim that you review content for accuracy, and I have draftified that article. fifteen thousand two hundred twenty four (talk) 07:57, 2 July 2025 (UTC)
I do make sure to review... but the ones mentioned here could be mistakes from my end. I am currently going through the articles listed here to correct errors, and will make sure to cross-check thoroughly from now on. Jessephu (talk) 08:10, 2 July 2025 (UTC)
I came across a supposed stub expansion to an article on my watchlist, Formby Lighthouse. It seemed to be largely LLM-generated, with all the accompanying problems (flowery text, content not supported by sources, etc.), so I reverted it.
Just wanted to quickly explain the process I've been following: I usually start by Googling for sources based on the requirement. I read through them once, pick out key points or keywords, and then rewrite the content in my own words. After that, I use ChatGPT or another LLM to help refine what I've written and organize it the way I want. I also provide the source links at that stage. Once the content is cleaned up, I move it over to Wikipedia.
Since everything was based on the links I gave, I assumed nothing unrelated or unsourced was getting in. But after your observation, I decided to test it. I asked GPT, “Where did this particular sentence come from? Is it from the data I gave you?” and it replied, “No, it’s not from the data you provided.” So clearly, GPT can sometimes introduce its own info beyond what I input.
Thanks again for pointing this out. I’ll go back and review the articles I’ve worked on. If I find anything that doesn’t have a solid source, I’ll either add one or remove the sentence. I’d appreciate it if I could have two weeks to go through everything properly. Yaswanthgadu.21 (talk) 07:52, 6 July 2025 (UTC)
I'll be blunt: it would be far preferable if you self-reverted all the edits you've made in this way, and started from scratch, because then you know you can be confident in the content, language and sourcing. Please do that instead. Cheers, SunloungerFrog (talk) 08:47, 6 July 2025 (UTC)
I agree. Reverting all of the edits you made in this way and redoing them by hand would be preferable on every level. If you want to organize your writing the way you want, organize it yourself. Stepwise Continuous Dysfunction (talk) 16:35, 6 July 2025 (UTC)
Possible new idea for WP:AITELLS: non-breaking spaces in dates
Over the past few weeks, I've been noticing a ton of pages showing up in Category:CS1 errors: invisible characters with non-breaking spaces in reference dates (also causing CS1 date errors). I've been trying to figure out where these are coming from, and I'm leaning towards it being another AI thing -- see this draft, which has various other AI hallmarks. Jay8g [V•T•E] 20:36, 7 July 2025 (UTC)
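(To make the tell concrete: the stray character is U+00A0, and it can be hunted for directly in citation date parameters. A rough sketch, assuming the page wikitext is already in hand as a string; the regex is deliberately simplified and is not how the CS1 module itself parses citations.)
<syntaxhighlight lang="python">
import re

NBSP = "\u00a0"  # NO-BREAK SPACE, the character tripping the CS1 invisible-character check

def dates_with_nbsp(wikitext: str):
    """Return values of date-like citation parameters that contain U+00A0."""
    hits = []
    for match in re.finditer(r"\|\s*(?:access-|archive-)?date\s*=\s*([^|}]*)", wikitext):
        value = match.group(1)
        if NBSP in value:
            hits.append(value.strip())
    return hits

print(dates_with_nbsp("{{cite web |title=X |date=7\u00a0July\u00a02025}}"))
# ['7\xa0July\xa02025']
</syntaxhighlight>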
For the interested
A German newspaper [9] had an AI/human team check articles on German WP, and found that many articles contain errors and outdated information, and that there are not that many active editors. Apparently this didn't use to be the case; unclear when it changed. [sarcasm]
Anyway, this was interesting:
"Can artificial intelligence replace the online encyclopedia? Not at the moment. The FAS study also shows this: When Wikipedia and artificial intelligence disagreed, the AI wasn't more often right than Wikipedia. Sometimes, the AI even correctly criticized a sentence, but also provided false facts itself. That's why human review was so important. At the same time, most AI models are also trained on Wikipedia articles. The AI has therefore very likely overlooked some errors because it learned inaccurate information from Wikipedia." Gråbergs Gråa Sång (talk) 09:47, 8 July 2025 (UTC)
More signs of LLM use from my recent AfC patrolling
For the past month I've been participating in the WP:AFCJUN25 backlog drive, and oh man, I've been finding a LOT of AI slop in the submission queue. I've found a few more telltale signs of LLM use that should probably be added to WP:AICATCH:
(oh god, these bulleted lists are exactly the sort of thing ChatGPT does...)
* Red links in the See also section — often these are for generic terms that sound like they could be articles. Makes me wonder if an actually practical use of ChatGPT would be to suggest new article titles... as long as you write the article in your own words. I'm just spitballing here.
* Fake categories, i.e. red links that sound plausible but don't currently exist in our category system.
* Thin spaces? Maybe? I've been encountering a surprisingly high number of Unicode thin space characters, and I'm wondering if there's some chatbot that tends to use them in its output, because I don't know of any common keyboard layouts that let you type them (aside from custom layouts like the one I use, but it seems vanishingly unlikely that some random user with 2 edits is using one of those). (See the detection sketch after this comment.)
Another one: excessive use of parentheses any time a term with an acronym shows up, even if the acronym in the parentheses is never used again in the article. Sometimes it even does it twice: Draft:Saetbyol-4. —pythoncoder (talk | contribs) 19:15, 8 July 2025 (UTC)
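(On the thin-space tell above: one rough way to enumerate every unusual whitespace or invisible format character in a chunk of text is sketched below. Illustrative only, with deliberately broad coverage rather than anything tuned to a particular chatbot.)
<syntaxhighlight lang="python">
import unicodedata

def flag_odd_characters(text: str):
    """List non-ASCII whitespace and invisible format (Cf) characters,
    which rarely come from an ordinary keyboard layout."""
    return sorted({
        "U+{:04X} {}".format(ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in text
        if ord(ch) > 0x7F and (ch.isspace() or unicodedata.category(ch) == "Cf")
    })

print(flag_odd_characters("25\u2009km and a zero-width\u200bspace"))
# ['U+2009 THIN SPACE', 'U+200B ZERO WIDTH SPACE']
</syntaxhighlight>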
ChatGPT likes to generate malformed AfC templates (which breaks the submission and automatically creates a broken Decline template).
An example of this:
{{Draft topics|biography|south-asia}}
:{{AfC topic|other}}
:{{AfC submission|||ts=20250708193354|u=RsnirobKhan|ns=2}}
:{{AFC submission|d|ts=2025-06-07T00:00:00Z}}
:{{AFC submission|d|ts=19:32, 8 July 2025 (UTC)}}
qcne (talk) 19:40, 8 July 2025 (UTC)
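(For what it's worth, the broken declines above share a mechanical symptom: the |ts= parameter appears to expect a 14-digit YYYYMMDDHHMMSS timestamp, as in the one well-formed line, while the failing lines carry an ISO string or a signature-style date. A rough, purely illustrative sketch for flagging that; the function name is mine.)
<syntaxhighlight lang="python">
import re

TS_OK = re.compile(r"^\d{14}$")  # MediaWiki-style timestamp, e.g. 20250708193354

def malformed_afc_timestamps(wikitext: str):
    """Return |ts= values in AfC submission templates that are not 14-digit timestamps."""
    found = re.findall(r"\{\{A[fF]C submission[^{}]*?\|ts=([^|{}]*)", wikitext)
    return [ts.strip() for ts in found if not TS_OK.match(ts.strip())]

sample = "{{AFC submission|d|ts=2025-06-07T00:00:00Z}} {{AfC submission|||ts=20250708193354|u=Example|ns=2}}"
print(malformed_afc_timestamps(sample))  # ['2025-06-07T00:00:00Z']
</syntaxhighlight>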
Possible disruptive LLM usage by User:Pseudopolybius
I'm not sure if this is the right place to report this kind of thing.
@Ca I've added a couple of examples I've come across in my AfC work. A thought: the drafts linked as examples will be deleted under G13 in six months; should we take a copy as a subpage under this project? qcne (talk) 15:32, 12 July 2025 (UTC)
I just found what appears to be an LLM-falsified reference, which came to my attention because it raised the citation error "Check |isbn= value: checksum", added in Special:Diff/1298078281. Searching shows some 300 instances of this error string; it may be worth checking whether others are equally bogus. —David Eppstein (talk) 06:43, 6 July 2025 (UTC)
The following is a closed discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. Editors desiring to contest the closing decision should consider a move review after discussing it on the closer's talk page. No further edits should be made to this discussion.
– The word "Catchphrases" insinuates that the page contains specific phrases or words that can catch AI writing. That was true at the essay's inception, but it is no longer true in its current form; the entries are too broad and wide-reaching to fit that definition. Ca talk to me! 13:11, 10 July 2025 (UTC)
Support as the page also lists punctuation and broken formatting. The current title presumably intends catchphrase as "a signature phrase spoken regularly by an individual", though, rather than "a phrase with which to catch someone". Belbury (talk) 16:01, 10 July 2025 (UTC)
Support. I'm glad to see this essay graduate from the development stage. I have a weak preference for "LLM" in the title, as it would be more specific than "AI". — Newslinger talk 17:29, 10 July 2025 (UTC)
Support - I don't know that we need to specify "LLM", since "AI writing" is effectively synonymous with LLM writing and probably more recognizable to editors who are not familiar with technical terminology surrounding generative AI. - ZLEA T\C 20:24, 10 July 2025 (UTC)
I hate to be contrarian, because obviously moving the page is correct, but I am opposing over the "AI" vs "LLM" split. While referring to these tools as AI is indeed commonplace in journalism, scholarly sources tend to refer to generative tools by the underlying technology,[1][2][3] meaning that in a technical discussion of their behavior it is perhaps better to use the latter term.
This next point has less to do with any Wikipedian rationale, but I want to point out that we are unfortunately colluding with the marketing of these things by referring to them with such a high-prestige term. People come to this site every day and in good faith make use of LLMs on the understanding that they are intelligent and potentially smarter than them, when they are not. The language we use on the site should reflect the fact that we treat these things as tools, and agree with the scholarly (and Wikipedian) consensus that they are generally unreliable when not deeply scrutinized.
Obviously the fate of the universe doesn't rest on the name of this one Wikipedia page. I just want everyone who feels apathetic about the name change to understand the subtext: we are deviating from academic terminology and replacing it with a trendier term born out of a speculative market, one which may in time come to be widely seen as inaccurate. Altoids0 (talk) 04:24, 12 July 2025 (UTC)
[1] "Pay Less Attention to Deceptive Artifacts: Robust Detection of Compressed Deepfakes on Online Social Networks". arXiv. doi:10.48550/arXiv.2506.20548.
[2] "LLM-based NLG Evaluation: Current Status and Challenges". Computational Linguistics. doi:10.1162/coli_a_00561.
[3] "A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions". Computational Linguistics. doi:10.1162/coli_a_00549.
The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.