This is an archive of past discussions with User:Citation bot. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
The bot is not a human being, it cannot always deal with rubbish. Technically, DNS is not case sensitive, so it is not a bug. I guess we can add a check for "starts with www". AManWithNoPlan (talk) 14:08, 9 August 2018 (UTC)
$data='rubbish';
$data = $bot->process($data);
// $data now contains 'new rubbish'
Adding citeseerx url where citeseerx parameter exists
In edit one the bot added citeseerx. In edit two it added the url to the citeseerx paper. This seems unnecessary, and it seems weird that a second run directly after the first one adds "more", rather than everything being added during the first run.
GIGO it is. That’s the crossref ISBN. https://api.crossref.org/v1/works/http://dx.doi.org/10.1145/1358628.1358871 Someone changed the ISBN 10 to a 13 and forgot that the check digit often changes. It certainly is never X! Since ISBN is a parity check and not an ECC type check we have no way of knowing what the error is. Also a few books have been assigned invalid ISBNs by publishers over the years. That’s minor compared to using the same ISBN for multiple books, which is one reason there is no {{cite ISBN}}. AManWithNoPlan (talk) 13:41, 16 August 2018 (UTC)
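To illustrate why the check digit changes, here is a minimal Python sketch of an ISBN-10 to ISBN-13 conversion. It is not the bot's code; the function name `isbn10_to_isbn13` is hypothetical, and it assumes the standard 978 prefix and the usual alternating 1/3 weighting for ISBN-13.

```python
def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert an ISBN-10 to a bare 13-digit ISBN, recomputing the check digit."""
    # Keep only digits and a possible X check character; drop spaces and hyphens.
    chars = [c for c in isbn10.upper() if c.isdigit() or c == "X"]
    if len(chars) != 10:
        raise ValueError("expected 10 ISBN characters")
    # Discard the old check character and prepend the 978 prefix.
    body = "978" + "".join(chars[:9])
    if not body.isdigit():
        raise ValueError("X is only valid as the ISBN-10 check character")
    # ISBN-13 weights alternate 1, 3, 1, 3, ... over the first 12 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(body))
    check = (10 - total % 10) % 10  # modulo 10, so always 0-9 and never X
    return body + str(check)
```

Simply prefixing 978 to the old ten characters keeps the ISBN-10 check character, which is how an invalid 13-digit ISBN (even one ending in X) can end up in a database.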
The bot added | doi = 10.4244/ | doi-broken-date = 2018-08-16 in this edit. Seems wrong. Shouldn't we validate DOIs before adding them, to avoid garbage? (t) Josve05a (c)07:41, 16 August 2018 (UTC)
Personally, I prefer the garbage addition in cases like this. It makes humans look for the real, non-garbage doi. Headbomb {t · c · p · b}12:11, 16 August 2018 (UTC)
I should note several things. That is a standards compliant DOI: a suffix of zero length is allowed. That is the DOI according to pubmed. It is clearly rubbish though. There has been a long discussion on this in the past and generally people seem to like dead DOIs since often you can google the string and find them. Although perhaps an empty suffix is pretty useless. AManWithNoPlan (talk) 13:16, 16 August 2018 (UTC)
Looks like the problem is that we've exhausted our 5000 queries for the day. I'll request a second key for testing, which may go some way to help. Martin(Smith609 – Talk)05:57, 30 July 2018 (UTC)
Expand citation:
- Checking AdsAbs database
x PHP_ADSABSAPIKEY environment variable not set. Cannot query AdsAbs. [..> yadsabs]: no record retrieved. [..> rossref]
- Checking CrossRef database for doi. [..> indpmid]
- Searching PubMed... nothing found.
Possibly related, there seems to be something weird with that log beyond the key not being set.
..> yadsabs]: no record retrieved. [..> Crossref] (stray y? Missing C?). Headbomb {t · c · p · b}19:49, 14 August 2018 (UTC)
It also seems to not edit the majority of articles it could edit. I'll investigate further, but it seems running the bot on individual articles in a category yields more edits. BTW, the new API for the single page run is beautiful. Headbomb {t · c · p · b}20:56, 17 August 2018 (UTC)
I had them change the "Expand citations" link off to the left side to that mode a while back. At least a "&slow=1" option should probably be added. AManWithNoPlan (talk) 21:09, 17 August 2018 (UTC)
I shall take the fact that your complaint is "The formatting of the logfile offends my sensibilities" as a compliment to the current state of the bot. AManWithNoPlan (talk) 21:11, 17 August 2018 (UTC)
Just following this discussion, when I run it like this it only analyzes 1 page in the category. Retrying it makes it run on one more article etc etc. Redalert2fan (talk) 19:47, 18 August 2018 (UTC)
I'm not sure that I ever envisaged this page being used by actual humans! Glad that it's coming in useful. As a treat, it is now in glorious technicolour (-: Martin(Smith609 – Talk)14:21, 21 August 2018 (UTC)
category output is ugly and generally useless
Category/Slow modes now fixed and working like a charm, but API is still outputting pretty unreadable crap. Headbomb {t · c · p · b}15:57, 20 August 2018 (UTC)
The bot adds |class= to cite journal. |class= is a parameter that's only useful in {{cite arxiv}} (and possibly {{citation}}, although that's bad practice).
What should happen
The bot should not add |class= in {{cite journal}} or others, and should remove it when encountered. It should only add it to {{cite arxiv}}, and only keep it in {{cite arxiv}} and {{citation}}. But if there's a |doi= in {{citation}}, remove |class=.
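The requested rule can be written down as a small predicate. This is only a Python sketch of the behaviour asked for above, not the bot's implementation; `keep_class_parameter` and its dict-based parameter representation are hypothetical.

```python
def keep_class_parameter(template: str, params: dict) -> bool:
    """Return True if an existing |class= should be kept in this template."""
    name = template.strip().lower()
    if name == "cite arxiv":
        # |class= is meaningful here, so it is always kept.
        return True
    if name == "citation":
        # Tolerated in {{citation}}, but dropped once a |doi= is present.
        return not params.get("doi", "").strip()
    # {{cite journal}} and all other templates: remove it.
    return False
```

Adding |class= would be stricter still: per the request, only {{cite arxiv}} qualifies.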
Italic markup should be removed from 'book' citations (|work= or aliases not set) but a cs1|2 template with a wikilinked |title= and without |url= is perfectly legitimate; there is an article Modern Chess Openings so:
is a correctly formed citation. This form is supported by the cs1|2 documentation at Template:Citation#Title which read.
Do not misunderstand my point here as a contradiction of what I wrote elsewhere. When the whole title is wikilinked, that is acceptable. But, when the title looks like this one from the other discussion, wikilinking is inappropriate:
|title=A definitive abelisaurid theropod dinosaur from the early Late Cretaceous of [[Patagonia]]
These links are almost always wrong. They are things like title=[[Trump]] [[Revealed]]: The Definitive Biography of the 45th [[President]]. AManWithNoPlan (talk) 18:01, 19 August 2018 (UTC)
That would link the entire content of the |title=. In this case that is perhaps an acceptable solution, but not in all. (t) Josve05a (c)01:06, 20 August 2018 (UTC)
There is no reason to change these parameters from one legitimate form to another legitimate form except to unify the form within the template. This applies to all multi word enumerated parameters: |author-mask6=, |interviewer5-link=, etc
this is because the citation templates have sooo many parameter choices. This pull now will add all of them and will also add a check to make sure that we notice any new ones. AManWithNoPlan (talk) 17:56, 19 August 2018 (UTC)
It is probably of little consequence, but this bot is changing ISBN numbers from the number given in the actual frontispiece of the books concerned (or at least the books I have quoted in articles). For example, the ISBN number given in "Verticordia, the turner of hearts" in Verticordia subg. Verticordia is 1 876268 46 8, but the bot has changed it to 978-1-876268-46-6. (The former number is used in more than 100 Verticordia articles.) Similarly it has changed 0 646 402439 to 978-0646402437 in Melaleuca shiressii. Both numbers seem to work but I wonder what the purpose of the change is. No big deal - just curious. Gderrin (talk) 02:52, 20 August 2018 (UTC)
I don't really understand why the bot would remove parameters like that there. Not only is it cosmetic, the edit doesn't make sense. --Izno (talk) 20:56, 19 August 2018 (UTC)
It is done to discourage the use of the generic and often misused |work=. In almost all cases, the |journal=, |website=, etc. are better choices. Also, in this case |publisher= is already set to the wrong thing, it should use |website= instead, which is an alias for |work=. AManWithNoPlan (talk) 21:17, 19 August 2018 (UTC)
Here is the improvement. https://github.com/ms609/citation-bot/pull/614 If an alias of |work= is filled in, the empty |work= will be removed to discourage future adding of it, which would be an error. If an alias of |work= is not set, then it will be changed to a template specific parameter if relevant: for example in {{cite journal}} the empty |work= will be changed to an empty |journal=. AManWithNoPlan (talk) 21:37, 19 August 2018 (UTC)
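The behaviour described for that pull request can be sketched roughly as follows. This is a Python illustration rather than the bot's PHP; the function name, the alias tuple and the template-to-parameter mapping are all assumptions made for the example (only |journal= and |website= are named in the discussion).

```python
# Assumed, incomplete list of |work= aliases, for illustration only.
WORK_ALIASES = ("journal", "website", "newspaper", "magazine", "periodical")
# Assumed mapping from template name to its preferred alias.
TEMPLATE_SPECIFIC = {"cite journal": "journal", "cite web": "website"}

def tidy_empty_work(template: str, params: dict) -> dict:
    """Return a copy of params with an empty |work= removed or renamed."""
    params = dict(params)
    if "work" not in params or params["work"].strip():
        return params  # nothing to do: |work= is absent or actually used
    if any(params.get(alias, "").strip() for alias in WORK_ALIASES):
        # An alias is already filled in, so the empty |work= only invites errors.
        del params["work"]
        return params
    specific = TEMPLATE_SPECIFIC.get(template.strip().lower())
    if specific:
        # Swap the empty |work= for the empty template-specific parameter.
        del params["work"]
        params.setdefault(specific, "")
    return params
```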
WP:COSMETICBOT compliance is good for large-scale runs, but since the bot is user activated, it's not that big a deal if the bot does minor cleanup like that on select, user-requested pages. There's tons of cosmetic things (e.g. |page=→|pages=), which would in theory be nice to categorize as cosmetic, but this realistically would only be an issue if you run the bot on categories. So maybe in 'category mode', it should skip the cosmetic stuff. Seems like a very high time cost for little payoff at this time, but it would matter if the bot started to edit on its own. Headbomb {t · c · p · b}00:34, 20 August 2018 (UTC)
I'm pretty sure the bot used to mention who activated it at some point. This will be particularly important when the Category api will be invoked. Headbomb {t · c · p · b}12:49, 21 August 2018 (UTC)
Actually not that simple. Search needs to be non greedy. Would need to have some type of bogus parameter set to either completely revert at end or just remove the specific flag. Lots of testing, and that’s the pain. AManWithNoPlan (talk) 04:54, 20 August 2018 (UTC)
Point is, this wouldn't be an epic 3-months long development process. Yes some thought needs to be put into it, but the codebase for recognizing stuff from URLs is relatively mature. If that gets a hit, whatever you're going to have will beat the raw url. And it'll save doing [9] before running the bot to get [10]. Headbomb {t · c · p · b}04:58, 20 August 2018 (UTC)
Lying in bed, it came to me:
search for and change to cite web, but with extra CITATION_BOT parameter that is encrypted url
do normal bot stuff
when writing out, look for the CITATION_BOT flag; if no title is set then just decrypt the url and echo that. If a title is set, remove the special flag.
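The round trip described in those three steps could look something like the Python sketch below. It is an illustration under stated assumptions, not the bot's code: the flag name `CITATION_BOT_PLACEHOLDER`, both function names and the dict representation of a template are hypothetical, and base64 stands in for the "encrypted" URL (it is an encoding, not encryption).

```python
import base64

FLAG = "CITATION_BOT_PLACEHOLDER"  # hypothetical name for the special parameter

def wrap_raw_url(url: str) -> dict:
    """Step 1: turn a raw URL into a provisional cite web carrying the flag."""
    token = base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")
    return {"name": "cite web", "params": {"url": url, FLAG: token}}

def write_out(template: dict) -> str:
    """Step 3: emit the template, or fall back to the original raw URL."""
    params = dict(template["params"])
    token = params.pop(FLAG, None)  # the flag never reaches the page
    if token is not None and not params.get("title", "").strip():
        # Expansion found nothing useful: restore the raw URL unchanged.
        return base64.urlsafe_b64decode(token.encode("ascii")).decode("utf-8")
    body = "".join(f" |{key}={value}" for key, value in params.items())
    return "{{" + template["name"] + body + "}}"
```

Step 2 ("do normal bot stuff") is whatever expansion runs between the two calls; if it sets a title, the provisional template is kept and only the flag is dropped.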
No opinion on the technical side of things, but that's pretty much what I suggested above, so I'm all for it. Headbomb {t · c · p · b}19:35, 21 August 2018 (UTC)
Character escaping seems off, my understanding is that '\>' could be '>'. That or all the other '<' or '>' need to be escaped. Headbomb {t · c · p · b}19:43, 21 August 2018 (UTC)
This works, so, sooooooooooo well it's insane. Whenever it fails, it's because there's an actual problem with the url/identifiers. This is amazing. Headbomb {t · c · p · b}14:18, 22 August 2018 (UTC)
Deleted google books URL listed even though nothing is deleted
I've noticed this a few times now; there's an example visible at
However, it would be very useful to have the bot run on the main pages associated with those talk pages in the category. I.e. take all pages in Category:Draft-Class Astronomy articles, convert to main pages, and run the bot on those. Perhaps via something like
Possible solution [Suggestion 1]: modify function category_members in WikiFunctions.php so that it removes the namespace from all pages that it queues up to visit. I can't think of many situations when one would want to run the bot on pages outside the main namespace, and it could be disruptive if someone deliberately included an incorrectly formatted citation in a discussion.
Alternative solution [Suggestion 2]: Add the main namespace equivalent of each Talk page to the array of pages to be visited.
@Smith609: that's not very useful though (Mainspace articles associated with draft talk? What'd be the use of that?). What would be useful is if it visited the [[Foobar:<...>]] pages associated with [[Foobar talk:<...>]]. Headbomb {t · c · p · b}11:17, 22 August 2018 (UTC)
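The mapping Headbomb describes (a talk page to its own subject page, rather than always to mainspace) can be sketched in a few lines of Python. The function name `subject_page` is hypothetical and this is not what category_members does; it also guesses at namespaces from the title text, whereas real code would check against the wiki's actual namespace list.

```python
def subject_page(title: str) -> str:
    """Map 'Talk:X' to 'X' and 'Foobar talk:X' to 'Foobar:X'; leave others alone."""
    namespace, sep, rest = title.partition(":")
    if not sep:
        return title  # no prefix at all: already a main namespace title
    ns = namespace.strip()
    if ns.lower() == "talk":
        return rest  # the main namespace has no prefix
    if ns.lower().endswith(" talk"):
        return ns[: -len(" talk")] + ":" + rest
    return title  # a non-talk prefix, or a colon inside an article title
```

An article title whose text before the colon happens to end in " talk" would be misread by this heuristic, which is why a namespace lookup is the safer choice.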
This was activated via https://tools.wmflabs.org/citations/doibot.php?edit=toolbar&slow=1&user=Headbomb&cat=Particle_physics%20stubs
And the edit summary should reflect this. Possibly even deny category runs without a &user= value specified. Headbomb {t · c · p · b}01:22, 23 August 2018 (UTC)
Category.php will now show a note where the username is invalid or not specified.
As you're making more use of it I'll add a user-friendly interface to doibot.html in the future, to save manually editing URLs... Martin(Smith609 – Talk)14:27, 24 August 2018 (UTC)
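For the stricter option floated above (refusing a category run when no &user= is given, rather than only showing a note), a check on the activation URL might look like this Python sketch. The function name is hypothetical; the `cat` and `user` query parameters are taken from the URL quoted in this thread, and nothing here validates that the user name is real.

```python
from urllib.parse import urlsplit, parse_qs

def category_run_allowed(url: str) -> bool:
    """Allow a category run only if a non-empty user= value accompanies cat=."""
    query = parse_qs(urlsplit(url).query)  # blank values are dropped by default
    has_category = bool(query.get("cat"))
    user = query.get("user", [""])[0].strip()
    # Runs that are not category runs are unaffected by this check.
    return (not has_category) or bool(user)
```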
Interesting, although I don't know that it's very user-friendly to do that. Couldn't easily run it on say, Book:Canada or User:Headbomb/Sandbox4. And I'm not sure it could handle having 2000 articles shoved in as article1|article2|...|article1999|article2000. Headbomb {t · c · p · b}20:58, 24 August 2018 (UTC)
I have a similar one, the Dutch "en", which means "and", should not be capitalised to "En" either, as done here at the bottom:[22]FunkMonk (talk) 04:13, 24 August 2018 (UTC)
This is more a question than a bug report, is it intentional that all words that are normally not capitalised, such as "the, as, in, an, of" etc., in journal names are now capitalised? And now it seems another bot is changing some of this back? FunkMonk (talk) 16:14, 23 August 2018 (UTC)
Journals should use title case, so if something doesn't use title case, that would be an issue. Converting to title case is fine and encouraged. Note that there was a bug in Citation bot that capitalized journals by mistake for a little while, that's fixed now. Headbomb {t · c · p · b}16:18, 23 August 2018 (UTC)
[24] fix by JCW-CleanerBot (running Citation bot on the page did nothing)
To be clear, I tried running Citation bot on this today, and it failed to update the caps. So I did it via JCW-CleanerBot instead. It just so happened that Citation bot made the last edit before JCW-CleanerBot. Headbomb {t · c · p · b}23:57, 23 August 2018 (UTC)
The gadget API, where you are editing the file, works right. Also, if there had been more to do on the page then it would have done the modifications. AManWithNoPlan (talk) 01:56, 24 August 2018 (UTC)
*
{{cite arxiv|last=Meyertholen|first=Andrew|last2=Di Ventra|first2=Massimiliano|date=2013-05-31|title=Quantum Analogies in Ionic Transport Through Nanopores|eprint=1305.7450|class=cond-mat.mes-hall}}
switched to
{{Cite journal|last=Meyertholen|first=Andrew|last2=Di Ventra|first2=Massimiliano|date=2013-05-31|title=Quantum Analogies in Ionic Transport Through Nanopores|arxiv=1305.7450|bibcode=2013arXiv1305.7450M}}
bibcodes typically denote journals. The bot now (PR 647) won't assume that a bibcode denotes a journal, if the bibcode contains the string "arxiv". Are there any other cases that we should watch out for? Martin(Smith609 – Talk)12:00, 24 August 2018 (UTC)
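The check described for PR 647 amounts to a one-line test. The Python sketch below is an illustration, not the bot's code: the function name is hypothetical, and matching case-insensitively is an assumption (the bibcode in the example above spells it "arXiv").

```python
def bibcode_implies_journal(bibcode: str) -> bool:
    """Return False for bibcodes that look like arXiv preprints."""
    # Assumption: compare case-insensitively so "arXiv" and "arxiv" both match.
    return "arxiv" not in bibcode.lower()
```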
#### being years. That should cover it. You could simplify it to ####hep, ####math and ####nucl for those with sub-arxivs. Actually no, that might lead to some collisions. Headbomb {t · c · p · b}12:09, 24 August 2018 (UTC)
Citation bot speeds through a category as fast as it can
What should happen
Throttle edits to 6 EPM
We can't proceed until
Feedback from maintainers
WP:BOTPOL suggests that rate for non-critical tasks. While I'm not a stickler for rules, the potential for damage is relatively high, especially in several WP:BEANS scenario (it just edited at a rate of 25 EPM on a recent run!). So implementing a per-activation throttle would be best. Headbomb {t · c · p · b}12:34, 24 August 2018 (UTC)
I've coded a throttle, but not tested it; I'd be grateful if you could keep an eye out and see whether you notice throttling in action! Martin(Smith609 – Talk)17:44, 24 August 2018 (UTC)
@Smith609: seems to work. I unleashed it on Category: CS1 maint: PMC format which had 15 very easy edits to make, and it cleared it at 6 EPM. The category API doesn't update during the run, but you do get the results after the run. Not sure if the throttle is 'smart' (edits at 6/min) or 'dumb' (processes at 6/min), but it's working. Headbomb {t · c · p · b}18:13, 24 August 2018 (UTC)
Great. The script now keeps track of when it last edited, and makes sure that this was at least 10 seconds ago, which probably makes it 'semi-smart' (as if it spent the first 50 seconds of a minute without making an edit, it could squeeze five into the last ten seconds!) Martin(Smith609 – Talk)06:38, 25 August 2018 (UTC)
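A throttle of the kind described (remember the time of the last edit, and wait until at least 10 seconds have passed) can be sketched in Python as below. This is not the bot's PHP; the class and method names are hypothetical, and the clock and sleep functions are injectable only so the behaviour can be exercised without real delays.

```python
import time

class EditThrottle:
    """Enforce a minimum gap between consecutive edits."""

    def __init__(self, min_interval=10.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # 10 s spacing caps the rate at 6 EPM
        self._clock = clock
        self._sleep = sleep
        self._last_edit = None  # no edit made yet

    def wait_for_turn(self) -> float:
        """Block until an edit is allowed; return the number of seconds waited."""
        waited = 0.0
        if self._last_edit is not None:
            elapsed = self._clock() - self._last_edit
            waited = max(0.0, self.min_interval - elapsed)
            if waited > 0:
                self._sleep(waited)
        # Record the moment the edit is released.
        self._last_edit = self._clock()
        return waited
```

Calling `wait_for_turn()` immediately before each save throttles edits rather than page processing, so pages that need no changes do not slow the run down.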
If a citation has a doi and a URL with a known DOI, the bot removes the URL. However, it does not remove |archive-url= (or |archiveurl=) and |archive-date= (or |archivedate=).
What should happen
If it removes |url=, also remove |archive-url= (or |archiveurl=) and |archive-date= (or |archivedate=).
In general |access-date= / |archive-date= / |archive-url= / |dead-url= / |format= / |registration= / |subscription= / |url-access= / |via= can all be omitted if there is no url. |format= is tricky though, since it's abused for a lot of things that should be in |type= instead. Headbomb {t · c · p · b}13:13, 22 August 2018 (UTC)
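As a rough Python sketch of that cleanup (hypothetical function name, dict-based parameters, not the bot's code): once there is no |url=, the URL-dependent parameters named above are dropped. |format= is deliberately left out of the list because, as noted, it is often misused for things that belong in |type=.

```python
# Parameters from the discussion above that only make sense alongside |url=.
# |format= is intentionally excluded: it needs human judgement.
URL_DEPENDENT = (
    "access-date", "archive-url", "archiveurl", "archive-date", "archivedate",
    "dead-url", "registration", "subscription", "url-access", "via",
)

def drop_url_dependents(params: dict) -> dict:
    """Return a copy of params without URL-only parameters when |url= is absent."""
    if params.get("url", "").strip():
        return dict(params)  # a URL is present, so everything stays
    return {key: value for key, value in params.items() if key not in URL_DEPENDENT}
```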
{{Cite journal |last1=Levasseur |first1=David G. |last2=Sawyer |first2=J. Kanan |date=August 19, 2006 |title=Pedagogy Meets PowerPoint: A Research Review of the Effects of Computer-Generated Slides in the Classroom |url=https://www.tandfonline.com/doi/full/10.1080/15358590600763383 |url-access=subscription <!-- but archive is ungated --> |journal=Review of Communication |issn=1535-8593 |publisher=Taylor and Francis |volume=6 |issue=1–2 |pages=101–123 |doi=10.1080/15358590600763383 |archive-url=https://www.webcitation.org/6YM4kjvL0?url=http://www.tandfonline.com/doi/full/10.1080/15358590600763383 |dead-url=no |archive-date=May 7, 2015 |access-date=September 23, 2017 |quote= [quotation redacted]}}
the bot produced this:
{{Cite journal |last1=Levasseur |first1=David G. |last2=Sawyer |first2=J. Kanan |date=August 19, 2006 |title=Pedagogy Meets PowerPoint: A Research Review of the Effects of Computer-Generated Slides in the Classroom |url-access=subscription <!-- but archive is ungated --> |journal=Review of Communication |issn=1535-8593 |volume=6 |issue=1–2 |pages=101–123 |doi=10.1080/15358590600763383 |dead-url=no |quote= [quotation redacted]}}
Levasseur, David G.; Sawyer, J. Kanan (August 19, 2006). "Pedagogy Meets PowerPoint: A Research Review of the Effects of Computer-Generated Slides in the Classroom". Review of Communication. 6 (1–2): 101–123. doi:10.1080/15358590600763383. ISSN 1535-8593. [quotation redacted] {{cite journal}}: |url-access= requires |url= (help); Unknown parameter |dead-url= ignored (|url-status= suggested) (help)
The bot replaced {{Cite web |url=http://www.jstor.org/stable/3744263 |website=Agricultural History}} with {{Cite journal |jstor=3744263 |journal=Agricultural History |website=Agricultural History}}
UTF-8 encoding in JSTOR data not taken into account
The bot added this from JSTOR. It looks weird, and the |first1= doesn't have the ; needed to display the Ó character (which should be used directly instead of the HTML code).
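Decoding entities before writing the value out would give the literal character asked for here. A minimal Python sketch using the standard library follows; the wrapper name `decode_entities` is hypothetical, and this says nothing about how the bot actually handles JSTOR data.

```python
import html

def decode_entities(value: str) -> str:
    """Replace HTML character references in a metadata value with the characters."""
    # html.unescape handles named (&Oacute;) and numeric (&#211;) references.
    return html.unescape(value)
```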
When you put stuff in your own files such as User:Josve05a/citations.js, instead of just turning on the citation bot option, you get what you get and you don't throw a fit. AManWithNoPlan (talk) 18:03, 26 August 2018 (UTC)
I'd even forgotten I did that. It was because the "official script" was broken for a few weeks/months back in 2014, if I remember correctly. It had been working ever since, so something recently changed. I didn't (mean to) throw a fit; I just wanted to draw attention to the fact that the "old way" the script worked just broke. (t) Josve05a (c)18:06, 26 August 2018 (UTC)
I have a copy of it for the dev version, so I had to fix that myself too. "Don't throw a fit" is an American phrase used to warn kids in advance that complaints will not be listened to (my use of it was mostly in jest). Here are some links to the official version AManWithNoPlan (talk) 18:11, 26 August 2018 (UTC)
I hope you understand I only want to help out (with my limited knowledge) by reporting issues which are affecting me or which I'm noticing, and not trying to complain. (t) Josve05a (c)18:22, 26 August 2018 (UTC)
The Smith scripts are not really updated anymore since we got official support from media wiki. He should probably remove them actually. AManWithNoPlan (talk) 18:12, 26 August 2018 (UTC)
Thanks for the report. I've updated the Smith script, and will keep an eye out for other outdated links (I couldn't turn any others up by Google). A redirect is probably a good idea too. Martin(Smith609 – Talk)08:43, 27 August 2018 (UTC)
Please forget all variants of |publisher=Books.google.com from {{cite book}} (such as |publisher=, |work=, |website=, but also |foo=Google Books, |foo=Google, |foo=google.com etc.). (t) Josve05a (c)19:28, 22 August 2018 (UTC)
It is, but it's ... a relatively spammy practice. I remove it, but I don't think it would be appropriate for the bot to remove/add it.Headbomb {t · c · p · b}20:08, 26 August 2018 (UTC)
This would prevent this sort of issue from happening [35] See line right above and including |magazine=[[Popular Astronomy (US magazine)|Popular Astronomy]]
When you add |doi= from an apparent JSTOR doi in |jstor=, first check if it is broken before adding it to |doi=. JSTOR assigns internal DOIs all the time without registering them.
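The request amounts to "build the candidate, verify it, and only then add it". Here is a Python sketch under stated assumptions: the function name is hypothetical, the `doi_resolves` callback stands in for whatever liveness check is used (no real lookup is performed here), and forming the candidate as 10.2307/ plus the JSTOR id is an assumption about how the apparent DOI is derived.

```python
from typing import Callable, Optional

def doi_from_jstor(jstor_value: str, doi_resolves: Callable[[str], bool]) -> Optional[str]:
    """Return a DOI derived from |jstor= only if it is confirmed to resolve."""
    value = jstor_value.strip()
    # Assumption: a bare JSTOR id maps onto the JSTOR-owned 10.2307 prefix.
    candidate = value if value.startswith("10.") else "10.2307/" + value
    # JSTOR may never have registered the DOI, so verify before adding it.
    return candidate if doi_resolves(candidate) else None
```

Returning `None` leaves |doi= unset instead of adding a DOI that would immediately need a |doi-broken-date=.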
> Consult APIs to expand templates
> Using pubmed API to retrieve publication details:
> Found match for pubmed identifier 11090370
> Found match for pubmed identifier 29262068
> Found match for pubmed identifier 21324708
> Found match for pubmed identifier 14718418
> Found match for pubmed identifier 26185361
> Found match for pubmed identifier 18023732
> Found match for pubmed identifier 11472968
> Found match for pubmed identifier 18032698
> Found match for pubmed identifier 28144783
> Found match for pubmed identifier 6370120
> Found match for pubmed identifier 30069044
> Found match for pubmed identifier 30069046
+ Adding volume: 560
+ Adding issue: 7718
+ Adding pmid: 30069046
> Checking PMID 30069046for more details
+ Adding doi: 10.1038/s41586-018-0394-6
+ Adding pmc: 6108322
+ Adding journal: Nature
- Dropping parameter "publisher"
- Dropping parameter "location"
> Found match for pubmed identifier 25613900
> Found match for pubmed identifier 25169055
> Found match for pubmed identifier 24812003
> Found match for pubmed identifier 21356587
> Found match for pubmed identifier 19223979
> Found match for pubmed identifier 22323207
> Found match for pubmed identifier 23378277
> Checking that DOI 10.1038/s41586-018-0394-6 is operational... DOI ok.
> Checking that DOI 10.2214/ajr.175.6.1751537 is operational... DOI ok.
> Checking that DOI 10.1016/j.ejcts.2010.12.028 is operational... DOI
@AManWithNoPlan: Was caused by an invisible NBSP, which WP:WikEd exposed when I edited the page. Removing it fixed [39] the cause of the issue locally, but [40] seems to fix the issue being triggered in the first place everywhere. Headbomb {t · c · p · b}11:54, 27 August 2018 (UTC)
I have always felt this would be good idea, but J Food is probably not Journal Food, but journal of Food or the journal of food. Every journal would be a special case. AManWithNoPlan (talk) 13:19, 27 August 2018 (UTC)
If the J. is at the end of a |journal= and the word Journal (or magazine) is not present, then it sounds to me like a good bet that it should be replaced with |journal=Foo Journal. (t) Josve05a (c)13:26, 27 August 2018 (UTC)
Very, very bad idea, per WP:CONTEXTBOT, and per lack of consensus. If you want this to be done on a specific page, delete the abbreviations and run the bot again. Headbomb {t · c · p · b}13:34, 27 August 2018 (UTC)
That’s what I do. Many people would consider this upgrade to be ‘worse than vandalism’. I consider them wrong, but there would be blood everywhere AManWithNoPlan (talk) 13:52, 27 August 2018 (UTC)
(We could start a (small) list, and maintain it, adding new journals one by one...but that's way too much work) (t) Josve05a (c)13:53, 27 August 2018 (UTC)
The list of journals is huuuge, and you may run afoul of WP:CITEVAR by turning consistently abbreviated journals in an article into a mish-mash of abbreviated and non-abbreviated journals in the same article. But if you get consensus for something like this, this would be better addressed by a different, possibly new bot. Headbomb {t · c · p · b}14:10, 27 August 2018 (UTC)
There is no consensus for messing around with valid abbreviations. Fixing caps / title case is fine, but converting abbreviations to non-abbreviations is not, at least not without a strong consensus to do so.Headbomb {t · c · p · b}13:31, 27 August 2018 (UTC)
That has actually been debated and the belief at the time was that DOI was better than JSTOR and if they were exactly the same, then only list the DOI. This DOI is not owned by JSTOR, so it is not truly stable. JSTOR owns 10.2307 AManWithNoPlan (talk) 23:20, 27 August 2018 (UTC){{notabug}}
Run the bot against a cite journal template with just a bibcode and nothing happens.
What should happen
templates with bibcodes should be expanded to a full citation
Relevant diffs/links
no links because nothing happens
Replication instructions
Test here: . Bibcode:2017A&A...600A.127K. {{cite journal}}: Cite journal requires |journal= (help); Missing or empty |title= (help) This is intermittent (throttling?) and sometimes bibcodes are expanded properly. For example, five minutes ago this bibcode failed during an article edit (RS Puppis) but just now it worked right here.
it has a page number without a - character so the bot assumes a single page number and thus when it gets a range of pages it upgrades to the range. Pages are one of the few things we might blow away and replace. AManWithNoPlan (talk) 04:45, 25 August 2018 (UTC)
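The heuristic described here can be written as a small predicate. This is a Python sketch of the stated rule only (existing value has no dash, incoming value is a range), with a hypothetical function name; treating an en dash the same as a hyphen is an assumption, and the bot's real logic may check more than this.

```python
import re

# Assumption: a hyphen or an en dash marks a page range.
RANGE_MARK = re.compile(r"[-\u2013]")

def should_upgrade_pages(existing: str, incoming: str) -> bool:
    """Return True when a lone page number should be replaced by a page range."""
    existing = existing.strip()
    if not existing or RANGE_MARK.search(existing):
        return False  # nothing to upgrade, or already a range
    return RANGE_MARK.search(incoming) is not None
```

Note that the rule as stated does not require the incoming range to start at the existing page, which is one way a correct single page could be overwritten.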
From memory, previous advice on this page was that if a bot should not populate a parameter (e.g. if an external database generates a false positive), the bot could be deterred by including an empty comment.
Oh dear; I wonder how much dud information has been introduced as a result! Even if there is a better way to deal with false positives, I'm not sure how we might make a transition now that a standard has been set... Martin(Smith609 – Talk)15:20, 21 August 2018 (UTC)