User talk:JL-Bot/Archive 8

This is an archive of past discussions with User:JL-Bot. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.


DOI search box

See [1]. This could be templatified or hardcoded in the bot, up to you. Headbomb {t · c · p · b} 22:37, 9 August 2023 (UTC)

Done. I updated the bot to add it. -- JLaTondre (talk) 23:27, 10 August 2023 (UTC)

Not quite right though... [2] Headbomb {t · c · p · b} 00:00, 11 August 2023 (UTC)

There is no difference in what is displayed? [3] vs [4] -- JLaTondre (talk) 21:01, 11 August 2023 (UTC)

Well at the very least, two open inputbox tags is bad html/code. Headbomb {t · c · p · b} 21:05, 11 August 2023 (UTC)

I missed that. I focused on the extra line break that was added. I will fix. -- JLaTondre (talk) 22:14, 11 August 2023 (UTC)

Fixed. -- JLaTondre (talk) 13:08, 12 August 2023 (UTC)

WP:JCW/BADDOI

Legit DOIs go up to the 60000s now. So the limit should likely be bumped to 70000. Headbomb {t · c · p · b} 22:59, 9 August 2023 (UTC)

The maximum DOI is automatically calculated from the last Crossref pull. Currently, the Crossref pull is done after the dump file is processed. Unfortunately, that means if a new reference in the latest dump includes a DOI above the last limit, it will get flagged as bad. Instead of basing the Crossref pull on the existence of a new dump, I probably should just have it always execute on the 1st and 20th so it will be done before the dump is available. -- JLaTondre (talk) 23:40, 10 August 2023 (UTC)

Yeah probably. Could update all the JL-Bot/DOI stuff on those dates too, which would let me create the newest DOI redirects ahead of the full dump processing. Headbomb {t · c · p · b} 00:02, 11 August 2023 (UTC)

The DOI registrant pull will now run on the 1st and 20th of the month. -- JLaTondre (talk) 13:09, 12 August 2023 (UTC)
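As an illustration of the limit check discussed in this thread, here is a minimal Python sketch. It assumes the registrant pull yields a list of prefixes such as "10.59500"; the function names and the sample prefixes are hypothetical and are not taken from the bot's code.

```python
import re

# Matches the numeric registrant code of a DOI such as "10.46471/gigabyte.78".
DOI_PREFIX = re.compile(r"^10\.(\d{4,9})/")


def max_registrant(prefixes):
    """Return the highest registrant code in a list like ["10.1000", "10.59500"]."""
    return max(int(p.split(".", 1)[1]) for p in prefixes)


def is_suspect(doi, limit):
    """Flag a DOI whose prefix is malformed or above the known maximum."""
    m = DOI_PREFIX.match(doi)
    if m is None:
        return True
    return int(m.group(1)) > limit


# Hypothetical registrant list standing in for the Crossref pull.
known = ["10.1000", "10.59500", "10.65000"]
limit = max_registrant(known)
```

If the registrant list is refreshed before the dump is parsed, a newly registered prefix raises the limit before any citation that uses it is checked.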

Doi prefix

The bot didn't do its DOI prefix run on the 1st btw. Headbomb {t · c · p · b} 10:37, 4 September 2023 (UTC)

crossref.org errored out. I re-ran the job & crossref.org is responding this time. Results should be up in a while. -- JLaTondre (talk) 14:21, 4 September 2023 (UTC)

How to remove orphan tag

How to remove Orphan tag from Blouse (Short Film) ? Rajmama (talk) 13:12, 21 September 2023 (UTC)

Looks like you already figured it out. But, yes, you edit the article and remove the orphan template. -- JLaTondre (talk) 23:54, 21 September 2023 (UTC)

Dec 20 bot run?

I think the bot choked or something? Normally bot runs for WP:JCW have occurred by now... Headbomb {t · c · p · b} 12:39, 24 December 2023 (UTC)

Yes, there was a hiccup on Friday. I fixed it yesterday and it has been processing since. Results are loading now. -- JLaTondre (talk) 13:59, 24 December 2023 (UTC)

Awesome. Hope you had a great festivus! And other upcoming solstice-adjacent holidays. Headbomb {t · c · p · b} 14:33, 24 December 2023 (UTC)

You too. -- JLaTondre (talk) 17:24, 24 December 2023 (UTC)

Possible new task

Could this bot be used to remove old {{ITN note}} transclusions? I found some from 2022 and one from 2021. Schierbecker (talk) 21:17, 18 January 2024 (UTC)

Probably not. The {{under construction}} removal task is based on number of days alone. For removing {{ITN note}}, it should probably be based on whether there is an open nomination vs. a strict number of days. You would be better off asking at Wikipedia:Bot requests. -- JLaTondre (talk) 22:18, 19 January 2024 (UTC)

JCW dump?

I notice the bot hasn't processed the dump yet? Normally it's done within the first 3-4 days of the month, but we're on day 6 now... 142.169.80.39 (talk) 17:53, 6 May 2024 (UTC)

There was an error while running. I have kicked off processing again, but it will take a while to complete. -- JLaTondre (talk) 22:17, 6 May 2024 (UTC)

Featured sounds no longer active

Hi, I think that "featured sounds" went the way of "featured videos", and is no longer being tracked. See: Wikipedia:Featured sounds and Category:Historical featured content. Maybe it should be removed from the code and the documentation, next time that you're updating it? Thanks in advance! --Funandtrvl (talk) 00:37, 30 April 2024 (UTC)

Okay, thanks for the notice. I will update it. -- JLaTondre (talk) 22:48, 2 May 2024 (UTC)

Type content-featured-sounds has been removed. -- JLaTondre (talk) 21:09, 18 May 2024 (UTC)

ITN

Shouldn’t JL-bot, when updating ITN, put the ITN icon next to the article? 48JCL (talk • contribs) 21:58, 20 May 2024 (UTC)

I have added that one. It can be seen at Wikipedia:WikiProject Women in Red/Recognized content. It will show up on other relevant pages with the next run this weekend. -- JLaTondre (talk) 20:16, 22 May 2024 (UTC)

Citations May 20 Output

@Headbomb: The output for the May 20 dump is producing significantly fewer citations than normal. For example, the A's end on page 100 this time when they typically go to 111. I am investigating to see what is going on. -- JLaTondre (talk) 19:52, 22 May 2024 (UTC)

It looks like the enwiki-20240520-pages-articles.xml.bz2 dump file is missing content. It is only 18G where the last one (20240501) was 20G. It usually increases each month so that is an unexpected (and pretty big) decrease. There are no processing errors and no changes in the expected citation templates. -- JLaTondre (talk) 20:13, 22 May 2024 (UTC)

Would 'enwiki-20240520-pages-meta-current.xml.bz2' in https://dumps.wikimedia.org/enwiki/20240520/ be of use? Or would it be similarly crippled? Headbomb {t · c · p · b} 20:18, 22 May 2024 (UTC)

It looks smaller than last month's so probably crippled too. Headbomb {t · c · p · b} 20:19, 22 May 2024 (UTC)

The 20240601 dump is still not complete. It is typically done by this time of the month so seems like there are issues. -- JLaTondre (talk) 14:11, 8 June 2024 (UTC)

Hmm, I did find this announcement. It doesn't explain why the dumps have not been completed, but it sounds like there might be a format change which could impact parsing once it arrives. I use a library for the parsing so I'm not sure if it will be impacted or not. -- JLaTondre (talk) 14:16, 8 June 2024 (UTC)

Well at least it's in progress. I checked earlier this month around the 3rd and it hadn't started.

The 20240601 dump is now available. The bot is processing it & we will see how it goes... -- JLaTondre (talk) 19:52, 9 June 2024 (UTC)

Processing done. Looks good so far. Let me know if you see anything odd. -- JLaTondre (talk) 20:45, 10 June 2024 (UTC)

So far so good. Headbomb {t · c · p · b} 21:05, 10 June 2024 (UTC)

Cite tech report gone from Statistics?

This seems weird. Headbomb {t · c · p · b} 23:15, 2 August 2023 (UTC)

Short answer: The template was renamed.

Long answer: {{Cite techreport}} was moved to {{Cite tech report}} back in June. Everything managed to still work okay until Citation bot updated the template usage in the articles. The parsing is based on the "real" template name. It is smart enough to also look for redirects to the template name, but it expects the primary name to be a non-redirect.

For a short-term fix, I can update the template name being checked. For a longer-term fix, I can update it to check that a template has not been renamed before parsing. However, instead of a hard-coded list (the current ones can be found highlighted in yellow here), can it be based on categories? Maybe use Category:Citation Style 1 templates, Category:Citation Style 2 templates, and Category:Citation Style Vancouver templates? -- JLaTondre (talk) 00:25, 6 August 2023 (UTC)

A dynamic list could work. In Category:Citation Style 1 templates, there's a sandbox and Template:Cs1 function which aren't really templates. Maybe a membership in that category + name starts with Template:Cite_...? Same for Category:Citation Style Vancouver templates. CS2 is just {{Citation}}.

Headbomb {t · c · p · b} 01:08, 6 August 2023 (UTC)

For Bluebook style, there would be {{Bluebook journal}}, {{Bluebook website}}, and {{Cite court}}. |reporter= in {{Cite court}} is equivalent to |journal= in {{Bluebook journal}}. {{Bluebook website}} is likely useless since it doesn't seem to support |journal=. Headbomb {t · c · p · b} 01:12, 6 August 2023 (UTC)
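The dynamic-list idea from this thread (category membership plus a "Template:Cite" name prefix, with renamed templates followed to their current name) could look roughly like the Python sketch below. The input list and redirect map are hypothetical stand-ins for what the bot would fetch from the wiki, and excluding subpages such as sandboxes is an assumption.

```python
def citation_templates(category_members):
    """Keep category members that look like real 'Cite ...' templates.

    Subpages (sandboxes, docs) and helpers such as 'Template:Cs1 function'
    are dropped by the prefix and '/' checks.
    """
    return sorted(
        title
        for title in category_members
        if title.startswith("Template:Cite ") and "/" not in title
    )


def resolve(name, redirects):
    """Follow a redirect chain (e.g. after a template move) to the current name."""
    seen = set()
    while name in redirects and name not in seen:
        seen.add(name)  # guard against redirect loops
        name = redirects[name]
    return name


# Hypothetical data illustrating the {{Cite techreport}} move.
members = [
    "Template:Cite journal",
    "Template:Cs1 function",
    "Template:Cite journal/sandbox",
    "Template:Cite tech report",
]
redirects = {"Template:Cite techreport": "Template:Cite tech report"}
```

Resolving the configured name before parsing means a move like the one described above would not silently drop a template from the statistics.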

Level 4 obliterated

As a heads up, the bot converted all Level 4 vitals to Level 3 in this report yesterday. I manually reverted but wanted to give a heads up in case it keeps happening. czar 14:51, 27 July 2024 (UTC)

Looks like it's a Cewbot issue: Wikipedia talk:Vital articles#VA4 articles no longer being recognised as such czar 18:34, 27 July 2024 (UTC)

If it's been fixed, I can rerun the bot manually. Otherwise, it will get resolved in next weekend's run (assuming it has been fixed by then). -- JLaTondre (talk) 22:07, 27 July 2024 (UTC)

FM issue?

File:מכתש רמון - גלישת עננים (cropped).jpg has been dropped from WP:PHYS/RECOG here. I can't figure out why. Bot issue? Headbomb {t · c · p · b} 22:08, 10 August 2024 (UTC)

See this edit. It is no longer marked as Template:Picture of the day so was dropped from the "Picture of the day pictures" section. It is still showing in the "Featured pictures" section (last picture). -- JLaTondre (talk) 14:30, 11 August 2024 (UTC)

Wikipedia:JCW/STATS

Could it be possible to add these lines? Headbomb {t · c · p · b} 17:54, 13 August 2024 (UTC)

So "total DOIs" being the total number of times {{cite xxx|doi=}} or {{doi}} appears? The total number of {{doi}} templates is already on the page. Since it looks like both templates support |doi-access=free, I assume so, but wanted to check. And would the |doi-access=free count be all or distinct DOIs (since the other sub measurement would be distinct)? -- JLaTondre (talk) 21:33, 13 August 2024 (UTC)

My idea is the total |doi= + {{doi}} + {{doi-inline}} etc... found, and the total |doi-access=free found. Headbomb {t · c · p · b} 22:33, 13 August 2024 (UTC)

Got it. The first should be easy. The |doi-access=free will require updating the dump parsing. I will try to get that in before the 20th dump. -- JLaTondre (talk) 23:01, 13 August 2024 (UTC)

No rush. Headbomb {t · c · p · b} 23:32, 13 August 2024 (UTC)

Done, see here. -- JLaTondre (talk) 17:23, 18 August 2024 (UTC)

Awesome. And before the next dump too. We'll get to see just how many more we got from the recent CS1 update (Aug 17) that flagged more free DOIs. We've got about 27% of all such citations that have been identified as free to read. Not bad. Headbomb {t · c · p · b} 17:43, 18 August 2024 (UTC)

Upon thinking a bit, the distinct doi prefixes should be the third sub-bullet, like so [5]. Headbomb {t · c · p · b} 17:46, 18 August 2024 (UTC)

Done. No visible change as output matched your edit. -- JLaTondre (talk) 19:01, 18 August 2024 (UTC)

User:JL-Bot/DOI

Bot seems to have choked there... Headbomb {t · c · p · b} 06:53, 20 August 2024 (UTC)

Resolved. -- JLaTondre (talk) 22:44, 20 August 2024 (UTC)

WP:JCW/DOTS

Could the bot compile a list of redlinks (with at least one dot) that only differ by dots from a bluelink? Regrouped by redirect target? Like...

Rank	Target	Entries (Citations, Articles)	Total Citations	Distinct Articles	⁠Citations/article⁠
1	Journal of Physics A	Journal of Physics A Journal of Physics A. (2 in 2) J. Phys. A J. Phys A (3 in 1) J Phys. A (1 in 1) J Phys A J. Phys A (3 in 1) J Phys. A (1 in 1)	6	4	1.500

← —	Current Dots1	→ Dots2

Hosted at probably Wikipedia:WikiProject Academic Journals/Journals cited by Wikipedia/Maintenance/Dots1, with shortcut WP:JCW/DOTS1. Headbomb {t · c · p · b} 21:08, 16 August 2024 (UTC)

Yes, that would be doable. -- JLaTondre (talk) 00:24, 17 August 2024 (UTC)

You state "redlinks (with at least one dot)". However, if there was a [[One. Two]] blue link, you would not want a [[One Two]] redlink? -- JLaTondre (talk) 12:58, 18 August 2024 (UTC)

Upon further reflection, I don't know why I wanted at least one dot. I thought it would cut down on a certain class of common cases, but I can't think of them at the moment. So yeah, forget that part for now. All redlinks that differ only by dots from a bluelink. Headbomb {t · c · p · b} 17:41, 18 August 2024 (UTC)

Wikipedia:JCW/DOTS has been created. I ended up going with a single page as the maintenance processing was implemented with single page output. If the DOTS output grows and needs to be broken into multiple pages, I can do that. But I wasn't sure if DOTS was going to be used to clean citations up - I didn't want to do extra work if the actual result was the DOTS output would get smaller over time. DOTS will need to be added to the {{JCW-Main}} template and a description to the Maintenance page. Please let me know if you see any issues with the output. -- JLaTondre (talk) 23:47, 19 August 2024 (UTC)

Looks great. DOTS should shrink pretty fast. I don't know if I'll have time to make a dent in it before the next time, but by September it'll probably be under 100 entries. Headbomb {t · c · p · b} 00:11, 20 August 2024 (UTC)

I think the only thing I'd change is make it case insensitive. And treat , and . as equivalent to each other (i.e. if redlinks differ only by commas or dots). Headbomb {t · c · p · b} 01:38, 20 August 2024 (UTC)

Easy enough. Change made. -- JLaTondre (talk) 23:10, 20 August 2024 (UTC)

Beautiful Headbomb {t · c · p · b} 23:29, 20 August 2024 (UTC)
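The matching rule agreed on in this thread (ignore case, treat dots and commas as equivalent, and group redlinks under the bluelink they collide with) can be sketched in Python as follows. This is only an illustration under those stated rules; the function names are hypothetical and the bot's actual implementation is not shown in this discussion.

```python
import re
from collections import defaultdict


def dots_key(title):
    """Comparison key: drop dots and commas, collapse whitespace, ignore case."""
    stripped = re.sub(r"[.,]", "", title)
    return re.sub(r"\s+", " ", stripped).strip().lower()


def dots_variants(bluelinks, redlinks):
    """Group redlinks under the bluelink they differ from only by dots/commas."""
    by_key = {dots_key(b): b for b in bluelinks}
    report = defaultdict(list)
    for red in redlinks:
        target = by_key.get(dots_key(red))
        if target is not None and red != target:
            report[target].append(red)
    return dict(report)


# Sample titles based on the examples quoted in the DOTS threads.
blue = ["J. Phys. A", "Viruses"]
red = ["J Phys A", "j. phys, a", "Viruses.", "Nature Physics"]
result = dots_variants(blue, red)
```

Because whitespace is collapsed after the punctuation is removed, a spaced dot such as "Crit Rev Eukaryot Gene Expr ." reduces to the same key as the undotted form.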

Sept 1 dump?

The dump has been out for a few days. Normally the bot edits on the 3rd or 4th, but we're now the 5th... Has it crashed? Or something else? Headbomb {t · c · p · b} 14:55, 5 September 2024 (UTC)

There was an issue with the server that has been fixed. Job is running now. It will take a while to complete and upload results. -- JLaTondre (talk) 20:11, 5 September 2024 (UTC)

List of featured contents in WikiProject Vietnam

Why is the list in Wikipedia:WikiProject Vietnam/Featured content empty? How can it be fixed? Cherry Cotton Candy (talk) 10:25, 29 September 2024 (UTC)

It's given only Category:WikiProject Vietnam articles, which is populated via {{WikiProject Asia|vietnam=yes}} and has only about 90 articles in it. Basing the subscription on {{WikiProject Vietnam}}, or Category:All WikiProject Vietnam pages ([6]), is likely what is desired. Headbomb {t · c · p · b} 10:51, 29 September 2024 (UTC)

September 20th dump issue

There's a ticket for it here. Headbomb {t · c · p · b} 20:21, 25 September 2024 (UTC)

They fixed the issue and the bot is processing the completed dump. -- JLaTondre (talk) 14:04, 28 September 2024 (UTC)

Ticket for the October 1st dump here. Headbomb {t · c · p · b} 10:21, 4 October 2024 (UTC)

10/01 dump completed and the bot is processing it. -- JLaTondre (talk) 09:30, 6 October 2024 (UTC)

User:JL-Bot/DOI

The bot seems to have stopped at User:JL-Bot/DOI/10.59500, not doing anything in the 60K and 70K range, and not updating the User:JL-Bot/DOI/Deltas page. Was there an issue? Headbomb {t · c · p · b} 10:00, 1 October 2024 (UTC)

Issue seems resolved now. No idea what the hiccup was. Headbomb {t · c · p · b} 09:27, 6 October 2024 (UTC)

The Wikipedia API threw an error that broke processing. I reran it and it worked fine this time. -- JLaTondre (talk) 09:30, 6 October 2024 (UTC)

WP:JCW/DOTS

It doesn't report these cases (differs by a spaced dot/spaced comma). If possible, it should. Thanks. Headbomb {t · c · p · b} 20:35, 17 October 2024 (UTC)

I made the change. It is now picking up items like entry #21 (where "Crit Rev Eukaryot Gene Expr . " matches). The delta between page revisions is a little hard to process as I realized it was not sorting the "Entries (Citations, Articles)" box and the result order would vary from run to run. It is now sorted so future changes will be easier to see.

However, it is not picking up the "Microbiol Immunol ." case as the original request was to capture redlinks that differ from bluelinks by dots. In the "Microbiol Immunol" case, there is no bluelink to match against (see M33). The redlink-only cases are already picked up by the "Spaced dot" case of the patterns matching (see #67 at Patterns) so it doesn't seem like it needs to be added to the DOTS processing. -- JLaTondre (talk) 18:38, 18 October 2024 (UTC)

Yeah "Crit Rev Eukaryot Gene Expr ." was a better example (I corrected all I could find, I just didn't remember which had blue links with them). I've added some extra patterns to /Patterns https://en.wikipedia.org/w/index.php?title=User:JL-Bot/Maintenance.cfg&diff=prev&oldid=1251739108, which results in the listing you now see at #67, so that will pick up the majority of the crap, though I'm not sure it's the best regex possible for that. Headbomb {t · c · p · b} 18:48, 18 October 2024 (UTC)

Anarchism

Not sure why the bot has been making this edit to Wikipedia:WikiProject Anarchism/Vital articles every few days, but the edit breaks the template in all pages depending on it, including the main page of the WikiProject. Deor (talk) 17:45, 2 November 2024 (UTC)

@Deor: see the above thread. Headbomb {t · c · p · b} 18:25, 2 November 2024 (UTC)

Dumps paused

See [7] and [8]. Headbomb {t · c · p · b} 05:22, 5 November 2024 (UTC)

Thanks for the update. -- JLaTondre (talk) 23:26, 5 November 2024 (UTC)

Vital articles reports empty

The last two times Wikipedia:WikiProject LGBTQ+ studies/Vital articles has been updated, it had very few and then no articles found. The same pattern seems to have occurred at other reports for vital articles (Wikipedia:WikiProject Korea/Vital articles, Wikipedia:WikiProject Libraries/Vital articles, Wikipedia:WikiProject Film/Vital articles), with most articles being removed on the Oct 19 update and the remainder on today's. Not sure if something has changed in the way vital articles are tagged that caused this? I don't see any changes on the individual articles' talk pages that would explain it.--Trystan (talk) 20:10, 26 October 2024 (UTC)

The vital article structure has changed. The prior categories by level that the bot used are now empty of articles (ex Category:Wikipedia level-1 vital articles). I will have to figure out what they have done and make changes to compensate. -- JLaTondre (talk) 21:20, 26 October 2024 (UTC)

I know what to do, but will need some more time to actually implement the fix. -- JLaTondre (talk) 20:26, 30 October 2024 (UTC)

Fix has been made. The bot is currently running on any project page that has "vital" in its title. It will update the remaining ones during its normal run this weekend. -- JLaTondre (talk) 00:37, 7 November 2024 (UTC)

Worked perfectly for Wikipedia:WikiProject LGBTQ+ studies/Vital articles, thanks!--Trystan (talk) 00:35, 8 November 2024 (UTC)

Weirdness in today's run

[9] + [10] + [11] + [12] removed a ton of valid DOI entries. Weird hiccup? Headbomb {t · c · p · b} 10:47, 12 November 2024 (UTC)

I've reverted (most of) today's run, FWIW. Headbomb {t · c · p · b} 10:50, 12 November 2024 (UTC)

It looks like for some reason the page text was not returned for these DOI redirect pages so the bot was unable to parse out the target. It worked today. I'll change the code so that it stops processing if this happens. Hopefully just a one time API issue, but the change will avoid bad results if it does happen again. -- JLaTondre (talk) 22:33, 13 November 2024 (UTC)

Done. -- JLaTondre (talk) 23:19, 13 November 2024 (UTC)

WP:JCW/DOTS, missed entry

This should report

Front Health Serv. (1)

as a variant of Front Health Serv/Front. Health Serv. Headbomb {t · c · p · b} 17:55, 9 November 2024 (UTC)

There do not appear to be "Front Health Serv" or "Front. Health Serv." citations to match against (see F28 and F29). The dots processing is only matching against other citations as per the other maintenance reports. -- JLaTondre (talk) 00:04, 11 November 2024 (UTC)

These (and all other maintenance reports, if that's how they work) should be matched vs existing links/targets. Headbomb {t · c · p · b} 10:45, 12 November 2024 (UTC)

I added in redirects from Category:Redirects from ISO 4 abbreviations. Updated results have been uploaded. -- JLaTondre (talk) 01:18, 16 November 2024 (UTC)

December 1st dump

Problems have been sorted out, the dump is finally out. Headbomb {t · c · p · b} 20:33, 5 December 2024 (UTC)

No User:JL-Bot/DOI update?

They are normally happening on the 1st and 20th? 12:56, 23 December 2024 (UTC) — Preceding unsigned comment added by Headbomb (talk • contribs)

Crossref had a slight change in the format of their results which caused the bot a problem. I have adjusted for it and am running the job again. -- JLaTondre (talk) 19:57, 23 December 2024 (UTC)

Awesome. Thanks for the prompt fix. And a merry festivus/solstice/christmas/holidays/whatever to you! Headbomb {t · c · p · b} 02:58, 24 December 2024 (UTC)

Thanks, hope you are having a good holiday season also. I had some free time so I implemented email from the server. Now, I should get an email when it fails vs. someone having to notice it didn't complete a run. -- JLaTondre (talk) 12:57, 26 December 2024 (UTC)

Split pages

are reaching page limits.

I added

In the navbox, but the bot doesn't seem to want to create those pages. Headbomb {t · c · p · b} 11:49, 17 January 2025 (UTC)

I see the issue. It is not properly getting the revid associated with the Publishers.cfg/A–M or Publishers.cfg/N-Z pages so it did not detect that this edit should have caused a reprocessing. It will currently only trigger if the other configuration pages are changed. When the original publisher page was subdivided, I didn't add handling for the /- which need to be URL encoded. I made a change for that and it should run tonight. I will validate in the morning. -- JLaTondre (talk) 01:05, 18 January 2025 (UTC)

Seems to be working fine. Headbomb {t · c · p · b} 16:46, 19 January 2025 (UTC)
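To illustrate the URL-encoding problem described in this thread, the Python sketch below percent-encodes a subpage title containing a slash and an en dash. The query parameters follow the standard MediaWiki action API for fetching revision IDs; whether the bot builds its requests this way is an assumption, and the helper names are hypothetical.

```python
from urllib.parse import quote, urlencode


def encode_title(title):
    """Percent-encode a page title, including '/' and non-ASCII dashes."""
    return quote(title, safe="")


def revid_query(title):
    """Build a query string asking the MediaWiki API for a page's revision ID."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "ids",
        "titles": title,
        "format": "json",
    }
    return urlencode(params)


encoded = encode_title("Publishers.cfg/A–M")
```

The en dash is encoded as its three UTF-8 bytes, while the plain hyphen in "N-Z" is left as-is, which is why only some subpage names trip up unencoded requests.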

Dump

A new dump is being generated, finally. Headbomb {t · c · p · b} 14:57, 23 January 2025 (UTC)

The dumpfile is dated 20250123 instead of 20250120. I updated the code to look for that date & it should pull it tonight. Hopefully they get back to a regular schedule. If not, I will have to update the bot to watch the RSS feed and use that to detect a new version. -- JLaTondre (talk) 00:36, 28 January 2025 (UTC)

Some more missing from dots report

When you have something like

Viruses (744 in 525)
- Viruses. (2 in 1)

"Viruses." doesn't get reported in WP:JCW/DOTS Headbomb {t · c · p · b} 13:45, 22 February 2025 (UTC)

I forgot about the " (journal)" case. I added in handling for that. See new version. -- JLaTondre (talk) 17:17, 22 February 2025 (UTC)

That was quick! Thanks. Headbomb {t · c · p · b} 17:21, 22 February 2025 (UTC)

No run today?

Did the bot crash? Normally the bot edits at around 2AM my time when there have been changes to the .cfg pages, and this didn't happen today. Headbomb {t · c · p · b} 13:46, 26 February 2025 (UTC)

Yes, it received an error when retrieving a page. I kicked it off again and everything seems fine. I did get an email notifying me of the failure (so that is working), but I didn't have a chance to look into it until recently. -- JLaTondre (talk) 23:45, 26 February 2025 (UTC)

How do categories work?

Portal:Free and open-source software/Wikipedia featured articles turned up nothing with Category:Free software. Does the article need to be directly under that category rather than a subcategory? Rjjiii (ii) (talk) 10:13, 1 March 2025 (UTC)

Yes, the bot will only look at the categories specified. It will not recursively look at subcategories. You can specify multiple categories in the Project content template if you wish. However, you may wish to use Category:All Free and open-source software articles instead as that seems to have what you want. -- JLaTondre (talk) 14:38, 1 March 2025 (UTC)

Thanks! I didn't realize that there was a single category that contained software, people, ideas, companies, and so on. I'll try that and see how it goes. Rjjiii (ii) (talk) 17:23, 1 March 2025 (UTC)

WP:JCW/BRACKETS

Could the bot compile a page where journals end with (ALLCAPS) / [ALLCAPS] / (-|–|—|:|;) *ALLCAPS, e.g. [13]?

And then upload to Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Maintenance/Brackets?

Headbomb {t · c · p · b} 21:00, 6 March 2025 (UTC)

Initial version created. For the dash cases, I made it require a space before the dash as otherwise there were a lot of false positives (examples JBR-BTR, KRXI-TV) -- JLaTondre (talk) 15:29, 8 March 2025 (UTC)

That works yeah. After a cleanup pass or three, those numbers should go down substantially. Headbomb {t · c · p · b} 17:38, 8 March 2025 (UTC)
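The pattern requested in this thread, including the space-before-dash restriction that avoids false positives such as JBR-BTR and KRXI-TV, can be approximated with the Python regular expression below. Requiring at least two capital letters is an assumption, as are the names used; the bot's real pattern is not quoted in the discussion.

```python
import re

CAPS = r"[A-Z]{2,}"  # assumption: "ALLCAPS" means two or more capital letters

# Trailing (ALLCAPS), [ALLCAPS], " - ALLCAPS" (space required before the dash),
# or ": ALLCAPS" / "; ALLCAPS".
BRACKETS = re.compile(
    rf"(?:\({CAPS}\)|\[{CAPS}\]| [-–—] *{CAPS}|[:;] *{CAPS})$"
)


def has_trailing_caps(title):
    """True if a journal name ends with a bracketed or dash-separated acronym."""
    return BRACKETS.search(title) is not None
```

Titles such as "Journal of Things (JOT)" or "Journal of Things - JOT" match, while hyphenated names with no space before the dash do not.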

Missing entries in Wikipedia:JCW/DOTS

Ann. Soc. Ent. Fr. exists, but Ann. Soc. Ent. Fr from Cyrioctea wasn't reported. Headbomb {t · c · p · b} 17:53, 15 April 2025 (UTC)

Ann. Soc. Ent. Fr. is tagged as Category:Redirects from short names, not as Category:Redirects from ISO 4 abbreviations (which is what the bot uses). Should the redirect be re-categorized or the bot expanded? If the latter, are there other cats to use? -- JLaTondre (talk) 23:32, 18 April 2025 (UTC)

Well it's the wrong category for sure (should be {{R from abbreviation}}), but the goal of the worklist is to catch as many as possible, so expansion is the solution for the bot.

should all be picked up and I'd say

{{R from acronym}}/{{R to acronym}}

{{R from initialism}}/{{R to initialism}}

{{R from short name}}/{{R from long name}}

{{R from modification}}, {{R from former name}}

as well just to be sure we cover as much ground as possible

Headbomb {t · c · p · b} 09:48, 19 April 2025 (UTC)

Change made and results are up. -- JLaTondre (talk) 19:33, 19 April 2025 (UTC)

Cleaned up the new entries... we'll see what happens after the next dump! Headbomb {t · c · p · b} 09:04, 20 April 2025 (UTC)

invoke:cite|journal

In entries like this, {{cite journal}} is invoked via

{{#invoke:Cite |journal | vauthors = Herrera S, Cordes EE | title = Genome assembly of the deep-sea coral ''Lophelia pertusa'' | journal = GigaByte | volume = 2023 | pages = 1–12 | date = 2023-03-16 | pmid = 36935863 | pmc = 10022433 | doi = 10.46471/gigabyte.78 }}

instead of the usual

{{Cite journal | vauthors = Herrera S, Cordes EE | title = Genome assembly of the deep-sea coral ''Lophelia pertusa'' | journal = GigaByte | volume = 2023 | pages = 1–12 | date = 2023-03-16 | pmid = 36935863 | pmc = 10022433 | doi = 10.46471/gigabyte.78 }}

Same for

{{#invoke:Cite | citation | vauthors = Herrera S, Cordes EE | title = Genome assembly of the deep-sea coral Lophelia pertusa | journal = GigaByte | volume = 2023 | pages = 1–12 | date = 2023-03-16 | pmid = 36935863 | pmc = 10022433 | doi = 10.46471/gigabyte.78 }}

instead of the usual

{{citation | vauthors = Herrera S, Cordes EE | title = Genome assembly of the deep-sea coral Lophelia pertusa | journal = GigaByte | volume = 2023 | pages = 1–12 | date = 2023-03-16 | pmid = 36935863 | pmc = 10022433 | doi = 10.46471/gigabyte.78 }}

Does the bot handle those? Headbomb {t · c · p · b} 09:44, 4 August 2025 (UTC)

No, I was not aware of invoke. It would be easy to add. Are there any others that can occur? -- JLaTondre (talk) 23:22, 4 August 2025 (UTC)

Well, all current templates have an equivalent, like {{cite document}} being invoked by {{#invoke:cite |document |...}} Headbomb {t · c · p · b} 23:44, 4 August 2025 (UTC)

For bean counting purposes, they can be considered the same as the non-invoked version. Headbomb {t · c · p · b} 23:45, 4 August 2025 (UTC)

Change made. There were 14k #invoke usages. Of those, roughly 180 were new journals/magazines cited and the remainder were increased counts of journals/magazines already reported. The individual pages are uploading now. The other reports (targets, popular, citewatch, etc.) will update whenever something triggers their processing. I can manually kick them off if needed, but usually there are changes to config, etc. that cause regular processing of those. -- JLaTondre (talk) 23:24, 8 August 2025 (UTC)

No rush. Headbomb {t · c · p · b} 23:38, 8 August 2025 (UTC)
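Treating the #invoke form as equivalent to the plain template, as agreed in this thread, amounts to normalising the template name before counting. The Python sketch below shows one way to do that; the regular expressions and function name are hypothetical and only cover the two forms quoted above.

```python
import re

# "{{#invoke:Cite |journal | ..." -> captures "journal"
INVOKE = re.compile(
    r"\{\{\s*#invoke:\s*cite\s*\|\s*([^|}]+?)\s*(?=[|}])", re.IGNORECASE
)
# "{{Cite journal | ..." -> captures "Cite journal"
PLAIN = re.compile(r"\{\{\s*([^|}]+?)\s*(?=[|}])")


def template_name(wikitext):
    """Return a normalised template name for plain and #invoke citations."""
    m = INVOKE.match(wikitext)
    if m:
        func = m.group(1).lower()
        # {{#invoke:Cite|citation}} corresponds to {{citation}}, not {{cite citation}}.
        return "citation" if func == "citation" else f"cite {func}"
    m = PLAIN.match(wikitext)
    return m.group(1).lower() if m else None
```

With both forms mapped to the same name, the invoked citations are counted together with the non-invoked ones.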

Highlight journal= from different character set

If, for example, you have |journal=Аcta Вaltico‑Slavica, where А and В come from the Cyrillic alphabet and the others from the Latin alphabet, it would be useful in the compilation to highlight this sort of thing, i.e. when an entry has characters from two different alphabets. If it's from a single alphabet, no highlighting is needed.

Journal¹	Type²	Target¹	Type²	Citations	Articles	⁠Citations/article⁠	Search
Аcta Вaltico‑Slavica	?	Acta Baltico-Slavica	?	1	1	1.000	Wikipedia _(J·M·T) Google _(J·M·T)

In general, there could be a color scheme like

Red = Latin
Orange = Arabic
DarkKhaki = Chinese
Green = Cyrillic
Blue = Greek
Indigo = Hebrew
Violet = Japanese
DeepSkyBlue = Other1
MediumPurple = Other2 (only used when Other1 is already used)
DeepPink = Other3 (only used when Other2 is already used, might not be needed)

Would this be difficult to implement? Headbomb {t · c · p · b} 05:30, 27 August 2025 (UTC)

Yes, that is doable. Perl, which is what that part of the citation processing is written in, makes it easy to check language scripts. Perl can recognize all the ones listed at perlunicode#Scripts (all the ones you are requesting are on that list). For Chinese, it would really be detecting for Han script - which in my understanding is used for several Eastern languages. It will probably be a couple of weeks before I can complete it. -- JLaTondre (talk) 23:14, 27 August 2025 (UTC)

Yes, if it's the Han alphabet, then that's the character set that should be highlighted. The point is to detect names that have multiple character sets in them, which should be rare, and usually limited to cases like |journal=The Journal of Things = Το ημερολόγιο των πραγμάτων.

It's probably simpler to collect them and have them all reported on their own WP:JCW/Multiscript subpage, with that highlighting only in effect on that page.

Headbomb {t · c · p · b} 00:59, 28 August 2025 (UTC)

A separate page is easier. I can have a separate script for that vs. integrating into the regular output. -- JLaTondre (talk) 23:37, 28 August 2025 (UTC)

Should it report cases where there is a language template? For example, what should it do with Sidirotrohia ({{langx|el|Σιδηροτροχιά}}) which will produce Sidirotrohia (Σιδηροτροχιά) (after the change discussed below)? There are also cases where people enter titles in multiple languages without the use of a template? Should it only report a mismatch when it happens within a single word? -- JLaTondre (talk) 23:53, 28 August 2025 (UTC)

If there's a language template, that can be ignored IMO. I suppose to start, mismatches could happen across multiple words, this way it could catch things like Acta Whatever А. Headbomb {t · c · p · b} 00:12, 29 August 2025 (UTC)
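The mixed-script check discussed in this thread can be sketched in Python as below. Python's standard library has no direct script property, so this sketch uses the first word of each character's Unicode name as a rough proxy (with "CJK" standing in for Han); that heuristic, the script list, and the function names are all assumptions for illustration, not the bot's Perl implementation.

```python
import unicodedata

# Unicode-name prefixes used as a rough stand-in for script detection.
SCRIPTS = ("LATIN", "CYRILLIC", "GREEK", "ARABIC", "HEBREW",
           "CJK", "HIRAGANA", "KATAKANA")


def script_of(ch):
    """Return a script label for a letter, or None for digits, spaces, punctuation."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    for script in SCRIPTS:
        if name.startswith(script):
            return script
    return "OTHER"


def scripts_in(title):
    """Set of scripts used by the letters of a title."""
    return {s for s in map(script_of, title) if s}


def is_mixed(title):
    """True when a title draws letters from more than one script."""
    return len(scripts_in(title)) > 1


# The example from this thread: Cyrillic A (U+0410) and Ve (U+0412) among Latin letters.
sample = "\u0410cta \u0412altico-Slavica"
```

A title written entirely in one script yields a single-element set and is not reported, matching the "no highlighting is needed" case.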

Updated Citation Field Extraction & Template Processing

@Headbomb I have been updating how the |journal= | |magazine= field is extracted from the citation templates. Currently, the bot is using a regex to extract the field, but this occasionally gives a bad result due to the complexity of pattern matching against templates within templates, comments, nowiki markup, and everything else people can embed in a citation template. It doesn't happen often, but there are cases that end up in Invalid titles that are due to parsing errors. The new method will use a tokenizer to split the citation templates into parts and pull out the |journal= | |magazine= field. I have found it to be more reliable.

As part of this change, I needed to tweak how the template expansion works. That led down a rabbit hole of checking all the template expansions that have been implemented and validating them.

I came up with some items that could use your input.

There are a handful of cases where the field is just a period (|journal=.). The old method would miss these whereas the new method will return a period. Do you want these to be reported or dropped? On the article output, this causes an extra period when the template is expanded. For example, at Arleen McCarty Hynes#References, you can see a ". ." on 20, 34, 35. I wasn't sure if this is something you wanted to clean up.

The current processing removes ({{langx}}) from the end of a citation, but not other language templates. So |journal=Studime Historike ({{langx|en|Historical Studies}}) becomes Studime Historike, but |journal=Hubei Wenshi Ziliao ({{lang|zh-hans|湖北文史资料}}) becomes Hubei Wenshi Ziliao (湖北文史资料). My assumption is that any language templates in parentheses (or brackets) at the end of a citation should be removed and only the non-parenthesized (non-bracketed) section returned. Is this correct?

It will be a bit before this new version is ready to go live. I will do it separately from the language script detection request above so that it's easier to see any unintended effects of either change. JLaTondre (talk) 00:27, 28 August 2025 (UTC)
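The tokenizer approach described in this comment can be illustrated with the Python sketch below: the template is split into tokens, and parameters are separated only at pipes that are not nested inside another template, a wikilink, a comment, or nowiki markup. This is a simplified sketch under those assumptions, with hypothetical names; it is not the bot's tokenizer.

```python
import re

# Comments and nowiki spans become single tokens so pipes inside them are ignored.
TOKEN = re.compile(
    r"<!--.*?-->|<nowiki>.*?</nowiki>|\{\{|\}\}|\[\[|\]\]|\||[^{}\[\]|<]+|.",
    re.DOTALL,
)


def split_params(template):
    """Split '{{name|a|b=c}}' into ['name', 'a', 'b=c'] at top-level pipes only."""
    inner = template.strip()[2:-2]  # assumes the text starts with {{ and ends with }}
    parts, current, depth = [], [], 0
    for tok in TOKEN.findall(inner):
        if tok in ("{{", "[["):
            depth += 1
        elif tok in ("}}", "]]"):
            depth -= 1
        if tok == "|" and depth == 0:
            parts.append("".join(current))
            current = []
        elif tok.startswith("<!--"):
            continue  # drop HTML comments entirely
        else:
            current.append(tok)
    parts.append("".join(current))
    return parts


def get_field(template, names=("journal", "magazine")):
    """Return the first |journal= or |magazine= value, or None if absent."""
    for part in split_params(template)[1:]:
        key, sep, value = part.partition("=")
        if sep and key.strip().lower() in names:
            return value.strip()
    return None


example = ("{{cite journal |title=A {{!}} B |journal=Studime Historike "
           "({{langx|en|Historical Studies}}) <!-- |journal=Wrong --> |year=2020}}")
```

Tracking nesting depth keeps the pipes inside the embedded {{langx}} template from being mistaken for parameter separators, which is the kind of case a single regex tends to get wrong.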

|journal=. should be reported yeah. No special processing needed, though I suppose they could also be added to WP:JCW/Dots. It definitely needs cleaning up.
Ideally, I think the best thing is to report what is rendered, so |journal=Studime Historike ({{langx|en|Historical Studies}}) can be treated as |journal=Studime Historike (Historical Studies) and |journal=Hubei Wenshi Ziliao ({{lang|zh-hans|湖北文史资料}}) can be treated as |journal=Hubei Wenshi Ziliao (湖北文史资料)

Headbomb {t · c · p · b} 00:47, 28 August 2025 (UTC)

I will make sure the lone periods end up on Dots.

I will remove the special handling of langx in parentheses. I did look at using the API to expand templates so that I wouldn't have to hard code processing them. This would have simplified things as I would not have to add handling for new ones when they pop up in dumps or make updates if there is a change in how they operate. However, it would cause additional items to appear in the output that I don't believe you want. For some examples:

{{ill|Die Sprache|de}} produces Die Sprache [de] where I believe only Die Sprache should be output
{{nihongo|BOMB|[[:ja:BOMB|BOMB]]}} produces BOMB (BOMB) where I believe only BOMB should be output