This is an archive of past discussions with User:ClueBot Commons. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
How can I help / Toolserver Load
Hey Cobi and Crispy,
Sorry that I have been, well, elusive for a while now. I'd really like some assignments. Perhaps monitoring and responding on some talk pages?
Also, I did a little grepping through some of the toolserver web logs and found that ClueBot is actually pretty intensive on the Toolserver (60814 queries in 24 hours). I also found that the bot was making queries for edits made by users in the whitelist. Bot users, for example, made up 9061 of the 60814 queries. Example:
Maybe ClueBot could keep a list of registered bots in memory, and strip those edits before it makes the query. I did this when DASHBot-AV was working. In any case, it's only around 15% of queries, so it's no huge deal if you don't want to do that, just something that might speed things up.
Happy New Year, --Tim1357talk01:58, 30 December 2010 (UTC)
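A minimal sketch of the suggestion above, assuming the bot sees edits as simple records and holds the list of registered bots as an in-memory set. The `Edit` type, the function name, and the sample user names are hypothetical and are not taken from ClueBot's source.

```python
from typing import Iterable, Iterator, NamedTuple


class Edit(NamedTuple):
    """Hypothetical record for one edit seen on the recent-changes feed."""
    user: str
    revision_id: int


def edits_needing_query(edits: Iterable[Edit], known_bots: set[str]) -> Iterator[Edit]:
    """Yield only the edits whose author is not in the in-memory bot list."""
    for edit in edits:
        if edit.user not in known_bots:
            yield edit


# Hypothetical data: in practice the set would be loaded once and reused.
known_bots = {"ExampleBot", "AnotherBot"}
feed = [Edit("ExampleBot", 1), Edit("Alice", 2), Edit("AnotherBot", 3), Edit("Bob", 4)]
to_query = list(edits_needing_query(feed, known_bots))
```

Only the edits by `Alice` and `Bob` would go on to the Toolserver query in this sketch. A set keeps the membership test cheap no matter how many bot accounts are listed.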
Hey Tim1357. About your suggestion of monitoring talk pages, that is already done on (with aid from an IRC feed) by multiple people, Cobi, Crispy, and myself included. Obviously if they have something they would like you to do, they'll be contacting you. Anyway, I'm sure there'll be discussion about whitelisted users querying the toolserver soon. -- SnoFox(t|c)02:18, 30 December 2010 (UTC)
Hey Tim. The best thing to do now is just improve the dataset. We can discuss it more on irc. As far as the extra queries go - even if the edits aren't actually reverted, the score information can still be useful for evaluation and possibly training. Crispy1989 (talk) 19:53, 1 January 2011 (UTC)
The bot is an anti-vandal bot and uses a complex ANN to detect vandalism in changes. If you believe this is a false positive please report it using this page so we can use it to train the bot and ensure that false positives are reduced as much as possible for the future. DamianZaremba(talk • contribs)16:00, 31 December 2010 (UTC)
...but isn't the original ClueBot rendered basically obsolete by User:ClueBot NG? It catches more vandalism, has way fewer false positives, and is generally superior in every way. Why are they both active? ClueBot's false positives are a serious problem, and it's not really all that useful. Shouldn't it just be deactivated? (No offense meant, of course, to those wonderful people who created cluebot in the first place) ☻☻☻Sithman VIII !!☻☻☻ 22:51, 31 December 2010 (UTC)
While attempting to fix a series of related bad articles, ClueBot reverted three of the eight that I had fixed and cleaned up in the interest of preserving their value. It effectively preserved bad entries while stopping me from cleaning up the articles. While I understand the intent of the bot, it certainly needs improvement. It nearly banned me from editing while I attempted to improve the content of an article and remove bogus content. I did report the first reversion and edit it back, but had my contribution been more detailed and time-consuming, I might have been discouraged from ever contributing to Wikipedia again. I'd rather this knowledge base didn't scare away people whose intent it is to help by replacing human diligence with trigger-happy bots. That said, the two or three edits that were reverted should be fixed without bot interference, and I lack the resolve and possibly the authority to do so at the moment. Thanks. — Preceding unsigned comment added by Enonesohc (talk • contribs) 21:03, 1 January 2011 (UTC)
The bot is not "trigger happy" - although all 3 of these edits were indeed false positives, they were not for separate reasons - all 3 edits were very similar, so all 3 were triggered in the same way. It possibly relates to the removal of the deletion template combined with overall reduction in content. Explanations on why these issues are possible are available on the bot user page and FAQ.
Also, after the first false positive, the previous recent warnings may have been a factor in the other false positives. You may remove warnings on your talk page in the event of false positives. I have done so for you for now. Crispy1989 (talk) 21:19, 1 January 2011 (UTC)
I have restored the AfD templates since the articles are still worthy of deletion. The main problem hasn't been addressed yet. The articles are simply not necessary. -- Brangifer (talk) 22:47, 1 January 2011 (UTC)
Your edit was classed as vandalism by the bot which appears to be correct as the redirect was non-constructive. If you believe this to be a false positive then please feel free to report it and it shall be reviewed. DamianZaremba(talk • contribs)13:06, 3 January 2011 (UTC)
ClueBot run pages
Regarding the run pages for the ClueBots, is it possible to add a page notice to help prevent newer editors from shutting off the bot? For example, a clever, new editor may be more likely to shut down the bot entirely instead of reverting the bot and reporting a false positive. I would do it myself, but I am not an administrator. :) Alternatively, if you can add a notice to the actual page using, say, <noinclude> tags that might bring more attention. But according to the current code, that would not work without a quick source change. -- SnoFox(t|c)00:00, 6 January 2011 (UTC)
I do not think so. This allows all Wikipedians to shut off ClueBot III, as well as other ClueBots in case they are malfunctioning. It has not been abused, and if it is, I'm sure page protection would be issued. -- SnoFox(t|c)21:51, 1 January 2011 (UTC)
I've got the bot archiving my talk page and though it's archived the majority of the stuff from 2010 (admittedly, my talk page doesn't have much to begin with) there's one unsigned comment from October 2010 that was missed and everything from 2009 and earlier hasn't been archived. What's wrong? --Kevin W./Talk•CFB uniforms/Talk00:57, 2 January 2011 (UTC)
With regards to the unsigned comment, it won't archive that for that reason. If it's unsigned there will be no date stamp so therefore ClueBot doesn't know when the message was posted on your page so it doesn't know whether or not it is in the timescale that can be archived. As a result it will just leave the comment on your page. That might be something that you'll have to manually archive.--5 albert square (talk) 01:09, 2 January 2011 (UTC)
I don't know I'm afraid, I don't know enough about ClueBot to answer that. You're going to have to wait for Cobi, Crispy or someone else that knows more about the bot to come along and answer that :) --5 albert square (talk) 01:52, 2 January 2011 (UTC)
Actually, just thinking, I set up ClueBot to archive the Neighbours talk page and looking on the settings for that and the settings for your talk page I can't really see any difference in the timeline. Yet with the Neighbours page ClueBot archived everything at once when I set it up. So it could be that your undated message is causing the problem. If that doesn't resolve it I can only think that there's something wrong in the settings.--5 albert square (talk) 01:58, 2 January 2011 (UTC)
It is currently set to archive every 12 hours. The bot calculates the date to archive to from the current time - age. The only time this does not work is on initial archival. Unsigned/undated sections will not give the bot any issue. It uses diffs to determine if the section should be archived or not, not timestamps. Essentially, if the diff between the revision at least age hours ago and the current revision contains the exact same section (it hasn't been modified), then that section is marked for archival. It will not archive text not in a section. -- Cobi(t|c|b)05:25, 2 January 2011 (UTC)
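The diff-based rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the bot's actual code: it assumes sections are delimited by level-2 `== Heading ==` lines, that section titles are unique on the page, and that the two revision texts (the one at least `age` hours old and the current one) have already been fetched. The function names are hypothetical.

```python
import re

# Level-2 headings only; "=== Sub ===" lines stay inside their parent section.
HEADING = re.compile(r"^==([^=\n].*?)==\s*$", re.MULTILINE)


def split_sections(wikitext: str) -> dict[str, str]:
    """Map each level-2 heading title to its full section text.

    Text before the first heading is ignored, mirroring the statement that
    text outside a section is never archived.
    """
    matches = list(HEADING.finditer(wikitext))
    sections = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        sections[match.group(1).strip()] = wikitext[match.start():end].strip()
    return sections


def sections_to_archive(old_revision: str, current_revision: str) -> list[str]:
    """Return titles of sections that are identical in both revisions."""
    old = split_sections(old_revision)
    current = split_sections(current_revision)
    return [title for title, body in current.items() if old.get(title) == body]


old_text = "intro\n== A ==\nhello\n== B ==\nold text\n"
new_text = "intro changed\n== A ==\nhello\n== B ==\nold text\nnew reply\n== C ==\nfresh\n"
unchanged = sections_to_archive(old_text, new_text)
```

In this example only section `A` is marked: `B` received a new reply and `C` did not exist in the older revision. No timestamps are parsed anywhere, which is consistent with unsigned or undated sections not being a problem for this rule.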
Ok, well, there's some kind of problem, then, because it still hasn't archived the oldest stuff on my talk page. I've got it set to archive anything older than three weeks and there's stuff on the page dating back to 2007 that has yet to be archived. --Kevin W./Talk•CFB uniforms/Talk22:33, 2 January 2011 (UTC)
I made this change because the photos used are under copyright, and that's why I removed them from UCE&T Multan.
Engr.Shahzad
Well, personally I would have also made the same revert as ClueBot. The reason being is that there was a whole chunk of text removed without explanation. It may be that ClueBot reverted it for this reason or it may be that you triggered some sort of filter, I don't know. I would suggest that in the future you use the edit summary, it may stop situations like this.--5 albert square (talk) 00:38, 8 January 2011 (UTC)
If that's the case then you need to report it as a false positive. I don't have the link as I'm currently not on my usual computer, but the link for false positives should be at the top of ClueBot's talk page.--5 albert square (talk) 17:50, 3 January 2011 (UTC)
i profoundly dislike the idea that human editors need to spoon-feed this automated bot with false positives. it seems to use dubious algorithms, and its claim of false positive rates need to be explained. based on what analysis did you come up with these numbers? i will not submit my edit to the provided link as it is not hosted on wikimedia, but on some private non-open source location. how come that this bot is allowed at wikipedia if it is not entirely managed within wm servers? i am reporting this bot to the appropriate place. 188.2.168.166 (talk) 23:06, 5 January 2011 (UTC)
While I see your concern, the fact is that this bot is monitored by humans. There are hundreds of us entrusted with the rollback right that can revert the bot's edits. Also, copyright violations are not generally handled by this bot. They are generally handled by CorenSearchBot, which is much better at handling copyright violations than Cluebot. Cluebot perceived the edit as vandalism, so it took the appropriate action. Which was the wrong action, unfortunately. But I've only seen this happen once in three months of revisions by Cluebot. It is a very rare occurrence. ~ MatthewrbowkerSay hi!23:17, 5 January 2011 (UTC)
The bot is fully documented on its user page, here. There is also tons of discussion regarding how the bot works at ClueBot NG's BRFA, which also explains that the bot is accepted for usage on the English Wikipedia. -- SnoFox(t|c)23:25, 5 January 2011 (UTC)
In addition to the full documentation and explanation of the bot's algorithms, the methodology for determining the statistics including false positive rate is explained in several different places, including the Frequently Asked Questions page. We have taken the utmost care to ensure its accuracy. The bot is entirely open source, including all review and report interfaces, and all libraries utilized. There's no reason to fear submitting an edit to an external interface, because there's no need to log in to do so, and your account information cannot be transmitted. Crispy1989 (talk) 01:58, 6 January 2011 (UTC)
Rockfang: I think he meant a copy-vio-type false positive. For the record, false positives are best reported at the report interface so they can be used to train the bot and prevent more false positives. In fact, I just sent that revision from the report interface to the review interface to be added to the dataset just a few minutes ago. -- SnoFox(t|c)20:29, 8 January 2011 (UTC)
@SnoFox: Thank you for replying. Roger that on the copyvio part. Fyi, I was the one that had reported the false positive, I just forgot to state that here.--Rockfang (talk) 20:37, 8 January 2011 (UTC)
Mr ClueBot stop your vandalism of work of other users!!!...
I was reading some errors in the page of Pierre Joxe, and when i made some corrections Mr Cluebot was abusively erasing all and saying was vandalism!... Please Mr Cluebot be more clever!... —Preceding unsigned comment added by 82.124.155.68 (talk) 14:41, 9 January 2011 (UTC)
The stats are updated from the Google apps platform (which the review interface is hosted on). It may be possible to either a) write a class to interface with the WP API in Java, then update the page using the ClueBot login details, or b) generate and upload the stats from the server ClueBot is actually run from. I would say the second option is less likely, as it is probably a bad idea to expose user-related details of any kind through the API (which is public), although as the data is public anyway this may not be an issue. I'm sure Cobi will have some solution once he has a little spare time; however, for now this is not a mega important issue. DamianZaremba(talk • contribs)18:14, 10 January 2011 (UTC)
I keep reporting false positives on Ballia. An IP acct is adding names of schools in the area. While the list of schools is getting absurdly long, it's hardly what I'd call 'vandalism', and seems like an inappropriate bot edit.
I'm reporting the false positives on the training page, but wonder if you could get the bot to lay off Ballia for a while.
Anniepoo (talk) 02:45, 11 January 2011 (UTC)
Still having issues with older talk page stuff
Unfortunately, the previous discussion has been archived, but I was talking about how ClueBot is set to archive my talk page but has yet to archive anything on my page from before 2010. What's the problem? --Kevin W./Talk•CFB uniforms/Talk01:08, 13 January 2011 (UTC)
I was looking through my contributions today because I was considering reporting this user for a vandal-only account.
Something I came across has worried me with ClueBot. If you look here you will see that ClueBot gave the user a level one warning today, I then went on to give the user a level 2 warning, on the next revert ClueBot should have given the user a level 3 warning but if you look what ClueBot has done is erased all previous warnings given to this editor and has gone back to a level one warning. I've had a look through ClueBot NG's other edits tonight and I can't see that it's happened elsewhere, I'm just worried that ClueBot has seemed to remove warnings this once. I thought it was the user at first but from what I can see only myself and ClueBot edited the talk page of the user tonight and it definitely wasn't me!--5 albert square (talk) 02:22, 13 January 2011 (UTC)
It was likely an API error that caused the bot to think the page was blank. Then it added its warning to the blank page and submitted that data back to the API. I'll see about fixing this error. -- Cobi(t|c|b)09:39, 13 January 2011 (UTC)
I think this is something similar. (My first thoughts were that it could be to do with the year changeover, but that's quite possibly a total red herring.) --Demiurge1000 (talk) 22:44, 14 January 2011 (UTC)
The one that Demiurge just posted, I checked and it was a level 4 warning there before so ClueBot should have reported the user. I just checked AIV under both bot and user section and nothing was reported. I have reported the IP on ClueBot's behalf.--5 albert square (talk) 22:57, 14 January 2011 (UTC)
Perseus, Sonof Zeus has bought you a whisky! Sharing a whisky is a great way to bond with other editors after a day of hard work. Spread the WikiLove by buying someone else a whisky, whether it be someone with whom you have collaborated or had disagreements. Enjoy!
Spread the good cheer and camaraderie by adding {{subst:User:HJ Mitchell/WikiScotch}} to their talk page with a friendly message. Message received at 19:07, 15 January 2011 (UTC)
ClueBot, I was wondering if you could semi-protect the article on Alan Keyes as a group of unsigned-in vandals feel the need to continually revert back to the old and out of date 1980's photo of Keyes over the 2008 photo of Keyes. Sincerely - Aaaccc (talk), 14 January 2011 (UTC)
Aaaccc - ClueBot NG is a robot -- an automated program -- and it is not an administrator. You will have to ask at WP:RPP for page protection requests. ClueBot NG also cannot monitor pages outside of the main namespace (like the file namespace) as it is not programmed to do so. -- SnoFox(t|c)23:35, 15 January 2011 (UTC)
This isn't working out....
(1) Cluebot is getting false positives.
(2) Were a user behaving the way your bot was (repeatedly, even), they would be banned from wikipedia (probably with few warnings).
(3) Were an admin behaving the way your bot was (repeatedly, even), they would not remain an admin because you have removed constructive discussion from the bot's actions.
(4) Wikipedia is not your testing ground for bots.
Due to the nature of the admin role, a bot like this will never be feasible. It isn't even debatable. Please stop abusing wikipedia with your bots. 75.200.196.60 (talk) 04:55, 13 January 2011 (UTC)
ClueBot NG gets a ≤ 0.1% false positive rate. This has been agreed by the community to be acceptable.
You mean making one mistake in 1,000? No, they wouldn't be banned.
The bot is not an admin, will not be an admin, was never an admin, and was never intended to be an admin. This argument is irrelevant. Furthermore, the CBNG team has not removed constructive discussion about (assuming you meant about, not from) the bot's actions. I'd love to see some proof to back up this accusation.
The bot is not currently in its trial period -- it was tested for over a month.
Your statistics quoted rely on a very tedious (by contrast to most bug reporting mechanisms) bug reporting process, and this had affected the frequency by which errors have been reported. Your assumption is that all users who are being forced to work around this bot are just as enthusiastic about perfecting cluebot as you are. This is a false assumption, and a common sense notion.
Again, you're quoting statistics with an inconvenient error reporting process that would have to take place after a user is already frustrated by seeing a bot doing what a person should be doing (administrating wikipedia articles).
It's reverting edits and making ban threats on behalf of other users, while partially obfuscating reply methods towards the posting admin. Often it will be incorrect about revisions (I have noticed this on many pages, now).
Our statistics are not at all based on reported false positives. If you had bothered to read the FAQ entry that Crispy linked to below, you would know that.
See above.
Which pages have you noticed where it is incorrect?
I should also say that it seems fairly obvious that you have created a rather tedious report process, which seems to be some excuse you have constructed to keep the bot around, citing 'few instances of false positives' when the problems this bot is causing are brought to your attention. I'll say it again: "Wikipedia is not a beta testing ground for your bots"
Have you read this? Please read that and all pages it links to for more information. If you do not believe the statistics are correct, then I'd be happy to explain in great detail the methods used to calculate them.
It has been noted before (by other users not associated with the bot) that CBNG's "false positive rate" is similar to a human editor's. It generates more false positives in total because it reviews more edits in total, and also more reverts in total.
The false positive report process involves clicking a link in the warning, then clicking a button. If this is your idea of a tedious process, then I can see why you found it difficult to navigate to the above FAQ link.
After reading the above linked documentation and statistics, please be more specific about what you think is wrong (you think stated stats are wrong, you think 0.001 (0.1%) false positive rate is too high, etc) so I can direct you to the appropriate FAQ entry or past talk page discussion. Crispy1989 (talk) 09:36, 13 January 2011 (UTC)
I have, and I recommend you read my response about why the error reporting concept being used is not generating accurate statistics.
If it's generating more errors because it's reviewing more content, then you are placing an emphasis on quantity rather than quality, and the bot should not be used here due to the problems it is consistently causing. Simply creating rebuttal pages does not take away from the fact that they are excuses for you to use wikipedia as a playground for your bots. I again recommend this bot be indefinitely banned and not allowed back to prevent future spam, contributor harassment, inaccuracies that only a human could prevent, and bottlenecks to communication about content in articles. Until you create an artificial intelligence module for this bot it is unacceptable as a revisionist.
Your misrepresentation of the error reporting process speaks too clearly about your position on the accuracy of wikipedia contributions, and again does not take into account the false assumption that inconvenienced users are as hopeful about the progress of your bot as you are. I think it would be quite reasonable to assume that a desire not to start an argument and to avoid the bot are more likely to be had in a general contributor than say, someone who works on a bot to test on wikipedia (you).
I did not say that a 0.1% false positive rate is too high, you are (again) obfuscating clear concepts to defend your bot, and I believe that your adamancy should make you a prime candidate for an indefinite ban as well.
See my response above. Reporting has nothing to do with false positive statistics calculations.
Humans make mistakes. If you were reviewing 100,000+ edits a day, you would make mistakes, too.
How is our representation of the error reporting interface incorrect?
The latest one as of this writing is User talk:76.179.137.237. Find the warning. Then click the "Report it here" link in the warning.
Then click the "Report false positive" button at the bottom. And that's all that is required.
You are the one who is being resistant to reason. You have conveniently ignored all of the documentation about this bot and reasoned from false premises. Here's the same logic in your last sentence used against you and formalized:
Users who are adamant are prime candidates for indefinite bans. (Premise)
Do you have a suggestion for improving the statistic generation? The current system is using counts of the reported and reverted entries, which is a pretty standard way of doing it.
There is not an emphasis on quantity rather than quality; the bot is reviewing hundreds more entries than any human editor is capable of. The pure numbers mean that the bot is doing more reverts per minute and thus has a "higher" false positive rate.
Once again do you have any suggestion for improving this? I would say you are not taking into account the 'inconvenienced' users from vandalism if the bot was not running.
"Wikipedia is not a beta testing ground for your bots" - this is true hence why the bot was put though a trial and approved by the community.
"I believe that your adamancy should make you a prime candidate for an indefinite ban as well." - There has been no action as far as I can see by Crispy1989 to justify a ban and I am interested to hear why you disagree. DamianZaremba(talk • contribs)18:59, 13 January 2011 (UTC)
new set
Cobi, so it's your position that standard users should be required to read documentation to co-exist with your bot instead of treating your bot's errors (which are very obviously causing more problems than an editor with the same amount of errors due to just the net frequency of error-- what you have presented to be statistics, as an actual statistician, I can assure you, are not "statistics"-- if the bot is getting the same frequency of errors as a human editor but going through much more content, then it will be causing more problems than a human editor because of the net frequency, this is common sense, regardless of your investment in maintaining this bot) as disruptions inconsistent with the nature of wikipedia editing? If I woke up tomorrow with the ability to properly proofread 50,000 articles in a day but sorely screwed up 500 of those edits, then I can assure you I'd have close to 500 complaints every day. And that would be a disruption regardless of the other 99.9% of reverts that were accurate. And that's if your statistics are accurate (they are not by the design of the error discovery process because you falsely attribute your own investment to other users). I'm not very smart. If this is common sense to me, why isn't it to you? I have an idea as to why, and it happens to everyone.
No, it is my position that users who choose to make arguments about the bot, after being told to look at the FAQ first, should be expected to read the FAQ before continuing their argument.
Your analogy is flawed. The false positive rate is ≤ 0.1%. That'd be 50 mistakes for 50,000 proofread articles, not 500.
Your analogy is further flawed. If the bot doesn't review the edits, a group of humans will, and that means that the human false positive rate will apply.
The statistics are correct. They are calculated from a large random sampling of edits which have been manually classified and reviewed by humans. -- Cobi(t|c|b)01:21, 14 January 2011 (UTC)
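The arithmetic in the reply above can be checked directly. The snippet below is only a worked calculation using the figures quoted in this thread (a 0.1% rate and 50,000 items); the function name is hypothetical, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction


def expected_mistakes(items_reviewed: int, mistake_rate: Fraction) -> Fraction:
    """Upper bound on mistakes when the rate is at most `mistake_rate`."""
    return items_reviewed * mistake_rate


quoted_rate = Fraction(1, 1000)           # 0.1%, the rate stated in this thread
bound = expected_mistakes(50_000, quoted_rate)
implied_rate = Fraction(500, 50_000)      # the rate that 500 mistakes would imply

print(bound)         # 50
print(implied_rate)  # 1/100, i.e. 1%, ten times the stated rate
```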
DamianZaremba, as delighted as I'd be to discuss the appropriate maintenance of what's likely a perl bot with someone who thinks installing prebundled LAMP binary packages is 'resume-worthy', I'm going to have to point out that everything I've said here is verifiable (and no I'm not saying that just because I've put a lot of work into maintaining a bot -- I can assure you I have no reason to lie, nor am I invested in maintaining something that seems to cause such frequent problems ;) -- It boils down to the process: These users shouldn't have to work around the bot, the bot should be a 2nd class netizen to all contributors. We don't have to report errors. Most of the time we don't even when we have to. We just wait for someone to fix it most of the time when something bugs out. This is an exception to that. Throw in the fact that the messages left by the bot are left under the bot's moniker with no rebuttal, no question before revision, and you've got a problematic bot that alienates users. The carnage of complaints in the revision histories near this bot speaks for itself. Behind that is a user who is unapologetic, is using bogus [and intelligence-insulting, fallacious and misleading] statistics to defend the bot (that should obviously still be in its testing phase unless you're not a serious developer working on an unimportant home project), and I'd say the disruption is banworthy if continued.
I don't really see how the following points are relevant:
Perl - the language the bot is written in makes no difference to the issues you are raising here. I assume you are referring to a rule based bot rather than an ANN based one such as Cluebot NG. This has been done before and is proved to be less effective and still generates false positives.
'LAMP binary packages' - I am interested as to why you bother to raise this. Whilst it has nothing to do with the questions I raised in response to your concerns I am slightly offended that you feel the need to question my personal experience rather than actually replying to the straightforward question. I also fail to see how LAMP stacks are not resume worthy, they have application in many business environments and when scaled up with caching and load balancing prove to be quite an effective solution for web based platforms. I am not going to argue the finer points of my resume nor my website as the content is what I choose to place on there and has no relevance to this debate.
I also disagree that users have to work around the bot, users revert pages sometimes incorrectly whether this is intentional to restore vandalized content or just in disagreement to the content; the bot is learning in the same way that a user on Wikipedia would.
I find it quite ironic that you accuse the bot's owners of being 'intelligence-insulting' as that is exactly what you have just done to me. If you look into the complaints a lot are a) from users that have vandalized b) resolved because of a misunderstanding on the users part or c) the bot altered to suit the community (the fp rate was lowered due to the community). The statistics which you accuse of being misleading are accepted by the brfa and community as standard. The bot has been through multiple trials and was approved by the community so it should not still be in a testing phase. Whilst you accuse this of being a home project I should just remind you people work and contribute to the bot in their spare time, if you look back through projects worked on by the bot creators I think you will find that their code quality and ideas are impeccable. I cannot comment on them being 'serious' as I have no idea by what standard you are defining that. Are you taking into account the disruption caused by vandalism that the bot reverts vs the small amount of false positives should it be stopped/banned?
I feel that we are going around in circles here and possibly you need to make some productive suggestions and contribute to the bot rather than complain. If you disagree with what I have to say then please explain why and make a suggestion for doing something differently, please don't pull random quotes about me off the internet that are not relevant at all DamianZaremba(talk • contribs)01:11, 14 January 2011 (UTC)
DamianZaremba -- Sure, I can clear up your questions.
I can imagine my assuming the bot was written in perl was a distractive detail in the point I made with that statement.
Because more often than not if a computing professional believes that installing preconfigured, precompiled apache and php binaries and config files is "resume worthy", then it's also likely that they wouldn't understand that an artificial neural network can just as easily be written and interfaced with through a series of object-style perl scripts as it can be in any more modern, less equipped "fad language" currently being used. :)
The reason they're not resume worthy, to clear up the confusion, is that it takes 15 minutes, without much briefing, for a 10 year old to set up apache on a loadbalancer after 'using ubuntu linux to check email through firefox for a few months'. Anyone with access to a search engine can do that, and it is not resume worthy. Having LFS memorized by rote and being able to pass a RHCE, however, is resume-friendly (and, coincidentally, probably qualification to speak on ANN vs. Rule-Based paradigm construction). Sorry to have been confusing about that, but I felt it to be relevant to the topic.
The stats speak for themselves. If it's really got a 0.1% error rate (which would mandate that 100% of all false positives were reported since its creation -- guess if that's true?), then that's 50 errors for every 50,000 edits (sorry about the typo), which, from the contributor's point of view is non-negotiable, un-researched, and simply 'compliant with calculated statistics'. Contributors spend quite a bit of time on edits sometimes, they deserve (even vandals), to have some time reviewing the edits before deciding if it's actually vandalism or not.
The disruption caused by vandalism 99% of the time (see I can make up statistics with little grounding as well) is caused by administrative inflexibility and bad responses to well intended but 'screwed up' edits. The users write it off due to the pedantic nature of the responses (oh and that's so much better with a bot involved!) and at least some users get the notion that it's not to be respected anymore. (I'm not one of those users)..
My constructive contribution is to strongly urge you to pull this bot offline. It's not doing anything we couldn't do without it, and it removes the (needed) human element from the wikipedia process. Most Wikipedia editors don't want bots qualifying their edits.
Cobi --
"the statistics are accurate because it sampled random data".
If you want to see ClueBot taken offline you can start a discussion on a page such as ANI. I doubt that Cobi himself will be convinced by your arguments, and neither will most of the people who watch this page. —Soap—02:25, 14 January 2011 (UTC)
OK, read this. I have tried to link you to all of the relevant information, but you still refuse to read it. I am not insulting your overall intelligence, but I sure am insulting your ability to read. In the interest of clearing this up, I will repeat for you, here, all of the relevant pieces of information in the links I have given to you. If you do not wish to read it, or do not understand it, then please do not reply again.
The stated statistics have absolutely nothing to do with reported false positives. To calculate statistics of both catch rate and false positive rate, we take a random sampling of edits (generated by selecting revision IDs at random), submit them to our review (not report) interface, and allow users to classify them as either vandalism or constructive. For either classification, at least two different, independent users must agree. If there's any disagreement between users, more reviews are required. From this, we can be assured that we have a random sampling of correctly classified edits. To calculate the statistics, we run these exact same edits through the bot. If the bot classifies an edit as vandalism when it's actually constructive, it's a false positive. If the bot classifies an edit as constructive when it's actually vandalism, it's a false negative. False positive rate is the number of false positives divided by total number of actually constructive edits. Catch rate is number of correct vandalism classifications divided by total number of vandalism edits.
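As a rough illustration of the two definitions above, here is a minimal Python sketch. The `ReviewedEdit` type, its field names, and the sample data are all hypothetical and are not taken from the bot's actual code or dataset; the sketch only shows how the two ratios are formed from a set of human-reviewed edits.

```python
from dataclasses import dataclass


@dataclass
class ReviewedEdit:
    """One randomly sampled edit (hypothetical structure)."""
    is_vandalism: bool        # label agreed on by the human reviewers
    bot_says_vandalism: bool  # the bot's classification of the same edit


def rates(edits):
    """Return (false_positive_rate, catch_rate) for a reviewed sample."""
    constructive = [e for e in edits if not e.is_vandalism]
    vandalism = [e for e in edits if e.is_vandalism]

    # False positive: the bot flags an edit that is actually constructive.
    false_positives = sum(e.bot_says_vandalism for e in constructive)
    # Catch: the bot flags an edit that is actually vandalism.
    caught = sum(e.bot_says_vandalism for e in vandalism)

    fp_rate = false_positives / len(constructive) if constructive else 0.0
    catch_rate = caught / len(vandalism) if vandalism else 0.0
    return fp_rate, catch_rate


# Made-up sample: 8 constructive edits (1 wrongly flagged),
# 4 vandalism edits (3 correctly flagged).
sample = (
    [ReviewedEdit(False, False)] * 7
    + [ReviewedEdit(False, True)]
    + [ReviewedEdit(True, True)] * 3
    + [ReviewedEdit(True, False)]
)

fp_rate, catch_rate = rates(sample)
print(f"false positive rate: {fp_rate:.3f}, catch rate: {catch_rate:.3f}")
```

Note that neither ratio depends on how many false positives were reported by users; both come only from the reviewed random sample.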
Some number of false positives is necessary. I'll spare you the algorithmic details (if you were smart enough to understand them, you would have already read all about them on the FAQ and userpage), but in short, the false positive rate is configurable, and set by the bot's operator. 0.1% was decided and agreed upon by the community. The false positive rate is the independent variable, and all other bot functions are calculated given this value. If there is consensus that 0.1% is too high or too low, it can be easily modified - but, to be modified, at the minimum we need a suggestion for what a reasonable rate is. We cannot code "blargh blargh this is too high complain complain ignorance blah blah qualitative nonsense blargh" into the bot. If you think 0.1% false positive is too high, give us a number. Note: This is clearly explained in the very first FAQ link I gave you. Thanks for reading.
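One way to picture "the false positive rate is the independent variable" is the sketch below. It assumes, purely for illustration, that the bot assigns each edit a vandalism score between 0 and 1 and reverts when the score reaches a threshold; the function name, the scores, and the 10% target are invented for the example and are not the bot's real calibration code.

```python
def pick_threshold(constructive_scores, vandalism_scores, max_fp_rate):
    """Pick the lowest score threshold whose false positive rate on the
    reviewed constructive edits does not exceed max_fp_rate.

    Returns (threshold, catch_rate), or (None, 0.0) if no observed score
    satisfies the target.
    """
    if not constructive_scores or not vandalism_scores:
        raise ValueError("both score lists must be non-empty")

    # Try candidate thresholds from lowest to highest. The false positive
    # rate can only fall as the threshold rises, so the first threshold
    # that meets the target also gives the best catch rate.
    candidates = sorted(set(constructive_scores) | set(vandalism_scores))
    for threshold in candidates:
        flagged = sum(s >= threshold for s in constructive_scores)
        if flagged / len(constructive_scores) <= max_fp_rate:
            caught = sum(s >= threshold for s in vandalism_scores)
            return threshold, caught / len(vandalism_scores)
    return None, 0.0


# Made-up scores for ten constructive and five vandalism edits.
constructive = [0.05, 0.10, 0.20, 0.30, 0.40, 0.55, 0.60, 0.70, 0.80, 0.95]
vandalism = [0.50, 0.75, 0.85, 0.90, 0.99]

threshold, catch_rate = pick_threshold(constructive, vandalism, max_fp_rate=0.1)
print(threshold, catch_rate)
```

In this picture, lowering the allowed false positive rate raises the threshold and lowers the catch rate, which is why a concrete target number is needed before anything can be changed.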
Your concept of common sense apparently isn't shared by the rest of us. You state, correctly, that if the bot has the same false positive rate as a human, but goes through more content, that the bot will have more total false positives. In fact, I stated it before you, up above, but I doubt you bothered to read it. Oh well, you can read it this time (or can you?) But consider this. To handle the same volume of vandalism, maybe a thousand human editors have to work - each of which reviews 1/1000 the content that the bot does. If each has an identical false positive rate, each alone will have 1/1000 the false positives. But since there are 1000 human editors total, the total number of false positives will be identical. Note that this is not an entirely accurate analogy, because with current human anti-vandal tools, multiple humans often review the same edit, compounding the chance of a false positive on an edit. The bot does not suffer from this deficiency.
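The arithmetic in the paragraph above can be checked with a few lines of Python. The edit volume is an invented round number; only the 0.1% rate and the figure of 1000 human editors come from the discussion.

```python
from fractions import Fraction

FP_RATE = Fraction(1, 1000)   # 0.1% false positive rate
TOTAL_EDITS = 1_000_000       # hypothetical number of constructive edits reviewed
HUMANS = 1000                 # number of human editors sharing the same workload

# One bot reviewing everything.
bot_false_positives = TOTAL_EDITS * FP_RATE

# Each human reviews 1/1000 of the edits at the same rate.
per_human_false_positives = (TOTAL_EDITS // HUMANS) * FP_RATE
humans_false_positives = per_human_false_positives * HUMANS

print(bot_false_positives, per_human_false_positives, humans_false_positives)
```

Each human makes 1/1000 of the bot's mistakes, but the total across all humans is the same as the bot's total, provided each edit is reviewed only once.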
Your discussion about Perl is irrelevant. Because you seem to like to argue for the sake of argument, I'll reply: The bot's core is implemented in C and C++. It leverages existing libraries for the ANN and other components. This is not due to lack of understanding - my original prototype of the core used a library of my own design - but I found it would be easier to maintain the project if someone else was maintaining the ANN portion. Most of the work done by the core is not even done by the ANN - there are hundreds of preprocessing steps to generate the inputs. These steps already take 0.03 seconds per edit (enough to get tedious when training the ANN with our entire 30,000 edit dataset), and writing them in a scripting language would be akin to trying to hollow out a redwood using a hand drill.
I am the expert on the bot's core algorithms, including the ANN. Neither Cobi nor Damian claim to be - they work on other areas of the bot. (This is also fully explained on the bot's user page.) If you would like to imply that the algorithms are poorly conceived or incorrectly implemented, at least imply that about me, so it's at least relevant.
Your main point of concern seems to be that you do not believe the false positive rate to be accurate. Read #1. Read the FAQ. Read the userpage. Then if you still think it's incorrect, give a reason specifically why. So far, all of your points in this regard have been based on an incorrect assumption (that false positive rate is calculated based on reported false positives), and you seem to be ignoring all attempts to correct you.
I would rather enjoy seeing you bring your arguments to the attention of administrators - although you seem to be unable to read, it is a job requirement of Wikipedia administrators, and I'm quite sure they can. Crispy1989 (talk) 02:30, 14 January 2011 (UTC)
After a rather lengthy and friendly discussion on IRC with Crispy1989 I think it's well understood that this bot is a very important asset to wikipedia (and it seems like versions of it will likely be important to other projects as well). Apologies, all around. 75.201.11.97 (talk) 04:22, 14 January 2011 (UTC)
A rare event when 4 warnings (3 by ClueBot and one by you) were issued within a minute. Perhaps something to do with how the bot handles edit conflicts. Materialscientist (talk) 09:53, 21 January 2011 (UTC)
I believe Cobi is looking into this as someone has mentioned it. The bot should in theory just add the template as a new section; however, as you have pointed out, it sometimes replaces the entire page. It appears to be some issue with the data returned from the API which we will try and resolve ASAP so it doesn't cause major issues. DamianZaremba(talk • contribs)18:34, 20 January 2011 (UTC)
As far as I am aware this could be done using a post processing filter (it is what is used to stop the bot reverting edits that have already been reverted etc) but it would be up to Cobi as far as implementing this is concerned. I'm sure that he will have some input for this when he gets a chance to review it. DamianZaremba(talk • contribs)18:18, 22 January 2011 (UTC)
archiving this page
This talk page gets archived so quickly that it's difficult to have a serious conversation here. Could somebody please slow down the archiving bot? —Stepheng3 (talk) 04:41, 20 January 2011 (UTC)
The archiving bot archives after 3 days of inactivity in a particular thread. If a thread goes 3 days without any response, then it is unlikely to be continued. -- Cobi(t|c|b)05:09, 20 January 2011 (UTC)
I think it's quite fine. If a conversation is inactive for three days, it's most likely to be forgotten about. If it just so happens you do not have the time to check in within three days, it is trivial to raise the issue on the talk page again, and link back to the archive if necessary. -- SnoFox(t|c)00:56, 23 January 2011 (UTC)
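The three-day inactivity rule described above can be sketched as follows. This is not the archiving bot's actual code: the function name, the thread titles, and the timestamps are hypothetical, and the only thing taken from the discussion is that a thread is archived once three days pass without a reply.

```python
from datetime import datetime, timedelta, timezone


def split_threads(threads, now, max_idle=timedelta(days=3)):
    """Split (title, last_reply) pairs into threads to keep and to archive.

    A thread is archived once its last reply is at least max_idle old.
    """
    keep, archive = [], []
    for title, last_reply in threads:
        if now - last_reply >= max_idle:
            archive.append(title)
        else:
            keep.append(title)
    return keep, archive


# Hypothetical snapshot of a talk page.
now = datetime(2011, 1, 25, 0, 0, tzinfo=timezone.utc)
threads = [
    ("recently active thread", datetime(2011, 1, 23, 0, 56, tzinfo=timezone.utc)),
    ("stale thread", datetime(2011, 1, 14, 4, 22, tzinfo=timezone.utc)),
]

keep, archive = split_threads(threads, now)
print("keep:", keep)
print("archive:", archive)
```

Any new reply resets a thread's idle time, so an ongoing conversation stays on the page as long as someone responds within three days.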
ClueBot III archives talk pages based on time since last reply to a section. The archive run is every 12 hours due to bandwidth usage from the bot. So the simple answer is no, you cannot make it archive based on message count. All the ClueBot user pages are fully protected to stop them being vandalised; they should never need to be edited as the pages are all transcluded. DamianZaremba(talk • contribs)16:54, 22 January 2011 (UTC)
ClueBot III can archive every 25 sections by setting the "maxkeepthreads" option to 25. However, ClueBot III only archives every 12 hours, as DamianZaremba said, therefore it is impossible to archive the very instant you get 25 messages. -- SnoFox(t|c)00:46, 23 January 2011 (UTC)
Most of the talk pages, user templates etc are shared between all ClueBots, there is very little tied directly to User:ClueBot ;) As far as replacing ClueBot NG I cannot see any need for it, the bot will be maintained and updated but probably not replaced. DamianZaremba(talk • contribs)00:56, 23 January 2011 (UTC)
(edit conflict) I highly doubt it. So much I can confidently say, "No". If there is, it will be way, way, way, in the future. ClueBot NG seems quite future-proofed, however. -- SnoFox(t|c)01:01, 23 January 2011 (UTC)
Considering the bot makes thousands of edits a day, no, I don't know. Try linking us to the article in question, not some forums. Even better, report the false positive here. -- SnoFox(t|c)00:24, 25 January 2011 (UTC)
The IP can't link to the article, it's been deleted under G1 for nonsense. 70.163.57.150, if you would care to expand more on what you posted above, we may be able to advise you.--5 albert square (talk) 01:28, 25 January 2011 (UTC)
difference btw pinscreen and pinscreen animation
Hi ClueBot, yes, I am new to WP, but I am in the process of improving the information being posted. Why was my edit on the Pinscreen page reverted? I am in the process of improving the accuracy of pinscreen / pin art. I have many reasons to clarify the difference between pinscreen and pinscreen animation! Please read the facts about Ward Fleming and his patented invention, which I added in the origin section in pinscreen animation.
thank you!
Nip888 (talk) 04:37, 24 January 2011 (UTC)
Hi Nip888, you blanked the page, and you re-directed to another page without explanation, that's why your edit was reverted. It could also be that the edit has set off one of the Bot's triggers. ClueBot is not able to read what you've asked it to read as ClueBot is a Bot editor and not a human editor.--5 albert square (talk) 00:14, 25 January 2011 (UTC)
Possible vandalism on Mani Ratnam by 203.153.223.80
"Please do the needful"? Do you require the bot operators to do something or did the bot do something wrong? Thanks, -- SnoFox(t|c)00:26, 25 January 2011 (UTC)
Uh, good question. I think ClueBot NG takes both into consideration when judging vandalism. Regardless, it should be sent to the report interface. I'll do it for you. -- SnoFox(t|c)00:25, 25 January 2011 (UTC)
This is a bot editor not a human editor, if you feel the revert was unjustified then please file a false positive report so we can train the bot. 89.242.252.165 (talk) 19:03, 26 January 2011 (UTC)
I believe this is a bug caused by the data returned by the Wikipedia API. It seems to happen rarely, but may be unavoidable. -- SnoFox(t|c)00:16, 27 January 2011 (UTC)
Why you just keep on reverting lies about his nationality? This man was Croat as sure as Shakespeare was English! But no one is asking questions what is his nationality or inventing his "new" nationality! Please stop spreading lies, mistakes, etc., ...
And, you know, Britannica and some similar projects are full of mistakes and misleading facts! —Preceding unsigned comment added by 89.164.117.100 (talk) 23:38, 25 January 2011 (UTC)
Are you processing requests to volunteer to review ClueBot's dataset? I must have submitted at least 2 applications (sorry for the extra work!), but I would really like to help out.
I believe I remember seeing something about seeing a confirmation email too, right? Well, if that's the case, I've never received one.
Yes! Cobi deals with them after checking accounts are not just vandals etc (preventing screwing the dataset), when he gets a chance it should be enabled and the confirmation email sent out. Might be a couple of hours though ): DamianZaremba(talk • contribs)00:03, 27 January 2011 (UTC)
Thanks for your reply, DamianZaremba. I sent my applications in several WEEKS ago though. Maybe I'm being turned up as a false positive for vandal. I did have 1 or 2 edits that probably weren't too productive in my younger Wikipedia years that I stupidly attributed my name to. Teimu.tm (talk) 03:07, 27 January 2011 (UTC)
Cobi usually reviews and processes the requests within a day. Because it's of the utmost importance that classifications be accurate, and match what an experienced vandal-fighter would do, he only approves users that have extensive vandal-fighting experience. It has nothing to do with the bot thinking you're a vandal - statistics just don't show that you have enough experience as a vandal-fighter to be as accurate as we need. If we've missed something, or you believe this is inaccurate, please let us know. Thanks for your interest anyway! Crispy1989 (talk) 14:01, 27 January 2011 (UTC)
Why aren't ClueBot NG's edits marked as bot edits?
Why aren't ClueBot NG's edits marked as bot edits? I have seen that on my watchlist page that ClueBot NG's edits do not have a "b" by them like the edits of other bots.
Tideflat (talk) 02:47, 28 January 2011 (UTC)