Proposal for Policy on overuse of bots in Wikipedias

The following is a proposed Wikimedia document. References or links to this page should not describe it as supported, adopted, common, or effective. The proposal may still be in development, under discussion, or in the process of gathering consensus for adoption (which is not determined by counting votes).

I would like to make the following proposal for a Policy on overuse of bots in Wikipedias, in light of recent perceived problems caused by overuse of bots. It should reflect that limited use of bots is supported. However, it must define what constitutes overuse of bots, and give solutions to that problem.

Currently, the scope of this proposal only extends to the Wikipedia projects. However, if it is adopted and seems to be successful, it may be used as a model for other Wikimedia projects as well.

History of the perceived problem

On some wikis, heavy use of bot-created articles has in the past required or triggered radical changes.

During 2007, there was rapid growth in the Volapuk and Lombard Wikipedias, supported by a massive flood of additions (more than 100'000 articles each) from bots. Some pointed out that some of these new articles had errors. Both communities were relatively small, Volapuk having only one active user at the time of the bot-uploads.
In October there was a request to close the Volapük Wikipedia, which resulted in a Keep decision, with reasons cited such as the historical impact of Volapük.
The Lombard Wikipedia had, among other issues, a similar problem with bot-generated articles, which attracted the attention of many users and led to a request to close it at the end of 2007. While that discussion was still ongoing, over about a year a "faction" of the Lombard Wikipedians introduced many changes in the project ([1] [2] etc.), including a moratorium on further bot use[3] [4] and the deletion of most empty bot additions, which brought the wiki from about 117 000 articles down to about 14 000.
On December 25, 2007, a new request was made to cut the Volapuk's bot-created articles and move it to the incubator. As of this writing, that discussion is ongoing. However, during the discussion, Jimbo Wales (acting as a non-voting advisor) said: "My recommendation, then, is that all the bot-generated articles be deleted, and that Volapük Wikipedia authors proudly and with joy work to create articles in the old-fashioned human way... helping each other with grammar, with interesting language questions, and with content that is of interest to the users."

Is a Bot-heavy Wikipedia a problem?

First I would like to say that bots can be useful tools, and I am not against their use. They can make tedious jobs easier, update information and leverage the work a Wikipedia does. Pretty much every Wikipedia uses bots to some extent. However, I propose that overuse of bots is harmful to the Wikipedia itself, as well as the wider body of Wikipedias.

In the following analysis, I do not mean to pick on the Volapuk and Lombard Wikipedias, but they are the most current obvious examples we have. However, there are probably others out there that fit the model of a Wikipedia that has relied too heavily on bots.

Harm to the image of Wikipedia. Like it or not, and whether or not it is fair or accurate, people look at the number of articles as a chief indicator of the health and activity of a Wikipedia. When they see a large number of articles, and then actually explore it and find the numbers to be gimmicked, that hurts the reputation of all Wikipedias.
Bot expansion is being used for advertising and political purposes. Volapuk admin Smeira himself said that he uploaded a large number of bot-articles on the Volapuk Wikipedia to advertise it. Smeira said, "I thought I could try to get some new people interested in learning the language and contributing by doing something a little crazy -- like increasing the size of the Volapük wikipedia as fast as I could..." At the Lombard Wikipedia, it seemed that a similar ploy was used to lend legitimacy to the Lombard language-rights movement. I contend that Wikimedia should not be used for political or advertising reasons. It is an encyclopedia, not an advertising service or political platform. Additionally, I believe that the Lombard and Smeira's experiment may border on disrupting a project to make a point. (See: section on Gaming the system.)
Overuse of bots is antithetical to the goals of the WMF. According to the WMF, "The mission of the Wikimedia Foundation is to empower and engage people around the world to collect and develop educational content under a free content license or in the public domain, and to disseminate it effectively and globally." (emphasis mine) Although bots may be used as a tool, the focus should primarily be on people doing the work. When a community comes together to build an article, the community gets a sense of accomplishment from coming together to achieve goals. Overuse of bots robs a community of that accomplishment.
Put the brakes on oneupmanship. Some Wikipedians might see the list of Wikipedias as a numbers game, a competition which must be won. That may be a wrong attitude, but it's human nature, especially when dealing with nationalistic things like languages. Thus, bots are used to rise in the ranks faster, sacrificing quality for quantity, harming the reputation of all Wikipedias.
Jimbo Wales himself supports limiting bot-heavy Wikipedias. During the "radical cleanup of the Volapuk Wikipedia" discussion, he advised that all of Volapuk's bot additions be deleted. I must respectfully disagree, thinking that it would be wrong to single out the Volapuk for using a tool that almost every Wikipedia uses. However, I do agree that Volapuk has gone too far, and something must be done.
Cookie cutter problem. Overuse of bots causes articles to look alike, and be only about a small set of subjects. For example, most of the content in the VO:WP right now consists of two or three sentence stubs about a small geographic location: towns and communities. Furthermore, although they can create articles far faster than a human, they can also make errors much faster than a human. Several of the robot stubs have errors, due to the bot not realizing it was messing something up. Problems have occurred with vestiges of the old copied templates messing things up: For example, both the Lombard and the Volapuk had large amounts of English text, due to bot-copying errors. An egregious example of this would be in the Volapuk Wikipedia, in which a search for a relatively common English word yielded 438 articles composed of mostly English text (as of this writing).
Overuse of bots can extend a Wikipedia beyond the community's ability to maintain it. A human is likely to feel a sense of responsibility for articles they create on Wikipedia. Humans also generally care about what the final product looks like. A robot feels no such urge, so articles can be ugly, sparse or totally messed up, and nobody will care. Because there is such a high article-to-editor ratio, it's possible that articles may never be looked at or touched by a human. For example, it is estimated there are 20-30 Volapuk speakers in the world, fewer than 10 of them active in VO:WP, who must tend over 110'000 articles. Wikipedias which grow more organically (in proportion to the size of the community) are able to control and support their articles better. At the Volapuk Wikipedia, hundreds of articles have paragraphs of English text in them, and they have been like that since July of 2007. I brought this problem up during the October proposal for deletion of the VO:WP, and it still hasn't been corrected.
Monolithic point of view. With so many bot articles quickly generated, articles tend to come from only one source: the bot. In the Volapuk Wikipedia, SmeiraBot has made over 923'000 of the 1'200'000 edits there.

What limits should be set?

I am defining a "human article" as an article which has been initiated by a human, and a "bot article" as an article initiated by a bot. Although this may be generalizing a bit, I've chosen this definition for a few reasons. First, it's simple and straightforward, thus it is a parameter which could be easily evaluated.

Additionally, I am also trying to avoid a "microedit brigade" scenario. I could see where a bot creates a bunch of articles, and a human or a team of humans makes a small inconsequential edit on each of them to make them "touched by a human". By "microedit" I mean just a little piece of information-- a link, template, category, etc. -- which just as easily could have been done by the initial bot. It would be difficult to determine if an edit is "inconsequential" or not, so that is a gray area which I would like to avoid.

I realize it's not a perfect system. There will be some terrific articles initiated by bots, and some lousy articles initiated by humans. There will be some articles initiated by humans, and completely overwritten by bots, and some bot articles completely overwritten by humans. However, I would consider most of those scenarios to be outliers. I am speaking about generalities here, and such details are not that important to the end-result.

Jimbo Wales suggested that ALL of the robot articles be cut back out of the Volapuk Wikipedia. I would suggest that would be too harsh of a limit. Besides, to limit one Wikipedia's bot use and not others seems to be unfair.

I am proposing a 3:1 ratio of BOT ARTICLES to HUMAN ARTICLES. Thus you are allowed a MAXIMUM of 75% of your articles to be bot-initiated. If your Wikipedia has 1000 articles initiated by humans, you are entitled to 3000 additional articles coming from bots. I think even this may be too many bot articles, but at least it is a limit. And if it seems too lax, we can always change the ratio later.

By proposing this ratio, please do not misunderstand. It is not suggested that Wikipedias should have a 3:1 ratio of bot-articles to human articles. If you are in a Wikipedia with a small bot-to-human ratio, great! Remember, a human-created article is almost always superior to the bot-created one.
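
The arithmetic of the proposed 3:1 limit can be sketched as follows. This is only an illustration under the definitions given above (articles classified by who initiated them); the function names are hypothetical and not part of any existing tool.

```python
def max_bot_articles(human_articles: int, ratio: int = 3) -> int:
    """Largest number of bot-initiated articles allowed under a ratio:1 limit."""
    return ratio * human_articles


def is_compliant(human_articles: int, bot_articles: int, ratio: int = 3) -> bool:
    """True if bot-initiated articles do not exceed ratio times the human-initiated ones."""
    return bot_articles <= max_bot_articles(human_articles, ratio)


def bot_share(human_articles: int, bot_articles: int) -> float:
    """Fraction of all articles that were initiated by a bot."""
    total = human_articles + bot_articles
    return bot_articles / total if total else 0.0


# The example from the proposal: 1000 human articles allow 3000 bot articles,
# which corresponds to a bot share of exactly 75%.
print(max_bot_articles(1000))
print(is_compliant(1000, 3000))
print(bot_share(1000, 3000))
```

A 3:1 ratio and a 75% maximum share are the same limit stated two ways, since 3 / (3 + 1) = 0.75.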

Remedies for noncompliance

The remedies which have been proposed so far have been closure of the Wikipedia, cutting all of the bot articles, and moving the affected Wikipedia to the Incubator. All of these remedies seem excessively harsh and punitive to me. Punishment is not the answer, I don't think. Our best bet is to warn of a violation, then allow a reasonable amount of time to self-correct, and only then take outside corrective action. Penalties against a Wikipedia should only be punitive in extreme cases of ignoring the established policy.

Warning that a Wikipedia is violating policy.
Allow a reasonable length of time to initiate action to correct it. (30 or 60 days?)
If refused, stewards will delete bot articles starting with the most recent ones.
Punitive measures in extreme cases.

Once a Wikipedia is brought back into compliance, it is allowed to undelete articles in the same 3:1 ratio, as long as it continues to stay in compliance.
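
As a rough sketch of the deletion step described above, the following shows how the excess over the 3:1 limit could be computed and which bot articles would be selected first (most recent first). The `Article` record and its fields are hypothetical, chosen only for illustration; a real implementation would need to read this information from the wiki's page histories.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    title: str
    created: date
    by_bot: bool  # True if the article was initiated by a bot


def articles_to_delete(articles: list[Article], ratio: int = 3) -> list[Article]:
    """Return the bot articles to remove, newest first, to restore the ratio:1 limit."""
    humans = sum(1 for a in articles if not a.by_bot)
    bots = [a for a in articles if a.by_bot]
    excess = max(0, len(bots) - ratio * humans)
    newest_first = sorted(bots, key=lambda a: a.created, reverse=True)
    return newest_first[:excess]
```

If the wiki is already in compliance, the excess is zero and nothing is selected.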

Other issues

How do we know which edits are bots and which are humans?

How are we to determine which editors are bots? Are all editors that don't have a bot flag, don't have 'bot' in the user name and don't say on the user page that they are a bot, non-bots? Or are we going to, for instance, set a limit of edits per day (or hour, week, month), and treat those editors which make more edits as bots?
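
To make the question concrete, the signals mentioned above could be combined roughly as in the sketch below. Everything here is an assumption for illustration: the `Editor` record, its field names, and especially the threshold of 500 edits per day, which is an arbitrary placeholder and not a proposed value.

```python
from dataclasses import dataclass


@dataclass
class Editor:
    username: str
    has_bot_flag: bool
    user_page_declares_bot: bool
    edits_per_day: float


def looks_like_bot(editor: Editor, max_human_edits_per_day: float = 500.0) -> bool:
    """Heuristic only: declared bots, or accounts editing faster than the threshold."""
    declared = (
        editor.has_bot_flag
        or "bot" in editor.username.lower()
        or editor.user_page_declares_bot
    )
    too_fast = editor.edits_per_day > max_human_edits_per_day
    return declared or too_fast
```

Note that the user-name test is crude (a human named "Abbott" would match), which is part of why this question is listed as open.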

Procedural questions

How should a Wikipedia be warned, and who should do the policing? Is this a subject that should be brought up here or in the individual Wikipedia, or both? What should we do about Wikipedias that are currently not in compliance (e.g., Volapuk)?

Comments and additional suggestions

I welcome your comments and suggestions on this difficult issue. I have tried to be fair in constructing this proposal. It is not fair to single out one Wikipedia when making this kind of proposal. I would be curious to know if any other Wikipedias besides the Volapuk violate this proposal as it stands now.

I would like to end by saying that I feel no ill-will toward the Volapuk or any other Wikipedia. In fact, I am an admin at the Esperanto Wikipedia, so I have a fondness for conlangs. However, I think the measures that I outline above are necessary to make the Volapuk more robust, and help to protect the image of all Wikipedias. Additionally, it will help to keep the rampant oneupmanship in control. I ask that you try to keep a level head in the ensuing discussion. Thank you for your attention. -- Yekrats 17:31, 8 January 2008 (UTC)

Thank you for taking the time to write a well thought-out essay on this topic.

Perhaps we also need to recognize the importance of prose in bot-generated articles. Rambot, the canonical example of an article-generating bot, demonstrated that it is feasible to generate full, readable articles using statistical data. It generated the U.S. geography articles before infoboxes were commonplace in Wikipedia, and I believe that, had each rambot article been dominated by a table or infobox rather than paragraphs, virtually none of these articles would have been expanded with the genuinely encyclopedic information that these articles feature today, because prose encourages human editors to get involved with specific articles, and with the project as a whole.

Of course, the rambot articles are far from portable, which is why it's taken a while for those articles to be incorporated into other language editions, and which is why updates to the statistical data in those articles need to be performed with human supervision (e.g., AWB). But I'm sure prose still reflects much better on Wikipedia than any amount of tabular data (which you can find at countless other sites).

So perhaps we could adopt Erik Zachte's alternative criteria (column F) when evaluating whether a bot-generated page is a legitimate encyclopedia article:

Articles that contain at least one internal link and 200 characters readable text, disregarding wiki- and html codes, hidden links, etc.; also headers do not count

To clarify, we could exclude everything within a template.
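
A minimal sketch of how such a check might look is given below. It is an approximation under stated assumptions, not Erik Zachte's actual implementation: templates are stripped with a simple regular expression, any link whose target contains a colon (categories, interlanguage links) is treated as a hidden link, and header lines and HTML tags are discarded. The function names are hypothetical.

```python
import re

TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")          # innermost {{...}} template
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")
HEADER = re.compile(r"^=+.*=+\s*$", re.MULTILINE)  # == Header == lines
HTML_TAG = re.compile(r"<[^>]+>")


def strip_templates(text: str) -> str:
    """Remove templates repeatedly so that nested templates also disappear."""
    previous = None
    while previous != text:
        previous = text
        text = TEMPLATE.sub("", text)
    return text


def is_counted_article(wikitext: str, min_chars: int = 200) -> bool:
    """At least one internal link and min_chars of readable text outside templates."""
    text = strip_templates(wikitext)
    text = HEADER.sub("", text)
    text = HTML_TAG.sub("", text)
    internal_links = 0

    def replace_link(match: re.Match) -> str:
        nonlocal internal_links
        target, label = match.group(1), match.group(2)
        if ":" in target:  # assumption: category/interlanguage links are "hidden"
            return ""
        internal_links += 1
        return label if label else target

    text = LINK.sub(replace_link, text)
    text = text.replace("'''", "").replace("''", "")  # bold/italic markup
    readable = " ".join(text.split())
    return internal_links >= 1 and len(readable) >= min_chars
```

Under this sketch, a page consisting of an infobox and a single sentence would not count, while the same page with a few paragraphs of prose and one link would.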

– Minh Nguyễn (talk, contribs) 00:11, 9 January 2008 (UTC)

I do think the Rambot articles are better than SmeiraBot's articles, for a few reasons. The EN:WP has a much bigger audience and orders of magnitude more editors to take care of irregularities and problems. Rambot was also based on the actual data (instead of a messed-up translator), and then made a text narrative out of the data to make it less like a phone book. Also, Rambot composed a smaller percentage of articles in the English Wiki at the time, and the community could handle it. Rambot's articles were only a fraction of EO:WP's articles at the time, but SmeiraBot composed over 20 times the VO:WP corpus of articles when it uploaded. -- Yekrats 13:29, 10 January 2008 (UTC)

Is a Bot-heavy Wikipedia really a problem? (by Smeira)

In the following analysis, I (= Smeira) would like to offer a few more points for reflection, from the 'other' perspective: that bots and bot-created articles may do no harm to Wikipedia, and may actually be good tools in the pursuit of its goals.

Number-of-articles and harm to Wikipedia: whose fault? As we all know, number of articles does not measure quality. It is true that many people will look at it and draw the wrong conclusions; but that doesn't make it a better parameter. Many people believe in superstitions; this doesn't mean Wikipedia should, too. Note that, if it is true that number of articles tends to be interpreted as a measure of quality achievement, this is also because we present it as such. Experienced Wikipedians go around talking about number of articles and creating frequently visited pages based on number of articles (like the List of Wikipedias -- which originally was not based on number of articles and was later changed). In other words: if we give the public a bad parameter, they will take it from us. If we gave them something else -- the List of Wikipedias by sample of articles comes to mind, but there certainly are other possibilities -- perhaps they would take that, too.
It seems that it is Wikipedians who like and preserve number-of-articles, not really the general public. Most people want to find true, accurate, reliable information; if they do, I think the question of how the article was written -- with a word processor, or a pen, or a bot -- hardly crosses their mind. In my experience, people are much more concerned with the possibility that Wikipedia pages may contain wrong information (e.g. the Seigenthaler case). "Anyone can edit" is a source of more harm for Wikipedia's image than "there are stubs written with bots". In fact, for the average reader of Wikipedia, the whole question of bot-created articles probably boils down to: can I trust their information? If the answer is yes, the other problems will seem much less important to the average reader. After all, so many things in life are 'gimmicks' these days (in the sense of 'avoiding the previous manual way of doing things and giving faster results'): cash machines, navigation systems for cars, scanners with OCR... As long as they work, and their work is reliable and useful, does the average user really care about the technical details?
A stub is a stub is a stub. Stubs are useful things (completeness of coverage, potentially useful information, etc.; see e.g. good stub). They add a contribution, however small, to an encyclopedia. It should not matter whether they are made with bots: in many cases it is difficult to tell (compare e.g. de:Gotō (Nagasaki) with nl:Soyaux). In fact, the stub created with a bot can even be better than the one created without it: compare now de:Streator with vo:Streator. There are hundreds of thousands of articles in the stub categories of the major Wikipedias; it seems one should judge them all by the same criteria (e.g. readability, accuracy, relevance), or else, if the only reason for condemning some is that a bot was used, then there is unjustified discrimination.
Bots are in agreement with the goals of WMF. Of course, people are the essential element of WMF projects: they are made by, and for, people. Bots are simply a tool that cannot really change this. Bots cannot write or run their scripts; ultimately, it is people who write bot scripts and run them, and who design the kinds of articles (usually stubs) that can be easily created with the help of a bot -- and they are designed to be readable to people, not to bots. It is always people; there is no other way. People can collaborate in creating and writing articles, sometimes by writing the text with their own hands, sometimes by writing a bot script that will do that, in the cases when this is simpler. Human-written texts are better in many ways (creativity, originality, human interest, etc.), but script-written texts also have their usefulness (e.g. in articles about, or with, "boring" but relevant material like statistics). Humans collaborate: they create pages, they edit them -- to change a lot, or maybe a little, or maybe to correct a mistake or add an interwiki link or a photo. It doesn't matter in the end if they do that by typing every character, with bots, or both. If a person creates an article with a bot, another person can edit and improve it; if a person creates an article manually, another person can edit and improve it with a bot. Each preference has its advantages and disadvantages, and should be used knowingly; but none excludes the other. One should look at the results of these edits to decide if they are good or bad, not at who made them, and whether or not s/he used a bot for that.
Bot-created articles are not a disrespect to human editors. Some of the arguments presented against bot-created articles seem to be based on a perceived threat: bots are "taking our jobs!" Or "lots of bot stubs belittle the long and hard work many people have put into manually creating and editing articles". There is no reason to think that. There are Wikipedians of all kinds: some work for months on a single article, others prefer to edit many articles and add a little content here and there, others prefer to correct mistakes, others are interested in the organization of categories... and others want to create or improve certain kinds of articles with bots. How could any of them be a threat, or even disrespectful, to any of the others? People do different things; and the things they do should be judged by the results they produce. If a Wikipedian worked for months on creating and improving an article till it reaches FA-quality levels, s/he has done a wonderful job, which is in no way belittled by another user who preferred to correct spelling mistakes in 200 articles or to create 100 stubs manually or with a bot. And also the other way round: the corrections or stubs, if useful, remain a useful contribution of the second user, not belittled by the achievement of the first user. Again: we should look at results, not at origins.
Wikipedia is not based on authority. In Wikipedia, as everywhere, some have done more than others. I think we all agree that Jimbo Wales took the most important step of all: he created Wikipedia and gave us all the opportunity of developing it. With intelligence and foresight, he made it a free Encyclopedia, in which people can discuss, debate and exchange ideas in a useful way, based on their merits, without being hindered by the pontification of moral authority. Jimbo Wales pointed out that his opinion was simply offered for consideration, in the hope that some people might find it useful. This is what we should do: take his opinion into consideration, discuss, debate, and, if need be, disagree with it. Looking at the principles of the Wikipedia he created, one cannot help thinking that he probably would be the first to agree.
Bots can cause, and also correct, errors. Like all methods used by humans to create content, bots can cause errors (e.g. the source may contain formatting errors or divergences that were not expected by the person running the bot and may cause words, or even large sections of text that were not intended to be in the final article, often in a different language). When the author of the bot script designed it well, the errors are relatively infrequent: in the Volapük Wikipedia, only about 0,5% of the articles contain them. These errors should then -- and can -- be corrected (copying errors often fall into a few categories, and bots can often be used to quickly correct some of them). The Volapük Wikipedia, with its several categories created for dealing with errors, and with its ongoing cooperative effort to correct them, is a good example: the total of articles with wrongly copied text in other languages should decrease dramatically or even disappear in a couple of months (if no obstructions or diversions hinder the work of Volapük Wikipedians, of course). Note that searching for common English words is a dangerous way of evaluating the amount of wrongly copied texts, since English words are often parts of titles or quotations in good articles in other Wikipedias. A search for the words "the" and "have" in the German Wikipedia found 95 768 and 4 056 hits respectively at the time of writing; yet this does not mean that this Wikipedia has lots of copying errors.
What can communities maintain? It is difficult to maintain lots of articles. It usually involves lots of "boring" work, like checking sources, updating statistics, etc. In the case of bot-created articles, fortunately, it is possible to do that with bots as well. Just as it was possible to create many articles with the help of a bot, it is also possible to update them (if at least parts of them are left in a standard format -- like an infobox -- by human editors) and to correct mistakes. As was described in the previous item (concerning strategies for dealing with errors), this is exactly what is happening in the Volapük Wikipedia. One single script, written by a human user, can power a bot to change e.g. population figures or coordinates to reflect newer data in a large number of articles. This is a usual procedure in all Wikipedias (e.g. for updating infoboxes), and it is easy to extend to cases such as the Volapük Wikipedia.
Competition and oneupmanship. Indeed, Wikipedians are contributors to a project that involves collecting and disseminating knowledge. We should not be comparing statistics with hearts full of national pride; this is for sports events. People who create articles -- with bots or with other means -- only so that 'their country wins' simply do not have the right attitude: they should be warned. But so should people who want to delete bot-created articles in order to 'put others in their place' and 'avoid cheating' (in what game? we're just collecting and disseminating knowledge). Both are extreme attitudes: most people on both sides of the question are fortunately much less extreme.
Sources and point of view. The source of a bot-created article is the same as the source of a human-created article: namely, its source :-). A work of reference on someone's desk; an authoritative website; a published article in a journal; that is where the information ultimately comes from. The POV problem has more to do with style: what to present, in which way, and favoring which side. This is hardly ever the case with the kind of article better created by bots: statistics, numbers, facts, since the best way to present them is usually obvious and the same for all cases: a standard text, an infobox, etc. This is because they are standard data, with standard definitions and in a standard format. One detects a POV by detecting a bias, even a subtle one (e.g. with weasel words); as long as bot edits do not insert a bias but simply keep adding neutral information with neutral text, the number of edits causes no POV problems. In other words: if there is a POV bias, this should be obvious from, and argued with, the text of the article(s); inferring its "possibility" from number of edits does not save us from having to check if it has really happened. The edits may all still be OK.
Many bot-created stubs don't damage quality. Bot-created stubs add information, and therefore they also add to the overall quality; the difference is that they add a lot less than creatively written articles. By using bots to create articles, humans increase the quality of their Wikipedia much more slowly, maybe never beyond a certain threshold; by creating them manually, they can increase the quality much more. But these actions are not in competition: it is possible to do both (create "boring" but useful articles with bots, and "interesting" and "creative" articles manually). A Wikipedia with lots of articles created via bots and few manually created articles has lower quality than one with lots of both, but higher quality than one that has few of both. It is simply that you need two numbers to compare them: bot-aided activity and manual activity. The English Wikipedia is high on both counts; the Volapük Wikipedia is perhaps high in bot-aided activity, but low in manual activity (in comparison to the English Wikipedia); and the Diné Bizaad Wikipedia is low on both counts. The only problem here is if one wants to measure quality with only one number, e.g. number-of-articles: one then misinterprets Wikipedias with similar numbers of articles as Wikipedias of similar quality. This is like measuring the area of a rectangle by the length of one of its sides: simply wrong, since the area is the product of two lengths (of adjacent sides), and one length alone is not sufficient. The remedy here is not to oppose or delete stubs (how could reducing the amount of information improve quality?), but to measure bot-aided and manual activity separately, and use them as independent parameters when judging and comparing projects.
"Advertising": the use of bot-created stubs for "other purposes". People can use bots to create stubs for many reasons, but these are their reasons, for which the people in question should be held accountable, not the method itself. People may edit and improve articles for "other purposes" (to show how smart they are, to win a bet, to impress a loved one), and it is possible to judge them (e.g. as frivolous) on the basis of these other purposes; but if the articles that were edited are better, if the improvements were real, then this fact is not changed by these "other motives". Creating articles with bots, even lots of them, is a method of adding information to a Wikipedia; if it was done with an eye to "winning the competition" - 'my country is better than yours!' -, the results are still there and can -- and should -- be judged independently. Having said that, however: what about the original bot user? I suggest that his motivation be evaluated and judged, independently of the merit of the contribution s/he made (the bot-created articles). The case in question is the Volapük Wikipedia (there aren't any others yet, as far as I can tell): the bot user (Smeira, I myself) saw the creation of many articles as a possible means of attracting more contributors to vo.wp, so as to increase the size of its community. I suppose nobody disagrees that increasing a Wikipedia community is a worthy goal (it was part of a three-year plan discussed here at Meta); the question is simply whether having this as one of the goals -- not simply the dissemination of knowledge -- leads to an ethical problem (not with the idea of creating lots of stubs per se, but with the particular case of the Volapük Wikipedia). I would maintain that this is not the case, because between "informing"/"disseminating" and "advertising" there is a continuum with extreme cases but also lots of intermediate ones and no clear dividing line, especially in the absence of any rewards, monetary or otherwise.
The ethical question here is far from clear.

Comments on proposal (Millosh)

First of all, a couple of comments about the proposal: --Millosh 18:39, 10 January 2008 (UTC)

This was very predictable. While I don't like Indo-European based wanna-be-world-languages (like Volapuk), and while I don't fully agree with making articles like "2 is a number greater than 1 and lesser than 3" (the Lombard case), the intentions in both cases were very clear: nationalists, conservatives and people who think that it is more important to delete Wikipedias in artificial languages than to follow our own rules united their efforts in something which produced this proposal. --Millosh 18:39, 10 January 2008 (UTC)
I know that conservatives (as a general, not only political, position) would like to see a conservative encyclopedia, like, for example, Britannica. The value of such encyclopedias is, of course, high and I don't think that this position is wrong. However, Wikipedia is not the right place for building such an encyclopedia in a collaborative manner. There is such a project on the Internet and its name is Citizendium. Please, go there. I am sure that you would do much more good on such a project than on Wikipedia. --Millosh 18:39, 10 January 2008 (UTC)
I would ask nationalists who belong to language areas with fewer than 100 million speakers to delete their own bot-generated articles before they continue to argue against bot-generated articles on other Wikipedias. --Millosh 18:39, 10 January 2008 (UTC)
People who belong to language areas with 100M or more speakers are very lucky. I can only wish that my culture shared a common language with that many people. Information in such cultures does not face barriers as strong as those in cultures with a smaller number of speakers. But, please, consider that for some languages an article like "<place name> is a place in Italy, in the province Piedmont, with coordinates..., number of inhabitants, this postal code, then telephone area number, ..." is usually much more than they have ever had in an encyclopedia in their language about some place in their country. Forbidding those people to have such data in their language is extremely arrogant. --Millosh 18:39, 10 January 2008 (UTC)
And to the others -- who supported previous proposals for emotional reasons of their own, using WM goals as an excuse -- I would like them to think again about what their emotions may bring, taking this proposal as an example. --Millosh 18:39, 10 January 2008 (UTC)

And now, here are the responses: --Millosh 18:39, 10 January 2008 (UTC)

Harm to the image of Wikipedia. Like it or not, and whether or not it is fair or accurate, people look at the number of articles as a chief indicator of the health and activity of a Wikipedia. When they see a large number of articles, and then actually explore it and find the numbers to be gimmicked, that hurts the reputation of all Wikipedias.

- If someone draws bad conclusions about all Wikipedias because of well-written program-generated articles, I have nothing against that. For example, the articles about places on the Volapuk Wikipedia just need to cite sources. Of course, bots can make much better-looking articles. Good examples are the bot-generated articles on the Portuguese and Russian Wikipedias. --Millosh 18:39, 10 January 2008 (UTC)
Bot expansion is being used for advertising and political purposes. Volapuk admin Smeira himself said that he uploaded a large number of bot-articles on the Volapuk Wikipedia to advertise it. Smeira said, "I thought I could try to get some new people interested in learning the language and contributing by doing something a little crazy -- like increasing the size of the Volapük wikipedia as fast as I could..." At the Lombard Wikipedia, it seemed that a similar ploy was used to lend legitimacy to the Lombard language-rights movement. I contend that Wikimedia should not be used for political or advertising reasons. It is an encyclopedia, not an advertising service or political platform. Additionally, I believe that the Lombard and Smeira's experiment may border on disrupting a project to make a point. (See: section on Gaming the system.)

- Advertising a Wikimedia project is not only completely acceptable, but desirable. Many communities were built through such actions. The brightest example is the Polish projects, which have enough people to keep their own Wikinews going! In their work, they are now even better than the German project in terms of the number of people participating. I went to both channels (#wikinews-de and #wikinews-pl) and I saw more people on the Polish channel. People like tsca did a great job promoting Wikipedia in Poland. Of course they didn't disrupt the project; rather, they made one of the liveliest groups of projects on Wikimedia. In short, this reason is nonsense. --Millosh 18:39, 10 January 2008 (UTC)
Overuse of bots is antithetical to the goals of the WMF. According to the WMF, "The mission of the Wikimedia Foundation is to empower and engage people around the world to collect and develop educational content under a free content license or in the public domain, and to disseminate it effectively and globally." (emphasis mine) Although bots may be used as a tool, the focus should primarily be on people doing the work. When a community comes together to build an article, the community gets a sense of accomplishment from coming together to achieve goals. Overuse of bots robs a community of that accomplishment.

- Thank you for your ethics. My ethics tell me that the goals of the WMF may be achieved by much cleverer methods than copy-pasting data or translating something which may be translated by a program. --Millosh 18:39, 10 January 2008 (UTC)
Put the brakes on oneupmanship. Some Wikipedians might see the list of Wikipedias as a numbers game, a competition which must be won. That may be a wrong attitude, but it's human nature, especially when dealing with nationalistic things like languages. Thus, bots are used to rise in the ranks faster, sacrificing quality for quantity, harming the reputation of all Wikipedias.

- So, the number of articles is not a valid measure of quality. There are better ways to measure it. --Millosh 18:39, 10 January 2008 (UTC)
Jimbo Wales himself supports limiting bot-heavy Wikipedias. He advised during the "radical cleanup of the Volapuk Wikipedia" that all of Volapuk's bot additions be deleted. I must respectfully disagree, thinking that it would be wrong to single out the Volapuk for using a tool that almost every Wikipedia uses. However, I do agree that Volapuk has gone too far, and something must be done.

- This is more than an invalid reason. More than 60 other Wikimedians don't think so. --Millosh 18:39, 10 January 2008 (UTC)
Cookie cutter problem. Overuse of bots causes articles to look alike, and be only about a small set of subjects. For example, most of the content in the VO:WP right now consists of two or three sentence stubs about a small geographic location: towns and communities. Furthermore, although they can create articles far faster than a human, they can also make errors much faster than a human. Several of the robot stubs have errors, due to the bot not realizing it was messing something up. Problems have occurred with vestiges of the old copied templates messing things up: For example, both the Lombard and the Volapuk had large amounts of English text, due to bot-copying errors. An egregious example of this would be in the Volapuk Wikipedia, in which a search for a relatively common English word yielded 438 articles composed of mostly English text (as of this writing).

- Of course, programs may be written well or badly. This is not a general issue, but a particular one. And if you are afraid of programs, please don't use a computer. --Millosh 18:39, 10 January 2008 (UTC)
Overuse of bots can extend a Wikipedia beyond the community's ability to maintain it. A human is likely to feel a sense of responsibility for articles they create on Wikipedia. Humans also generally care about what the final product looks like. A robot feels no such urge, so articles can be ugly, sparse or totally messed up, and nobody will care. Because there is such a high article-to-editor ratio, it's possible that articles may never be looked at or touched by a human. For example, it is estimated there are 20-30 Volapuk speakers in the world, fewer than 10 of them active in VO:WP, who must tend over 110'000 articles. Wikipedias which grow more organically (in proportion to the size of the community) are able to control and support their articles better. At the Volapuk Wikipedia, hundreds of articles have paragraphs of English text in them, and they have been like that since July of 2007. I brought this problem up during the October proposal for deletion of the VO:WP, and it still hasn't been corrected.

- Simply put, this is not true. While one person is not able to maintain all bot-generated articles, 10 people are able to maintain 1,000,000 articles. "Organic" and "sustainable" arguments smell more of religion than of anything rational. Please give long-term examples for such claims. I am sure that the articles on the Volapuk Wikipedia will be much better in a year or two than they are now. Of course, only if people like you don't break up their community. --Millosh 18:39, 10 January 2008 (UTC)
Monolithic point of view. With so many bot articles quickly generated, articles only tend to come from one source: the bot. In the Volapuk Wikipedia, SmeiraBot has created over 923'000 out of the 1'200'000 edits there.

- So what? Should facts be multipolar? May the coordinates of some place vary, so that "all relevant positions should be added there"? A well-referenced article (which a bot article may be) is, of course, much better than a human-made article written in a POV manner without sources. Also, this "reason" has nothing in common with basic scientific principles. --Millosh 18:39, 10 January 2008 (UTC)

This policy does not penalize VO:WP at all

Subtitled "Yekrats goes just a little bit nuts"

Before you trigger-happy Volapukists shoot this down without even reading it, did you actually read the part where I asked the question of what to do with VO:WP if this passes? This proposal is full of places where I say I'm not picking on VO:WP or LMO:WP, but simply using them as examples of what I see as bad behavior.

Although I would like to see VO:WP voluntarily comply with this, it is not my intention to force you to do anything by this proposal. Mostly it is to prevent other wikis from making a bot-heavy Wikipedia like VO and LMO did, and not being able to maintain it like VO and LMO.

I've considered the possibility that VO:WP could even be "grandfathered", in other words: allowed to have their articles because they added the articles before this rule was in place.

Do you want to know what I'd like to see in VO:WP?

Fewer bot errors (English-language articles, template errors, etc.)
Fewer monolithic cookie-cutter articles of nothing-much-at-all
Compromise from the Volapuk camp instead of all-or-nothing stand-offs
Fewer "ethical lapses" from the Volapuk crowd pushing the envelope
a focus on QUALITY over QUANTITY

But most of all, I'd like to see

ACTUAL ARTICLES in VO:WP!!!!!!! (ie: Not artificially inflated advertisement fluff.)

People, do you realize that if you spent less time arguing here, and less time sending out ethically questionable emails campaigning for your cause, and less time arguing every nuance of every clause on every forum, bulletin board, and wiki-discussion page across the entire Web, less time trying to convince me to change my mind ... if you did less of all of that and spent more time just actually crafting real human articles, you would be up to the minimum required by this proposal in no time? I'm not sure how many human articles you currently have, but this proposal is EXTREMELY GENEROUS. I'm guessing you would be in compliance in about 20000-30000 articles.

If you do 30000 human stubs, and call a TEMPORARY moratorium on bot articles, you don't even have to worry about this proposal at all! How do you like that? What do you think about that?

Just give an inch! Until now, I have not seen the Volapuk camp compromise one iota. I have tried to support you, tried to find a middle ground, tried to respect your position. I get attacks, and double talk, disrespect, but no compromise. By not compromising at all, you show extremism and a lack of maturity. It would be a big step if you were to make a proposal that showed you were willing to take some steps to address the concerns of the other side.

Human content is always preferable to bot content. Get to it!

-- Yekrats 20:10, 10 January 2008 (UTC)

Yekrats...

Please, calm down. You're not being persecuted by Volapük hitmen trying to shoot down your proposal. You're a good person, and you have done great work for the Esperantlingva Vikipedio. Nobody denies that; nobody is trying to attack you personally.

First: Your proposal is quite moderate; I've said so in your talk page, and I say it again here. Only two Volapukists have given their opinions here thus far, and one of them -- Jmb -- has partially supported it. The other one -- me -- is against it on principled reasons: I only mention vo.wp when discussing arguments in which you mentioned it first. My points about bot-created articles do not depend on vo.wp: they are much more general. I think I'm simply showing that I care about the issue, and I contribute to the discussion by mentioning points and giving arguments that others should think about. Is this being "trigger-happy"? Come on...

Second: your proposal does not target vo.wp or lmo.wp specifically. You did it well, and it stands as it is: a place to discuss the question in general. That was what I always had wanted, what I kept asking for in the discussion of Arnomane's proposal. I'm thankful to you for having created this proposal. I didn't miss your suggestion that vo.wp could be "grandfathered". I said and repeat: you are a good person, you are addressing what you see as a problem, and this is an honest step. I just disagree with it, as others on this page also do (none of them is a Volapükist). Please don't see a (non-existing) vo.wp-is-out-to-get-me threat; the reasons for being against your proposal go way beyond vo.wp. As I said, my arguments here don't depend on vo.wp for anything!

Third: you want more good articles in vo.wp -- better than my bot can thus far create. So do I. Have you looked up the new articles there? Check out vo:Hector Berlioz, vo:Anton Bruckner, or vo:Charles Ezra Sprague. Or vo:Leptictidium. Look also at new infrastructural work being done: the new thematic links on the main page, the reorganization of user categories, the correction of errors... New articles of the kind you (and I) want to see are being added. Too few? Only four? Well, look what was being done before Arnomane's proposal: about 100 such articles were added in the intervening period -- following, as you had suggested in the first closure proposal, the List of articles here at Meta. Obviously, the number of new articles of this kind decreased dramatically when Arnomane's proposal appeared here on Meta. Is that difficult to understand? Be frank: if Arnomane had a proposal for closing eo.wp, or for deleting stuff that you cared about and thought should be kept, would you simply go on manually writing good articles without distraction? Would you really not want to come here and defend eo.wp and your work -- especially if the proposed rationale and arguments were weak? Judging by your (successful) fight to keep the en:Category:Esperantists, I'd say probably not.

Let's look at your list of desiderata. Is vo.wp really not doing that?

Fewer bot errors: have you checked the statistics? They're decreasing every day. Vo.wp is doing that. As the template error in eo:Amerika bizono -- uncorrected for over a year (only now that I mentioned it did you correct the template: here is the diff link; of course, the hidden English text remains there...) -- clearly shows, "fewer" errors doesn't mean "no errors": error-finding and correcting is a never-ending task in a Wikipedia. But vo.wp is doing that. Just like eo.wp, though with fewer active contributors.
Fewer short articles: look at the ones I mentioned above; we're adding non-short articles too. (But look: all Wikipedias have and are adding short, nothing-much-at-all articles. My classic example: de:Streator. Consider also eo:UK 1929 (created by Arno Lengrange and changed by you), or eo:Krampo (fiksilo). These articles could be branded as "cookie-cutter articles of nothing-much-at-all" by a grinch-like commentator; and maybe you'd even agree, since you also think there are problems in eo.wp; but are these articles really bad? Do they really have nothing to contribute? I don't think so. I think they're just short. And that's not a sin.)
Compromise from the Volapük camp: Yekrats, I sympathize with you there. I think you're saying: if you delete a few of your stubs, then you'll show that you understand our feelings. But that's my point: the only thing you accept as a compromise is the very thing that I think is harmful in the proposal. Don't my counter-proposals to you and Slomox (and Llewrch) in the discussion of Arnomane's proposal count as attempts to compromise? Don't my proposals to Arnomane count, too? I actually suggested they go ahead and place vo.wp on the de.wp spam filter if they will -- a stupidity in my opinion, because there is no spam here, but still: I see this as willing to compromise. Look, Yekrats: when you were arguing against the deletion of en:Category:Esperantists, I think the guy who never understood your criteria for defining when someone is an Esperantist also accused you of 'not compromising'. He kept saying: why don't you accept "Esperanto writers", "Esperanto activists", etc. and other -- as he put it -- better-defined categories? I'm sure he saw you as non-compromising. Let me restate my counterproposal: isn't improving the articles better than deleting them? And if it can be made with a bot -- as nl.wp, ru.wp and pt.wp have already shown -- still isn't it better? You'll have longer articles, with more information.
Yekrats, I think that, like you, I am also an idealist. As you suggested "grandfathering" vo.wp, I could simply say: well, OK, vo.wp is saved, to heck with the others! But I don't, because I think there's a fundamental wrong in this whole story. Look, I'm not even asking that you, or anybody else, like bots. Don't use them in your projects, if you think they're harmful, OK. I disagree but I respect this opinion. But don't prevent those who want to exploit this way of adding content from seeing where it takes them! I -- and others -- want to choose the path less travelled by. Don't make it impossible, in the name of an imaginary benefit!
Fewer "ethical lapses". I suppose you refer to my 'advertising', as they put it. I'm sorry if I offended your ethical feelings. I really hesitated for a while precisely because of such feelings. All I can say is: it's not as simple, or as clear, as you put it. There are shades of gray.
A focus on quality over quantity. Quality is steadily improving in many articles, too, Yekrats. Have you checked how many edits I made to vo:filosop? Or to vo:matemat? We want that too, if you'll let us figure out for ourselves how to best do that. We're few, and we're playing with several different things. We'll gladly take advice, but we'll also reserve the right to disagree. Come back after some time, say, a year (hopefully without distracting proposals), check it all out again, and tell us what you think.
Actual articles: see no.2 above. Don't worry, there'll be more and more of them. Just let us keep working. That's a promise!

Now, Yekrats. Of course your proposal is generous. Of course I understand vo.wp is not so much in danger here -- or could get out of danger more or less easily. Your proposal is moderate, perhaps even generous, from your perspective. That's why I say: the issue here is not about vo.wp. It's about bot-created stubs everywhere, and also about wiki autonomy, and whether or not wikis should decide for themselves what they prefer. It's about whether or not this kind of question is a foundational issue, or just a disagreement between people with different opinions about how to develop a project. As I said above: most of the people commenting here are not from vo.wp, and of the two who are, one partially supported your proposal. Why do you feel we're all out to get you? We're not, Yekrats. Come on, man. No conspiracy.

Give an inch! How about my counterproposals? How about the solutions I offered Arnomane? Yekrats!...

Think about it a little. You work on a Wikipedia that has lots of contributors and good articles. Then you see a smaller one creating more articles with a different method: automatic bot scripts. You think that's unethical, so you go out and start fighting against it. The smaller Wikipedia never did your project any harm; none of the work done there was ever touched, or undone, or made more difficult. Your internal policies were never endangered, and nobody ever set out to criticize you. Then you come out, with a holier-than-thou attitude about how "bad" this smaller project is just because you think articles are too many and too short, and its community too tiny. You see another project with lots of short articles but many other more important problems ("invented" dialects, nationalistic attitudes, serious internal fighting) and you claim they "exemplify the same problem". You think these two projects pose such a threat to all the others that a new policy has to be developed to deal with such future problems -- OK, since they existed before, let them be, but still they did bad things! Very bad things! Harmful things! It's so obvious! They argue that they didn't, but it's all cloudy words, they're unethical, and they don't give an inch! I'd rather see them solve the problem internally -- but only if the solution includes what I want! Fewer bot-created stubs! Then some other people -- not from the small project with many articles -- answer that there is a question of wiki-autonomy, that this is better solved internally, and that bot-created articles may not be so bad. And you talk about trigger-happy Volapükists.

Yekrats!... What's the problem?

As you said, we all have our wikis to take care of. Because of these proposals, we have neglected them. Now, from conlanger to conlanger: what are we doing here? I'll make your words mine: wouldn't it be better if we all just went back to our work, to developing these wikis in the ways we think are best? How much work time have you lost for eo.wp by being here? We Volapükists had to be in Arnomane's proposal, since it concerns us directly; how could we avoid that? You certainly would have spent a lot of time there if you thought the proposal would be harmful to eo.wp, now wouldn't you?

Let's go back to work! Human content is always preferable to bot content, but there is no reason to delete bot content if it's readable, accurate and relevant! We've gotten to it. How about you? :-) --Smeira 16:00, 11 January 2008 (UTC)

Hmm... Preferable content consists of well-written and well-sourced NPOV encyclopedic articles. And I don't care if it is written by a human, a computer or some arthropod. I really don't accept such religious arguments in a discussion about an encyclopedia. We are not building a monument to "human genius", but a source of information. Likewise, I don't care if that information is used by humans, computers or arthropods. --Millosh 00:23, 12 January 2008 (UTC)

What readers want (by Purodha)

Generally, I do not think that bot created articles constitute a problem.

I do see some problems arising from human-only-maintained articles that get outdated over time when no-one cares to update them.

For example, there was a new mayor elected, or population figures changed in a city. Maintaining those figures with a bot copying them from a trusted source is imho a good idea. Being a technician, I like the idea of avoiding errors, or missed data, with technical means :-) If we have enough volunteers who like to do that manually, fine. If we don't, use bots. The latter is likely more applicable to smaller projects, of course.

I do not see any ethical problems. The motivation of people is nothing we should have to care about as long as it does not lead to wrong or pov Articles.

People may create articles for the praise of Allah, because their spouse left them (and they now have time), to reach top figures, or because someone pays them. They may use bots for whatever reason. They may, of course, themselves have an ethical problem with their motivations, but that is entirely up to them to deal with. The only possible ethical concerns, if we had a chance of influence, would imho be with people being addicted to writing or being illegally forced into it.

My basic perspective is, what suits readers best. Quite often, I find well written articles by humans superior in quality to bot created ones, especially mass-created ones presenting only a limited amount of 'canonical' information. Yet such an article is - usually - much more valuable than no information at all.

If you are able to read one of the major languages having Wikipedias (English, German, French, Japanese, Polish, Dutch, Italian, Chinese, Spanish, Portuguese, Swedish, Norwegian, Finnish, Russian) and/or can make use of their scripts, you can look many things up there, and maybe get along with what you can find. Otherwise, you're pretty much out of luck as I see it. Smaller, and most notably less well-resourced, language communities are already in a worse position because of 'normal life' discrimination, limited and expensive internet access, a lack of people who have a chance to spend their time adding to Wikipedias, etc. Should they also not be allowed to use the content available, which most of their readers do not understand, and transform it in an efficient technical way into Wikipedia articles replacing no-information-at-all? This appears to me contrary to the goals of the WMF and most certainly contrary to the readers' interest.

One can believe that missing articles are less inviting to create than stub articles are to expand when 'seen' by a reader who could contribute, especially an inexperienced editor. I'm uncertain about the ratio, though. Bot-created stubs are unlikely to keep readers entirely from becoming contributors. That could only happen if bots left nothing to be desired -- an unlikely situation, and btw. if we were at that point, we'd be happy about the quality and diversity of our content, wouldn't we? ;-)

Whether people see bot article counts as good or bad is a matter of personal taste. If people want them reflected in statistics, we should include them for clarity, if nothing else. But keep in mind that statistics about edits are always problematic: a lot of vandalism, spam, edit wars, and other such unwanted edits makes us count many edits twice (every piece of nonsense gets reverted), and on net these contribute nothing at all to either the quality or the quantity that readers may want to receive. As long as we cannot get rid of the tare and arrive at better statistics of quality, I think your statistical figures are basically a reflection of the size of a user base and its activity; bot activity is a reflection of its technical skills, which may even be 'borrowed' from another language community. If people draw some 'my language scores higher' pride from that, let them do so. It is their choice. You will never be able to stop them from behaving in stupid ways, will you? Believe me, this sort of race will end after only a few years. At some point in time, the subject matters lending themselves to automated stub creation will be exhausted, and there will not be many articles left to add interlanguage links to. So just be a bit calm and compassionate until then. ;-)

I have yet to meet a reader who cares how an article was created as long as it is useful, reliable, correct, well structured, exhaustive enough, and points to well selected further information sources. We as editors are first of all obliged to make the best Wikipedia for our readers.

Imho, the methods to meet that goal can be chosen freely, and unless bots, or bot operators, scare away human contributors, there should not be general restrictions on them beyond obvious technical ones (such as server load) and well-reasoned community decisions.

As a result of that standpoint, I see most of the proposed 'solutions' as useless attempts to solve a non-problem. Nevertheless I appreciate all your efforts to point things out, which, last but not least, helped me a lot to find my position. --Purodha Blissenbach 00:26, 12 January 2008 (UTC)

A study of the Philippine Wikipedias

I find it odd that this proposal has sprung up, but at least it does some justice towards Wikipedias whose growth was bot-induced. A perfectly good example for me to use here are the Cebuano and Tagalog Wikipedias, which have been monitored by me for some time.

Between 2006 and 2007, the Cebuano Wikipedia experienced unprecedented growth from around 2,000 articles to its present-day 32,000 articles. This was done by adding (via bot) articles on French communes, and as such, the depth of the Cebuano Wikipedia is virtually zero, the lowest of all Philippine-language Wikipedias, even resulting in it being removed from the list of the largest Wikipedias on the English Wikipedia's Main Page. It even raised some issues over the dominance of French communes there on a Wikipedia that mostly caters to a Philippine audience, who for the most part could care less about the smallest French political unit. While there are efforts to supplant this with organic growth by the existing community there, the proportion of bot articles to human articles is roughly 15 or 16 to one.

The use of bots on the Tagalog Wikipedia, however, has been more controversial. From around mid-2007, four users and perhaps others have been contributing there on various topics such as the prime ministers of Thailand, Philippine actors and actresses, world capital cities, comic book characters and even beauty pageant winners (which led to a question of notability on them). On a given day, the recent changes page on the Tagalog Wikipedia would be virtually filled with the edits of one of these users. This led to unprecedented (although smaller) growth from around 8,000 to 15,000 today, becoming even more controversial when one of the users in question denied that the user was a bot, eventually resulting in a CheckUser request to determine whether or not the four users were the same, as their contributions all consisted of new articles in the following format:

Si (name) ay (subject of article). When translated, this is (Name) is (subject of article).

While I (and some of my fellow Tagalog Wikipedians) know that the intentions of these Wikipedians are good, it left a lot of questions as to how the existing community can possibly maintain these articles, as the Tagalog Wikipedia's community, in proportion to the wiki's size, is quite small, and already has difficulty maintaining existing articles. To this day, we do not know whether or not the users in question are bots, as we do not know how to check this, but I have spearheaded efforts as a newly-elected administrator there to promote quality over quantity by expanding existing articles, inviting the users in question (seeing as they deny they're bots) to expand their articles, and even instituting a recent changes patrol.

I believe that the policy may do some good for large Wikipedias, or Wikipedias with large communities. However, small Wikipedias, or Wikipedias with small communities, that have users who make bots and whose contributions far exceed the community's capacity would be punished more often than would large Wikipedias, whose communities have the incentive to improve upon bot-made articles. I would bet that upon the immediate implementation of this proposal, if it were adopted wholesale, the Cebuano Wikipedia could well be one of its first victims, judging from the ratio I have given above. If a policy were to be eventually adopted, it would have to be fair to all Wikipedias concerned. In addition, the suggested length of time allotted for compliance is too short by my standards, as we have to consider that not all Wikipedias have round-the-clock administrators.

True, the intentions of bots (and their creators) may be good, but it would be best to strike a balance between human editing and bot editing. It would be wrong for the policy to alienate the use of bots on Wikipedia, where they can in turn be quite useful. The average reader could care less who wrote the article, as long as the article is informative to them, or even that the article exists (the reason why many anons complain on the Tagalog Wikipedia on the lack of a given article, in turn inserting nonsense into it). If bots can write perfectly good articles, that's alright, but a one-sentence stub that is certainly uninformative will certainly not make the cut. There should also be an incentive for users to contribute to bot articles, something which is for the most part absent in both the Wikipedias I've used as examples (an exception would be the article on Mau Marcelo on the Tagalog Wikipedia, which is now an FA there after originally being a one-sentence stub).

Overall, the policy seeks to help curb bot influence on Wikipedia, which to me is quite useful. However, a policy as major as this one should be adequately decided upon by local communities, the ones who would be the most impacted from this type of proposal, and as such, the implementation of this should be decided by them. It would be wrong for us to impose blanket policies on Wikipedias with various communities, standards and ethical considerations, so I would prefer consultation before actually deciding on this policy. --Sky Harbor 02:01, 12 January 2008 (UTC)

Bots or no bots -> What is Wikipedia?

After reading the previous comments, it occurred to me that the various opinions presented on this page could be seen as reflecting different visions of what Wikipedia is or ought to be. While trying to answer the question Hillgentleman had once asked: What is Wikipedia, I came up with the following typology, which I place here for your consideration: (I've tentatively added people's names as examples of probable supporters of each school of thought; if someone thinks I've misrepresented his/her opinion, please feel free to remove the name.)

The Pragmatic School: Wikipedia is a source of information. It does not matter how information is gathered or displayed, as long as the result is easy to read, accurate, and relevant. In fact, quality = accuracy + readability + relevance. "Wikipedia should inform its users and answer their questions; it should ultimately contain all knowledge and information and make it available to all in all languages". (See also GerardM's contribution: providing information when there is little or none.)
Consequence for bot-created articles: Bots are a good way to add such content, even by creating articles (which should be judged by their quality = accuracy + readability + relevance, not by their origin).

Consequence for small-language/community projects: making lots of previously unavailable information finally available in their language (e.g. by bots) is a plus -- in fact a giant leap forward -- and should be encouraged. Number of articles is irrelevant.

Possible supporters: Smeira, Millsoh, Purodha, GerardM.
The Aesthetic School: Wikipedia is a project by people. It does not matter how information is gathered (as long as it is accurate), but it does matter how it is displayed. Accuracy, readability and relevance are necessary but not sufficient: the elements that only humans can add -- creativity, imagination, style -- are crucial. "Wikipedia is a collaborative work of art; it should please, educate and enlighten its users, a true Renaissance project."
Consequence for bot-created articles: Bots can sometimes be useful to add content, but their "poor style" must be improved by humans, or else deleted, because they are "unworthy" ("junk", "mere database", etc.): they don't "please, educate or enlighten".

Consequence for small-language/community projects: must be limited to what can be done manually; if there aren't many contributors there shouldn't be many articles.

Possible supporters: Yekrats, Arnomane, Lou Crazy, Abdullais4u.

I don't know which of these ideals for Wikipedia is "the true one"; I see both of them as worthy -- which is why I think the bot issue must be decided by local communities: it is not clear which way to go is the best for each project. I'll offer three more comments:

For large communities (de.wp, en.wp, fr.wp), the goal of the aesthetic school makes sense: it is attainable, and it has probably already been attained to an extent (which is good PR for all projects). For small communities (cho.wp, ak.wp, kl.wp), this goal is probably unattainable: they don't have sufficient manpower to produce the kinds of pages that the Aesthetic school likes on all topics. In order to remain in this school, they would have to forsake their breadth of coverage: limit themselves to a maximum of a few hundred, perhaps one or two thousand pages. Perhaps limit themselves only to topics of local interest. If they choose to do so, it is OK, IMHO: such projects, though never comparable to en.wp, are still OK. But this decision should not be forced on them. The idea of making as much information as possible available to the speakers of their languages also has merits and should remain a possibility.

The two ideals are not necessarily mutually incompatible. I can imagine a Wikipedia project that both has a sufficiently large community to aspire to the Renaissance ideal and still freely uses bots to add information and pages. For some reason, people tend to think these two things are incompatible, but I frankly don't see why. Wikipedias with large communities may not necessarily need bots for creating articles: de.wp is a good example of how this can work. If de.wp wants to do this, fine. It's their project. But I must say this is not so efficient: some humans end up doing the kind of work best suited for bots (e.g. creating stubs like de:Streator) instead of contributing in more creative ways to their Wikipedia. And this is a pity.

Whatever your ideal for Wikipedia is -- one of these two or even a third one that I haven't thought of -- don't belittle other people's ideals with words: "Atom bomb"... "junk"... "botopedia"... what are these words if not emotional reactions + prejudice against other viewpoints?

--Smeira 15:21, 12 January 2008 (UTC)

Merging aspects giving a good mix

I liked the idea of seeing two schools at work creating Wikipedias. Yes, I feel my position is truly reflected as a supporter of the "pragmatic school". Still, I must admit that I am also a moderate supporter of the "aesthetic school". My support does not go as far as calling bot-created articles generally evil ;-) but I do, from experience, have some grievances regarding both their aesthetics and their applicability. Let us view this in more detail.

Bot authors are inevitably also the authors of what we might call the article schemes that their bots use. Even if these authors create good and truly aesthetic article schemes, the schemes lose some of their quality by being repeated a not so small number of times. Not that there is much to wish for at times: a sentence of the form "abc is the code/abbreviation/reference number/id/label issued by the issuing agency for the general class of individual entity at time for time frame under some standard or agreement for a scope of applicability." has to appear repeatedly for call signs, top-level domains, various sorts of ISO codes, ship, car, and airplane registration numbers, airport codes, station codes, carrier codes, bar codes, ISBN, EAN, and likely several hundred others like that, giving a rough estimate of many hundreds of thousands of such sentences having to be written in various articles in any Wikipedia. Although wording, placement, etc. may vary, the sheer number does not leave much room for true aesthetics, as aesthetics always involves individuality [citation needed].

There is an aesthetic inherent in language. Depending on subject matter, you have various styles of writing at hand and can often choose between them. You can range across romance, report, scientific writing, colloquial journalism, etc. Will programmers be able to build that into a bot? I don't believe so. While humans are certainly able to produce articles of bad style or even bad taste, or boring ones, bots will likely not be suited to augmented writing in this sense, both because they are repetitive and because of their limited subject matter.

I've been in touch with students who were excited to attend classes given by a specific professor of philosophy. I had read English translations of some of the lectures. The original language was Hindi, I believe, which I could not understand. The texts were probably well researched: collections of findings from various natural sciences, put together as a basis for philosophical treatment at some later time. Nothing special (to me); you find this kind of information in a good many popular Western magazines and books. The students, however, told me that these translations were impossible to make in a way that preserved the original. You can translate the data, they said, but what makes our eyes glow and lets us rise in joy when we listen in the auditorium is that they are sheer poetry in Hindi (in addition to being informative); you will not be able to understand unless you learn the language; and even then, you must learn it really well, and you must also learn the culture, or else you will not understand … I believe that, and it is this kind of quality that I want to be present in the Wikipedias, when possible, in addition to the plain information.

Let me make a point which is, imho, often overlooked. I am a co-founder of the Wikipedia of Ripuarian languages. Modern Ripuarian languages are pretty direct descendants of one of the hundreds of language groups that today's Standard German developed from, and like many others they are still being used. One of the strengths of the Ripuarian languages is precision of words (in non-science fields). For some domains of speech, for example, we have 2 English, 3 Standard German plus 5 colloquial German, and in addition over a dozen distinct Ripuarian words to choose from; naturally the amount of possible differentiation is high in such cases. Along with this, there often goes a completely different approach to conveying something with language, which you can possibly explain but cannot translate. Many German Wikipedians argued: "Why don't you just write in the German Wikipedia? There is no one in the Ripuarian-speaking population who does not read and write German, and anything that can be said in a Ripuarian language could as well be said in German, so why the heck duplicate stuff with a tiny fraction of the number of editors? You will never reach the quality, let alone the quantity, that we already have in the German Wikipedia." My answer is: First, there are (a few) Ripuarian communities whose enveloping languages are Dutch, Limburgs, or French, so they may not be as good at German. But more importantly, second: there is a huge difference in the possible approaches to language use. We have means at hand which German has not, with a hardly explicable inherent quality that is very hard to translate, and this is what makes people happy when reading. If it were not for that, I would likely not even have thought of joining the project.

So, at least for the projects having smaller communities, I hold to my belief that bots can be used to unburden people of some of the more tedious tasks, in order to allow them to increase the 'aesthetic value' of the project at various other places, including, but not limited to, postprocessing bot-created content to make it more appealing. Also, I again want to urge everyone to look at current decisions with a long-term perspective of maybe ten years or more. When building houses, you have noise and dirt and piles of debris and unequal progress at various edges of the site. Beauty usually comes in, and increases most quickly, towards the end of the process. I do think building Wikipedia is somehow similar. --Purodha Blissenbach 17:50, 22 April 2008 (UTC)

(New set of comments)

Votes

Because of the complex nature of this proposal, I have broken it down into three essential parts:

Should there be limits to bot-heavy Wikipedias?
Is the above suggested 3:1 ratio fair?
Is the above suggested remedy -- deleting bot-additions beyond the allowed limit -- fair?

I have set up voting in three parts: full support, partial support, and opposition. If you only partially support it, could you please say which parts you are against, and why? We might be able to adjust the proposal to suit your concerns. This vote may or may not be official. Someone with more wiki-wisdom than me would have to say whether it is or not.

Support Fully

Support Proposer. Yekrats 17:31, 8 January 2008 (UTC)
Support Wikipedia is for knowledge. Mere bot-created stubs don't provide knowledge, they are only for statistics. -- Felipe Aira 10:11, 9 January 2008 (UTC)
Felipe, Wikipedia is "for knowledge" to some degree - "for information" is more accurate, since "knowledge" sometimes implies understanding (and points of view), which are better suited to wikibooks and wikiversity. I would rather say that you look something up in wikipedia, and from the information there, through understanding, you gain knowledge. But what do you mean by "statistics", and are you sure about the "only for"? Hillgentleman 16:15, 10 January 2008 (UTC)
Most of the time that would be a yes. Why? Because, just as Sky Harbor said in his section on this page, it is not only the Volapuk Wikipedia that is being plagued by these bot-created stubs, but also the Philippine-language Wikipedias, where I come from and am an active editor. And I can personally say that the goal of the people who create stub-creating bots is not higher quality for Wikipedia but merely higher numbers. One en masse stub creator at the Tagalog Wikipedia says on his page that the Tagalog Wikipedia should have more articles because it is the national language blah-blah-blah. See, it's statistics that matter to them, not quality, whereas in reality quality should be our top priority. For me, every page creator's goal should be making their articles FA quality. -- Felipe Aira 08:41, 25 January 2008 (UTC)
Felipe, I take it that you do not disagree that it is more accurate to say that wikipedia is for information. :-) There is only one goal of wikipedia: to create a multi-lingual encyclopaedia. In this respect BOTH number of articles and depth of content are important. I would like to see how you define "quality", a word often used but rarely defined precisely; I even suspect that everybody has his own way of interpreting this word as applied to wikipedia. What is an encyclopaedia? And here is a stupid question: Are there not very short articles even in the Encyclopaedia Britannica? Hillgentleman 00:44, 26 January 2008 (UTC)

Felipe, I understand your desire to see many longer, FA-quality articles in Philippine (or any other) Wikipedias. I agree with it. But note that this is not opposed to stubs. A project can do both things: increase the quality of specific articles and add stubs (with or without bots) with good information (= accurate: correct as stated; readable: so that humans can understand and use it; and relevant: it is encyclopedic, it is found in other encyclopedias). Editors can prefer to do one thing or the other, or they can sometimes do one and sometimes the other... but anyway, one of these activities does not prevent the other from happening, and both are perfectly compatible within the same project. Unless by "quality" you mean some numerical value like depth, which would be disturbed by the number of stubs... but then you'd be the one who worries about statistics.
As Purodha said above, people can add articles for whatever reason; it's up to them. But if the articles are good (and good stubs can be good), there is no reason to delete them, even if the author's goal was simply to increase the number of articles. Go to the author's talk page, talk with him about why s/he wants so much to increase the number of articles -- such a flawed parameter -- etc. But as long as the stubs s/he added are good stubs, why delete or limit? In what way will this lead to better articles? --Smeira 01:58, 27 January 2008 (UTC)
Support This is a good policy proposal, which will help all wikipedias. --Lou Crazy 04:57, 10 January 2008 (UTC)
I Support the proposal and, like Jmb above, recommend that bot-created articles not be included in the article count used to calculate the relative position/strength of Wikipedias. I further suggest that the relative position/strength formula be revamped completely to include depth (which needs to be redefined itself), as Relative Strength (RS) = sqrt(# of articles (N) * depth (D)). Depth would be redefined as # of edits (E)/# of 'good' articles (N) * # of total pages (P)/N * average number of internal links per article (L/N).
$RS = \sqrt{N \cdot D}$

$D = \frac{E}{N} \cdot \frac{P}{N} \cdot \frac{L}{N}$

Revamping the formula to weight other factors that are hard for bots to emulate will discourage the Smiths (who try to keep up with the Joneses..heh) from creating articles just to get ahead in the rankings. --Asnatu wiki 03:56, 11 January 2008 (UTC) p.s. I'm not a contributor to a 'higher' language that feels threatened by the Smiths. I am an admin on mr, which is way down the list and is proud of its policy that prohibits employing bots for anything other than maintenance tasks.
You are welcome to your own policy. However mr.wikipedia is autonomous in this as is the vo.wikipedia. GerardM 08:26, 11 January 2008 (UTC)
I know we can have our own policies, and we do. I was clarifying that I have no hidden motives. Any wikipedia worth reading will bar (or at least control, with a tight leash) bots from creating articles willy-nilly, but that's mho. You're entitled to yours. Asnatu wiki 17:57, 11 January 2008 (UTC)
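As a rough illustration of the Relative Strength formula proposed above, the following Python sketch computes D and RS from raw counts. The function names and the sample figures are hypothetical, chosen only to show how the formula behaves; they are not real statistics from any Wikipedia.

```python
import math


def depth(edits: int, articles: int, total_pages: int, internal_links: int) -> float:
    """Redefined depth D = (E/N) * (P/N) * (L/N), as written in the comment above."""
    if articles <= 0:
        raise ValueError("articles (N) must be positive")
    return (edits / articles) * (total_pages / articles) * (internal_links / articles)


def relative_strength(edits: int, articles: int, total_pages: int, internal_links: int) -> float:
    """Relative Strength RS = sqrt(N * D)."""
    return math.sqrt(articles * depth(edits, articles, total_pages, internal_links))


# Hypothetical figures: a smaller, heavily edited wiki versus a larger,
# mostly bot-filled wiki with few edits and few internal links per article.
human_heavy = relative_strength(edits=500_000, articles=20_000,
                                total_pages=60_000, internal_links=400_000)
bot_heavy = relative_strength(edits=150_000, articles=100_000,
                              total_pages=120_000, internal_links=300_000)

print(f"human-heavy wiki: RS = {human_heavy:.1f}")
print(f"bot-heavy wiki:   RS = {bot_heavy:.1f}")
```

With these made-up inputs the wiki with fewer but more heavily edited articles scores higher, which is the effect the comment is aiming for. Because N appears in the denominator three times, RS under this formula falls as unedited articles are added.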
Support Bots should not be allowed to create articles. They are useful only on minor tasks (interwiki linking, double-redirect fixing, etc). Abdullais4u 11:21, 11 January 2008 (UTC)
Support, only I think that ratio of bot to human articles should be 1:3, meaning that a Wikipedia is allowed a maximum of 25% of its articles to be bot-created.--Imrek 14:08, 16 January 2008 (UTC)
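The arithmetic behind these ratios can be sketched in Python. This assumes a ratio is read as bot-created : human-created articles, so that 1:3 caps bot-created articles at 25% of the total and 3:1 at 75%. The function names and the sample counts are hypothetical.

```python
from fractions import Fraction


def max_bot_share(bot_parts: int, human_parts: int) -> Fraction:
    """Largest share of all articles that may be bot-created under a bot:human ratio."""
    if bot_parts < 0 or human_parts <= 0:
        raise ValueError("bot_parts must be >= 0 and human_parts must be > 0")
    return Fraction(bot_parts, bot_parts + human_parts)


def excess_bot_articles(bot_articles: int, human_articles: int,
                        bot_parts: int, human_parts: int) -> int:
    """Number of bot-created articles beyond the limit implied by the ratio."""
    if bot_parts < 0 or human_parts <= 0:
        raise ValueError("bot_parts must be >= 0 and human_parts must be > 0")
    # Allowed bot articles scale with the number of human-created articles.
    allowed = human_articles * bot_parts // human_parts
    return max(0, bot_articles - allowed)


print(max_bot_share(1, 3))   # share allowed under a 1:3 ratio
print(max_bot_share(3, 1))   # share allowed under a 3:1 ratio

# Hypothetical wiki: 100,000 bot-created and 10,000 human-created articles.
print(excess_bot_articles(100_000, 10_000, bot_parts=3, human_parts=1))
print(excess_bot_articles(100_000, 10_000, bot_parts=1, human_parts=3))
```

Since the limit is tied to the number of human-created articles, it rises as the community writes more, which matters for the question of whether the remedy is corrective rather than punitive.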
Support To make an encyclopedia by using bots is foolish. Though bots are very useful, they should be used as assistants, and should not create new articles, especially in Wikipedias which do not have many writers. The Volapük Wikipedia is the 15th largest now, but it does not have many essential articles, e.g. the Earth, World War II, Napoleon I, Christianity, alphabet, bird, atom, etc. (cf. my research about vo-wp) It is very regrettable that the Wikipedia has only articles about cities. 93% of all articles are cities, and they are very short. I think it is not an encyclopedia. We should not repeat the same mistake. --Hisagi 11:10, 27 January 2008 (UTC)
1. Note that the perception of whether a piece of information is useful and important is largely subjective, i.e. it depends on the reader. The task of Wikipedia is to provide information; it is not useful to second-guess whether a stub for a particular city is useful. A stub simply provides the basic information on which others can improve. If some Volapuk reader decides that there should be an article on a particular subject, such as vo:atom, she should start it. Hillgentleman 05:21, 28 January 2008 (UTC)

2. "I think it is not an encyclopedia." <-- Just like the wikipedia in English in its beginning, the wikipedia in volapuk is an encyclopaedia under construction, and in that same sense both are "not an encyclopedia". Hillgentleman 05:21, 28 January 2008 (UTC)

3. Many an argument in favour of this kind of proposal has started with "I clicked on the randompage button 200 times bla bla bla" and then claimed how short most of the articles are. I personally think it is not smart, and it is certainly not the best way to browse through an encyclopaedia. Even if you do it in the English wikipedia, you will learn something, but a lot of what you learn would be useless trivia rather than actual useful knowledge. Wikipedia provides an indexing system in her categories, and in the wikipedia in volapuk the root category is vo:Klad:Klad. Please be a smart browser and use it. Hillgentleman 05:12, 28 January 2008 (UTC)

Hisagi, I see you did some serious research on the Volapük Wikipedia -- you are dismayed at the "topic imbalance" of vo.wp (i.e. 93% articles about cities), so you think bots should not be allowed to create articles and this is "a mistake". But... consider GerardM's text: providing information when there is little or none. Consider also the section Purodha wrote above, and the section I wrote -- "Bots or no bots --> what is Wikipedia". These texts address many of the concerns you express; it would be interesting if you could provide your reactions to them. Remember that vo.wp contains more information than there ever has been available in this language; the number of city stubs simply shows that it is easier to make this information available in Volapük than certain other kinds. Taking also into account that small-community Wikipedias cannot have the same goal as large-community Wikipedias -- the contributors are simply too few to ever attain "a repository of all human knowledge" -- one feels that it is at least arrogant to dictate which information should be made available or not, and in what order. Each project should be allowed to decide what to do, which goals are more interesting, based on its own assessment of its possibilities and its resources. Articles on all topics you mentioned should and will eventually be started; but this doesn't mean that city stubs are bad or should be kept under control. In fact, if you think about it, the two issues are simply independent. --Smeira 04:06, 31 January 2008 (UTC)
Support --Remulazz 13:25, 27 January 2008 (UTC)
Why? --Smeira 04:06, 31 January 2008 (UTC)
Strange question. Because I agree with Yekrats. --Remulazz 08:12, 31 January 2008 (UTC)
Not really. If you look above, you'll see you're the first one to "simply support" without giving your view on the arguments. I note Yekrats has invited discussion, and most people have added ideas. But unconditional agreement is OK too, and now that you've expressed it, your position is clear. No further questions. --Smeira 16:47, 31 January 2008 (UTC)
Support As long as remedies are corrective, not punitive. Leptictidium 12:08, 20 April 2008 (UTC)
Suggestions? --Smeira 21:14, 24 April 2008 (UTC)
Support The disproportionate use of bots to create articles is a bad use of the tools that encourages campaigns with possible conflicts of interest (from en:WP:COI, "Adding material that appears to promote the visibility of [...] personal interests"). Long-term stubs with an obvious description are not useful, so I like Jimbo's recommendation. --Vriullop 16:00, 20 April 2008 (UTC)
Define "useful". In this context, it looks to me like you haven't thought much about it -- on this very page there is a lot said about this topic. Jimbo's recommendation, by the way, would also affect your native Catalan Viquipèdia -- since its speakers all speak Spanish and could use the Spanish version, a Catalan version is in principle superfluous (I don't agree with this, I'm just pointing out a logical conclusion). The 'personal interest' you mention here is to attract new contributors to a Wikipedia project -- a 'personal interest' which is supported by quite a lot of WMF derivative activity (WikiReaders, free CDs, manuals, talks, etc.) --Smeira 21:14, 24 April 2008 (UTC)
Useful: serviceable for a purpose. A stub should contain enough information for other editors to expand upon it. A long-term stub is not useful for this purpose, and Jimbo clearly said what the purpose of Volapük Wikipedia should be. Personal interests go far beyond attracting the 20-30 Volapük speakers in the world. Comments about Catalan Wikipedia are out of place, as are suggestions that my personal opinion would change according to the ranking, or about whether I have thought about it or not. --Vriullop 12:55, 25 April 2008 (UTC)
Support I simply cannot see the point of a botpedia. I cannot find a single good reason to accept such a thing. But I see plenty of reasons not to allow it. Beginning with a simple thing: statistics are useful only as long as their information is useful, but bot-inflated WPs turn our statistics into pure nonsense. -- Wamba 90.163.224.211 19:26, 20 April 2008 (UTC)
Saying you "cannot see the point" does not mean that there isn't a point, only that you haven't looked very hard. There's a lot on this very page to justify it, if you are willing to read it. "Botopedia" is simply a word of prejudice and means nothing other than 'I don't like bots' without saying why. Please read the arguments (for both sides) on this page; they're also important. Don't vote simply out of desire to get to 15th position. --Smeira 21:14, 24 April 2008 (UTC)
Support We should stop people from misusing Wikipedia in order to advertise their languages, especially if this harms the image of Wikipedia. The number of bot-created articles should always be of such a size that the community can still maintain those articles. Marcos 11:10, 24 May 2008 (UTC)
Support Julius1990 (talk) 19:39, 11 January 2013 (UTC)
Support --92.201.182.168 21:08, 11 January 2013 (UTC)
Support Stop the cruft. Kaldari (talk) 17:56, 22 July 2014 (UTC)
Support/mi tugni. ceb.wiki should be cut down to a reasonable level of articles. KATMAKROFAN (talk) 18:36, 29 March 2018 (UTC)

Support Partially

Please indicate which sections of the proposal you agree or disagree with.

I support the limitation of bot-created articles. But I think an answer to the Volapük/Lombard "artificial article count" problem might be to exclude bot-created articles from article count figures in all Wikipedias, until substantial human edits had been performed on them. Basically, this would be a change in the definition of what an "article" is, at least for counting purposes. Thus users could use bots to add as many pages on Italian communes and NGC objects as they want, but they wouldn't show up in the article count until a human had edited them in some substantial way. There could be a warning of many months before the new count policy was implemented, so that Wikipedia communities would have time to beef up a lot of their bot articles. --Jmb 19:49, 10 January 2008 (UTC)
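A minimal Python sketch of this counting rule follows. The `Article` record, its fields, and the `min_human_edits` threshold are all assumptions made for illustration; the comment above does not define what would count as a "substantial" human edit, and nothing here reflects how MediaWiki actually counts articles.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Article:
    title: str
    bot_created: bool
    # Hypothetical measure: number of human edits judged "substantial".
    substantial_human_edits: int = 0


def counts_as_article(article: Article, min_human_edits: int = 1) -> bool:
    """Human-created pages always count; bot-created pages count only once
    they have received enough substantial human edits."""
    if not article.bot_created:
        return True
    return article.substantial_human_edits >= min_human_edits


def article_count(articles: Iterable[Article], min_human_edits: int = 1) -> int:
    """Article count under the proposed definition."""
    return sum(1 for a in articles if counts_as_article(a, min_human_edits))


# Hypothetical pages, for illustration only.
pages = [
    Article("Hand-written article", bot_created=False),
    Article("Untouched bot stub", bot_created=True),
    Article("Bot stub expanded by editors", bot_created=True, substantial_human_edits=2),
]
print(article_count(pages))
```

Under this rule the bot-created pages stay in the wiki and remain readable; only the published count changes, so there is no need to delete anything to correct the statistics.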

Oppose Partially