I guess bots for article creation should have been forbidden at the beginning of the Wikimedia projects. Too late...?

For the moment, I recommend showing the "number of words" on the Wikisource main page. Erik Zachte just released new statistics for November 30: http://stats.wikimedia.org/wikisource/EN/TablesDatabaseWords.htm They can probably be trusted. The ranking would be:

1. EN: 235 M
2. FR: 142 M
3. ES: 47 M
4. ZH: 37 M (but I don't think this is correct: how do the stats count "words" for Chinese Wikisource? Another method gives: 1737 [bytes per article] / 3 [a Chinese word is 3 bytes] * 42000 [number of articles] = 24.3 M)
5. RU: 27 M
6. IT: about 22 M (there is a problem here. The stats say there were 12000 articles on November 30, but that is wrong. There were surely 18000 or 19000 articles, as there are now, so the figure of 14.4 M words should be increased by half, to about 22 M)
7. AR: 21.6 M (a nice surprise, but is it correct?)
8. PL: 18.7 M
9. DE: 18 M
10. PT: 16 M
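
The two corrections in the list (ZH and IT) can be checked with a quick calculation. This is only a rough sketch: the function names are made up for illustration, the 3-bytes-per-word figure for Chinese is the assumption stated in the list, and the Italian adjustment assumes the uncounted articles have the same average length as the counted ones.

```python
# Back-of-the-envelope checks for the ZH and IT figures quoted in the list.
# Function names are illustrative, not taken from any statistics tool.

def estimate_words_from_bytes(bytes_per_article, bytes_per_word, articles):
    """Estimate total words from the average article size in bytes."""
    return bytes_per_article / bytes_per_word * articles

def scale_by_article_count(words, counted_articles, actual_articles):
    """Scale a word count up, assuming uncounted articles have average length."""
    return words * actual_articles / counted_articles

# ZH: 1737 bytes per article, 3 bytes per Chinese "word" (assumption), 42000 articles
zh_words = estimate_words_from_bytes(1737, 3, 42000)

# IT: 14.4 M words reported for 12000 articles; the real count is 18000 to 19000
it_low = scale_by_article_count(14.4e6, 12000, 18000)
it_high = scale_by_article_count(14.4e6, 12000, 19000)

print(f"ZH: {zh_words / 1e6:.1f} M")
print(f"IT: {it_low / 1e6:.1f} M to {it_high / 1e6:.1f} M")
```

Under these assumptions the Chinese estimate comes out near 24.3 M, and the Italian range brackets the "about 22 M" given above.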


I feel sorry for the Germans. Their low number must be the result of their systematic use of the "Page mode". We should ask some developers whether it would be possible to count the words in, for example, the "Proofread pages". That number could then be added to the figure above...



> From: "Luiz Augusto" <lugusto@gmail.com>
> To: "discussion list for Wikisource, the free library" <wikisource-l@lists.wikimedia.org>
> Subject: Re: [Wikisource-l] Changing the Wikisource main page
> Date: Thu, 1 Jan 2009 20:41:00 -0200

Any progress on this?

I'm asking because I am going to destroy two statistics in the next few weeks:

1) Destroying the "article" count: Portuguese Wikisource will be bumped from 7th to 2nd in article count when I finish importing the A to N entries from a public domain dictionary. (See more about it at the end of this message.)

2) Destroying the "all pages" count from http://wikisource.org/wiki/Wikisource:ProofreadPage_Statistics : I've found the 24 volumes of the "Décadas da Ásia" by http://en.wikipedia.org/wiki/Jo%C3%A3o_de_Barros on the Internet Archive, and I will be uploading those djvu files to Commons and the extracted text to Portuguese Wikisource in a few hours.

(I've talked with some users about the mass upload of OCR-ed texts and no one has objected so far; for them, this is like the bot creation of stub articles on the Wikipedias. I hope that you all agree with this opinion and that no one starts a "radical cleanup" proposal like the one on the Volapük Wikipedia in December 2007 ;-) . Please note that the ProofreadPage Statistics page has more stats and graphs besides the "all pages" one.)

-----------------------------------
More about the dictionary and the import

The Portuguese speakers at Distributed Proofreaders @ Project Gutenberg are proofreading a dictionary and making it publicly available, as the work progresses, at http://dicionario-aberto.net . The "source code" (the word database) is also shared at http://dicionario-aberto.net/sources.html . Since it is in the public domain both in the United States and in Portugal (the country of origin), I'm going to import it into the Portuguese Wikisource. I proposed this at http://pt.wikisource.org/wiki/Wikisource:Esplanada/Candido_de_Figueiredo_1913 some days ago and no one has objected so far.

From A to N, that dictionary has more than 85k entries, and no one on Portuguese Wikisource opposes importing one entry per article. I've offered to import the dictionary into the Portuguese Wiktionary ( http://pt.wiktionary.org/wiki/Wikcion%C3%A1rio:Esplanada#Candido_de_Figueiredo_1913.2C_v.C3.A3o_querer.3F ) but only one user has expressed support for it and two have opposed; one of the opponents suggested changing the "no articles were found" message on pt.wiktionary a bit, so that it points users to search for words in that dictionary on pt.wikisource.

The import will start in a few weeks. I'm only waiting for more opinions on pt.wikisource and pt.wiktionary, and for FlaggedRevs to be enabled on pt.wikisource (since bot-created articles are automatically flagged as approved and I don't plan to spend ten years flagging those pages by hand :-)

