Re: [Wikisource-l] Changing the Wikisource main page

1 Jan 2009

Any progress on this?

I'm asking because I am going to destroy two statistics in the next few weeks:

1) Destroying the "article" count: Portuguese Wikisource will be bumped from
7th to 2nd in article count when I finish importing the A to N entries from
a public domain dictionary. (See more about it at the end of this message.)

2) Destroying the "all pages" count from
http://wikisource.org/wiki/Wikisource:ProofreadPage_Statistics : I've found
on the Internet Archive the 24 volumes of the "Décadas da Ásia" by
http://en.wikipedia.org/wiki/Jo%C3%A3o_de_Barros and I will be uploading
those DjVu files to Commons and the extracted text to Portuguese
Wikisource in a few hours.

(I've talked with some users about mass uploads of OCR-ed texts and no one
has objected so far; to them this is like the bot creation of stub
articles on Wikipedias. I hope that you all agree with this
opinion and that no one starts a "radical cleanup" proposal like the one on the
Volapük Wikipedia in December 2007 ;-) . Please note that the ProofreadPage
Statistics page has more stats and graphs than just the "all pages" one.)

-----------------------------------
More about the dictionary and the import

The Portuguese speakers at Distributed Proofreaders @ Project Gutenberg are
proofreading a dictionary and making it publicly available as the work is done
at http://dicionario-aberto.net . The "source code" (the word database) is
also shared at http://dicionario-aberto.net/sources.html. Since it is in the public
domain both in the United States and in Portugal (the country of origin), I'm
going to import it to the Portuguese Wikisource. I've proposed it on
http://pt.wikisource.org/wiki/Wikisource:Esplanada/Candido_de_Figueiredo_19…
ago and no one has made any objections to it so far.

...
 From A to N, that dictionary has more than 85k entries, and no one on
Portuguese Wikisource opposes importing one entry per article. I've
offered to import the dictionary into the Portuguese Wiktionary (
http://pt.wiktionary.org/wiki/Wikcion%C3%A1rio:Esplanada#Candido_de_Figueir…)
but only one user has expressed support for it and two have opposed it; one of
the opposers suggested changing the "no articles were found" message on
pt.wiktionary a bit, so that it points to searching for words in that dictionary on
pt.wikisource.

The import will start in a few weeks. I'm only waiting for more opinions
on pt.wikisource and pt.wiktionary and for FlaggedRevs to be enabled on
pt.wikisource (since bot-created articles are automatically flagged as
approved and I don't plan to spend ten years flagging those pages by
hand :-) ).

On Tue, Sep 16, 2008 at 9:43 PM, Syagrius <syagrius(a)caramail.com> wrote:

...
 From: "John Vandenberg" <jayvdb(a)gmail.com>
 To: "discussion list for Wikisource, the free library" <wikisource-l(a)lists.wikimedia.org>
 Subject: Re: [Wikisource-l] Changing the Wikisource main page
 Date: Sun, 14 Sep 2008 06:03:36 +1000

 A Chinese "word" has more meaning than a Spanish "word".  I don't have
 the numbers, but the word "word" is not the same in all languages.
 This makes words a very complex statistic.

 --
 John

 _______________________________________________
 Wikisource-l mailing list
 Wikisource-l(a)lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikisource-l 
 I may have found a very simple solution: if we agree that a Chinese sign
 is a word as we understand "word", then we have to find out how many signs there
 are. I ran a test and found that a Chinese sign takes 3 octets. The very same
 statistics tell us that the average size of an article on the
 Chinese Wikisource is 1957 octets. So there are 1957/3 = 652.3 words per article. The
 statistics count (as of May 31, 2008) 29084 articles for the Chinese
 Wikisource, and 652.3*29084 gives 18.9M words in total.
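
 The estimate above can be sketched in a few lines of Python. This is only an
 illustration: it assumes the pages are stored as UTF-8 (the message does not
 name the encoding) and that every octet belongs to a Chinese sign, ignoring
 wiki markup and ASCII punctuation. The function and constant names are
 made up for the example; the figures are the ones quoted in the message.

```python
# Rough word estimate for the Chinese Wikisource, following the reasoning above.
# Assumption: UTF-8 storage, where a common Chinese sign such as "中" takes 3 octets.
OCTETS_PER_SIGN = len("中".encode("utf-8"))

AVG_OCTETS_PER_ARTICLE = 1957  # average article size quoted in the message
ARTICLE_COUNT = 29084          # article count as of May 31, 2008


def estimate_words(avg_octets, articles, octets_per_sign):
    """Treat every sign as one word: signs per article times number of articles."""
    return avg_octets / octets_per_sign * articles


total_words = estimate_words(AVG_OCTETS_PER_ARTICLE, ARTICLE_COUNT, OCTETS_PER_SIGN)
print(f"{total_words / 1e6:.2f}M words")
```

 Under these assumptions the result is a little under 19 million, which is the
 figure quoted above as 18.9M.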

 The only question remaining is: why does the statistics page present 29.3M as
 the number of words for the Chinese Wikisource? Is that the number of
 "groups of letters"?

 Anyway, if we accept the figures, we would have: 1. English: 211M words -
 2. French: 125M - 3. Spanish: 41.8M - 4. Russian: 22.2M - 5. Chinese:
 18.9M - 6. Polish: 18.2M - 7. Portuguese: 15.5M - 8. German: 14.4M - 9.
 Italian: 12.0M - 10. Arabic: 10.6M.
