On Sun, Sep 14, 2008 at 2:08 AM, Syagrius <syagrius@caramail.com> wrote:
From: "John Vandenberg" <jayvdb@gmail.com>
To: "discussion list for Wikisource, the free library" <wikisource-l@lists.wikimedia.org>
Subject: Re: [Wikisource-l] Changing the Wikisource main page
Date: Thu, 11 Sep 2008 09:40:56 +1000
On Thu, Sep 11, 2008 at 4:04 AM, Syagrius <syagrius@caramail.com> wrote:
Hello,
As Wikipedia has decided to change the presentation of its main page, I think Wikisource should perhaps do the same. The "war" between the Spanish and Chinese Wikisources demonstrates that the article count does not reflect the true depth of a Wikisource, since anyone can create thousands of very short articles. What should the main page present? I don't think choosing the number of visitors, as Wikipedia did, would be a good idea: Wikisource is much less known than Wikipedia, and the figures may not be reliable. If these statistics from Erik Zachte can be trusted (and are kept up to date), I would suggest that the number of words (http://stats.wikimedia.org/wikisource/EN/TablesDatabaseWords.htm) may be the fairest figure to present. The only problem would be: is the number of words given for the Chinese (and also the Japanese) Wikisource correct?
It is an interesting statistic, and I haven't investigated the algorithm being used. Is there a mathematical description of it?
My first guess is that we would need to weight it according to the entropy of each language. For example, Chinese and Japanese have much higher entropy, so they would need to be weighted more heavily.
What do you think of this proposal?
If we are going to change the front page of the portal, it is these stats that I would like to see used and improved:
http://wikisource.org/wiki/Wikisource:ProofreadPage_Statistics
We need to present ourselves as a _serious_ project, doing top-quality work.
I suggest that we also feature two texts on the main portal each month:
- one work that is hosted on wikisource.org, i.e. from a language that does _not_ have its own subdomain;
- one work from a subdomain, from a different subdomain each month, _after_ it has been selected as a featured text on that subdomain.
-- John Vandenberg
Wikisource-l mailing list
Wikisource-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
For more information about the statistics, I think you could contact Erik Zachte himself. I still think that the number of words is the best solution, because a Wikisource could easily create a lot of "pages" without any proofreading at all. I don't really understand what you mean by "entropy of each language", but the word count for the Chinese Wikisource seems reasonable: 29M, against 42M for the Spanish Wikisource...
See http://en.wikipedia.org/wiki/Information_entropy
A Chinese "word" carries more meaning than a Spanish "word". I don't have the numbers, but the word "word" does not mean the same thing in all languages. This makes word count a very complex statistic.
-- John
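To make the entropy-weighting idea raised in this thread a little more concrete, here is a minimal Python sketch. It assumes a crude unigram model: it estimates the Shannon entropy of a text sample in bits per character from character frequencies, then multiplies by the character count to give a rough "information size" that could be compared across languages instead of a raw word count. The function names `char_entropy_bits` and `estimated_information_bits` are hypothetical, and this is not the algorithm behind Erik Zachte's word statistics, nor a method proposed by anyone in the thread. A unigram estimate ignores context, so it gives an upper bound rather than the true entropy rate of a language.

```python
import math
from collections import Counter


def char_entropy_bits(text: str) -> float:
    """Estimate Shannon entropy in bits per character from the character
    frequencies of `text` (unigram model: H = sum of p * log2(1/p))."""
    total = len(text)
    if total == 0:
        return 0.0
    counts = Counter(text)  # how often each Unicode code point occurs
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def estimated_information_bits(text: str) -> float:
    """Hypothetical size measure: character count times the estimated
    entropy per character, as an alternative to a raw word count."""
    return len(text) * char_entropy_bits(text)


if __name__ == "__main__":
    # Toy inputs only; a real comparison would need large samples
    # drawn from each Wikisource.
    print(char_entropy_bits("abab"))           # 1.0
    print(estimated_information_bits("abcd"))  # 8.0
```

Because Chinese text draws on thousands of distinct characters, a sample of it would score more bits per character than a Spanish sample, which is roughly the kind of weighting the thread has in mind; whether such a measure is fair in practice would depend on sample size and on how each Wikisource's text is tokenized.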