On Thu, Sep 11, 2008 at 4:04 AM, Syagrius
<syagrius(a)caramail.com> wrote:
Hello,
As Wikipedia decided to change its main page presentation, I think that
Wikisource should perhaps do the same. The "war" between the Spanish and the
Chinese Wikisources demonstrates that the article count does not reflect the
true depth of a Wikisource, as someone can create thousands of very small
articles. What should the main page present? I don't think that choosing the
number of visitors, as Wikipedia did, would be a good idea, since
Wikisource is much less well known than Wikipedia, and the figures may not be
reliable. If these stats from Erik Zachte can be trusted (and renewed), I
would suggest that the number of words
http://stats.wikimedia.org/wikisource/EN/TablesDatabaseWords.htm may be the
fairest figure to present. The only problem would be: is the number of
words given for the Chinese (and also the Japanese) Wikisource correct?
It is an interesting statistic, and I haven't investigated the
algorithm being used. Is there a mathematical description of the
algorithm used?
My first guess is that we would need to weight it according to the
entropy of each language. For example, Chinese and Japanese have a
much higher entropy, so they need to be weighted higher.
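To make the weighting idea concrete, here is a minimal sketch in Python. It is not the algorithm behind the stats page, which I have not seen. It assumes a crude estimate of entropy from single-character frequencies, and the names entropy_bits_per_char and weighted_size are hypothetical, made up for illustration.

```python
from collections import Counter
from math import log2


def entropy_bits_per_char(text: str) -> float:
    """Empirical Shannon entropy of the character distribution, in bits per character.

    Assumption: single-character frequencies only; context between
    characters is ignored, so this is a rough estimate for any language.
    """
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def weighted_size(text: str) -> float:
    """Hypothetical size metric: character count times bits per character."""
    return len(text) * entropy_bits_per_char(text)
```

With a measure like this, a text written in a script with a large character inventory would count for more per character than a text written in a small alphabet. A real comparison would need large samples per language and probably a better model than single-character counts.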
What do you think of this proposal?
If we are going to change the front page of the portal, it is these
stats that I would like to see used and improved:
http://wikisource.org/wiki/Wikisource:ProofreadPage_Statistics
We need to present ourselves as a _serious_ project, doing top quality work.
I suggest that we also feature two texts on the main portal each month:
- one work that is hosted on wikisource.org, i.e. from a language
which is _not_ on a subdomain
- one work from a subdomain, from a different subdomain each month,
_after_ it has been selected as a featured text on the subdomain.
Besides the article count, the issue here is how we evaluate the
"quality" of a work? Texts proofread from a scan can be considered as
good quality, but what about imported texts from other reliable sources?
Is our process equivalent to these sources? Seeing that other projects
like Gutenberg and Distributed Proofreaders have a long history of
quality evaluation and proofreading, it seems presumptuous to me to
consider that the texts proofread within WS are of better quality than
texts proofread there.
I think that what is most important is indicating the source, so that
the reader can evaluate the quality of a text for themselves.
If we want to collaborate on displaying a featured text on the main
page, we need to have some common scale between the various subdomains.