On Thu, Sep 11, 2008 at 4:04 AM, Syagrius syagrius@caramail.com wrote:
Hello,
As Wikipedia decided to change its main page presentation, I think that Wikisource should perhaps do the same. The "war" between the Spanish and the Chinese Wikisources demonstrates that the article count does not reflect the true depth of a Wikisource, as someone can create thousands of very small articles. What should the main page present? I don't think that choosing the number of visitors, as Wikipedia did, would be a good idea, since Wikisource is much less well known than Wikipedia and the figures may not be reliable. If these stats from Erik Zachte can be trusted (and renewed), I would suggest that the number of words http://stats.wikimedia.org/wikisource/EN/TablesDatabaseWords.htm may be the fairest figure to present. The only problem would be: is the number of words given for the Chinese (and also the Japanese) Wikisource correct?
It is an interesting statistic, and I haven't investigated the algorithm being used. Is there a mathematical description of it?
My first guess is that we would need to weight it according to the entropy of each language. For example, Chinese and Japanese have a much higher entropy, so they need to be weighted higher.
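To make the weighting idea concrete, here is a rough sketch in Python of one way it could be estimated. It is not the algorithm behind the Zachte statistics, which has not been examined here. The function names `char_entropy` and `information_estimate` are made up for illustration, and the estimate assumes a simple per-character Shannon entropy computed from single-character frequencies, ignoring whitespace.

```python
import math
from collections import Counter


def char_entropy(text: str) -> float:
    """Shannon entropy in bits per character, from single-character
    frequencies. Whitespace is ignored. (Hypothetical helper.)"""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    counts = Counter(chars)
    total = len(chars)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_estimate(text: str) -> float:
    """Crude size measure in bits: number of non-space characters
    multiplied by the estimated entropy per character.
    (Hypothetical helper.)"""
    n_chars = sum(1 for c in text if not c.isspace())
    return n_chars * char_entropy(text)
```

A script with thousands of distinct characters would tend to score more bits per character than an alphabetic one, so the same number of characters would count for more. The estimate is only as good as the sample it is computed from, and it ignores context between characters, so it should be read as a starting point for discussion rather than a fair measure.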
What do you think of this proposal?
If we are going to change the front page of the portal, it is these stats that I would like to see used and improved:
http://wikisource.org/wiki/Wikisource:ProofreadPage_Statistics
We need to present ourselves as a _serious_ project, doing top quality work.
I suggest that we also feature two texts on the main portal each month:
- one work that is hosted on wikisource.org, i.e. from a language which is _not_ on a subdomain
- one work from a subdomain, from a different subdomain each month, _after_ it has been selected as a featured text on the subdomain.
-- John Vandenberg
Besides the article count, the issue here is how we evaluate the "quality" of a work. Texts proofread from a scan can be considered good quality, but what about texts imported from other reliable sources? Is our process equivalent to theirs? Given that other projects like Gutenberg and Distributed Proofreaders have a long history of quality evaluation and proofreading, it seems presumptuous to me to consider that the texts proofread within WS are of better quality than texts proofread there.
I think that what is most important is indicating the source, so that readers can evaluate the quality of a text for themselves. If we want to collaborate on displaying a featured text on the main page, we need to have some common scale across the various subdomains.
Regards,
Yann
Yann Forget wrote:
Besides the article count, the issue here is how we evaluate the "quality" of a work. Texts proofread from a scan can be considered good quality, but what about texts imported from other reliable sources? Is our process equivalent to theirs? Given that other projects like Gutenberg and Distributed Proofreaders have a long history of quality evaluation and proofreading, it seems presumptuous to me to consider that the texts proofread within WS are of better quality than texts proofread there.
I agree: if a text has been seriously proofread at Project Gutenberg and then imported into Wikisource without a reference scan, there is no reason to believe that it will remain of good quality on Wikisource, because further edits made on WS will probably be made without looking at the original scan.
Therefore, when we copy PGDP texts without scans, it is presumptuous to assert that our copies are always as good as the texts found on PG. If we edit without a scan, then the original on PGDP should be considered of better quality, because the PGDP editors use scans.
Thomas
wikisource-l@lists.wikimedia.org