Language counts


James R. Johnson

27 Jun 2005

11:34 p.m.

Is there any way to add some tag to the wiktionary so that we can get a count of the number of different languages we have on a wiktionary, and the number of words in each? For example:

On EN:

This wiktionary has:

English: 50,345 words

German: 4,211 words

Japanese: 123 words

Spanish: 422 words

And so on.

Is that somehow possible by adding a language tag, say [[lang:en]], and having the tags identified per wiktionary, so that en shows up as Inglés on the Spanish wiktionary, Englisch on the German one, etc.?
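A rough sketch of what such a per-wiktionary report might look like. The counts are the illustrative figures from this message; the `LOCAL_NAMES` table and the `render_counts` function are hypothetical names, not an existing MediaWiki feature.

```python
# Hypothetical lookup: how each language code is named on a given wiktionary.
LOCAL_NAMES = {
    "en": {"en": "English", "de": "German", "ja": "Japanese", "es": "Spanish"},
    "es": {"en": "Inglés"},
    "de": {"en": "Englisch"},
}


def render_counts(site, counts):
    """Return report lines for `site`, most-populated language first."""
    names = LOCAL_NAMES.get(site, {})
    lines = []
    for code, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        label = names.get(code, code)  # fall back to the bare code
        lines.append(f"{label}: {n:,} words")
    return lines


counts = {"en": 50345, "de": 4211, "ja": 123, "es": 422}
for line in render_counts("en", counts):
    print(line)
```

The same counts rendered for `"es"` would label `en` as Inglés and fall back to the bare code for any language that has no local name in the table.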

Thanks,

James


Gerard Meijssen

28 Jun

12:47 a.m.

Hoi, Yes, there is a way to find the number of articles in a given language and the number of languages on a wiktionary: check out the 271 languages on the nl.wiktionary at http://nl.wiktionary.org/wiki/Categorie:taal. You will also find that all words that are categorised have a number, and all articles are categorised. :) The way it is implemented is thanks to a great suggestion from an en.wiktionarian. It was, however, not possible to implement this on the English wiktionary because some deemed it un-lexicological.

It is done by using a template wherever a language is indicated, e.g. {{-en-}} for an English-language word.
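A minimal sketch of how such templates make counting possible, assuming every language marker has the form `{{-xx-}}` with a two- or three-letter code. The sample pages and the function names are made up for illustration.

```python
import re
from collections import Counter

# Assumed pattern: a language marker is a template such as {{-en-}} or {{-nl-}}.
# Other short hyphenated templates could also match, so a real tool would
# need a list of known language codes.
LANG_TEMPLATE = re.compile(r"\{\{-([a-z]{2,3})-\}\}")


def languages_in(wikitext):
    """Return the set of language codes marked on one page."""
    return set(LANG_TEMPLATE.findall(wikitext))


def count_by_language(pages):
    """Tally, per language code, how many pages carry its template."""
    totals = Counter()
    for text in pages.values():
        totals.update(languages_in(text))
    return totals


pages = {
    "water": "{{-en-}}\nA liquid.\n{{-nl-}}\nEen vloeistof.",
    "hond": "{{-nl-}}\nEen dier.",
}
print(count_by_language(pages))
```

Collecting the codes into a set per page means a page is counted once per language even if the template appears several times.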

Thanks, GerardM


Andrew Dunbar

5:18 a.m.

The two ways to do it are: 1) Parse the database. This is very difficult due to a myriad of article formats and a very large number of articles in which the format is just broken. I did however develop a parser just good enough to count articles and translations by language for the non-broken examples. Sadly the hard drive on which the code lived was destroyed in a grey-out or power surge.

2) Use of templates. On Wiktionary there is quite a bit of "anarchy" or "democracy" at present, so it's very difficult to introduce new features, and also to have such features used for their proposed purpose without being extended. There is also the problem of people losing interest in their new ideas, and the fact that there are *so many* articles there to go back through and classify after agreeing on a way to do so.

I think the best way would be to a) come to an agreement on how to make a language-tracking template; b) create a new parser that can find language headings in their various variant forms; c) use the data created by the parser with a bot to tag all the existing articles; d) make the new tags compulsory.
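Step b) could be sketched roughly as follows. The heading variants handled here (optional spaces, optional `[[...]]` link around the name) and the `KNOWN` list are assumptions for illustration only; the real articles contain more variants, including broken ones.

```python
import re

# Assumed variants: ==English==, == English ==, ==[[English]]==.
# Only level-2 headings are considered; ===Noun=== and deeper are skipped.
HEADING = re.compile(
    r"^==[ \t]*(?:\[\[)?([^=\[\]\n]+?)(?:\]\])?[ \t]*==[ \t]*$",
    re.MULTILINE,
)
KNOWN = {"English", "German", "Japanese", "Spanish"}  # illustrative list


def find_languages(wikitext):
    """Return known language names found in level-2 headings, in order."""
    found = []
    for match in HEADING.finditer(wikitext):
        name = match.group(1).strip()
        if name in KNOWN and name not in found:
            found.append(name)
    return found


sample = "==English==\n===Noun===\n== German ==\n==[[Spanish]]==\n==See also=="
print(find_languages(sample))
```

Headings that are not in the known list, such as "See also", are ignored, so they would not be tagged by the bot in step c).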

Actually, a better way still might be possible with input from the devs, once Wiktionary is big enough for them to take notice (: Perhaps the new Styles support that is coming might also bring along something that helps us on en.wiktionary?

Hippietrail


-- http://linguaphile.sf.net

Gerard Meijssen

10:50 p.m.


Hoi, When there is a decision that a particular template is to be used, it is possible to use a bot to replace a pattern that indicates that a word is in a language with this template. Many templates can be changed in succession, which will help us to implement the chosen template. I do suggest that using the templates already in use on many of the wiktionaries makes sense, as it will help foster cooperation between the different Wiktionaries by helping us to share content. It is important to note that the content of these templates is a matter of choice for the individual Wiktionary, as long as the definition is shared by all. I will be happy to help in implementing one fixed set of templates on the English wiktionary. When the known patterns have been replaced by the selected templates, it will be possible to create a list of articles that do not have the new templates. These have to be changed manually in order to identify them as words in a particular language.
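A small sketch of the two steps described here: replacing known heading patterns with templates, then listing the articles that still lack one. The `TEMPLATES` mapping and the sample pages are hypothetical; an actual bot would work through the wiki's editing interface rather than on an in-memory dictionary.

```python
import re

# Hypothetical mapping from heading name to template; the real set would be
# whatever the community agrees on.
TEMPLATES = {"English": "{{-en-}}", "German": "{{-de-}}"}
HEADING = re.compile(r"^==[ \t]*([^=\n]+?)[ \t]*==[ \t]*$", re.MULTILINE)


def convert(wikitext):
    """Replace known language headings with their templates."""
    def swap(match):
        # Unknown headings are left exactly as they were.
        return TEMPLATES.get(match.group(1), match.group(0))
    return HEADING.sub(swap, wikitext)


def untagged(pages):
    """Titles whose converted text still contains no language template."""
    markers = tuple(TEMPLATES.values())
    return sorted(
        title for title, text in pages.items()
        if not any(marker in convert(text) for marker in markers)
    )


pages = {
    "dog": "==English==\nAn animal.",
    "Hund": "== German ==\nEin Tier.",
    "perro": "Un animal.",
}
print(convert(pages["dog"]))
print(untagged(pages))
```

The list returned by `untagged` corresponds to the articles that would have to be fixed by hand.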

Thanks, GerardM


Yann Forget

29 Jun

2:37 p.m.

Hi,

On Tuesday 28 June 2005 19:20, Gerard Meijssen wrote:


Hoi, When there is a decision that a particular template is to be used, it is possible to use a bot to replace a pattern that indicates that a word is in a language with this template. Many templates can be changed in succession, which will help us to implement the chosen template. I do suggest that using the templates already in use on many of the wiktionaries makes sense, as it will help foster cooperation between the different Wiktionaries by helping us to share content. It is important to note that the content of these templates is a matter of choice for the individual Wiktionary, as long as the definition is shared by all. I will be happy to help in implementing one fixed set of templates on the English wiktionary. When the known patterns have been replaced by the selected templates, it will be possible to create a list of articles that do not have the new templates. These have to be changed manually in order to identify them as words in a particular language.

This was done on the French Wiktionary when it changed from plain wiki mark up to template mark up. I did it with the help of a bot.


Regards, Yann

--
http://www.non-violence.org/ | Collaborative site on non-violence
http://www.forget-me.net/ | Alternatives on the Net
http://fr.wikipedia.org/ | Free encyclopedia
http://www.forget-me.net/pro/ | Linux training and services

Klaus-Eduard Runnel

28 Jun

3:56 p.m.

On 6/27/05, Gerard Meijssen gerard.meijssen@gmail.com wrote:

...

Hoi, Yes, there is a way to find the number of articles in a given language and the number of languages on a wiktionary: check out the 271 languages on the nl.wiktionary at http://nl.wiktionary.org/wiki/Categorie:taal.

Actually, the category page says "Er zijn 200 woorden in deze rubriek" ("There are 200 articles in this category") even when the number is greater than 200. A fellow wiktionarian just pointed it out to me this morning. It seems to be an old bug. I'll check the buglist later today...

Klaus

Gerard Meijssen

10:37 p.m.


Hoi, This is not a bug but a "feature". It was introduced to make the query for the number of articles less expensive to run. You get 200 articles, but also the option to see the next 200 articles. It is the best that is available at the moment. If the 1.5 software is more efficient, it may be possible to give us the number; that is for the developers to decide.

Thanks, GerardM

Klaus-Eduard Runnel

10:57 p.m.


I'd say that a misleading message is a bug. At least that's what they taught me at the software testing lectures :)

A message like "There are 200 articles in this category" is clearly misleading when there are actually some 4000 articles in the category. It's really hard to find out the number of articles in the category by following all those "next 200" links. My database skills may be a bit rusty, but it is quite surprising to hear that a query to find out the number of entries in a category is so expensive that one should avoid it. Maybe you can point me to a discussion on that matter? I might want to get enlightened :)
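To illustrate the cost of counting through the "next 200" links, here is a minimal sketch. `fetch_batch` is a stand-in for loading one category page, and the 4000-entry category is simulated; neither reflects how the MediaWiki software is actually implemented.

```python
PAGE_SIZE = 200  # batch size shown on the category page


def fetch_batch(members, offset):
    """Stand-in for loading one category page: up to PAGE_SIZE titles."""
    return members[offset:offset + PAGE_SIZE]


def count_members(members):
    """Total a category by following 'next 200' until a short batch appears.

    Returns (total, number_of_page_loads).
    """
    total, offset, requests = 0, 0, 0
    while True:
        batch = fetch_batch(members, offset)
        requests += 1
        total += len(batch)
        if len(batch) < PAGE_SIZE:
            return total, requests
        offset += PAGE_SIZE


category = [f"word{i}" for i in range(4000)]
print(count_members(category))
```

For a category of exactly 4000 entries this takes 21 page loads: 20 full batches plus one empty batch that signals the end.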

Klaus

Christophe Millet

29 Jun

2:34 a.m.

2005/6/27, James R. Johnson modean52@comcast.net:

...

Is there any way to add some tag to the wiktionary so that we can get a count of the number of different languages we have on a wiktionary, and the number of words in each?

On fr, we are using templates like {{-lang-}}, equivalent to ==Langage== (and also adding a category automatically).

Then, by parsing the dump, we are able to obtain exactly what you are looking for; see: http://fr.wiktionary.org/wiki/Utilisateur:Kipmaster/Stats
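A rough sketch of this kind of dump parsing, not the actual script behind the stats page. It assumes a simplified, un-namespaced dump layout (`page` / `revision` / `text`) and language templates of the form `{{-xx-}}`; real dumps declare an XML namespace and are far too large to load in one string, so a real tool would stream them.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed template form: {{-fr-}}, {{-en-}}, and so on.
LANG_TEMPLATE = re.compile(r"\{\{-([a-z]{2,3})-\}\}")


def stats_from_dump(xml_text):
    """Count, per language code, the pages that carry its template."""
    root = ET.fromstring(xml_text)
    totals = Counter()
    for page in root.iter("page"):
        text = page.findtext("revision/text") or ""
        totals.update(set(LANG_TEMPLATE.findall(text)))
    return totals


# Made-up sample in the assumed simplified layout.
SAMPLE = """<mediawiki>
  <page><title>chat</title>
    <revision><text>{{-fr-}} Animal. {{-en-}} Conversation.</text></revision>
  </page>
  <page><title>chien</title>
    <revision><text>{{-fr-}} Animal.</text></revision>
  </page>
</mediawiki>"""

for code, n in stats_from_dump(SAMPLE).most_common():
    print(code, n)
```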

It seems to me that the English wiktionary is not as "templatified" as the French one (I don't know about the other wiktionaries), so it would need some more work, but a bot could help.

Kipmaster.


wiktionary-l@lists.wikimedia.org
