Re: [Wiktionary-l] Language counts

28 Jun 2005

      Andrew Dunbar wrote:
...
The two ways to do it are:

Parse the database. This is very difficult due to a myriad of article formats
and a very large number of articles in which the format is just broken. I did
however develop a parser just good enough to count articles and translations
by language for the non-broken examples. Sadly the hard drive on which the
code lived was destroyed in a grey-out or power surge.

Use of templates. On Wiktionary there is quite a bit of "anarchy" or
"democracy" at present, so it's very difficult to introduce new features and
also to have such features used for their proposed purpose without being
extended. There is also the problem of people losing interest in their new
ideas, and the fact that there are *so many* articles to go back through and
classify after agreeing on a way to do so.
I think the best way would be to a) come to an agreement on how to make
a language-tracking template; b) create a new parser that can find
language headings in their various variant forms; c) use the data created
by the parser with a bot to tag all the existing articles; d) make the new
tags compulsory.
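
Step b) above could be sketched roughly as follows. This is only an illustration, not the parser that was lost: the heading variants it accepts (two or three equals signs, optional spaces, optional [[wikilink]] brackets) and the KNOWN_LANGUAGES set are assumptions, since the thread does not list the variants actually found on en.wiktionary.

```python
import re
from collections import Counter

# Assumed heading variants: "==English==", "== English ==", "== [[English]] ==",
# and the same at level three. The real set of variants is not given in the thread.
HEADING_RE = re.compile(
    r"^(={2,3})[ \t]*\[{0,2}([^=\[\]\n]+?)\]{0,2}[ \t]*\1[ \t]*$",
    re.MULTILINE,
)

# Hypothetical list of language names; a real run would need the full list.
KNOWN_LANGUAGES = {"English", "German", "Japanese", "Spanish"}


def find_languages(wikitext):
    """Return the known language names that appear as headings, in order."""
    found = []
    for match in HEADING_RE.finditer(wikitext):
        name = match.group(2).strip()
        if name in KNOWN_LANGUAGES and name not in found:
            found.append(name)
    return found


def count_by_language(articles):
    """Count how many articles (title -> wikitext) contain each language."""
    counts = Counter()
    for text in articles.values():
        counts.update(find_languages(text))
    return counts
```

Articles for which find_languages returns an empty list would be the "broken" ones that need a human to look at them.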
Hoi,
When there is a decision that a particular template is to be used, it is 
possible to use a bot to replace one pattern that indicates that a word 
is in a language with this template. Many templates can be changed in 
succession, which will help us to implement the chosen template. I do 
suggest that the use of the templates already in use on many of the 
wiktionaries makes sense as it will help foster cooperation between the 
different Wiktionaries as it helps us to share content. It is important 
to note that the content of these templates is a matter of choice for 
the individual Wiktionary as long as the definition is shared by all. I 
will be happy to help in implementing one fixed set of templates on the 
English wiktionary. When the known patterns have been replaced by the 
selected templates, it will be possible to create a list with articles 
that do not have the new templates. These have to be changed manually in 
order to identify them as words in a particular language.
Thanks,
GerardM
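
A minimal sketch of the bot pass described above, under stated assumptions: known heading patterns are rewritten to a template, and whatever is left without a template is listed for manual work. The {{-en-}} form is the one mentioned elsewhere in this thread; {{-de-}} and the heading patterns are assumed by analogy, and the function names are made up for illustration. It works on plain strings and does not talk to a wiki.

```python
import re

# Heading text -> template. Only {{-en-}} comes from the thread;
# {{-de-}} is an assumption made for the sake of the example.
TEMPLATES = {"English": "{{-en-}}", "German": "{{-de-}}"}


def apply_templates(wikitext):
    """Replace each known level-two language heading with its template."""
    for name, template in TEMPLATES.items():
        pattern = re.compile(
            r"^==[ \t]*(?:\[\[)?" + re.escape(name) + r"(?:\]\])?[ \t]*==[ \t]*$",
            re.MULTILINE,
        )
        wikitext = pattern.sub(template, wikitext)
    return wikitext


def untagged(articles):
    """Return the titles of articles that contain none of the templates."""
    return sorted(
        title
        for title, text in articles.items()
        if not any(template in text for template in TEMPLATES.values())
    )
```

Running apply_templates over every article and then calling untagged on the result gives the list of articles that still have to be changed by hand.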
...
Actually a better way again might be possible with input from the devs
once Wiktionary is big enough for them to take notice (: Perhaps the
new Styles support coming might also bring along something that helps
us on en.wiktionary ?
Hippietrail
On 6/28/05, Gerard Meijssen gerard.meijssen@gmail.com wrote:
...
Hoi,
Yes, there is a way to find the number of articles in a given language
and the number of languages on a wiktionary. Check out the 271 languages
on the nl.wiktionary: http://nl.wiktionary.org/wiki/Categorie:taal. You
will also find that all words that are categorised have a number. All
articles are categorised. :) The way it is implemented is thanks to a
great suggestion from an en.wiktionarian. It was however not possible to
implement this on the English wiktionary because some deemed it
un-lexicological.
It is done by using a template wherever a language is indicated, e.g.
{{-en-}} for an English language word.
Thanks,
GerardM
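
Once articles carry templates of the {{-en-}} kind, counting per language becomes simple. The sketch below assumes the templates have the shape {{-code-}} with a two- or three-letter code, and filters against a hypothetical LANGUAGE_CODES set so that other templates of the same shape are not counted as languages. On nl.wiktionary the counts come from categories rather than from a script like this.

```python
import re
from collections import Counter

# Assumed template shape: {{-xx-}} or {{-xxx-}} with a lowercase code.
TEMPLATE_RE = re.compile(r"\{\{-([a-z]{2,3})-\}\}")

# Hypothetical set of codes treated as languages.
LANGUAGE_CODES = {"en", "de", "es", "ja"}


def count_language_templates(articles):
    """Count, per language code, the articles (title -> wikitext) using it."""
    counts = Counter()
    for text in articles.values():
        # A set, so an article counts once per language however often
        # the template is repeated in it.
        codes = set(TEMPLATE_RE.findall(text)) & LANGUAGE_CODES
        counts.update(codes)
    return counts
```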
James R. Johnson wrote:
...
Is there any way to add some tag to the wiktionary so that we can get a
count of the number of different languages we have on a wiktionary, and the
number of words in each?  For example:
On EN:
This wiktionary has:
English:    50,345 words
German:   4,211 words
Japanese: 123 words
Spanish:  422 words
…..
…….
And so on.
Is that somehow possible by adding a language tag, say [[lang:en]] and have
the tags identified per wiktionary, so that en shows up as Inglés on
Spanish, Englisch on German, etc.?
Thanks,
James
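
The per-wiktionary naming asked about here could look something like this. It is a sketch only: LANGUAGE_NAMES is a hand-written, hypothetical table (on a real wiki the names would come from the templates each Wiktionary maintains), and the counts are the illustrative figures from the example above.

```python
# Hypothetical table: site language -> (language code -> local name).
LANGUAGE_NAMES = {
    "en": {"en": "English", "de": "German", "es": "Spanish"},
    "es": {"en": "Inglés", "de": "Alemán", "es": "Español"},
    "de": {"en": "Englisch", "de": "Deutsch", "es": "Spanisch"},
}


def format_report(counts, site):
    """Return report lines for one wiktionary, largest count first."""
    names = LANGUAGE_NAMES.get(site, {})
    lines = []
    for code, number in sorted(counts.items(), key=lambda item: -item[1]):
        # Fall back to the bare code when no local name is known.
        lines.append(f"{names.get(code, code)}: {number:,} words")
    return lines


for line in format_report({"en": 50345, "de": 4211, "es": 422}, "es"):
    print(line)
```

The same counts passed with site "de" would list English as Englisch, which is the behaviour asked for.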
