Hello All!
Is it possible to get a list of all Wiktionary words of a given language through the API? I have been looking through the API documentation for this but found no solution.
Thanks, Lukasz
On 29/04/12 12:52, Łukasz Czyż wrote:
> Is it possible to get a list of all Wiktionary words of a given language through the API?
As there's a page per word, you could list all pages: http://en.wiktionary.org/w/api.php?action=query&list=allpages
They can, however, belong to different languages. So if you want to filter by language, you may want to retrieve the pages in some subcategories of e.g. http://en.wiktionary.org/wiki/Category:English_language instead.
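A minimal Python sketch of the allpages approach suggested above. It assumes the JSON output format and the current "continue" style of result paging (in 2012 the key was "query-continue"). The names fetch_json and iter_all_pages and the User-Agent string are illustrative, not part of the API.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wiktionary.org/w/api.php"


def fetch_json(params):
    """Perform one GET request against the API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # The User-Agent value is a placeholder; replace it with one describing your tool.
    request = urllib.request.Request(url, headers={"User-Agent": "wordlist-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_all_pages(fetch=fetch_json, namespace=0):
    """Yield every page title in the given namespace, following continuation."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": namespace,
        "aplimit": "max",
        "format": "json",
    }
    while True:
        reply = fetch(params)
        for page in reply["query"]["allpages"]:
            yield page["title"]
        if "continue" not in reply:
            break
        # Feed the continuation values back into the next request.
        params = {**params, **reply["continue"]}
```

Iterating over iter_all_pages() with the default fetch issues many requests against the live site, so it is worth trying it first with a stub fetch function. The titles returned are not filtered by language in any way.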
On 29 April 2012 17:29, Platonides platonides@gmail.com wrote:
> So if you want to filter by language, you may want to retrieve instead pages in some subcategories of e.g. http://en.wiktionary.org/wiki/Category:English_language
There are a few other caveats. I summed them up in an answer to a StackOverflow question a while ago:
http://stackoverflow.com/a/4342777/527702
Andrew Dunbar (hippietrail)
Mediawiki-api mailing list Mediawiki-api@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
2012/4/29 Platonides platonides@gmail.com:
> As there's a page per word, you could list all pages: http://en.wiktionary.org/w/api.php?action=query&list=allpages
> They can however belong to different languages.
That is why I do not find it a 100% good solution...
> So if you want to filter by language, you may want to retrieve instead pages in some subcategories of e.g. http://en.wiktionary.org/wiki/Category:English_language
I was aware of that, but it is still not very convenient, as it requires querying multiple categories one by one.
I was expecting a single query that returns the list of all words belonging to a given language, but now I suspect there is none. I will have to use one of the queries you posted. Thanks.
Lukasz
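Querying the categories one by one can at least be automated. The following is a rough Python sketch, assuming list=categorymembers with its default output (which includes "ns" and "title") and the current "continue" paging style. The function name iter_category_words and the fetch parameter are hypothetical: fetch is any caller-supplied function that takes a dict of query parameters and returns the decoded JSON reply.

```python
from collections import deque


def iter_category_words(root, fetch):
    """Yield main-namespace titles under `root`, descending into subcategories."""
    seen_categories = {root}
    seen_titles = set()
    queue = deque([root])
    while queue:
        params = {
            "action": "query",
            "list": "categorymembers",
            "cmtitle": queue.popleft(),
            "cmtype": "page|subcat",
            "cmlimit": "max",
            "format": "json",
        }
        while True:
            reply = fetch(params)
            for member in reply["query"]["categorymembers"]:
                title = member["title"]
                if member["ns"] == 14:  # namespace 14 is Category
                    # Visit each subcategory only once, so cycles cannot loop forever.
                    if title not in seen_categories:
                        seen_categories.add(title)
                        queue.append(title)
                elif member["ns"] == 0 and title not in seen_titles:
                    seen_titles.add(title)
                    yield title
            if "continue" not in reply:
                break
            # Feed the continuation values back into the next request.
            params = {**params, **reply["continue"]}
```

Calling iter_category_words("Category:English language", fetch) would walk the whole tree under that category. This is still one request per category (more for large ones), and nothing guarantees that every subcategory contains only words of that language, so the result should be treated as an approximation.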
On 30 April 2012 11:57, Łukasz Czyż lukasz.czyzz@gmail.com wrote:
> I was expecting a single query that returns the list of all words belonging to a given language, but now I suspect there is none.
Unfortunately, Lukasz, there are only the generic MediaWiki API and the dumps, and both are completely ignorant of the content and format of the various kinds of wikis. This means there is no Wiktionary-specific API, and there are no dump files with dictionary-aware content.
This could be addressed if we could attract some developers with an interest in machine readability and parsing of Wiktionary content, but even after quite a few years there is still no centralized effort in that direction. Everybody who needs to do it reinvents the wheel. Of course, everybody wants to do something somewhat different with the data, or with a somewhat different subset of the data...
Andrew Dunbar (hippietrail)
On Sun, Apr 29, 2012 at 3:52 AM, Łukasz Czyż lukasz.czyzz@gmail.com wrote:
> Is it possible to get a list of all Wiktionary words of a given language through the API?
Hi Łukasz,
There's no way through the API, though I do maintain lists of all the words on the English Wiktionary at http://toolserver.org/~enwikt/definitions/
It's not split by language yet, but that would be really easy to do (in a Unix shell, just: zcat enwikt-defs-latest-all.tsv.gz | grep $'^French\t').
Let me know if that's useful.
Conrad
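For anyone who prefers to do the same filtering outside the shell, here is a small Python sketch. It assumes only what the grep pattern above implies, namely that the file is gzip-compressed, tab-separated, and carries the language name in the first column. The UTF-8 encoding and the function name iter_language_rows are assumptions, and the meaning of the remaining columns is not stated in this thread.

```python
import gzip


def iter_language_rows(path, language):
    """Yield tab-split rows whose first column equals `language`."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Exact match on the first column, like grep's anchored ^French<TAB>.
            if fields[0] == language:
                yield fields
```

For example, iter_language_rows("enwikt-defs-latest-all.tsv.gz", "French") would yield one list of column values per matching line of a downloaded copy of the file.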
On 2 May 2012 10:40, Conrad Irwin conrad.irwin@gmail.com wrote:
> There's not a way through the API, though I do maintain lists of all the words on the English Wiktionary at http://toolserver.org/~enwikt/definitions/
Nice to see you're still around, Conrad!
Andrew Dunbar (hippietrail)