Thank you for the paper. I like the overview in this paper and the clear description of Wiktionary parsing difficulties.
At the beginning of wikokit development I considered using a finite-state machine to extract data, but it was too complex for me, and Wiktionary formatting varies too much in kind and quality :) So I chose ordinary procedural programming with short regular expressions.
But your project proves that finite-state machines can be used in non-trivial situations. Great!
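To make the contrast concrete, here is a toy finite-state machine over the lines of a Wiktionary page. The current state (outside the target language, inside it, or inside a part-of-speech section) decides how each line is interpreted. This is a minimal sketch under simplifying assumptions: the heading patterns, the POS list and the name `extract_definitions` are hypothetical, and it is not how wikokit or DBpedia Wiktionary work. Real pages deviate from this layout in many ways, which is the difficulty described above.

```python
import re
from enum import Enum, auto

# Simplified heading patterns (assumptions): level-2 "==Lang==" starts a
# language section, level-3+ "===Name===" starts a subsection.
LANG_RE = re.compile(r"^==\s*([^=]+?)\s*==\s*$")
SECTION_RE = re.compile(r"^===+\s*([^=]+?)\s*===+\s*$")
# Hypothetical, deliberately incomplete set of part-of-speech headings.
POS_NAMES = {"Noun", "Verb", "Adjective", "Adverb"}


class State(Enum):
    OUTSIDE = auto()      # not inside the target language section
    IN_LANGUAGE = auto()  # inside the target language, outside any POS section
    IN_POS = auto()       # inside a part-of-speech section


def extract_definitions(wikitext, language="English"):
    """Return (pos, definition, examples) tuples via a small finite-state machine."""
    state, pos, results = State.OUTSIDE, None, []
    for line in wikitext.splitlines():
        lang = LANG_RE.match(line)
        if lang:
            # Transition: every level-2 heading starts a new language section.
            state = State.IN_LANGUAGE if lang.group(1) == language else State.OUTSIDE
            pos = None
            continue
        if state is State.OUTSIDE:
            continue
        section = SECTION_RE.match(line)
        if section:
            # Transition: POS headings enter IN_POS, any other heading leaves it.
            name = section.group(1)
            state, pos = (State.IN_POS, name) if name in POS_NAMES else (State.IN_LANGUAGE, None)
            continue
        if state is State.IN_POS:
            if line.startswith("#:"):
                # Example sentence: attach it to the most recent definition.
                if results:
                    results[-1][2].append(line[2:].strip())
            elif line.startswith("#") and not line.startswith(("#*", "##")):
                # Top-level definition; quotations (#*) and subsenses (##) are skipped.
                results.append((pos, line[1:].strip(), []))
    return results
```

Each heading acts as a transition, and each definition or example line is read according to the current state. A procedural extractor would instead locate the section with one regular expression and then parse inside it.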
-- Andrew Krizhanovsky.
On Sun, Apr 7, 2013 at 8:53 AM, Sebastian Hellmann hellmann@informatik.uni-leipzig.de wrote:
Hi Andrew, some statistics are here: http://svn.aksw.org/papers/2012/JIST_Wiktionary/public.pdf
I executed a SPARQL query on the store to compute these statistics: http://downloads.dbpedia.org/wiktionary/stats_2013_04_06.csv
We tried to follow ELE [1] for extraction, so if a Wiktionary page deviates from ELE, the results for it are most likely not as good.
I assume you are familiar with SPARQL, because of your D2R mapping for wikokit. Here is the query:

    SELECT ?g ?p (COUNT(?p) AS ?count)
    WHERE { GRAPH ?g { ?s ?p ?o } }
    GROUP BY ?p ?g
    ORDER BY DESC(?g) DESC(?count)

It takes too long to run over HTTP. If you are interested in more difficult statistics and calculations, I can also give you better access to our service (maybe even SSH access).
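Since the query takes too long over HTTP, it may help to see exactly what it computes. The following is a hypothetical in-memory equivalent in Python, assuming the data is available as (subject, predicate, object, graph) tuples, for example parsed from a local N-Quads dump. The name `property_counts` is illustrative and not part of DBpedia Wiktionary.

```python
from collections import Counter


def property_counts(quads):
    """Count triples per (graph, predicate), like the GROUP BY query above.

    `quads` is an iterable of (subject, predicate, object, graph) tuples.
    Rows are ordered by graph descending, then count descending,
    mirroring ORDER BY DESC(?g) DESC(?count).
    """
    counts = Counter((g, p) for _s, p, _o, g in quads)
    rows = [(g, p, n) for (g, p), n in counts.items()]
    # Python's sort is stable (also with reverse=True), so sorting by the
    # secondary key first and the primary key second gives a two-key order.
    rows.sort(key=lambda row: row[2], reverse=True)
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows
```

Graph names are compared as plain strings here, which matches how a SPARQL engine orders IRIs only approximately.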
All the best, Sebastian
[1] https://en.wiktionary.org/wiki/Wiktionary:Entry_layout_explained
On 05.04.2013 18:13, Andrew Krizhanovsky wrote:
Thank you, Sebastian, for the quick reply.
> But these do not occur frequently. For senses these seem to be available however...
Could you count how many senses and synonyms were successfully extracted from the English and Russian Wiktionaries, i.e. how many senses and synonyms are now available in DBpedia Wiktionary?
It would be interesting to compare these numbers with the number of senses and synonyms extracted from the Wiktionaries by the wikokit parser, see http://code.google.com/p/wikokit/#Statistics
Best regards, Andrew.
On Fri, Apr 5, 2013 at 5:57 PM, Sebastian Hellmann hellmann@informatik.uni-leipzig.de wrote:
Hi Andrew, the tools to solve this problem are actually in place: http://en.wiktionary.org/wiki/house#English-abode links to a sense, and the sense is highlighted. Also, if you go to Editing Gadgets, you can enable "Enable definition editing options." to add glosses. This was created by Yair_rand and allows you to connect senses with the help of glosses such as "abode".
However, this has not been taken up by the Wiktionary community.
The idea is to have something like this (on http://en.wiktionary.org/wiki/house#English-establishment):

    # {{senseid|en|establishment}}An [[establishment]], whether actual, as a pub, or virtual, as a website. Particularly restaurant, casino, or financial or trading company.
    ...
    * {{sense|establishment}} [[shop]]
    ...
    {{trans-top|an establishment}}
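To show why such glosses matter for extraction, here is a minimal sketch of how a parser could attach synonyms to senses once pages use `{{senseid}}` and `{{sense}}` as in the example above. The regular expressions and the name `synonyms_by_sense` are hypothetical; real templates can take more parameters, and this is not how DBpedia Wiktionary implements it.

```python
import re

# Patterns for the two templates in the example; simplifying assumptions.
SENSEID_RE = re.compile(r"\{\{senseid\|[^|}]*\|([^|}]+)\}\}")
SENSE_RE = re.compile(r"\{\{sense\|([^|}]+)\}\}")
# [[target]] or [[target|label]]; only the link target is kept.
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def synonyms_by_sense(wikitext):
    """Map each sense id to its definition text and the synonyms tagged with it."""
    senses = {}
    for line in wikitext.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            m = SENSEID_RE.search(stripped)
            if m:
                entry = senses.setdefault(m.group(1), {"definition": "", "synonyms": []})
                # The definition is whatever follows the {{senseid}} template.
                entry["definition"] = stripped[m.end():].strip()
        elif stripped.startswith("*"):
            m = SENSE_RE.search(stripped)
            if m:
                entry = senses.setdefault(m.group(1), {"definition": "", "synonyms": []})
                # Wikilinks after {{sense|...}} are taken as synonyms of that sense.
                entry["synonyms"].extend(LINK_RE.findall(stripped[m.end():]))
    return senses
```

Without such ids, a parser can only guess which sense a synonym belongs to, which is why the low uptake of these templates limits sense-level extraction.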
But these do not occur frequently. For senses, however, these seem to be available:
http://wiktionary.dbpedia.org/resource/as_soon_as_possible-English-Adverb-1e...
Query (endpoint: http://wiktionary.dbpedia.org/sparql):

    SELECT * WHERE { GRAPH ?g { ?s <http://wiktionary.dbpedia.org/terms/hasSynonym> ?o } } LIMIT 100
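The same query can also be sent programmatically. This is a minimal sketch, assuming the endpoint answers standard SPARQL 1.1 Protocol GET requests and returns the W3C SPARQL JSON results format; the function names are hypothetical, and whether the 2013 endpoint is still online is not assumed.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint and property IRI as given in this thread.
ENDPOINT = "http://wiktionary.dbpedia.org/sparql"
QUERY = (
    "SELECT * WHERE { GRAPH ?g { ?s "
    "<http://wiktionary.dbpedia.org/terms/hasSynonym> ?o } } LIMIT 100"
)


def build_request_url(endpoint, query):
    """Encode the query as a SPARQL 1.1 Protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


def fetch_results(endpoint, query, timeout=60):
    """Send the query and return the raw JSON results (needs network access)."""
    request = Request(build_request_url(endpoint, query),
                      headers={"Accept": "application/sparql-results+json"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def synonym_pairs(results_json):
    """Extract (?s, ?o) values from a SPARQL 1.1 JSON results document."""
    data = json.loads(results_json)
    return [(b["s"]["value"], b["o"]["value"]) for b in data["results"]["bindings"]]
```

Usage would be `synonym_pairs(fetch_results(ENDPOINT, QUERY))`; keeping the HTTP call separate from the parsing makes the parsing testable without network access.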
All the best, Sebastian
On 05.04.2013 11:23, Andrew Krizhanovsky wrote:
DBpedia Wiktionary is a very interesting project!
Is it possible to get the list of synonyms for the first sense of the noun "dog" now? http://en.wiktionary.org/wiki/dog#Synonyms
Best regards, Andrew Krizhanovsky.
On Fri, Apr 5, 2013 at 11:05 AM, Dimitris Kontokostas kontokostas@informatik.uni-leipzig.de wrote:
Hi Moutupsi,
You should definitely take a look at DBpedia Wiktionary (http://dbpedia.org/Wiktionary). It supports everything you want and can easily be configured for other languages.
Best, Dimitris
On Thu, Apr 4, 2013 at 4:21 AM, Moutupsi Paul mopaul@cs.stonybrook.edu wrote:
Hi All,
Greetings,
I am a CS grad student from the Data Science Lab at Stony Brook (https://sites.google.com/site/datascienceslab/), and I am writing to request information about parsing multilingual Wiktionary data. Our lab has been using Wikipedia data for quite a while now, but we are really interested in taking advantage of the massive Wiktionary content, which we feel can become a rich multilingual corpus after proper parsing.
But the big hurdle is the parsing tool. We have tried a few Wiktionary parsing tools:
1. https://github.com/clbecker/perl-wiktionary-parser/
2. https://code.google.com/p/wikokit/wiki/GettingStartedWiktionaryParser
3. https://github.com/benreynwar/wiktionary-parser/tree/master/wiktionary_parse...
4. http://www.ukp.tu-darmstadt.de/software/jwktl/
but none of them is ready to use or easy to extend to multiple languages. (I am currently trying to work with wikokit, parser 2 above.)
I would appreciate any advice, suggestions, or pointers to the best available Wiktionary parser. We are mainly looking to extract meanings, POS, examples, translations, etc. (more can never hurt).
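Of the items listed, translations are perhaps the most table-like. This is a minimal sketch, assuming the English Wiktionary layout with `{{trans-top|gloss}}`, `{{t|lang|term}}` or `{{t+|lang|term}}`, and `{{trans-bottom}}`; real tables contain more variants (qualifiers, scripts, nested lines) that it ignores, and `extract_translations` is a hypothetical name.

```python
import re

# Only the simple table shape described above is covered (an assumption).
TRANS_TOP_RE = re.compile(r"\{\{trans-top\|([^}]*)\}\}")
TRANS_BOTTOM = "{{trans-bottom}}"
# {{t|lang|term...}} or {{t+|lang|term...}}: capture language code and term.
T_RE = re.compile(r"\{\{t\+?\|([^|}]+)\|([^|}]+)")


def extract_translations(wikitext):
    """Return {gloss: [(language_code, term), ...]} for each translation table."""
    tables = {}
    gloss = None  # gloss of the table we are currently inside, if any
    for line in wikitext.splitlines():
        line = line.strip()
        top = TRANS_TOP_RE.search(line)
        if top:
            gloss = top.group(1)
            tables.setdefault(gloss, [])
        elif line.startswith(TRANS_BOTTOM):
            gloss = None
        elif gloss is not None and line.startswith("*"):
            tables[gloss].extend(T_RE.findall(line))
    return tables
```

Keying tables by their gloss keeps translations tied to a sense, although the gloss is free text and does not necessarily match the wording of the definition.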
Any help is appreciated. Kindly let me know if further information is needed.
Regards,
Moutupsi
Wiktionary-l mailing list Wiktionary-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiktionary-l
-- Dimitris Kontokostas Department of Computer Science, University of Leipzig Research Group: http://aksw.org Homepage: http://aksw.org/DimitrisKontokostas
-- Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig Projects: http://nlp2rdf.org, http://linguistics.okfn.org, http://dbpedia.org/Wiktionary, http://dbpedia.org Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann Research Group: http://aksw.org