Hi all;
Just a quick notice about a new Google dataset related to Wikipedia.[1][2][3]
Regards, emijrp
[1] http://googleresearch.blogspot.com.es/2012/05/from-words-to-concepts-and-bac...
[2] http://ebiquity.umbc.edu/blogger/2012/05/19/google-releases-database-linking...
[3] http://www-nlp.stanford.edu/pubs/crosswikis-data.tar.bz2/
Thanks, emijrp! Great news! Quoting from [2] for context before you click through:
"The data set contains triples, each consisting of (i) *text*, a short, raw natural language string; (ii) *url*, a related concept, represented by an English Wikipedia article's canonical location http://en.wikipedia.org/wiki/Help:URL#URLs_of_Wikipedia_pages; and (iii) *count*, an integer indicating the number of times *text* has been observed connected with the concept's *url*. Our database thus includes weights that measure degrees of association."
"The database that we are providing was designed for recall. It is large and noisy, incorporating 297,073,139 distinct string-concept pairs, aggregated over 3,152,091,432 individual links".
Published in LREC 2012:
“A Cross-Lingual Dictionary for English Wikipedia Concepts”, Valentin I. Spitkovsky http://research.google.com/pubs/author3196.html, Angel X. Chang http://research.google.com/pubs/author39061.html, *Eighth International Conference on Language Resources and Evaluation (LREC 2012)*. http://research.google.com/pubs/archive/38098.pdf
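To make the triple format quoted above concrete, here is a minimal Python sketch of how the (*text*, *url*, *count*) triples could be turned into association weights by normalizing the counts per string. Note the assumptions: I have not checked the on-disk layout of the release, so the tab-separated line format, the function names, and the sample rows (with invented counts) are purely illustrative and are not taken from the dataset or the paper.

```python
from collections import defaultdict


def parse_triples(lines):
    """Yield (text, url, count) from tab-separated lines.

    The tab-separated layout is an assumption for illustration,
    not the documented format of the released files.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        text, url, count = line.split("\t")
        yield text, url, int(count)


def association_weights(triples):
    """Map each text to {url: share of that text's total count}."""
    counts = defaultdict(dict)
    for text, url, count in triples:
        counts[text][url] = counts[text].get(url, 0) + count
    weights = {}
    for text, by_url in counts.items():
        total = sum(by_url.values())
        weights[text] = {url: c / total for url, c in by_url.items()}
    return weights


# Made-up sample rows; the counts are invented for the example.
sample = [
    "jaguar\thttp://en.wikipedia.org/wiki/Jaguar\t3",
    "jaguar\thttp://en.wikipedia.org/wiki/Jaguar_Cars\t1",
]
weights = association_weights(parse_triples(sample))
```

With these sample rows, each string maps to a dictionary of concept URLs whose weights sum to 1, which is one simple way to read the "degrees of association" mentioned in the quote.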
On Sat, May 19, 2012 at 7:08 PM, emijrp emijrp@gmail.com wrote:
--
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT http://code.google.com/p/avbot/ | StatMediaWiki http://statmediawiki.forja.rediris.es | WikiEvidens http://code.google.com/p/wikievidens/ | WikiPapers http://wikipapers.referata.com | WikiTeam http://code.google.com/p/wikiteam/
Personal website: https://sites.google.com/site/emijrp/
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l