Thanks, emijrp! Great news! Quoting from [2] for context before you
click through:
"The data set contains triples, each consisting of (i) *text*, a short, raw
natural language string; (ii) *url*, a related concept, represented by
an English Wikipedia article's canonical location
<http://en.wikipedia.org/wiki/Help:URL#URLs_of_Wikipedia_pages&g…es>;
and (iii) *count*, an integer indicating the number of times *text* has been
observed connected with the concept's *url*. Our database thus includes
weights that measure degrees of association."
"The database that we are providing was designed for recall. It is large
and noisy, incorporating 297,073,139 distinct string-concept pairs,
aggregated over 3,152,091,432 individual links".
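For anyone who wants a feel for how such triples could be used, here is a minimal sketch in Python. It assumes a tab-separated layout of text, url, and count, which is only my guess for illustration (the released files may be laid out differently), and the sample rows and counts are made up. The function names `parse_triples` and `association_weights` are hypothetical. It turns raw counts into per-string weights by dividing each count by the total for that string, which is one simple way to read "degrees of association".

```python
from collections import defaultdict

# Made-up sample rows in an ASSUMED tab-separated layout: text, url, count.
# The real distribution may use a different format; this is illustration only.
SAMPLE = (
    "jaguar\thttp://en.wikipedia.org/wiki/Jaguar\t60\n"
    "jaguar\thttp://en.wikipedia.org/wiki/Jaguar_Cars\t40\n"
    "big apple\thttp://en.wikipedia.org/wiki/New_York_City\t10\n"
)


def parse_triples(lines):
    """Yield (text, url, count) tuples from tab-separated lines, skipping blanks."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        text, url, count = line.split("\t")
        yield text, url, int(count)


def association_weights(triples):
    """Map each text to {url: count / total count observed for that text}."""
    counts = defaultdict(dict)
    for text, url, count in triples:
        # Sum counts in case the same (text, url) pair appears more than once.
        counts[text][url] = counts[text].get(url, 0) + count
    weights = {}
    for text, by_url in counts.items():
        total = sum(by_url.values())
        weights[text] = {url: c / total for url, c in by_url.items()}
    return weights


weights = association_weights(parse_triples(SAMPLE.splitlines()))
print(weights["jaguar"])
```

With the made-up rows above, the two candidate concepts for "jaguar" share the weight in proportion to their counts, while "big apple" has a single concept carrying all of it.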
Published in LREC 2012:
“A Cross-Lingual Dictionary for English Wikipedia Concepts”, Valentin I.
Spitkovsky <http://research.google.com/pubs/author3196.html>, Angel X.
Chang <http://research.google.com/pubs/author39061.html>, *Eighth
International Conference on Language Resources and Evaluation (LREC 2012)*.
http://research.google.com/pubs/archive/38098.pdf
On Sat, May 19, 2012 at 7:08 PM, emijrp <emijrp(a)gmail.com> wrote:
Hi all;
Just a quick notice about a new Google dataset related to
Wikipedia.[1][2][3]
Regards,
emijrp
[1]
http://googleresearch.blogspot.com.es/2012/05/from-words-to-concepts-and-ba…
[2]
http://ebiquity.umbc.edu/blogger/2012/05/19/google-releases-database-linkin…
[3]
http://www-nlp.stanford.edu/pubs/crosswikis-data.tar.bz2/
--
Emilio J. Rodríguez-Posada. E-mail: emijrp AT gmail DOT com
Pre-doctoral student at the University of Cádiz (Spain)
Projects: AVBOT <http://code.google.com/p/avbot/> |
StatMediaWiki <http://statmediawiki.forja.rediris.es> |
WikiEvidens <http://code.google.com/p/wikievidens/> |
WikiPapers <http://wikipapers.referata.com> |
WikiTeam <http://code.google.com/p/wikiteam/>
Personal website:
https://sites.google.com/site/emijrp/
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l