Hi everybody,
We are happy to announce an experimental RDF dump of Wikimedia Commons. A complete
first draft is now available online at
http://nl.dbpedia.org/downloads/commonswiki/20140705/ and will eventually be accessible
from http://commons.dbpedia.org. A small sample dataset, which may be easier to browse, is
available on GitHub at
https://github.com/gaurav/commons-extraction/tree/master/commonswiki/201401…
The following datasets showcase some of the improvements that we’ve been working on over
the last two months:
- File information (*-file-information.*) is a completely new dataset that contains
information on the files in the Commons, including file and thumbnail URLs, file
extensions, file type classes and MIME types.
- DBpedia’s Mappings Extractor (*-mappingbased-properties.*) uses templates stored on the
Mapping server (http://mappings.dbpedia.org/) to create RDF for information-rich
templates. This system still has some important limitations, such as not being able to
process embedded templates (e.g. license templates inside {{Information}}), but
top-level templates are completely configurable. The existing mappings are available at
http://mappings.dbpedia.org/index.php/Mapping_commons
  - The mappings include 363 license templates that indicate licensing for Commons files
  under public domain, Creative Commons and other open access licenses. These mappings were
  created by bots and still require verification before use. They are listed at
  http://mappings.dbpedia.org/index.php/Category:Commons_media_license
- The DBpedia Geoextractor (*-geo-coordinates.*) now extracts geographical coordinates
from Commons files using the {{Location}} template.
- The DBpedia SKOS Extractor (*-skos-categories.*) now identifies relationships between
Commons categories, building a SKOS-based description of the entire Commons category
tree.
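For anyone who wants a quick look at what one of the datasets above contains, here is a minimal sketch in Python that counts how often each predicate occurs. It assumes the file you downloaded is in N-Triples format (one triple per line), which we have not spelled out above, so please check the file extension first. The sample triples and the example.org property URIs are made up for illustration and are not taken from the dump. The regular expression only handles triples with a URI subject and ignores blank-node subjects.

```python
import re
from collections import Counter

# Matches a simple N-Triples line: <subject> <predicate> object .
# Assumption: subjects and predicates are always URIs in angle brackets.
TRIPLE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')


def parse_triples(lines):
    """Yield (subject, predicate, object) tuples, skipping blanks and comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        match = TRIPLE_RE.match(line)
        if match:
            yield match.groups()


def count_predicates(lines):
    """Return a Counter mapping each predicate URI to its number of triples."""
    return Counter(predicate for _, predicate, _ in parse_triples(lines))


# Hypothetical sample data, NOT copied from the real file-information dataset.
SAMPLE = [
    '<http://commons.dbpedia.org/resource/File:Example.jpg> '
    '<http://example.org/fileExtension> "jpg" .',
    '<http://commons.dbpedia.org/resource/File:Example.jpg> '
    '<http://example.org/mimeType> "image/jpeg" .',
    '# a comment line',
    '<http://commons.dbpedia.org/resource/File:Other.png> '
    '<http://example.org/fileExtension> "png" .',
]

if __name__ == '__main__':
    for predicate, count in count_predicates(SAMPLE).most_common():
        print(count, predicate)
```

To run it on a real file, pass an open file object instead of SAMPLE. If the file is bzip2-compressed (again an assumption), something like bz2.open(path, "rt", encoding="utf-8") from the standard library should work as the line source.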
Please have a look and let us know what you think. We’ll be working on a number of open
tasks over the next three weeks, listed at
https://github.com/gaurav/extraction-framework/issues?state=open -- if you see something
wrong with what we’ve done above, or have an issue you’d particularly like us to tackle,
please report it there or drop me an e-mail!
This work is sponsored by the Google Summer of Code program
(https://www.google-melange.com/gsoc/project/details/google/gsoc2014/gaurav/…).
Thanks!
cheers,
The DBpedia Commons extraction team:
Gaurav Vaidya
Dimitris Kontokostas
Andrea Di Menna
Jimmy O’Regan