Hi folks!
We need a full mapping of WD item -> enwiki sitelinks.
1. We extracted from DBpedia 2015-04 all statements of the form
   http://dbpedia.org/resource/Northern_Ireland http://www.w3.org/2002/07/owl#sameAs http://wikidata.org/entity/Q26
   The count is 5882410.
2. Checked with WDQ: https://wdq.wmflabs.org/api?q=link%5Benwiki%5D&noitems=1 returns "items":6263098, so 6.08% are missing from DBpedia. That's a lot.
How to get them from Wikidata?
3. WDQ doesn't seem to return sitelinks: https://wdq.wmflabs.org/api?q=link%5Benwiki%5D&props=enwiki returns just item numbers.
4. The SPARQL endpoint doesn't seem to have them: http://wdqs-beta.wmflabs.org/
   prefix schema: <http://schema.org/> select * {?x schema:about ?y}
   returns nothing.
   https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#WDQS_data_differences says "5. Depending on the instance of the service, multi-language labels and sitelinks may or may not be supported." I think this service doesn't have sitelinks: is there one that has them?
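For reference, the 6.08% in step 2 can be reproduced from the two counts. This is only a rough sketch: it assumes each DBpedia sameAs statement corresponds to exactly one Wikidata item with an enwiki sitelink, which the numbers alone do not guarantee. The variable names are made up for illustration.

```python
# Counts quoted in steps 1 and 2 above.
dbpedia_sameas = 5882410    # owl:sameAs statements extracted from DBpedia 2015-04
wdq_enwiki_items = 6263098  # "items" reported by WDQ for link[enwiki]

# Assumption: one sameAs statement per item, so the difference is the gap.
missing = wdq_enwiki_items - dbpedia_sameas
missing_pct = 100 * missing / wdq_enwiki_items

print(missing, f"{missing_pct:.2f}%")
```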
Will generate one for you.
On Wed, Aug 5, 2015 at 6:32 PM Vladimir Alexiev < vladimir.alexiev@ontotext.com> wrote:
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
Hello Vladimir!
The easiest way is, I think, to use the JSON dump [1], created every Monday. Each item is serialized as JSON; the enwiki sitelink, if it exists, is in itemJson['sitelinks']['enwiki']['title'], and the item id is in itemJson['id'].
Cheers, Thomas
[1] http://dumps.wikimedia.org/other/wikidata/
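A minimal sketch of that approach in Python, assuming the dump is one big JSON array with one entity per line (each line followed by a comma, with "[" and "]" on their own lines). The function names and the file paths are hypothetical, and the tab-separated output format is just one possible choice.

```python
import gzip
import json
from typing import Iterable, Iterator, Tuple


def iter_enwiki_sitelinks(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (item id, enwiki title) pairs from the lines of a JSON dump."""
    for line in lines:
        # Assumed layout: one entity per line with a trailing comma; the
        # first and last lines hold only the enclosing "[" and "]".
        line = line.strip().rstrip(",")
        if not line or line in ("[", "]"):
            continue
        item = json.loads(line)
        # Defensive: treat a missing or empty "sitelinks" value as no links.
        sitelinks = item.get("sitelinks") or {}
        sitelink = sitelinks.get("enwiki")
        if sitelink is not None:
            yield item["id"], sitelink["title"]


def write_mapping(dump_path: str, out_path: str) -> None:
    """Stream a gzipped dump and write one 'Qid<TAB>title' line per item."""
    with gzip.open(dump_path, "rt", encoding="utf-8") as dump, \
            open(out_path, "w", encoding="utf-8") as out:
        for item_id, title in iter_enwiki_sitelinks(dump):
            out.write(f"{item_id}\t{title}\n")
```

Something like write_mapping("dump.json.gz", "item2enwiki.tsv") would then produce the mapping; both file names are placeholders. Reading line by line keeps memory use flat, which matters for a dump with millions of items.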
Here you go: http://tools.wmflabs.org/wikidata-todo/static/item2enwiki.20150805.gz
On Wed, Aug 5, 2015 at 6:47 PM Thomas Pellissier-Tanon thomaspt@google.com wrote:
Hello Vladimir! The easiest way is, I think, to use the JSON dump [1] created every Monday. Each item is serialized in JSON and the enwiki sitelink will be, if it exists, in itemJson['sitelinks']['enwiki']['title']. And the item id in itemJson['id'] Cheers, Thomas
[1] http://dumps.wikimedia.org/other/wikidata/
Thanks everyone for being so helpful!!
Hi Vladimir,
The reason for the gap might be that the DBpedia dumps are based on Wikidata dumps from March. If you'd like to give it a try, you can run the extraction framework with "wikidata" and use only the "WikidataSameAsExtractor" extractor.
Cheers, Dimitris
DBpedia dumps are based on Wikidata dumps from March. If you care to give it a try you can try running the extraction framework with "wikidata" and use only the "WikidataSameAsExtractor" extractor.
We’ll do that, and I'll diff Magnus' output to check for drift between WD and DBP. There shouldn't be any since all lang links are sourced from WD, but just in case...
On the other hand, there is drift between WD items and WP articles. Some of it is "legitimate" (e.g. I added an item for the Europeana Food and Drink project, but I wouldn't dare write an article). Afaik, the Duplicity tool helps people work through this drift.
DBpedia dumps are based on Wikidata dumps from March
Here is the drift from DBP (March) to current WD:
  5882410 WDid-DBP.ttl
  6255943 WDid-WD.ttl
   763951 differences (12.7%)
   195209 removed lines
   568742 added lines
If anyone cares, I can put up the diff file somewhere.
"Removals" include:
- redirects, like !Bang! -> Funking_Conservatory
- renaming, e.g. $O$ -> $O$_(Die_Antwoord_album), or "It"_–_The_Album -> "It"_the_Album
- renaming by removing a qualifier, e.g. Centrify_(software) -> Centrify
- changing Q number, e.g. "Babbacombe"_Lee Q4540274 -> Q1401456
Looking at the file confirms what Dimitris implied: changes are the result of normal editorial actions. Both files include Template: and Category: pages.
But 12.7% changes of page titles or Q numbers in 3-4 months is a lot!
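For anyone who wants to repeat the comparison, here is a rough Python sketch of a line-level diff between two mapping files. It is not the exact procedure used for the numbers above; the function name and the sample lines are made up for illustration (only the "Babbacombe"_Lee Q numbers and Q26 come from this thread). A rename or a changed Q number shows up as one removed line plus one added line, which is why both counts grow.

```python
from typing import Iterable, Set, Tuple


def diff_lines(old: Iterable[str], new: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (removed, added): lines only in `old` and lines only in `new`."""
    old_set = {line.rstrip("\n") for line in old}
    new_set = {line.rstrip("\n") for line in new}
    return old_set - new_set, new_set - old_set


# Illustrative "title Qid" lines, not the real .ttl syntax.
old_lines = ['"Babbacombe"_Lee Q4540274', "Northern_Ireland Q26"]
new_lines = ['"Babbacombe"_Lee Q1401456', "Northern_Ireland Q26"]

removed, added = diff_lines(old_lines, new_lines)
print(len(removed), "removed,", len(added), "added")

# Sanity check on the reported totals: removed + added = differences.
assert 195209 + 568742 == 763951
```

The sketch deliberately does not recompute the percentage, since the message does not say which file's line count was used as the base.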