On 04.06.2013 15:58, Markus Krötzsch wrote:
Quick question: I am about to analyse the Wikidata exports (the XML dumps). How exactly do deleted pages show up in there?
They don't. And can't, for legal reasons.
When trying to Special:Export a deleted page, you get nothing (even if including old versions). Does this mean that deleted pages are not part of the dump export either (not even their old versions)? This would mean that edits that happened on deleted pages are lost in the analysis (also in Denny's current code).
Yes, they are.
It would also make it hard to update a local database based on daily exports: the export will not tell you which pages were deleted.
You can find out which pages were deleted by looking at the log dump. Not sure whether the OAI interface also exposes deletions, but I think it does.
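As a rough illustration of the log-dump approach, here is a minimal Python sketch that streams a logging dump and lists deletion entries. It assumes the usual MediaWiki log export layout, i.e. `<logitem>` elements with `<type>`, `<action>`, `<timestamp>` and `<logtitle>` children; the function names and the sample data are made up for the example, so please check the element names against the actual dump file before relying on it.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}logitem' -> 'logitem'."""
    return tag.rsplit("}", 1)[-1]


def deleted_titles(stream):
    """Yield (timestamp, title) for every log item that records a deletion.

    Assumes each <logitem> has <type>, <action>, <timestamp> and <logtitle>
    children (an assumption about the dump layout, not taken from this thread).
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "logitem":
            continue
        fields = {local_name(child.tag): (child.text or "") for child in elem}
        if fields.get("type") == "delete" and fields.get("action") == "delete":
            yield fields.get("timestamp", ""), fields.get("logtitle", "")
        elem.clear()  # free memory; real log dumps are large


# Hypothetical sample data standing in for a real logging dump.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <logitem>
    <id>1</id>
    <timestamp>2013-06-01T10:00:00Z</timestamp>
    <type>delete</type>
    <action>delete</action>
    <logtitle>Q123</logtitle>
  </logitem>
  <logitem>
    <id>2</id>
    <timestamp>2013-06-02T11:00:00Z</timestamp>
    <type>delete</type>
    <action>restore</action>
    <logtitle>Q123</logtitle>
  </logitem>
  <logitem>
    <id>3</id>
    <timestamp>2013-06-03T12:00:00Z</timestamp>
    <type>move</type>
    <action>move</action>
    <logtitle>Q5</logtitle>
  </logitem>
</mediawiki>"""

if __name__ == "__main__":
    for timestamp, title in deleted_titles(io.BytesIO(SAMPLE)):
        print(timestamp, title)
```

Note that the sketch only reports `delete` actions. A page can be restored later, so anything that syncs a local database would probably also need to look at `restore` entries and apply the log items in timestamp order.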
-- daniel
PS: please remember to use wikidata-tech for technical discussion. wikidata-intern should only be for organisational issues for the team.
Thanks for the quick answer.
On 04/06/13 15:01, Daniel Kinzler wrote:
On 04.06.2013 15:58, Markus Krötzsch wrote:
Quick question: I am about to analyse the Wikidata exports (the XML dumps). How exactly do deleted pages show up in there?
They don't. And can't, for legal reasons.
This behaviour should not be related to legal issues: if something is legally problematic, it needs to be nuked. In any case, one could include a note about the fact that the item was deleted at some time (if the title as such is legally problematic, then it definitely needs to be nuked).
When trying to Special:Export a deleted page, you get nothing (even if including old versions). Does this mean that deleted pages are not part of the dump export either (not even their old versions)? This would mean that edits that happened on deleted pages are lost in the analysis (also in Denny's current code).
Yes, they are.
That's a pity.
It would also make it hard to update a local database based on daily exports: the export will not tell you which pages were deleted.
You can find out which pages were deleted by looking at the log dump. Not sure whether the OAI interface also exposes deletions, but I think it does.
Ok, thanks, will try this later and live with the inaccuracy for now.
Markus
-- daniel
PS: please remember to use wikidata-tech for technical discussion. wikidata-intern should only be for organisational issues for the team.
Thanks for reminding me. I had temporarily forgotten.
wikidata-tech@lists.wikimedia.org