Hi all!
We are currently implementing a linked open data service that links our resources to well-known authority files. In doing so, we stumbled upon the use of Wikipedia URIs in our data.
It turns out that other authority files, like DNB's culturegraph, use the complete title within the URI:
https://hub.culturegraph.org/entityfacts/118540238 → dewiki: https://en.wikipedia.org/wiki/Johann_Wolfgang_von_Goethe
[0] states that Wikipedia URLs are not persistent and can change, for example if a person changes their name. So the question arises whether using the page ID would be a better solution in the context of linked open data. In that case, the link above could be represented by
https://de.wikipedia.org/wiki?curid=2501
Are there any reasons why this approach could be a particularly bad idea? Is the page ID in this case a persistent identifier for the page on Wikipedia?
Thank you for your comments!
[0] https://meta.wikimedia.org/wiki/Wikidata/Notes/URI_scheme#Wikipedia_today
Tom,
I think most authorities are using Wikidata entries rather than Wikipedia entries. I believe Wikidata entries are persistent, and they point to the appropriate Wikipedia articles (in whatever languages they exist).
So for Goethe, one just has to go to: https://www.wikidata.org/wiki/Q5879 - and you will already see the burgeoning universe of connections.
Bob Kosovsky, Ph.D. -- Librarian, Rare Books and Manuscripts, Music & Recorded Sound Division The New York Public Library for the Performing Arts - Dorothy and Lewis B. Cullman Center
40 Lincoln Center Plaza, New York, NY 10023
www.nypl.org
On Tue, Apr 14, 2020 at 7:31 AM Schilling, Tom <Tom.Schilling@slub-dresden.de> wrote:
--
Tom Schilling
Project coordinator, EFRE-Project "Linked Open Data"
Saxon State and University Library Dresden (SLUB)
Department 2, Division 2.2 (Metadata)
Visiting address: Zellescher Weg 18, D-01069 Dresden
Postal address: D-01054 Dresden
Phone: +49 351 4677 474
E-mail: Tom.Schilling@slub-dresden.de
_______________________________________________
Libraries mailing list
Libraries@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/libraries
On Tue, 14 Apr 2020 at 12:30, Schilling, Tom <Tom.Schilling@slub-dresden.de> wrote:
It turns out that other authority files, like DNB's culturegraph, use the complete title within the URI:
https://hub.culturegraph.org/entityfacts/118540238 → dewiki: https://en.wikipedia.org/wiki/Johann_Wolfgang_von_Goethe
[0] states that Wikipedia URLs are not persistent and can change, for example if a person changes their name. So the question arises whether using the page ID would be a better solution in the context of linked open data. In that case, the link above could be represented by
https://de.wikipedia.org/wiki?curid=2501
Better still would be to use the Wikidata URI instead:
https://www.wikidata.org/entity/Q5879
with a more human-readable equivalent at:
https://www.wikidata.org/wiki/Q5879
You could if you wish simply store the UID, "Q5879".
Given one of the above, the Wikidata API will always return the current Wikipedia URI, in German or any other preferred (or other available, as a fallback) language.
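As a rough illustration of the lookup described above, here is a minimal Python sketch that asks the Wikidata API (the `wbgetentities` action) for the sitelink of an item on one wiki and reads the article URL from the reply. The function names, the default site `dewiki`, and the User-Agent string are assumptions made for this example, not something given in the thread.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def sitelink_request_url(qid, site="dewiki"):
    """Build a wbgetentities request that asks only for one sitelink and its URL."""
    params = {
        "action": "wbgetentities",
        "ids": qid,
        "props": "sitelinks/urls",
        "sitefilter": site,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def extract_sitelink_url(response, qid, site="dewiki"):
    """Return the article URL from a decoded response, or None if there is none."""
    entity = response.get("entities", {}).get(qid, {})
    sitelink = entity.get("sitelinks", {}).get(site)
    return sitelink.get("url") if sitelink else None


def current_wikipedia_url(qid, site="dewiki"):
    """Fetch the current Wikipedia URL for a Wikidata item (performs a network call)."""
    request = urllib.request.Request(
        sitelink_request_url(qid, site),
        # Hypothetical User-Agent; replace with one identifying your service.
        headers={"User-Agent": "lod-sketch/0.1 (example only)"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_sitelink_url(json.load(reply), qid, site)
```

Calling `current_wikipedia_url("Q5879")` would then look up the German article for the item mentioned above. A language fallback could be sketched by trying several `site` values in order until one returns a URL.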
Thank you all for the fast reply.
We also enrich Wikidata items, but in our particular use case they are not that helpful. We want to enrich our data semantically using the categories assigned to Wikipedia articles. Unfortunately, these category pages have no equivalent in Wikidata (e.g. "Category:German poets" [1] is assigned to Goethe). Therefore, we need to collect those category pages directly from Wikipedia, and so we need a persistent URI from there.
Using the MediaWiki API in the form of [2], it is easy to extract the page IDs of the various category pages.
[1] https://en.wikipedia.org/wiki/Category:German_poets
[2] https://en.wikipedia.org/w/api.php?action=query&generator=categories&...
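For illustration only, a minimal Python sketch of such a category lookup might look like the following. The query in [2] is truncated in the message, so the parameters below (`titles`, `gcllimit`) are an assumption about one plausible form of it, and the function names are hypothetical. The parser expects the default JSON layout of `action=query`, where `pages` is an object keyed by page ID.

```python
import urllib.parse


def categories_request_url(title, host="en.wikipedia.org"):
    """Build an action=query request listing the categories of one article."""
    params = {
        "action": "query",
        "generator": "categories",
        "titles": title,
        "gcllimit": "max",  # assumption: ask for as many categories as allowed
        "format": "json",
    }
    return f"https://{host}/w/api.php?" + urllib.parse.urlencode(params)


def category_page_ids(response):
    """Map category title to page ID, skipping categories without a page."""
    pages = response.get("query", {}).get("pages", {})
    return {
        page["title"]: page["pageid"]
        for page in pages.values()
        if "pageid" in page  # categories without a description page have no ID
    }
```

Long category lists may be split over several replies; a complete client would also follow the `continue` values returned by the API, which this sketch leaves out.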
On Tuesday, 14.04.2020, at 13:38 +0100, Andy Mabbett wrote:
Schilling, Tom, 14/04/20 14:30:
So the question arises whether using the page ID would be a better solution in the context of linked open data.
Yes, it would be. We use it already for permalinks in Wikimedia Commons attribution strings and in Meta-Wiki pages.
Are there any reasons why this approach could be a particularly bad idea? Is the page ID in this case a persistent identifier for the page on Wikipedia?
It's not completely persistent; it can be lost. That is relatively rare though, and those cases may not matter to you. https://www.mediawiki.org/wiki/Manual:Page_table#page_id
I don't recommend it, but a more permanent ID is the oldid or rev_id, unless that revision is deleted. One advantage would be that it records what version of the page you consulted; later versions could change the topic of the page. It can also be used for some forms of links to the *page* without further redirection, for instance the page history: https://www.mediawiki.org/?oldid=3777687&action=history
Federico
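To make the two link forms discussed above concrete, here is a small Python sketch that builds a `curid` link and an `oldid` link in the shapes used in this thread, and that checks whether a page ID still resolves by asking the MediaWiki API (`action=query` with `pageids` and `prop=info`). All function names and the default host are assumptions made for this example. The check relies on the API marking an unknown page ID with a `missing` key.

```python
import urllib.parse


def curid_url(page_id, host="de.wikipedia.org"):
    """Link to a page by its page ID, in the form used earlier in the thread."""
    return f"https://{host}/wiki?curid={int(page_id)}"


def oldid_url(rev_id, host="de.wikipedia.org"):
    """Link to one specific revision of a page."""
    return f"https://{host}/?oldid={int(rev_id)}"


def page_info_request_url(page_id, host="de.wikipedia.org"):
    """Build a request asking for the current title and URL of a page ID."""
    params = {
        "action": "query",
        "pageids": str(int(page_id)),
        "prop": "info",
        "inprop": "url",
        "format": "json",
    }
    return f"https://{host}/w/api.php?" + urllib.parse.urlencode(params)


def resolve_page(response, page_id):
    """Return (title, fullurl) if the page ID still exists, else None."""
    pages = response.get("query", {}).get("pages", {})
    page = pages.get(str(int(page_id)))
    if page is None or "missing" in page:
        return None  # the ID was lost, for example after a deletion
    return page["title"], page.get("fullurl")
```

Running such a check periodically over stored page IDs would be one way to notice the rare cases in which an ID has been lost.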
Thank you Federico!
The link to the manual you provided was exactly what I was looking for. I also see the point of linking to a versioned representation of an article. However, I think I am fine with the page ID under these circumstances.
Thank you, Tom
On Tuesday, 14.04.2020, at 15:53 +0300, Federico Leva (Nemo) wrote:
Hi Tom,
On Apr 14, 2020, at 10:51 AM, Schilling, Tom <Tom.Schilling@slub-dresden.de> wrote:
The link to the manual you provided was exactly what I was looking for. I also see the point of linking to a versioned representation of an article. However, I think I am fine with the page ID under these circumstances.
This is straying off topic a bit, but if you have the time, it would be super interesting to know a little more about the enrichment process you are building, and specifically why Wikipedia categories are important.
//Ed