Hi there,
I hope this is the right list for a RESTBase query. Let me know if it is the wrong one, or if I should head over to Phabricator instead.
I'm visiting a large number of Wikipedia pages' specific versions (for the Crossref Event Data service, if you're interested - https://www.eventdata.crossref.org/guide ). I'm getting page ids / versions from EventStreams. I'm using the RESTBase API because it gives the cleanest HTML and it was recommended to me for the volume of queries, e.g.
https://ceb.wikipedia.org/api/rest_v1/page/html/Quebrada_Fantasma/13659774
I want to get the *canonical URL* for that version page, e.g.
https://ceb.wikipedia.org/wiki/Quebrada_Fantasma
The 'normal' HTML view of a page supplies the canonical URL as a <link rel="canonical"> tag, but the RESTBase response doesn't. It does supply an isVersionOf link though:
<link rel="dc:isVersionOf" href="//ceb.wikipedia.org/wiki/Quebrada_Fantasma"/>
Questions:
1 - Is the isVersionOf URL in RESTBase identical to the "official" canonical URL that I would get from the HTML metadata (using https:)?
2 - Is the "title" component of the RESTBase URL the same as used in the Canonical URL? The Swagger docs say "Page title. Use underscores instead of spaces. Example: Main_Page". I'm not clear if that is the same thing.
3 - Is there a general recommended way of getting the canonical URL for a page from RESTBase?
Thanks in advance!
Joe Wass
Hello Joe,
On 11 October 2017 at 14:27, Joe Wass jwass@crossref.org wrote:
Questions:
1 - Is the isVersionOf URL in RESTBase identical to the "official" canonical URL that I would get from the HTML metadata (using https:)?
Yes, it is :)
2 - Is the "title" component of the RESTBase URL the same as used in the Canonical URL? The Swagger docs say "Page title. Use underscores instead of spaces. Example: Main_Page". I'm not clear if that is the same thing.
Yes, that is the canonical title of the page, with the exception that forward slashes need to be encoded when contacting the REST API, whereas that is not needed (but allowed) for the canonical URL. So for the page entitled "Page/SubPage", you need to provide "Page%2FSubPage" to the REST API. Note that you will still get the correct canonical URL in the `dc:isVersionOf` field.
3 - Is there a general recommended way of getting the canonical URL for a page from RESTBase?
You can either use the `dc:isVersionOf` field, or use the simple transform: https://{{domain}}/api/rest_v1/page/html/{title} => https://{{domain}}/wiki/{title} which is guaranteed to work.
Cheers, Marko
Marko Obrovac, PhD Senior Services Engineer Wikimedia Foundation
Thanks in advance!
Joe Wass
https://en.wikipedia.org/wiki/User:Afandian Crossref
Services mailing list Services@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/services
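The transform and the slash-encoding rule from Marko's reply can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names `rest_url` and `canonical_url` are made up for this example, and decoding `%2F` back to `/` in the canonical URL is a choice (the reply says encoding is allowed but not needed there).

```python
from urllib.parse import quote, urlsplit

REST_PREFIX = "/api/rest_v1/page/html/"


def rest_url(domain, title, revision=None):
    """Build a REST API HTML URL (hypothetical helper name).

    safe="" makes quote() percent-encode "/" as %2F, which the reply says
    the REST API requires for titles such as "Page/SubPage".
    """
    encoded = quote(title.replace(" ", "_"), safe="")
    url = f"https://{domain}{REST_PREFIX}{encoded}"
    return url if revision is None else f"{url}/{revision}"


def canonical_url(rest):
    """Apply .../api/rest_v1/page/html/{title} => .../wiki/{title}."""
    parts = urlsplit(rest)
    if not parts.path.startswith(REST_PREFIX):
        raise ValueError("not a REST API page/html URL")
    # The first segment after the prefix is the (encoded) title;
    # an optional revision id may follow it.
    encoded_title = parts.path[len(REST_PREFIX):].split("/")[0]
    # Assumption: show subpage slashes literally in the canonical URL.
    title = encoded_title.replace("%2F", "/").replace("%2f", "/")
    return f"https://{parts.netloc}/wiki/{title}"


print(canonical_url(
    "https://ceb.wikipedia.org/api/rest_v1/page/html/Quebrada_Fantasma/13659774"
))
# prints https://ceb.wikipedia.org/wiki/Quebrada_Fantasma
```

Dropping the revision segment is what turns a per-version REST URL into the single canonical page URL.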
Brilliant, thanks very much Marko!
Joe
And, if you want to geek out on the semantic web, the rationale here is that there are multiple "documents" for a given article, one per revision. The semantic markup says that all of these documents are a "VersionOf" the canonical URL. --scott
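For readers who would rather read the link from the markup than rewrite URLs, here is a minimal sketch of pulling the `dc:isVersionOf` target out of a RESTBase HTML response using only the Python standard library. The class and function names are invented for this example, and prefixing `https:` to the protocol-relative `href` follows Joe's "(using https:)" remark rather than any documented rule.

```python
from html.parser import HTMLParser


class VersionOfParser(HTMLParser):
    """Remember the href of the first <link rel="dc:isVersionOf">."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link .../> are routed here as well.
        attributes = dict(attrs)
        if (
            self.href is None
            and tag == "link"
            and attributes.get("rel") == "dc:isVersionOf"
        ):
            self.href = attributes.get("href")


def canonical_from_html(html, scheme="https"):
    """Return the canonical URL named by dc:isVersionOf, or None if absent."""
    parser = VersionOfParser()
    parser.feed(html)
    if parser.href is None:
        return None
    href = parser.href.strip()
    # The href in the thread is protocol-relative ("//host/wiki/Title").
    return f"{scheme}:{href}" if href.startswith("//") else href


sample = '<link rel="dc:isVersionOf" href="//ceb.wikipedia.org/wiki/Quebrada_Fantasma"/>'
print(canonical_from_html(sample))
# prints https://ceb.wikipedia.org/wiki/Quebrada_Fantasma
```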
Thanks! As it happens, I have geeked out on exactly this subject for Crossref Event Data (more info https://www.eventdata.crossref.org/guide/sources/wikipedia/). Though in the interests of pragmatism I am giving consideration to switching the representation, using canonical URL as the primary entity with the version as a property. It's beyond the scope of this list, but if you're interested there's more info at the above link.
Joe
Drifting off-topic: what's annoying (to me) is that this naming system totally doesn't work for media files.
The canonical URL for a media file is:
https://commons.wikimedia.org/wiki/File:Douglas_adams_portrait_cropped.jpg
That has the image as well as copyright metadata, etc.
But the versions are:
https://upload.wikimedia.org/wikipedia/commons/c/c0/Douglas_adams_portrait_cropped.jpg
https://upload.wikimedia.org/wikipedia/commons/archive/c/c0/20100416225428%21Douglas_adams_portrait_cropped.jpg
https://upload.wikimedia.org/wikipedia/commons/archive/c/c0/20100416225321%21Douglas_adams_portrait_cropped.jpg
etc
*And it is impossible to get a permalink to the current version*. If you look above, you'll see that the eventual filename of the current version will include the timestamp of the exact time *when the current version is replaced by a newer one*. Without a time machine this is impossible to predict.
So the "canonical URL" is HTML, not the image itself, and the version URLs are not permalinks. Gah.
This makes it hard to write w3c-standard annotations ( https://phabricator.wikimedia.org/T164655 ). --scott
But the latest version of the original image is at:
https://upload.wikimedia.org/wikipedia/commons/c/c6/Hemerocallis_fulva_2016_G1.jpg
And the previous revisions
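The naming pattern Scott describes can be made concrete with a small sketch that splits an upload URL into its file name and, for archived versions, the timestamp prefix. It uses the Douglas Adams file named in the message. The function name is hypothetical, and reading the 14-digit prefix as `YYYYMMDDHHMMSS` is an assumption based on how the example URLs look, not on documentation.

```python
from datetime import datetime
from urllib.parse import unquote, urlsplit


def parse_upload_url(url):
    """Return (file_name, replaced_at) for an upload.wikimedia.org URL.

    replaced_at is None for the current version, whose URL carries no
    timestamp until a newer upload replaces it.
    """
    path = urlsplit(url).path
    # Last path segment, with %21 decoded back to "!".
    name = unquote(path.rsplit("/", 1)[-1])
    if "/archive/" in path and "!" in name:
        stamp, file_name = name.split("!", 1)
        # Assumed timestamp layout: YYYYMMDDHHMMSS.
        return file_name, datetime.strptime(stamp, "%Y%m%d%H%M%S")
    return name, None


current = "https://upload.wikimedia.org/wikipedia/commons/c/c0/Douglas_adams_portrait_cropped.jpg"
archived = (
    "https://upload.wikimedia.org/wikipedia/commons/archive/c/c0/"
    "20100416225428%21Douglas_adams_portrait_cropped.jpg"
)
print(parse_upload_url(current))
print(parse_upload_url(archived))
```

The current version yields no timestamp at all, which is the gap Scott is pointing at: nothing in that URL distinguishes today's bytes from whatever replaces them later.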
Scott, a solution to the stable image reference issue you mention (along with other caching and consistency issues) would be to move to content hash based URLs: https://phabricator.wikimedia.org/T149847
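To illustrate the general idea of content-hash-based URLs (and only the general idea), here is a toy sketch. Everything in it is an assumption for illustration: the `upload.example.org/by-hash` layout, the choice of SHA-256, and the function name are invented here and are not taken from T149847.

```python
import hashlib


def content_hash_url(data, extension, base="https://upload.example.org/by-hash"):
    """Derive a URL from the file's bytes (hypothetical URL layout).

    Because the URL depends only on the content, it is known as soon as the
    file is uploaded and never needs to change when a newer version arrives.
    """
    digest = hashlib.sha256(data).hexdigest()
    return f"{base}/{digest}.{extension}"


version_1 = b"first upload of the image"
version_2 = b"second upload of the image"
print(content_hash_url(version_1, "jpg"))
print(content_hash_url(version_2, "jpg"))
```

Each version gets its own stable address, so a permalink to the current version exists from the moment of upload.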