As part of moving the Mobile Content Service to use Parsoid instead of action=mobileview[1], I've run into several missing features that make it significantly harder for the Mobile Content Service to use Parsoid while providing the same functionality as before[2]:
(1) Parsoid does not follow redirects. Automatic redirects (caused by page moves) will be (somewhat) handled by RB infrastructure[3] soon, but this does not yet cover manual redirects. Even then, it sounds like it would result in a 301, which the service would have to deal with. Manually (community-) added redirects would have to be dealt with differently, by parsing the Parsoid payload for:
<link rel="mw:PageProp/redirect" href="..."/>
While we could overcome the redirect issues by following those redirects I think this should be a Parsoid feature.
(2) <img> doesn't have srcset attributes with higher-res thumbnails. So, image widening in the Android app wouldn't be feasible. While there is talk about Parsoid potentially addressing this in the future I don't know the timeline for this and don't see a good workaround for this from the service side.
(3) No direct access to the spokenWikipedia audio files (from the SpokenWikipedia templates). A more general version of this is: No direct access to transcoded audio / video files. [4] is the task in question. This task is delayed in favor of adding <section> tags.
Theoretically the service could work around this by making another MW API call. I just have not found it and I don't know how this is done for action=mobileview. If anyone has a solution that actually works please let me know. Exposing the spoken version of articles is one of our quarterly goals.[5][6]
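For the manual redirects in item (1), a service-side workaround could look roughly like the sketch below. It only uses the Python standard library and looks for the <link rel="mw:PageProp/redirect"> element quoted above. The class and function names are made up for illustration, and the handling of a leading "./" in the href is an assumption about how Parsoid writes relative links, not something confirmed here.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import unquote


class RedirectLinkFinder(HTMLParser):
    """Remembers the href of the first <link rel="mw:PageProp/redirect">."""

    def __init__(self) -> None:
        super().__init__()
        self.target: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link .../> tags also end up here, because the default
        # handle_startendtag() delegates to handle_starttag().
        if tag != "link" or self.target is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("rel") == "mw:PageProp/redirect":
            self.target = attr_map.get("href")


def find_redirect_target(parsoid_html: str) -> Optional[str]:
    """Return the redirect target title, or None if the page is no redirect."""
    finder = RedirectLinkFinder()
    finder.feed(parsoid_html)
    finder.close()
    if finder.target is None:
        return None
    href = finder.target
    # Assumption: the href is relative ("./Some_Title") and percent-encoded.
    if href.startswith("./"):
        href = href[2:]
    return unquote(href)
```

The service would then have to request the returned title itself, which is exactly the extra hop that following redirects inside Parsoid would avoid.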
I think we should postpone the move to Parsoid until we have at least a solution for issues #2 and #3. Consequently, I've -2'd my patch[2] to move the service to Parsoid.
Thanks, Bernd
[1] https://phabricator.wikimedia.org/T108777
[2] https://gerrit.wikimedia.org/r/#/c/246100/
[3] https://github.com/wikimedia/restbase/pull/365
[4] https://phabricator.wikimedia.org/T64270
[5] https://phabricator.wikimedia.org/tag/mobile-app-goals/
[6] https://phabricator.wikimedia.org/T114525
On 10/15/2015 08:52 PM, Bernd Sitzmann wrote:
As part of moving the Mobile Content Service to use Parsoid instead of action=mobileview[1], I've run into several missing features that make it significantly harder for the Mobile Content Service to use Parsoid while providing the same functionality as before[2]:
(1) Parsoid does not follow redirects. Automatic redirects (caused by page moves) will be (somewhat) handled by RB infrastructure[3] soon, but this does not yet cover manual redirects. Even then, it sounds like it would result in a 301, which the service would have to deal with. Manually (community-) added redirects would have to be dealt with differently, by parsing the Parsoid payload.
<link rel="mw:PageProp/redirect" href="..."/>
While we could overcome the redirect issues by following those redirects I think this should be a Parsoid feature.
Parsoid is a wt <-> html conversion service and the output for a page (even those with redirects) should reflect what is on that page. This is important, for example, if you want to edit the page with the redirect and change it to something else. However, there could be a version where Parsoid follows the redirect internally and generates the HTML when provided with an explicit API flag to do so (there could be a discussion about what should be the default mode, but there would have to be 2 different modes for sure).
In any case, given that Parsoid's HTML clients usually talk through RESTBase rather than with Parsoid directly, this optional API flag would also have to be supported in RESTBase, and could potentially follow the redirects on behalf of the clients.
(2) <img> doesn't have srcset attributes with higher-res thumbnails. So, image widening in the Android app wouldn't be feasible. While there is talk about Parsoid potentially addressing this in the future I don't know the timeline for this and don't see a good workaround for this from the service side.
This is not a huge undertaking, but it hasn't been done because of the reasons in https://phabricator.wikimedia.org/T88827. However, we talked about this a bit at our offsite a couple of days back, and we are leaning towards going ahead with it. Once we have a firm grasp on that reasoning, this shouldn't take more than a week to get done, if that. So, this can be unblocked fairly quickly.
(3) No direct access to the spokenWikipedia audio files (from the SpokenWikipedia templates). A more general version of this is: No direct access to transcoded audio / video files. [4] is the task in question. This task is delayed in favor of adding <section> tags.
Theoretically the service could work around this by making another MW API call. I just have not found it and I don't know how this is done for action=mobileview. If anyone has a solution that actually works please let me know. Exposing the spoken version of articles is one of our quarterly goals.[5][6]
Also see https://phabricator.wikimedia.org/T113066#1667241 and https://phabricator.wikimedia.org/T114072#1704458 for context about the goal setting around these requirements.
It looks like Mobile Apps and Mobile Web have different priority requirements from Parsoid here. Looking at https://en.wikipedia.org/wiki/Wikipedia:Spoken_articles, I also see that there are only 1243 spoken Wikipedia articles (and the recordings probably don't all reflect the latest version of these articles). It also doesn't look like the video player currently works in mobile web or in the mobile apps (except maybe Android?).
Given all the above, I would appreciate some more clarity on why the lack of immediate support for this makes it a -2 blocker. My question should be read as follows: we try to divide our time between maintenance, supporting clients with whatever is blocking them in their use of Parsoid, and forging ahead with other work around templates and wikitext. There are a lot of competing priorities / requests (and from multiple teams) on the parsing team's limited engineering resources. Adding multimedia support is a good thing and it helps everyone, and so we will undertake that soon anyway. But, a -2 blocker from mobile apps seems like a very strong signal. So, a little bit more interpretation and context around that will help us with our prioritization.
Thanks, Subbu.
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hi Subbu,
Parsoid is a wt <-> html conversion service and the output for a page (even those with redirects) should reflect what is on that page. This is important, for example, if you want to edit the page with the redirect and change it to something else. However, there could be a version where Parsoid follows the redirect internally and generates the HTML when provided with an explicit API flag to do so (there could be a discussion about what should be the default mode, but there would have to be 2 different modes for sure).
Yes, 2 different modes or endpoints would be fine with me. I would argue that following redirects should have been the default behavior, but I concede that this is harder to do now without breaking your current users.
In any case, given that Parsoid's HTML clients usually talk through RESTBase rather than with Parsoid directly, this optional API flag would also have to be supported in RESTBase, and could potentially follow the redirects on behalf of the clients.
I'm not sure why RESTBase would have to support this. I think there should also be some indication in Parsoid's output that a redirect happened and from where, like it's done on the website.
(2) <img> srcset:
This is not a huge undertaking, but the reason it hasn't been done is because of reasons in https://phabricator.wikimedia.org/T88827 .. but, we talked about this a bit at our offsite couple days back, and we are leaning towards going ahead with it. Once we have a firm grasp on that reasoning, this shouldn't take more than a week to get this all done, if that. So, this can be unblocked fairly quickly.
This is great. This is my main concern right now since it would be the hardest to work around.
(3) No direct access to the spokenWikipedia audio files [...]
It looks like Mobile Apps and Mobile Web have different priority requirements from Parsoid here. Looking at https://en.wikipedia.org/wiki/Wikipedia:Spoken_articles, I also see that there are only 1243 spoken wikipedia articles (that are probably not all the latest version of these articles). It also doesn't look like the video player works currently in mobile web or in mobile apps (except maybe Android ?).
Yes, the web team is at a different stage of using RESTBase and/or Parsoid than the Android app team. The Android app team wants to use RESTBase (ideally in combination with Parsoid) for the Android beta app in a couple of weeks. The web team is in more experimental stages during Q2. Yes, the Android app plays videos, too (from the Gallery). Since the spokenWikipedia usage is not so high I'd be willing to make another MW API call to get what we need (even though that would be a step back). It would be great if anyone had a better solution to get the URLs to the audio files directly. I don't want to have to load all sections of mobileview just to get a little piece of information out of that.
But, a -2 blocker from mobile apps seems like a very strong signal.
The -2 is for my own patch, just indicating that this patch is not ready to be merged. Once the blockers are resolved (either by Parsoid or by a workaround), this -2 will be removed, of course. It's just a temporary thing. We're still hopeful to be using Parsoid in the future.
From my perspective, the top priority of things to fix in Parsoid out of those three would be the missing srcset attributes.
Thanks, Bernd
On Fri, Oct 16, 2015 at 11:14 AM, Bernd Sitzmann bernd@wikimedia.org wrote:
It looks like Mobile Apps and Mobile Web have different priority requirements from Parsoid here. Looking at https://en.wikipedia.org/wiki/Wikipedia:Spoken_articles, I also see that there are only 1243 spoken wikipedia articles (that are probably not all the latest version of these articles). It also doesn't look like the video player works currently in mobile web or in mobile apps (except maybe Android ?).
With due respect for the hard work people have put in on that project, is there any indication Spoken Articles has any traction and will grow beyond those ~1K articles? Wouldn't using Android's TTS API to read the most up-to-date version of the article be a much better user experience (35M articles, always up-to-date, instead of 1K articles, almost always out of date)?
Luis
We can indeed fall back to TTS if the spoken article is not available, or offer a choice between TTS and the spoken version. The intention was for this to be a quick win of surfacing a useful, if lesser-known, facet of Wikipedia content.
That being said, this doesn't necessarily need to be a blocker for transitioning the Content Service to Parsoid. If all else fails, we can ascertain the audio URL on the client side based on the File page name. As for transcodings of video files, we already make a separate API call to retrieve them, so perhaps we can continue to do that until we're able to get them directly from Parsoid? It sounds like a more pressing issue right now is the srcset attributes...
--
Dmitry Brant
Mobile Apps Team (Android)
Wikimedia Foundation
https://www.mediawiki.org/wiki/Wikimedia_mobile_engineering
I've mentioned this idea before, but having a service which allowed you to reliably get image thumbs for a given file at a specified width/height would obviate the srcset. And prevent cache fragmentation on img resolutions.
--
EN Wikipedia user page: https://en.wikipedia.org/wiki/User:Brian.gerstle
IRC: bgerstle
On Fri, Oct 16, 2015 at 2:50 PM, Brian Gerstle bgerstle@wikimedia.org wrote:
I've mentioned this idea before, but having a service which allowed you to reliably get image thumbs for a given file at a specified width/height would obviate the srcset.
Our thumbs are already created on demand, based on the image width specified in the URL. Example for a 40px wide thumb:
https://upload.wikimedia.org/wikipedia/commons/thumb/d/d9/Collage_of_Nine_Do...
The corresponding Parsoid HTML contains the original height & width in data attributes:
<img resource="./File:Collage_of_Nine_Dogs.jpg" src="//upload.wikimedia.org/wikipedia/commons/thumb/d/d9/Collage_of_Nine_Dogs.jpg/260px-Collage_of_Nine_Dogs.jpg" data-file-width="1665" data-file-height="1463" data-file-type="bitmap" height="228" width="260">
Based on this information, it shouldn't be too hard to calculate 1.5x / 2x resolution thumb urls with a combination of multiplication & rounding.
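As a rough illustration of that multiplication and rounding, here is a small Python sketch. It assumes the thumb URL ends in a "/<width>px-<name>" segment like the example above, and that a thumb wider than the original file should simply be skipped; both are assumptions made for this sketch. The function names are invented, and it only covers the simple case shown here, since thumb URLs can look different for very long file names and for other file types.

```python
import re
from typing import Optional, Sequence

# Matches the "/260px-" part of the last path segment of a thumb URL.
_THUMB_WIDTH = re.compile(r"/(\d+)px-(?=[^/]*$)")


def scaled_thumb_url(src: str, width: int, file_width: int,
                     factor: float) -> Optional[str]:
    """Return a thumb URL scaled by `factor`, or None if it cannot be built."""
    target = int(width * factor + 0.5)  # round half up to whole pixels
    if target > file_width:
        # Assumption: no thumb wider than the original file is requested.
        return None
    new_src, count = _THUMB_WIDTH.subn(f"/{target}px-", src, count=1)
    return new_src if count == 1 else None


def build_srcset(src: str, width: int, file_width: int,
                 factors: Sequence[float] = (1.5, 2.0)) -> str:
    """Build a srcset value from the fixed set of resolution factors."""
    entries = []
    for factor in factors:
        url = scaled_thumb_url(src, width, file_width, factor)
        if url is not None:
            entries.append(f"{url} {factor:g}x")
    return ", ".join(entries)
```

For the <img> above, the inputs would be the src attribute, width="260" and data-file-width="1665". Restricting the factors to a small fixed set keeps the number of distinct thumb sizes per image limited.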
And prevent cache fragmentation on img resolutions.
Isn't the srcset using a limited set of resolution factors?
--
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
On Fri, Oct 16, 2015 at 5:59 PM, Gabriel Wicke gwicke@wikimedia.org wrote:
On Fri, Oct 16, 2015 at 2:50 PM, Brian Gerstle bgerstle@wikimedia.org wrote:
I've mentioned this idea before, but having a service which allowed you to reliably get image thumbs for a given file at a specified width/height would obviate the srcset.
Our thumbs are already created on demand, based on the image width specified in the URL. Example for a 40px wide thumb:
https://upload.wikimedia.org/wikipedia/commons/thumb/d/d9/Collage_of_Nine_Do...
It's been mentioned elsewhere (I believe by Gergo) that these URLs aren't stable, and can't be reliably constructed by clients. Is that still the case?
--
EN Wikipedia user page: https://en.wikipedia.org/wiki/User:Brian.gerstle
IRC: bgerstle
On 2015-10-19 17:28, Brian Gerstle wrote:
It's been mentioned elsewhere (I believe by Gergo) that these URLs aren't stable, and can't be reliably constructed by clients. Is that still the case?
The URL looks different for very long file names and for different file types. There is an API endpoint to get it, though: https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&prop=image...
Right, but the point is to avoid an extra round trip just to get a URL.
On 2015-10-19 22:56, Brian Gerstle wrote:
Right, but the point is to avoid an extra round trip just to get a URL.
More of an extra roundtrip to get all the URLs on the page, since you can get results for up to 50 titles at once, but I see your point.
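To make the batching concrete, here is a minimal Python sketch that splits the file titles found on a page into groups of at most 50 and builds one action=query request string per group. The helper names are hypothetical, and the prop and related parameters are deliberately left to the caller via extra_params, because the exact query from the ApiSandbox link above is not reproduced here.

```python
from typing import Dict, Iterable, Iterator, List, Optional
from urllib.parse import urlencode

MAX_TITLES_PER_REQUEST = 50  # the per-request limit mentioned above


def batch_titles(titles: Iterable[str],
                 size: int = MAX_TITLES_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` titles."""
    batch: List[str] = []
    for title in titles:
        batch.append(title)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_query(batch: List[str],
                extra_params: Optional[Dict[str, str]] = None) -> str:
    """Build the query string for one batch; multiple titles are joined by '|'."""
    params = {"action": "query", "format": "json", "titles": "|".join(batch)}
    if extra_params:
        # Caller-supplied parameters (for example the prop to fetch).
        params.update(extra_params)
    return urlencode(params)
```

A page with 120 images would then need three requests instead of 120, though it is still at least one extra round trip compared to having the URLs in the Parsoid HTML.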
On 10/16/2015 01:14 PM, Bernd Sitzmann wrote:
In any case, given that Parsoid's HTML clients usually talk through RESTBase rather than with Parsoid directly, this optional API flag would also have to be supported in RESTBase, and could potentially follow the redirects on behalf of the clients.
I'm not sure why RESTBase would have to support this. I think there should be also some indication in Parsoid's output to indicate that a redirect happened and from where, like it's done on the website.
No client talks to Parsoid directly anymore (except temporarily while resolving some issue maybe). So, any API flags that need to be passed through to Parsoid should be exposed in the RESTBase API as well.
Given that, I observed that RESTBase could potentially follow the redirect instead of passing it to Parsoid to follow the redirect (and eliminate the extra hop). But, yes, it does make sense to do this in Parsoid directly behind an API flag or a different endpoint.
Subbu.