Awesome. Dmitry, I shall make you explain "How I wrote a new RESTBase service" :)
Will this replace or subsume http://www.mediawiki.org/wiki/Extension:TextExtracts ? Will clients be able to request first paragraph, first 3 sentences, etc.?
In the case of entities existing within a paragraph, we can decide (with a little help from Design) whether it's important to keep them inline with the text, or strip and move them outside of the paragraph.
Will clients be able to request different kinds of stripping? It seems really hard. If you look at the Vincent van Gogh article, its opening sentence is:
*Vincent Willem van Gogh* (Dutch: [ˈvɪnsɛnt ˈʋɪləm vɑn ˈɣɔx] https://en.wikipedia.org/wiki/Help:IPA_for_Dutch ( https://en.wikipedia.org/wiki/File:Nl-Vincent_van_Gogh.ogg listen https://upload.wikimedia.org/wikipedia/commons/3/32/Nl-Vincent_van_Gogh.ogg );[note 1] https://en.wikipedia.org/wiki/Vincent_van_Gogh#cite_note-1 30 March 1853 – 29 July 1890) was a major Post-Impressionist https://en.wikipedia.org/wiki/Post-Impressionism painter.
and it seems every client shows this differently:
The Google search results snippet displays:
Vincent Willem van Gogh (Dutch: [ˈvɪnsɛnt ˈʋɪləm vɑn ˈɣɔx] ( listen); 30 March 1853 – 29 July 1890) was a major ...
Clearly " (listen)" shouldn't be there. Meanwhile, the Wikipedia Beta Android app and Google's Knowledge Graph box remove everything in parentheses (how?) and show two sentences:
Vincent Willem van Gogh was a major Post-Impressionist painter. He was a Dutch artist whose work had a far-reaching influence on 20th-century art. Wikipedia http://en.wikipedia.org/wiki/Vincent_van_Gogh
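One plausible answer to the "(how?)" is a depth-counting pass that drops every parenthesized span, nested ones included. This is only a guess at the technique, not how the Android app or the Knowledge Graph actually do it, and `strip_parentheticals` is a hypothetical name.

```python
import re


def strip_parentheticals(text: str) -> str:
    """Drop every (...) span, including nested ones, from plain text.

    Hypothetical helper: a guess at how a client could remove the
    pronunciation and life dates from a lead sentence.
    """
    kept = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            # Only characters outside all parentheses survive;
            # an unmatched ")" at depth 0 is kept as ordinary text.
            kept.append(ch)
    # Collapse the double spaces left behind, then pull stray spaces
    # back against following punctuation ("a , b" -> "a, b").
    collapsed = " ".join("".join(kept).split())
    return re.sub(r"\s+([.,;:!?])", r"\1", collapsed)


lead = ("Vincent Willem van Gogh (Dutch: [ˈvɪnsɛnt ˈʋɪləm vɑn ˈɣɔx] ( listen); "
        "30 March 1853 – 29 July 1890) was a major Post-Impressionist painter.")
print(strip_parentheticals(lead))
# prints: Vincent Willem van Gogh was a major Post-Impressionist painter.
```

Note that this blindly removes parentheticals that carry meaning too, which is part of why there is no single correct snippet.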
But the Wikipedia Beta Android app's "Share as image" gives me the result in the attached image Vincent_van_Gogh.jpg (I filed https://phabricator.wikimedia.org/T102208 ).
It looks like the mobile view service http://appservice.wmflabs.org/en.wikipedia.org/v1/mobile/app/page/lite/Vince... also renders the full HTML, including the "listen" speaker icon.
There's no single correct form for this snippet, it can't be decomposed into separate bits of JSON, and the pronunciation isn't cleanly nested in HTML for clients to easily remove the right parts of it. The mobile view service could have an ill-defined "Do the right thing" API, or implement a lot of named transform styles, or have some kind of domain-specific language 8-), or always return structured Parsoid HTML that clients strip, or ??
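To make the "named transform styles" option concrete, here is a sketch of a server-side registry where the client picks a style by name and, like TextExtracts' sentence limit, optionally a number of sentences. The style names, `render_snippet`, and the naive sentence splitter are all hypothetical; the splitter also shows how quickly this gets hard, since it would cut "Dr. Gachet" in two.

```python
import re
from typing import Callable, Dict, Optional


def _no_parens(text: str) -> str:
    # Remove (...) spans, nested ones included, then tidy whitespace.
    out, depth = [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return re.sub(r"\s+([.,;:!?])", r"\1", " ".join("".join(out).split()))


def _first_sentences(text: str, n: int) -> str:
    # Naive splitter: breaks after ., ! or ? followed by whitespace.
    # It will mis-split abbreviations such as "Dr. Gachet".
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:n])


# Hypothetical registry of named styles a client could ask for by name.
STYLES: Dict[str, Callable[[str], str]] = {
    "full": lambda text: text,
    "no-parens": _no_parens,
}


def render_snippet(text: str, style: str = "full",
                   sentences: Optional[int] = None) -> str:
    """Apply a named style, then optionally truncate to N sentences."""
    try:
        transform = STYLES[style]
    except KeyError:
        raise ValueError(f"unknown style {style!r}; known: {sorted(STYLES)}") from None
    result = transform(text)
    if sentences is not None:
        result = _first_sentences(result, sentences)
    return result
```

Every new client need would mean another entry in STYLES, which is the maintenance cost of this option.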
Cheers,
- - - - originals to end - - - - -
On Jun 11, 2015 10:05 AM, "Dmitry Brant" dbrant@wikimedia.org wrote:
Yes, we should definitely do both, keeping in mind that the JSON-only service will be much more important for apps in the long run. The part that worries me a little bit is not knowing when exactly these services can be deployed to production at full scale. Since so many of our brainstorming ideas for Q1 and beyond are dependent on these services, we should have a concrete time frame for this.
The JSON service basically already exists[1] (in its infancy), and experimenting with changes to the output JSON structure is absurdly easy. I would suggest that we take an inventory[2] of all the non-text entities that one might find in articles (infoboxes, tables, references, images, math formulas, etc), and update the service to structure them as JSON. Then we'll be free to decide how we want to present these entities natively in the apps.
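As a strawman for what that inventory might turn into, here is a sketch of typed entities grouped into sections and serialized to JSON. The class and field names are placeholders, not the service's current output; the real list would come from the etherpad inventory [2].

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, List, Optional


@dataclass
class Entity:
    # Hypothetical type names: "paragraph", "image", "table", "math", ...
    type: str
    text: Optional[str] = None
    src: Optional[str] = None
    caption: Optional[str] = None


@dataclass
class Section:
    title: str
    items: List[Entity] = field(default_factory=list)


@dataclass
class Article:
    title: str
    sections: List[Section] = field(default_factory=list)


def _prune(value: Any) -> Any:
    """Recursively drop None fields so each entity only carries what it has."""
    if isinstance(value, dict):
        return {k: _prune(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [_prune(v) for v in value]
    return value


def to_json(article: Article) -> str:
    return json.dumps(_prune(asdict(article)), ensure_ascii=False, indent=2)


# Placeholder content; "example.jpg" is not a real file.
article = Article(
    title="Vincent van Gogh",
    sections=[Section(title="", items=[
        Entity("paragraph", text="Vincent Willem van Gogh was a major Post-Impressionist painter."),
        Entity("image", src="example.jpg", caption="Example caption"),
    ])],
)
print(to_json(article))
```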
In the case of entities existing within a paragraph, we can decide (with a little help from Design) whether it's important to keep them inline with the text, or strip and move them outside of the paragraph. For entities that are important to keep inline, we can still strip them out and restructure them as JSON, but also replace the inline occurrence with a syntax marker that the apps will recognize, and decide how to handle natively.
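A sketch of the inline-marker idea: elements with a given class are pulled out of the paragraph into a list, and a marker is left at their position for the app to resolve natively. The class name "reference" and the "{entity:N}" marker syntax are assumptions for illustration, not an agreed format.

```python
from html.parser import HTMLParser
from typing import Dict, Iterable, List

# Elements that never have a closing tag, so they must not affect depth.
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class InlineEntityExtractor(HTMLParser):
    """Pull inline entities out of one paragraph of HTML.

    Any element whose class list contains one of `entity_classes` is
    removed from the text, stored in `entities`, and replaced by a
    hypothetical marker such as "{entity:0}". All other markup is
    flattened to plain text in this sketch.
    """

    def __init__(self, entity_classes: Iterable[str]):
        super().__init__()
        self.entity_classes = set(entity_classes)
        self.parts: List[str] = []
        self.entities: List[Dict[str, str]] = []
        self._current = None  # entity being captured, or None
        self._depth = 0       # open elements inside the current entity

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            if tag not in VOID_TAGS:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        matched = sorted(self.entity_classes.intersection(classes))
        if matched and tag not in VOID_TAGS:
            self._current = {"type": matched[0], "text": ""}
            self._depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags (<br/>) carry no text; ignore them so the
        # default starttag+endtag pair cannot unbalance the depth count.
        pass

    def handle_endtag(self, tag):
        if self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self.parts.append("{entity:%d}" % len(self.entities))
            self.entities.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data
        else:
            self.parts.append(data)


def extract_inline_entities(html: str, entity_classes: Iterable[str]) -> dict:
    parser = InlineEntityExtractor(entity_classes)
    parser.feed(html)
    parser.close()
    return {"text": "".join(parser.parts), "entities": parser.entities}


result = extract_inline_entities(
    'Van Gogh was a painter.<sup class="reference"><a href="#cite_note-1">[1]</a></sup> '
    'He was <b>Dutch</b>.',
    {"reference"},
)
print(result)
```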
Whether to preserve HTML formatting might also be a question for design / UX research. However, at least on Android, the native TextView does support some limited HTML tags, and we can do additional formatting with Spans if necessary.
[1] http://appservice.wmflabs.org/en.wikipedia.org/v1/mobile/app/page/lite/Womba... [2] https://etherpad.wikimedia.org/p/json-content-service-structure
On Thu, Jun 11, 2015 at 11:28 AM, Corey Floyd cfloyd@wikimedia.org wrote:
Mostly apps have been talking about this, but I think it would be good to get web folks involved as well.
We have a lot of ideas, and this is at the top of the list of things we need in order to accomplish potential goals for the quarter. It also seems there are at least two ideas for how we should do this:
1. Deploy a service backed by Parsoid that delivers better-marked-up HTML than MobileView.
2. Deploy a service that converts HTML to JSON and delivers that instead.
My suggestion would be to do both… deliver better HTML first (this should be "easier"), then build another service upon that to serve JSON.
The JSON spec for an article needs some discussion, but I think it will be pretty easy to settle on.
A couple of questions we will have to solve:
- Should text still be marked up in HTML? If not, what about formatting loss?
- Entities like images that are between paragraphs are easy to handle, but what about entities within paragraphs?
Any other thoughts here?
On Thu, Jun 11, 2015 at 6:14 PM, S Page spage@wikimedia.org wrote:
In the case of entities existing within a paragraph, we can decide (with a little help from Design) whether it's important to keep them inline with the text, or strip and move them outside of the paragraph.
Will clients be able to request different kinds of stripping? It seems really hard.
It's even harder if you want it to work outside the English Wikipedia - different wikis might use different templates which generate slightly different HTML.
The skinnable content snippets proposal from the brainstorming could be abused to handle this: ** Explore the possibility of wikitext markup for handing over control of sections of content to the skin (like https://phabricator.wikimedia.org/T25796 but bigger / more generic; Winter had some great design ideas that could be implemented with such a feature)
Editors would write something like this in the source code: '''Vincent Willem van Gogh''' {{#snippet|role=pronunciation|IPA=ˈvɪnsɛnt ˈʋɪləm vɑn ˈɣɔx}}{{#snippet|role=birth and death|birth=30 March 1853|death=29 July 1890}} was a major [[Post-Impressionist]] painter. and the snippets would turn into different templates on different devices/skins, in some cases not producing wikicode output at all but instead pushing the information to a side channel.
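A toy model of the per-skin dispatch this describes: each snippet carries a role, a skin either has a renderer for that role or the data goes to a side channel. `#snippet` is only a proposal and real rendering would happen in the parser, not in Python; `Snippet`, `RENDERERS` and `render` are hypothetical names used to show the idea.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Snippet:
    role: str
    params: Dict[str, str]


# Hypothetical per-skin renderers. A (skin, role) pair with no renderer
# means "don't render inline; push the data to the side channel instead".
RENDERERS: Dict[Tuple[str, str], Callable[[Dict[str, str]], str]] = {
    ("desktop", "pronunciation"): lambda p: " ([%s])" % p["IPA"],
    ("desktop", "birth and death"): lambda p: " (%s – %s)" % (p["birth"], p["death"]),
}


def render(parts: List[Union[str, Snippet]],
           skin: str) -> Tuple[str, Dict[str, Dict[str, str]]]:
    """Return (inline text, side-channel data) for one skin."""
    inline: List[str] = []
    side_channel: Dict[str, Dict[str, str]] = {}
    for part in parts:
        if isinstance(part, str):
            inline.append(part)
            continue
        renderer = RENDERERS.get((skin, part.role))
        if renderer is None:
            side_channel[part.role] = part.params
        else:
            inline.append(renderer(part.params))
    return "".join(inline), side_channel


lead = [
    "Vincent Willem van Gogh",
    Snippet("pronunciation", {"IPA": "ˈvɪnsɛnt ˈʋɪləm vɑn ˈɣɔx"}),
    Snippet("birth and death", {"birth": "30 March 1853", "death": "29 July 1890"}),
    " was a major Post-Impressionist painter.",
]
print(render(lead, "app"))
print(render(lead, "desktop"))
```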
You both are pointing out the real challenge here: The inline content.
The good news is that we don't have to worry about the wikitext or templates. Parsoid has already dealt with that and provides additional markup, which makes two-way HTML<->wikitext conversion possible for VE. The goal is to build upon that work to do the same thing with "ParsoidHTML"->JSON.
The hope is that if the markup provided by Parsoid is enough for VE to do the two-way conversion, there should be enough information for us to do at least a one-way conversion into JSON without losing data.
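As an example of the information Parsoid exposes: template output is marked with typeof="mw:Transclusion", and its data-mw attribute records which template produced it. The sketch below lists those template names, assuming the data-mw layout from Parsoid's MediaWiki DOM spec ("parts" holding {"template": {"target": {"wt": ...}}}); the sample markup is hand-written for illustration. It tells you which template was used, not what that template means on a given wiki.

```python
import html
import json
from html.parser import HTMLParser
from typing import List


class TransclusionFinder(HTMLParser):
    """Collect names of templates that Parsoid marks as transclusions."""

    def __init__(self):
        super().__init__()
        self.templates: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        types = (attributes.get("typeof") or "").split()
        if "mw:Transclusion" not in types or not attributes.get("data-mw"):
            return
        data_mw = json.loads(attributes["data-mw"])
        for part in data_mw.get("parts", []):
            # Parts are either plain wikitext strings or {"template": {...}}.
            if isinstance(part, dict) and "template" in part:
                self.templates.append(part["template"]["target"]["wt"].strip())

    # Treat <tag/> the same as <tag> for our purposes.
    handle_startendtag = handle_starttag


def find_templates(parsoid_html: str) -> List[str]:
    finder = TransclusionFinder()
    finder.feed(parsoid_html)
    finder.close()
    return finder.templates


# Hand-written sample; attribute values are HTML-escaped as in real output.
data_mw = {"parts": [{"template": {"target": {"wt": "IPA-nl", "href": "./Template:IPA-nl"},
                                   "params": {}, "i": 0}}]}
sample = ('<p><b>Vincent Willem van Gogh</b> <span typeof="mw:Transclusion" about="#mwt1" '
          f'data-mw="{html.escape(json.dumps(data_mw))}">(Dutch: [ˈvɪnsɛnt ˈʋɪləm vɑn ˈɣɔx])</span>'
          ' was a painter.</p>')
print(find_templates(sample))
```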
Beyond using the markup to translate the HTML into JSON, we also have to figure out how to actually represent the result in JSON, especially for inline entities. We also have open questions about whether we should still use a subset of HTML to describe text styles (bold, italics, etc.). Defining the article JSON spec may be just as challenging as the actual engineering work.
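One alternative to an HTML subset is plain text plus style spans given as offsets, which maps fairly directly onto Android Spans. A sketch, with the tag-to-style mapping and field names as assumptions; a real spec would also need to say which unit offsets count, since Python counts code points while Java and Android index strings by UTF-16 code units.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple

# Hypothetical mapping from inline HTML tags to style names in the JSON.
STYLE_TAGS = {"b": "bold", "strong": "bold", "i": "italic", "em": "italic"}


class StyleRunParser(HTMLParser):
    """Flatten inline HTML into plain text plus (style, start, end) spans."""

    def __init__(self):
        super().__init__()
        self.text = ""
        self.spans: List[Dict[str, object]] = []
        self._open: List[Tuple[str, int]] = []  # (tag, start offset)

    def handle_starttag(self, tag, attrs):
        if tag in STYLE_TAGS:
            self._open.append((tag, len(self.text)))

    def handle_endtag(self, tag):
        if tag not in STYLE_TAGS:
            return
        # Close the most recently opened tag of the same name.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                _, start = self._open.pop(i)
                self.spans.append({"style": STYLE_TAGS[tag],
                                   "start": start, "end": len(self.text)})
                break

    def handle_data(self, data):
        self.text += data


def to_styled_text(fragment: str) -> Dict[str, object]:
    parser = StyleRunParser()
    parser.feed(fragment)
    parser.close()
    return {"text": parser.text, "spans": parser.spans}


print(to_styled_text("<b>Vincent Willem van Gogh</b> was a <i>major</i> painter."))
```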
On Thu, Jun 11, 2015 at 9:51 PM, Gergo Tisza gtisza@wikimedia.org wrote:
On Thu, Jun 11, 2015 at 6:14 PM, S Page spage@wikimedia.org wrote:
In the case of entities existing within a paragraph, we can decide (with a little help from Design) whether it's important to keep them inline with the text, or strip and move them outside of the paragraph.
Will clients be able to request different kinds of stripping? It seems really hard.
It's even harder if you want it to work outside the English Wikipedia - different wikis might use different templates which generate slightly different HTML.
The skinnable content snippets proposal from the brainstorming could be abused to handle this: ** Explore the possibility of wikitext markup for handing over control of sections of content to the skin (like https://phabricator.wikimedia.org/T25796 but bigger / more generic; Winter had some great design ideas that could be implemented with such a feature)
Editors would write something like this in the source code: '''Vincent Willem van Gogh''' {{#snippet|role=pronunciation|IPA=ˈvɪnsɛnt ˈʋɪləm vɑn ˈɣɔx}}{{#snippet|role=birth and death|birth=30 March 1853|death=29 July 1890}} was a major [[Post-Impressionist]] painter. and the snippets would turn into different templates on different devices/skins, in some cases not producing wikicode output at all but instead pushing the information to a side channel.
Mobile-l mailing list Mobile-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l
On Thu, Jun 11, 2015 at 7:52 PM, Corey Floyd cfloyd@wikimedia.org wrote:
The good news is that we don't have to worry about the wikitext or templates. Parsoid has already dealt with that and provides additional markup, which makes two-way HTML<->wikitext conversion possible for VE. The goal is to build upon that work to do the same thing with "ParsoidHTML"->JSON.
All Parsoid does is expose the wikitext structure through the HTML. It does not magically make the templates used across seven projects and nearly 300 languages (or even different sections of the same wiki) uniform.
Gergo Tisza, 12/06/2015 03:51:
Editors would write something like this in the source code: '''Vincent Willem van Gogh''' {{#snippet|role=pronunciation|IPA=ˈvɪnsɛnt ˈʋɪləm vɑn ˈɣɔx}}{{#snippet|role=birth and death|birth=30 March 1853|death=29 July 1890}} was a major [[Post-Impressionist]] painter.
Really? Why not use the traditional way, i.e. a CSS class that can be applied to existing templates like noprint, nomobile and friends?
Nemo
On Fri, Jun 12, 2015 at 5:27 AM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Gergo Tisza, 12/06/2015 03:51:
Editors would write something like this in the source code: '''Vincent Willem van Gogh''' {{#snippet|role=pronunciation|IPA=ˈvɪnsɛnt ˈʋɪləm vɑn ˈɣɔx}}{{#snippet|role=birth and death|birth=30 March 1853|death=29 July 1890}} was a major [[Post-Impressionist]] painter.
Really? Why not use the traditional way, i.e. a CSS class that can be applied to existing templates like noprint, nomobile and friends?
Transformations should ideally be done on the server, not on the client. The latter means impairing performance by pushing unneeded content to the client, having the client do complex DOM transformations, and storing the necessary code on the client (which also means writing it in as many languages as the types of clients we have).
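The CSS-class route and server-side transformation need not be exclusive: the service could honor a class such as nomobile by deleting those elements before the HTML leaves the server, so clients never download them. A sketch assuming well-formed HTML; the class names are the ones mentioned above, while `ClassStripper` and `strip_classes` are hypothetical. Comments and doctypes are silently dropped.

```python
from html.parser import HTMLParser
from typing import Iterable

VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassStripper(HTMLParser):
    """Re-emit HTML, dropping any element that carries a listed class."""

    def __init__(self, drop_classes: Iterable[str]):
        # Keep entity references raw so they can be written back unchanged.
        super().__init__(convert_charrefs=False)
        self.drop = set(drop_classes)
        self.out = []
        self._skip_depth = 0  # > 0 while inside a dropped element

    def _has_drop_class(self, attrs) -> bool:
        classes = (dict(attrs).get("class") or "").split()
        return bool(self.drop.intersection(classes))

    def _emit(self, s: str) -> None:
        if self._skip_depth == 0:
            self.out.append(s)

    def handle_starttag(self, tag, attrs):
        if self._skip_depth:
            if tag not in VOID_TAGS:
                self._skip_depth += 1
            return
        if self._has_drop_class(attrs):
            if tag not in VOID_TAGS:
                self._skip_depth = 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if self._skip_depth == 0 and not self._has_drop_class(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._skip_depth:
            self._skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")


def strip_classes(fragment: str, drop_classes: Iterable[str]) -> str:
    stripper = ClassStripper(drop_classes)
    stripper.feed(fragment)
    stripper.close()
    return "".join(stripper.out)


print(strip_classes('<p>Van Gogh <span class="nomobile">(<a href="x">listen</a>)</span>'
                    ' was a painter &amp; artist.</p>', {"nomobile"}))
```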
Yes, moving transformations that the client currently does to the server is one of the approaches the service is taking. It already does some, and hopefully more soon.
I've started documenting the service at https://www.mediawiki.org/wiki/RESTBase_services_for_apps.
And now for another service announcement: following this discussion, during our prioritization meeting earlier we discussed the missing role of a tech-product guy. (Was it tech-pro or pro-tech? I don't remember.) In any case, it'll be more technical than a traditional PO role, or even the aforementioned tech-pro/pro-tech. To keep a long story from getting longer, I've volunteered to play that role for the near-term future.
Side note: for the Android team, this also means that I'll have less time for coding on the Android app, and will split my time mostly between the Mobile Apps Node.js service lead and Android tech lead roles.
I'm going to reach out more in the upcoming week(s) to figure out a few things:
* Collect near-term requirements and add Phabricator tasks.
* Prioritize them.
* Reach out to the web team to learn what that team's plan is for their Node.js service, and see if/how much we can share between apps and web.
-Bernd
On Fri, Jun 12, 2015 at 2:13 PM, Gergo Tisza gtisza@wikimedia.org wrote:
On Fri, Jun 12, 2015 at 5:27 AM, Federico Leva (Nemo) nemowiki@gmail.com wrote:
Gergo Tisza, 12/06/2015 03:51:
Editors would write something like this in the source code: '''Vincent Willem van Gogh''' {{#snippet|role=pronunciation|IPA=ˈvɪnsɛnt ˈʋɪləm vɑn ˈɣɔx}}{{#snippet|role=birth and death|birth=30 March 1853|death=29 July 1890}} was a major [[Post-Impressionist]] painter.
Really? Why not use the traditional way, i.e. a CSS class that can be applied to existing templates like noprint, nomobile and friends?
Transformations should ideally be done on the server, not on the client. The latter means impairing performance by pushing unneeded content to the client, having the client do complex DOM transformations, and storing the necessary code on the client (which also means writing it in as many languages as the types of clients we have).