Awesome. Dmitry, I shall make you explain "How I wrote a new RESTBase
service" :)
Will this replace or subsume
http://www.mediawiki.org/wiki/Extension:TextExtracts ? Will clients be able
to request first paragraph, first 3 sentences, etc.?
> In the case of entities existing within a paragraph, we can decide (with a
> little help from Design) whether it's important to keep them inline with
> the text, or strip and move them outside of the paragraph.
Will clients be able to request different kinds of stripping? It seems
really hard. If you look at the Vincent van Gogh article, its opening
sentence is
*Vincent Willem van Gogh* (Dutch: [ˈvɪnsɛnt ˈʋɪləm vɑn ˈɣɔx]
<https://en.wikipedia.org/wiki/Help:IPA_for_Dutch> (
<https://en.wikipedia.org/wiki/File:Nl-Vincent_van_Gogh.ogg> listen
<https://upload.wikimedia.org/wikipedia/commons/3/32/Nl-Vincent_van_Gogh.ogg>
);[note 1] <https://en.wikipedia.org/wiki/Vincent_van_Gogh#cite_note-1> 30
March 1853 – 29 July 1890) was a major Post-Impressionist
<https://en.wikipedia.org/wiki/Post-Impressionism> painter.
and it seems every client shows this differently:
Google search results snippet displays
Vincent Willem van Gogh (Dutch: [ˈvɪnsɛnt ˈʋɪləm vɑn ˈɣɔx] ( listen); 30
March 1853 – 29 July 1890) was a major ...
Clearly " (listen)" shouldn't be there. Meanwhile, the Wikipedia Beta Android
app and Google's Knowledge Graph box remove everything in parentheses
(how?) and show two sentences:
Vincent Willem van Gogh was a major Post-Impressionist painter. He was a
Dutch artist whose work had a far-reaching influence on 20th-century art.
Wikipedia <http://en.wikipedia.org/wiki/Vincent_van_Gogh>
But the Wikipedia Beta Android app's Share as image gives me:
[image: Displaying Vincent_van_Gogh.jpg]
(I filed https://phabricator.wikimedia.org/T102208 ).
It looks like the mobile view service
http://appservice.wmflabs.org/en.wikipedia.org/v1/mobile/app/page/lite/Vinc…
also renders the full HTML, including the "listen" speaker icon.
There's no single correct form for this snippet. It can't be decomposed
into separate bits of JSON, and the pronunciation isn't cleanly nested in
HTML for clients to easily remove the right parts of it. The mobile view
service could have an ill-defined "Do the right thing" API, or implement a
lot of named transform styles, or have some kind of domain-specific
language 8-), or always return structured Parsoid HTML that clients strip,
or ??
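
To make the "remove everything in parentheses" behaviour concrete, here is a
minimal Python sketch. It assumes the input is already plain text rather than
HTML, and it is only a guess at the idea, not how the Android app or Google
actually does it. The name strip_parentheticals is hypothetical.

```python
import re


def strip_parentheticals(text: str) -> str:
    """Drop every balanced (...) group, including nested ones, from plain text."""
    out = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            # Only keep characters that are outside all parentheses.
            out.append(ch)
    # Collapse the doubled spaces left behind, then any space before punctuation.
    cleaned = re.sub(r"\s+", " ", "".join(out))
    return re.sub(r"\s+([.,;:])", r"\1", cleaned).strip()


sentence = (
    "Vincent Willem van Gogh (Dutch: [ˈvɪnsɛnt ˈʋɪləm vɑn ˈɣɔx] ( listen); "
    "30 March 1853 – 29 July 1890) was a major Post-Impressionist painter."
)
print(strip_parentheticals(sentence))
```

This handles the nested "( listen)" case, but it would also delete
parentheticals that carry real content, which is one reason a single "right"
transform is hard to define.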
Cheers,
- - - - originals to end - - - - -
On Jun 11, 2015 10:05 AM, "Dmitry Brant" <dbrant(a)wikimedia.org> wrote:
Yes, we should definitely do both, keeping in mind that the JSON-only
service will be much more important for apps in the long run.
The part that worries me a little bit is not knowing when exactly these
services can be deployed to production at full scale. Since so many of our
brainstorming ideas for Q1 and beyond are dependent on these services, we
should have a concrete time frame for this.
The JSON service basically already exists[1] (in its infancy), and
experimenting with changes to the output JSON structure is absurdly easy.
I would suggest that we take an inventory[2] of all the non-text entities
that one might find in articles (infoboxes, tables, references, images,
math formulas, etc), and update the service to structure them as JSON. Then
we'll be free to decide how we want to present these entities natively in
the apps.
In the case of entities existing within a paragraph, we can decide (with a
little help from Design) whether it's important to keep them inline with
the text, or strip and move them outside of the paragraph. For entities
that are important to keep inline, we can still strip them out and
restructure them as JSON, but also replace the inline occurrence with a
syntax marker that the apps will recognize, and decide how to handle
natively.
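
A rough Python sketch of the marker idea, under stated assumptions: the
paragraph is already split into text runs and entity objects, and the marker
syntax "{{entity:N}}" is made up for illustration (nothing in the current
service defines it). The function name and the entity fields are hypothetical
too.

```python
import json


def extract_inline_entities(parts):
    """Build {"text": ..., "entities": [...]} from a mixed list of str and dict parts."""
    text_chunks = []
    entities = []
    for part in parts:
        if isinstance(part, str):
            text_chunks.append(part)
        else:
            # Hypothetical marker syntax; the index points into "entities".
            text_chunks.append("{{entity:%d}}" % len(entities))
            entities.append(part)
    return {"text": "".join(text_chunks), "entities": entities}


paragraph = [
    "The area of a circle is ",
    {"type": "math", "latex": "\\pi r^2"},
    ".",
    {"type": "reference", "id": "cite_note-1"},
]
result = extract_inline_entities(paragraph)
print(json.dumps(result, indent=2, ensure_ascii=False))
```

An app could then render the text natively and decide per entity type whether
to draw something at the marker position or drop the marker entirely.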
Whether to preserve HTML formatting might also be a question for design /
UX research. However, at least on Android, the native TextView does support
some limited HTML tags, and we can do additional formatting with Spans if
necessary.
[1]
http://appservice.wmflabs.org/en.wikipedia.org/v1/mobile/app/page/lite/Womb…
[2]
https://etherpad.wikimedia.org/p/json-content-service-structure
On Thu, Jun 11, 2015 at 11:28 AM, Corey Floyd <cfloyd(a)wikimedia.org>
wrote:
>
> Mostly apps have been talking about this, but I think it would be good to
> get web folks involved as well.
>
> We have a lot of ideas, and this is at the top of the list of things we
> need in order to accomplish potential goals for the quarter. It also seems
> there are at least 2 ideas for how we should do this:
>
> 1. Deploy a service backed by Parsoid that delivers better marked up HTML
> than MobileView.
> 2. Deploy a service that converts HTML to JSON and delivers that instead.
>
> My suggestion would be to do both… deliver better HTML first (this should
> be "easier"), then build another service upon that to serve JSON.
>
> The JSON spec for an article needs some discussion - but I think will be
> pretty easy to settle on.
>
> A couple of questions we will have to solve:
> - Should text still be marked up in HTML? If not, what about formatting
> loss?
> - Entities like images that are between paragraphs are easy to handle,
> but what about entities within paragraphs?
>
>
> Any other thoughts here?
>
>