Is it considered acceptable now to produce a service or API that hardcodes wiki-specific parsing of certain wikitext or HTML patterns in certain wiki pages (such as the "On this day" section of the main page of one wiki)?
I'm confused by the status of things: after my comment https://phabricator.wikimedia.org/T143408#2919000 I see little effort toward finding solutions potentially able to scale to all our projects and languages (which I assume to be the mission, see "globally" in https://wikimediafoundation.org/wiki/Mission_statement ; please point it out if this assumption is incorrect).
It might be that wiki-specific parsing hardcoded in MediaWiki/Wikimedia code can actually scale if written correctly; a comment on the associated patch seemed to imply so. That would be a very surprising finding, one which goes against 15 years of experience, so if we have some examples or evidence of this it would be very worthwhile to point them out.
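To make the concern concrete, this kind of hardcoded extraction typically looks something like the following (a hypothetical sketch, not code from the actual service; the element id and layout are enwiki main-page conventions):

import re

# Hypothetical sketch of hardcoded, wiki-specific parsing. "mp-otd" is the
# id the English Wikipedia main page happens to use for its "On this day"
# box; it is an enwiki template convention, not a MediaWiki guarantee, so
# the same logic finds nothing on other wikis, or on enwiki after a
# main-page redesign.
SAMPLE_ENWIKI_HTML = """
<div id="mp-otd">
  <ul>
    <li>1234 - An example anniversary.</li>
    <li>5678 - Another example anniversary.</li>
  </ul>
</div>
"""

def on_this_day_events(html):
    # Layout assumptions baked in: the box has id="mp-otd" and each
    # event is an <li> inside it.
    box = re.search(r'<div id="mp-otd">(.*?)</div>', html, re.DOTALL)
    if box is None:
        return []  # any markup change silently yields no data
    return re.findall(r'<li>(.*?)</li>', box.group(1), re.DOTALL)

for event in on_this_day_events(SAMPLE_ENWIKI_HTML):
    print(" ".join(event.split()))

Point the same code at almost any other wiki and it silently returns nothing; that is the scaling problem in a nutshell.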
Nemo
I see little effort on finding solutions potentially able to scale to all our projects and languages
See my reply to your initial comment on that ticket. This was just a first hack at implementing this functionality. If you had simply asked whether there were plans to expand this to other projects/languages, the answer you would have received would have been "absolutely! this is just a first pass".
I have pasted the referenced comment and response from that ticket below:
It's inappropriate and unsustainable to hardcode such wiki-specific parsing in our code.
In the future please consider the tone and impact of your comments before clicking "post".
Imagine you were, say, a first-time volunteer contributor and the first piece of feedback you received was the comment you posted above. How would that make you feel about even trying to contribute?
I'm not saying there's not a kernel of truth in your comment, because there is*, but the way you phrased it actually inclines me to take your opinion, and you, far less seriously.
*I agree it would indeed be better to have this endpoint work across all language wikis. I intend to examine such functionality, but as a first pass I chose to implement the core logic for my native language. My implementation should be fairly easy to modify, as well, because I had this eventuality in mind throughout development.
Thanks Florian.
I see your point about ease of forking. Would you have time later, perhaps off thread, to detail the challenges you faced?
Regarding the data endpoint topic of this thread, it isn't app-specific despite being part of 'mediawiki/services/mobileapps'. We'll probably want a more generic name, with only truly app-specific endpoints named accordingly.
On Thu, Jan 12, 2017 at 10:05 AM, Florian Schmidt <florian.schmidt.welzow@t-online.de> wrote:
What is also on my mind: the app wasn't easy to fork and use on other (third-party) projects before this either, but moving in this direction (parsing of specific content that applies to Wikimedia projects only) makes it effectively impossible to fork (and maintain) the app for other, MediaWiki-based, third-party projects. I understand that this might not be the goal of the WMF, which in my personal opinion isn't quite right; still, it's very, very sad. I tried to maintain an up-to-date version of the Wikipedia app some time ago, but it took so much time, because the app is so Wikimedia-specific rather than MediaWiki-specific, that I mostly ended up stopping my efforts in this direction.
This is probably not a response which should be in this thread, and I apologize for that, however I wanted to say that somewhere.
Best, Florian
Hey all, just want to drop in some thoughts on this thread…
I would say the premise of this email is absolutely right: parsing out this data by hand-coding against specific HTML structures is untenable.
On Wikipedia we have a wealth of community-curated information: featured articles, in the news, on this day, etc…
Over the past year, the Reading team has been working with Services to set up some basic infrastructure for ingesting some of this data and making it available as structured data via an API that any client can ingest. This includes our own WMF-maintained projects (apps, mobile web, desktop).
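To give a feel for the consumer side (the endpoint path and the field names below are illustrative assumptions for this sketch, not the documented contract), a client reads well-defined JSON fields instead of scraping main-page HTML:

import json
import urllib.request

# Hypothetical consumer of a structured feed endpoint. The path and the
# "tfa"/"news" field names are assumptions for illustration; the real
# contract is whatever the service documents.
URL = "https://en.wikipedia.org/api/rest_v1/feed/featured/2017/01/12"

with urllib.request.urlopen(URL) as resp:
    feed = json.load(resp)

print(feed.get("tfa", {}).get("title"))   # today's featured article
for story in feed.get("news", []):
    print(story.get("story"))             # an "in the news" item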
Because of the nature of our projects, it is difficult to extract this information in a uniform way across all wikis. Though that is clearly the target (and in line with our mission), before we invest significant time in developing such a standardized method for each service, we first need to deploy an API to test whether these services are actually a good direction for our products, services, and mission.
To that end, we develop each new service on en.wikipedia first, using an approach that we do not intend to scale.
Now that some of these services have proven useful, we have begun efforts to develop a way for all project maintainers to opt in to making their curated content available for consumption in these APIs.
You can see some tickets focused on this effort here:
https://phabricator.wikimedia.org/T150806
https://phabricator.wikimedia.org/T152284
https://phabricator.wikimedia.org/T148680
We have also created some draft documentation, on which we are currently gathering feedback to see if it is viable for all projects: https://www.mediawiki.org/wiki/User:CFloyd_(WMF)/Feed_Markup_Documentation
Additionally, we have added resources to our Reading Infrastructure team (which maintains our services), in part to help with this effort.
All this is to say that creating and scaling these services to multiple wikis is an ongoing effort. While we would love to deploy a solution to all projects at once, in order to make the problem tractable we are tackling it in steps and re-evaluating our assumptions along the way. Hopefully this explains our thinking and the projects in a way that makes sense.
Because this is a large project, we are looking for solutions and help in spreading these services across all wikis. If you or anyone else has time and would like to help, the tickets and documentation above are great places to contribute to the process.
Thanks for any help,
Corey
We discussed a potential approach to scaling content extraction on Wiktionary across languages in https://phabricator.wikimedia.org/T138709. The idea was to define HTML microformats that can be integrated into templates to provide a uniform marker for specific bits of content.
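As a rough sketch of how that could work (the marker class name below is a made-up placeholder, not an agreed vocabulary): a template on each wiki wraps every curated item in a known marker class, and extraction then keys on the marker rather than on any wiki-specific id or page layout.

from html.parser import HTMLParser

# Sketch of the microformat idea: templates emit an agreed marker class,
# and the extractor collects the text of marked elements. The class name
# "mf-onthisday-event" is a placeholder invented for this example; the
# sketch also assumes no void elements like <br> inside marked items.
SAMPLE_HTML = """
<ul>
  <li class="mf-onthisday-event">1234 - An example anniversary.</li>
  <li class="mf-onthisday-event">5678 - Another example anniversary.</li>
  <li>A navigation item without the marker, which is ignored.</li>
</ul>
"""

class MarkedItemExtractor(HTMLParser):
    def __init__(self, marker):
        super().__init__()
        self.marker = marker
        self.depth = 0   # > 0 while inside a marked element
        self.items = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif self.marker in dict(attrs).get("class", "").split():
            self.depth = 1
            self.items.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.items[-1] += data

extractor = MarkedItemExtractor("mf-onthisday-event")
extractor.feed(SAMPLE_HTML)
for item in extractor.items:
    print(item.strip())

Any wiki that adopts the marker in its templates becomes extractable with zero wiki-specific code in the service.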