We discussed a potential approach to scaling content extraction on Wiktionary across languages in https://phabricator.wikimedia.org/T138709. The idea was to define HTML microformats that can be integrated into templates to provide a uniform marker for specific bits of content.
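To make the idea concrete, a template could wrap each piece of curated content in an element carrying an agreed-upon class, and a client could then select on that class instead of on the surrounding page structure. Below is a minimal TypeScript sketch of that pattern; the class name "mf-onthisday", the attributes, and the sample content are hypothetical and not part of the actual proposal, and it assumes the browser DOMParser API:

// Hypothetical microformat marker emitted by a template, and a client that
// selects on the class rather than on page-specific HTML structure.
const html = `
  <ul>
    <li class="mf-onthisday" data-year="1969">First Moon landing</li>
    <li class="mf-onthisday" data-year="2001">Wikipedia launched</li>
  </ul>`;

const doc = new DOMParser().parseFromString(html, "text/html");

// Extract structured data by the agreed marker, not by position in the layout.
const events = Array.from(doc.querySelectorAll(".mf-onthisday")).map(el => ({
  year: Number(el.getAttribute("data-year")),
  text: el.textContent?.trim() ?? "",
}));

console.log(events);
// [ { year: 1969, text: "First Moon landing" }, { year: 2001, text: "Wikipedia launched" } ]

The point of the marker is that it survives changes to the surrounding layout: the client depends only on the agreed class name and attributes, not on where the content happens to sit in the rendered page.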
On Jan 12, 2017 1:17 PM, "Corey Floyd" cfloyd@wikimedia.org wrote:
Hey all, just want to drop in some thoughts on this thread…
I would say the premise of this email is absolutely right: parsing out this data by hand-coding against specific HTML structures is untenable.
On Wikipedia we have a wealth of community curated information: featured articles, in the news, on this day, etc…
Over the past year, the Reading team has been working with Services to set up some basic infrastructure for ingesting some of this data and making it available as structured data via an API that any client can ingest. This includes our own WMF-maintained projects (apps, mobile web, desktop).
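As a rough illustration of what consuming that structured data looks like for a client, here is a short TypeScript sketch that fetches one day's curated feed content from the REST API; the endpoint path and the response field names used below are assumptions for illustration, not a documented contract:

// Hypothetical sketch: fetch a day's curated feed content as structured data
// instead of scraping rendered HTML. Endpoint path and field names are
// assumptions for illustration.
async function fetchFeaturedFeed(date: Date): Promise<void> {
  const yyyy = date.getUTCFullYear();
  const mm = String(date.getUTCMonth() + 1).padStart(2, "0");
  const dd = String(date.getUTCDate()).padStart(2, "0");

  const res = await fetch(
    `https://en.wikipedia.org/api/rest_v1/feed/featured/${yyyy}/${mm}/${dd}`
  );
  if (!res.ok) {
    throw new Error(`Feed request failed: ${res.status}`);
  }

  const feed = await res.json();
  // "tfa" (today's featured article) and "onthisday" are assumed field names.
  console.log(feed.tfa?.title, feed.onthisday?.length);
}

fetchFeaturedFeed(new Date()).catch(console.error);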
Because of the nature of our projects, it is difficult to extract this information in a uniform way across all wikis. Though this is clearly the target (and in line with our mission), before we invest significant time in developing such a standardized method for each service, we first need to deploy an API to test whether these services are actually a good direction for our products, services, and mission.
To that end, we develop each new service on en.wikipedia first, using a method that we do not intend to scale.
Now that some of these services have proven useful, we have begun efforts to develop a way for all project maintainers to opt in to making their curated content available for consumption in these APIs.
You can see some tickets focused on this effort here:
https://phabricator.wikimedia.org/T150806
https://phabricator.wikimedia.org/T152284
https://phabricator.wikimedia.org/T148680
We have also created some draft documentation, which we are currently gathering feedback on to see whether it is viable for all projects: https://www.mediawiki.org/wiki/User:CFloyd_(WMF)/Feed_Markup_Documentation
We have also added resources to our Reading Infrastructure team (which maintains our services), in part to help with this effort.
All this is to say that creating and scaling these services to multiple wikis is a continuing effort. While we would love to deploy a solution to all projects at once, in order to make the problem tractable, we are tackling it in steps and re-evaluating our assumptions along the way. Hopefully this explains our thinking and the projects in a way that makes sense.
Because this is a large project, we are looking for solutions and help to spread these services across all wikis. If you or anyone else has time and would like to help, the tickets and documentation above are a great place to contribute to the process.
Thanks for any help,
Corey
On Thu, Jan 12, 2017 at 10:45 AM Monte Hurd mhurd@wikimedia.org wrote:
Thanks Florian.
I see your point about ease of forking. Would you have time later, perhaps off thread, to detail the challenges you faced?
Regarding the data endpoint topic of this thread, it isn't app-specific despite being part of 'mediawiki/services/mobileapps'. We'll probably want a more generic name, with only truly app-specific endpoints named accordingly.
On Thu, Jan 12, 2017 at 10:05 AM, Florian Schmidt <florian.schmidt.welzow@t-online.de> wrote:
What is also on my mind: the app wasn't easy to fork and use on other (third-party) projects before this either; however, the track in this direction (parsing of content specific to Wikimedia projects only) makes it, in reality, impossible to fork (and maintain) the app projects for other, MediaWiki-based, third-party projects. I understand that this might not be a goal of the WMF, which in my personal opinion isn't quite right; however, it's very, very sad. I tried to maintain an up-to-date version of the Wikipedia app some time ago, but it takes so much time, as it is so Wikimedia-specific rather than MediaWiki-specific, that I ended up mostly stopping the effort in this direction.
This is probably not a response that should be in this thread, and I apologize for that, but I wanted to say it somewhere.
Best, Florian
Mobile-l mailing list Mobile-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mobile-l