Services October 2017

services@lists.wikimedia.org

8 participants
4 discussions

Re: [Services] REST API - Assistance Required - Content Classification and Filtering
by Christopher Smyth 30 Oct '17

Hello,

We're a small app development company that has integrated Wikipedia content into a geo-locating iOS app. The app is working well and the Wiki content is displaying correctly. However, we'd like to split the Wikipedia content into three categories rather than just one.

Is there a way to filter and categorise Wikipedia content that is accessed through the REST API? We only use content that is geo-coded (i.e. each article has latitude and longitude information associated with it).

How should we go about configuring our API integration so that we can split Wikipedia content according to its top-level categories? Is there a way to do this?

Many thanks for your assistance with this request.

Regards,
Chris Smyth

Christopher Smyth
Director, Inflighto
chris(a)inflighto.com
+61 (0)417 298 598
https://www.inflighto.com/
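One possible direction for the question above, sketched under assumptions that are not part of the thread: instead of the REST API, the MediaWiki Action API's `prop=categories` query is used to list an article's direct categories, and a client-side rule table maps them to app buckets. The function names, bucket names, and keyword rules are hypothetical, and an article's direct categories are not the same thing as top-level categories.

```python
from urllib.parse import urlencode


def categories_url(title, host="en.wikipedia.org"):
    """Build an Action API URL listing the non-hidden categories of one page."""
    params = {
        "action": "query",
        "prop": "categories",
        "titles": title,
        "clshow": "!hidden",   # skip maintenance categories
        "cllimit": "max",
        "format": "json",
    }
    return "https://" + host + "/w/api.php?" + urlencode(params)


def category_titles(response):
    """Collect category titles from a decoded JSON response (default JSON format)."""
    titles = []
    for page in response.get("query", {}).get("pages", {}).values():
        for cat in page.get("categories", []):
            titles.append(cat["title"])
    return titles


def bucket_for(titles, rules, default="other"):
    """Return the first bucket whose keyword occurs in any category title.

    `rules` is a hypothetical mapping of bucket name -> list of lowercase keywords.
    """
    for bucket, keywords in rules.items():
        for title in titles:
            low = title.lower()
            if any(keyword in low for keyword in keywords):
                return bucket
    return default
```

With rules such as `{"culture": ["opera", "museum"], "nature": ["mountain", "river"]}`, each geo-coded article would land in one bucket or in `"other"`; the keyword lists would need tuning against real category names.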

REST API rate limits
by Shahin Saneinejad 28 Oct '17

Hi,

I'm GET-ing the page/html/{title} endpoint at https://en.wikipedia.org/api/rest_v1/ for information extraction. I'm trying to nail down a polite request rate, and to determine whether the current rate limit is likely to change soon.

- the doc at https://en.wikipedia.org/api/rest_v1/ pegs the rate limit at 200 req/s;
- on #wikimedia-services, +gwicke noted that the Varnish cache's rate limit is much lower, around 100 req/s; but
- in practice, I get 429s whenever I exceed 70 req/s for more than a few minutes.

Pchelolo suggested additional debug logs on 429s might help get to the bottom of this lower-than-expected rate limit. What kind of debugging info can I provide from my end?

Any chance I'll be able to hit the 200 req/s mark in the next few months?

Thanks,
Shahin
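A minimal client-side sketch of staying under a chosen request rate and backing off after a 429, for illustration only. The `Throttle` class, `backoff_delay` function, and their parameters are hypothetical names, and the assumption that a 429 response may carry a `Retry-After` header in seconds is not confirmed anywhere in this thread.

```python
import time


class Throttle:
    """Space successive calls at least 1/rate_per_sec seconds apart."""

    def __init__(self, rate_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate_per_sec
        self.clock = clock      # injectable for testing
        self.sleep = sleep      # injectable for testing
        self.next_at = None     # earliest time the next call may proceed

    def wait(self):
        """Block until the next request is allowed, then reserve a slot."""
        now = self.clock()
        if self.next_at is not None and now < self.next_at:
            self.sleep(self.next_at - now)
            now = self.next_at
        self.next_at = now + self.interval


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retrying after a 429.

    Uses the Retry-After value when it parses as a number of seconds
    (an assumption), otherwise exponential backoff: base * 2**attempt.
    """
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # e.g. an HTTP-date; fall through to exponential backoff
    return min(cap, base * (2 ** attempt))
```

Calling `Throttle(50).wait()` before each GET would hold the client to roughly 50 req/s; logging the timestamp, request URL, and full response headers of each 429 alongside the delay chosen is one way to gather the debugging information asked about.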

Google Code-in: Get your tasks for young contributors prepared!
by Andre Klapper 17 Oct '17

Google Code-in is an annual contest for 13-17 year old students. It will take place from Nov 28 to Jan 17 and is not only about coding tasks.

While we wait to hear whether Wikimedia will be accepted:

* You have small, self-contained bugs you'd like to see fixed?
* Your documentation needs specific improvements?
* Your user interface has small design issues?
* Your Outreachy/Summer of Code project welcomes small tweaks?
* You'd enjoy helping someone port your template to Lua?
* Your gadget code uses some deprecated API calls?
* You have tasks in mind that welcome some research?

Also note that "Beginner tasks" (e.g. "Set up Vagrant" etc.) and "generic" tasks are very welcome (e.g. "Choose & fix 2 PHP7 issues from the list in https://phabricator.wikimedia.org/T120336"). Because we will need hundreds of tasks. :)

And we also have more than 400 unassigned open 'easy' tasks listed:
https://phabricator.wikimedia.org/maniphest/query/HCyOonSbFn.z/#R
Would you be willing to mentor some of those in your area?

Please take a moment to find / update [Phabricator etc.] tasks in your project(s) which would take an experienced contributor 2-3 hours. Check https://www.mediawiki.org/wiki/Google_Code-in/Mentors and please ask if you have any questions!

For some achievements from last round, see https://blog.wikimedia.org/2017/02/03/google-code-in/

Thanks!
andre
--
Andre Klapper | Wikimedia Bugwrangler
http://blogs.gnome.org/aklapper/

Canonical URL from RESTBase?
by Joe Wass 12 Oct '17

Hi there,

I hope this is the right list for a RESTBase query? Let me know if this is the wrong list, or I should head over to Phabricator.

I'm visiting a large number of Wikipedia pages' specific versions (for the Crossref Event Data service, if you're interested - https://www.eventdata.crossref.org/guide ). I'm getting page ids / versions from EventStreams. I'm using the RESTBase API because it gives the cleanest HTML and it was recommended to me for the volume of queries, e.g.

https://ceb.wikipedia.org/api/rest_v1/page/html/Quebrada_Fantasma/13659774

I want to get the *canonical URL* for that version page, e.g.

https://ceb.wikipedia.org/wiki/Quebrada_Fantasma

The 'normal' HTML view of a page supplies the canonical URL as a <link rel="canonical"> tag, but the RESTBase response doesn't. It does supply an isVersionOf link though:

<link rel="dc:isVersionOf" href="//ceb.wikipedia.org/wiki/Quebrada_Fantasma"/>

Questions:

1 - Is the isVersionOf URL in RESTBase identical to the "official" canonical URL that I would get from the HTML metadata (using https:)?

2 - Is the "title" component of the RESTBase URL the same as used in the Canonical URL? The Swagger docs say "Page title. Use underscores instead of spaces. Example: Main_Page". I'm not clear if that is the same thing.

3 - Is there a general recommended way of getting the canonical URL for a page from RESTBase?

Thanks in advance!

Joe Wass
https://en.wikipedia.org/wiki/User:Afandian
Crossref
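For illustration, the dc:isVersionOf link quoted above could be pulled out of a RESTBase HTML response roughly as follows. This is only a sketch: the helper name `is_version_of_url` is hypothetical, and whether the resulting URL always matches the `<link rel="canonical">` value is exactly the open question in this message, not something the code establishes.

```python
from html.parser import HTMLParser


class _VersionOfFinder(HTMLParser):
    """Remember the href of the first <link rel="dc:isVersionOf"> tag."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link .../> tags are also routed here by HTMLParser.
        if tag == "link" and self.href is None:
            attributes = dict(attrs)
            if attributes.get("rel") == "dc:isVersionOf":
                self.href = attributes.get("href")


def is_version_of_url(html, scheme="https"):
    """Return the dc:isVersionOf URL with an explicit scheme, or None if absent."""
    finder = _VersionOfFinder()
    finder.feed(html)
    if finder.href is None:
        return None
    href = finder.href.strip()
    if href.startswith("//"):
        # Protocol-relative URL, as in the example above: prepend the scheme.
        return scheme + ":" + href
    return href
```

Comparing this value against the canonical link from the regular page view for a sample of titles (including redirects and titles with special characters) would be one way to check question 1 empirically.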
