Hello,
We're a small app development company that has integrated Wikipedia content
into a geo-locating iOS app. The app is working well and the Wiki content is
displaying correctly. However, we'd like to categorise the Wikipedia content
into three categories rather than just one.
Is there a way to filter and categorise Wikipedia content that is accessed
through the REST API? We only use geo-coded content, i.e. articles that have
latitude and longitude information associated with them.
How should we go about configuring our API integration so that we can split
Wikipedia content according to its top-level categories?
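
To make the question concrete, here is a minimal sketch of the kind of lookup we have in mind. It assumes the MediaWiki Action API (not the REST API) with prop=categories and prop=coordinates. The bucket_for helper and its keyword table are hypothetical names for our three app categories, not anything Wikipedia provides.

```python
from urllib.parse import urlencode

ACTION_API = "https://en.wikipedia.org/w/api.php"


def build_category_query(titles):
    """Build an Action API URL asking for coordinates and visible categories."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "categories|coordinates",
        "clshow": "!hidden",   # skip hidden maintenance categories
        "cllimit": "max",
        "titles": "|".join(titles),
    }
    return ACTION_API + "?" + urlencode(params)


def bucket_for(category_titles, keyword_buckets, default="other"):
    """Map an article's category titles to one app-side bucket (hypothetical).

    The first category title containing a known keyword decides the bucket.
    """
    for title in category_titles:
        lowered = title.lower()
        for keyword, bucket in keyword_buckets.items():
            if keyword in lowered:
                return bucket
    return default
```

The categories returned this way are an article's direct categories, so mapping them to three top-level groups would still be done on our side.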
Many thanks for your assistance with this request.
Regards,
Chris Smyth
Christopher Smyth
Director
Inflighto
chris@inflighto.com
+61 (0)417 298 598
<https://www.inflighto.com/>
Hi,
I'm GET-ing the page/html/{title} endpoint at
https://en.wikipedia.org/api/rest_v1/ for information extraction. I'm
trying to nail down a polite request rate, and to determine whether the
current rate limit is likely to change soon.
- the doc at https://en.wikipedia.org/api/rest_v1/ pegs the rate limit at
200 req/s;
- on #wikimedia-services, +gwicke noted that the Varnish cache's rate limit
is much lower, around 100 req/s; but
- in practice, I get 429s whenever I exceed 70 req/s for more than a few
minutes.
Pchelolo suggested that additional debug logs on 429s might help get to the
bottom of this lower-than-expected rate limit.
What kind of debugging info can I provide from my end? Any chance I'll be
able to hit the 200 req/s mark in the next few months?
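
For reference, this is roughly how the client side could pace requests and record 429s. It is a minimal sketch, not my exact code: Throttle and debug_record are illustrative names, and the fields captured (Retry-After plus the full header set) are an assumption about what would be useful to the services team.

```python
import time


class Throttle:
    """Space calls so that at most `rate` requests start per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_slot = clock()

    def wait(self):
        """Block until the next slot is free; return the seconds slept."""
        now = self.clock()
        delay = self.next_slot - now
        if delay > 0:
            self.sleep(delay)
            now = self.next_slot
        self.next_slot = now + self.interval
        return max(delay, 0.0)


def debug_record(url, status, headers, sent_at):
    """Collect what seems worth logging when a response comes back as 429."""
    return {
        "url": url,
        "status": status,
        "sent_at": sent_at,
        "retry_after": headers.get("Retry-After"),
        "headers": dict(headers),
    }
```

A caller would invoke wait() before each GET and append debug_record(...) to a log whenever the status is 429.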
Thanks,
Shahin
Google Code-in is an annual contest for 13-17 year old students. It
will take place from Nov 28 to Jan 17 and is not only about coding tasks.
While we wait to hear whether Wikimedia will be accepted:
* You have small, self-contained bugs you'd like to see fixed?
* Your documentation needs specific improvements?
* Your user interface has small design issues?
* Your Outreachy/Summer of Code project welcomes small tweaks?
* You'd enjoy helping someone port your template to Lua?
* Your gadget code uses some deprecated API calls?
* You have tasks in mind that welcome some research?
Also note that "Beginner tasks" (e.g. "Set up Vagrant" etc) and
"generic" tasks are very welcome (e.g. "Choose & fix 2 PHP7 issues
from the list in https://phabricator.wikimedia.org/T120336 ").
Because we will need hundreds of tasks. :)
And we also have more than 400 unassigned open 'easy' tasks listed:
https://phabricator.wikimedia.org/maniphest/query/HCyOonSbFn.z/#R
Would you be willing to mentor some of those in your area?
Please take a moment to find / update [Phabricator etc.] tasks in your
project(s) which would take an experienced contributor 2-3 hours. Check
https://www.mediawiki.org/wiki/Google_Code-in/Mentors
and please ask if you have any questions!
For some achievements from last round, see
https://blog.wikimedia.org/2017/02/03/google-code-in/
Thanks!
andre
--
Andre Klapper | Wikimedia Bugwrangler
http://blogs.gnome.org/aklapper/
Hi there,
I hope this is the right list for a RESTBase query? Let me know if this is
the wrong list, or I should head over to Phabricator.
I'm fetching specific versions of a large number of Wikipedia pages (for the
Crossref Event Data service, if you're interested:
https://www.eventdata.crossref.org/guide ). I'm getting page ids / versions
from EventStreams. I'm using the RESTBase API because it gives the cleanest
HTML and it was recommended to me for the volume of queries, e.g.
https://ceb.wikipedia.org/api/rest_v1/page/html/Quebrada_Fantasma/13659774
I want to get the *canonical URL* for that version page, e.g.
https://ceb.wikipedia.org/wiki/Quebrada_Fantasma
The 'normal' HTML view of a page supplies the canonical URL as a <link
rel="canonical"> tag, but the RESTBase response doesn't. It does supply an
isVersionOf link though:
<link rel="dc:isVersionOf" href="//ceb.wikipedia.org/wiki/Quebrada_Fantasma"/>
Questions:
1 - Is the isVersionOf URL in RESTBase identical to the "official"
canonical URL that I would get from the HTML metadata (using https:)?
2 - Is the "title" component of the RESTBase URL the same as used in the
Canonical URL? The Swagger docs say "Page title. Use underscores instead of
spaces. Example: Main_Page". I'm not clear if that is the same thing.
3 - Is there a general recommended way of getting the canonical URL for a
page from RESTBase?
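
For context, this is a minimal sketch of how the isVersionOf link could be read and turned into an absolute URL. It assumes the link is protocol-relative, as in the example above, and that resolving it against the request URL is acceptable. Whether the result really equals the canonical URL is exactly what question 1 asks. VersionOfParser and version_of_url are illustrative names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class VersionOfParser(HTMLParser):
    """Remember the href of the first <link rel="dc:isVersionOf"> tag."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        attributes = dict(attrs)
        if (
            self.href is None
            and tag == "link"
            and attributes.get("rel") == "dc:isVersionOf"
        ):
            self.href = attributes.get("href")


def version_of_url(html, request_url):
    """Return the isVersionOf href resolved against the request URL, or None."""
    parser = VersionOfParser()
    parser.feed(html)
    if parser.href is None:
        return None
    # urljoin fills in the scheme for protocol-relative hrefs ("//host/path").
    return urljoin(request_url, parser.href.strip())
```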
Thanks in advance!
Joe Wass
https://en.wikipedia.org/wiki/User:Afandian
Crossref