I am building an application where I need to find the most relevant Commons
image for a Wikipedia page. For example, if the page has an infobox with an
image, that is the image I want. If not, I look for an image on the page
that also appears in a Wikimedia Commons search for that entity.
Otherwise, I fall back to the first result from the Wikimedia Commons search.
Is this the best possible algorithm for this requirement?
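To make the three-step fallback concrete, here is a minimal sketch of the selection logic only. The function name `pick_image` and its three inputs are hypothetical; how the infobox image, the page's image list, and the Commons search results are obtained is left out.

```python
from typing import Optional, Sequence


def pick_image(
    infobox_image: Optional[str],
    page_images: Sequence[str],
    commons_results: Sequence[str],
) -> Optional[str]:
    """Return the preferred image title, or None if nothing is available."""
    # 1. An infobox image wins outright.
    if infobox_image:
        return infobox_image
    # 2. Otherwise take the first page image that the Commons search also returned.
    commons_set = set(commons_results)
    for title in page_images:
        if title in commons_set:
            return title
    # 3. Otherwise fall back to the top Commons search hit, if there is one.
    return commons_results[0] if commons_results else None
```

Keeping the page order in step 2 means an early icon can still win if Commons search happens to return it, so a size or name filter may be needed in front of this.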
I also considered grabbing the first image from the action=render page, but
there are two problems there:
1. action=render is very slow the first time and can become prohibitive for
2. the first image in the action=render page's HTML is often an icon. Is
there a reasonable way to find the actual first "main" image on that page?
I also played with DBpedia and flickrwrappr, but neither seems to give me
anything better (more relevant to the page) than what I can get directly
from the wiki API.
If this algorithm is the right approach, I am missing the best way to get at
the titles of the image files on a given Wikipedia page. I am currently
but that gives me a list of search results for the search string from
Wikipedia, and not images on the page.
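One way to list the files used on a page, rather than search hits, is the query module's `prop=images`. The sketch below builds such a request and pulls the titles out of a JSON reply. The endpoint constant, function names, and User-Agent string are illustrative assumptions, and continuation of long lists is ignored.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"  # assumed endpoint


def images_query_url(page_title, limit=50):
    """Build an action=query&prop=images URL for one page title."""
    params = {
        "action": "query",
        "prop": "images",
        "titles": page_title,
        "imlimit": str(limit),
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def image_titles(response):
    """Collect the file titles from a decoded prop=images JSON reply."""
    titles = []
    for page in response.get("query", {}).get("pages", {}).values():
        for image in page.get("images", []):
            titles.append(image["title"])
    return titles


def fetch_image_titles(page_title):
    """Perform the request (network access needed; no continuation handling)."""
    request = urllib.request.Request(
        images_query_url(page_title),
        headers={"User-Agent": "image-picker-sketch/0.1"},  # hypothetical UA
    )
    with urllib.request.urlopen(request) as handle:
        return image_titles(json.load(handle))
```

The titles come back in the API's own order, not page order, so "first image on the page" still has to be determined some other way.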

I'm wondering how to go about getting one obscure bit of information.
The "Interwicket" bot maintains iwiki links for namespace 0 in the
wiktionaries. On some it has a local bot flag, on some it uses the
global bot mechanism. It automatically checks its own user status for
a local bot flag
which is quite simple. But that doesn't show "bot" for global bots.
To find out if it has global bot status, it needs to know which wikts
have global bots enabled. (It doesn't need to look up its own group
membership, that is just wired in: it knows it is in the global
group.) At present it reads the manually maintained table on meta
(yes, a horrid hack ;-).
What it needs is to read "wiki set 2", the list of wikis on which
global bots are allowed. I haven't figured out how to do that with the
GUI, let alone the API. Anyone know?
The specific case I just found:
The table in meta
http://meta.wikimedia.org/wiki/Bot_policy/Implementation shows the
Kannada wikt as having global bots approved, and Interwicket sees
that. But looking at RC on kn.wikt
one sees that the edits are not flagged as bot edits. This isn't a
crisis (and I could certainly patch this case ;-) but I'd like to do
it right by either reading the "wiki set" or reading the global status
of the user from the API, as it applies to that wiki.
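The decision being described can be sketched as below, assuming replies shaped like those from `list=users` with `usprop=groups` (local groups) and from CentralAuth's `meta=globaluserinfo` with `guiprop=groups` (global groups). The group name "global-bot", the database name "knwiktionary", and all function names are assumptions for illustration. The set of wikis that allow global bots is passed in as plain data, since how to read it is exactly the open question.

```python
def local_groups(users_response, username):
    """Local groups for username from a list=users&usprop=groups reply."""
    for user in users_response.get("query", {}).get("users", []):
        if user.get("name") == username:
            return set(user.get("groups", []))
    return set()


def global_groups(gui_response):
    """Global groups from a meta=globaluserinfo&guiprop=groups reply."""
    info = gui_response.get("query", {}).get("globaluserinfo", {})
    return set(info.get("groups", []))


def edits_flagged_as_bot(local, global_, wiki, global_bot_wikis):
    """True if edits on wiki should carry the bot flag.

    local / global_ are sets of group names; global_bot_wikis is the set of
    wikis where global bots are enabled, however that set was obtained.
    """
    if "bot" in local:  # a local flag is enough on its own
        return True
    # Otherwise both conditions must hold: global group AND wiki opted in.
    return "global-bot" in global_ and wiki in global_bot_wikis
```

With this split, the kn.wikt case is a data problem: if the wiki is wrongly listed in `global_bot_wikis`, the function returns True although the edits are not flagged.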

Hello. I have an intranet set up that uses MediaWiki behind the
scenes for content. I set up a search box on the intranet pages that
calls the API query module twice, first for title, then for text. I
then take all the text matches and call the index.php render module to
get the page text, so I can parse it for the searched term and highlight
it in the results. I then sort all the title and page text matches
alphabetically by the page titles. This all kind of works as intended,
but seems like a crazy amount of hackery, so I'm hoping there's a better
way. If not, then maybe you can help me solve these issues:
1) The highlighted search results include HTML and wikitext code
because they are produced by index.php's render. Using strip_tags() helps a
little, but only when the matched string has both brackets ( < > ).
2) Categories show up as page title matches if I search on the
regular wiki page, but not when I go through the API. I assume the wiki
code is just also doing a category search and displaying it in the page
I think I'm also going to split up my title and text search results. I
had them combined as that's what the users are used to in a previous
system, but I think that just destroys whatever ranking system the
search is using. Right?
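A possible alternative to rendering and parsing each page is to ask `list=search` for snippets with `srprop=snippet`, which come back with the matched terms wrapped in a `searchmatch` span. The sketch below builds the two queries (`srwhat=title` and `srwhat=text`), keeps each result list in the order the API returned it, and reduces a snippet to plain text. The function names and the `**` marker are hypothetical, and the exact snippet markup may vary between MediaWiki versions.

```python
import re
from html import unescape

# Matched terms arrive as <span class="searchmatch">...</span> (quote style may vary).
_MATCH = re.compile(r"<span class=[\"']searchmatch[\"']>(.*?)</span>", re.S)
_TAG = re.compile(r"<[^>]+>")


def search_params(term, what):
    """Parameters for one list=search call; what is 'title' or 'text'."""
    return {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srwhat": what,
        "srprop": "snippet",
        "format": "json",
    }


def plain_snippet(snippet, mark="**"):
    """Turn an API snippet into plain text, marking the matched terms."""
    text = _MATCH.sub(lambda m: mark + m.group(1) + mark, snippet)
    text = _TAG.sub("", text)  # drop any remaining tags
    return unescape(text)      # decode entities such as &amp;


def hits_in_rank_order(response):
    """(title, plain snippet) pairs in the order the API returned them."""
    hits = response.get("query", {}).get("search", [])
    return [(hit["title"], plain_snippet(hit.get("snippet", ""))) for hit in hits]
```

Showing the title hits and the text hits as two separate lists, each left in API order, avoids the alphabetical re-sort that discards the search ranking.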

For the last couple of days, I have been trying to write a script around the WP API
(http://en.wikipedia.org/w/api.php) and I'm struggling with "action=edit"
while trying to edit a page, because there seems to be a limitation to the POST
The error that I'm receiving is vague:
Request: POST http://en.wikipedia.org/w/api.php?action=edit&format=xml=xml,
from 22.214.171.124 via knsq23.knams.wikimedia.org (squid/2.7.STABLE6) to ()
Error: ERR_INVALID_REQ, errno [No Error] at Thu, 11 Mar 2010 11:46:27 GMT
AFAICS there is no problem with POST sizes up to 2.5 KB; anything larger
results in that error.
So, is there any size limitation regarding POST?
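For comparison, here is a minimal sketch of an action=edit request that carries every parameter in a form-encoded POST body rather than in the URL. It only builds the request object and does not send it, and it does not diagnose the Squid error above. The endpoint constant and function name are assumptions, and obtaining a valid edit token is out of scope.

```python
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"  # assumed endpoint


def build_edit_request(title, text, token):
    """Build (but do not send) a form-encoded POST for action=edit."""
    body = urllib.parse.urlencode(
        {
            "action": "edit",
            "format": "xml",
            "title": title,
            "text": text,
            "token": token,  # must be percent-encoded; urlencode handles that
        }
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

If a body built this way still fails above a few kilobytes, comparing the headers the client actually sends for small and large bodies may help narrow the cause.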