Greetings! I am a relatively new MediaWiki extension developer, and I'm having trouble with getting the MediaWiki API to capture all links coming into and out of a given wiki page.
I know we can use *action=query&prop=extlinks* to get *external* links from our wiki page to other (non-wiki) web pages, *action=query&prop=links* to get internal links from our wiki page *out* to other wiki pages, and *action=query&prop=backlinks* to get internal links from other wiki pages *into* our wiki page. However, in the case where a link is *external* but points *to an internal wiki page*, I can't find the proper API call. These links seem to be falling through the cracks somehow.
I'm inclined to believe that I'm just missing something simple, especially because this MediaWiki help page includes "external links to internal pages" as a category of links: http://www.mediawiki.org/wiki/Help:Links#External_links_to_internal_pages
Can anyone shed light on this issue? Is there an easy way to capture external links to internal wiki pages through an API call?
Thanks!
Sincerely, Jason Ji
On Mon, Apr 14, 2014 at 8:45 AM, Jason Ji <uberjason@gmail.com> wrote:
> Greetings!
Hi!
> I am a relatively new MediaWiki extension developer,
If you're working on an extension and querying these links on the same wiki on which your extension is installed, you might be better served by using the appropriate PHP methods to fetch this data directly, rather than making a query back into the MediaWiki API.
> I know we can use *action=query&prop=extlinks* to get *external* links from our wiki page to other (non-wiki) web pages, *action=query&prop=links* to get internal links from our wiki page *out* to other wiki pages, and *action=query&prop=backlinks* to get internal links from other wiki pages *into* our wiki page.
I expect it's just a typo, but for the benefit of anyone else reading, that last should be list=backlinks, not prop=backlinks.
> However, in the case where a link is *external* but points *to an internal wiki page*, I can't find the proper API call. These links seem to be falling through the cracks somehow.
Those may be accessed, along with all other external links, by using action=query&list=exturlusage. You'd specify the appropriate values for euprotocol and euquery to limit the search to the particular link you're interested in.
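For readers following along, here is a minimal JavaScript sketch of how such a request URL might be assembled. It only builds the query string for list=exturlusage with euprotocol and euquery; the function name and the api.php endpoint are hypothetical, and eulimit and format are optional extras rather than anything suggested above.

```javascript
// Sketch only: builds a list=exturlusage request URL for one known link.
// `endpoint` is your wiki's api.php URL (the one used below is hypothetical).
function buildExtUrlUsageQuery(targetUrl, endpoint) {
  const parsed = new URL(targetUrl);
  const params = new URLSearchParams({
    action: 'query',
    list: 'exturlusage',
    // euprotocol takes the scheme without the trailing colon, e.g. "http".
    euprotocol: parsed.protocol.replace(/:$/, ''),
    // euquery takes the rest of the URL without the protocol.
    euquery: parsed.host + parsed.pathname,
    eulimit: 'max',
    format: 'json',
  });
  return `${endpoint}?${params.toString()}`;
}

// Example: which pages link to http://my-wiki.com/wiki/Baz?
const exampleUrl = buildExtUrlUsageQuery(
  'http://my-wiki.com/wiki/Baz',
  'http://my-wiki.com/w/api.php' // hypothetical endpoint
);
console.log(exampleUrl);
```

Note that this still requires knowing the target link in advance; it does not enumerate unknown links on a page.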
Hi Brad,
Thanks for your response. To answer your first point, part of the extension I'm writing is actually a JavaScript D3-based force layout to graphically show wiki pages and links between them. So while in the JavaScript, I'm making MediaWiki API calls to get data about the pages.
It looks like *action=query&list=exturlusage* gets you information about a specific link you already know, e.g. if you're asking "what pages link to http://www.google.com". In my case, I am trying to capture all links into and out of a given wiki page, so I don't generally know what they are. I capture external links out from a wiki page with *extlinks*, internal links out from a wiki page with *links*, and internal links into a wiki page with *backlinks*, but external-links-to-internal-pages don't fit into any of these categories, so they end up being lost. Is there an API call to capture those kinds of links?
To use a concrete example, imagine I have wiki pages called *Foo*, *Bar*, and *Baz* on a wiki with URL *http://my-wiki.com/wiki*. Suppose Foo contains the following links:
[[Bar]] [http://www.google.com Google] [http://my-wiki.com/wiki/Baz Baz]
Suppose Bar contains:
[[Foo]]
In this case, *action=query&prop=extlinks* would return the link to Google. *action=query&prop=links* would return the link to Bar. *action=query&list=backlinks* would return any pages pointing to Foo, in this case Bar again. But none of these cover the link from Foo to Baz, because it is an external link to an internal page. This is what we uncovered in testing on our wiki. So how would I write an API call that would capture the link to Baz, again not knowing that it's necessarily there?
Thanks,
Jason
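As an illustration of the three calls described in this message, the following JavaScript sketch builds one request URL each for prop=extlinks, prop=links, and list=backlinks for a given page title. The function name and the api.php endpoint are hypothetical, and the *limit parameters are optional additions; this is not the actual code from the extension.

```javascript
// Sketch only: builds the three link queries for one page title.
// `endpoint` is your wiki's api.php URL (the one used below is hypothetical).
function buildLinkQueries(title, endpoint) {
  const common = { action: 'query', format: 'json' };
  const toUrl = (extra) =>
    `${endpoint}?${new URLSearchParams({ ...common, ...extra }).toString()}`;
  return {
    // External links going out from the page.
    extlinks: toUrl({ prop: 'extlinks', titles: title, ellimit: 'max' }),
    // Internal links going out from the page.
    links: toUrl({ prop: 'links', titles: title, pllimit: 'max' }),
    // Internal links coming in; backlinks is a list module, not a prop module.
    backlinks: toUrl({ list: 'backlinks', bltitle: title, bllimit: 'max' }),
  };
}

// Example for the page Foo:
const fooQueries = buildLinkQueries('Foo', 'http://my-wiki.com/w/api.php');
console.log(fooQueries.extlinks);
console.log(fooQueries.links);
console.log(fooQueries.backlinks);
```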
On Mon, Apr 14, 2014 at 9:51 AM, Brad Jorsch (Anomie) <bjorsch@wikimedia.org> wrote:
-- Brad Jorsch (Anomie) Software Engineer Wikimedia Foundation
Mediawiki-api mailing list Mediawiki-api@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
On Mon, Apr 14, 2014 at 10:33 AM, Jason Ji <uberjason@gmail.com> wrote:
> Hi Brad,
> Thanks for your response. To answer your first point, part of the extension I'm writing is actually a JavaScript D3-based force layout to graphically show wiki pages and links between them. So while in the JavaScript, I'm making MediaWiki API calls to get data about the pages.
That makes sense to use the API then.
> In this case, *action=query&prop=extlinks* would return the link to Google.
I had thought it would also include the link to my-wiki.com, but now that I test this I find it actually doesn't. I tracked this down to bug 19637 where someone implemented exactly this behavior, without addressing the objections raised that people actually do want to search for these links.
There's nothing the API can do here, as these links are never recorded anywhere due to how the "fix" for bug 19637 works. On your local wiki you could set $wgRegisterInternalExternals to true (and then reparse all pages), but if you'd like it fixed for the general case, it could probably use a discussion on wikitech-l to decide whether we want to fix this and, if so, whether to do it by just reverting r53104 or by adding another link tracking table just for these.
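If $wgRegisterInternalExternals is enabled and the pages have been reparsed, links such as http://my-wiki.com/wiki/Baz should start appearing in prop=extlinks results alongside genuinely external ones. A client could then separate the two groups itself. The JavaScript sketch below is one hedged way to do that: the function names are hypothetical, the article base URL (http://my-wiki.com/wiki/) is an assumption taken from the example wiki in this thread, and the input is assumed to be a plain array of URL strings already extracted from the API response.

```javascript
// Sketch only: returns the page title if `link` points into this wiki,
// or null otherwise. `articleBase` is an assumed prefix such as
// "http://my-wiki.com/wiki/".
function titleFromInternalUrl(link, articleBase) {
  if (!link.startsWith(articleBase)) return null;
  // Drop any query string or fragment before decoding the title.
  const rest = link.slice(articleBase.length).split(/[?#]/)[0];
  if (rest === '') return null;
  // MediaWiki URLs use underscores where titles have spaces.
  return decodeURIComponent(rest).replace(/_/g, ' ');
}

// Splits extlinks results into links to this wiki's pages and all others.
function splitExternalLinks(links, articleBase) {
  const internal = [];
  const external = [];
  for (const link of links) {
    const title = titleFromInternalUrl(link, articleBase);
    if (title === null) {
      external.push(link);
    } else {
      internal.push({ url: link, title });
    }
  }
  return { internal, external };
}

// Example with the links from page Foo:
const split = splitExternalLinks(
  ['http://www.google.com', 'http://my-wiki.com/wiki/Baz'],
  'http://my-wiki.com/wiki/'
);
console.log(split.internal, split.external);
```

A plain prefix match is the simplest choice here; it does not handle alternate URL forms such as index.php?title=Baz or a different protocol, which would need extra cases.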
Hi Brad,
Thanks, that $wgRegisterInternalExternals looks helpful for our wiki! I will check to see if that fixes the issue. When you say reparse all pages, do you mean we should run refreshLinks.php?
Regarding the general case, I suppose I'll have a discussion with my colleagues on our project and decide if we want to raise the issue on wikitech-l.
Thanks!
Jason
---------- Forwarded message ----------
From: Brad Jorsch (Anomie) <bjorsch@wikimedia.org>
Date: Mon, Apr 14, 2014 at 10:58 AM
Subject: Re: [Mediawiki-api] API call for external links to internal pages?
To: MediaWiki API announcements & discussion <mediawiki-api@lists.wikimedia.org>
On Mon, Apr 14, 2014 at 11:10 AM, Jason Ji <uberjason@gmail.com> wrote:
> When you say reparse all pages, do you mean we should run refreshLinks.php?
I'm 99% sure that refreshLinks will do it, yes.