[Labs-l] Backlinks counter for Wikipedia articles?
John
phoenixoverride at gmail.com
Tue Sep 9 14:05:35 UTC 2014
If you want a report on that many pages, send me a list of the titles and
I can write a report for you, given the volume of pages involved.
I would say 1-2 seconds between queries is reasonable for a moderate
volume of queries. Any large-scale request I will run server-side, to avoid
hammering the web servers for something that is better batched.
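
For a sense of scale, here is a rough sketch of what the 1-2 second spacing means for the roughly 1.5 million events mentioned in the quoted message below. The `fetch` callable is a hypothetical stand-in for whatever request the client makes; nothing here is part of the tool itself.

```python
import time


def estimate_days(n_queries, delay_seconds):
    """Days spent waiting if every query costs one fixed delay."""
    return n_queries * delay_seconds / 86400


def throttled(titles, fetch, delay_seconds=1.5, sleep=time.sleep):
    """Call fetch(title) for each title, pausing between consecutive calls.

    `fetch` is supplied by the caller (hypothetical here). No pause is made
    before the first call or after the last one.
    """
    results = {}
    for i, title in enumerate(titles):
        if i:
            sleep(delay_seconds)
        results[title] = fetch(title)
    return results


if __name__ == "__main__":
    for delay in (1, 2):
        days = estimate_days(1_500_000, delay)
        print(f"{delay}s spacing: about {days:.1f} days")
```

At that spacing a full pass would take somewhere between two and five weeks of wall-clock time, which is why a server-side batch is the better fit for a monthly refresh.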
On Tue, Sep 9, 2014 at 9:58 AM, Navino Evans <navino at histropedia.com> wrote:
> Once again, a huge thank you for taking the time to do this John - That's
> exactly what I was looking for! - the helpfulness of this community never
> ceases to amaze me :)
>
> Hopefully I haven't initiated a journey down the rabbit hole into a fully
> fledged multi-language counting machine ;)
>
>
> Can I just ask what the limit of reasonable use would be for making API
> calls to this new tool? (e.g. number of calls per day)
>
> It would be incredibly useful if we could use it to update the events in
> our database once a month (we are using it to rank historical events by
> 'importance'), but we already have approximately 1.5 million events, so
> I am aware this may be way beyond what would be acceptable.
>
> On Tue, Sep 9, 2014 at 2:56 PM, John <phoenixoverride at gmail.com> wrote:
>
>> That's doable, however it will require a little more time as I need to
>> unearth some old code to handle multi-projects/languages
>>
>>
>> On Tue, Sep 9, 2014 at 9:51 AM, Jan Ainali <jan.ainali at wikimedia.se>
>> wrote:
>>
>>> Awesome John!
>>>
>>> Now I only wish that one could specify language code also ;)
>>>
>>>
>>> *Best regards, Jan Ainali*
>>>
>>> Executive Director, Wikimedia Sverige
>>> <http://se.wikimedia.org/wiki/Huvudsida>
>>> 0729 - 67 29 48
>>>
>>>
>>> *Imagine a world in which every human being has free access to
>>> the sum of all human knowledge. That is what we do.*
>>> Become a member. <http://blimedlem.wikimedia.se>
>>>
>>>
>>> 2014-09-09 15:34 GMT+02:00 John <phoenixoverride at gmail.com>:
>>>
>>>> Per request, it's no frills, but it's what you asked for:
>>>> http://tools.wmflabs.org/betacommand-dev/cgi-bin/backlinks
>>>>
>>>>
>>>> On Tue, Sep 9, 2014 at 8:32 AM, Navino Evans <navino at histropedia.com>
>>>> wrote:
>>>>
>>>>> That is fantastic news... I'm incredibly grateful for the help and
>>>>> advice.
>>>>>
>>>>> On Tue, Sep 9, 2014 at 1:27 PM, John <phoenixoverride at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Given the overhead of the API, and that he only needs a count, getting
>>>>>> that info should be fairly easy via a Python CGI wrapper around an SQL
>>>>>> query.
>>>>>>
>>>>>> The only thing that I cannot do is #3, since the software does not
>>>>>> differentiate between links in templates and links not in templates. It has
>>>>>> been a requested feature for years now.
>>>>>>
>>>>>> Give me a few hours and I'll get you the tool you want. This should be
>>>>>> less than 30 minutes' work.
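
To illustrate the kind of SQL query such a wrapper might run, here is a minimal sketch against an in-memory SQLite database. The three tables are simplified stand-ins modelled on MediaWiki's `page`, `pagelinks` and `redirect` tables as they looked at the time; the demo rows, the function names and the choice to count each linking article once are all assumptions, and this is not the tool's actual code.

```python
import sqlite3

# Count main-namespace articles that link to the title directly or through a
# redirect pointing at it. Redirect pages themselves are not counted.
COUNT_SQL = """
SELECT COUNT(DISTINCT pl.pl_from)
FROM pagelinks AS pl
JOIN page AS src ON src.page_id = pl.pl_from
WHERE src.page_namespace = 0
  AND src.page_is_redirect = 0
  AND pl.pl_namespace = 0
  AND (pl.pl_title = :t
       OR pl.pl_title IN (
           SELECT rp.page_title
           FROM redirect AS rd
           JOIN page AS rp ON rp.page_id = rd.rd_from
           WHERE rd.rd_namespace = 0
             AND rd.rd_title = :t
             AND rp.page_namespace = 0))
"""


def build_demo_db():
    """Create a tiny demo database (made-up rows, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_namespace INTEGER,
                           page_title TEXT, page_is_redirect INTEGER);
        CREATE TABLE pagelinks (pl_from INTEGER, pl_namespace INTEGER,
                                pl_title TEXT);
        CREATE TABLE redirect (rd_from INTEGER, rd_namespace INTEGER,
                               rd_title TEXT);
    """)
    conn.executemany("INSERT INTO page VALUES (?, ?, ?, ?)", [
        (1, 0, "Michael_Jackson", 0),
        (2, 0, "Michel_Jackson", 1),   # redirect to Michael_Jackson
        (3, 0, "Thriller", 0),
        (4, 0, "Pop_music", 0),
        (5, 2, "Someone", 0),          # a User page (namespace 2)
    ])
    conn.executemany("INSERT INTO pagelinks VALUES (?, ?, ?)", [
        (2, 0, "Michael_Jackson"),     # the redirect's own link
        (3, 0, "Michael_Jackson"),     # direct link from an article
        (4, 0, "Michel_Jackson"),      # link that goes through the redirect
        (5, 0, "Michael_Jackson"),     # link from outside the main namespace
    ])
    conn.execute("INSERT INTO redirect VALUES (2, 0, 'Michael_Jackson')")
    return conn


def count_backlinks(conn, title):
    """Return the backlink count for a title in database form (underscores)."""
    return conn.execute(COUNT_SQL, {"t": title}).fetchone()[0]


if __name__ == "__main__":
    print(count_backlinks(build_demo_db(), "Michael_Jackson"))
```

In the demo data only "Thriller" and "Pop_music" qualify: the User page is outside the main namespace and the redirect page is skipped. Note that nothing in these tables says whether a link came from a template, which matches the limitation on #3 described above.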
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 9, 2014 at 7:55 AM, Jan Ainali <jan.ainali at wikimedia.se>
>>>>>> wrote:
>>>>>>
>>>>>>> Related tip: In the API you can get a list of backlinks (but you
>>>>>>> have to count them yourself) from the main namespace including all
>>>>>>> redirects by a query like this:
>>>>>>>
>>>>>>>
>>>>>>> https://en.wikipedia.org/w/api.php?action=query&list=backlinks&format=json&bltitle=Example&blnamespace=0&blfilterredir=all&bllimit=250&blredirect=
>>>>>>>
>>>>>>> More info at: https://www.mediawiki.org/wiki/API:Backlinks
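
As a sketch of the "count them yourself" step, the function below pages through results for the query above and adds them up. It assumes the JSON shape described on the API:Backlinks page: a `query.backlinks` list, redirect entries marked with a `redirect` key and carrying their own `redirlinks` list, and a `continue` object whenever more results remain. The `fetch_json` callable, the function names and the User-Agent string are illustrative placeholders.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def http_fetch(params):
    """Fetch one page of API results (needs network access)."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Placeholder User-Agent; replace with something that identifies you.
    req = urllib.request.Request(url, headers={"User-Agent": "backlink-count-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def count_backlinks(fetch_json, title, limit=250):
    """Count links to `title`, following links that pass through redirects."""
    params = {
        "action": "query", "list": "backlinks", "format": "json",
        "bltitle": title, "blnamespace": "0", "blfilterredir": "all",
        "bllimit": str(limit), "blredirect": "",
    }
    total = 0
    cont = {}
    while True:
        data = fetch_json({**params, **cont})
        for entry in data.get("query", {}).get("backlinks", []):
            if "redirect" in entry:
                # Count the pages linking to the redirect, not the redirect.
                total += len(entry.get("redirlinks", []))
            else:
                total += 1
        cont = data.get("continue")
        if not cont:
            return total
```

Calling `count_backlinks(http_fetch, "Example")` would run it against the live API. Every continuation is a separate HTTP request, which is part of the overhead that makes this approach slow for heavily linked titles.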
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2014-09-09 13:41 GMT+02:00 Navino Evans <navino at histropedia.com>:
>>>>>>>
>>>>>>>> Wow! That would be awesome :)
>>>>>>>>
>>>>>>>> The API we are looking for can be as simple as sending a GET
>>>>>>>> request to a url (
>>>>>>>> http://www.somewhere.com/api/count?t=wikipedia_title_goes_here),
>>>>>>>> returning a number in "text/plain" format.
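
The interface described here (a GET request with the title in `t`, answered with a bare number as "text/plain") could look roughly like the WSGI sketch below. The `count_for_title` callable is a hypothetical hook for whatever does the real counting, and `make_app` is an illustrative name; only the `t` parameter and the plain-text response come from the request above.

```python
from urllib.parse import parse_qs


def make_app(count_for_title):
    """Build a WSGI app that answers ?t=<title> with a plain-text number."""

    def respond(start_response, status, body):
        headers = [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ]
        start_response(status, headers)
        return [body]

    def app(environ, start_response):
        query = parse_qs(environ.get("QUERY_STRING", ""))
        titles = query.get("t")
        if not titles:
            return respond(start_response, "400 Bad Request",
                           b"missing parameter: t\n")
        count = count_for_title(titles[0])
        return respond(start_response, "200 OK", str(count).encode("utf-8"))

    return app
```

For local experiments the app can be served with the standard library, for example `wsgiref.simple_server.make_server("", 8000, make_app(my_counter)).serve_forever()`, where `my_counter` is whatever counting function is available.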
>>>>>>>>
>>>>>>>> The actual count that we're interested in is for English Wikipedia
>>>>>>>> only, and would ideally include the following, all added up into a single
>>>>>>>> number:
>>>>>>>>
>>>>>>>> 1) All links from articles in Main Namespace only (for our purpose
>>>>>>>> it would be better to not include links from User pages, Talk pages etc if
>>>>>>>> possible)
>>>>>>>>
>>>>>>>> 2) Including links from Redirect pages (e.g. counting a link from
>>>>>>>> "Michel Jackson" redirect as part of the count from the article "Michael
>>>>>>>> Jackson")
>>>>>>>>
>>>>>>>> 3) Excluding links that are within a template transcluded in an
>>>>>>>> article (so we don't need to count the links inside Navboxes within an
>>>>>>>> article for example)
>>>>>>>>
>>>>>>>> 4) For our purpose, it doesn't really matter whether transclusions
>>>>>>>> of the actual page that is called are included in the count (we generally
>>>>>>>> won't be using it for checking templates, timeline and list articles).
>>>>>>>>
>>>>>>>> Just to give the full picture for this request - my use of this
>>>>>>>> tool will be for a company (www.histropedia.com), so I wouldn't
>>>>>>>> want to take up your time with this unless it's something you feel should
>>>>>>>> be available for wider use. My plan was to get the developer working on our
>>>>>>>> site to make this tool for the community if it didn't exist somewhere, but
>>>>>>>> we would be reliant on data dumps, so could not get live information (which
>>>>>>>> would be incredibly useful for us, and I hope many others).
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Sep 8, 2014 at 8:10 PM, John <phoenixoverride at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> What numbers/data do you want? I can whip up a replacement for it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Monday, September 8, 2014, Navino Evans <navino at histropedia.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> Does anyone know if there is a tool currently available
>>>>>>>>>> for counting backlinks to Wikipedia articles via an API? I have been using
>>>>>>>>>> this tool
>>>>>>>>>> http://dispenser.homenet.org/~dispenser/cgi-bin/backlinkscount.py
>>>>>>>>>> - but it seems to have finally gone offline completely following some
>>>>>>>>>> recent controversy with user:Dispenser.
>>>>>>>>>>
>>>>>>>>>> Any advice much appreciated!
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Navino
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Labs-l mailing list
>>>>>>>>> Labs-l at lists.wikimedia.org
>>>>>>>>> https://lists.wikimedia.org/mailman/listinfo/labs-l
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> ___________________________
>>>>>>>>
>>>>>>>> Histropedia
>>>>>>>> The Timeline for all of History
>>>>>>>> www.histropedia.com
>>>>>>>>
>>>>>>>> Follow us on:
>>>>>>>> Twitter <https://twitter.com/Histropedia>
>>>>>>>> Facebook <https://www.facebook.com/Histropedia>
>>>>>>>> Google+ <https://plus.google.com/u/0/b/104484373317792180682/104484373317792180682/posts>
>>>>>>>> LinkedIn <http://www.linkedin.com/company/histropedia-ltd>
>>>>>>>>
>>>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>
>
>
>
>
>