Hello,
I am writing a Java program to extract the abstract of a Wikipedia page
given its title. I have done some research and found out that the abstract
will be in rvsection=0.
So, for example, if I want the abstract of the 'Eiffel Tower' wiki page,
I query the API in the following way:
http://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Eiffel…
and parse the XML data we get back, taking the wikitext in the tag <rev
xml:space="preserve">, which represents the abstract of the Wikipedia page.
But this wikitext also contains the infobox data, which I do not need. I
would like to know if there is any way to remove the infobox data and get
only the wikitext related to the page's abstract, or if there is an
alternative method by which I can get the abstract of the page directly.
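A minimal sketch of the infobox-stripping step described above. The
brace-matching heuristic here is an assumption, not a full wikitext parser
(it only removes top-level {{...}} template blocks at the start of the
section text):

```python
def strip_leading_templates(wikitext):
    """Remove top-level {{...}} template blocks (e.g. the infobox) from
    the start of a wikitext section, leaving the prose introduction.
    NOTE: a simple brace-matching heuristic, not a full wikitext parser.
    """
    text = wikitext.lstrip()
    while text.startswith("{{"):
        depth = 0
        i = 0
        while i < len(text) - 1:
            pair = text[i:i + 2]
            if pair == "{{":
                depth += 1
                i += 2
            elif pair == "}}":
                depth -= 1
                i += 2
                if depth == 0:
                    break
            else:
                i += 1
        text = text[i:].lstrip()
    return text

sample = ("{{Infobox building|name=Eiffel Tower}}\n"
          "The '''Eiffel Tower''' is a wrought-iron lattice tower.")
print(strip_leading_templates(sample))
```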
Looking forward to your help.
Thanks in advance,
Aditya Uppu
When list=allusers is used with auactiveusers, a property 'recenteditcount'
is returned in the result. In bug 67301[1] it was pointed out that this
property is including various other logged actions, and so should really be
named something like "recentactions".
Gerrit change 130093,[2] merged today, adds the "recentactions" result
property. "recenteditcount" is also returned for backwards compatibility,
but will be removed at some point during the MediaWiki 1.25 development
cycle.
Any clients using this property should be updated to use the new property
name. The new property will be available on WMF wikis with 1.24wmf12, see
https://www.mediawiki.org/wiki/MediaWiki_1.24/Roadmap for the schedule.
[1]: https://bugzilla.wikimedia.org/show_bug.cgi?id=67301
[2]: https://gerrit.wikimedia.org/r/#/c/130093/
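A client-side sketch of the suggested migration: prefer the new
"recentactions" key and fall back to the deprecated "recenteditcount" on
older MediaWiki versions (the sample dict stands in for one item of a real
list=allusers&auactiveusers response):

```python
def recent_actions(user):
    """Prefer the new 'recentactions' key, falling back to the
    deprecated 'recenteditcount' on pre-1.25 MediaWiki versions."""
    return user.get("recentactions", user.get("recenteditcount", 0))

# Simulated item from list=allusers&auactiveusers output:
user = {"name": "Example", "recenteditcount": 7, "recentactions": 7}
print(recent_actions(user))  # 7
```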
--
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation
_______________________________________________
Mediawiki-api-announce mailing list
Mediawiki-api-announce(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce
There's an inconsistency in the handling of multi-value parameters where if
you pass a value consisting of only whitespace[1] it will be interpreted as
an empty set rather than as a single value consisting of whitespace. For
example, https://www.mediawiki.org/w/api.php?action=query&titles= correctly
handles the query as specifying no titles, but
https://www.mediawiki.org/w/api.php?action=query&titles=%20 is also treated
as specifying no titles rather than specifying a title consisting of a
space character.
With Gerrit change 405609,[2] the behavior will be changed so that the
latter query will behave more like
https://www.mediawiki.org/w/api.php?action=query&titles=_ in reporting that
the supplied title is invalid. The plan is that this will be deployed to
WMF wikis with 1.31.0-wmf.20 on February 6–8, see
https://www.mediawiki.org/wiki/MediaWiki_1.31/Roadmap for the schedule.
Passing an empty string as the value of a multi-valued parameter will
continue to be treated as a set of zero elements rather than a one-element
set containing the empty string.
Most clients should handle this change without major issue, as the
resulting response is consistent with the response when any other invalid
title is submitted. Clients that want to continue treating whitespace-only
input as no input should begin checking for whitespace-only input before
submitting it to the API.
[1]: Specifically: spaces (U+0020), tabs (U+0009), line feeds (U+000A), and
carriage returns (U+000D).
[2]: https://gerrit.wikimedia.org/r/#/c/405609/
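Clients that want to keep the old behavior can pre-check for
whitespace-only input before submitting, as suggested above. A minimal
sketch using exactly the characters listed in footnote [1]:

```python
API_WHITESPACE = " \t\n\r"  # U+0020, U+0009, U+000A, U+000D

def is_effectively_empty(value):
    """True if a multi-value parameter consists only of the whitespace
    characters the API previously collapsed to an empty set."""
    return value.strip(API_WHITESPACE) == ""

print(is_effectively_empty(" "))    # True
print(is_effectively_empty("Foo"))  # False
```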
--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation
The tags query module has a tgprop parameter to specify which properties of
the tag should be returned. One of these properties was 'name', but the
name was always included in the response regardless of whether this
property was included in tgprop.
Since most uses aren't specifying 'name', and we can assume clients are
depending on the name being included (since there's otherwise no way to
identify which tag is which), we've removed the nonfunctional 'name' as a
valid option for tgprop. Those few clients that do specify 'name' there
will now receive a warning that it is not a valid value for tgprop.
This change will not affect the functionality of any clients unless the new
warning somehow breaks them. It should be deployed to WMF wikis with
1.31.0-wmf.18 or later. Clients can safely stop specifying 'name' in tgprop
immediately, though, since it doesn't do anything.
For further information, see https://phabricator.wikimedia.org/T185058.
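The cleanup on the client side is one line: drop 'name' from the
pipe-separated tgprop value before sending it (the other values in the
example are ordinary tgprop options):

```python
def clean_tgprop(tgprop):
    """Drop the removed 'name' value from a pipe-separated tgprop
    parameter; the tag name is always returned anyway."""
    return "|".join(p for p in tgprop.split("|") if p != "name")

print(clean_tgprop("name|displayname|hitcount"))  # displayname|hitcount
```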
--
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation
Hello everyone,
I am trying to use the MediaWiki API to create a dictionary based on
categories or lists on Wikipedia. I would like to be able to select a
category, or perhaps a list page, and get all members of that list.
I've done some reading of the API and implemented a prototype. It works,
but only when the data is structured just right for my purposes. For
example, I can easily get a list of all of the
English-language films. I'm using the action=query and list=categorymembers
for this. I end up with 500 films at a time, and I can continue as needed
to get all 60k or so. This is because there is a category that is tagged to
each English-language film's individual page.
On the other hand, if I want to get a list of all National Hockey League
(NHL) players, this is a lot more difficult. The category "Category:Lists
of National Hockey League players" exists, but it's a category of lists of
players. Much of the categorization of Wikipedia turns out to be in lists,
not categories. I could write a web scraper for this, but that would
probably be very unreliable.
Is there a standardized way to deal with lists and sublists that I might
have missed? I don't mind writing a bunch of code to recursively crawl
sublists and expand them, but I would like to avoid something as
non-standard as web scraping the content, because it would be very fragile.
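There's no single standard API for expanding list *pages* (those would need
something like prop=links on each list page), but subcategory trees can be
walked with list=categorymembers. A minimal sketch of that recursion, with
the HTTP fetch injected so the canned data below stands in for real API
responses (continuation handling omitted for brevity):

```python
def crawl_category(fetch, category, seen=None):
    """Recursively collect page titles under a category, descending into
    subcategories. `fetch(category)` must return a list of member dicts
    shaped like list=categorymembers output (each with 'title' and 'ns';
    ns 14 marks a subcategory)."""
    if seen is None:
        seen = set()          # guards against category cycles
    if category in seen:
        return []
    seen.add(category)
    pages = []
    for member in fetch(category):
        if member["ns"] == 14:  # namespace 14 = Category
            pages.extend(crawl_category(fetch, member["title"], seen))
        else:
            pages.append(member["title"])
    return pages

# A real fetch would call, per category:
#   https://en.wikipedia.org/w/api.php?action=query&list=categorymembers
#     &cmtitle=<category>&cmlimit=500&format=json
fake = {
    "Category:A": [{"title": "Page 1", "ns": 0},
                   {"title": "Category:B", "ns": 14}],
    "Category:B": [{"title": "Page 2", "ns": 0}],
}
print(crawl_category(fake.get, "Category:A"))  # ['Page 1', 'Page 2']
```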
Thank you for the help,
-mike
Hello!
I hope this is the correct place to ask this question.
My name is Tal, and I'm developing my own version of the Wiki Game
<https://en.wikipedia.org/wiki/Wikipedia:Wiki_Game> (or Wiki Race, as some
may say) for Android devices. There is a great multiplayer version
<https://thewikigame.com/> of it for the iPhone and the web, but I think
there should be a worthy version for Android.
My version is still in progress, and I put a lot of effort into it,
including the gameplay experience (single-player and multiplayer) and the
design.
_________________________
I'm writing this email because I'd like to consult with some wiki experts
about one of the main things on my mind: my usage of the MediaWiki API. I'd
like to launch the game and upload it to the Play Store in a few months,
but I can't keep working on it while knowing that I might put my efforts in
danger by doing so, because of unfair usage of the Wiki API.
Well, the game is based on generating two random articles: the first
article (the starting point) and the target article (the end point), which
the user should reach using only the hyperlinks inside the articles.
The thing is that, unlike the currently existing Wiki Game
<https://thewikigame.com/> version, which is played by many users together
at the same time (by joining "game rooms") or with pre-generated articles,
I'm generating the articles live for each user individually. Each user can
play in single-player mode and generate his own pair of articles, his own
game, which is going to keep the game infinite for all of the players.
My generator generates two articles before each game, and the user has 15
seconds to decide whether he would like to start the game or not. After the
15 seconds are up, the generator generates a new pair of articles. While
the time is running, the user may refresh the articles manually and ask to
generate a new pair.
So why am I worried about my app's usage of the API?
Well, not all of the target articles are reachable through the hyperlinks
in the mobile version of Wikipedia, because the templates view in the
mobile version is very limited and there are articles without references
from other articles, and I don't want the users to get stuck.
To overcome this, and make sure that the target article is actually
reachable, I need to make about 5 requests to the REST API for each
article generation (one request after another, not together). The
algorithm was made after a very long time of research by me, and I think
it's efficient for the purpose.
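The reachability check itself isn't spelled out above, so purely as a
hypothetical illustration: one single-request version of such a check would
ask prop=linkshere whether any article links to the target at all. The
fetch function and the canned data here are stand-ins for live API calls:

```python
def is_reachable(fetch_backlinks, title):
    """Rough reachability test: a target is playable only if at least
    one other article links to it. `fetch_backlinks(title)` should
    return the 'linkshere' list for the page, as produced by
    action=query&prop=linkshere&lhnamespace=0&lhlimit=1&titles=<title>.
    """
    return len(fetch_backlinks(title)) > 0

# Hypothetical canned responses standing in for live API calls:
backlinks = {"Eiffel Tower": [{"title": "Paris"}], "Orphan page": []}
print(is_reachable(backlinks.get, "Eiffel Tower"))  # True
print(is_reachable(backlinks.get, "Orphan page"))   # False
```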
So, I assume that the average player is going to play about an hour a day,
so he is going to refresh his pair about 15 times, which is 75 requests to
the API in a given day for one user. Let's say (hopefully and
hypothetically) the game goes pretty viral - well, there are going to be a
lot of requests to the API at the same time, and this makes me unsure about
my way of action.
I don't mind sending my random-article-generator code (as long as it is not
copied :P ) for a review and working together with somebody to make the
game better and fair, and I'm also willing to donate a part of the game's
income (if there is any) to the Wikimedia Foundation. But there is one
thing I'm not going to do, and that's launching the game before making sure
I make fair use of the API. I know Wikipedia handles millions and billions
of requests every day, and still - I can't move on without knowing
everything is fine.
This information <https://www.mediawiki.org/wiki/API:Etiquette> about the
usage limits is not very explicit; it doesn't draw the line between good
and bad use, and maybe I'm using it badly. I will accept every answer
respectfully.
Thanks in advance,
Tal