On Wed, Jun 10, 2015 at 9:48 AM, Luca Bartek <luca.x.bartek@gsk.com> wrote:
I work for GlaxoSmithKline, a pharmaceutical company, and am currently preparing for a project we are planning to do using Wikipedia.

Specifically the English-language Wikipedia, I presume? Or are you going to check other languages as well?
 

We will use our computational tools to assess the accuracy of the drug-related information that can be found on Wikipedia. We will use the open database Open PHACTS as a “gold standard” and compare the information on Wikipedia to this.


This reminds me of a report on a similar study I heard about recently.
 

On the drug pages of Wikipedia there is normally a data table on the right side with the drug/chemical/pharma related information. Our plan is, if this is possible to carry out, to assess the accuracy of this information and, if necessary, correct/update it from our database. If time constraints allow, I would also like to automatically write some very basic articles on drugs which currently do not have an entry on Wikipedia.


Personally, as a volunteer editor on the English Wikipedia, I like the idea of verifying and correcting our information with reference to reliable sources!

Do be aware of the editing policies if you perform any edits, though. For the English Wikipedia, I recommend:
  • Talk to the people at WikiProject Pharmacology! That would probably be the best place to connect with the people who are already editing the articles you're interested in, and they can help you avoid various pitfalls.
  • Don't try to make a "company" account; do the edits as an individual person. See this part of the username policy in particular for details.
  • Review the policy on financial conflicts of interest and follow it as closely as possible. Expect close scrutiny, as Wikipedia editors are very wary of corporations trying to use Wikipedia as a vehicle for advertisement or other PR purposes.
  • If you're planning on having a program make these edits in an automated manner (i.e. without human review of each one to ensure it's correct and properly formatted), review the bot policy.
  • Automatic article creation in particular can be problematic, as it can strain the capacity of our editors. Again, the people at WikiProject Pharmacology should be able to help. Work with them to ensure the to-be-created articles will be properly formatted and useful, and to have a plan for reviewing them.

For other-language Wikipedias you'd want to look for similar things, but I don't know them well enough to give you links or to advise you on how their policies might differ.

My questions are the following: How do I obtain an API key? On the api home page I saw that I may need a special key if I would like to do so many queries.


There aren't any special API keys for the usual API accessed via api.php. Could you reply with a link to the page that recommended one?

The closest thing I can think of is that certain limits such as the number of pages that can be retrieved in a single request can be raised by using a logged-in account that has been granted the "bot flag". But this doesn't give any additional access, it just allows for fetching information using fewer requests.
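
To make the "fewer requests" point concrete, here is a rough sketch of batching page titles for action=query. It assumes the usual limits on the titles parameter (50 values per request for ordinary clients, 500 for accounts with higher limits such as the bot flag); batch_titles is a made-up helper name, not part of any library.

```python
from typing import Iterable, Iterator, List

# Assumed per-request limits for the `titles` parameter of action=query:
# 50 for ordinary clients, 500 for accounts with higher limits (bot flag).
NORMAL_LIMIT = 50
BOT_LIMIT = 500


def batch_titles(titles: Iterable[str], has_bot_flag: bool = False) -> Iterator[str]:
    """Yield pipe-separated title strings, one per API request."""
    size = BOT_LIMIT if has_bot_flag else NORMAL_LIMIT
    batch: List[str] = []
    for title in titles:
        batch.append(title)
        if len(batch) == size:
            yield "|".join(batch)
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield "|".join(batch)
```

With 120 titles this yields three requests for an ordinary account and one for a bot-flagged account; the data returned is the same either way.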
 

What are the limitations?


In general, follow the advice in https://www.mediawiki.org/wiki/API:Etiquette and you should be good.
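
As I understand that page, the main points are: identify your client with a descriptive User-Agent, send the maxlag parameter, and make requests one after another rather than in parallel. A minimal sketch, where the User-Agent string is a placeholder you would replace and maxlag=5 is an assumed, commonly used value:

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Placeholder identification string: use your own tool name and contact details.
USER_AGENT = "DrugboxChecker/0.1 (contact: you@example.org)"


def build_request(params: dict) -> urllib.request.Request:
    """Build a GET request carrying a User-Agent and the maxlag parameter."""
    query = dict(params, format="json", maxlag="5")
    url = API_URL + "?" + urllib.parse.urlencode(query)
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_json(params: dict) -> dict:
    """Send one request and decode the JSON reply.

    Call this in a plain loop, one request at a time, not from parallel workers.
    """
    with urllib.request.urlopen(build_request(params)) as response:
        return json.load(response)
```

If the servers are lagged, the reply contains an error with code "maxlag"; the polite reaction is to wait a few seconds and retry.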
 

Is it possible to carry out the process I have described or should I find a different approach?


Besides fetching the page content via the API, your other option is to download a database dump to process articles offline.
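
If you go the dump route, the pages-articles dump is one large bzip2-compressed XML file, so you will want to stream it rather than load it into memory. A rough sketch using only the Python standard library; iter_pages is a made-up helper, and it ignores the XML namespace so that it does not depend on a particular export schema version:

```python
import bz2
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Tuple


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream: BinaryIO) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) pairs from a MediaWiki XML export stream."""
    title, text = None, ""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            elem.clear()  # release the finished page's subtree


# Usage against a downloaded dump (the file name is an example):
# with bz2.open("enwiki-latest-pages-articles.xml.bz2", "rb") as dump:
#     for title, wikitext in iter_pages(dump):
#         ...
```

You would then filter for the pages that contain a drug infobox and process only those.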

Do note that the API accessed via api.php returns the content of the page itself as raw wikitext; it does not extract the infobox fields for you. You'll likely have to write your own code to extract the data you need from the wikitext, or adapt client code from elsewhere (such as this Python library) to do so.
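
To give an idea of what "your own code" might look like, here is a rough sketch that pulls the parameters out of an infobox template in wikitext. The wikitext itself can be fetched with action=query, prop=revisions, rvprop=content. The template name "Drugbox" is an assumption (check what the articles you care about actually use), the two helper functions are made up for illustration, and this simple brace counter does not handle every wikitext corner case (comments, nowiki sections, triple braces):

```python
import re
from typing import Dict, Optional


def extract_template(wikitext: str, name: str) -> Optional[str]:
    """Return the text between '{{' and the matching '}}' of the first `name` template."""
    match = re.search(r"\{\{\s*" + re.escape(name) + r"\b", wikitext, re.IGNORECASE)
    if not match:
        return None
    depth, i = 0, match.start()
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return wikitext[match.start() + 2:i - 2]
        else:
            i += 1
    return None  # unbalanced braces


def parse_params(body: str) -> Dict[str, str]:
    """Split a template body on top-level '|' and return its named parameters."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    params = {}
    for part in parts[1:]:  # parts[0] is the template name
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return params
```

Calling parse_params(extract_template(wikitext, "Drugbox")) gives you a dictionary of field names to raw values, which you could then compare against your reference database.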


--
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation