Dear Wiki API team,
I work for GlaxoSmithKline, a pharmaceutical company, and am currently preparing a project that will make use of Wikipedia. We will use our computational tools to assess the accuracy of the drug-related information found on Wikipedia, taking the open database Open PHACTS as a "gold standard" for comparison. Drug pages on Wikipedia normally have a data table on the right-hand side with drug, chemical and pharmacological information. Our plan, if it is feasible, is to assess the accuracy of this information and, where necessary, correct or update it from our database. If time constraints allow, I would also like to automatically write some very basic articles on drugs that currently have no entry on Wikipedia.
My questions are the following: How do I obtain an API key? On the API home page I saw that I may need a special key if I want to run a large number of queries. What are the limitations? Is it possible to carry out the process I have described, or should I find a different approach? Previously I thought about using SPARQL to query DBpedia, but I found that during the conversion process many of the strings that matter to us chemists (SMILES representations of chemical compounds) get altered because of their special characters.
Thank you very much for your response.
Best wishes,
Luca Bartek
Computational Scientist, Complementary Worker on Assignment at GSK
GSK Medicines Research Centre, Gunnels Wood Road, Stevenage, Hertfordshire, SG1 2NY, UK
Email: luca.x.bartek@gsk.com
Tel: +44 1438 762 778
On Wed, Jun 10, 2015 at 9:48 AM, Luca Bartek <luca.x.bartek@gsk.com> wrote:
> I work for GlaxoSmithKline, a pharmaceutical company, and am currently preparing for a project we are planning to do with the use of Wikipedia.
Specifically the English-language Wikipedia, I presume? Or are you going to check other languages as well?
> We will use our computational tools to assess the accuracy of the drug-related information that can be found on Wikipedia. We will use the open database Open PHACTS as a “gold standard” and compare the information on Wikipedia to this.
This reminds me of a report on a similar study I heard about recently (https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-05-27/Recent_research#German_study_finds_Wikipedia.27s_pharma_articles_accurate_and_largely_complete).
> On the drug pages of Wikipedia there is normally a data table on the right side with the drug/chemical/pharma related information. Our plan is, if this is possible to carry out, to assess the accuracy of this information and if necessary, correct/update it from our database. If the time constraints allow us, I would like to also automatically write some very basic articles on drugs which currently do not have an entry on Wikipedia.
Personally, as a volunteer editor on the English Wikipedia, I like the idea of verifying and correcting our information with reference to reliable sources!
Do be aware of the editing policies if you perform any edits, though. For the English Wikipedia, I recommend:
- Talk to the people at WikiProject Pharmacology (https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Pharmacology)! That would probably be the best place to connect with the people who are already editing the articles you're interested in, and they can help you avoid various pitfalls.
- Don't try to make a "company" account; do the edits as an individual person. See this part of the username policy in particular for details: https://en.wikipedia.org/wiki/Wikipedia:Username_policy#Usernames_implying_shared_use
- Review the policy on financial conflicts of interest (https://en.wikipedia.org/wiki/Wikipedia:Conflict_of_interest#Financial_conflict_of_interest) and follow it as closely as possible. Expect close scrutiny, as Wikipedia editors are very wary of corporations trying to use Wikipedia as a vehicle for advertisement or other PR purposes.
- If you're planning on having a program make these edits in an automated manner (i.e. without human review of each one to ensure it's correct and properly formatted), review the bot policy (https://en.wikipedia.org/wiki/Wikipedia:Bot_policy).
- Automatic article creation in particular can be problematic, as it can strain the capacity of our editors. Again, the people at WikiProject Pharmacology (https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Pharmacology) should be able to help. Work with them to ensure the to-be-created articles will be properly formatted and useful, and to have a plan for reviewing them.
For other-language Wikipedias you'd want to look for similar things, but I don't know them well enough to give you links or to advise you on how their policies might differ.
> My questions are the following: How do I obtain an API key? On the api home page I saw that I may need a special key if I would like to do so many queries.
There aren't any special API keys for the usual API accessed via api.php. Could you reply with a link to the page that recommended one?
The closest thing I can think of is that certain limits, such as the number of pages that can be retrieved in a single request, can be raised by using a logged-in account that has been granted the "bot flag". But this doesn't give any additional access; it just allows for fetching the same information using fewer requests.
> What are the limitations?
In general, follow the advice in https://www.mediawiki.org/wiki/API:Etiquette and you should be good.
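As a rough illustration of what following that advice could look like, here is a minimal Python sketch that uses only the standard library. It batches page titles (50 per request, the usual ceiling for accounts without the bot flag), sends a descriptive User-Agent, sets the `maxlag` parameter, and makes requests one after another rather than in parallel. The User-Agent string, the helper names and the choice of the English Wikipedia endpoint are assumptions made for the example, and the response handling assumes the `formatversion=2` JSON layout.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical identification string; replace it with real contact details.
USER_AGENT = "DrugboxCheck/0.1 (contact: you@example.org)"
BATCH_SIZE = 50  # titles per request for accounts without the bot flag


def batches(titles, size=BATCH_SIZE):
    """Yield successive lists of at most `size` titles."""
    for start in range(0, len(titles), size):
        yield titles[start:start + size]


def build_request(titles):
    """Build (but do not send) a request for the current wikitext of `titles`."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": "|".join(titles),
        "format": "json",
        "formatversion": "2",
        "maxlag": "5",  # let the server refuse the request when it is lagging
    }
    url = API_URL + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_wikitext(titles):
    """Fetch pages batch by batch, serially, and return {title: wikitext}."""
    pages = {}
    for batch in batches(titles):
        with urllib.request.urlopen(build_request(batch)) as response:
            data = json.load(response)
        # Missing pages and error responses (e.g. maxlag) have no revisions;
        # a real client would wait and retry instead of skipping them.
        for page in data.get("query", {}).get("pages", []):
            if "revisions" in page:
                revision = page["revisions"][0]
                pages[page["title"]] = revision["slots"]["main"]["content"]
    return pages
```

Calling `fetch_wikitext(["Aspirin", "Ibuprofen"])` would issue a single request; a list of a few thousand drug names would be split into batches of 50 and fetched sequentially.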
> Is it possible to carry out the process I have described or should I find a different approach?
Besides fetching the page content via the API, your other option is to download a database dump (https://dumps.wikimedia.org/) to process articles offline.
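For the offline route, a dump can be streamed rather than loaded into memory. The following Python sketch assumes a bzip2-compressed "pages-articles" XML export in which each `<page>` element has a `<title>` and a revision `<text>`; the function names are made up for the example, and XML namespaces are stripped so the code does not depend on a particular export schema version.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(xml_file):
    """Yield (title, wikitext) pairs from an export-format XML stream."""
    title, text = None, None
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # drop the finished page's children to save memory


def iter_dump(path):
    """Stream pages directly from a .xml.bz2 dump file at `path`."""
    with bz2.open(path, "rb") as dump:
        yield from iter_pages(dump)
```

Iterating over `iter_dump(...)` with the path of a downloaded dump yields one page at a time, so the drug articles of interest can be filtered by title or by the presence of an infobox template without holding the whole dump in memory.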
Do note that the API accessed via api.php is concerned with fetching the content of the page itself; you'll likely have to write your own code to extract the data you need from the wikitext, or adapt client code from elsewhere (such as the Python library mwparserfromhell, https://github.com/earwig/mwparserfromhell, for example) to do so.
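To give a feel for what such extraction code involves, here is a small standard-library sketch that pulls the named parameters out of the first call to a given template. It is a simplification, not a substitute for a real wikitext parser: it only tracks nested `{{...}}` and `[[...]]` pairs, and it ignores comments, `<nowiki>` sections and positional parameters. The function name is made up for the example; the template name `Drugbox` and the parameter names in the sample text are assumptions and should be checked against the actual articles.

```python
def extract_template_params(wikitext, template_name):
    """Return {name: value} for the first `{{template_name ...}}` call."""
    start = wikitext.lower().find("{{" + template_name.lower())
    if start == -1:
        return {}
    depth = 0  # nesting depth of {{ }} and [[ ]] inside the template
    parts, current = [], []
    i = start + 2  # skip the template's own opening braces
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair == "}}" and depth == 0:
            break  # closing braces of the template itself
        elif pair in ("}}", "]]") and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif wikitext[i] == "|" and depth == 0:
            parts.append("".join(current))  # top-level parameter separator
            current = []
            i += 1
        else:
            current.append(wikitext[i])
            i += 1
    parts.append("".join(current))
    params = {}
    for part in parts[1:]:  # parts[0] is the template name
        if "=" in part:
            # Split on the first "=" only, so values such as SMILES strings
            # that contain "=" themselves are kept intact.
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return params


# Made-up sample text for illustration only.
SAMPLE = """{{Drugbox
| CAS_number = 50-78-2
| smiles = CC(=O)Oc1ccccc1C(=O)O
| legal_status = [[Over-the-counter drug|OTC]]
}}
'''Aspirin''' is a medication."""

params = extract_template_params(SAMPLE, "Drugbox")
print(params["smiles"])
```

Because the values are taken straight from the wikitext, characters such as `=`, `#`, `(` and `[` in a SMILES string are not re-encoded along the way.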
mediawiki-api@lists.wikimedia.org