On Jun 14, 2014 4:54 AM, "Maximilian Klein" isalix@gmail.com wrote:
Hello All,
I'm working on the Open-Access Signalling Project[1], which aims to signal and badge when a reference in Wikipedia is an Open Access source. I'm writing the bot to do this at the moment, and I've run into a question: how do I keep track of the values of the template {{Cite doi | doi=value}} in as close to real time as possible?
The most efficient approach I can come up with is to query the SQL servers on Labs in a constant loop, returning the results of "What transcludes {{Cite doi}}" and checking whether each page's last_edited timestamp is newer than the one previously seen. If it is newer, get the content of the page and see whether the {{Cite doi}} value has changed, checking against a local database.
This still seems horribly inefficient. Is there a hook to know when a template on a page has been edited, rather than having to check every time the page has been edited?
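For illustration, the "has the {{Cite doi}} value changed" check in the polling approach above might look roughly like the sketch below. It is only a sketch: the function names, the regular expression, and the in-memory dict standing in for the local database are all assumptions, and fetching the wikitext from Labs or the API is left out.

```python
import re

# Assumed pattern: matches {{Cite doi|10.1000/xyz}} and {{Cite doi | doi=10.1000/xyz}},
# tolerating case differences and an underscore in the template name.
CITE_DOI = re.compile(
    r"\{\{\s*cite[ _]doi\s*\|\s*(?:doi\s*=\s*)?([^|}\s]+)\s*(?:\|[^}]*)?\}\}",
    re.IGNORECASE,
)


def extract_dois(wikitext):
    """Return the set of DOI values used in {{Cite doi}} templates."""
    return {m.group(1) for m in CITE_DOI.finditer(wikitext)}


def diff_dois(page_title, wikitext, store):
    """Compare a page's current DOIs with the stored ones.

    `store` is a hypothetical stand-in for the local database: a dict
    mapping page title -> set of DOIs seen on the previous pass.
    Returns (added, removed) and updates the store.
    """
    current = extract_dois(wikitext)
    previous = store.get(page_title, set())
    store[page_title] = current
    return current - previous, previous - current
```

Calling diff_dois() only for pages whose timestamp has moved keeps the page fetches down, but it does not remove the need to poll.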
The API can provide the list of URLs on the page, which may be enough for you.
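As a rough sketch of that, the external links can be requested with action=query and prop=extlinks. The helper names below are made up for illustration, the response handling assumes the default JSON format (where each link sits under the "*" key), and continuation of long result sets is not handled.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def extlinks_url(title):
    """Build the API URL that lists the external links on one page."""
    params = {
        "action": "query",
        "prop": "extlinks",
        "titles": title,
        "ellimit": "max",
        "format": "json",
    }
    return API + "?" + urlencode(params)


def urls_from_response(data):
    """Pull the link URLs out of a decoded JSON response.

    Assumes the default JSON layout: pages keyed by page id, each
    external link stored under the "*" key.
    """
    urls = []
    for page in data.get("query", {}).get("pages", {}).values():
        for link in page.get("extlinks", []):
            urls.append(link["*"])
    return urls
```

Filtering the returned URLs for DOI resolver links would then give the DOIs cited on the page without parsing any wikitext.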
It sounds like you want an API hook which returns (only) the reference metadata for a page. That would be lovely. I would love to see an option to provide those results in Zotero's JSON format.
COinS metadata can be downloaded in structured form.
1. get a list of sections
https://en.wikipedia.org/w/api.php?action=parse&prop=sections&page=C...
2. fetch the formatted HTML for the relevant section(s)
https://en.wikipedia.org/w/api.php?action=mobileview&prop=text&page=...
3. extract out the COinS metadata
Look in the HTML from step 2 above; it should contain 'reference-*'
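Step 3 could be sketched along these lines. This assumes the HTML from step 2 carries COinS in the usual way, as span elements with class "Z3988" whose title attribute holds the URL-encoded metadata; the class and function names here are only illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs


class CoinsParser(HTMLParser):
    """Collect COinS records from <span class="Z3988" title="..."> elements."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "Z3988" in classes and attrs.get("title"):
            # The title is a URL-encoded key/value string; parse_qs
            # returns a dict mapping each key to a list of values.
            self.records.append(parse_qs(attrs["title"]))


def extract_coins(html):
    """Return one dict per COinS span found in the HTML fragment."""
    parser = CoinsParser()
    parser.feed(html)
    return parser.records
```

Each record is a dict of lists, so a DOI, where present, would typically show up under the rft_id key as an info:doi/... value.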
-- John Vandenberg