Hello All,
I'm working on the Open-Access Signalling Project[1], which aims to signal and badge when a reference in Wikipedia is an Open Access source. I'm currently writing the bot to do this, and I've run into a question: how do I keep track of the values of the template {{Cite doi | doi=value}} in as close to real time as possible?
The most efficient approach I can come up with is to query the SQL servers on Labs in a constant loop, fetching the results of "What transcludes {{Cite doi}}" and checking whether each page's last_edited timestamp is newer than the one seen previously. If it is, fetch the page content and check whether the {{Cite doi}} value has changed, comparing against a local database.
This still seems horribly inefficient. Is there a hook to know when a template on a page has been edited, rather than having to check every time the page is edited?
Thanks in advance,
Max Klein ‽ http://notconfusing.com/
[1] https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Open_Access/Signalling_O...
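A minimal sketch of the comparison step in the polling approach above. It assumes the replica query itself (not shown) yields hypothetical `(page_title, timestamp)` rows with MediaWiki-style `YYYYMMDDHHMMSS` timestamps, and that the "local database" is simply a dict; `pages_to_recheck` and `diff_dois` are illustrative names, not part of any existing tool.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical row shape from a "what transcludes {{Cite doi}}" query:
# (page_title, last_edit_timestamp). MediaWiki timestamps are fixed-width
# YYYYMMDDHHMMSS strings, so plain string comparison orders them correctly.
Row = Tuple[str, str]


def pages_to_recheck(rows: Iterable[Row], last_seen: Dict[str, str]) -> List[str]:
    """Return titles edited since we last looked, updating last_seen in place."""
    changed = []
    for title, ts in rows:
        if ts > last_seen.get(title, ""):  # unseen pages compare newer than ""
            changed.append(title)
            last_seen[title] = ts
    return changed


def diff_dois(old: Set[str], new: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Compare a page's stored DOIs with the DOIs found now: (added, removed)."""
    return new - old, old - new
```

Only the pages returned by `pages_to_recheck` would need their content fetched and re-parsed, which is where the cost the message worries about comes from.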
If I understand correctly,
{{cite doi|10.1103/RevModPhys.69.865}}
essentially includes https://en.wikipedia.org/wiki/Template:Cite_doi/10.1103.2FRevModPhys.69.865; if that page does not exist, a bot will fill it (e.g. https://en.wikipedia.org/w/index.php?title=Template:Cite_doi/10.1103.2FRevMo...).
As such, you could just track pages with the prefix "Template:Cite_doi/". You'd only have to check for new pages there, and that should be easy to do with a database query.
Merlijn
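A sketch of the subpage idea. The encoding is an assumption inferred from the single example in this thread ('/' becomes '.2F'): percent-encode the DOI and replace '%' with '.'. The template's real rules may differ for other characters, and `doi_to_subpage` / `new_subpages` are hypothetical helpers; the creation timestamps would come from a replica query that is not shown.

```python
from typing import Iterable, List, Tuple
from urllib.parse import quote

PREFIX = "Template:Cite_doi/"


def doi_to_subpage(doi: str) -> str:
    """Map a DOI to its assumed Template:Cite_doi/ subpage title.

    Assumption: percent-encode everything outside the unreserved set, then
    swap '%' for '.', which reproduces the example URL in this thread.
    """
    return PREFIX + quote(doi, safe="").replace("%", ".")


def new_subpages(pages: Iterable[Tuple[str, str]], since: str) -> List[str]:
    """Keep (title, creation_timestamp) pairs that are Cite_doi subpages
    created after `since` (YYYYMMDDHHMMSS strings)."""
    return [t for t, created in pages if t.startswith(PREFIX) and created > since]
```

For example, `doi_to_subpage("10.1103/RevModPhys.69.865")` yields the title in the URL above, `Template:Cite_doi/10.1103.2FRevModPhys.69.865`.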
On 13 June 2014 23:53, Maximilian Klein isalix@gmail.com wrote:
That is a great idea for handling the {{Cite doi}} cases. However, your response made me realize that the scope of this problem is larger: {{Cite journal|doi=}} also contains DOIs that the project wants to keep track of. Is there a way to know when {{Cite journal}} changes?
I suppose one hack would be to make {{Cite journal}} call another template that we can track, or to have Lua somehow report which DOIs are in use.
Max Klein ‽ http://notconfusing.com/
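If the bot fetches page text anyway, both citation forms can be picked out of the wikitext. This is a regex-based sketch under stated assumptions: template-name matching is loose (case-insensitive, space or underscore), and regexes do not understand nested templates or comments, so a real wikitext parser would be more robust. `extract_dois` is a hypothetical helper name.

```python
import re
from typing import Set

# {{Cite doi|10.xxxx/...}}: the DOI is the first positional parameter
# (an explicit "doi=" is tolerated as well).
CITE_DOI = re.compile(
    r"\{\{\s*cite[ _]doi\s*\|\s*(?:doi\s*=\s*)?([^|}\s][^|}]*?)\s*[|}]",
    re.IGNORECASE,
)
# A non-empty "doi=" parameter inside any template call, e.g. {{Cite journal|doi=...}}.
DOI_PARAM = re.compile(r"\|\s*doi\s*=\s*([^|}\s][^|}]*?)\s*(?=[|}])", re.IGNORECASE)


def extract_dois(wikitext: str) -> Set[str]:
    """Return the DOIs cited via {{Cite doi}} or via a doi= parameter."""
    dois = {m.group(1) for m in CITE_DOI.finditer(wikitext)}
    dois |= {m.group(1) for m in DOI_PARAM.finditer(wikitext)}
    return dois
```

Comparing the resulting set with the one stored for the page gives the "has the value changed" answer for both {{Cite doi}} and {{Cite journal}}.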
On Sat, Jun 14, 2014 at 3:14 AM, Merlijn van Deen valhallasw@arctus.nl wrote:
I'm sending this from my mobile, but doesn't {{Cite doi}} generate an external link?
On Saturday, June 14, 2014, Maximilian Klein isalix@gmail.com wrote:
If the Abuse filter is smart enough to detect every edit that changes a value in those templates, it can attach a tag to those edits. Then, it would be easy to follow recent changes for that tag.
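Following such a tag could use the API's `list=recentchanges` module with its `rctag` filter. A minimal sketch that only builds the request URL; the tag name in the usage note is hypothetical, and continuation handling and the HTTP call itself are left out.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def tagged_changes_url(tag: str, since: str, limit: int = 500) -> str:
    """Build a recentchanges query for edits carrying `tag`, oldest first,
    starting at `since` (ISO 8601, e.g. '2014-06-14T00:00:00Z')."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctag": tag,
        "rcdir": "newer",  # enumerate forward in time, starting at rcstart
        "rcstart": since,
        "rcprop": "title|ids|timestamp",
        "rclimit": str(limit),
        "format": "json",
    }
    return API + "?" + urlencode(params)
```

A bot would call `tagged_changes_url("doi-change", last_checked)` (tag name assumed) on each pass and only re-parse the titles it returns.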
OK, taking a closer look, all you need to do is track external link usage. It appears that all the cite templates generate URLs of the form http://dx.doi.org/XXXXXXXXXXXXXX.
Write a program to parse and keep track of those uses. It shouldn't be that hard.
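A sketch of the "parse and keep track" step, assuming hypothetical `(page, url)` rows such as an external-links query might return. It accepts both dx.doi.org (as in the message) and doi.org, and compares two snapshots to find pages whose DOI links changed; the helper names are illustrative.

```python
from typing import Dict, Iterable, Optional, Set, Tuple
from urllib.parse import unquote, urlsplit

DOI_HOSTS = {"dx.doi.org", "doi.org"}


def doi_from_url(url: str) -> Optional[str]:
    """Return the DOI in a dx.doi.org / doi.org link, or None for other URLs."""
    parts = urlsplit(url)  # also handles protocol-relative "//dx.doi.org/..."
    if parts.hostname not in DOI_HOSTS:
        return None
    return unquote(parts.path.lstrip("/")) or None


def doi_links_by_page(rows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group (page, url) rows into page -> set of DOIs."""
    out: Dict[str, Set[str]] = {}
    for page, url in rows:
        doi = doi_from_url(url)
        if doi:
            out.setdefault(page, set()).add(doi)
    return out


def changed_pages(old: Dict[str, Set[str]], new: Dict[str, Set[str]]) -> Set[str]:
    """Pages whose DOI set differs between two snapshots."""
    return {p for p in old.keys() | new.keys() if old.get(p) != new.get(p)}
```

Diffing snapshots this way avoids fetching page text at all, which is the main attraction of the external-link approach.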
John, and John,
Using the External URL search is a good idea. I think I'm going to do this on the database replicas too.
Thanks for the tips.
Max Klein ‽ http://notconfusing.com/
On Mon, Jun 16, 2014 at 12:25 PM, John phoenixoverride@gmail.com wrote:
Ricordisamoa,
That's a clever idea. It's not exactly what the Abuse Filter is for, but I made a request for that tag anyway. Thanks for the idea.
Max Klein ‽ http://notconfusing.com/
On Sat, Jun 14, 2014 at 12:02 PM, Ricordisamoa ricordisamoa@openmailbox.org wrote:
On Jun 14, 2014 4:54 AM, "Maximilian Klein" isalix@gmail.com wrote:
This seems horribly inefficient still. Is there a hook to know when a template on a page has been edited, rather than having to check every time the page has been edited?
The API can provide the list of URLs on the page, which may be enough for you.
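For illustration, a sketch of fetching a page's URLs with the API's `prop=extlinks` module and reading the JSON reply. My understanding is that each link is under the `*` key in the default format and under `url` with `formatversion=2`, so both are accepted; results beyond one batch need the reply's `continue` parameters, which this sketch does not follow.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def extlinks_url(title: str) -> str:
    """Build a prop=extlinks query for one page."""
    params = {"action": "query", "prop": "extlinks", "titles": title,
              "ellimit": "max", "format": "json"}
    return API + "?" + urlencode(params)


def extlinks_from_response(body: str) -> Dict[str, List[str]]:
    """Map page title -> external link URLs from a prop=extlinks JSON reply."""
    data = json.loads(body)
    pages = data.get("query", {}).get("pages", {})
    if isinstance(pages, dict):  # default format keys pages by page id
        pages = list(pages.values())
    result: Dict[str, List[str]] = {}
    for page in pages:
        urls = [link.get("url") or link.get("*") for link in page.get("extlinks", [])]
        result[page["title"]] = [u for u in urls if u]
    return result
```

Filtering the returned URLs for doi.org hosts then gives the DOIs on the page.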
It sounds like you want an API hook that returns (only) the reference metadata for a page. That would be lovely. I would love to see an option to provide those results in Zotero's JSON format.
COinS metadata can be downloaded in structured form.
1. get a list of sections
https://en.wikipedia.org/w/api.php?action=parse&prop=sections&page=C...
2. fetch the formatted HTML for the relevant section(s)
https://en.wikipedia.org/w/api.php?action=mobileview&prop=text&page=...
3. extract out the COinS metadata
Look in the page from step 2 above; it should contain 'reference-*'
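Step 3 could be sketched with the standard-library HTML parser. This assumes the citation templates emit COinS as `<span class="Z3988" title="...">` elements whose title is a URL-encoded OpenURL ContextObject with DOIs in `rft_id=info:doi/...`; the helper names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import parse_qs


class CoinsParser(HTMLParser):
    """Collect the decoded ContextObjects from <span class="Z3988" title="...">."""

    def __init__(self) -> None:
        super().__init__()
        self.contexts: List[Dict[str, List[str]]] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute values arrive with &amp; etc. already unescaped
        classes = (a.get("class") or "").split()
        if tag == "span" and "Z3988" in classes and a.get("title"):
            self.contexts.append(parse_qs(a["title"]))


def coins_dois(html: str) -> List[str]:
    """Return DOIs found in rft_id=info:doi/... entries of COinS spans."""
    parser = CoinsParser()
    parser.feed(html)
    prefix = "info:doi/"
    return [rid[len(prefix):]
            for ctx in parser.contexts
            for rid in ctx.get("rft_id", [])
            if rid.startswith(prefix)]
```

The same ContextObjects also carry titles, authors and journal names, which is closer to the "reference metadata" wished for above than bare URLs.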
-- John Vandenberg
wikitech-l@lists.wikimedia.org