hi all,
I'm looking for interlanguage link edits that were made by users who are not bots. Such users would usually have been active before bots come in to do their work.
Maybe a good approach is to look at the edit summaries of the first interwiki bot edit per entry and take what happened before that point in time?
If yes, it would be great if anyone on this list could run such a search for me. I am ready to provide details about what is needed,
thanks & cheers, Claudia koltzenburg@w4w.net
Unfortunately, the bot flag is only stored in the recent changes table of the database, and there is no way to filter bot edits out of page histories. :-(( So either you use recent changes for the recent interwiki edits, or you guess from the contributor's name and the edit summary whether it was a bot or a human. In an ideal world, the first author of an article would provide interwikis that can be traced through other projects, and sometimes this really is the case.
Pywikipedia-l mailing list Pywikipedia-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
On 8 May 2012 15:23, Bináris wikiposta@gmail.com wrote:
Unfortunately, bot flag is only stored in recent changes table of the database, and there is no chance to filter bot edits from page histories.
This is not completely true - the bot flag is also a property of the user account. You can query e.g. http://nl.wikipedia.org/w/index.php?title=Speciaal:Gebruikerslijst&offse...
I'm not sure if there is an easy way to query this from pywikipedia, though.
Best, Merlijn
2012/5/8 Merlijn van Deen valhallasw@arctus.nl
This is not completely true - the bot flag is also a property of the user account. You can query e.g.
http://nl.wikipedia.org/w/index.php?title=Speciaal:Gebruikerslijst&offse...
Yes, that's true. And if you want to be really accurate, you must also determine the date on which the account acquired the bot flag from the bureaucrat logs and compare it to the page history. :-)
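The date comparison described above could be sketched roughly as follows. This is only an illustration: the function name and the (granted, revoked) data layout are made up for this example, and the dates would have to be collected from the logs by hand or by a separate query.

```python
from datetime import datetime


def was_flagged_at(edit_time, flag_periods):
    """Return True if edit_time falls inside a period in which the account held the bot flag.

    flag_periods is a hypothetical list of (granted, revoked) datetime pairs;
    revoked is None when the flag is still held.
    """
    for granted, revoked in flag_periods:
        if granted <= edit_time and (revoked is None or edit_time < revoked):
            return True
    return False


# Example: flag granted on 1 March 2010 and never revoked.
periods = [(datetime(2010, 3, 1), None)]
print(was_flagged_at(datetime(2009, 12, 24), periods))  # edit made before the flag
print(was_flagged_at(datetime(2011, 6, 5), periods))    # edit made with the flag
```

An edit made before the flag was granted would then be counted as unflagged even if the account is a bot today.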
To get a list of bots you may use:
query=wikipedia.query.GetData({'action':'query', 'list':'allusers', 'augroup':'bot', 'aulimit':'500'},useAPI=True)
This gives you the locally flagged bots, so you'll need to do the same for global bots. In svn there is also botlist.py, which downloads a list of flagged bots.
Considering that in almost all projects bots must use the word "Bot" in the edit summary, you can also work with that, or use the summary factory of interwiki.py to exclude the py interwiki bots.
Alchimista
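For readers without a pywikipedia checkout, roughly the same query can be sent to the MediaWiki API directly. The sketch below uses only the Python standard library. The wiki URL, the User-Agent string and the function names are assumptions made for the example, and only the first batch of up to 500 names is fetched.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://nl.wikipedia.org/w/api.php"  # assumed example wiki


def build_bot_list_params(limit=500):
    """Parameters for list=allusers restricted to the 'bot' user group."""
    return {
        "action": "query",
        "list": "allusers",
        "augroup": "bot",
        "aulimit": str(limit),
        "format": "json",
    }


def extract_bot_names(response):
    """Pull the user names out of a decoded allusers response."""
    users = response.get("query", {}).get("allusers", [])
    return [user["name"] for user in users]


def fetch_bot_names(api_url=API_URL):
    """Fetch one batch of locally flagged bot accounts (needs network access)."""
    url = api_url + "?" + urllib.parse.urlencode(build_bot_list_params())
    request = urllib.request.Request(url, headers={"User-Agent": "bot-list-sketch/0.1"})
    with urllib.request.urlopen(request) as reply:
        return extract_bot_names(json.load(reply))
```

Calling fetch_bot_names() returns the locally flagged accounts only; global bots are not included.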
I did some data gathering last fall that is more or less the same as what Claudia is asking about. Looking up the bot flag or checking the username is often regarded as a reasonable way of filtering out the bots. I chose to apply both: if there's no bot flag, we look for a typical bot signature in the username (regex: "bot$| ", username either ends with bot or a part of it does), using a case-insensitive match since some users have usernames like "FoObOt".
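A minimal sketch of that two-step check might look like this. The regular expression is an assumption written for this example (the pattern quoted above did not survive intact), and flagged_bots stands for whatever list of flagged accounts has been collected beforehand.

```python
import re

# Assumed pattern: the name ends in "bot", or a part of the name does
# (i.e. "bot" is followed by a separator such as a space, hyphen or underscore).
BOT_NAME_RE = re.compile(r"bot$|bot[\W_]", re.IGNORECASE)


def looks_like_bot(username, flagged_bots=frozenset()):
    """Check the bot flag first, then fall back to the username heuristic."""
    if username in flagged_bots:
        return True
    return BOT_NAME_RE.search(username) is not None


for name in ("FoObOt", "Bot-Example", "Botany Fan", "Claudia"):
    print(name, looks_like_bot(name))
```

A heuristic like this will misclassify some accounts, so it is worth spot-checking the results by hand.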
Checking the edit history to find when interwiki links were first added can be time-consuming if the page had lots of activity. I therefore chose to use a binary search, halving the distance between two test points until either the actual edit is found, or we're down to so few edits that all can be efficiently grabbed through the API (e.g. using Pywikibot's PreloadingGenerator). Otherwise you might be examining thousands of edits for no reason.
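The binary search could be sketched as below. It assumes that once interlanguage links have been added they stay in the page, which is not always true. The get_text callback is a placeholder for however the revision text is fetched, and the link pattern is a simplification that only recognises short lowercase language codes.

```python
import re

# Simplified pattern for links such as [[de:Berlin]]; longer codes are not covered.
INTERWIKI_RE = re.compile(r"\[\[[a-z]{2,3}(?:-[a-z]+)*:[^\]|]+\]\]")


def has_interwiki(text):
    return INTERWIKI_RE.search(text) is not None


def first_revision_with_links(revisions, get_text, batch_size=10):
    """Return the index of the earliest revision containing interwiki links, or None.

    revisions must be ordered oldest first; get_text(revision) returns its wikitext.
    """
    if not revisions or not has_interwiki(get_text(revisions[-1])):
        return None
    low, high = 0, len(revisions) - 1  # revisions[high] is known to have links
    while high - low > batch_size:
        mid = (low + high) // 2
        if has_interwiki(get_text(revisions[mid])):
            high = mid       # links were already present here, look earlier
        else:
            low = mid + 1    # links were added later than this revision
    # Few enough revisions left: check them one by one.
    for index in range(low, high + 1):
        if has_interwiki(get_text(revisions[index])):
            return index
    return None
```

In practice the final small range would be fetched in one batch rather than one revision at a time.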
Having Toolserver access simplifies the process a lot since all the metadata is more easily accessible, but the revision text will still have to be grabbed from the API.
Hope some of this helps, let me know if there's any questions.
Cheers, Morten
hi Bináris, Merlijn, Alchimista, and Morten,
thank you very much. Does any of you remember hearing a very new type of song, and being fascinated for sure but not quite trusting your ears?
btw, on his talk page yesterday, JAn came up with an idea that sounds like "new song" to me, too: http://cs.wikipedia.org/w/index.php?title=Diskuse_s_wikipedistou:JAn_Dud%C3%...
Morten said
Hope some of this helps, let me know if there's any questions.
I guess there are, Morten, thanks :-)
Q: being in none of the special Wikipedia roles, which of these ideas would I be able to try out by myself?
btw, thanks for asking @Morten,
cheers, Claudia
Claudia asked:
Q: being in none of the special Wikipedia roles, which of these ideas would I be able to try out by myself?
All the metadata is available through the Wikipedia API, and the Pywikipediabot framework makes a lot of it easily accessible, so if you know how to program in Python, it's doable :)
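As a starting point, here is a small sketch of how the edit history metadata of one page can be requested from the API without any special rights. The default wiki URL and the function name are assumptions for the example. The printed URL can simply be opened in a browser.

```python
import urllib.parse


def revision_history_url(title, api_url="https://en.wikipedia.org/w/api.php", limit=50):
    """Build an API URL listing user, timestamp and summary of a page's oldest edits."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment",
        "rvlimit": str(limit),
        "rvdir": "newer",  # oldest revisions first
        "format": "json",
    }
    return api_url + "?" + urllib.parse.urlencode(params)


print(revision_history_url("Berlin"))
```

Each revision in the reply carries the user name and the edit summary, which is the information the filtering ideas in this thread rely on.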
Cheers, Morten
thanks, Morten:
if you know how to program in Python
well, I don't, ... not yet, that is, one reason why I came here to ask :-)
my favourite is b.:
a. find the time and set the priorities to do it myself
b. ask sb else to support me by programming this query
c. follow a different kind of interest and not rely on the Wikipedia community
anyone up for b.?
thanks C.
Claudia, did you already get help on this?
Alchimista [[pt:user:Alchimista]]
Since this is not a two-line piece of code, I guess it would be best to file a feature request for it. Or we have option d: wait for Wikidata; maybe this request will be obsolete by then. I've implemented an option to search for several comments like "interwiki", "+iw" etc. This does not catch all interwiki changes, but some of them. If this is of interest I could commit it soon.
regards xqt
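The idea of searching edit summaries can be illustrated with a few lines of Python. This is not the implementation mentioned above, only a sketch: the keyword list holds just the two examples from the message, and the revision dictionaries are assumed to have a "comment" field as in API replies.

```python
# Keywords taken from the message above; extend as needed.
SUMMARY_KEYWORDS = ("interwiki", "+iw")


def mentions_interwiki(summary):
    """True if the edit summary contains one of the interwiki keywords."""
    lowered = summary.lower()
    return any(keyword in lowered for keyword in SUMMARY_KEYWORDS)


def interwiki_edits(revisions):
    """Keep only the revisions whose summary looks like an interwiki edit."""
    return [rev for rev in revisions if mentions_interwiki(rev.get("comment", ""))]


history = [
    {"user": "Example", "comment": "+iw de, fr"},
    {"user": "Other", "comment": "typo"},
]
print(interwiki_edits(history))
```

As noted in the message, this only finds edits whose authors described them that way in the summary.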
thanks, Alchimista & xqt, for picking up this thread
xqt said:
I've implemented an option to search for several comments like "interwiki", "+iw" etc. This gives not all interwiki changes but some of them. If this is of some interest I could commit it soon.
yes, this sounds very interesting for my purpose, it might get me a big step ahead pretty soon
please keep me updated on your results, xqt
thanks & cheers Claudia
Morten said:
if you know how to program in Python, it's doable :)
ok, where exactly do I go with my lines of code to do the search?
thanks ck