Hey folks,
Two days ago, I ran a script from stat1 to update the revert tables on db1047, and it had some bad disk access patterns. (FYI, don't use Python's shelve module as an on-disk cache for a dict().) As soon as I saw the load come up, I killed the script. I'm very sorry for any difficulty it caused in the meantime. I've since rewritten things to behave much better.
I currently have two processes running on the machine:
- sessions.py: updates the session table on db1047. Useful for measuring editor labor hours.
- reverts.py: updates the revert tables on db1047. Fixed to no longer need a disk cache.
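One common way to detect identity reverts without any disk cache is to keep only a small window of recent content checksums in memory. The sketch below is not the actual reverts.py: it assumes revisions arrive sorted by page and then by time, that each one carries a content checksum (such as the rev_sha1 column), and it uses a hypothetical radius of 15 revisions. Memory stays bounded because only the current page's window is held at any time.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, NamedTuple, Optional, Tuple


class Revision(NamedTuple):
    rev_id: int
    page_id: int
    sha1: str  # content checksum, e.g. from the rev_sha1 column


class Revert(NamedTuple):
    reverting_id: int        # revision that restored an earlier state
    reverted_ids: List[int]  # revisions whose changes were undone
    reverted_to_id: int      # earlier revision whose content was restored


def detect_reverts(revisions: Iterable[Revision],
                   radius: int = 15) -> Iterator[Revert]:
    """Yield identity reverts from revisions sorted by page, then by time.

    Only the last `radius` (sha1, rev_id) pairs of the current page are
    kept in memory; radius=15 is an assumed default.
    """
    current_page: Optional[int] = None
    window: Deque[Tuple[str, int]] = deque(maxlen=radius)
    for rev in revisions:
        if rev.page_id != current_page:
            # New page: the previous page's history is no longer needed.
            current_page = rev.page_id
            window = deque(maxlen=radius)
        # Walk back from the most recent revision looking for a checksum match.
        for back, (sha1, rev_id) in enumerate(reversed(window)):
            if sha1 == rev.sha1:
                if back > 0:  # back == 0 is an unchanged (null) edit, not a revert
                    recent = list(window)
                    reverted = [r for _, r in recent[len(recent) - back:]]
                    yield Revert(rev.rev_id, reverted, rev_id)
                break
        window.append((rev.sha1, rev.rev_id))
```

Because it is a generator, it can consume rows straight from an ordered query, so the whole pass streams without holding more than one page's window.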
Both of these processes are nice'd, so they should wait in line for CPU access behind any non-nice'd processes you have running. If the processes cause any trouble, please feel free to kill them or let me know and I'll kill them.
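For reference, a Python script can lower its own CPU priority with os.nice() (Unix-only) instead of relying on being launched as `nice -n 10 python reverts.py`. A minimal sketch; lower_priority is a hypothetical helper name and the increment of 10 is an arbitrary choice:

```python
import os


def lower_priority(increment: int = 10) -> int:
    """Add `increment` to this process's niceness and return the new value.

    A higher niceness means the scheduler favors other, non-niced processes.
    os.nice() is only available on Unix; Linux caps niceness at 19.
    """
    return os.nice(increment)


if __name__ == "__main__":
    print("now running at niceness", lower_priority())
    # ... long-running table update would go here ...
```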
For Science,
-Aaron
It's useful to have such things in a public repo so people can take a peek at your code if it goes crazy and suggest improvements :)
-- Ori Livneh
On Friday, March 22, 2013 at 9:05 AM, Aaron Halfaker wrote:
E3-team mailing list E3-team@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/e3-team
Hey Aaron,
(removing E3 and adding wmfresearch)
Can you recap the main use case for the revert tables generated by reverts.py? We've been thinking of moving them to the prod DB, but now that SHA1 population is complete for enwiki AND revert rates are implemented in the metrics API, I'm curious what you use them for. If we were to make this a permanent table in prod, we should definitely start by putting the script in a public repo.
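As an illustration of what a revert-rate metric can look like once reverted revisions are known, here is a minimal sketch. The definition used (the share of each user's revisions that were later reverted) is an assumption and not necessarily the one implemented in the metrics API; the function and argument names are hypothetical.

```python
from typing import Dict, List, Set


def revert_rates(user_revisions: Dict[str, List[int]],
                 reverted_rev_ids: Set[int]) -> Dict[str, float]:
    """Fraction of each user's revisions that were later reverted.

    `user_revisions` maps a user name to that user's revision IDs;
    `reverted_rev_ids` holds revisions flagged as reverted (e.g. from a
    reverts table). Users with no revisions are skipped.
    """
    rates: Dict[str, float] = {}
    for user, revs in user_revisions.items():
        if revs:
            reverted = sum(1 for r in revs if r in reverted_rev_ids)
            rates[user] = reverted / len(revs)
    return rates
```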
Dario
On Mar 22, 2013, at 10:42 AM, Ori Livneh <ori@wikimedia.org> wrote:
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
The reverts table is useful for looking at global reverting patterns over time. Right now, I'm trying to answer questions about the robustness of Wikipedia's vandal-fighting system by looking at who picked up the slack when ClueBot went down for a month and what effect this had on the amount of vandalism in the wiki. Once I get back into WMF gear, I'd also like to review reverts and retention of new users since the E3 & Teahouse work.
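A question like "who picked up the slack" can be approached by counting reverts per reverting user per month and tracking one account's share over time. This is a hypothetical sketch: it assumes rows of (reverting_user, revert_timestamp) pulled from a reverts table, and that row layout is not taken from reverts.py.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def reverts_by_month(reverts: Iterable[Tuple[str, datetime]]) -> Dict[str, Counter]:
    """Count reverts per reverting user for each calendar month ("YYYY-MM")."""
    by_month: Dict[str, Counter] = defaultdict(Counter)
    for user, ts in reverts:
        by_month[ts.strftime("%Y-%m")][user] += 1
    return dict(by_month)


def share(counts: Counter, user: str) -> float:
    """Fraction of a month's reverts made by `user` (0.0 if there were none)."""
    total = sum(counts.values())
    return counts[user] / total if total else 0.0
```

Comparing share(month_counts, bot_name) before, during, and after a bot outage, alongside the counts for the human reverters, shows who absorbed the work.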
I'd be happy to add this to a shared repo. I'm planning to push these scripts to https://bitbucket.org/halfak/wikimedia-utilities when I'm done with them. I haven't used the git system you guys are working with yet, so I might need some help getting set up if you want me to move it there. In the meantime, I'm just trying to get things done with the few hours I have to devote to this work.
-Aaron
On Fri, Mar 22, 2013 at 10:54 AM, Dario Taraborelli <dtaraborelli@wikimedia.org> wrote:
wmfresearch mailing list wmfresearch@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wmfresearch
Bump.
I was just reading this and wondering if I might take a look at sessions.py. It sounds like something that would be useful to port to the user metrics code base and possibly expose via the API. I couldn't find the module in your home folder on stat1; would it be cool for me to have a look at the code?
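For readers unfamiliar with session-based labor-hour estimates, the general idea can be sketched as follows: split an editor's edit timestamps into sessions wherever the gap between consecutive edits exceeds a cutoff, then sum the session durations. This is not the code in sessions.py; the one-hour cutoff and the helper names are assumptions.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def sessions(timestamps: Iterable[datetime],
             cutoff: timedelta = timedelta(hours=1)) -> List[List[datetime]]:
    """Group one editor's edit timestamps into sessions.

    A new session starts whenever the gap since the previous edit
    exceeds `cutoff` (one hour here, an assumed value).
    """
    result: List[List[datetime]] = []
    for ts in sorted(timestamps):
        if result and ts - result[-1][-1] <= cutoff:
            result[-1].append(ts)
        else:
            result.append([ts])
    return result


def labor_hours(timestamps: Iterable[datetime],
                cutoff: timedelta = timedelta(hours=1)) -> float:
    """Approximate editing time as the sum of session durations, in hours.

    Single-edit sessions contribute zero here; one option is to pad each
    session with an estimated per-edit duration.
    """
    total = sum((s[-1] - s[0] for s in sessions(timestamps, cutoff)), timedelta())
    return total.total_seconds() / 3600
```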
Apologies if I'm rehashing something that's already been discussed.
On Fri, Mar 22, 2013 at 11:09 AM, Aaron Halfaker <aaron.halfaker@gmail.com> wrote:
Not a problem.
See stat1.wmf:/home/halfak/Sandbox/cluebot-down/sessions.py
There's a util.py in that dir that I import; otherwise, all of the code is contained in that one file. I'll eventually package this as a library in wikimedia-utilities, but checking out these files should give you a sneak peek.
-Aaron
On Mon, Apr 1, 2013 at 11:34 AM, Ryan Faulkner <rfaulkner@wikimedia.org> wrote:
--
Ryan Faulkner Research Analyst - Editor Engagement Experimentation (e3) Wikimedia Foundation
mobile: (415) 793-5086 office: (415) 839-6885 ext 6726
Wicked, cheers Aaron! :)
On Mon, Apr 1, 2013 at 2:17 PM, Aaron Halfaker <aaron.halfaker@gmail.com> wrote: