For use in our monthly report, due to come out tomorrow
https://www.mediawiki.org/wiki/Wikimedia_engineering_report/2012/June
I'd like to know how many unique contributors ("owners") had commits merged into the mediawiki & mediawiki/* Gerrit projects between June 1-30 inclusive. I've had luck in using "age:4d -age:34d status:merged project:^mediawiki.* -owner:L10n-bot" as a search on https://gerrit.wikimedia.org to get a big paginated table of all the commits (and then I figure I'd look for all the unique owner names and count them), but when I try that on the command line as
ssh -p 29418 gerrit.wikimedia.org gerrit query 'age:4d -age:34d status:merged project:^mediawiki.* -owner:L10n-bot'
I get the error "fatal: "-age:34d" is not a valid option".
I'll accept either help in running this query correctly so I get the giant table on the command line so I can gin up the status myself, or I will simply accept a number if you want to do my homework for me. :-)
It's because you've passed in that string (which was good) as one argument to the SSH command, which is then read as multiple arguments on the remote server. This works for me, adding double quotes around the remote command:
ssh -p 29418 gerrit.wikimedia.org "gerrit query 'age:4d -age:34d status:merged project:^mediawiki.* -owner:L10n-bot'"
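To see why the extra quoting matters: ssh joins its command arguments with single spaces into one string, and the remote side then splits that string again. Here is a minimal Python sketch of that round trip. Using `shlex.split` as a stand-in for the remote parsing is an assumption made for illustration, and `remote_argv` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
import shlex

QUERY = "age:4d -age:34d status:merged project:^mediawiki.* -owner:L10n-bot"


def remote_argv(local_args):
    """Approximate what the remote side sees for `ssh host <local_args...>`.

    ssh joins the arguments with single spaces and sends one string;
    shlex.split stands in for the remote parser (an assumption).
    """
    return shlex.split(" ".join(local_args))


# The local shell has already stripped the single quotes, so the query
# arrives bare and is split into separate words remotely.
unquoted = remote_argv(["gerrit", "query", QUERY])

# Wrapping the whole remote command in double quotes locally keeps the
# inner single quotes intact for the remote side.
quoted = remote_argv(["gerrit query '" + QUERY + "'"])

print(unquoted)
print(quoted)
```

In the first case `-age:34d` ends up as its own word, which is consistent with it being rejected as an unknown option; in the second the query stays in one piece.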
On 07/04/2012 10:52 PM, Sumana Harihareswara wrote:
Got help from Mark Holmquist and advice from Giovanni Luca Ciampaglia -- needed to use double quote marks. Mark wrote:
It's because you've passed in that string (which was good) as one argument to the SSH command, which is then read as multiple arguments on the remote server. This works for me, adding double quotes around the remote command:
ssh -p 29418 gerrit.wikimedia.org "gerrit query 'age:4d -age:34d status:merged project:^mediawiki.* -owner:L10n-bot'"
Mark then also wrote:
Hm, the SSH interface only returns 500 at a time, and won't accept limit: keywords to the contrary. I've hacked together some results, but I wouldn't recommend reproducing it by hand. I could write up a script with minimal effort, I think.
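The paging Mark describes can be sketched as a loop that keeps asking for the next batch until a short batch comes back. This is only a sketch under stated assumptions: `fetch_page` is a hypothetical callable, and how a real Gerrit query resumes after the last row is not shown in this thread and depends on the Gerrit version. The fake data source exists just to exercise the loop.

```python
def fetch_all(fetch_page, page_size=500):
    """Collect every result from a paged source.

    fetch_page(resume_from) is a hypothetical callable: it returns the
    next list of results after `resume_from` (None for the first page).
    """
    results = []
    resume_from = None
    while True:
        page = fetch_page(resume_from)
        results.extend(page)
        if len(page) < page_size:
            break  # a short (or empty) page means nothing is left
        resume_from = page[-1]
    return results


# Stand-in data source: 1401 fake changes, served 500 at a time.
CHANGES = list(range(1401))
calls = []


def fake_fetch(resume_from):
    calls.append(resume_from)
    start = 0 if resume_from is None else CHANGES.index(resume_from) + 1
    return CHANGES[start:start + 500]


everything = fetch_all(fake_fetch)
print(len(everything), "results in", len(calls), "queries")
```

With 1401 fake results and a page size of 500, the loop makes three requests (500, 500, and 401 results).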
It took 3 queries but he got all 1401 results. And with that:
$ grep ' name:' wmf-results.txt | sort -u | wc -l
92
So, 92 unique committers in June. Which is way better than Ohloh has been saying, yay.
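For anyone who would rather not do the counting in a shell pipeline, here is a rough Python equivalent of the grep, sort, and count step. It assumes the saved query output lists each change's owner on an indented "name: ..." line, which is what the grep relies on. The function name and the sample text are made up for illustration.

```python
def unique_owner_names(query_output):
    """Return the set of distinct owner names in `gerrit query` text output.

    Assumes each change lists its owner on an indented "name: ..." line.
    """
    names = set()
    for line in query_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("name:"):
            names.add(stripped[len("name:"):].strip())
    return names


# Made-up sample in the assumed layout, not real query output.
SAMPLE = """\
change I0000000000000000000000000000000000000001
  project: mediawiki/core
  owner:
    name: Alice Example
change I0000000000000000000000000000000000000002
  project: mediawiki/extensions/Foo
  owner:
    name: Bob Example
change I0000000000000000000000000000000000000003
  project: mediawiki/core
  owner:
    name: Alice Example
"""

print(len(unique_owner_names(SAMPLE)))
```

Collecting the names into a set does the same job as `sort -u`, and taking its length replaces `wc -l`.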
I'd like to confirm that number: 92 unique contributors in June is absolutely correct. I've also scriptified my method, so now I can do multiple months.
Month       | Unique contributors
------------+--------------------
July so far | 60
------------+--------------------
June        | 92
------------+--------------------
May         | 77
------------+--------------------
April       | 67
------------+--------------------
March       | 34
------------+--------------------
February    | 2
A note or two: July is high because we have a lot of regular committers, I suppose. You could confirm that by graphing how many new contributors each additional day adds. My guess is you'll get a nice steep line at first that tapers off to nearly 0 at the end of 30 days. Also, I'm sure the earlier months have sample sizes too small to be meaningful, since the extensions took some time to transfer over, and apparently February was just for testing.
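The day-by-day graph suggested here needs a running count of distinct owners. A minimal sketch, assuming the commits have already been reduced to (day, owner) pairs; the function name and the sample data are invented for illustration:

```python
def cumulative_unique(commits):
    """Map each day to the number of distinct owners seen up to that day.

    `commits` is an iterable of (day, owner) pairs; days must sort
    chronologically (ISO dates like "2012-07-01" do).
    """
    seen = set()
    totals = {}
    for day, owner in sorted(commits):
        seen.add(owner)
        totals[day] = len(seen)
    return totals


# Made-up sample data, not real Gerrit results.
SAMPLE = [
    ("2012-07-01", "alice"),
    ("2012-07-01", "bob"),
    ("2012-07-02", "alice"),
    ("2012-07-03", "carol"),
    ("2012-07-03", "bob"),
]

print(cumulative_unique(SAMPLE))
```

The difference between consecutive days is the number of contributors that day added. Days with no commits are simply absent from the result.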
Of course, the coolest thing is that each month so far has seen at least 10 additional contributors! :)
The script I used to generate it is attached (since it's only 1.1 kb). If you have a sane SSH setup already, you should be able to make it executable and do....
$ ./gerunique "ssh -p 29418 gerrit.wikimedia.org" 0d 30d
....and get the number of contributors for the past 30 days. It will also give you some friendly notifications, though they're largely for debugging.
The first option can be described as "how you would ssh into gerrit if you had to", and it's provided for the convenience of those people (like me) whose local username doesn't match their remote username.
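The attached script is not reproduced in this thread, so the following is only a guess at how its three arguments could map onto a Gerrit query, not the script itself. `build_command` is a hypothetical helper. Treating the first age as the newer bound and the second as the older bound is an assumption based on the "age:4d -age:34d" query earlier in the thread.

```python
import shlex


def build_command(ssh_prefix, newest, oldest,
                  project="^mediawiki.*", skip_owner="L10n-bot"):
    """Build an argv list for a merged-changes query over an age window."""
    query = (f"age:{newest} -age:{oldest} status:merged "
             f"project:{project} -owner:{skip_owner}")
    # The remote command is passed as ONE argument, with the query
    # single-quoted so the remote side keeps it in one piece.
    return shlex.split(ssh_prefix) + ["gerrit query " + shlex.quote(query)]


cmd = build_command("ssh -p 29418 gerrit.wikimedia.org", "0d", "30d")
print(cmd)
```

The resulting list can be handed to `subprocess.run(cmd, capture_output=True, text=True)`, which avoids a second round of local shell quoting.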
Cheers,
On 07/05/2012 07:40 PM, Mark Holmquist wrote:
So, 92 unique committers in June. Which is way better than Ohloh has been saying, yay.
I'd like to confirm that number: 92 unique contributors in June is absolutely correct. I've also scriptified my method, so now I can do multiple months.
Month       | Unique contributors
------------+--------------------
July so far | 60
------------+--------------------
June        | 92
------------+--------------------
May         | 77
------------+--------------------
April       | 67
------------+--------------------
March       | 34
------------+--------------------
February    | 2
Thanks for that, Mark! Yeah, that's also way better than Ohloh thinks https://www.ohloh.net/p/mediawiki. I've gone back and updated the April and May months of the engineering report on mediawiki.org, e.g., https://www.mediawiki.org/wiki/Wikimedia_engineering_report/2012/May .
A note or two: July is high because we have a lot of regular committers, I suppose. You could confirm that by graphing how many new contributors each additional day adds. My guess is you'll get a nice steep line at first that tapers off to nearly 0 at the end of 30 days. Also, I'm sure the earlier months have sample sizes too small to be meaningful, since the extensions took some time to transfer over, and apparently February was just for testing.
Of course, the coolest thing is that each month so far has seen at least 10 additional contributors! :)
That is indeed a great thing to see! And, based on the monthly reports, I believe the most unique committers we ever had in a month was 100 (in January 2012), including people making localisation commits. So I infer that we are recovering nicely from the transition cost of the Git move.
(Also, this isn't counting people who contribute to the mobile projects on GitHub, and really the final monthly report stat ought to. I don't quickly see a way to ask "how many unique contributors submitted unique pull requests to a https://github.com/wikimedia/ repo in June?" on GitHub, though, so I'll put that off till next month.)
It also doesn't count people who are making operations changes, or Wikimedia site configuration changes, or are packaging debs, etc, etc. It would be awesome to see stats for those as well. I have a feeling that we have more contributors than the record ;).
- Ryan
(Also, this isn't counting people who contribute to the mobile projects on GitHub, and really the final monthly report stat ought to. I don't quickly see a way to ask "how many unique contributors submitted unique pull requests to a https://github.com/wikimedia/ repo in June?" on GitHub, though, so I'll put that off till next month.)
I was bored, so I made you a Python script this time :)
It's attached. It takes a year and month as its arguments, fetches all the repos at github/wikimedia, then fetches their pull requests, and finally checks which pull requests match the month you specified. Something like
$ ./githubunique 2012 06 # should give "3 unique contributors"
And yes, most months only have very few contributors, but anything we can do to increase the count :)
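The attached script is not included in this thread, so here is only a sketch of the final filtering step it describes, not the script itself. It assumes the pull requests have already been fetched and look like GitHub API pull request objects, with a "created_at" ISO 8601 timestamp and a nested "user" object carrying a "login". The function name and sample records are invented.

```python
def unique_pr_authors(pull_requests, year, month):
    """Return logins of users who opened a pull request in the given month.

    Each item is expected to carry "created_at" as an ISO 8601 string
    (e.g. "2012-06-15T12:00:00Z") and a nested "user" -> "login".
    """
    prefix = f"{year:04d}-{month:02d}-"
    return {
        pr["user"]["login"]
        for pr in pull_requests
        if pr["created_at"].startswith(prefix)
    }


# Made-up sample records, not real pull requests.
SAMPLE = [
    {"created_at": "2012-06-03T10:00:00Z", "user": {"login": "alice"}},
    {"created_at": "2012-06-20T18:30:00Z", "user": {"login": "alice"}},
    {"created_at": "2012-06-28T09:15:00Z", "user": {"login": "bob"}},
    {"created_at": "2012-07-01T00:05:00Z", "user": {"login": "carol"}},
]

authors = unique_pr_authors(SAMPLE, 2012, 6)
print(len(authors), "unique contributors")
```

Because the timestamps are in UTC, a pull request opened late on June 30 in a western time zone would be counted under July with this approach.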
It also doesn't count people who are making operations changes, or Wikimedia site configuration changes, or are packaging debs, etc, etc. It would be awesome to see stats for those as well. I have a feeling that we have more contributors than the record ;).
This should be as simple as removing the "project:^mediawiki.*" bit from the previous bash script. I'm not sure if there are other bots to exclude in that case, though, so I'll leave it up to someone more versed with the rest of Gerrit (Ryan?)
If there are other github repositories *not* in the wikimedia github account, it shouldn't be hard to add those to the consideration in this script.
P.S., a word to the wise: don't try to parse GitHub's API responses with bash, it's just not worth it.
P.P.S., for those who like unified counts, adding this python script to the end of the previous bash script should be easy enough, so you could get all of the contributors (95!) in one command if you wanted.
wikitech-l@lists.wikimedia.org