On Mon, 2012-12-03 at 23:46 +0100, Platonides wrote:
On 03/12/12 22:45, Jesus M. Gonzalez-Barahona wrote:
On Mon, 2012-12-03 at 21:04 +0100, Platonides wrote:
On 03/12/12 19:40, Federico Leva (Nemo) wrote:
That data is hardly useful: it doesn't explain what it refers to.
I guess I missed your message, Federico.
He forgot to keep you in CC, so it was sent only to the mailing list.
Thanks. I happen to be a subscriber to the list, but I automatically archive it, so I didn't notice the message. I have seen it now.
I agree a glossary of each term would be useful. It took me a while to realise that committers/closers/senders were the terms used for users of git/bugzilla/mailing list.
Well, in fact committers are committers to the git repository (you also have authors, see below), closers are specifically ticket closers (you also have people opening or changing tickets) and senders are indeed senders of mailing list messages. We'll work to make this much clearer. Again, thanks for the suggestion.
They should track authors instead of committers, though (preferably skipping merge commits)
We do both. In the summary (main) page you have authors in the summary (orange) chart, since authors seemed more meaningful than committers. Same for the "blue" chart in that page. In the source code page you have committers (orange chart) and both authors and committers (blue charts), for a more detailed comparison.
I was specifically thinking of the table of Top committers. Also, the summary page has an authors graph but http://bitergia.com/public/previews/2012_11_mediawiki/scm.html has a committers one.
When the committer is different than the author there are usually two options:
- It was a merge and the committer is 'gerrit'.
- The patchset was (slightly) changed by the committer from the original by the author.
There's also a less common one of committing a patch from a different source, such as a bugzilla patch.
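
The cases above can be sketched as a small classifier. This is only an illustration, not how the reports are produced: it assumes each commit is a plain dict with hypothetical keys `author`, `committer` and `parents`. Only the committer name 'gerrit' comes from the message above.

```python
def classify_commit(commit):
    """Label a commit by how its author and committer relate.

    `commit` is a hypothetical dict with keys 'author', 'committer'
    and 'parents' (a list of parent ids); it is not a real git API.
    """
    if commit["author"] == commit["committer"]:
        return "same"
    # A commit with more than one parent is a merge.
    is_merge = len(commit["parents"]) > 1
    if is_merge and commit["committer"].lower() == "gerrit":
        return "gerrit-merge"
    # Otherwise someone else committed the author's work: either a
    # (slightly) changed patchset or a patch taken from another
    # source such as bugzilla. The two cannot be told apart here.
    return "committed-for-author"
```

Telling a changed patchset from a bugzilla patch would need extra information (for example the commit message), which this sketch does not try to use.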
Thanks for the info.
The number of commits by gerrit is meaningless, and committers who made only small changes inflate some numbers without being very informative. The number of comments / approvals in gerrit would be more appropriate than that.
Equally, the author field of merges should IMHO be ignored, since a merge is not a commit which really touches the code (it could be measured in a different statistic); as it stands, many commits produce two entries.
You're right, thanks. However, we intended to measure "raw commits", as found in the git repo. One of the filters after this "first pass" filters out those commits. You're right that in this case at least, those numbers would be more appropriate.
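
As a rough sketch of the kind of filter discussed here, the following counts commits per author while skipping merges. The dict keys `author` and `parents` are assumptions made for the example, not the format of the actual tool.

```python
from collections import Counter


def count_authors(commits):
    """Count non-merge commits per author.

    `commits` is an iterable of hypothetical dicts with keys
    'author' and 'parents' (list of parent ids).
    """
    counts = Counter()
    for commit in commits:
        # Merge commits have more than one parent; skip them so a
        # change merged by gerrit is not counted twice.
        if len(commit["parents"]) > 1:
            continue
        counts[commit["author"]] += 1
    return counts
```

On a real repository, `git log --no-merges` gives a similar view directly, leaving the merge commits out of the listing.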
It seems that Jesús did a fine job. It could be polished quite a bit more with some local knowledge: merging users, hiding bots, etc.
Thanks a lot. We usually go, after this first stage, with that identification of bots, unification of identities, identification of large commits, classification of different kinds of tickets, etc. In this case, we were mainly testing the automated (first) stage: the second one, as you mention, usually needs some detailed knowledge about the project, and some manual intervention.
Sure. I wasn't intending to put pressure on you.
None taken ;-)
A few quirks I noticed:
- noreply@sourceforge.net is abusing its second place as sender (2525).
- I bet the two brion are the same, with different emails (4561+1285=5846, wow!)
We will look into that.
l10n-bot is indeed a bot. On svn, localisation commits weren't done with a specific account, but you can look for commit messages like «Localisation updates for core and extension messages from translatewiki.net»
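
A minimal sketch of that idea: separate localisation commits from the rest by looking at the first line of the commit message. Matching on the prefix "Localisation updates" is an assumption drawn from the single example message above; real svn history may use other wordings. The `message` key is hypothetical too.

```python
# Assumed prefix, taken from the one example message quoted above.
L10N_PREFIX = "Localisation updates"


def is_localisation_commit(message):
    """Return True if the first line of the message looks like a
    translatewiki.net localisation update."""
    stripped = message.strip()
    if not stripped:
        return False
    return stripped.splitlines()[0].startswith(L10N_PREFIX)


def split_localisation(commits):
    """Split hypothetical commit dicts (key 'message') into
    (other_commits, localisation_commits)."""
    other, l10n = [], []
    for commit in commits:
        target = l10n if is_localisation_commit(commit["message"]) else other
        target.append(commit)
    return other, l10n
```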
ok
Commits migrated from svn will have emails of the form username@users.mediawiki.org. All commits done on git use a different one. Moreover, some people have used different mails (see other threads on the mailing list about this in ohloh).
We noticed this. But it is a bit risky to assume that xxx@users.mediawiki.org is the same as xxx@otherdomain: probably in most cases, the merge is perfect, but in some it could be wrong. We preferred to provide the raw numbers since we didn't have resources to do the manual checking needed. But we can try to produce accumulated stats assuming that match: probably it would be accurate enough.
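
The accumulated stats could be sketched roughly as follows. It assumes that xxx@users.mediawiki.org and xxx@otherdomain are the same person whenever the part before the @ matches, which, as said above, may be wrong in some cases. The addresses and counts in the usage example are made up.

```python
from collections import defaultdict

# Domain used for commits migrated from svn (from the thread).
SVN_DOMAIN = "users.mediawiki.org"


def merge_by_username(counts):
    """Accumulate per-email counts under the svn username.

    `counts` maps email address -> number of commits. An address is
    folded into an svn username only if that username also appears
    with the svn domain; all other addresses are kept as they are.
    """
    svn_users = {
        email.split("@", 1)[0].lower()
        for email in counts
        if email.lower().endswith("@" + SVN_DOMAIN)
    }
    merged = defaultdict(int)
    for email, n in counts.items():
        local = email.split("@", 1)[0].lower()
        # Assumption: same local part as an svn account = same person.
        key = local if local in svn_users else email.lower()
        merged[key] += n
    return dict(merged)


# Made-up example data.
raw = {
    "alice@users.mediawiki.org": 10,
    "alice@example.org": 5,
    "bob@example.org": 3,
}
merged = merge_by_username(raw)
```

Keeping the raw numbers next to the merged ones makes it possible to spot matches that look wrong.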
[...]
Regards,
Jesus.