On 26/10/11 22:24, Andre Koopal wrote:
Hi,
For the WLM in the Netherlands I want to make statistics from how many
monuments somebody made a picture.
In principle the sql for this is simple:
SELECT img_user_text, COUNT(img_user) FROM image
JOIN page ON img_name=page_title
JOIN categorylinks AS clA ON page_id=clA.cl_from
WHERE page_namespace=6
AND page_is_redirect=0
AND clA.cl_to='Images_from_Wiki_Loves_Monuments_2011_in_the_Netherlands'
and img_timestamp > '20110831220000'
and img_timestamp < '20110930220000'
GROUP BY img_user_text
ORDER BY COUNT(img_user_text) DESC;
However, the uploader selected here is the last uploader, not the first. Due
to this, one of the uploaders in this list is 'rotatebot'.
I know the other uploaders are in a different table, but I don't see an easy
way to build the query, certainly because there is not always something
in that table.
Does anybody know a good way to solve this?
You could left join with logging_ts_alternative where 'log_namespace' =
6 AND log_title page_title AND log_action = 'upload'.
That will give you the first uploader, but it will fail for renamed images.
Another approach would be to join revision where rev_id = page_id and
rev_parent_id IS NULL, but I think that will lead to a lot of scanning
and will be very slow. Plus it could fail if an admin split some image.
I recommend you to make an outer loop for obtaining the proper data.