Hi,
For the WLM in the Netherlands I want to make statistics from how many monuments somebody made a picture.
In principle the sql for this is simple:
SELECT img_user_text, COUNT(img_user) FROM image JOIN page ON img_name=page_title JOIN categorylinks AS clA ON page_id=clA.cl_from WHERE page_namespace=6 AND page_is_redirect=0 AND clA.cl_to='Images_from_Wiki_Loves_Monuments_2011_in_the_Netherlands' and img_timestamp > '20110831220000' and img_timestamp < '20110930220000' GROUP BY img_user_text ORDER BY COUNT(img_user_text) DESC;
However, the uploader selected here is the last uploader, not the first. Due to this, one of the uploaders in this list is 'rotatebot'.
I know the other uploaders are in a different table, but I don't see an easy way to build the query, certainly because there is not always something in that table.
Does anybody know a good way to solve this?
Part of the reason I was looking for this is because of our quantity price, but fortunately the number of pictures of rotatebot is low enough not to have effect on the top 3.
Regards,
Andre
I might be giving a dumb answer (apologies for that) but this don't solve your problem?
http://www.wikilovesmonuments.org.pt/tools/stats/api-bridge.php?ctscope=user... _____ *Béria Lima* Wikimedia Portugal http://wikimedia.pt (351) 963 953 042
*Imagine um mundo onde é dada a qualquer pessoa a possibilidade de ter livre acesso ao somatório de todo o conhecimento humano. É isso o que estamos a fazer.*
On 26 October 2011 21:24, Andre Koopal andre@molens.org wrote:
Hi,
For the WLM in the Netherlands I want to make statistics from how many monuments somebody made a picture.
In principle the sql for this is simple:
SELECT img_user_text, COUNT(img_user) FROM image JOIN page ON img_name=page_title JOIN categorylinks AS clA ON page_id=clA.cl_from WHERE page_namespace=6 AND page_is_redirect=0 AND clA.cl_to='Images_from_Wiki_Loves_Monuments_2011_in_the_Netherlands' and img_timestamp > '20110831220000' and img_timestamp < '20110930220000' GROUP BY img_user_text ORDER BY COUNT(img_user_text) DESC;
However, the uploader selected here is the last uploader, not the first. Due to this, one of the uploaders in this list is 'rotatebot'.
I know the other uploaders are in a different table, but I don't see an easy way to build the query, certainly because there is not always something in that table.
Does anybody know a good way to solve this?
Part of the reason I was looking for this is because of our quantity price, but fortunately the number of pictures of rotatebot is low enough not to have effect on the top 3.
Regards,
Andre
Wiki Loves Monuments mailing list WikiLovesMonuments@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikilovesmonuments http://www.wikilovesmonuments.eu
Beria, Andre,
It does, but not exactly that URL. Try this one:
http://toolserver.org/~ntavares/patrimonio/api/api.php?action=statisticsct&a...
The column n_wlm_ids is the value Andre is seeking for.
Andre, the limitation you ran across was (one of) why ranking.php shouldn't be used.. I found 2 solutions:
1) going through revisions to extract the user of the *first* revision in an image - this is very very heavy, it was implemented and optimized by means of TEMPORARY tables, I still might have the code somewhere;
2) In the end I came up with the following: keep the search base limited and simple as possible (which lets me optimize it my way, instead of relying in inefficient 'commons' VIEWs), which is the statisticsct table. The, the api module 'statistictsct&ctscope=user' will just have to do the second query.
Note that, in the second query, the JOIN is just to get a traversal "user participation" (that is to say, "in all countries").
$sql = 'SELECT /* SLOW_OK */ img_name , rev_user AS img_user_id , rev_user_text AS img_user_name , page_id AS img_page_id , "'.$country.'" AS img_wlm_country , TRIM(SUBSTR(c2.cl_sortkey, 1, INSTR(c2.cl_sortkey, "\n")-1)) AS img_wlm_id , rev_timestamp AS img_timestamp , IF (user_registration IS NULL,20051222000000,user_registration) AS user_first_rev /* there is no user_registration for users older than this */ , IF (c3.cl_to IS NULL,0,1) AS img_is_valid FROM image JOIN page ON page_namespace = 6 AND page_title = img_name JOIN categorylinks c1 ON page_id = c1.cl_from AND c1.cl_to = "Images_from_Wiki_Loves_Monuments_2011_in_'.$country.'" LEFT JOIN categorylinks c2 ON c1.cl_from = c2.cl_from AND c2.cl_to = "'.$category.'" /* not all have */ JOIN revision ON rev_page = page_id AND rev_parent_id = 0 JOIN user ON user_id = rev_user LEFT JOIN categorylinks c3 ON c3.cl_from = page_id and c3.cl_to = "Reviewed_images_from_Wiki_Loves_Monuments_2011_in_'.$country.'" ';
$sql = 'SELECT st1.img_user_name AS user , COUNT(DISTINCT st1.img_name) AS n_images , SUM(img_is_valid) AS n_images_accepted , GROUP_CONCAT(DISTINCT st1.img_name SEPARATOR "[,]") AS images , COUNT(DISTINCT st1.img_wlm_id) AS n_wlm_ids , GROUP_CONCAT(DISTINCT st1.img_wlm_id SEPARATOR "[,]") AS wlm_ids , st2.n_countries AS n_countries , st2.countries AS countries , st2.n_images_all AS n_images_all , st2.images_all AS images_all , st2.n_wlm_ids_all AS n_wlm_ids_all , st2.wlm_ids_all AS wlm_ids_all , st2.n_images_accepted_all , IF(st1.user_first_rev<20110831230000,0,1) AS user_is_new FROM statisticsct2 st1 JOIN ( SELECT img_user_name , COUNT(DISTINCT img_wlm_country) AS n_countries , GROUP_CONCAT(DISTINCT img_wlm_country SEPARATOR "[,]") AS countries , COUNT(DISTINCT img_name) AS n_images_all , GROUP_CONCAT(DISTINCT img_name SEPARATOR "[,]") AS images_all , COUNT(DISTINCT img_wlm_id) AS n_wlm_ids_all , GROUP_CONCAT(DISTINCT img_wlm_id SEPARATOR "[,]") AS wlm_ids_all , SUM(img_is_valid) AS n_images_accepted_all FROM statisticsct2 GROUP BY img_user_name) st2 ON st1.img_user_name = st2.img_user_name WHERE st1.img_wlm_country IN ( '.$countries.' ) GROUP BY st1.img_user_name';
On 26/10/11 22:24, Andre Koopal wrote:
Hi,
For the WLM in the Netherlands I want to make statistics from how many monuments somebody made a picture.
In principle the sql for this is simple:
SELECT img_user_text, COUNT(img_user) FROM image JOIN page ON img_name=page_title JOIN categorylinks AS clA ON page_id=clA.cl_from WHERE page_namespace=6 AND page_is_redirect=0 AND clA.cl_to='Images_from_Wiki_Loves_Monuments_2011_in_the_Netherlands' and img_timestamp > '20110831220000' and img_timestamp < '20110930220000' GROUP BY img_user_text ORDER BY COUNT(img_user_text) DESC;
However, the uploader selected here is the last uploader, not the first. Due to this, one of the uploaders in this list is 'rotatebot'.
I know the other uploaders are in a different table, but I don't see an easy way to build the query, certainly because there is not always something in that table.
Does anybody know a good way to solve this?
You could left join with logging_ts_alternative where 'log_namespace' = 6 AND log_title page_title AND log_action = 'upload'.
That will give you the first uploader, but it will fail for renamed images.
Another approach would be to join revision where rev_id = page_id and rev_parent_id IS NULL, but I think that will lead to a lot of scanning and will be very slow. Plus it could fail if an admin split some image.
I recommend you to make an outer loop for obtaining the proper data.
I made you a list of all authors of files in Images_from_Wiki_Loves_Monuments_2011_in_the_Netherlands category. Get it from /home/platonides/public_html/wlm/uploaders-nl-non-unique.txt
Just count the times a name appears. You probably want an incantation like this: sort uploaders-nl-non-unique.txt | uniq -c | sort -nr
On Wed, Oct 26, 2011 at 11:42:07PM +0200, Platonides wrote:
I made you a list of all authors of files in Images_from_Wiki_Loves_Monuments_2011_in_the_Netherlands category. Get it from /home/platonides/public_html/wlm/uploaders-nl-non-unique.txt
Just count the times a name appears. You probably want an incantation like this: sort uploaders-nl-non-unique.txt | uniq -c | sort -nr
Hi Platonius,
I realised I used the wrong query this morning. I just corrected myself to the dutch jurymembers :-(
The query I grabbed counted just the number of pictures, while I needed the number of monuments (unique id's).
Thanks for your help, but this give me the number of images again. In the mean while I learned Nuno has already written what I need. However, will still study your hints and those of Nuno to see how it can be fixed. It seems you need to build logic outside the sql for a solution.
Regards,
Andre
Andre Koopal wrote:
The query I grabbed counted just the number of pictures, while I needed the number of monuments (unique id's).
Are they stored somewhere in the db? Otherwise, which template stores it?
I did the same kind of queries for Spain, but I had a table with images, authors, ids... so it resulted in sane queries.
On Thu, Oct 27, 2011 at 12:02:27AM +0200, Platonides wrote:
Andre Koopal wrote:
The query I grabbed counted just the number of pictures, while I needed the number of monuments (unique id's).
Are they stored somewhere in the db? Otherwise, which template stores it?
I did the same kind of queries for Spain, but I had a table with images, authors, ids... so it resulted in sane queries.
The id is set as the sortkey of hidden category, so you can do:
SELECT img_user_text, COUNT(DISTINCT(clB.cl_sortkey)) FROM image JOIN page ON img_name=page_title JOIN categorylinks AS clA ON page_id=clA.cl_from JOIN categorylinks AS clB ON page_id=clB.cl_from WHERE page_namespace=6 AND page_is_redirect=0 AND clA.cl_to='Images_from_Wiki_Loves_Monuments' AND clB.cl_to='Rijksmonumenten_with_known_IDs' and img_timestamp > '20100901020000' and img_timestamp < '20101001020000' GROUP BY img_user_text ORDER BY COUNT(DISTINCT(clB.cl_sortkey)) DESC;
Again, this is with the last uploader, not the first, and this is the query for last year, so using the old categories, but just for the idea.
Regards,
Andre
Platonides, the problem is deeper than that (lang is not compiled in the statistics, I've noticed that requirement too late), I don't intend to fix it in the near future.... so I guess your "ugly hack" is actually very pretty :-)
Damn, I just noticed my branch is so old by now,....
On Thu, 27 Oct 2011 00:02:27 +0200, Platonides platonides@gmail.com wrote:
Andre Koopal wrote:
The query I grabbed counted just the number of pictures, while I needed the number of monuments (unique id's).
Are they stored somewhere in the db? Otherwise, which template stores
it?
I did the same kind of queries for Spain, but I had a table with images, authors, ids... so it resulted in sane queries.
If you want to have an idea, this one should help
http://nl.wikipedia.org/wiki/Wikipedia:Wikiproject/Erfgoed/Nederlandse_Erfgo...
just comparison of the history (the page was updated daily). But it is too approximate to calculate an exact number, for instance, some of the monuments uploaded during WLM were only added to the tables later (and as of yesterday, a dozen were not yet added), and some were uploaded before WLM but discovered and added to the tables during September 2011.
Cheers Yaroslav
Andre Koopal wrote:
The id is set as the sortkey of hidden category, so you can do:
(...)
Again, this is with the last uploader, not the first, and this is the query for last year, so using the old categories, but just for the idea.
Regards,
Andre
Ok, I prepared a new file (uploaders-nl-ids.txt) which I think may contain what you want. There you have uploader and id separated by # (with the uppercased name in the next line, seems you defined the sortkey that way)
The incantation would be like: grep '#' uploaders-nl-ids.txt | sort | uniq | cut -f 1 -d'#' | uniq -c | sort -nr
wikilovesmonuments@lists.wikimedia.org