First of all i would apologize if this question is not related to this community.
I am trying to run the following query on the Wikipedia data and this query is running for the last 36 hours now. Can anyone please tell me is this normal time for such queries on Wikipedia data?
Another questions is: Do we have some service where we can query offline wikipedia data and get our desired results?
select rev_page, rev_user, rev_user_text, count(rev_user) from revision where rev_user IN ( select r1.rev_user from revision r1, revision r2 where r1.rev_user <> r2.rev_user and r1.rev_user <> 0 and r2.rev_user <> 0 and r1.rev_page = r2.rev_page group by r1.rev_user ) group by rev_user order by rev_page, rev_user;
On 10/29/07, Fawad Nazir fawad.nazir@gmail.com wrote:
I am trying to run the following query on the Wikipedia data and this query is running for the last 36 hours now. Can anyone please tell me is this normal time for such queries on Wikipedia data? . . . select rev_page, rev_user, rev_user_text, count(rev_user) from revision where rev_user IN ( select r1.rev_user from revision r1, revision r2 where r1.rev_user <> r2.rev_user and r1.rev_user <> 0 and r2.rev_user <> 0 and r1.rev_page = r2.rev_page group by r1.rev_user ) group by rev_user order by rev_page, rev_user;
Your outer query is attempting to scan the entire revision table, and do a filesort on it, and that's even without considering the subquery. You need to write more efficient queries if you want to expect them not to take days.
Another questions is: Do we have some service where we can query offline wikipedia data and get our desired results?
No, because perfectly well-intentioned queries like the one you posted would slow it down so much as to make it useless. Such a service did exist, long ago, but was stopped once the databases got to some reasonable size.
Simetrical:
On 10/29/07, Fawad Nazir fawad.nazir@gmail.com wrote:
Another questions is: Do we have some service where we can query offline wikipedia data and get our desired results?
No, because perfectly well-intentioned queries like the one you posted would slow it down so much as to make it useless. Such a service did exist, long ago, but was stopped once the databases got to some reasonable size.
but there is http://wiki.ts.wikimedia.org/view/Query_service.
- river.
wikitech-l@lists.wikimedia.org