cc-ing Leila, as we were experimenting with these some weeks back in SF. I think they can be killed w/o problems. I did not know they were still running; we ran a faster version of those queries and got the data we were interested in a while back.

On Mon, Nov 10, 2014 at 1:55 AM, Sean Pringle <springle@wikimedia.org> wrote:

Can we kill these? Can we optimize and/or redesign the jobs? Happy to help...

Three identical queries from the 'research_prod' user have just passed one month execution time on s1-analytics-slave:
select count(*)
from staging.ourvision r
where exists (
select *
from staging.ourvision r1
inner join
staging.ourvision r2
on r2.sha1 = r1.sha1
where r1.page_id = r.page_id
and r2.page_id = r.page_id
and DATE_ADD(r.timestamp, INTERVAL 1 HOUR)
and r2.timestamp between r.timestamp and DATE_SUB(r.timestamp , INTERVAL 1 HOUR)
and r1.sha1!= r.sha1
);

I haven't checked to see if the queries are just that amazingly slow, or if they're part of a larger ongoing transaction. In any case, three month-long transactions are pushing the resource limits of the slave and will soon result in either mass replication lag or some other interesting lockup that may in turn take days to roll back :-)
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics