Amazingly, the problematic query never finishes executing. Still trying to
figure out what is going on here. Any help is appreciated.
On Wed, Aug 7, 2019 at 10:21 PM Maximilian Doerr <maximilian.doerr(a)gmail.com>
wrote:
Running an EXPLAIN on all three reveal identical
indices being used. They
should theoretically all run equally fast.
Cyberpower678
English Wikipedia Account Creation Team
English Wikipedia Administrator
Global User Renamer
On Aug 7, 2019, at 22:15, Huji Lee <huji.huji(a)gmail.com> wrote:
I have a question for the MySQL/MariaDB experts.
*Short and sweet*
How is that Query 1 <https://quarry.wmflabs.org/query/38237> runs in
seconds and so does Query 2 <https://quarry.wmflabs.org/query/38243>, but Query
3 <https://quarry.wmflabs.org/query/38244> -- which is essentially the
same thing except it tries to bring columns from both sides of the join --
takes forever to run?
*Details*
I have a bot that uses this script
<https://github.com/PersianWikipedia/fawikibot/blob/master/HujiBot/findproxy.py>
to identify and block IPs associated with open proxies. To be parsimonious
with the blocks, it only blocks the said IP and not its associated range.
However, many proxy IPs belong to a web hosting range and it would be
better to block the entire range. The goal of the query is to find all
active blocks made by my bot, sort them in order of IP address, and emulate
the LEAD() function -- which we still don't have because we have not
upgraded to MariaDB 10.2 on Labs servers -- to make it easy to find cases
where two consecutive IPs start with the same two octets (like 100.24.X.Y
and 100.24.C.D) so that I can manually investigate those in more detail.
The nested SELECT that is repeated twice simply generates a list of all
active blocks by my bot. Query 1 shows all of them (88 rows) and Query 2
shows only the rows in which the LEAD subquery row has a rownumber that is
equal to 1 + that of a row in the original 88-row data. Of note, this also
has 88 rows, though one of the rows is all NULLs because for the last row
of the data we should not find a match in the LEAD subqeury.
Anyhow, Query 3 simply aims to put the actual data and the LEAD subquery
data side by side and that is where things fall apart somehow. I cannot run
an EXPLAIN on this through Quarry, because Quarry does not like SQL
variables :/ and I have no idea how else to diagnose this problem.
Thanks!
Huji
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud(a)lists.wikimedia.org (formerly labs-l(a)lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud
_______________________________________________
Wikimedia Cloud Services mailing list
Cloud(a)lists.wikimedia.org (formerly labs-l(a)lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/cloud