On Wed, Feb 28, 2018 at 5:26 PM, Brad Jorsch (Anomie) <bjorsch@wikimedia.org
wrote:
On Wed, Feb 28, 2018 at 8:47 AM, Jaime Crespo jcrespo@wikimedia.org wrote:
Very recently I have been experimenting with recursive Common Table Expressions [2], which are or will be available on the latest versions of MySQL and MariaDB.
Do the other databases MediaWiki tries to support have that feature?
Actually, MySQL/MariaDB is the *last* database to conform to the sql:1999 WITH standard: https://modern-sql.com/feature/with#compatibility Even sqlite suported a limited set of those!
The good news is that, probably because it arrived last, it got a pretty feature-full implementation: https://twitter.com/MarkusWinand/status/852862475699707904
With a single query on can obtain all titles directly or indirectly in a category:
WITH RECURSIVE cte (cl_from, cl_type) AS ( SELECT cl_from, cl_type FROM categorylinks WHERE cl_to = 'Database_management_systems' -- starting category UNION SELECT categorylinks.cl_from, categorylinks.cl_type FROM cte JOIN
page
ON cl_from = page_id JOIN categorylinks ON page_title = cl_to WHERE cte.cl_type = 'subcat' -- subcat addition on each iteration ) SELECT page_title FROM cte JOIN page ON cl_from = page_id WHERE page_namespace = 0 ORDER BY page_title; -- printing only articles in the end , ordered by title
Does that work efficiently on huge categories, or does it wind up fetching millions of rows and filesorting?
Needs more testing-- that query worked for me well enough on my laptop to expose it directly on a webrequest (<0.1 s), but I only imported the categorylinks and page tables, so I was working with memory. Obviously, the more complex the query, and the more results it returns, the less likely it is to be able to be exposed to, e.g. a public API. But at least there are configurable limits on recursivitiy and max execution time.
Honestly, given it is a new feature, I don't expect mediawiki --which at the moment has to support 5.5- to embrace it any time soon. However, I wanted to ask if it was interesting enough to setup some test hosts for mediawiki to "play" with it --e.g. evaluate performance--, and maybe (?) some upgraded mariadb/mysql servers for WMF labsdb (for long-running analytics or gadgets that generates reports).