Why not allow arbitrary SQL queries on most of the database tables?
Let's see: only a few tables, like the user table, hold much confidential information, and even then only in a few columns.
So api.php could drop its read privileges for (parts of?) that table before running any queries.
A motivating example:
There comes a time when all websites should check for link lint. OK, so I need a list of the external links present in my wikis.
$ echo "SELECT DISTINCT el_to FROM wiki_externallinks ORDER BY el_to;" | mysql -B my_database
gets them all for me with one command.
Can api.php get all the external links, for all namespaces, all in one shot? Can Special:Linksearch get them all in one shot?
The sysop could also customize which tables/columns to restrict and how many rows to output, as well as set a total row output limit.
No need to allow only SELECT, as api.php would first drop all privileges other than read-only ones, including the privilege to GRANT its privileges back to itself... No need even to filter against SQL injection attacks (but as I don't even know how to spell SQL, don't quote me on that).
Anyway, being able to do arbitrary SQL would greatly simplify many api.php queries. Let's see, for the URL perhaps use: api.php?sql=SELECT+DISTINCT... (maybe use no CAPS in the examples to "sell the ease of the idea".)
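(To make the privilege-dropping idea above concrete, here is a rough sketch of the kind of read-only MySQL account it presupposes; the account name, password, database name and column list are invented for illustration, and this is not something api.php actually does:)

CREATE USER 'api_ro'@'localhost' IDENTIFIED BY 'some-password';
-- plain read access to the non-sensitive tables
GRANT SELECT ON my_database.wiki_page TO 'api_ro'@'localhost';
GRANT SELECT ON my_database.wiki_externallinks TO 'api_ro'@'localhost';
-- the user table is exposed only through its non-confidential columns
GRANT SELECT (user_id, user_name) ON my_database.wiki_user TO 'api_ro'@'localhost';
-- no GRANT OPTION anywhere, so the account cannot give privileges back to itself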
Hello,
2009/3/15 jidanni@jidanni.org:
Why not allow arbitrary SQL queries on most of the database tables?
Performance?
Security?
Data formatting?
MinuteElectron.
On Sun, Mar 15, 2009 at 6:29 PM, MinuteElectron minuteelectron@googlemail.com wrote:
Special:AskSQL was removed from core for a reason.
Jidanni: If you want your users to do this on your wiki, you can install the extension.
-Chad
Don't forget database compatibility and abstraction.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://nadir-seen-fire.com]
-Nadir-Point & Wiki-Tools (http://nadir-point.com) (http://wiki-tools.com)
-MonkeyScript (http://monkeyscript.org)
-Animepedia (http://anime.wikia.com)
-Narutopedia (http://naruto.wikia.com)
-Soul Eater Wiki (http://souleater.wikia.com)
jidanni@jidanni.org wrote:
Why not allow arbitrary SQL queries on most of the database tables? Anyway, being able to do arbitrary SQL would greatly simplify many api.php queries. Let's see, for the URL perhaps use: api.php?sql=SELECT+DISTINCT... (maybe use no CAPS in the examples to "sell the ease of the idea".)
A long time ago there was a special page on Wikimedia wikis that allowed arbitrary SQL queries. It was restricted more and more until it was finally disabled, because it could put too much load on the servers (it was also a security risk).
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/29001
jidanni@jidanni.org wrote:
Why not allow arbitrary SQL queries on most of the database tables?
I think the other replies in this thread have answered that already.
Can api.php get all the external links, for all namespaces, all in one shot? Can Special:Linksearch get them all either, all in one shot?
Yes, they can get them all, but only in chunks of 500 rows (5000 for sysops/bots), as usual.
Roan Kattouw (Catrope)
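(For reference, the API counterpart of Special:Linksearch is the list=exturlusage query module, so a chunked API equivalent of the SQL one-liner from the start of the thread would look roughly like this; the wiki URL is a placeholder, and the continuation value for the next chunk comes back in the usual query-continue element:)

$ GET 'http://example.org/w/api.php?action=query&list=exturlusage&eulimit=max&format=xml'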
Gentlemen, the challenge: produce the equivalent of $ php dumpBackup.php --current (or Special:Export of all the pages on the wiki) via the API.
OK, something like api.php?action=query&generator=allpages&gaplimit=max&export&exportnowrap but is there a way to do all the namespaces in one query, or must we do one query for each gapnamespace?
jidanni@jidanni.org wrote:
Gentlemen, the challenge: produce the equivalent of $ php dumpBackup.php --current (or Special:Export of all the pages on the wiki) via the API.
OK, something like api.php?action=query&generator=allpages&gaplimit=max&export&exportnowrap but is there a way to do all the namespaces in one query, or must we do one query for each gapnamespace?
If you want to be sure you get everything, you have to do each gapnamespace separately *and* omit &exportnowrap so you'll get a query-continue (in case there's a namespace with more than 500/5000 pages).
Roan Kattouw (Catrope)
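(A rough shell sketch of that per-namespace, query-continue loop, for anyone who wants to script it; the wiki URL and the namespace list are placeholders, the continuation value is assumed to come back as a gapfrom attribute inside <query-continue>, and a real script would URL-encode that value before reusing it:)

for ns in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15; do
  from=; i=0
  while :; do
    out="export-ns$ns-part$i.xml"
    curl -s "http://example.org/w/api.php?action=query&generator=allpages&gapnamespace=$ns&gaplimit=max&gapfrom=$from&export&format=xml" > "$out"
    # note: without exportnowrap the dump XML is embedded (escaped) in the API
    # response, so the saved chunks still need to be unwrapped afterwards
    # grab the continuation title, if any, and fetch the next chunk
    from=$(grep -o 'gapfrom="[^"]*"' "$out" | head -n 1 | cut -d'"' -f2)
    i=$((i+1))
    [ -n "$from" ] || break
  done
done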
I notice an artificial restriction in the API, the inability to do certain operations across more than one namespace.
One should be able to say a query is not just for, e.g., namespace 1, but also for 1,2,3, and even "all".
I can think of several use cases:
- What page names start with the letter K, in all namespaces?
- What pages are ....., in all namespaces?
- What are the names of the pages, in all namespaces? (so I can use them as a generator for &export.)
Maybe the restriction ("apnamespace: The namespace to enumerate. You can only enumerate one namespace at a time.") was made with Wikipedia in mind. But now with the default limits in place, there is no need to force this restriction on smaller sites, no?
but is there a way to do all the namespaces in one query, or must we do one query for each gapnamespace?
RK> If you want to be sure you get everything, you have to do each
RK> gapnamespace separately
Hello,
2009/3/19 jidanni@jidanni.org:
Do you actually have use for this, or is this a purely hypothetical request?
In any case, if there are so many pages that you can't reasonably parse the full page list, then it is unlikely that being able to specify multiple namespaces will save you any requests anyway.
Regards, MinuteElectron
MinuteElectron wrote:
Hello,
2009/3/19 jidanni@jidanni.org:
I notice an artificial restriction in the API, the inability to do certain operations across more than one namespace.
One should not only be able to say a query should be for e.g., namespace 1, but also 1,2,3, and even "all".
I can think of several usage cases:
- What page name start with the letter K, in all namespaces?
- What pages are ....., in all namespaces?
- What are the names of the pages, in all namespaces? (so I can use
them as a generator for &export.)
Maybe the restriction ("apnamespace: The namespace to enumerate. You can only enumerate one namespace at a time.") was made with Wikipedia in mind. But now with the default limits in place, there is no need to force this restriction on smaller sites, no?
No, this isn't related to limits. Due to the way the database works, certain queries can efficiently sort their results using an index, while other queries can't, either because there's no index for them or because making an index for it is impossible. Combining apnamespace with apprefix results in a query of the third category, which means it'll sort slow no matter how many rows are requested.
Think of the database as a phone book, sorted by town first, then by last name. If you wanted to list the first 10 people whose last names start with a "B" in a certain town (sorted by last name of course), that'd be easy (and wouldn't depend on how many Bs there are in that town), while listing all Bs regardless of town would be a lot harder because you'd have to make a list of all Bs in all towns, then sort that and take the first 10.
In any case, if there are so many pages that you can't reasonably parse the full page list, then it is unlikely that being able to specify multiple namespaces will save you any requests anyway.
Yes, this is true. For namespaces with more than 500/5000 pages, you're not saving any requests.
Roan Kattouw (Catrope)
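(To put Roan's phone book in SQL terms: the page table has an index on (page_namespace, page_title), so something like the following pair of queries, run through the mysql client as in the earlier one-liner, shows the difference; EXPLAIN should report "Using filesort" for the second one:)

-- fast: the (page_namespace, page_title) index satisfies both the WHERE and the
-- ORDER BY, like flipping to one town's "B" section of the phone book
EXPLAIN SELECT page_title FROM page
WHERE page_namespace = 0 AND page_title LIKE 'B%'
ORDER BY page_title LIMIT 10;

-- slow: with no single leading namespace the index can no longer return rows in
-- order, so MySQL has to collect the matches from every namespace and sort them itself
EXPLAIN SELECT page_namespace, page_title FROM page
WHERE page_title LIKE 'B%'
ORDER BY page_title LIMIT 10;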
All I know is there should be a way to do a SELECT on table wiki_page WHERE the page_namespaces are in, or not in, a list we give, and output that in export format. Seems like a very direct path. Sorry I have not yet looked at any API code.
xexport&ns=all ns=1,2,3 xns=4,5 ns=all ns=-1,-2... &current_only (default)... ns=1|2|-1 ... I remember there are negative namespaces, so must think twice...
That way the API would then be able to do general exports, better than Special:Export and dumpBackup.php (except for a few of the latter's additional options...)
Then we can have a button on our wikis: "Click here to export a backup of all the pages on this wiki before the cops arrive and confiscate it!
That way you can let all the police radio frequency tables of http://radioscanningtw.jidanni.org/ live on, at least in your own home, while the former maintainer is serving out his sentence for breaking draconian national security laws, if there one day happens to be such laws.
Sure, we exclude user data, because that's confidential."
And the backup can be up to the minute (not just a link to a .xml file made by a nightly cronjob). And it can all be done via one single API HTTP GET, with no need for some homebrew PHP CGI script.
2009/3/20 jidanni@jidanni.org:
All I know is there should be a way to do a SELECT on table wiki_page WHERE the page_namespaces are in, or not in, a list we give
There's no efficient way to write such a SQL query.
Please read again the paragraph Roan wrote you:
Think of the database as a phone book, sorted by town first, then by last name. If you wanted to list the first 10 people whose last names start with a "B" in a certain town (sorted by last name of course), that'd be easy (and wouldn't depend on how many Bs there are in that town), while listing all Bs regardless of town would be a lot harder because you'd have to make a list of all Bs in all towns, then sort that and take the first 10.
It specifically answers your question, explaining why it is not possible.
If you want to read more about the technical reasons for this, please head for http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html and http://s.petrunia.net/blog/?p=24 , look for "filesort" keyword.
The API limitations you mention were not put there because some devs thought it would look nicer, or because it was easier to write code for this at a certain time. These limitations exist because of the way the underlying data is ordered in the database.
"ND" == Nicolas Dumazet nicdumz@gmail.com writes:
ND> There's no efficient way to write such a SQL query.
First I admit I don't even know how to spell "SQL", but working on how to export all or several namespaces with one API query, wouldn't
SELECT page_id FROM page;
or
SELECT page_id FROM page WHERE page_namespace IN (1,2,3);
be efficient at the core of generators? Ordering isn't important for our exports. They would be called by something like
$ GET api.php?...&xxexport&xxexportns=all #or 1,2,3 > file.sql
jidanni@jidanni.org wrote:
"ND" == Nicolas Dumazet nicdumz@gmail.com writes:
ND> There's no efficient way to write such a SQL query.
First I admit I don't even know how to spell "SQL", but working on how to export all or several namespaces with one API query, wouldn't SELECT page_id FROM page; or SELECT page_id FROM page WHERE page_namespace IN (1,2,3); be efficient at the core of generators?
Yes, they'd work, but would be of extremely limited use since adding pretty much any other condition results in an inefficient query. In practice, that would mean that multiple values for apnamespace would disable pretty much all other ap* parameters. I recall explaining earlier on this list that that'd mean too much complex logic for such minimal gain.
Ordering isn't important for our exports. They would be called by something like $ GET api.php?...&xxexport&xxexportns=all #or 1,2,3 > file.sql
Export doesn't generate lists of pages itself, but relies on allpages to do that, so a parameter like exportns doesn't make sense considering the way things currently work.
Roan Kattouw (Catrope)
Hello
To export 100 pages starting at "D" I tried to use the following command:
api.php?action=query&generator=allpages&gapfrom=D&gaplimit=100&export
this has the same result as
api.php?action=query&generator=allpages&gapfrom=D&gaplimit=100
How can I make the export command work? I.e., how do I get an XML file similar to the one obtained with Special:Export?
What kind of name is this exported file expected to have?
Thanks Francois Colonna
2009/5/10 Colonna Francois colonna@lct.jussieu.fr:
Hello
To export 100 pages starting at "D" I tried to use the following command :
api.php?action=query&generator=allpages&gapfrom=D&gaplimit=100&export
this has the same result as
api.php?action=query&generator=allpages&gapfrom=D&gaplimit=100
No it doesn't, the latter has a list of the pages *and* the export data below it (scroll down).
How can I make the export command work ? i.e. get an Xml file similar to that obtained with Special:Export ?
If you want to get just the export XML, use &export&exportnowrap .
Roan Kattouw (Catrope)
On Sunday, 10 May 2009 at 14:40 +0200, Roan Kattouw wrote:
I am sorry, but
api.php?action=query&generator=allpages&gapfrom=D&gaplimit=100
api.php?action=query&generator=allpages&gapfrom=D&gaplimit=100&export
api.php?action=query&generator=allpages&gapfrom=D&gaplimit=100&export&exportnowrap
give exactly the same result, and no file is dumped. Maybe I missed setting some parameter in LocalSettings or somewhere else?
Thanks for helping.
2009/5/10 Colonna Francois colonna@lct.jussieu.fr:
You're probably using MediaWiki 1.14 or lower. Exporting pages through the API will only be available from 1.15 (which should be released soonish), and on Wikipedia.
Roan Kattouw (Catrope)
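(For anyone unsure which version their wiki is running, the API itself can tell you; the URL below is just a placeholder, and the MediaWiki version appears in the generator attribute of the <general> element of the response:)

$ GET 'http://example.org/w/api.php?action=query&meta=siteinfo&format=xml'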
On Sunday, 10 May 2009 at 16:06 +0200, Roan Kattouw wrote:
That's right, I am using MediaWiki 1.14. Thank you. Francois Colonna