On Thu, Jul 19, 2012 at 8:06 PM, Platonides platonides@gmail.com wrote:
I'm not convinced about its utility. What tools would need combining? If I just need the results of a SQL query, it may be easier for me to run it myself than to use this system. Maybe a better interface would help.
The use case I find most interesting is taking a tool which outputs a list of pages and feeding that as input to another tool. Some page/user to work with seems to be the most common input. Maybe we should just standardize the input parameters and let tools chain to one another, simply using a special format parameter.
For instance, I usually use names like art, lang, and project, such as:
$_REQUEST += array( 'art' => '', 'lang' => 'en', 'project' => 'wikipedia' );
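For illustration, a minimal sketch of how such a convention might look inside a tool (the parameter names follow the example above; the 'format' switch and the JSON output are just one assumed way a chaining tool could ask for machine-readable results, not an agreed standard):

<?php
// Standardized default input parameters (illustrative values only)
$_REQUEST += array(
    'art'     => '',          // page/article to work on
    'lang'    => 'en',        // language code
    'project' => 'wikipedia', // project family
    'format'  => 'html',      // 'html' for humans, 'json' when another tool is chaining
);

$resultPages = array( 'Example', 'Another page' ); // placeholder for the tool's real output

if ( $_REQUEST['format'] === 'json' ) {
    // Machine-readable output for the next tool in the chain
    header( 'Content-Type: application/json' );
    echo json_encode( array( 'pages' => $resultPages ) );
} else {
    // Normal human-readable output
    foreach ( $resultPages as $page ) {
        echo htmlspecialchars( $page ) . "<br>\n";
    }
}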
I believe we mean the same thing; maybe I didn't describe the asset thing very well.
It's not for a "single page run" of some tool; one reason I chose my CatScan rewrite as a demo is that it can run for a long time (a two-digit number of minutes) and generate a vast list of results (tens of thousands of pages), depending on the query. The idea is that (a) you're not "blocking" while waiting for that run to finish before you can do something else; (b) you can access the results of the run again, say if the subsequent tool fails, or you want to try a different filter or subset, or a different subsequent tool altogether; (c) you can define new data sources, maybe a tool where you just paste in page titles, or another tool that gets the newest 1,000 articles, or 1,000 random articles, or the last 1,000 articles you edited, or /insert crazy idea here/, and all subsequent tools will just run with it.
And you can chain tools together via a single number; no file path that the other guy doesn't have access to, no SQL query that runs for a few minutes every time (that is, /if/ your tool can be reduced to that...), no massive paste orgy, no loss of metadata between tools.
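To make that concrete, here is a rough sketch of the consumer side: a tool that is handed only the asset ID and fetches the page list the previous run produced. The table, column and credential names are invented for illustration; the real storage layout would be whatever the asset system defines:

<?php
// Hypothetical consumer side of a chain: resolve an asset ID to its page list.
$assetId = (int) $_REQUEST['asset'];

// Invented database, table and credentials, for illustration only
$db = new PDO( 'mysql:host=localhost;dbname=toolassets', 'tooluser', 'secret' );
$stmt = $db->prepare( 'SELECT page_title FROM asset_pages WHERE asset_id = ?' );
$stmt->execute( array( $assetId ) );
$pages = $stmt->fetchAll( PDO::FETCH_COLUMN );

// ...filter $pages, store the result under a new asset ID,
// and pass that single number on to the next tool.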
I also envision longer chains: give me all articles that are in both of these two category trees; remove the ones that have images (except template symbol icons, if possible); remove the ones that have language links; remove the ones that have been edited within the last month; render the rest as wikitext. That's a subject-specific "needs work" finder built from simple components. UNIX philosophy at its finest :-)
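Purely as an illustration of that composition, the chain could be written down as a sequence of tool runs, each consuming the previous run's asset ID and producing a new one. The tool names are made up, and runTool() below is a stand-in stub, not an existing function:

<?php
// Stand-in for "start tool X with these parameters, return its output asset ID"
function runTool( $tool, array $params ) {
    static $nextAssetId = 1;
    echo "would run '$tool' with " . json_encode( $params ) . "\n";
    return $nextAssetId++;
}

$asset = runTool( 'category_intersection', array( 'cats' => 'TreeA|TreeB', 'depth' => 5 ) );
$asset = runTool( 'remove_pages_with_images', array( 'input' => $asset, 'keep_icon_templates' => true ) );
$asset = runTool( 'remove_pages_with_langlinks', array( 'input' => $asset ) );
$asset = runTool( 'remove_recently_edited', array( 'input' => $asset, 'max_age_days' => 30 ) );
$asset = runTool( 'render_as_wikitext', array( 'input' => $asset ) );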
Magnus Manske wrote:
Right now, a tool is started by this page via "nohup &"; that could change to the job submission system, if that's possible from the web servers, but right now it seems overly complicated (runtime estimation? memory estimation? SQL server access? whatnot). The web page returns the reserved output asset ID while the actual tool is still running; another tool could thus be "watching" asynchronously, polling the status every few seconds.
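A rough sketch of that launcher, as described (the script name, the status-file convention and the way the asset ID is reserved are all invented here for illustration):

<?php
// Reserve an asset ID and record an initial status.
// (In reality the ID would come from the asset system, not from time().)
$assetId = time();
$statusFile = "/tmp/asset_$assetId.status";
file_put_contents( $statusFile, 'RUNNING' );

// Start the actual tool in the background via nohup so this page returns immediately.
$cmd = 'nohup php catscan_rewrite.php --asset=' . escapeshellarg( $assetId ) . ' > /dev/null 2>&1 &';
exec( $cmd );

// Hand back the asset ID right away; another tool can poll the status
// every few seconds until the run has finished.
echo json_encode( array( 'asset' => $assetId, 'status' => trim( file_get_contents( $statusFile ) ) ) );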
Yes, it can be called. I use it in a script to schedule cleanup of the temporary files it creates.
The relevant code:
$dt = new DateTime( "now", new DateTimeZone( "UTC" ) );
$tmpdir = dirname( __FILE__ ) . "/tmp";
@mkdir( $tmpdir, 0711 );
$shell = "mktemp -d --tmpdir=" . escapeshellarg( $tmpdir ) . " catdown.XXXXXXXX";
$tmpdir2 = trim( `$shell` );

// Schedule the folder destruction one hour from now.
// Note that qsub is 'slow' to return, so we perform it in the background.
// -a: earliest start time, -wd: working directory, -j y: merge stderr into stdout, -b y: treat the command as a binary.
$dt->add( new DateInterval( "PT1H" ) );
exec( "SGE_ROOT=/sge/GE qsub -a " . $dt->format( "YmdHi.s" ) . " -wd " . escapeshellarg( $tmpdir ) .
      " -j y -b y /bin/rm -r " . escapeshellarg( $tmpdir2 ) . " 2>&1 &" );
Thanks, that looks interesting. I'll play with it, though I still face the problem of estimating resource requirements for a tool from within a generic wrapper. /Shudder/
Cheers, Magnus