-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
As someone who always struggles with SQL queries (I am not an expert
at all), this sounds quite promising to me. Maybe one day there will be
a comprehensive set of queries feeding into pipes. Then I would simply
have to pick up a pipe and connect it further. (I like this flavour
of UNIX philosophy ;)
Greetings
DrTrigon
On 19.07.2012 22:34, Magnus Manske wrote:
On Thu, Jul 19, 2012 at 8:06 PM, Platonides <platonides(a)gmail.com> wrote:
I'm not convinced about its utility. What tools would need
combining? If I just need the results of a SQL query, it may be
easier for me than using this system. Maybe a better interface
would help.
The use case I find more interesting is taking a tool that outputs
a list of pages and feeding that list as input to another tool.
A page or user to work with seems to be the most common input.
Maybe we should just standardize the input parameters and let
tools chain to one another, simply using a special format parameter.
For instance, I usually use names like art, lang and project, such as:
  $_REQUEST += array( 'art' => '', 'lang' => 'en', 'project' => 'wikipedia' );
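A minimal sketch of how such a shared parameter convention might look. The keys art, lang and project are the ones named above; the parameter name "format", its values, and the helpers tool_params() and render_pages() are assumptions made up for illustration, not part of any existing tool:

```php
<?php
// Hypothetical helper: merge request parameters with the shared defaults.
function tool_params(array $request): array {
    $defaults = array(
        'art'     => '',
        'lang'    => 'en',
        'project' => 'wikipedia',
        'format'  => 'text',   // assumed name for the "special format parameter"
    );
    // The + operator keeps keys already present in $request
    // and fills in only the missing ones from $defaults.
    return $request + $defaults;
}

// Hypothetical output step: emit a page list for a human or for the next tool.
function render_pages(array $pages, string $format): string {
    if ($format === 'json') {
        return json_encode($pages);
    }
    return implode("\n", $pages);
}

$params = tool_params(array('art' => 'Berlin', 'format' => 'json'));
echo render_pages(array('Berlin', 'Hamburg'), $params['format']), "\n";
```

With every tool reading the same keys, a downstream tool would only need to request format=json from the upstream one instead of scraping its HTML.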
I believe we mean the same thing; maybe I didn't describe the
asset thing very well.
It's not for a "single page run" of some tool; one reason I chose
my CatScan rewrite as a demo is that it can run for a long time
(two-digit number of minutes), and generate a vast list of results
(tens of thousands of pages), depending on the query. The idea is
that (a) you're not "blocking" while waiting for that to finish,
before you can do something else; (b) you can access the results of
the run again, maybe if the subsequent tool fails, or you want to
try a different filter or subset, or a different subsequent tool
altogether; (c) you can define new data sources, maybe a tool where
you just paste in page titles, or another tool that gets the newest
1,000 articles, or 1,000 random articles, or the last 1,000
articles you edited, or /insert crazy idea here/, and all
subsequent tools will just run with it.
And you can chain tools together via a single number; no file path
that the other guy doesn't have access to, no sql query that runs
for a few minutes every time (that is, /if/ your tool can be
reduced to that...), no massive paste orgy, no loss of meta-data
between tools.
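To make the "single number" idea concrete, here is a rough sketch of a file-backed asset store. Everything in it is an assumption for illustration: the directory layout, the JSON fields "status" and "pages", and the function names asset_reserve(), asset_finish() and asset_load(). It does not describe how the actual demo stores its assets.

```php
<?php
// Hypothetical layout: one JSON file per asset ID.
function asset_path(string $dir, int $id): string {
    return $dir . '/asset_' . $id . '.json';
}

// Reserve the next free ID and mark the asset as still running.
function asset_reserve(string $dir): int {
    if (!is_dir($dir) && !mkdir($dir, 0700, true)) {
        throw new RuntimeException("cannot create $dir");
    }
    for ($id = 1; ; $id++) {
        // Mode 'x' fails if the file already exists, so an ID is handed out once.
        $fh = @fopen(asset_path($dir, $id), 'x');
        if ($fh !== false) {
            break;
        }
        if (!file_exists(asset_path($dir, $id))) {
            throw new RuntimeException("cannot write to $dir");
        }
    }
    fwrite($fh, json_encode(array('status' => 'running', 'pages' => array())));
    fclose($fh);
    return $id;
}

// Store the finished result under the reserved ID (not atomic; fine for a sketch).
function asset_finish(string $dir, int $id, array $pages): void {
    file_put_contents(asset_path($dir, $id),
        json_encode(array('status' => 'done', 'pages' => $pages)));
}

// Any later tool only needs the ID to get status and results.
function asset_load(string $dir, int $id): array {
    return json_decode(file_get_contents(asset_path($dir, $id)), true);
}

$dir = sys_get_temp_dir() . '/assets_demo_' . getmypid();
$id = asset_reserve($dir);
asset_finish($dir, $id, array('Berlin', 'Hamburg'));
$asset = asset_load($dir, $id);
echo $id, ': ', $asset['status'], ', ', count($asset['pages']), " pages\n";
```

The point of the design is that the producing tool and every consumer share only the integer; the results stay readable for a second or third consumer without rerunning the query.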
I also envision longer chains: Give me all articles that are in
both these two category trees; remove the ones that have images
(except template symbol icons, if possible); remove the ones that
have language links; remove the ones that had an edit less than a
month old; render that as wikitext. There's a subject-specific
"needs work" finder from simple components. UNIX philosophy at its
finest :-)
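A toy version of such a chain, under loud assumptions: the metadata fields (images, langlinks, last_edit_days) and the sample pages are invented, and the filters run in one process here, whereas in the proposal each step would be a separate tool reading the previous tool's asset.

```php
<?php
// Invented sample data; field names are assumptions for illustration only.
$pages = array(
    array('title' => 'A', 'images' => 0, 'langlinks' => 0, 'last_edit_days' => 90),
    array('title' => 'B', 'images' => 2, 'langlinks' => 0, 'last_edit_days' => 200),
    array('title' => 'C', 'images' => 0, 'langlinks' => 5, 'last_edit_days' => 400),
    array('title' => 'D', 'images' => 0, 'langlinks' => 0, 'last_edit_days' => 3),
);

// Each filter receives one page and returns true to keep it.
$filters = array(
    function (array $p) { return $p['images'] === 0; },          // no images
    function (array $p) { return $p['langlinks'] === 0; },       // no language links
    function (array $p) { return $p['last_edit_days'] >= 30; },  // no edit in the last month
);

// Apply the filters in order, passing the shrinking list along.
function run_chain(array $pages, array $filters): array {
    foreach ($filters as $keep) {
        $pages = array_values(array_filter($pages, $keep));
    }
    return $pages;
}

// Final step of the chain: render the survivors as a wikitext bullet list.
function to_wikitext(array $pages): string {
    $lines = array();
    foreach ($pages as $p) {
        $lines[] = '* [[' . $p['title'] . ']]';
    }
    return implode("\n", $lines);
}

echo to_wikitext(run_chain($pages, $filters)), "\n";
```

Because every step takes a page list and returns a page list, steps can be reordered, dropped or swapped without touching the others.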
Magnus Manske wrote:
Right now, a tool is started by this page via "nohup &"; that
could change to the job submission system, if that's possible
from the web servers, but right now it seems overly complicated
(runtime estimation? memory estimation? SQL server access?
whatnot). The web page then returns the reserved output asset
ID while the actual tool is running; another tool could thus
be "watching" asynchronously, by polling the status every few
seconds.
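The "watching" side could be as small as the loop below. This is only a sketch: wait_for_asset() and the status strings "running" and "done" are hypothetical, and the status lookup is passed in as a callback because the message does not say how a tool would actually query an asset's state.

```php
<?php
// Poll a status callback until it reports "done" or the attempts run out.
// $fetchStatus stands in for whatever lookup the asset system would offer.
function wait_for_asset(callable $fetchStatus, int $intervalSeconds, int $maxTries): bool {
    for ($try = 0; $try < $maxTries; $try++) {
        if ($fetchStatus() === 'done') {
            return true;
        }
        sleep($intervalSeconds);
    }
    return false; // gave up; the producing tool may still be running
}

// Demo with a fake producer that reports "done" on the third poll.
$calls = 0;
$fake = function () use (&$calls) {
    $calls++;
    return $calls >= 3 ? 'done' : 'running';
};

// An interval of 0 keeps the demo instant; a real watcher would wait a few seconds.
var_dump(wait_for_asset($fake, 0, 10));
```

Capping the number of attempts keeps a watcher from hanging forever if the upstream tool dies without updating its status.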
Yes, it can be called. I use it in a script for scheduling a
cleanup of the created temporary files.
The relevant code:
> $dt = new DateTime( "now", new DateTimeZone( "UTC" ) );
> $tmpdir = dirname( __FILE__ ) . "/tmp";
> @mkdir( $tmpdir, 0711 );
> $shell = "mktemp -d --tmpdir=" . escapeshellarg($tmpdir) . " catdown.XXXXXXXX";
>
> $tmpdir2 = trim( `$shell` );
> // Program the folder destruction
> // Note that qsub is 'slow' to return, so we perform it in the background
> $dt->add( new DateInterval( "PT1H" ) );
> exec( "SGE_ROOT=/sge/GE qsub -a " . $dt->format("YmdHi.s")
>     . " -wd " . escapeshellarg( $tmpdir )
>     . " -j y -b y /bin/rm -r " . escapeshellarg( $tmpdir2 )
>     . " 2>&1 &" );
Thanks, that looks interesting. I'll play with it, though I still
face the problem of estimating a tool's resource requirements from
a generic wrapper. /Shudder/
Cheers, Magnus
_______________________________________________
Toolserver-l mailing list (Toolserver-l(a)lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list:
https://wiki.toolserver.org/view/Mailing_list_etiquette
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Using GnuPG with Mozilla -
http://enigmail.mozdev.org/
iEYEARECAAYFAlALzSAACgkQAXWvBxzBrDD+TgCfWRF59s6oaGaRANJW+NscTix3
Jl8AoOIoaqBPwV/NWw4TeIZhqvj14/Qx
=t1Fk
-----END PGP SIGNATURE-----