Occasionally, "toolserver people" (both programmers and users) talk about joining up tools. Wouldn't it be great if we could use one or several toolserver tools, and "mash-up" their output to create something new and useful? And wouldn't it be even better if the users could do this directly, across tools, without programmers hard-connecting tools?
Some tools already support machine-readable output, and some tools already use others to perform a specific function. But these connections are hardcoded, output formats are often crude (tabbed text, not that there's anything wrong with that in principle), runtimes add up, and so on.
So, I went ahead and, as a first step towards a pipeline setup called wpipe ("w" for "wiki", as you no doubt have guessed), implemented an asset tracker. Here, an asset is a dataset, ideally a JSON file (I also created a simple structure to hold lists of wiki pages, along with arbitrary metadata). Each asset is tracked in a database, accessible via a unique numeric identifier, and its data is stored in a file. Assets can be created and queried via the toolserver command line, as well as via a web interface.
The usual steps in asset creation involve:
1. reserve (gets a new, unique ID)
2. start (set a flag that the asset data creation has begun)
3. done (store data, either by creating a file directly, or passing the data to be stored)
4. fail (if there was an error during data creation)
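The steps above amount to a small state machine. The following is a minimal in-memory sketch of that lifecycle, not the actual wpipe code: the class name, method names, and status strings are assumptions chosen for illustration, and the real tracker uses a database and files rather than a Python dict.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import count
from typing import Dict, Optional


class Status(Enum):
    # Hypothetical status names mirroring the four steps in the text.
    RESERVED = "reserved"
    STARTED = "started"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Asset:
    asset_id: int
    status: Status = Status.RESERVED
    data: Optional[str] = None
    error: Optional[str] = None


class AssetTracker:
    """Toy in-memory stand-in for a database-backed asset tracker."""

    def __init__(self) -> None:
        self._ids = count(1)  # source of unique numeric IDs
        self._assets: Dict[int, Asset] = {}

    def _require(self, asset_id: int, expected: Status) -> Asset:
        # Look up the asset and refuse out-of-order transitions.
        asset = self._assets[asset_id]
        if asset.status is not expected:
            raise ValueError(
                f"asset {asset_id} is {asset.status.value}, "
                f"expected {expected.value}"
            )
        return asset

    def reserve(self) -> int:
        # Step 1: hand out a new, unique ID.
        asset_id = next(self._ids)
        self._assets[asset_id] = Asset(asset_id)
        return asset_id

    def start(self, asset_id: int) -> None:
        # Step 2: flag that data creation has begun.
        self._require(asset_id, Status.RESERVED).status = Status.STARTED

    def done(self, asset_id: int, data: str) -> None:
        # Step 3: store the finished data.
        asset = self._require(asset_id, Status.STARTED)
        asset.data = data
        asset.status = Status.DONE

    def fail(self, asset_id: int, error: str) -> None:
        # Step 4: record that data creation went wrong.
        asset = self._require(asset_id, Status.STARTED)
        asset.error = error
        asset.status = Status.FAILED

    def status(self, asset_id: int) -> Status:
        return self._assets[asset_id].status
```

A tool would call `reserve()` first, hand the ID to whoever wants to watch, then call `start()` and finally either `done()` or `fail()`.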
Some points:
* All assets and associated data are public.
* Creation and last-access time (a write or read of the actual data) are tracked, so unused assets can be removed to conserve storage.
* Currently, data creation is limited to toolserver IP addresses (yes, I know, it can be gamed. Like the rest of the wiki world.)
* The suggested JSON format should be flexible enough for most tools dealing with lists of wiki pages, but any text-based format will work for specialist uses.
* The asset system can be used by command-line tools and web tools alike.
* Existing tools should be simple to adapt; if a tool takes a list of page names, and language/project information, then using asset IDs as an alternative source should be straightforward.
* A pipeline of tools could be started asynchronously, and their progress could be tracked via JavaScript; once a tool has finished, the next one in the chain could be run, all from the user's browser.
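To make the "list of wiki pages plus metadata" idea concrete, here is a small sketch of what such a JSON asset could look like and how a consuming tool might read it. The key names (`language`, `project`, `metadata`, `pages`, `title`) and both function names are purely illustrative assumptions; the format actually suggested by wpipe is described on its documentation page and may differ.

```python
import json
from typing import Any, Dict, List


def make_page_list(language: str, project: str, titles: List[str],
                   **metadata: Any) -> str:
    """Serialize page titles plus free-form metadata to JSON text."""
    document: Dict[str, Any] = {
        "language": language,  # hypothetical key, e.g. "en"
        "project": project,    # hypothetical key, e.g. "wikipedia"
        "metadata": metadata,  # arbitrary extra information from the tool
        "pages": [{"title": title} for title in titles],
    }
    return json.dumps(document, ensure_ascii=False, indent=2)


def page_titles(asset_text: str) -> List[str]:
    """Extract the page titles from text produced by make_page_list."""
    document = json.loads(asset_text)
    return [page["title"] for page in document["pages"]]
```

A tool that already accepts a list of page names could then take an asset ID instead, load the asset's text, and pass it through something like `page_titles()`.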
The main web API, and documentation page, is here:
http://toolserver.org/~magnus/wpipe/
That page also links to a generic "asset browser": http://toolserver.org/~magnus/wpipe/asset_info.php
As an appetizer, and feasibility demo, I adapted my own "CatScan rewrite" to use assets as an optional output. This can be done by copying the normal "CatScan rewrite" URL parameters and pasting them here:
http://toolserver.org/~magnus/wpipe/toolwrap.php
This is intended as a generic starter page for command-line-enabled tools. At the moment, only "CatScan rewrite" is available, but I plan to add others. If you have tools you would like to add, I'll be happy to help you set that up. Right now, a tool is started by this page via "nohup &"; that could change to the job submission system, if that's possible from the web servers, but right now it seems overly complicated (runtime estimation? memory estimation? SQL server access? whatnot). The web page then returns the reserved output asset ID while the actual tool is still running; another tool could thus be "watching" asynchronously, by polling the status every few seconds.
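The "watching" side could look roughly like the sketch below. It is only an illustration under assumptions: the function name, the status strings "done" and "failed", and the `fetch_status` callable are all hypothetical. How a status is actually obtained (for example from the web API) is deliberately left to the caller, since the real request format is not spelled out here.

```python
import time
from typing import Callable


def wait_for_asset(fetch_status: Callable[[int], str],
                   asset_id: int,
                   interval: float = 5.0,
                   max_polls: int = 120,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll an asset's status until it reaches a final state.

    fetch_status is a caller-supplied function returning the current
    status of the given asset as a string (assumed values: "done" and
    "failed" are final, anything else means "still working").
    """
    for _ in range(max_polls):
        status = fetch_status(asset_id)
        if status in ("done", "failed"):
            return status
        sleep(interval)  # wait a few seconds before asking again
    raise TimeoutError(
        f"asset {asset_id} not finished after {max_polls} polls"
    )
```

Once this returns "done", the next tool in the chain can be started with the same asset ID as its input. Passing `sleep` as a parameter keeps the loop easy to test without real delays.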
Of course, this whole shebang doesn't make sense unless others are willing to join in, with work on this core, or at least by enabling some of their tools; so please, if you are even slightly interested in a generic data exchange mechanism between tools, potentially leading to a pipeline-able ecosystem, by all means step forward!
Cheers, Magnus