Currently, the best way to bulk-process article text is to read from an XML dump. You can adapt the existing importers to fit your purpose; code is available in PHP, Java and C#, I believe.
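For what it's worth, here is a minimal Python sketch of the "read from the dump" approach - not one of the existing PHP/Java/C# importers - that streams (title, text) pairs out of a pages-articles XML file. The file name and the way the export namespace is stripped are assumptions on my part; adjust them for the dump you actually have.

    # Minimal sketch: stream titles and article text from a MediaWiki XML dump.
    import xml.etree.ElementTree as ET

    def iter_articles(dump_path):
        """Yield (title, wikitext) pairs from a pages-articles XML dump."""
        title, text = None, None
        for event, elem in ET.iterparse(dump_path, events=("end",)):
            tag = elem.tag.rsplit("}", 1)[-1]   # drop the XML namespace prefix
            if tag == "title":
                title = elem.text
            elif tag == "text":
                text = elem.text or ""
            elif tag == "page":
                yield title, text
                elem.clear()                     # keep memory usage flat on big dumps

    if __name__ == "__main__":
        for title, text in iter_articles("pages-articles.xml"):  # assumed file name
            print(title, len(text))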
Well, I think this means that Stefan's team has to recode a lot. Pulling the titles and texts out of the XML dump is easy, but you only get a new dump every one or two months. On the other hand, XML is more robust, while the database structure will change with every MediaWiki version - for instance, I was not aware of the external text storage before.
XML dumps should be handled via the wiki itself - not only the monthly dumps, but also Special:Export, which uses the same format. Queries done through it are supposed to be better for the server load, as a single request is enough to get many articles.
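Something like the following is what I have in mind for batching through Special:Export - a hedged Python sketch only; the 'pages', 'curonly' and 'action' field names are my recollection of the export form, so check them against your wiki's Special:Export page before relying on them. The returned XML can then go through the same dump parser as above.

    # Sketch: fetch several articles in one Special:Export request.
    import urllib.parse
    import urllib.request

    def export_pages(wiki_base, titles, current_only=True):
        """POST a batch of titles to Special:Export and return the XML text."""
        data = {
            "pages": "\n".join(titles),   # one title per line, as in the export form
            "action": "submit",
        }
        if current_only:
            data["curonly"] = "1"         # only the latest revision of each page
        req = urllib.request.Request(
            wiki_base + "/index.php?title=Special:Export",
            data=urllib.parse.urlencode(data).encode("utf-8"),
        )
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode("utf-8")

    # Example: one HTTP request, many articles, same XML format as the dumps.
    # xml_text = export_pages("https://en.wikipedia.org/w", ["Foo", "Bar", "Baz"])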
Well, you'd also need some kind of guessing about which articles will be queried next in order to optimize it. Or you could fetch the requested article plus the next X pages in the database that still need an HTTP query.
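As a tiny sketch of that prefetch idea - export_pages stands for any function that takes a batch of titles (e.g. the Special:Export sketch above), and remaining_titles is a hypothetical list of pages still awaiting an HTTP query, not existing wikiproxy code:

    def fetch_with_prefetch(title, remaining_titles, export_pages, prefetch=10):
        """Fetch the requested title plus up to `prefetch` more pending titles."""
        batch = [title] + [t for t in remaining_titles if t != title][:prefetch]
        return export_pages(batch)   # one request covers the whole batch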
Leo, you should also look in that direction, as it is easier for the programmer to know the total set of articles to be queried than to rely on the fetching layer to guess the optimizations.
Maybe you could add another parameter to the wikiproxy for the articles I will want too, to make the wikiproxy aware of them? The most accurate way would be to have the layer act asynchronously: it would accept a query but not actually perform it over HTTP until a) a 'notwait' parameter is set; b) the query queue is X entries long; or c) the oldest entry is Y seconds old (a wait timeout). Then it resolves all the queued queries at the same time. However, this makes the client side more difficult, as client programs tend to use an ask-process-ask-process loop.
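Roughly what I mean, as a Python sketch - QueryQueue and fetch_batch are made-up names for illustration, not part of the current wikiproxy:

    # Sketch of the batching idea: queued titles are only sent over HTTP when
    # the caller passes notwait, the queue reaches X entries, or the oldest
    # entry has waited Y seconds. fetch_batch stands in for whatever performs
    # the actual batch request (e.g. a Special:Export call).
    import time

    class QueryQueue:
        def __init__(self, fetch_batch, max_size=50, max_wait=5.0):
            self.fetch_batch = fetch_batch    # callable: list of titles -> {title: text}
            self.max_size = max_size          # condition b) queue is X entries long
            self.max_wait = max_wait          # condition c) oldest entry is Y seconds old
            self.pending = []                 # list of (title, enqueue_time)

        def ask(self, title, notwait=False):
            """Queue a title; results come back only when a flush is triggered."""
            self.pending.append((title, time.monotonic()))
            oldest = self.pending[0][1]
            if (notwait                                    # condition a)
                    or len(self.pending) >= self.max_size  # condition b)
                    or time.monotonic() - oldest >= self.max_wait):  # condition c)
                return self.flush()
            return None                                    # nothing fetched yet

        def flush(self):
            titles = [t for t, _ in self.pending]
            self.pending.clear()
            return self.fetch_batch(titles)   # one round trip for the whole batch

And that last point is exactly the difficulty: a client stuck in the ask-process loop would end up passing notwait=True on every call, which defeats the batching.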