Hi everyone,
I'm new to the list and somewhat new to MediaWiki.
I am trying to set up some wrapper code that automates certain MediaWiki functions, such as page creation or renaming. For example, an external application would attempt to create a new article in MW. It might check to see whether the article exists, and create it if it does not. It might append some content to the article text. It might move the article, or set up a redirect. At the moment, I'm accomplishing this by issuing HTTP sub-requests to MediaWiki and interpreting the response.
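For readers following along, here is a minimal sketch of the "check, create if missing, otherwise append" flow described above. It is only an illustration, not code from this thread: the names raw_url and ensure_and_append are hypothetical, the base URL is a placeholder, and the fetch and save callables stand in for whatever HTTP layer is actually used. The only MediaWiki-specific assumption is that index.php accepts title and action=raw to return a page's wikitext.

```python
from urllib.parse import urlencode


def raw_url(base, title):
    """Build an index.php URL that asks for a page's raw wikitext.

    Assumes the wiki serves index.php?title=...&action=raw; `base` is a
    placeholder for the real script path.
    """
    query = {"title": title.replace(" ", "_"), "action": "raw"}
    return base + "?" + urlencode(query)


def ensure_and_append(fetch_raw, save, title, addition):
    """Create the page if it is missing, otherwise append to it.

    fetch_raw(title) must return the current text, or None if the page
    does not exist; save(title, text) must store the new text. Both are
    hypothetical callables supplied by the caller.
    """
    current = fetch_raw(title)
    if current is None:
        save(title, addition)
        return "created"
    save(title, current.rstrip("\n") + "\n\n" + addition)
    return "appended"
```

Keeping the HTTP details behind two callables means the same logic can be exercised against a plain dictionary during testing and pointed at a real wiki later.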
MediaWiki does not seem to have a public-facing API, is that right? Are there any F/OSS projects that provide such functionality without requiring screen-scraping? Alternatively, is there any way to interface with the internals of the MW, *without* invoking an entire request, i.e. the stuff that requires the MEDIAWIKI constant? Failing *that*, does documentation exist that explains MW's DB schema so I can begin to write my own?
I'm using the current-stable version, 1.4.9, if that matters.
-mike.
---------------------------------------------------------------- michal migurski- mike@stamen.com 415.558.1610
Michal Migurski wrote:
I am trying to set up some wrapper code that automates certain MediaWiki functions, such as page creation or renaming. For example, an external application would attempt to create a new article in MW. It might check to see whether the article exists, and create it if it does not. It might append some content to the article text. It might move the article, or set up a redirect. At the moment, I'm accomplishing this by issuing HTTP sub-requests to MediaWiki and interpreting the response.
MediaWiki does not seem to have a public-facing API, is that right?
Yes
Are there any F/OSS projects that provide such functionality without requiring screen-scraping?
No, although the best-maintained screen-scraping product is pywikipediabot, it has many of the functions you want.
Alternatively, is there any way to interface with the internals of the MW, *without* invoking an entire request, i.e. the stuff that requires the MEDIAWIKI constant?
No, not really. It'd only save you 20ms and you'd end up rewriting half the stuff that's inside the MEDIAWIKI sections.
Failing *that*, does documentation exist that explains MW's DB schema so I can begin to write my own?
Oh, so you want to rewrite it in some other language? Have fun, the documentation is in tables.sql.
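As a rough illustration of what reading the schema directly might involve, the sketch below checks whether a page exists by title. Everything here is an assumption to verify against tables.sql for the installed version: it assumes a 1.5-style page table with page_id, page_namespace and page_title columns (1.4.x stores current pages in a differently named table), it assumes titles are stored with underscores for spaces and an uppercase first letter, and it uses an in-memory SQLite database purely as a stand-in for the real MySQL server. The helper names to_db_key and page_exists are hypothetical.

```python
import sqlite3


def to_db_key(title):
    """Normalize a display title to the assumed stored form:
    spaces become underscores and the first letter is uppercased."""
    key = title.strip().replace(" ", "_")
    return key[:1].upper() + key[1:]


def page_exists(conn, title, namespace=0):
    """Return True if a row for this title exists in the assumed page table."""
    row = conn.execute(
        "SELECT page_id FROM page WHERE page_namespace = ? AND page_title = ?",
        (namespace, to_db_key(title)),
    ).fetchone()
    return row is not None


# Stand-in database with a cut-down version of the assumed table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page ("
    "page_id INTEGER PRIMARY KEY, page_namespace INTEGER, page_title TEXT)"
)
conn.execute("INSERT INTO page (page_namespace, page_title) VALUES (0, 'Main_Page')")
```

Reading the tables this way is relatively safe; writing to them directly bypasses whatever bookkeeping the MediaWiki code performs on save, which is part of the "rewriting half the stuff" cost mentioned above.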
I'm using the current-stable version, 1.4.9, if that matters.
You should probably upgrade to 1.5, 1.4.x will be obsolete soon.
-- Tim Starling
On Wed, Sep 14, 2005 at 02:16:15PM +1000, Tim Starling wrote:
Are there any F/OSS projects that provide such functionality without requiring screen-scraping?
No, although the best-maintained screen-scraping product is pywikipediabot, it has many of the functions you want.
If you feel like operating a bot, you may also want to try tawbot's code. https://taw.pl.eu.org/svn/src/tawbot/
It has a lot of nifty features that pywikipediabot lacks, like mass banning users (used for open proxy banning), or reverting a change somewhere deep in the history (simple merge3), and it's written in a saner language (Perl instead of Python), so it should be much easier to adapt to your needs.
Are there any F/OSS projects that provide such functionality without requiring screen-scraping?
No, although the best-maintained screen-scraping product is pywikipediabot, it has many of the functions you want.
I'll look into it... though Python isn't really an option for now. I'm using PHP.
Failing *that*, does documentation exist that explains MW's DB schema so I can begin to write my own?
Oh, so you want to rewrite it in some other language? Have fun, the documentation is in tables.sql.
Thanks.
Not so much another language, but I am looking for a way to access the information in a slightly more "encapsulated" way. From my research so far, it seems that MW makes heavy use of many globals & singletons ($wgUser, $wgArticle, etc.), so keeping track of each function's side effects is ... challenging. :)
Trying not to reinvent too much of the wheel here.
I'm using the current-stable version, 1.4.9, if that matters.
You should probably upgrade to 1.5, 1.4.x will be obsolete soon.
Will do.
---------------------------------------------------------------- michal migurski- mike@stamen.com 415.558.1610
2005/9/14, Michal Migurski mike@stamen.com:
Are there any F/OSS projects that provide such functionality without requiring screen-scraping?
No, although the best-maintained screen-scraping product is pywikipediabot, it has many of the functions you want.
I'll look into it... though Python isn't really an option for now. I'm using PHP.
I see I'm not the only crazy guy who uses PHP for a bot. Good to know that.
Anyway, try the Advanced HTTP Client (http://www.phpclasses.org/browse/package/576.html). It's easy to base your code on that. I made a simple bot with it which took 3 or 4kb, but I lost it during the recent server crash. I'm rebuilding it in my free time and adding some more features; eventually you could call it an API. If you're interested, email me (datrio@gmail.com) in a while and I'll give you a link to its source code.
Dariusz Siedlecki wrote:
2005/9/14, Michal Migurski mike@stamen.com:
Are there any F/OSS projects that provide such functionality without requiring screen-scraping?
No, although the best-maintained screen-scraping product is pywikipediabot, it has many of the functions you want.
I'll look into it... though Python isn't really an option for now. I'm using PHP.
I see I'm not the only crazy guy who uses PHP for a bot. Good to know that.
I have the feeling that Michal is not actually trying to write a bot. Looking at his original posting, my interpretation suggests that he has already written a bot but is unhappy with it because he has direct access to the DB and there should therefore be better/cleaner ways of achieving the effects he wants.
Michal, if I'm right, then I guess the best way would be to try to use the functions in the MediaWiki code. Yes, some of it uses globals, but there are scripts in the 'maintenance' directory that use the MediaWiki code to accomplish things outside an HTTP request. You might want to have a look at those.
Timwi
Are there any F/OSS projects that provide such functionality without requiring screen-scraping?
No, although the best-maintained screen-scraping product is pywikipediabot, it has many of the functions you want.
I'll look into it... though Python isn't really an option for now. I'm using PHP.
I see I'm not the only crazy guy who uses PHP for a bot. Good to know that.
I have the feeling that Michal is not actually trying to write a bot. Looking at his original posting, my interpretation suggests that he has already written a bot but is unhappy with it because he has direct access to the DB and there should therefore be better/cleaner ways of achieving the effects he wants.
That's right, yes - I'm trying to treat MW as a library for use in a larger project. The wiki is intended to act as an annotation service for other resources. Pages are created on the fly when users want to make shared notes about a resource.
Michal, if I'm right, then I guess the best way would be to try to use the functions in the MediaWiki code. Yes, some of it uses globals, but there are scripts in the 'maintenance' directory that use the MediaWiki code to accomplish things outside an HTTP request. You might want to have a look at those.
Good pointer, I'll do this. I realize I'm asking RTFM-type questions here - mainly I'm looking for advice on which FM to R. Dariusz, I'll also have a look at your PHP bot.
-mike.
Michal Migurski wrote:
I realize I'm asking RTFM-type questions here - mainly I'm looking for advice on which FM to R.
There aren't too many Ms to R, F or otherwise. :)
Insofar as there is documentation, it's probably mostly going to be the comments in the source code. (There are phpdoc comments on most method headers, though not all complete.)
The internal API has changed significantly over time and remains unstable, but you can look at the existing source code to see how things are currently done.
(Some of the maintenance scripts however are out of date or unmaintained because we don't use them much ourselves.)
-- brion vibber (brion @ pobox.com)
I realize I'm asking RTFM-type questions here - mainly I'm looking for advice on which FM to R.
There aren't too many Ms to R, F or otherwise. :)
Insofar as there is documentation, it's probably mostly going to be the comments in the source code. (There are phpdoc comments on most method headers, though not all complete.)
I've seen these - having just recently discovered phpDoc, they are very much appreciated.
The internal API has changed significantly over time and remains unstable, but you can look at the existing source code to see how things are currently done.
(Some of the maintenance scripts however are out of date or unmaintained because we don't use them much ourselves.)
This looks extremely promising. It's helpful to see semi-official scripts that dig into the code, to understand how its authors view the internals.
Off to dig, -mike.
Michal Migurski wrote:
MediaWiki does not seem to have a public-facing API, is that right? Are there any F/OSS projects that provide such functionality without requiring screen-scraping? Alternatively, is there any way to interface with the internals of the MW, *without* invoking an entire request, i.e. the stuff that requires the MEDIAWIKI constant? Failing *that*, does documentation exist that explains MW's DB schema so I can begin to write my own?
There is a Perl client library:
http://search.cpan.org/~markj/WWW-Mediawiki-Client-0.27/
Mark wrote:
There is a Perl client library:
Wonderful, thanks for the link :)
wikitech-l@lists.wikimedia.org