[Mediawiki-l] Mass Import

John Blumel johnblumel at earthlink.net
Wed Apr 13 20:57:17 UTC 2005


On Apr 13, 2005, at 4:23pm, Wolfe, Jeff wrote:

> I'm seeking a way to mass import lots of data into a MediaWiki.  I can
> massage my data in most reasonable ways and have direct access to the
> database.  I can use existing PHP, generate fake URLS, or hit the SQL
> database directly.  Does anyone have a suggestion?

I'm working on a similar problem and decided to load the data through
MediaWiki's web interface, using a bot written in Perl (with LWP). I
went that way for a couple of reasons: chiefly, I want the original
submission to be attributable to a specific source (determined by the
user name I give the bot), and I want all the updates that normally
take place (category assignment, recent changes, etc.) to happen without
my having to worry about exactly what the MediaWiki code does and when
it does it.
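For anyone doing something similar today, here is a minimal Python sketch (not the Perl/LWP bot described above) of what a single submission looks like. It assumes a current MediaWiki with the Action API (`api.php`, `action=edit`), which postdates this message. The endpoint URL is hypothetical, and logging in and fetching the CSRF token are left out. Because the edit goes through the wiki itself, it is attributed to whichever account the session is logged in as, and the usual side effects (category links, recent changes) happen just as they would for a manual edit.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint of a test wiki; replace with your own.
API_URL = "http://localhost/w/api.php"

def build_edit_request(title: str, text: str, summary: str, token: str) -> Request:
    """Build (but do not send) a POST request that creates or replaces one
    page through the MediaWiki Action API's action=edit module."""
    params = {
        "action": "edit",
        "title": title,
        "text": text,
        "summary": summary,
        "token": token,      # CSRF token, assumed fetched beforehand from the wiki
        "format": "json",
    }
    body = urlencode(params).encode("utf-8")
    return Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

To send it, pass the request to an opener that keeps the login cookies, e.g. one built with `urllib.request.build_opener(urllib.request.HTTPCookieProcessor())`.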

One of my sources has about 900 entries and there are several smaller
ones, so this is a lot less work than creating all of these entries
manually, even though some of the sources are non-trivial to parse, and
I expect fewer errors in the final text this way. I'm also deriving
category info from the extracted data and will insert it into the final
wiki text before upload, so that the submitted entries are assigned to
specific categories.
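A small Python sketch of the category step, assuming categories are assigned with MediaWiki's standard `[[Category:Name]]` links appended to the page text. The function name `add_categories` is hypothetical.

```python
def add_categories(wikitext: str, categories: list[str]) -> str:
    """Append one [[Category:...]] link per category (duplicates and blanks
    skipped) so the page is filed into those categories when it is saved."""
    seen = set()
    links = []
    for cat in categories:
        name = cat.strip()
        if name and name not in seen:
            seen.add(name)
            links.append(f"[[Category:{name}]]")
    if not links:
        return wikitext
    # Separate the category block from the body with a blank line.
    return wikitext.rstrip("\n") + "\n\n" + "\n".join(links) + "\n"
```

Putting the links at the end of the page is only a convention; MediaWiki picks them up wherever they appear in the text.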

The bot, in this case, simply submits the generated entries; I'm
writing individual scripts to parse the various source materials. The
next step is to generate HTML output (one file per entry) from the data
files I've produced (again with individual scripts, since the sources
contain different types of information) and then convert that to wiki
text for the bot to upload. (I could skip the HTML, but I'd like to be
able to "preview" a sample of the entries before I start uploading them,
and it's not much more work.) I'll probably also write a second bot to
delete a set of entries, just so that I can get rid of the entries left
over from "test runs" on a test wiki I set up.
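A minimal Python sketch of the preview step: one HTML file per entry, with titles and bodies escaped so they display literally in a browser. The `write_previews` helper, its input format (page title mapped to body text), and the file-naming scheme are assumptions for illustration.

```python
import html
from pathlib import Path

def write_previews(entries: dict[str, str], out_dir: str) -> list[Path]:
    """Write one HTML file per entry so a sample can be checked in a browser
    before anything is uploaded. Returns the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for title, body in entries.items():
        # Hypothetical naming scheme: spaces and slashes become underscores.
        name = title.replace(" ", "_").replace("/", "_") + ".html"
        path = out / name
        safe_title = html.escape(title)
        path.write_text(
            f"<html><head><title>{safe_title}</title></head>\n"
            f"<body><h1>{safe_title}</h1>\n"
            f"<pre>{html.escape(body)}</pre></body></html>\n",
            encoding="utf-8",
        )
        written.append(path)
    return written
```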

You're welcome to the scripts I'm working on, although none of them is
completely finished at the moment, other than a couple of parsing
scripts that wouldn't be of much use to you.


John Blumel



