I believe you can just populate cur and old, then have the scripts
included with MediaWiki rebuild the indices.
If you're not using MediaWiki classes, also make sure you populate
logging as needed.
Here are some of the other tables that may need to be populated, based
on the capabilities of the old system:
* image and oldimage, if uploads are included
* user and user_rights, for users
* validate, if validation is enabled
* watchlist, if there are watchlists
Just be careful that IDs match up with titles and revisions. It may
help (if possible) to use the MediaWiki classes, modified to support
arbitrary dates, times, and users.
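As a rough illustration of the direct-database route, here is a minimal sketch (in Python rather than PHP, purely for illustration) of building an insert row for the pre-1.5 `cur` table. The column names follow the 1.4-era schema and the inverse-timestamp convention of that version; check them against your installation's tables.sql before touching a live database, and run the `rebuildall.php` maintenance script afterwards to rebuild the link and search indices.

```python
# Sketch: building a row for MediaWiki's pre-1.5 `cur` table.
# Column names follow the 1.4-era schema; verify them against your
# installation before running anything against a live DB.
import time

def make_cur_row(title, text, username, user_id=0, namespace=0, ts=None):
    """Build (sql, params) for inserting one page into `cur`."""
    if ts is None:
        ts = time.strftime("%Y%m%d%H%M%S", time.gmtime())
    cols = {
        "cur_namespace": namespace,
        "cur_title": title.replace(" ", "_"),  # titles use underscores
        "cur_text": text,
        "cur_comment": "mass import",
        "cur_user": user_id,                   # 0 = anonymous
        "cur_user_text": username,
        "cur_timestamp": ts,
        "cur_is_new": 1,
        "cur_touched": ts,
        # 1.4-era sort helper: each timestamp digit subtracted from 9
        "inverse_timestamp": "".join(str(9 - int(c)) for c in ts),
    }
    sql = "INSERT INTO cur (%s) VALUES (%s)" % (
        ", ".join(cols), ", ".join(["%s"] * len(cols)))
    return sql, list(cols.values())

sql, params = make_cur_row("Test Page", "Hello", "ImportBot",
                           ts="20050413000000")
```

This only builds the statement and parameter list; executing it against MySQL (and doing the same for `old`, `logging`, etc.) is left to whatever DB layer you use.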
On 4/13/05, Wolfe, Jeff <Jeff_Wolfe(a)intuit.com> wrote:
Hi John,
I'm not familiar with LWP (though I Googled it and got the basic idea), but
I'll take any help I can get. One could almost build a command-line SDK
that way for instances where you didn't want to hit the db directly.
I was thinking about just pushing into cur, category, and searchindex, but I
think you have an excellent point. I really like being able to attribute
the author, source, etc. Have you considered trying to use some of the PHP
scripts from the command line as an alternative?
I would indeed appreciate your scripts if you don't mind.
Thanks,
Jeff
-----Original Message-----
From: mediawiki-l-bounces(a)Wikimedia.org
[mailto:mediawiki-l-bounces@Wikimedia.org] On Behalf Of John Blumel
Sent: Wednesday, April 13, 2005 3:57 PM
To: MediaWiki announcements and site admin list
Subject: Re: [Mediawiki-l] Mass Import
On Apr 13, 2005, at 4:23pm, Wolfe, Jeff wrote:
I'm seeking a way to mass import lots of data into a MediaWiki. I can
massage my data in most reasonable ways and have direct access to the
database. I can use existing PHP, generate fake URLs, or hit the SQL
database directly. Does anyone have a suggestion?
I'm working on a similar issue and decided to load the data through
MediaWiki's web interface, using a bot written in Perl (using LWP). I went
that way for a couple of reasons, chiefly because I want the original
submission attributable to a specific source (depending on the user name I
give the bot) and I want all the file updates that normally take place
(category assignment, recent changes, etc.) to occur without me having to
worry about what exactly the MediaWiki code does and when it does it.
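For a sense of what such a bot does, here is a sketch in Python (John's bot is Perl/LWP; this is only an illustrative translation). The form-field names (wpTextbox1, wpSummary, wpSave, wpEditToken) match the MediaWiki edit form of roughly that era, and the base URL is hypothetical; check your wiki's actual edit page, since fields vary between versions.

```python
# Illustrative sketch of an edit-submitting bot, assuming the
# MediaWiki edit form of roughly the 1.4 era. The base URL is made up.
from urllib.parse import urlencode, quote
from urllib.request import urlopen, Request

WIKI = "http://example.com/wiki/index.php"  # hypothetical base URL

def build_edit_payload(text, summary, token=""):
    """Form fields for submitting one page edit."""
    return {
        "wpTextbox1": text,    # the new wiki text
        "wpSummary": summary,  # edit summary shown in recent changes
        "wpEditToken": token,  # scraped from the edit form first
        "wpSave": "Save page",
    }

def submit(title, text, summary):
    """POST one edit; session cookies for the bot's login are omitted."""
    url = "%s?title=%s&action=submit" % (WIKI, quote(title))
    data = urlencode(build_edit_payload(text, summary)).encode()
    return urlopen(Request(url, data=data))

payload = build_edit_payload("Hello world", "bot import")
```

Because the edit goes through the normal web interface, recent changes, search index, and attribution all update exactly as they would for a human editor, which is the point John makes above.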
One of my sources has about 900 entries and there are several others that
are smaller, so it's a lot less work than creating all these entries
manually, even though some of the sources are non-trivial to parse, and I
expect fewer errors in the final text using this method.
I'm also creating category info from the extracted data and will insert it
into the final wiki text before it is uploaded, so that the submitted
entries will be assigned to specific categories.
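The category step is just string manipulation on the generated wiki text before upload; a minimal sketch (category names here are made up for illustration):

```python
# Sketch of tagging generated wiki text with categories before upload.
# The category names are hypothetical examples.
def add_categories(wikitext, categories):
    """Append [[Category:...]] links so the page is filed on save."""
    tags = "\n".join("[[Category:%s]]" % c for c in categories)
    return wikitext.rstrip() + "\n\n" + tags + "\n"

page = add_categories("Some generated entry text.", ["Glossary", "Imported"])
```

When the bot submits the tagged text, MediaWiki files the page under those categories automatically on save.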
The bot, in this case, simply does the work of submitting the generated
entries and I'm creating individual scripts to parse the various source
materials. The next step is to generate HTML output (1 file per entry) from
the data files I've generated (also individual scripts since the sources
contain different types of information) and then convert that to wiki text
for the bot to upload. (I could skip the HTML but I'd like to be able to
"preview" a sampling of the entries before I start uploading them and it's
not that much more work.) I'll probably also create a second bot to delete a
set of entries, just so that I can get rid of the entries resulting from
"test runs" on a test wiki I set up.
You're welcome to the scripts I'm working on, although none of them is
completely finished at the moment, other than a couple of parsing scripts
that wouldn't be of much use to you.
John Blumel
_______________________________________________
MediaWiki-l mailing list
MediaWiki-l(a)Wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/mediawiki-l