[Mediawiki-l] Mass Import

Jamie Bliss astronouth7303 at gmail.com
Wed Apr 13 22:12:59 UTC 2005


I believe you can just populate cur and old, then have the maintenance
scripts included in MediaWiki rebuild the indices.

If you're not using MediaWiki classes, also make sure you populate
logging as needed.

Here are some of the other tables that may need to be populated, based
on the capabilities of the old system:
* image and oldimage if uploads are included.
* user and user_rights for users
* validate if validation is possible
* watchlist if there are Watchlists
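As a rough sketch of the direct-SQL route (using SQLite as a stand-in here;
the column names are from the 1.4-era cur table, but the real schema has many
more columns -- check maintenance/tables.sql for your version before running
anything against a live database):

```python
import sqlite3

# Toy stand-in for the MediaWiki `cur` table.  This is illustrative
# only: the real table has additional columns (cur_user, cur_comment,
# cur_is_redirect, cur_touched, ...) that an import must also fill.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cur (
        cur_id        INTEGER PRIMARY KEY,
        cur_namespace INTEGER NOT NULL DEFAULT 0,
        cur_title     TEXT    NOT NULL,
        cur_text      TEXT    NOT NULL,
        cur_user_text TEXT    NOT NULL,
        cur_timestamp TEXT    NOT NULL
    )
""")

# (title, text, attributed user, MediaWiki 14-digit timestamp)
pages = [
    ("Example_page", "Imported text.", "ImportBot", "20050413221259"),
    ("Another_page", "More text.",     "ImportBot", "20050413221300"),
]

conn.executemany(
    "INSERT INTO cur (cur_namespace, cur_title, cur_text,"
    " cur_user_text, cur_timestamp) VALUES (0, ?, ?, ?, ?)",
    pages,
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM cur").fetchone()[0]
```

After loading, the search and link tables still need rebuilding with the
bundled maintenance scripts, since nothing above touches them.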

Just be careful to keep IDs, titles, and revisions consistent with one
another. It may help (if possible) to use the MediaWiki classes, though
modified to support arbitrary dates, times, and users.
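One way to stay careful is a pre-import sanity pass over the rows you intend
to write. This is a hypothetical check (the function and row shapes are my
own, not part of MediaWiki) that every revision destined for old refers to a
page that will exist in cur, and that timestamps use MediaWiki's 14-digit
yyyymmddhhmmss format:

```python
import re

# MediaWiki stores timestamps as 14-digit yyyymmddhhmmss strings.
TS_RE = re.compile(r"^\d{14}$")

def check_import(cur_rows, old_rows):
    """Return a list of human-readable problems (empty means OK).

    Both arguments are lists of (title, timestamp) pairs -- an
    illustrative shape, not MediaWiki's own.
    """
    problems = []
    cur_titles = {title for (title, _ts) in cur_rows}
    for title, ts in cur_rows + old_rows:
        if not TS_RE.match(ts):
            problems.append("bad timestamp %r on %r" % (ts, title))
    for title, _ts in old_rows:
        if title not in cur_titles:
            problems.append("old revision of %r has no cur row" % title)
    return problems

cur = [("Main_Page", "20050413221259")]
old = [("Main_Page", "20050413220000"), ("Orphan", "2005-04-13")]
issues = check_import(cur, old)
```

Running checks like this before touching the database is much cheaper than
untangling orphaned revisions afterwards.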

On 4/13/05, Wolfe, Jeff <Jeff_Wolfe at intuit.com> wrote:
> Hi John,
> 
> I'm not familiar with LWP (though I googled it and got the basic idea), but
> I'll take any help I can get.  One could almost build a command line SDK
> that way for instances where you didn't want to hit the db directly.
> 
> I was thinking about just pushing into cur, category, and searchindex, but I
> think you have an excellent point.  I really like being able to attribute
> the author, source, etc.  Have you considered trying to use some of the php
> scripts from the command-line as an alternative?
> 
> I would indeed appreciate your scripts if you don't mind.
> 
> Thanks,
> Jeff
> 
> 
> -----Original Message-----
> From: mediawiki-l-bounces at Wikimedia.org
> [mailto:mediawiki-l-bounces at Wikimedia.org] On Behalf Of John Blumel
> Sent: Wednesday, April 13, 2005 3:57 PM
> To: MediaWiki announcements and site admin list
> Subject: Re: [Mediawiki-l] Mass Import
> 
> On Apr 13, 2005, at 4:23pm, Wolfe, Jeff wrote:
> 
> > I'm seeking a way to mass import lots of data into a MediaWiki.  I can
> > massage my data in most reasonable ways and have direct access to the
> > database.  I can use existing PHP, generate fake URLS, or hit the SQL
> > database directly.  Does anyone have a suggestion?
> 
> I'm working on a similar issue and decided to load the data through
> MediaWiki's web interface, using a bot written in Perl (using LWP). I went
> that way for a couple of reasons, chiefly because I want the original
> submission attributable to a specific source (depending on the user name I
> give the bot) and I want all the file updates that normally take place
> (category assignment, recent changes, etc.) to occur without me having to
> worry about what exactly the MediaWiki code does and when it does it.
> 
> One of my sources has about 900 entries and there are several others that
> are smaller, so it's a lot less work than creating all these entries
> manually, even though some of the sources are non-trivial to parse, and I
> expect fewer errors in the final text using this method.
> I'm also creating category info from the extracted data and will insert it
> into the final wiki text before it is uploaded, so that the submitted
> entries will be assigned to specific categories.
> 
> The bot, in this case, simply does the work of submitting the generated
> entries and I'm creating individual scripts to parse the various source
> materials. The next step is to generate HTML output (1 file per entry) from
> the data files I've generated (also individual scripts since the sources
> contain different types of information) and then convert that to wiki text
> for the bot to upload. (I could skip the HTML but I'd like to be able to
> "preview" a sampling of the entries before I start uploading them and it's
> not that much more work.) I'll probably also create a second bot to delete a
> set of entries, just so that I can get rid of the entries resulting from
> "test runs" on a test wiki I set up.
> 
> You're welcome to the scripts I'm working on, although none of them is
> completely finished at the moment, other than a couple of parsing scripts
> that wouldn't be of much use to you.
> 
> John Blumel
> 
> _______________________________________________
> MediaWiki-l mailing list
> MediaWiki-l at Wikimedia.org
> http://mail.wikipedia.org/mailman/listinfo/mediawiki-l
> 


-- 
-------------------------------------------------------------------
http://endeavour.zapto.org/astro73/
Thank you to JosephM for inviting me to Gmail!
Have lots of invites. Gmail now has 2GB.
