Hello Wikipedians,
I am in the process of making a local mirror of the Wikipedia encyclopedia and seem to have hit a stumbling block. But first, a related question. I have looked through the software documentation and the downloaded pages and haven't seen this, but just want to make sure. I have the base software, version 1.2.3, installed from the web interface. I had only one small but surmountable problem: when using IE 5.1.3 under Mac OS 9.0.4, I could not enter the name for the site. The field was overlaid by the info that should have been to its right. I switched to Netscape and then had no problem.
Now I am in the process of populating the database and was wondering if, in the maintenance folder (or someplace else), there is a set of scripts to fetch and load the actual base data content and then the weekly updates. I would like to keep this mirror up to date with the master copy.
From looking at the mailing list archive, I have seen it stated that there is no doc file explaining what each of the maintenance scripts does, and looking them over hasn't yielded one to create/update the database. If one doesn't exist, I am ready to do it manually. But in my first attempts I have hit a few problems.
The first trick is getting the correct data to do the upload with. I found the dump download page and the files for the EN version of the database (dated 2004-04-03). The current one looks fine, and I have been able to retrieve it and do some (not all) processing with it.

My first problem is the old database. My assumption is that it contains the full database content (minus images) prior to the new data in the current file. I notice that the format of the old/full file has changed recently and grown a lot. I tried to download the full DB as a single file and failed (403 - not authorized); this seems not to be unexpected, since there is mention of the multi-part files for those experiencing problems. The single files http://download.wikimedia.org/archives/en/20040403_cur_table.sql.bz2 and http://download.wikimedia.org/archives/en/20040403_old_table.sql.bz2 have names and formats that make sense to me.

The partials have me confused, especially given my inability to decompress them. There are only three files listed, but based on the file sizes (and one unlisted file) it appears there should be four: the first three come to exactly 2 GB each, a mathematical oddity if that were all there was, but a fourth file would even it out nicely. Also, the files themselves have names that give no clue as to their contents: http://download.wikimedia.org/archives/en/xaa , xab, xac, and the unlisted xad. What format are these, and how should they be joined together? I copied them over via wget and then tried to merge and decompress them but failed. The command I tried (to verify before actual processing) and the response were:
========= start of clip
-bash-2.05b$ nice bzip2 -t xaa xab xac xad
bzip2: xaa: file ends unexpectedly
bzip2: xab: bad magic number (file not created by bzip2)
bzip2: xac: bad magic number (file not created by bzip2)
bzip2: xad: bad magic number (file not created by bzip2)

You can use the `bzip2recover' program to attempt to recover data from undamaged sections of corrupted files.
========= end of clip
Are these files damaged, or am I just using the wrong software to do this? BTW, I am on a RH9 system running PHP 4.3.4, MySQL 4.0.18, and Apache 2.
To continue my testing and make sure I had everything else in place, I thought I'd try using the current file and see how that went. It might not be all the data, but it would give me a taste of how things were going. The decompress went fine, but I had a problem partway through the load. The data that did load was enough for me to do some minimal testing and verify that the software basically works and that I was close to doing the upload. The command I tried and the response I got were:
========= start of clip
-bash-2.05b$ nice mysql -p -uxxxxxxx wikipedia < 20040403_cur_table.sql
Enter password:
ERROR 1153 at line 831: Got a packet bigger than 'max_allowed_packet'
-bash-2.05b$
========= end of clip
What size should I be setting the 'max_allowed_packet' to?
Thanks in advance for your help and for creating this software and its associated database.
Paul
http://PrivacyDigest.com/ Daily news from the privacy front.
PS: The ls -al for the data files I downloaded is:

-rw-r--r--  1 wikipedia  psacln   850374900 Apr  3 02:09 20040403_cur_table.sql
-rw-r--r--  1 wikipedia  psacln  2000000000 Mar 22 17:32 xaa
-rw-r--r--  1 wikipedia  psacln  2000000000 Mar 22 17:35 xab
-rw-r--r--  1 wikipedia  psacln  2000000000 Mar 22 17:38 xac
-rw-r--r--  1 wikipedia  psacln  1614740369 Mar 22 17:40 xad
On Apr 7, 2004, at 21:11, Paul Hardwick wrote:
I had only one small but surmountable problem: when using IE 5.1.3 under Mac OS 9.0.4, I could not enter the name for the site. The field was overlaid by the info that should have been to its right. I switched to Netscape and then had no problem.
I'll check this out, thanks for the note.
Now I am in the process of populating the database and was wondering if, in the maintenance folder (or someplace else), there is a set of scripts to fetch and load the actual base data content and then the weekly updates. I would like to keep this mirror up to date with the master copy.
No, there is no such script. Unfortunately we don't yet have a good procedure for synchronizing a mirror other than throwing out and replacing the whole thing every week or so.
Just note: INSTALL THE WIKI FIRST, then load in the data. The dumps *drop* the existing tables and replace them, and the installer doesn't like to run over a partial set of tables. (The command-line install will drop any existing tables.)
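Until a real sync tool exists, the throw-out-and-replace cycle can be sketched as a few shell lines. The date, dump URL, database name, and credentials below are placeholders taken from this thread, not an official script, and it assumes the wiki is already installed:

```shell
# Hypothetical weekly refresh: fetch the current-table dump and stream it
# straight into MySQL. The dump itself DROPs and recreates the tables,
# so no manual cleanup is needed between runs.
DATE=20040403
wget "http://download.wikimedia.org/archives/en/${DATE}_cur_table.sql.bz2"
bzip2 -dc "${DATE}_cur_table.sql.bz2" | mysql -u mywikiuser -p wikipedia
```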
The partials have me confused,
First, the bad news. The partials weren't being updated automatically by the backup process, so what you downloaded was about a month old. If you want the April 3 backup, you'll have to grab them again. Sorry... :(
Also, the split files are up to xae now. Compression of old revisions reduces the raw disk space (& disk cache) needed for the table, but totally ruins the compression ratio of the downloadable dumps.
-bash-2.05b$ nice bzip2 -t xaa xab xac xad
bzip2: xaa: file ends unexpectedly
bzip2: xab: bad magic number (file not created by bzip2)
That will try to decompress each file in turn, which doesn't work; you need to concatenate them back into a single stream. The simplest thing might be to pipe it straight into mysql, assuming you're already set up:
cat xa? | bzip2 -dc | mysql -u mywikiuser -p mydatabase
Or if you'd like to output a big decompressed SQL file:
cat xa? | bzip2 -dc > old_table_20040403.sql
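If you want to convince yourself that plain byte-wise concatenation is the right way to rejoin the pieces, you can round-trip a small file locally. `split` here stands in for however the server-side chunks were produced (an assumption, but consistent with the fixed-size 2 GB pieces):

```shell
# Compress a small sample, chop the .bz2 into fixed-size byte chunks,
# then rejoin with cat and decompress; cmp verifies a lossless round trip.
printf 'hello wiki\n' > sample.txt
bzip2 -k sample.txt                  # writes sample.txt.bz2, keeps original
split -b 10 sample.txt.bz2 part_     # part_aa, part_ab, ... raw byte slices
cat part_a? | bzip2 -dc > restored.txt
cmp sample.txt restored.txt && echo OK
```

Note that `bzip2 -t` on an individual chunk fails exactly as in your transcript: only the first chunk has the bzip2 magic number, and it ends mid-stream.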
-bash-2.05b$ nice mysql -p -uxxxxxxx wikipedia < 20040403_cur_table.sql
Enter password:
ERROR 1153 at line 831: Got a packet bigger than 'max_allowed_packet'
-bash-2.05b$
========= end of clip
What size should I be setting the 'max_allowed_packet' to?
I think 16MB is the maximum, try that.
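For what it's worth, ERROR 1153 is enforced on the server side, so the limit usually has to be raised in my.cnf and mysqld restarted. A sketch using the MySQL 4.0-era `set-variable` syntax; treat the section and value as an illustration rather than a verified config:

    [mysqld]
    set-variable = max_allowed_packet=16M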
-- brion vibber (brion @ pobox.com)
mediawiki-l@lists.wikimedia.org