This would mean either that the input is incomplete or corrupt, or that my regular expression
is not as sound as it seems. It has parsed millions of records so far without falling
over, but, as Popper told us, having seen millions of black ravens and no white ones is no
proof that a white raven will never turn up.
In any case, I think I will add some extra checks:
1. When the regexp completes, it should have reached (nearly) the end of the file.
2. When the number of records read is lower than on the previous run, something is clearly wrong.
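The two checks above could look roughly like this. It is a minimal Python sketch, not the actual script, and it assumes the parser can report how far into the input it got, the total input size in the same units, and how many records it extracted. `ParseResult`, `sanity_check` and the 99% threshold are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ParseResult:
    """Hypothetical summary of one parsing run."""
    bytes_parsed: int  # input position the regexp loop had reached when it stopped
    total_bytes: int   # total size of the input, in the same units as bytes_parsed
    records: int       # number of records extracted


def sanity_check(result: ParseResult,
                 previous_records: Optional[int],
                 min_fraction: float = 0.99) -> List[str]:
    """Return a list of problem descriptions; an empty list means all checks passed."""
    problems = []
    # Check 1: the parser should have reached (nearly) the end of the input.
    # min_fraction is an assumed tolerance, not a value from the original script.
    if result.total_bytes > 0 and result.bytes_parsed < min_fraction * result.total_bytes:
        problems.append(
            f"parser stopped at {result.bytes_parsed} of {result.total_bytes} bytes")
    # Check 2: the record count should never drop compared with the previous run.
    if previous_records is not None and result.records < previous_records:
        problems.append(
            f"record count dropped from {previous_records} to {result.records}")
    return problems
```

For example, `sanity_check(ParseResult(bytes_parsed=180, total_bytes=1000, records=50), previous_records=80)` reports both problems, while a run that reaches the end of the input with at least as many records as last time returns an empty list.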
Is there any way I can make the script alert you and me when something goes wrong, for example
via sendmail?
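One common way to do this on a Unix host is to pipe a complete message into the local `sendmail` binary with `-t` (take recipients from the message headers) and `-oi` (do not treat a line containing a single dot as end of input). The sketch below assumes a sendmail-compatible MTA installed at `/usr/sbin/sendmail`; `build_alert`, `send_alert` and the example.org addresses are placeholders, not part of the real setup.

```python
import subprocess
from email.message import EmailMessage
from typing import List

SENDMAIL = "/usr/sbin/sendmail"  # assumed location; varies between systems


def build_alert(problems: List[str], recipients: List[str],
                sender: str = "wikicounts@example.org") -> EmailMessage:
    """Build a plain-text alert listing one problem per line (placeholder addresses)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = f"[WikiCounts] {len(problems)} sanity check(s) failed"
    msg.set_content("\n".join(problems))
    return msg


def send_alert(msg: EmailMessage, sendmail: str = SENDMAIL) -> None:
    """Hand the message to the local MTA; raises CalledProcessError on failure."""
    # -t: read recipients from the To/Cc/Bcc headers
    # -oi: a line consisting of a single "." does not end the message
    subprocess.run([sendmail, "-t", "-oi"], input=msg.as_bytes(), check=True)
```

Calling `send_alert` only when at least one check has failed keeps the job quiet on normal runs, so any mail that does arrive is worth reading.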
Erik Zachte
From: Brion Vibber <brion(a)pobox.com>
Date: Mon 2003/12/22 03:07:57 AM CET
To: Wikimedia developers <wikitech-l(a)Wikipedia.org>,
epzachte(a)chello.nl
Subject: Re: [Wikitech-l] Stats weirdness
On Dec 21, 2003, at 10:24, <epzachte(a)chello.nl> wrote:
The stats job failed halfway through the en:
'old' SQL dump.
This has happened before, ever since the huge en: dump was split into two files.
The second part is probably corrupt or incomplete.
===== WikiCounts / 6:12 Sunday, December 21, 2003 / Wikipedia: EN =====
Read sql dump file
'/home/wikipedia/backups/public/en/cur_table.sql.bz2' (150.2 Mb)
Extract names and timestamps.
Data read (Mb):
06:12 - 10
06:13 - 20 30 40 50 60 70 80 90 100 110
06:14 - 120 130 140 150 160 170 180 190 200 210 220
06:15 - 230 240 250 260 270 280 290 300 310 320
06:16 - 330 340 350 360 370 380 390 400 410 420 430 440
06:17 - 450 460 470 480 490 500 510 520 530 540 550 560 570
06:18 - 580 590 600 610
Read sql dump file
'/home/wikipedia/backups/public/en/old_table.sql.bz2' (2665.3 Mb)
Extract names and timestamps.
Data read (Mb):
10 20 30 40 50 60 70
06:19 - 80 90 100 110 120 130 140 150 160 170 180
Parsing SQL files took 6 min, 59 sec.
That's rather mysterious.
Re-running with the new version; let's see what happens...
-- brion vibber (brion @ pobox.com)