I'm sure this is not an original thought, and it has probably been discussed again recently on IRC. I'm curious about the outcome, though:
Would it be possible to make dumps every one or two days, so that we are less dependent on good fortune next time?
Erik Zachte
Erik Zachte wrote:
Would it be possible to make dumps every one or two days, so that we are less dependent on good fortune next time?
I am not a specialist nor a programmer, so this may be stupid, but isn't there a way to make SQL "diffs" every day? Aren't the "logs" you spoke about almost the same thing? If they can be offered for download, complete database dumps (which are very time-consuming) could be made even more rarely.
The "diffs.sql" files would not be enormous in megabytes, and if applied ("patched") one after another to the database, everyone would have the latest complete data. This would also save a lot of bandwidth for downloads.
5ko>bg.wiki
On Thu, 24 Feb 2005 18:35:28 +0100, petko yotov <5ko@free.fr> wrote:
Erik Zachte wrote:
Would it be possible to make dumps every one or two days, so that we are less dependent on good fortune next time?
I am not a specialist nor a programmer, so this may be stupid, but isn't there a way to make SQL "diffs" every day? Aren't the "logs" you spoke about almost the same thing? If they can be offered for download, complete database dumps (which are very time-consuming) could be made even more rarely.
The "diffs.sql" files would not be enormous in megabytes, and if applied ("patched") one after another to the database, everyone would have the latest complete data. This would also save a lot of bandwidth for downloads.
Well, we could export only articles with a cur version later than the last backup. That would work for the current versions. The old versions could actually be handled in a similar manner (although ten article texts are compressed into one entry nowadays, so it's a little more complicated).
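On the current schema that boils down to queries along these lines (the cutoff timestamp is only an example, and the column lists are trimmed):

  -- current revisions touched since the previous dump
  SELECT cur_id, cur_namespace, cur_title, cur_text, cur_timestamp
    FROM cur
   WHERE cur_timestamp > '20050222000000';

  -- old revisions added since then; where several texts have been packed
  -- into one compressed row, the whole row has to be shipped even if only
  -- one of its revisions is new
  SELECT old_id, old_namespace, old_title, old_flags, old_text, old_timestamp
    FROM old
   WHERE old_timestamp > '20050222000000';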
Tomer Chachamu wrote:
Well, we could export only articles with a cur version later than...
How much data (in megabytes) is in the cur table for en.wikipedia, and how much is in the old table? I understand these numbers grow all the time, but does anybody have rough estimates? And perhaps some numbers for where we were a year ago?
Lars Aronsson wrote:
Tomer Chachamu wrote:
Well, we could export only articles with a cur version later than...
How much data (in megabytes) is in the cur table for en.wikipedia, and how much is in the old table? I understand these numbers grow all the time, but does anybody have rough estimates? And perhaps some numbers for where we were a year ago?
Hello,
If I read MySQL's "SHOW TABLE STATUS" output correctly, for the English Wikipedia we have:
Current revisions (cur): 3 GB with a 560 MB index. History (old): 80 GB with a 3 GB index.
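For reference, those figures come from something like this ("wikidb" stands for whatever the wiki's database is actually called):

  SHOW TABLE STATUS FROM wikidb LIKE 'cur';
  SHOW TABLE STATUS FROM wikidb LIKE 'old';

Data_length is the table size in bytes, Index_length the index size.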
You can get an idea of the growth rate by looking at Erik's statistics: http://en.wikipedia.org/wikistats/EN/PlotDatabaseSize1.png
Hi Erik,
Erik Zachte wrote:
Would it be possible to make dumps every one or two days, so that we are less dependent on good fortune next time?
as far as I understood from another mail (which I cannot find anymore), this is due to Tim Starling's compression run, which has been going on since December: in order to reserve a maximum of CPU resources for the so-urgently-needed compression, the dumps have been given a lower priority than the compression.
This meant: no dumps for a while.