I notice that /usr (/dev/sda2) is at 96%. ext2 has some pretty bad problems with fragmentation once a filesystem fills past a certain percentage, and that can cause serious performance problems. Once it has fragmented, it is difficult to get it back to a contiguous state.
There are defrag programs, but they are fairly scary. The only other way to get it back to normal is to back everything up, mkfs, and restore it.
Perhaps somebody can remove a bunch of the packages that are installed that we don't use?
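For keeping an eye on how full a partition is, a minimal Python sketch follows. The function names and the 90% default threshold are illustrative assumptions, not figures from this thread, and the percentage may differ slightly from what df reports because of how reserved blocks are counted.

```python
import shutil


def percent_used(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def over_threshold(path, threshold=90.0):
    """True when the filesystem holding `path` is fuller than `threshold` percent.

    The default threshold is an arbitrary illustration, not an ext2 constant.
    """
    return percent_used(path) > threshold
```

Calling `percent_used("/usr")` on the server would give a figure comparable to the 96% mentioned above.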
(Nick Reinking nick@twoevils.org): I notice that /usr (/dev/sda2) is at 96%... Perhaps somebody can remove a bunch of the packages that are installed that we don't use?
The server is very clean in terms of software. The big culprits for disk usage are MySQL's InnoDB transaction data (currently a single 10 GB file!), and logfiles from MySQL and Apache.
I don't know enough about MySQL to know how to limit that data and what the impact might be. I do think we could lighten the load on Apache log files considerably now, to save both disk space and gain some performance. For instance, we logged user agents and referrers to get some stats, but I don't think we really need that anymore.
On Tue, Apr 29, 2003 at 05:07:24PM -0500, Lee Daniel Crocker wrote:
I don't know enough about MySQL to know how to limit that data and what the impact might be. I do think we could lighten the load on Apache log files considerably now, to save both disk space and gain some performance. For instance, we logged user agents and referrers to get some stats, but I don't think we really need that anymore.
Yeah, that's what I was thinking. Putting the Apache logs on another drive, or turning off the access_logs altogether, would probably be quite helpful, at the cost of statistics (since we don't have a second drive).
Of course, it doesn't really matter since Apache will be on a different machine soon enough - but we should still get /usr utilization down, else we're going to fragment the heck out of that partition.
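To get a rough idea of how much log volume the referrer and user-agent fields account for, here is a small Python sketch. It assumes the access log uses Apache's standard "combined" layout, where those two quoted fields are appended to the "common" fields; the thread does not state the actual log format, and the function names are made up for illustration.

```python
import re

# In the "combined" layout the referrer and user agent are the last two
# double-quoted fields on the line. Backslash-escaped quotes are tolerated.
COMBINED_TAIL = re.compile(r' "(?:[^"\\]|\\.)*" "(?:[^"\\]|\\.)*"$')


def to_common(line):
    """Drop the trailing referrer and user-agent fields from one log line."""
    return COMBINED_TAIL.sub("", line.rstrip("\n"))


def bytes_saved(lines):
    """Estimate how many bytes the two extra fields add across `lines`."""
    total = 0
    for line in lines:
        stripped = line.rstrip("\n")
        total += len(stripped) - len(to_common(stripped))
    return total
```

Running `bytes_saved` over one day's access log and comparing it with the file size would show whether dropping the two fields is worth it.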
On Tue, 29 Apr 2003, Lee Daniel Crocker wrote:
The server is very clean in terms of software. The big culprits for disk usage are MySQL's InnoDB transaction data (currently a single 10 GB file!), and logfiles from MySQL and Apache.
InnoDB keeps most of its goodies in that one big file, which can expand but cannot contract. Certain operations (like altering the table structure) involve making a complete duplicate of the database, altering it, then replacing the old one; so it's taking up nearly twice the space it actually _needs_ on a regular basis. On the plus side, it gives us room to grow. :)
There's also the www-bin.### files, which are the binary log. These track changes made to the database, and are rotated at 1 gigabyte or when the server is restarted. These are mainly useful for database replication, which we don't do _yet_ but will do in the future. For now, I just periodically delete the old ones. It can be disabled somehow, but we'll likely want them in the future so I've not bothered.
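The periodic cleanup of old binary logs could be sketched in Python as below. The file name pattern follows the www-bin.### naming mentioned above; the function name and the number of files to keep are hypothetical. The sketch only lists candidates and deletes nothing, since removing files the server still tracks may need to go through MySQL itself.

```python
import re
from pathlib import Path

# Matches names like "www-bin.001"; the index file and other files are skipped.
BINLOG_NAME = re.compile(r"^www-bin\.(\d+)$")


def old_binlogs(directory, keep=2):
    """Return binary log paths in `directory`, oldest first, minus the newest `keep`."""
    numbered = []
    for path in Path(directory).iterdir():
        match = BINLOG_NAME.match(path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    numbered.sort()
    # Guard against keep <= 0 or keep larger than the number of files.
    cutoff = max(len(numbered) - max(keep, 0), 0)
    return [path for _, path in numbered[:cutoff]]
```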
Now, here's the space used by the actual wiki files under /usr/local/apache:
2017344 htdocs
387056 logs
302384 htdocs-fr
183700 htdocs-sv
172480 htdocs-de
133512 htdocs-meta
129772 htdocs-eo
103748 htdocs-pl
98588 htdocs-es
97832 htdocs-ja
88128 htdocs-nl
71080 htdocs-da
31276 htdocs-zh
30896 htdocs-test
17036 htdocs-wiktionary
10868 htdocs-ko
7788 htdocs-ru
6784 htdocs-cs
4056 htdocs-bs
3256 htdocs-ms
3020 htdocs-el
2960 htdocs-tr
2788 htdocs-sh
2788 htdocs-ml
2772 htdocs-sr
2772 htdocs-hr
2740 htdocs-sep11
These include the php files, uploaded images, backup tarballs, webalizer stuff, and TeX-generated images. I've deleted saved log files from prior to one week ago (and those that are retained are gzipped).
Further breakdown on the English wiki:
1306996 tarballs
440728 upload
120176 stats
92972 tmp
28472 math
16796 images
4444 w
... some other small smidgens of files...
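A per-directory listing like the ones above can be reproduced with a short Python sketch. The thread does not say which command produced the figures or what units they are in; this version reports apparent file sizes in KiB, so its numbers would not necessarily match a block-based tool, and the function names are illustrative.

```python
import os


def tree_size_kb(root):
    """Total apparent size in KiB of regular files under `root` (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total // 1024


def usage_report(parent):
    """Return (size_kb, name) for each subdirectory of `parent`, largest first."""
    rows = []
    for entry in os.scandir(parent):
        if entry.is_dir(follow_symlinks=False):
            rows.append((tree_size_kb(entry.path), entry.name))
    # Sort by size descending, then by name so ties have a stable order.
    rows.sort(key=lambda row: (-row[0], row[1]))
    return rows
```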
I do think we could lighten the load on Apache log files considerably now, to save both disk space and gain some performance. For instance, we logged user agents and referrers to get some stats, but I don't think we really need that anymore.
Oh, I think it's quite useful to get that information, otherwise I wouldn't know about *($%@^&*$%@# Grub.
Anyway, I cleaned out a few things and moved some of the older tarballs over to the archives in the home partition, and we're down to 85% usage on /usr.
-- brion vibber (brion @ pobox.com)
Brion Vibber vibber@aludra.usc.edu wrote in news:Pine.GSO.4.33.0304291740490.16252-100000@aludra.usc.edu:
Oh, I think it's quite useful to get that information, otherwise I wouldn't know about *($%@^&*$%@# Grub.
I also like stats. If possible, I would like more stats. Right now there are only the Top 30 of 10972 Total Referrers; I would like to have the top 250. Same with the Top 30 of 12904 Total URLs; I would like the top 500.
Right now nothing is recorded for the Top of Total Countries.
On Wed, 30 Apr 2003, Giskart wrote:
I also like stats. If possible, I would like more stats. Right now there are only the Top 30 of 10972 Total Referrers; I would like to have the top 250. Same with the Top 30 of 12904 Total URLs; I would like the top 500.
Well, at the least we should merge the fifty variants of google. :)
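Merging the Google variants could look roughly like the Python sketch below. It is not how the existing stats tool works; the function names are made up, and the rule (any hostname with a "google" label counts as Google) is a deliberately crude assumption that would also catch unrelated hosts such as google.example.com.

```python
from collections import Counter
from urllib.parse import urlsplit


def referrer_site(url):
    """Map a referrer URL to a site key, folding all Google hostnames together."""
    host = (urlsplit(url).hostname or "").lower()
    if "google" in host.split("."):
        return "google"
    # Empty or unparsable referrers (for example "-") share one bucket.
    return host or "(none)"


def top_referrers(urls, n=30):
    """Return the `n` most common referrer sites as (site, count) pairs."""
    return Counter(referrer_site(u) for u in urls).most_common(n)
```

Raising `n` from 30 to 250 would give the longer list asked for above without changing anything else.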
Right now nothing is recorded for the Top of Total Countries.
Yeah, I had to disable that because it took ALL DAY to do the reverse DNS lookups. Hypothetically we could have Apache do lookups as connections happen and store the hostnames in the log, but that would lead to a performance hit during busy times...
-- brion vibber (brion @ pobox.com)
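One way to soften the cost of the reverse DNS lookups mentioned above is to cache results, so each distinct address is resolved only once per run. The Python sketch below is an assumption about how that could be done offline, not a description of the current stats setup; `make_resolver` and its parameters are hypothetical names, and the gain depends on how often the same addresses repeat in the log.

```python
import socket
from functools import lru_cache


def make_resolver(lookup=socket.gethostbyaddr, cache_size=65536):
    """Build a caching IP-to-hostname function around `lookup`.

    `lookup` must behave like socket.gethostbyaddr: return a tuple whose
    first element is the hostname, or raise OSError on failure.
    """

    @lru_cache(maxsize=cache_size)
    def resolve(ip):
        try:
            return lookup(ip)[0]
        except OSError:
            # Unresolvable addresses fall back to the raw IP and are cached too.
            return ip

    return resolve
```

Passing a different `lookup` makes it possible to try the cache without touching the network.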
On Wed, Apr 30, 2003 at 10:01:33PM +0000, Giskart wrote:
I also like stats. If possible, I would like more stats. Right now there are only the Top 30 of 10972 Total Referrers; I would like to have the top 250. Same with the Top 30 of 12904 Total URLs; I would like the top 500.
Right now nothing is recorded for the Top of Total Countries.
I imagine it will matter much less with the new server, but it would still be a good idea to add a second drive that is only used for logs and database dumps. That would help quite a bit with the high I/O levels on the primary disk.
wikitech-l@lists.wikimedia.org