A summary of the current performance issues and the IRC discussions, along with some discussion of purchase options, is available at:
http://meta.wikipedia.org/wiki/Upgrade_discussion_April_2004
Also linked from there are the new, excellent, Ganglia statistics Tim Starling set up a few days ago.
I'll shortly integrate the parts of the mailing list discussion that aren't yet covered there.
On Sun, 11 Apr 2004, user_Jamesday wrote:
http://meta.wikipedia.org/wiki/Upgrade_discussion_April_2004
"RAID 5 does not offer more read seeks than write seeks because each stripe is on all disks and all must seek together to get the data"
This is simply incorrect. RAID 5 has distributed parity: a read only has to seek on the drives that hold the requested blocks, while a write also requires reads from at least part of the stripe in order to update the parity. Geez, you'd think Wiki people would read their own entry on RAID (http://en.wikipedia.org/wiki/RAID#RAID_5:_Independent_Data_disks_with_distri...)
I quote: "Characteristics and Advantages: Highest read data transaction rate. Medium to poor write data transaction rate, especially when the host CPU performs software parity checking. Low ratio of ECC (Parity) disks to data disks means high efficiency. Good aggregate transfer rate."
Note: this discounts the ability of modern processors to perform parity calculations. A 3GHz P4 Xeon can easily handle parity for hundreds of drives.
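To make the write cost and the parity calculation concrete, here is a minimal Python sketch of a RAID 5 small write (read-modify-write). The function names and the tiny byte-string "blocks" are illustrative assumptions only; this is not how any particular RAID driver is implemented.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte (RAID 5 parity is plain XOR)."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def small_write(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Return the new parity block for a single-block update.

    The old data block and the old parity block have to be read first,
    so a small write costs two reads and two writes.
    """
    return xor_blocks(old_parity, old_data, new_data)


# Hypothetical stripe: three data blocks of 4 bytes each, plus parity.
d0, d1, d2 = b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"
parity = xor_blocks(d0, d1, d2)

# Update d1 without reading d0 or d2.
new_d1 = b"\xff\x00\xff\x00"
new_parity = small_write(d1, parity, new_d1)

# The incremental result matches a full recomputation over the stripe.
assert new_parity == xor_blocks(d0, new_d1, d2)

# A single lost block can be rebuilt from the survivors plus parity.
assert xor_blocks(d0, d2, new_parity) == new_d1
```

The arithmetic is a plain XOR over the blocks involved; the expensive part of a small write is the extra disk reads, not the calculation.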
Parallel reads (of blocks not on the same drive) are faster with RAID 5 vs. RAID 1 as there are usually many more drives.
And a stripe does not have to be across all the drives in the array, but does have to be at least two drives.
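As a rough illustration of why reads of different blocks can be served in parallel, the sketch below maps logical block numbers onto drives in a RAID 5 set with rotating parity. The layout is an assumption made for the example (stripes span every drive, parity moves one drive to the left per stripe, data fills the remaining drives in order); real implementations offer several layouts, and `locate` is a hypothetical helper.

```python
from typing import Tuple


def locate(block: int, n_disks: int) -> Tuple[int, int, int]:
    """Return (stripe, data_disk, parity_disk) for a logical block.

    Assumed layout: each stripe holds n_disks - 1 data blocks and one
    parity block, and the parity block rotates across the drives.
    """
    if n_disks < 3:
        raise ValueError("this sketch assumes at least 3 drives")
    data_per_stripe = n_disks - 1
    stripe, idx = divmod(block, data_per_stripe)
    parity_disk = (n_disks - 1 - stripe) % n_disks
    # Data blocks fill the drives in order, skipping the parity drive.
    data_disk = idx if idx < parity_disk else idx + 1
    return stripe, data_disk, parity_disk


# On a 7-drive set, six consecutive blocks land on six different drives,
# so they can be read concurrently.
disks_hit = {locate(b, 7)[1] for b in range(6)}
assert disks_hit == {0, 1, 2, 3, 4, 5}

# Over seven stripes, every drive takes one turn holding parity.
assert {locate(s * 6, 7)[2] for s in range(7)} == set(range(7))
```

A read touches only the one drive returned as `data_disk`, which is why independent reads spread across the whole set.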
--Ricky
PS: Software RAID 5 speeds for various filesystems using a 7 drive array (7x ST118202FC 18.2G FC drives, Qla2100, dual 2.8G P4 Xeon, 3G RAM)
Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine  Size:chnk K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
xfs      6072M:64k           65897  30 18549   6           63092  14 335.0   8
ext2     6072M:64k           65777  27 18437   6           58003  14 288.4   6
ext3     6072M:64k           61752  38 22538   9           50738  12 212.0   5
reiserfs 6072M:64k           62821  26 23505   9           53290  14 269.4   6

                   ------Sequential Create------ --------Random Create--------
                   -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
Machine  files      /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
xfs      1024        239  15  3591  84   280   5   254  16  1531  32   135   2
ext2     1024         43  99 48962  32 40645  30    43  99    84  27    82  85
ext3     1024         34  97 38788  27 24712  44    34  97    80  31    56  67
reiserfs 1024        488   2  9431   6   663   3   426   2  6299   6   285   2
I've not run these tests on "modern" 73G and 146G drives (they're in use). These drives will be reattached this evening if anyone wants any different numbers (RAID 0, 1, 0+1, 1+0, 6, whatever).
wikitech-l@lists.wikimedia.org