I agree on the quote from my writing. I've changed that from:
"RAID 1 mirroring offers about twice the number of read seeks as write seeks (each drive seeks independently). RAID 5 does not offer more read seeks than write seeks because each stripe is on all disks and all must seek together to get the data"
to:
"RAID 1 mirroring or RAID 10 offers about twice the number of read seeks as write seeks (each drive or stripe seeks independently). RAID 5 does not offer more read seeks than a single drive, RAID 1 or RAID 10 can deliver because each stripe is on all disks and all must seek together to get the data. In addition, in RAID 5 writes are slowed because at least one read is required to get the parity data unless it has been cached."
so it doesn't ignore the write rate reduction in RAID 5. That might be significant if it turns out that we become write-rate limited, and it may be part of the reason why LiveJournal, which is almost exclusively write-limited, is switching from RAID 5 to RAID 10. It doesn't affect what I was intending to write about, though, which was the read rates of the various RAID systems.
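To put rough numbers on the revised wording, here is a minimal Python sketch of the seek model it describes. The base rate of 100 seeks per second is a made-up figure, and charging an uncached RAID 5 parity read as one full extra seek is my simplifying assumption, not a measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeekRates:
    """Random seeks per second an array can deliver under this toy model."""
    reads_per_sec: float
    writes_per_sec: float


def mirrored(base_seeks: float, copies: int = 2) -> SeekRates:
    """RAID 1 or RAID 10 as described in the quote.

    Each mirror copy can service a different read, but every write must
    go to all copies, so writes run at the base rate.
    """
    return SeekRates(reads_per_sec=base_seeks * copies,
                     writes_per_sec=base_seeks)


def raid5(base_seeks: float, parity_cached: bool = False) -> SeekRates:
    """RAID 5 as described in the quote.

    All drives seek together, so reads run at the base rate. An uncached
    write needs at least one extra read for parity; ASSUMPTION: that read
    costs one full seek, halving the write rate.
    """
    seeks_per_write = 1 if parity_cached else 2
    return SeekRates(reads_per_sec=base_seeks,
                     writes_per_sec=base_seeks / seeks_per_write)


if __name__ == "__main__":
    base = 100.0  # hypothetical seeks/sec for one drive
    print("mirrored:", mirrored(base))
    print("RAID 5, parity not cached:", raid5(base))
    print("RAID 5, parity cached:", raid5(base, parity_cached=True))
```

Real controllers and caches will move these figures around; the sketch only captures the ratios the quoted text claims.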
Experience at Wikipedia is that Suda, with a 3-disk RAID 5 setup, is far slower than Geoffrin, with a 4-disk RAID 10 setup. I'm interested in reading your views on why that is the case. Either way, though, I'm inclined to go with what we've seen of performance in the Wikipedia environment until we can make Suda with RAID 5 faster than it has been. If you can come up with some proposals that might do that, they're worth considering, since the greater space efficiency of RAID 5 will be useful eventually.
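For reference, the space efficiency side of that trade-off is simple arithmetic. A small sketch, assuming two-way mirroring for RAID 1 and RAID 10 and one drive's worth of parity for RAID 5 (the function name is just for illustration):

```python
from fractions import Fraction


def usable_fraction(level: str, drives: int) -> Fraction:
    """Fraction of raw capacity available for data.

    ASSUMPTIONS: RAID 1 and RAID 10 use two-way mirroring; RAID 5 uses
    one drive's worth of capacity for parity.
    """
    if level == "raid5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return Fraction(drives - 1, drives)
    if level in ("raid1", "raid10"):
        if drives < 2 or drives % 2 != 0:
            raise ValueError("two-way mirroring needs an even number of drives")
        return Fraction(1, 2)
    raise ValueError(f"unknown RAID level: {level}")


if __name__ == "__main__":
    print("3-disk RAID 5:", usable_fraction("raid5", 3))
    print("4-disk RAID 10:", usable_fraction("raid10", 4))
```

So the 3-disk RAID 5 box keeps two thirds of its raw space and the 4-disk RAID 10 box keeps half, and the RAID 5 advantage grows as drives are added.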
RAID 5 compared to RAID 10 is interesting when it comes to sequential read rates, because the RAID 5 system can read the data from more drives and so can get a higher sequential transfer rate. The catch is that this is a database system, and database systems are generally considered to be limited primarily by their seek rate, not their sequential transfer rate. There are some potential gotchas in that, though - large chunk/cluster sizes in the database and some access patterns might change it. "Transaction rate", unlike seek rate or sequential transfer rate, leaves lots of significant details unspecified, which is one reason why I stuck to the comparatively unambiguous seek and sustained transfer rate measures (though those can vary a fair amount as well).
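A rough way to see where the seek rate stops being the limit is to split the time for one request into seek time plus transfer time. This is only a sketch: the 8 ms seek and 50 MB/s transfer rate are hypothetical drive figures, and the two transfer sizes are picked purely for illustration, not taken from Wikipedia's workload.

```python
def request_time(size_bytes: float, seek_s: float, bytes_per_s: float) -> float:
    """Time for one random request: one seek plus the sequential transfer."""
    return seek_s + size_bytes / bytes_per_s


def seek_fraction(size_bytes: float, seek_s: float, bytes_per_s: float) -> float:
    """Share of the request time spent seeking rather than transferring."""
    return seek_s / request_time(size_bytes, seek_s, bytes_per_s)


if __name__ == "__main__":
    seek_s = 0.008        # hypothetical average seek time: 8 ms
    bytes_per_s = 50e6    # hypothetical sustained transfer rate: 50 MB/s
    for label, size in (("16 KiB", 16 * 1024), ("4 MiB", 4 * 1024 * 1024)):
        share = seek_fraction(size, seek_s, bytes_per_s)
        print(f"{label}: {share:.0%} of request time is seeking")
```

With small transfers nearly all the time goes to seeking, so the seek rate dominates; with multi-megabyte transfers the sequential rate takes over. Which regime applies depends on the typical transfer size per seek.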
Yes, I agree that it's possible to set up a RAID 5 system so that striping doesn't go across all drives in the box. However, that's not how people normally think of RAID 5 - they are normally thinking in terms of one set of drives. The RAID 5 option offers fewer independent seeks than RAID 1, unless you start to do things like splitting the array stripes as you described. I'm not really sure what I'd call that, but RAID 5 probably isn't it. Maybe a pair of RAID 5s. In any case, I expect that to offer fewer seeks than RAID 1, because the minimum of two drives per stripe has to seek together, while RAID 1 drives can seek independently.
I do not agree that RAID 5 offers the highest read transaction rate, in general. Please support that claim, compared to RAID 1 and RAID 10, over on the article talk page. It'll be interesting to see your data and any you can point to that compares the systems. Since we're considering Wikipedia use, data with Wikipedia access patterns, including transfer sizes, is what really interests me. I don't know the typical transfer size per seek for Wikipedia, though.
In a past life I was disk manager, then overall manager, for CompuServe's benchmarks and standards community, so I'm always happy to discuss disk system performance - it's a fun subject for me. :) But it's probably best not done on this list. :)