Brion Vibber wrote:
David Gerard wrote:
Brion, what would it take to get article rating switched on? Is there any such feature you would allow in, or is it basically off the agenda and I should stop asking?
Well, it needs to be shown to work correctly on pages with thousands of revisions without bogging the server or otherwise exploding in interesting ways.
-- brion vibber (brion @ pobox.com)
Brion,
Am I being naive here, or would a super-dumb implementation with a single table with the columns shown below be enough to work in the short term?
Page_ID | Revision_ID | User_ID | Rating_ID | Rating_value | Timestamp
If a user rates five parameters, five entries go into the table. Even if users were to rate articles as often as they currently edit them (about once a second), and each rating covered five dimensions, the table would grow by about half a million entries per day. At (say) twenty bytes per entry, that is roughly 10 megabytes per day, or about four gigabytes a year. Since the table would grow only by appending records to its end, and would not otherwise change, it would not add much load to the database when being written: adding five records at once in a single InnoDB transaction would result in only a single disk write.
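As a rough illustration of the single-table idea and the arithmetic above, here is a minimal sketch in Python. It uses the standard-library sqlite3 module purely as a stand-in for MySQL/InnoDB, and the table name `ratings`, the column types, and the helper names `open_db`, `record_rating` and `daily_growth` are all my own assumptions, not anything that exists in MediaWiki.

```python
import sqlite3
import time

# Hypothetical schema: one row per (rating event, rated dimension).
SCHEMA = """
CREATE TABLE IF NOT EXISTS ratings (
    page_id      INTEGER NOT NULL,
    revision_id  INTEGER NOT NULL,
    user_id      INTEGER NOT NULL,
    rating_id    INTEGER NOT NULL,   -- which dimension is being rated
    rating_value INTEGER NOT NULL,
    timestamp    INTEGER NOT NULL    -- Unix seconds
)
"""


def open_db(path=":memory:"):
    """Open the database (sqlite3 here, as a stand-in) and create the table."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_rating(conn, page_id, revision_id, user_id, values, now=None):
    """Append one row per rated dimension, all in a single transaction.

    `values` maps rating_id -> rating_value.
    """
    ts = int(time.time()) if now is None else now
    rows = [
        (page_id, revision_id, user_id, rating_id, value, ts)
        for rating_id, value in values.items()
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO ratings VALUES (?, ?, ?, ?, ?, ?)", rows)
    return len(rows)


def daily_growth(ratings_per_second=1.0, dimensions=5, bytes_per_row=20):
    """Return (rows per day, bytes per day) for the given assumptions."""
    rows = ratings_per_second * 86_400 * dimensions
    return rows, rows * bytes_per_row
```

With the default inputs, `daily_growth()` comes to 432,000 rows and about 8.6 MB per day, which is in line with the rounded figures above.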
To throttle back the load from rating, it might be reasonable to restrict rating only to logged-on users, and, if that's not enough,
* to throttle the rate at which they could rate articles?
* to restrict rating only to times when the server load was low?
* to restrict rating only to users with a certain number of edits and/or time since account was registered?
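The restrictions listed above could be combined into a single eligibility check, roughly as sketched below. Everything here is hypothetical: the `User` fields, the `may_rate` name, and every threshold (50 edits, 30 days, 60 seconds between ratings, a load ceiling of 0.8) are placeholder assumptions to be tuned, not proposed values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:
    """Hypothetical view of the account data needed for the check."""
    user_id: int
    logged_in: bool
    edit_count: int
    registered_at: float                    # Unix seconds
    last_rating_at: Optional[float] = None  # Unix seconds, None if never rated


def may_rate(user, now, server_load, *,
             min_edits=50,
             min_account_age=30 * 86_400,
             min_interval=60.0,
             max_load=0.8):
    """Return True if this user may submit a rating right now.

    All thresholds are illustrative assumptions.
    """
    if not user.logged_in:
        return False                         # logged-on users only
    if server_load > max_load:
        return False                         # only when the server is quiet
    if user.edit_count < min_edits:
        return False                         # minimum number of edits
    if now - user.registered_at < min_account_age:
        return False                         # minimum account age
    if user.last_rating_at is not None and now - user.last_rating_at < min_interval:
        return False                         # per-user rate limit
    return True
```

Each condition can be switched off independently by setting its threshold to zero (or the load ceiling to a very large number), so the restrictions could be introduced one at a time, only as needed.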
Now, this could clearly be made much, much more efficient in a number of obvious ways, but it would be simple enough to implement in the short term to get things going.
Also in the short term, the only output method needed would be the ability to dump an XML or CSV file showing all the rating records for a given article in a given time period, and this could be restricted to admins for the time being if the random access required would be a significant load on the database. This would be enough to allow users to start experimenting with ratings analysis schemes.
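A CSV dump of this kind could look something like the sketch below. It assumes a table called `ratings` with the six columns listed earlier (the table and column names are my guesses), uses sqlite3 only as a stand-in for the real database, and `dump_ratings_csv` is a hypothetical helper name. An index on (page_id, timestamp) would presumably be needed to keep the lookup cheap.

```python
import csv
import io
import sqlite3

COLUMNS = ("page_id", "revision_id", "user_id",
           "rating_id", "rating_value", "timestamp")


def dump_ratings_csv(conn, page_id, start_ts, end_ts):
    """Return all rating rows for one page with start_ts <= timestamp < end_ts,
    formatted as CSV text with a header row."""
    cur = conn.execute(
        "SELECT page_id, revision_id, user_id, rating_id, rating_value, timestamp "
        "FROM ratings "
        "WHERE page_id = ? AND timestamp >= ? AND timestamp < ? "
        "ORDER BY timestamp, rating_id",
        (page_id, start_ts, end_ts),
    )
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(COLUMNS)  # header
    writer.writerows(cur)     # one CSV line per rating record
    return out.getvalue()
```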
To prevent the unlimited growth of the table, it would be quite reasonable to archive rating records more than (say) a year old into a compressed XML dump.
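The archiving step might be sketched as follows: write every record older than a cutoff to a gzip-compressed XML file, then delete those records from the live table. Again, sqlite3 stands in for the real database, and the `ratings` table name, the XML layout (one `<rating>` element per row, columns as attributes) and the `archive_old_ratings` name are all assumptions of mine.

```python
import gzip
import sqlite3
import xml.etree.ElementTree as ET

COLUMNS = ("page_id", "revision_id", "user_id",
           "rating_id", "rating_value", "timestamp")


def archive_old_ratings(conn, cutoff_ts, archive_path):
    """Move rows with timestamp < cutoff_ts into a gzip-compressed XML file.

    Returns the number of rows archived. The file is written before the
    rows are deleted, so a failed write leaves the table untouched.
    """
    rows = conn.execute(
        "SELECT page_id, revision_id, user_id, rating_id, rating_value, timestamp "
        "FROM ratings WHERE timestamp < ? ORDER BY timestamp",
        (cutoff_ts,),
    ).fetchall()

    root = ET.Element("ratings")
    for row in rows:
        attrs = {name: str(value) for name, value in zip(COLUMNS, row)}
        ET.SubElement(root, "rating", attrs)

    with gzip.open(archive_path, "wb") as fh:
        fh.write(ET.tostring(root, encoding="utf-8"))

    with conn:  # delete in one transaction, only after the dump succeeded
        conn.execute("DELETE FROM ratings WHERE timestamp < ?", (cutoff_ts,))
    return len(rows)
```

This version builds the whole XML tree in memory, which would be fine for an experiment but would want to become a streaming write for a table with a year's worth of records.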
Finally, in any case, since ratings would only be an experiment during this phase, the whole ratings system could be turned off at any time if it presented a serious problem.
-- Neil