Brion Vibber wrote:
David Gerard wrote:
Brion, what would it take to get article rating switched on? Is there
any such feature you would allow in, or is it basically off the agenda
and I should stop asking?
Well, it needs to be shown to work correctly on pages with thousands of
revisions without bogging the server or otherwise exploding in
interesting ways.
-- brion vibber (brion @ pobox.com)
Brion,
Am I being naive here, or would a super-dumb implementation with a
single table with the columns shown below be enough to work in the short
term?
Page_ID
Revision_ID
User_ID
Rating_ID
Rating value
Timestamp
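As a rough sketch of that table, here is how it might look using
Python's built-in sqlite3 module as a stand-in for the real MySQL/InnoDB
database. The table name `rating`, the column types, the index, and the
helper names `open_db` and `add_ratings` are all assumptions made for
illustration; only the six columns come from the list above.

```python
import sqlite3

# Hypothetical schema: the six columns follow the list above; the table
# name, the types and the index are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE rating (
    page_id      INTEGER NOT NULL,
    revision_id  INTEGER NOT NULL,
    user_id      INTEGER NOT NULL,
    rating_id    INTEGER NOT NULL,  -- which dimension is being rated
    rating_value INTEGER NOT NULL,
    timestamp    INTEGER NOT NULL   -- seconds since the Unix epoch
);
CREATE INDEX rating_page_time ON rating (page_id, timestamp);
"""


def open_db(path=":memory:"):
    """Open a database and create the (hypothetical) rating table."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_ratings(conn, page_id, revision_id, user_id, values, timestamp):
    """Append one row per rated dimension, all in a single transaction.

    `values` maps rating_id -> rating value.
    """
    rows = [(page_id, revision_id, user_id, rating_id, value, timestamp)
            for rating_id, value in sorted(values.items())]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO rating VALUES (?, ?, ?, ?, ?, ?)", rows)
    return len(rows)
```

Rows are only ever appended here, never updated, which matches the
append-only behaviour described in this message.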
If a user rates five parameters, five entries go into the table. Even if
users were to rate articles at the same rate that they are currently
editing them (about once a second), and each rating had five rating
dimensions, the table would grow by a little under half a million
entries per day (86,400 x 5 = 432,000). If each entry took (say) twenty
bytes, that would be growth of roughly 9 megabytes per day, or a little
over three gigabytes a year. Since the table would grow simply by
appending records to its end, and would not otherwise change, it should
not add much load to the database when being written, as adding five
records at once in a single InnoDB transaction would result in only a
single disk write.
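The back-of-the-envelope growth estimate can be checked with a few
lines of Python. The function name `table_growth` and its parameter
names are made up for this sketch; the default values (one rating per
second, five dimensions, twenty bytes per row) are the ones assumed in
the paragraph above.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def table_growth(ratings_per_second=1.0, dimensions=5, bytes_per_row=20):
    """Return (rows per day, bytes per day, bytes per year)."""
    rows_per_day = ratings_per_second * dimensions * SECONDS_PER_DAY
    bytes_per_day = rows_per_day * bytes_per_row
    bytes_per_year = bytes_per_day * 365
    return rows_per_day, bytes_per_day, bytes_per_year


rows, per_day, per_year = table_growth()
print(f"{rows:,.0f} rows/day")            # 432,000 rows/day
print(f"{per_day / 1e6:.2f} MB/day")      # 8.64 MB/day
print(f"{per_year / 1e9:.2f} GB/year")    # 3.15 GB/year
```

These numbers ignore index and per-row storage overhead, so real usage
would likely be somewhat higher, but the order of magnitude holds.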
To throttle back the load from rating, it might be reasonable to
restrict rating to logged-in users and, if that's not enough, to:
* throttle the rate at which they can rate articles;
* allow rating only at times when the server load is low;
* allow rating only for users with a certain number of edits and/or
  a certain time since their account was registered.
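The restrictions listed above could be combined into one check along
these lines. This is only a sketch: the `User` class, the function
`may_rate`, and every threshold value are invented placeholders, and
the edit-count and account-age tests are both required here, although
the "and/or" above would equally allow either one on its own.

```python
from dataclasses import dataclass


@dataclass
class User:
    logged_in: bool
    edit_count: int
    registered_at: float  # Unix time of account registration


# Placeholder thresholds; none of these values come from the proposal.
MIN_SECONDS_BETWEEN_RATINGS = 60
MAX_SERVER_LOAD = 0.8            # fraction of capacity, however measured
MIN_EDITS = 50
MIN_ACCOUNT_AGE = 30 * 24 * 3600  # thirty days, in seconds


def may_rate(user, now, last_rated_at, server_load):
    """Return True if this user may submit a rating right now.

    `last_rated_at` is the Unix time of the user's previous rating,
    or None if they have never rated anything.
    """
    if not user.logged_in:
        return False
    if (last_rated_at is not None
            and now - last_rated_at < MIN_SECONDS_BETWEEN_RATINGS):
        return False  # per-user rate limit
    if server_load > MAX_SERVER_LOAD:
        return False  # only accept ratings when the server is quiet
    if user.edit_count < MIN_EDITS:
        return False
    if now - user.registered_at < MIN_ACCOUNT_AGE:
        return False
    return True
```

Each test is cheap and needs no query against the rating table itself,
so the check should add very little load of its own.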
Now, this could clearly be made much, much more efficient in a number of
obvious ways, but it would be simple enough to implement in the short
term to get things going.
Also in the short term, the only output method needed would be the
ability to dump an XML or CSV file showing all the rating records for a
given article in a given time period, and this could be restricted to
admins for the time being if the random access required would be a
significant load on the database. This would be enough to allow users to
start experimenting with ratings analysis schemes.
To prevent the unlimited growth of the table, it would be quite
reasonable to archive rating records more than (say) a year old into a
compressed XML dump.
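Archiving could be sketched in two steps: export the old rows as
gzip-compressed XML, then delete them once the dump is safely stored.
As before, the table name `rating`, the XML element names, and the
function names `export_old_ratings` and `delete_old_ratings` are
assumptions for illustration, with sqlite3 standing in for the real
database.

```python
import gzip
import sqlite3
import xml.etree.ElementTree as ET

COLUMNS = ["page_id", "revision_id", "user_id",
           "rating_id", "rating_value", "timestamp"]


def export_old_ratings(conn: sqlite3.Connection, cutoff: int) -> bytes:
    """Return gzip-compressed XML for every rating older than `cutoff`."""
    cur = conn.execute(
        "SELECT " + ", ".join(COLUMNS) + " FROM rating"
        " WHERE timestamp < ? ORDER BY timestamp, rating_id",
        (cutoff,),
    )
    root = ET.Element("ratings")
    for row in cur:
        # One <rating> element per record, with the columns as attributes.
        attrs = {name: str(value) for name, value in zip(COLUMNS, row)}
        ET.SubElement(root, "rating", attrs)
    return gzip.compress(ET.tostring(root, encoding="utf-8"))


def delete_old_ratings(conn: sqlite3.Connection, cutoff: int) -> int:
    """Delete the archived rows; call only after the dump is safely stored."""
    with conn:  # single transaction
        cur = conn.execute("DELETE FROM rating WHERE timestamp < ?", (cutoff,))
    return cur.rowcount
```

Keeping the export and the delete separate means nothing is removed
from the table until the compressed dump has actually been written
somewhere.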
Finally, in any case, since ratings would only be an experiment during
this phase, the whole ratings system could be turned off at any time if
it presented a serious problem.
-- Neil