On Fri, Aug 13, 2010 at 8:55 AM, Magnus Manske
<magnusmanske(a)googlemail.com> wrote:
Disk dump thread:
* Get mutex for list start pointer
* Copy list start pointer
* Reset list start pointer = NULL
* Release mutex
* Write list to disk
* Release memory
If you allocate memory per list item, the freed blocks should fit the
next ones nicely, so malloc would not be too slow, I imagine (just use
a char[] of fixed size).
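For the record, the pointer-swap dump described above could look roughly like this in C with pthreads. This is only a sketch of the idea, not anyone's actual code; the names (log_append, dump_to) and the entry layout are made up, and error handling is omitted:

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define ENTRY_SIZE 256              /* fixed-size char[] per item, as above */

struct entry {
    char line[ENTRY_SIZE];
    struct entry *next;
};

static struct entry *head = NULL;   /* the shared list start pointer */
static pthread_mutex_t head_lock = PTHREAD_MUTEX_INITIALIZER;

/* Request side: one malloc per item, prepend under the mutex. */
static void log_append(const char *line)
{
    struct entry *e = malloc(sizeof *e);
    snprintf(e->line, sizeof e->line, "%s", line);
    pthread_mutex_lock(&head_lock);
    e->next = head;
    head = e;
    pthread_mutex_unlock(&head_lock);
}

/* Dump side: the four locked steps, then write and free unlocked.
 * Returns the number of entries written. */
static int dump_to(FILE *out)
{
    pthread_mutex_lock(&head_lock);     /* get mutex for list start pointer */
    struct entry *list = head;          /* copy list start pointer          */
    head = NULL;                        /* reset list start pointer = NULL  */
    pthread_mutex_unlock(&head_lock);   /* release mutex                    */

    int n = 0;
    while (list) {                      /* write list to disk, free memory  */
        fprintf(out, "%s\n", list->line);
        struct entry *next = list->next;
        free(list);
        list = next;
        n++;
    }
    return n;
}
```

Note that prepending means the dump writes newest-first; if on-disk order matters, the dump thread would have to reverse the list before writing.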
While we're having fun speculating on possible designs without
actually volunteering to write the code ;): wouldn't a circular buffer
make more sense than a linked list? You could have one buffer per
thread, to avoid locking for the end pointer, and then the thread that
writes the logs to disk can just go around all the buffers in turn,
grabbing the requests from the start, ordering them by timestamp, and
writing them out in the correct order. I don't think this would need
any locking at all, and if you make the buffer large enough that you
don't have to worry about the dump thread not being able to keep up,
you wouldn't even have any cache bouncing.
You could also avoid malloc()/free() issues by just preallocating all
the buffers, with only the dump thread doing malloc()/free() during
operation. If you keep enough buffer space for 100k requests total,
and each request takes 1 KB, that's only 100 MB.
Now I want Domas to shoot holes in my idea too! :)
Just curious: is the "million wakeups" an actual number, or a figure of
speech? How many views/sec are there?
Current #wikimedia-tech topic says that peak load is about 100k req/s.
Assuming a million per second seems like a reasonable idea for
future-proofing.