Without having looked at any code: can't the threads just append data to a semaphore-protected linked list (fast), while a single separate thread occasionally writes it all to disk?
Isn't this the usual mistake that developers of threaded software make:
1. make all threads depend on a single mutex
2. watch them fight! (you'd get a million wakeups a second here :-)
As a bonus, you then need to either copy the data to a separate buffer or allocate memory in a frenzy, with yet another mutex for malloc/free ;-)
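
To make the first suggestion concrete, here is a rough sketch of the idea, not taken from any actual code: worker threads append records to an in-memory list under a lock that is held only for the append, and a single writer thread wakes up occasionally, swaps the list out, and writes the whole batch at once. The names `BatchedLog`, `add`, `flush_interval` and `demo` are made up for illustration, and the sink is an in-memory buffer standing in for a file on disk.

```python
import io
import threading


class BatchedLog:
    """Workers append under a short lock; one writer thread drains in batches."""

    def __init__(self, sink, flush_interval=0.05):
        self._sink = sink                # hypothetical: any object with write()
        self._pending = []               # stands in for the linked list
        self._lock = threading.Lock()    # held only for an append or a swap
        self._stop = threading.Event()
        self._interval = flush_interval  # assumed value, tune as needed
        self._writer = threading.Thread(target=self._run, daemon=True)
        self._writer.start()

    def add(self, record):
        # Fast path for workers: no I/O happens while the lock is held.
        with self._lock:
            self._pending.append(record)

    def _drain(self):
        # Swap the list out under the lock, then write outside of it.
        with self._lock:
            batch, self._pending = self._pending, []
        if batch:
            self._sink.write("".join(batch))

    def _run(self):
        # Event.wait() returns False on timeout, True once close() was called.
        while not self._stop.wait(self._interval):
            self._drain()
        self._drain()  # final flush so nothing is lost on shutdown

    def close(self):
        self._stop.set()
        self._writer.join()


def demo(n_threads=4, per_thread=1000):
    sink = io.StringIO()  # in-memory stand-in for a file on disk
    log = BatchedLog(sink)

    def worker(tid):
        for i in range(per_thread):
            log.add(f"{tid}:{i}\n")

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    log.close()
    return sink.getvalue().splitlines()
```

The workers still share one lock here, but they hold it only for a list append, never across a disk write, and the writer wakes up on a timer rather than once per record. Swapping the list instead of copying it also avoids the extra buffer copy mentioned above.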
Domas