Hello team,
I note that there has been little investigation of what academia has to say on the topic. Here are a few suggested readings that might prove profitable, as they explore the same problem space:
Lin, Xiaojun, and Shahzada B. Rasool. "Constant-time distributed scheduling policies for ad hoc wireless networks." Decision and Control, 2006 45th IEEE Conference on. IEEE, 2006.
Abawajy, Jemal H. "Fault-tolerant scheduling policy for grid computing systems." Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th International. IEEE, 2004.
And one that, while older, still offers good insight:
Bannister, Joseph A., and Kishor S. Trivedi. "Task allocation in fault-tolerant distributed systems." Acta Informatica 20.3 (1983): 261-281.
I am certain that more can be found. One useful research hint: these three papers are often cited in later work, so looking at the papers that cite them is likely to be profitable.
-- Marc A. Pelletier