I have done some testing of the performance of Lua-based templates as deployed on enwiki. The analysis is summarized at:
https://en.wikipedia.org/wiki/User:Dragons_flight/Lua_performance
The bottom line is that Lua is fast, and often much faster than the template coding it replaces.
For the important case of citation templates, one can anticipate about an 80% reduction in render time once Module:Citation/CS1 is deployed. In practice, 300 citations can then be processed in about 3.5 seconds rather than 18 seconds. Such an improvement should make a meaningful difference for many of Wikipedia's complex pages.
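As a rough check of those figures, the arithmetic can be sketched in Python. The variable names are my own, and the inputs are only the rounded numbers quoted above, not fresh measurements.

```python
# Rounded figures quoted above; variable names are illustrative only.
citations = 300
old_total_s = 18.0  # template-based citation rendering
new_total_s = 3.5   # rendering with Module:Citation/CS1

# Average cost of a single citation, in milliseconds.
old_per_citation_ms = old_total_s / citations * 1000
new_per_citation_ms = new_total_s / citations * 1000

# Fractional reduction in total render time.
reduction = 1 - new_total_s / old_total_s

print(f"old: {old_per_citation_ms:.1f} ms per citation")
print(f"new: {new_per_citation_ms:.1f} ms per citation")
print(f"reduction: {reduction:.0%}")
```

This works out to roughly 60 ms per citation before and a little under 12 ms after, in line with the 80% figure.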
One unexpected detail that came out of my testing is that the overhead per #invoke call is about 4.5 milliseconds, which is fairly large once a single page contains several hundred calls. For the citation module, this overhead is about 40% of the run time. For some of the simpler number formatting and string manipulation Lua modules, the overhead can be 75-90% of the run time. I don't know if it is possible, but it may be worth looking into whether caching or other techniques could reduce the overhead associated with launching each #invoke instance.
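To put the overhead numbers side by side, here is a small Python sketch. It assumes the 4.5 ms overhead is a fixed cost per #invoke call, and it reuses the rounded figure of 3.5 seconds for 300 citations; the function names are my own and purely illustrative.

```python
INVOKE_OVERHEAD_MS = 4.5  # approximate per-#invoke overhead reported above


def overhead_share(total_ms_per_call, overhead_ms=INVOKE_OVERHEAD_MS):
    """Fraction of one call's run time spent on #invoke overhead."""
    return overhead_ms / total_ms_per_call


def implied_total_ms(share, overhead_ms=INVOKE_OVERHEAD_MS):
    """Total run time per call implied by a given overhead fraction."""
    return overhead_ms / share


# Citation module: about 3.5 s for 300 citations (rounded figures).
citation_ms = 3.5 * 1000 / 300
citation_share = overhead_share(citation_ms)

# Overhead summed over a page with 300 calls, in seconds.
page_overhead_s = 300 * INVOKE_OVERHEAD_MS / 1000

print(f"citation overhead share: {citation_share:.0%}")
print(f"overhead for 300 calls: {page_overhead_s:.2f} s")
print(f"75% overhead implies {implied_total_ms(0.75):.1f} ms per call")
print(f"90% overhead implies {implied_total_ms(0.90):.1f} ms per call")
```

Under these assumptions, the citation module's share comes out at roughly 39%, consistent with the "about 40%" figure, and the simpler modules would be running in only about 5 to 6 ms per call in total, nearly all of it overhead.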
-Robert Rohde