== What do we need? ==
Actually, we don't need much to solve this problem. I have a solution for the most important part of it, the linguistic one. Even if I don't have enough time to deal with all the cases, I can find students or professors of linguistics who are willing to work on those issues for free (they would get scientific papers out of the work). We need "just" a PHP programmer who is willing to work on this problem. And in a couple of years I haven't found one (even though I know a lot of PHP programmers).
It sounds like a good project for a directed grant. Have you tried contacting potential grant-making organisations? I imagine some awesome things could be done with as little as $100K.
-- Tim Starling