Hi all,
I think countries of the world would be a fine place to start. There are already good tables in the major languages that can be translated easily.
I've not seen this resource mentioned yet in this thread, so I thought I'd throw out a link:
http://unicode.org/cldr/data/common/main/
As you may know, CLDR (Common Locale Data Repository) is a project that's hosted on unicode.org. It has XML files with lots of localization data for various languages. (I believe there is also a process for contributing information for languages that haven't yet been added -- if it's possible to share this info with that project, it would also be a good way to promote those languages on the web. I don't have anything to do with CLDR myself, by the way; I just think it's a cool project.)
Most of this data has been vetted, and data that hasn't been is tagged with draft="true".
Some specific files to look at (you can load the files directly in Firefox, I'm not sure about other browsers):
http://unicode.org/cldr/data/common/main/sw.xml  Kiswahili (Swahili) - 79 territories
http://unicode.org/cldr/data/common/main/am.xml  አማርኛ (Amharic) - 255
http://unicode.org/cldr/data/common/main/ka.xml  ქართული (Georgian) - 191
http://unicode.org/cldr/data/common/main/ms.xml  Bahasa Melayu (Malay) - 239
For some languages there isn't much there currently:
http://unicode.org/cldr/data/common/main/ur.xml  Urdu - 1 territory
http://unicode.org/cldr/data/common/main/az.xml  Azeri - 1 territory
http://unicode.org/cldr/data/common/main/ml.xml  Malayalam - 1 territory
And for others there's not even a file -- unfortunately, Udmurt, Ossetian, and Chuvash are in this group -- but this might change in the future. Perhaps it would be easier, also, for these speakers to translate from the Russian file?
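For anyone who wants to check territory counts like the ones above, here is a minimal sketch in Python. It assumes that each file keeps its territory names in <territory type="..."> elements and that unvetted entries carry draft="true" as described earlier; the sample data and the function name territory_names are made up for illustration, and a real file would be read from disk or downloaded instead.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for a CLDR locale file such as sw.xml. This is made-up
# sample data, and the element layout is an assumption about the format.
SAMPLE_LDML = """<ldml>
  <localeDisplayNames>
    <territories>
      <territory type="KE">Kenya</territory>
      <territory type="TZ">Tanzania</territory>
      <territory type="UG" draft="true">Uganda</territory>
    </territories>
  </localeDisplayNames>
</ldml>"""


def territory_names(xml_text, include_draft=True):
    """Return {territory code: name} from a locale document string."""
    root = ET.fromstring(xml_text)
    names = {}
    for elem in root.iter("territory"):
        if not include_draft and elem.get("draft") == "true":
            continue  # skip entries that have not been vetted
        names[elem.get("type")] = (elem.text or "").strip()
    return names


all_names = territory_names(SAMPLE_LDML)
vetted = territory_names(SAMPLE_LDML, include_draft=False)
print(len(all_names), "territories,", len(vetted), "vetted")
```

Passing include_draft=False is one way to restrict a translation table to vetted names only.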
These files also contain these other categories (again, with varying degrees of completeness):
* language names
* currencies
* "exemplar characters" -- potentially the basis for an entry on the writing systems of various languages
* calendar information -- months, days, stuff like that
I've already started writing some Python scripts for extracting info from the CLDR. I'd be happy to try generating output with them in whatever formats people need.
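To give a rough idea of what such an export could look like (this is only an illustrative sketch, not the actual scripts mentioned above), the following pulls language and territory names out of a locale document and writes them as tab-separated text. The element names, the sample data, and the function name export_names are assumptions for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Made-up sample in the assumed layout of a CLDR locale file.
SAMPLE_LDML = """<ldml>
  <localeDisplayNames>
    <languages>
      <language type="sw">Kiswahili</language>
      <language type="en">Kiingereza</language>
    </languages>
    <territories>
      <territory type="KE">Kenya</territory>
    </territories>
  </localeDisplayNames>
</ldml>"""


def export_names(xml_text):
    """Return language and territory names as tab-separated text."""
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["category", "code", "name"])
    for category in ("language", "territory"):
        for elem in root.iter(category):
            name = (elem.text or "").strip()
            writer.writerow([category, elem.get("type"), name])
    return out.getvalue()


tsv = export_names(SAMPLE_LDML)
print(tsv, end="")
```

Tab-separated text is just one choice here; the same loop could feed any other format people find more convenient.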
Well, there's my several cents =)
Best regards,
Patrick Hall