On Tue, Feb 10, 2004 at 07:29:14PM +0100, talthen@wp.pl wrote:
Hello, Wikipedia's database is quite huge, but it is not growing very fast. That would change if all the Wikipedians started building a common database. The main problem is the difference between languages, but... I have an idea! :) I know my idea will not be easy to realize, but it would be very useful.
The idea is to create a new language, based on the most popular languages from all over the world. This language would not be a human language, but a language for storing information.
Today we have some language translation applications, but they are not perfect, for two reasons:
- Some languages differ too much
- Some words have many meanings, and the program doesn't know which one should be chosen.
By creating a new language we would solve the first problem. (I think we do not have to create an entirely new language; maybe modifying Esperanto would be enough.) The second problem could be solved by listing all the meanings of words. For example, for the English language we could create a file like this:
word number   word   meaning
1             mind   intellect
2             mind   thoughts
3             mind   a head
4             mind   to object to
The translation would work like this: I write a sentence: "The study of logic trains the mind". The application scans my sentence and asks in which meaning I used the word "mind". I then choose "intellect" from all the meanings of "mind". Once the writer has explained all the ambiguous words, the application saves the sentence in its own language in a structure like this: 116117 6322 987672 1 312312, where the numbers are word numbers.
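The encoding step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `ENGLISH_SENSES`, `meanings_of`, `encode` and `pick_intellect` are hypothetical, sense numbers 1-4 follow the "mind" example, and the entries for "logic" and "study" are made up. The interactive question to the writer is replaced by a callback.

```python
# Hypothetical sense table for English: (word, meaning) -> shared sense number.
# Entries 1-4 follow the "mind" example; entries 5 and 6 are invented.
ENGLISH_SENSES = {
    ("mind", "intellect"): 1,
    ("mind", "thoughts"): 2,
    ("mind", "a head"): 3,
    ("mind", "to object to"): 4,
    ("logic", "science of reasoning"): 5,
    ("study", "act of learning"): 6,
}


def meanings_of(word):
    """Return all meanings listed for a word, ordered by sense number."""
    found = [(num, meaning) for (w, meaning), num in ENGLISH_SENSES.items() if w == word]
    return [meaning for num, meaning in sorted(found)]


def encode(words, choose):
    """Turn words into sense numbers, calling `choose` when a word is ambiguous."""
    numbers = []
    for word in words:
        options = meanings_of(word)
        if not options:
            raise KeyError(f"no sense listed for {word!r}")
        meaning = options[0] if len(options) == 1 else choose(word, options)
        numbers.append(ENGLISH_SENSES[(word, meaning)])
    return numbers


# Stand-in for asking the writer: this callback always answers "intellect".
def pick_intellect(word, options):
    return "intellect"


encoded = encode(["study", "logic", "mind"], pick_intellect)
print(encoded)  # [6, 5, 1]
```

Function words such as "the" and "of" are left out of the sketch because the message does not say how they would be numbered.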
Decompression would look like this: I ask the program to display the message in Polish. The application loads the file "polish.txt" and looks up the words with these numbers. As the fourth word it loads the word from line one (because the word "mind" with the meaning "intellect" is on line 1 in all the languages, not only in English). It finds all the words and displays them.
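The lookup step can be sketched in the same spirit. The file contents below are a hypothetical stand-in for "polish.txt", holding only the four "mind" senses, and `load_lexicon` and `decode` are illustrative names, not part of any existing tool. The sketch assumes that line N of every language file holds the word for sense N.

```python
# Hypothetical contents of "polish.txt": line N holds the Polish word for sense N.
POLISH_TXT = """umysł
myśli
głowa
sprzeciwiać się
"""


def load_lexicon(text):
    """Map 1-based line numbers to the word found on that line."""
    return {n: line.strip() for n, line in enumerate(text.splitlines(), start=1)}


def decode(numbers, lexicon):
    """Replace each sense number with the target-language word, keeping the order."""
    return [lexicon[n] for n in numbers]


polish = load_lexicon(POLISH_TXT)
print(decode([1, 3], polish))  # ['umysł', 'głowa']
```

This is a plain word-by-word lookup: it does no reordering and no inflection, so the output is a list of dictionary forms rather than a grammatical Polish sentence.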
I know that writing down all the meanings of words is not easy. But if every Wikipedian wrote just a few, we would finish very fast. The hardest thing is to design the language so that it describes which tense the sentence is in, what the word order should be after translating to language X and what it should be when displaying in Y, etc. But I think this is possible and would make, e.g., building the Wikipedia database much easier. And not only that: there would be many applications for it.
Hope you understood what I mean. I know I may have made some mistakes (both grammatical and logical)...
So, how do you like my idea? Do you think it's worth realizing?
First, choose some small area of knowledge. It doesn't matter what it is, but it must be non-trivial for the experiment to be meaningful. Then, try to implement something that works with this area and just a few languages.
Natural Language Processing is one of the most difficult parts of computer science, where a lot of really promising ideas have failed in practice. Obviously, we'd love to use anything that would make our work easier, but it would be very hard to get something like what you describe working.