Hi Gerard and all of you,
thinking about the code I was just considering some points.
What I noted on the page you gave me for the Italian version of the
ISO-code is that you use a mixed version for language identifiers - the
two letter code and where there's no two letter code the three letter
code - is this correct? I also noted that not all languages are present
in the ISO-3-letter-code - so they are standardised, but not completely.
This would obviously lead to a standard of wiktionary's own.
I am asking because I thought about compiling a list of the language
codes used for wiktionary and then adding the translations of the
language names, asking friends and colleagues to complete the list.
Normally in the translation world the two letter code is used.
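A minimal sketch of the mixed scheme described above (two-letter code where one exists, otherwise the three-letter code) could look like this in Python. The sample rows and the names pick_code and build_code_table are assumptions made for illustration only; this is not the actual list used on wiktionary.

```python
# Illustrative sample rows only: (language name, two-letter code, three-letter code).
# None means "no code of that kind exists" for that row.
LANGUAGES = [
    ("Italian", "it", "ita"),
    ("German", "de", "deu"),
    ("Neapolitan", None, "nap"),
    ("Example language without any ISO code", None, None),  # hypothetical row
]


def pick_code(two_letter, three_letter):
    """Prefer the two-letter code; fall back to the three-letter code."""
    return two_letter or three_letter


def build_code_table(languages):
    """Return (name -> chosen code, names that have no code at all)."""
    table = {}
    missing = []
    for name, two_letter, three_letter in languages:
        code = pick_code(two_letter, three_letter)
        if code is None:
            # These are the cases that would need a wiktionary-specific code.
            missing.append(name)
        else:
            table[name] = code
    return table, missing


code_table, without_code = build_code_table(LANGUAGES)
```

The second return value collects exactly the languages that fall outside the ISO lists, which is where a separate agreement would be needed.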
I'll then add the list to my sourceforge project (wsi-glossary:
http://sourceforge.net/projects/wsi-glossary/). You can see who is
contributing right now with additions to the lists here:
http://wiki.wesolveitnet.com/wakka.php?wakka=WsiGlossaryContributors.
I should change the licensing to GNU FDL (up to now mine was the same as
the one used for the OmegaT manual). I have to check whether this is
possible without problems on sourceforge.net. I am too new to OpenContent
to know all about this, so it is another thing to be done immediately.
If you are working on a multilanguage list e.g. of trees, birds,
vegetables etc., please seriously consider having these lists
completed by other people as well and making them available somewhere for
download, or just integrate them into wsi-glossary. Certain kinds of work
can be done even by schools in language lessons - e.g. the Italian
Thesaurus for OpenOffice.org was created with the help of a school where
the teachers were the team leaders and during the classes the pupils did
something that made "sense" to them. Having them work directly in
wiktionary online is impossible for most schools as computers don't have
Internet access (or only a few of them) and so working on tables is much
easier.
If you prefer not to hand out the list: give out single terms or groups
of terms like this:
I need these term(s)
house
cat
mouse
etc.
in the following languages:
German
French
Italian
etc.
I can then either publish these parts on my portal or send the request to
different lists of translators, so step by step it is possible to
integrate and improve.
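As a rough sketch of how such a request could be turned into a table that people fill in offline, the following Python snippet writes an empty terms-by-languages grid as CSV. The function name make_request_table and the CSV layout are assumptions for illustration, not an existing wsi-glossary format.

```python
import csv
import io


def make_request_table(terms, languages):
    """Build a CSV grid: one row per term, one empty column per language."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["term"] + list(languages))
    for term in terms:
        # Empty cells are left for the translators to fill in.
        writer.writerow([term] + [""] * len(languages))
    return buffer.getvalue()


request_csv = make_request_table(
    ["house", "cat", "mouse"],
    ["German", "French", "Italian"],
)
print(request_csv)
```

A file like this opens in any spreadsheet program, so it can be completed on computers without Internet access and merged back later.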
Best wishes from Italy,
Sabine
--
Sabine Cretella
s.cretella(a)wordsandmore.it
www.wordsandmore.it
Meetingplace for translators
www.wesolveitnet.com
Hoi,
On META I have written a piece
http://meta.wikimedia.org/wiki/The_Ultimate_wikitonary In it I describe
what I think the Ultimate wiktionary would be like.
In the article I assume that things like etymology, pronunciation, and all
the good stuff that we need in a wiktionary are a given. Therefore this
article is about how it works and what functionality makes it possible.
The article is intended as a Request For Comments.
I also want to direct your attention to this article:
http://meta.wikimedia.org/wiki/Proposal_for_a_Wiktionary_proof_of_concept
In it I propose to implement a subset of what is required for a
wiktionary that understands XML. I really welcome comments.
Thanks,
GerardM
On the nl:wikipedia, there were many articles that contained the names
of the subject in different languages. There was opposition to this, as
this is something you expect in a dictionary and not in an encyclopedia.
There were some iterations before we came up with our current solution.
On nl:Wiktionary we refer to articles in wikipedia and use the template
{{-info-}} to produce a text with an interlink to the wikipedia article.
On Wikipedia we now refer to articles in wiktionary and use the template
{{wikt}} to produce a text with an interlink to the wiktionary article.
The benefits are:
*There are more articles in nl:wiktionary as a result
*We did not lose the information that was on nl:wikipedia
*We have richer information overall
*The best bit is: more people contribute to nl:wiktionary.
Everybody is happy with this solution. :)
Thanks,
GerardM
There is a big discussion on wikipedia-l about writing Chinese. One
thing I gleaned from this discussion is that zh-tw and zh-cn are used to
indicate traditional and simplified Chinese respectively. As it is
relevant to wiktionary to have both correct spellings, I propose to use
these codes as well as the zh code to indicate Chinese words.
I hope someone has a good suggestion for Serbian, which is written in
both Cyrillic and Latin script. There are more languages that are written
in more than one script. I am looking forward to suggestions.
Thanks,
GerardM
Hoi,
As discussed in [[The need for XML re: wiktionary]] and [[Tables for
Wiktionary]] on Meta and on the wiktionary mailing list, it is necessary to
structure the wiktionary content in order to be able to share the
content of wiktionary. The complexity of doing this is huge. Not only do
we need to describe all kinds of data to be able to include this in our
database; we also have to take care not to make it too difficult for a
would-be contributor. We also have to produce XML or something like it to
publish our content.
To do all this in one go is a bit much. So I propose to do something
that is simpler first. We may use the GEMET data for inclusion within
Wikimedia. This is a rich and important body of knowledge and we can use
it not only in wiktionary but also in wikispecies. We have been given
the SQL stuff from the GEMET relational database. So we can change this
to fit Wikimedia. GEMET has its own XML definition.
We therefore have these important components:
*We have a SQL definition to fit the data
*We have the complete data from the GEMET available in XML format
*It provides a subset of what is required in structuring Wiktionary
*The GEMET data is an important resource in its own right
*It gives us the ability to handle open content glossaries/thesauri
*It gives us an idea how Wiktionary could/should evolve
The SQL definitions are posted on Meta. I do not see how to add them on
bugzilla.
*[[:Image:Gemet.sql]]
*[[:Image:GEMET_status.sql]]
*[[:Image:GEMET relation type.sql]]
Thanks,
GerardM
Before we go ahead and use this example: Italian has a definite word for
rat, and topo doesn't mean rat.
topo = mouse
ratto / topo di fogna / colloquially "topone" = rat
topo d'acqua = this is a kind of rat and not of mouse
topo becomes rat only in combination with other words, but never alone.
To confirm this I just asked a colleague of mine (as even translators
can be wrong)
There is a problem of this kind but normally this happens when for
example there is no translation for a subspecies into the other language.
I don't have an exact example now, but I am sure, we will need it.
This is also a reason for me to work in lists as when asking colleagues
to check the cross translations normally these things come out easily.
Going back to work. - For now I'll read the messages but answer only as
soon as I have finished my job, sorry.
Ciao, Sabine
Because I am so far behind on everything, I have been unable to study
this issue to my satisfaction. I hope to return to it soon. In the
meantime, I just wanted to say that I found the discussion
interesting, and I advise everyone to move forward cautiously and
thoughtfully. Not very helpful, I know, but so long as we do that, I'm
sure the right answers will become apparent.
--Jimbo
Anthere, Jimbo,
At this moment, we are in the position of getting the cooperation of all
kinds of outside people with regard to Wiktionary. This is documented to
a large extent on META and on the wiktionary-l mailing list. Because of
the quality of the data that we are allowed to enter and because of the
quality of the persons that are involved, it would really help if we are
able to import but also export wiktionary data using XML structures.
The benefits are:
*We will open up to other open dictionary content for inclusion in
wiktionary.
*We will open up our content to other interested parties.
*The wiktionary content will not be only in our own "proprietary" format.
*We will be able to import data from one wiktionary in the next. This
will not only enhance the quality of the wiktionaries; it will also
enhance the reputation of the wiktionaries.
*We will prevent the duplication of effort. Much effort now goes into
doing the same job over and over again. An example: the word
"Nederlands" in nl:wiktionary has 68 exact translations; these words
exist as entries as well. In principle these words could be copied to ALL
other wiktionaries, but this is not possible at the moment. The human
effort is huge and wasted, as it can be automated.
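To make the import/export idea a little more concrete, here is a small Python sketch that writes one word with its translations to XML and reads it back. The element and attribute names (entry, translation, word, lang) are purely hypothetical; no XML format for wiktionary has been agreed on, and only three sample translations are shown.

```python
import xml.etree.ElementTree as ET


def export_entry(word, lang, translations):
    """Serialise one word and its translations to an XML string."""
    entry = ET.Element("entry", {"word": word, "lang": lang})
    for code, text in sorted(translations.items()):
        item = ET.SubElement(entry, "translation", {"lang": code})
        item.text = text
    return ET.tostring(entry, encoding="unicode")


def import_entry(xml_text):
    """Read back what export_entry wrote: (word, lang, translations)."""
    entry = ET.fromstring(xml_text)
    translations = {
        item.get("lang"): item.text for item in entry.findall("translation")
    }
    return entry.get("word"), entry.get("lang"), translations


sample = {"en": "Dutch", "de": "Niederländisch", "fr": "néerlandais"}
xml_text = export_entry("Nederlands", "nl", sample)
word, lang, restored = import_entry(xml_text)
```

Because the exported text can be read back without loss, another wiktionary could in principle import the same file instead of having the translations retyped by hand.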
When a consensus arrives on how to do this, it will also mean that some
programming will be required. Without your backing, I expect that
nothing is going to happen. When we have arrived at a road map, it will
mean that a lot of work will be needed to make the wiktionary data
conformant for inclusion in the new scheme and have the technique to get
this done. We cannot reasonably ask this of the wiktionary community and
the wikitech community when this road map is not co-owned by the
wikimedia board.
Questions:
*Acknowledge that you will consider this as being strategic to the
development of Wiktionary.
*Give a timeframe in which we will know that we can plan with a
reasonable chance of getting things implemented. (basically when will we
have an answer to the first question)
Thanks,
Gerard Meijssen (GerardM)
Reacting to Andrew Dunbar.
A word may have several meanings in a language. Each meaning has its own
definition. Your "topo" would probably best translate to rodent which
includes both mice and rats. When there is no exact word or phrase for
mouse in Italian, it should not be only translated with the Italian
topo. I am sure a description in Italian for a mouse is possible.
At this moment in time we do not have a new quick system. What we are
discussing is the need for publishing our content in a re-usable
intermediary format like XML; this should allow us to publish our current
content. How this XML will be used is what you talk about, and you are
right that it should be used with care. However, an en:wiktionary English
word, its pronunciation, its translations, and its usage can all be validly
used in nl:wiktionary; the definitions of the meanings and the etymology
need translation. We can make an interwiki to en: so that these can be
found there for as long as we have not translated them yet.
It is not only {{lang}} but also {{-trans-}}, {{-syn-}}, {{-ant-}} etc.
that will get their local meaning in the local setup. This is not ideal.
These codes and their associated content should end up in a proper
database so that all of this can be done by the software.
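As a sketch of what "done by the software" could mean for these codes, the Python snippet below replaces template codes with a heading taken from a per-wiki table. The table contents, the function name expand, and the idea of a plain text substitution are assumptions for illustration; this is not how MediaWiki templates are actually implemented.

```python
import re

# Hypothetical per-wiki headings for the shared template codes.
LOCAL_HEADINGS = {
    "en": {"-trans-": "Translations", "-syn-": "Synonyms", "-ant-": "Antonyms"},
    "nl": {"-trans-": "Vertalingen", "-syn-": "Synoniemen", "-ant-": "Antoniemen"},
}

# Matches simple templates without parameters, e.g. {{-trans-}}.
TEMPLATE = re.compile(r"\{\{([^{}|]+)\}\}")


def expand(wikitext, wiki):
    """Replace known template codes with the heading used on the given wiki."""
    headings = LOCAL_HEADINGS[wiki]

    def replace(match):
        # Unknown templates are left untouched.
        return headings.get(match.group(1), match.group(0))

    return TEMPLATE.sub(replace, wikitext)


dutch = expand("{{-trans-}} {{-syn-}} {{wikt}}", "nl")
english = expand("{{-trans-}} {{-syn-}} {{wikt}}", "en")
```

The same stored text then yields local headings on each wiki, while templates that are not in the table, such as {{wikt}} here, pass through unchanged.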
Now when a good non-en:wiktionary editor starts with a word and comes up
with inaccuracies in the en:wiktionary, would you not like to know about
that? Would it not be valuable to you that you can benefit from the work?
Articles on wiktionary in META can be found by their inclusion in the
category:wiktionary.
As to copying and pasting, this is the technique that is currently open
to us. We do not have something better. Because of copy and paste I was
able to add loads of translations to English, Japanese, Vietnamese words.
Opening up the wiktionary content will be hard. That is why I want us to
discuss it first before we commit to it. Once we decide that we need
this, the database will change, the way we enter content will change,
much content needs to be revisited. All this while the basic content
stays the same. We still want all the data that we have, but we will be
able to share it.
I have asked the wikimedia board to consider open content strategic;
this is one way of eventually getting developer attention.
There are open content English on-line dictionaries bigger than
wiktionary that use XML. Would it not be great if we could cooperate,
with us using their researched content and them using our content?
One aim for content could be to have a definition in wiktionary for all
Open Office words...
Thanks,
GerardM