I've done a bit of work translating -- to the best of my crude ability -- some of the basics for the Arabic Wikipedia. If anyone can read Arabic and would like to proofread, please do.
http://ar.wikipedia.org/wiki/Main_Page http://ar.wikipedia.org/wiki/Translated_for_system_functions
I think it will come along nicely. There are some issues with links; in particular, if there are too many language links at the top, there will be some overlap. Nothing worth explaining or dealing with yet... the place needs some basic electrical wiring before we can start looking for furniture.
There is also the issue with Arabic that Unicode might not be able to keep up with its morphology. I'll report more later, but apparently Arabic, as rendered by Unicode, is not quite up to grade in terms of the rules for changing letter shapes depending on their placement. The Unicode Unidate has a special patch that makes do with Arabic, but won't quite accommodate some of the other more morphic scripts out there. I'll have a full report later, but it helps to plan for the future, especially where the base encoding for an entire pedia is concerned. It'll be an open issue whether Unicode is the direction most Arabic development is going.
Anyway... enough for today...
-S-
p.s. -- what happened to any text that was leftover from the phase 1 ar.wiki?
steve vertigo wrote:
There is also the issue with Arabic that Unicode might not be able to keep up with its morphology... I'll report more later, but apparently Arabic, as rendered by Unicode, is not quite up to grade in terms of the rules for morphing letters depending on their placement
Are you sure the problem is with Unicode itself, and not with your browser or operating system? Unicode doesn't "render" anything; it only specifies which codepoint corresponds to which letter. I don't know any Arabic, but if I type random Arabic letters in Windows 2000 Notepad, they are indeed rendered as different shapes depending on whether each letter is isolated, initial, medial, or terminal. It is not Unicode itself that does this; it should be done by your operating system.
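To make the codepoint-versus-shape distinction concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The letter BEH (U+0628) and the variable names are chosen purely for illustration and are not from the thread. It shows that a string holding the same letter three times stores one identical codepoint per position; which shape appears on screen is left to whatever draws the text.

```python
import unicodedata

# ARABIC LETTER BEH, picked only as an illustrative example letter.
BEH = "\u0628"

# The same letter three times: a renderer would typically draw it in
# initial, medial and final shapes, but the stored text is just three
# copies of one codepoint.
text = BEH * 3

codepoints = [f"U+{ord(ch):04X}" for ch in text]
names = {unicodedata.name(ch) for ch in text}

print(codepoints)
print(names)
```

Nothing in the stored string records the position-dependent shape, which is consistent with shaping being the job of the operating system or font layer.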
Timwi
Timwi wrote:
Are you sure the problem is with Unicode itself, and not with your browser or operating system? Unicode doesn't "render" anything, Unicode only specifies what codepoint corresponds to what letter. I don't know any Arabic, but if I type random Arabic letters in Windows 2000 Notepad, they are indeed rendered as different shapes depending on whether each letter is isolated, initial, medial, or terminal. It is not Unicode itself that does this, it should be done by your operating system.
Señor Pablo has already written a smart clarification on the issue: "Unicode is already widely used to write Arabic. The problems are mainly on the bidi side of things (when portions of latin script are mixed inside an arabic flow); I haven't heard of any problem regarding the shape of letters; and if needed, then ZWJ and ZWNJ can be used (Farsi language does, for example)."
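As a rough illustration of the two points in that quote, the following Python sketch (standard library only) looks up the joiner characters and the bidirectional classes that a bidi-aware renderer works from. The sample strings and the helper name `bidi_classes` are my own assumptions for the example, not anything from the thread; the Persian word is just one common case where ZWNJ appears.

```python
import unicodedata

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER: asks the renderer not to join neighbours
ZWJ = "\u200D"   # ZERO WIDTH JOINER: asks the renderer to join neighbours

def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]

# Illustrative Persian word with a ZWNJ between prefix and stem.
persian = "\u0645\u06CC" + ZWNJ + "\u062E\u0648\u0627\u0647\u0645"

# Latin letters followed by an Arabic letter: the mixed-direction case.
mixed = "ab \u0628"

print(unicodedata.name(ZWNJ), unicodedata.name(ZWJ))
print(bidi_classes(mixed))
```

The classes only describe each character; reordering mixed-direction text for display is still done by the rendering layer, which is where the bidi problems mentioned above would show up.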
There is a summary here by Markus Kuhn: http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html -S-
Excerpt: "Why are there no Indic or Syriac glyphs in the ucs-fonts package?"
"In European and East Asian scripts, each Unicode character can be represented by a single graphical shape ("glyph"). The X11 font system is entirely built around the idea that there is a one-to-one relationship between characters and glyphs, which works fine for Latin, Greek, Cyrillic, Hebrew, Han, Hiragana, Katakana, Hangul, etc. However, things are far more complicated for handwritten cursive scripts such as Arabic, Syriac and the various Indic scripts (Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, etc.). For these scripts, the sequence of values ("characters") encoded in a Unicode string (which usually corresponds to the sequence of keystrokes during entry and the sequence of phonemes when speaking) first has to be converted into a sequence of graphical symbols ("glyphs") as they are found in a font, before a string can be displayed. "
"...The Unicode standard does contain encoding ranges for a simple scheme of Arabic glyphs, the "Arabic Presentation Forms". This was possible, because for Arabic there is a reasonably good consensus among font designers on how many glyphs are actually necessary for proper rendering of Arabic text, even though some argue that for really high-quality typesetting the Unicode collection of Arabic presentation forms is not sufficient. For Indic scripts on the other hand, there seems no consensus among font designers, which glyphs are actually necessary as this can vary significantly across different font styles."
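For anyone who wants to see the "Arabic Presentation Forms" mentioned in the excerpt, here is a small Python sketch with the standard `unicodedata` module. The choice of BEH and the names `FORMS`, `describe` and `fold_to_base` are illustrative assumptions. It shows that each presentation form is a separate codepoint carrying a compatibility decomposition back to the ordinary letter, and that NFKC normalization folds them all back to it.

```python
import unicodedata

BASE = "\u0628"  # ARABIC LETTER BEH, the ordinary letter as normally stored

# The four presentation-form codepoints for BEH (illustrative choice of letter).
FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}

def describe(ch):
    """Return the character name and its compatibility decomposition."""
    return unicodedata.name(ch), unicodedata.decomposition(ch)

def fold_to_base(text):
    """Map presentation forms back to the base letters via NFKC."""
    return unicodedata.normalize("NFKC", text)

for position, ch in FORMS.items():
    print(position, describe(ch))

print(fold_to_base("".join(FORMS.values())) == BASE * 4)
```

Text is normally stored using the base letters, with the shaped glyphs picked at display time; the presentation forms are the simple glyph scheme the excerpt refers to.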
wikitech-l@lists.wikimedia.org