Hello all,
on the Italian Wikipedia it has recently been discussed to make a bot that is used for some maintenance work additionally convert characters like letters with accents to HTML-entities (&egrave; etc.). Does this make any sense? And if so, why was it.wikipedia converted to UTF-8 at all? And finally, if this really makes sense, shouldn't this be handled by the software instead of cluttering the edit window with &foobar;s that most people don't even understand?
It has been argued that those characters can't be entered directly e.g. with an American keyboard layout, but in my opinion this is at best a reason for converting the entities to the corresponding characters, not the other way round.
Sorry for bringing this here, but the thought of soon having thousands of pages interspersed with cryptic entities was rather shocking to me ;-) and thus I hope to find some expert answers as soon as possible. Just imagine what the German and French Wikipedias, just to name those, would look like with lots of &auml; &szlig; &ccedil; and the like in the middle of words.
If this has already been discussed, could someone please point me to the relevant pages/threads. Thank you,
[[en:User:Leonard Vertighel]] [[it:Utente:Leonard Vertighel]]
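To make the concern concrete, here is a minimal sketch of what such a conversion would do to the text in the edit window. The function name `to_entities` is hypothetical and this is not the bot's actual code; it only assumes that every non-ASCII character is replaced by a named entity where Python's standard table has one, and by a numeric reference otherwise.

```python
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity (illustration only)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                         # plain ASCII stays readable
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity, e.g. &egrave;
        else:
            out.append(f"&#{cp};")                 # numeric fallback
    return "".join(out)


print(to_entities("Perché è così?"))
print(to_entities("Die Straße ist schön."))
```

Even short sentences become hard to read in the edit box once every accented letter is spelled out this way.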
On Sunday, 18 July 2004 13:56, Leonard Vertighel wrote:
on the Italian Wikipedia it has recently been discussed to make a bot that is used for some maintenance work additionally convert characters like letters with accents to HTML-entities (&egrave; etc.). Does this make any sense? And if so, why was it.wikipedia converted to UTF-8 at all?
An interwiki bot with a French AOL IP? The same thing happened on the Plattdüütsch and French Wikipedias... Always just one edit per IP, so there is no chance of catching him :(
-- Kai F. Lahmann
1zu160-Bahner http://www.1zu160.net
On Sun, 18 Jul 2004 13:56:18 +0200, Leonard Vertighel leonard.vertighel@web.de wrote:
on the Italian Wikipedia it has recently been discussed to make a bot that is used for some maintenance work additionally convert characters like letters with accents to HTML-entities (&egrave; etc.). It has been argued that those characters can't be entered directly e.g. with an American keyboard layout...
It does seem the wrong way round to me. The edit box should be easy to read, and HTML entities are not going to help that. If people have a problem typing certain letters, it might be helpful to put a copy of those into the [[it:MediaWiki:Copyrightwarning]] page. This is already done on the Maori and German Wikipedias. See http://mi.wikipedia.org/wiki/MediaWiki:Copyrightwarning for example. This means the letters are shown at the end of every edit page and can be copied and pasted by those who can't type them on their keyboard.
Angela.
Angela_ wrote:
It does seem the wrong way round to me. The edit box should be easy to read, and HTML entities are not going to help that. If people have a problem typing certain letters, it might be helpful to put a copy of those into the [[it:MediaWiki:Copyrightwarning]] page. This is already done on the Maori and German Wikipedias. See http://mi.wikipedia.org/wiki/MediaWiki:Copyrightwarning for example. This means the letters are shown at the end of every edit page and can be copied and pasted by those who can't type them on their keyboard.
Angela.
The same idea for special characters is used in the Spanish Wikipedia, but we have improved it with some JavaScript functions borrowed from the edit toolbar. With these functions the character is directly inserted in the editing box. Take a look at:
http://es.wikipedia.org/wiki/MediaWiki:Copyrightwarning
and see it in action while editing any page on es.wikipedia.
Please consider using it on any Wikipedia with special character needs. It has been greatly appreciated on es.
Manuel G. R.
On Sunday, 18 July 2004 18:14, Manuel Gomez Rojo wrote:
but we have improved it with some JavaScript functions borrowed from the edit toolbar. With these functions the character is directly inserted in the editing box. Take a look at:
Looks neat, with Firefox it's really great. (How come with Opera there's always this extra line between toolbar and edit box?)
Regarding my original question: So there is no technical reason for converting special characters to HTML entities in UTF-8 Wikipedias? Actually I hoped so, because that makes editing really awful for some languages.
[[en:User:Leonard Vertighel]] [[it:Utente:Leonard Vertighel]]
Leonard Vertighel wrote:
On Sunday, 18 July 2004 18:14, Manuel Gomez Rojo wrote:
but we have improved it with some JavaScript functions borrowed from the edit toolbar. With these functions the character is directly inserted in the editing box. Take a look at:
Looks neat, with Firefox it's really great. (How come with Opera there's always this extra line between toolbar and edit box?)
Regarding my original question: So there is no technical reason for converting special characters to HTML entities in UTF-8 Wikipedias? Actually I hoped so, because that makes editing really awful for some languages.
[[en:User:Leonard Vertighel]] [[it:Utente:Leonard Vertighel]]
I think the only relevant reason to switch to UTF-8 is not showing &#xxxx; codes for international scripts, so you are totally right.
On Sun, 18 Jul 2004 19:44:04 +0200, Manuel Gomez Rojo mgrojo@ya.com wrote:
I think the only relevant reason to switch to UTF-8 is not showing &#xxxx; codes for international scripts, so you are totally right.
Well, in addition to that, it also lets you use the UTF-8 character in the article title, and probably enables full-text search as well.
On Sunday, 18 July 2004 19:46, Fennec Foxen wrote:
Well, in addition to that, it also lets you use the UTF-8 character in the article title, and probably enables full-text search as well.
This hint just led me to the discovery that a fulltext search for "è" does not find "&egrave;". One more reason _not_ to use entities, IMHO. (Who would enter "&egrave;" as a search term? Obviously, the problem of typing in the special characters remains, but entities won't solve it.)
[[en:User:Leonard Vertighel]] [[it:Utente:Leonard Vertighel]]
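The search observation can be reproduced with a plain substring search. This is only a sketch: `naive_search` is a hypothetical helper and says nothing about how MediaWiki's full-text search is actually implemented. It merely shows why text stored with entities does not match a query typed with the real character, and that decoding the entities first makes the match succeed.

```python
import html


def naive_search(query: str, text: str) -> bool:
    """Case-sensitive substring search, standing in for a real search engine."""
    return query in text


stored = "Il caff&egrave; &egrave; pronto."  # wikitext saved with entities
query = "caffè"                              # what a reader would type

print(naive_search(query, stored))                 # False
print(naive_search(query, html.unescape(stored)))  # True
```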
On Sunday 18 July 2004 20:30, Leonard Vertighel wrote:
On Sunday, 18 July 2004 19:46, Fennec Foxen wrote:
Well, in addition to that, it also lets you use the UTF-8 character in the article title, and probably enables full-text search as well.
This hint just led me to the discovery that a fulltext search for "è" does not find "&egrave;". One more reason _not_ to use entities, IMHO. (Who would enter "&egrave;" as a search term? Obviously, the problem of typing in the special characters remains, but entities won't solve it.)
About the only reason for using entities where there is UTF-8 is for characters which are otherwise not visible (&nbsp;).
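A conversion in the direction favoured in this thread could respect that exception. The following is a sketch under stated assumptions, not an existing bot: the name `decode_visible` and the set of characters to keep are hypothetical. Besides the non-breaking space, it also leaves `&amp;`, `&lt;` and `&gt;` alone, on the assumption that decoding those would change the markup itself.

```python
import html
import re
from html.entities import html5

# Assumption: characters that should stay written as entities.
KEEP_CHARS = {"\u00a0", "&", "<", ">"}

# Named, decimal and hexadecimal references, all terminated by a semicolon.
ENTITY = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def decode_visible(text: str) -> str:
    """Turn entities into real characters, except those listed in KEEP_CHARS."""
    def repl(match: re.Match) -> str:
        body = match.group(1)
        if body.startswith("#"):
            decoded = html.unescape(match.group(0))  # numeric reference
        else:
            decoded = html5.get(body + ";")          # named entity lookup
            if decoded is None:
                return match.group(0)                # unknown name, e.g. &foobar;
        return match.group(0) if decoded in KEEP_CHARS else decoded

    return ENTITY.sub(repl, text)


print(decode_visible("caff&egrave;&nbsp;&amp; t&#232;"))
```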
Angela_ wrote:
On Sun, 18 Jul 2004 13:56:18 +0200, Leonard Vertighel leonard.vertighel@web.de wrote:
on the Italian Wikipedia it has recently been discussed to make a bot that is used for some maintenance work additionally convert characters like letters with accents to HTML-entities (&egrave; etc.). It has been argued that those characters can't be entered directly e.g. with an American keyboard layout...
It does seem the wrong way round to me. The edit box should be easy to read, and HTML entities are not going to help that. If people have a problem typing certain letters, it might be helpful to put a copy of those into the [[it:MediaWiki:Copyrightwarning]] page. This is already done on the Maori and German Wikipedias. See http://mi.wikipedia.org/wiki/MediaWiki:Copyrightwarning for example. This means the letters are shown at the end of every edit page and can be copied and pasted by those who can't type them on their keyboard.
I agree that Leonard's approach is the wrong way round. My inclination when I see that kind of HTML entity is to replace it with the character. The fact is that these *can* be entered from the keyboard for the languages of Western Europe, including Italian. I write regularly in French and instead of the &egrave; I simply use Alt + 0232 for è.
Ec
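The number in that key combination is not arbitrary. Assuming a Western European Windows setup, where Alt codes with a leading zero are looked up in the Windows-1252 code page, 232 is the byte value of è, which is also its Unicode code point. A short check:

```python
import html

code = 232                            # the digits typed after Alt + 0
ch = bytes([code]).decode("cp1252")   # look the byte up in Windows-1252

print(ch)                   # è
print(f"U+{ord(ch):04X}")   # U+00E8
print(ch.encode("utf-8"))   # b'\xc3\xa8'

# The named entity decodes to exactly the same character.
print(html.unescape("&egrave;") == ch)  # True
```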
On Sunday, 18 July 2004 22:12, Ray Saintonge wrote:
Angela_ wrote:
On Sun, 18 Jul 2004 13:56:18 +0200, Leonard Vertighel leonard.vertighel@web.de wrote:
on the Italian Wikipedia it has recently been discussed ...
It does seem the wrong way round to me. ...
I agree that Leonard's approach is the wrong way round. ...
Hey, this wasn't MY approach ;-) I agreed with Angela's opinion from the beginning. I just wanted to make sure there wasn't some important technical issue I was missing. But fortunately the replies I got so far are all in favour of converting entities to characters, if anything. I hope to convince my fellow Italian Wikipedians that this is indeed the better solution.
[[en:User:Leonard Vertighel]] [[it:Utente:Leonard Vertighel]]
Ray Saintonge wrote:
I write regularly in French and instead of the &egrave; I simply use Alt + 0232 for è.
Are you sure you write it regularly enough that you find that a usable solution? Wouldn't using a suitable keyboard layout be way easier?
Leonard Vertighel wrote:
on the Italian Wikipedia it has recently been discussed to make a bot that is used for some maintenance work additionally convert characters like letters with accents to HTML-entities (&egrave; etc.). Does this make any sense? And if so, why was it.wikipedia converted to UTF-8 at all? And finally, if this really makes sense, shouldn't this be handled by the software instead of cluttering the edit window with &foobar;s that most people don't even understand?
I'm not sure I can follow your train of thought, but:
1) whether or not you use such a bot is an issue relevant to your local Wikipedia, so there should be a vote on it there, and;
2) whether or not you use such a bot is independent of whether or not the Italian Wikipedia uses UTF-8.
The Italian Wikipedia, just like all other smaller Wikipedias, was converted to UTF-8 to allow you to enter more characters than just the dreadfully limited Latin-1. With UTF-8, users can enter Cyrillic, Arabic, Chinese/Japanese, Hebrew, etc. Switching from Latin-1 to UTF-8 *only* introduces extra possibilities, and imposes no limitation, so it's not even a trade-off.
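The difference in repertoire is easy to check. In this sketch the sample words and the helper name `encodable` are only illustrative; it tests which strings each encoding can represent, and then confirms that every one of the 256 Latin-1 characters survives a round trip through UTF-8.

```python
samples = {
    "Italian": "perché",
    "Russian": "Википедия",
    "Japanese": "ウィキペディア",
}


def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for language, word in samples.items():
    print(language, encodable(word, "latin-1"), encodable(word, "utf-8"))

# Every Latin-1 character can also be written in UTF-8.
all_latin1 = bytes(range(256)).decode("latin-1")
print(all_latin1.encode("utf-8").decode("utf-8") == all_latin1)
```

Only the Italian sample fits into Latin-1, while all three fit into UTF-8.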
It has been argued that those characters can't be entered directly e.g. with an American keyboard layout, but in my opinion this is at best a reason for converting the entities to the corresponding characters, not the other way round.
Firstly, you are right. Secondly, the argument isn't even valid. Just because some people can't enter an è because they use such a limited keyboard layout (which is their own fault), doesn't mean all instances of è need to be changed into &egrave; ... nor does it mean anything else of the kind, really.
(Personally, I wouldn't even mind a wiki syntax for entering special characters, but one that is replaced with the real character upon save, and if we do that, we might as well replace &egrave; etc. with è etc.)
(Incidentally, I've just thought of a way for people to enter the è properly if they know that its HTML code is &egrave;. Just use the preview and then copy & paste it back into the edit box. :) )
If this has already been discussed, could someone please point me to the relevant pages/threads.
I'm not aware of any past instance of this crackpot theory. ;-)
Timwi
wikitech-l@lists.wikimedia.org