Hi,
Thank you for reading my post. I have a very "general" question to ask here... I hope this is not the wrong place to post it...
In Wikipedia's "http://en.wikipedia.org/robots.txt" file, one can read:
------------------------------------------------------------------------------------------------------
# zh:
# https://bugzilla.wikimedia.org/show_bug.cgi?id=5104
Disallow: /wiki/Wikipedia:åˆ é™¤æŠ•ç¥¨/ä¾µæƒ
Disallow: /wiki/Wikipedia:%E5%88%A0%E9%99%A4%E6%8A%95%E7%A5%A8/%E4%BE%B5%E6%9D%83
------------------------------------------------------------------------------------------------------
Can you explain to me why one can find:
- strange characters like å
- and codes like %E5
at the same time?
Question 1: Is this due to some kind of "negligence"?
Question 2: What are these codes %E5, %99, %A4...? Could you tell me where to find a good table that gives the corresponding character for a given code?
Thanks in advance for your precious help! :-) (And sorry for being so ignorant about that stuff). Sincerely, -- Lmhelp
In Wikipedia's "http://en.wikipedia.org/robots.txt" file, one can read:
# zh:
# https://bugzilla.wikimedia.org/show_bug.cgi?id=5104
Disallow: /wiki/Wikipedia:å é¤æ票/ä¾µæ
Disallow: /wiki/Wikipedia:%E5%88%A0%E9%99%A4%E6%8A%95%E7%A5%A8/%E4%BE%B5%E6%9D%83
As the abbreviation "zh" indicates, this is Chinese text, and the file is encoded in UTF-8. If you view the file in your browser, you have to make sure the browser uses the correct encoding. Even then, you will only see the Chinese characters if a suitable font is installed on your PC.
hth Frank
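To illustrate the point about encodings, here is a minimal Python sketch of how the "strange characters" can arise. It assumes the viewer fell back to a single-byte Western encoding (Latin-1 is used here as a stand-in; the exact fallback a given browser picks is not known from this thread). The variable names are only for illustration.

```python
# The page title as it is meant to be read.
title = "删除投票/侵权"

# robots.txt stores the title as UTF-8 bytes: 3 bytes per Chinese character.
utf8_bytes = title.encode("utf-8")

# Correct interpretation: decode the bytes as UTF-8.
correct = utf8_bytes.decode("utf-8")

# Wrong guess (assumed here): Latin-1 turns every single byte into one character,
# so each Chinese character becomes three unrelated Western characters.
garbled = utf8_bytes.decode("latin-1")

print(len(title), len(utf8_bytes), len(garbled))
print(repr(garbled[0]))  # the first byte, 0xE5, shows up as 'å'

# No information is lost: re-encoding the garbled text recovers the original.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == title)
```

The first byte of the first character is 0xE5, which Latin-1 displays as å. That is the same E5 that appears as %E5 in the second Disallow line.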
On Tue, Mar 23, 2010 at 6:59 AM, lmhelp lmbox@wanadoo.fr wrote:
-- View this message in context: http://old.nabble.com/Web-page-source---%22strange%22-characters-tp27999218p... Sent from the WikiMedia General mailing list archive at Nabble.com.
MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
It's called URL or percent encoding.
-Chad
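As a small sketch of what percent-encoding means in practice, the following Python snippet decodes the second Disallow line with the standard library's urllib.parse. It also suggests why there is no simple per-code table for Question 2: a code such as %E5 is a single byte, and in UTF-8 each of these Chinese characters is made of three bytes. The variable names are illustrative only.

```python
from urllib.parse import quote, unquote

encoded = "/wiki/Wikipedia:%E5%88%A0%E9%99%A4%E6%8A%95%E7%A5%A8/%E4%BE%B5%E6%9D%83"

# unquote() turns each %XX into one byte, then decodes the bytes as UTF-8 (its default).
decoded = unquote(encoded)
print(decoded)

# %E5 alone is only one byte; the three bytes E5 88 A0 together form one character.
first_char = bytes.fromhex("E588A0").decode("utf-8")
print(first_char)

# Going the other way: "/" and ":" are kept literal through the safe parameter.
reencoded = quote(decoded, safe="/:")
print(reencoded == encoded)
```

So both Disallow lines describe the same path: one written as raw UTF-8 bytes, the other as the percent-encoded form of those same bytes.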
In Wikipedia's "http://en.wikipedia.org/robots.txt" file, one can read:
# zh:
# https://bugzilla.wikimedia.org/show_bug.cgi?id=5104
Disallow: /wiki/Wikipedia:å é¤æ票/ä¾µæ
Should look:
# zh:
# https://bugzilla.wikimedia.org/show_bug.cgi?id=5104
Disallow: /wiki/Wikipedia:????/??
nakohdo wrote:
Should look:
# zh:
# https://bugzilla.wikimedia.org/show_bug.cgi?id=5104
Disallow: /wiki/Wikipedia:????/??
Oops, got mangled by my email program. Another try...
# zh:
# https://bugzilla.wikimedia.org/show_bug.cgi?id=5104
Disallow: /wiki/Wikipedia:删除投票/侵权
Hi!
Thank you for your great answers! It helps a lot!
Just two more questions :) :
If you view the file in your browser you have to make sure it uses the correct encoding.
Where can it be checked?
What did you do, "nakohdo", to display the Chinese characters instead of the å é¤æ票/ä¾µæ sequence of characters? (Because I can really see them on my screen with the same web browser...)
Thanks again! Best regards :) , -- Lmhelp
If you view the file in your browser you have to make sure it uses the correct encoding.
Where can it be checked?
Look in the "View" menu of your browser for something like encoding settings (the exact location differs a little between browsers).
What did you do, "nakohdo", to display the Chinese characters instead of the å? é?¤æ??票/ä¾µæ? sequence of characters? (Because I can really see them on my screen with the same web browser...)
The browser can normally recognise the encoding of an HTML or XML file and display it correctly. The robots.txt file you mentioned in your first posting doesn't provide a mechanism for declaring its encoding, so the browser has to guess or fall back on its default settings.
Try opening the robots.txt in your browser and changing the encoding to UTF-8. You could also try downloading the file (right click, "Save target as...") and opening it with a Unicode-capable text editor, e.g. Windows' own Notepad.
See http://en.wikipedia.org/wiki/Character_encoding for further information.
hth Frank
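The "guess or fall back on a default" behaviour can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how any particular browser actually detects encodings: the hypothetical helper decode_guess tries strict UTF-8 first and falls back to Latin-1 if the bytes are not valid UTF-8.

```python
from typing import Tuple


def decode_guess(raw: bytes) -> Tuple[str, str]:
    """Return (text, encoding used). Hypothetical helper for illustration."""
    try:
        # Strict UTF-8 decoding fails loudly on byte sequences that are not valid UTF-8.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 accepts any byte sequence, so it is a fallback that never fails.
        return raw.decode("latin-1"), "latin-1"


# Bytes as they would be stored in a UTF-8 file without any encoding declaration.
line = "Disallow: /wiki/Wikipedia:删除投票/侵权".encode("utf-8")
text, used = decode_guess(line)
print(used, text)

# A byte string that is not valid UTF-8 triggers the fallback.
text2, used2 = decode_guess(b"caf\xe9")
print(used2, text2)
```

For a downloaded copy of the file, passing the encoding explicitly, for example open(path, encoding="utf-8"), avoids the guessing altogether.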