Hi, we have a strange behaviour on the nap wikipedia. Strangely enough, the categories don't work well: the first one works, the others do not ... hmmm
http://nap.wikipedia.org/wiki/Acerno
The categories are inserted like this:
[[category:giugrafia]] [[category:Comune d<nowiki>''</nowiki>a Campania]] [[category:Comune d<nowiki>''</nowiki>a pruvincia 'e Salierno]]
Now we tried the following:
changed "category" to Categoria, Category, categoria; took out the nowiki tag.
The only thing that works is to take out the '' together with the nowiki tag - so we get a category, but the resulting spelling is then plain wrong. It worked well up to the last version of the MediaWiki software ... so something must have been changed there.
One thing I just tried out: also wiki-links do not work anymore - this means we cannot create correct pages anymore since <nowiki> does not seem to be allowed within a wiki link. This means problems with nap.wikipedia, but not only: also wiktionary will have (maybe already has) problems with this (we have approx. 3000 Neapolitan words (if I am right) in the Italian wiktionary - some of these probably use the '')
Could someone please look into this?
Thank you!
Best, Sabine Chiacchiera con i tuoi amici in tempo reale! http://it.yahoo.com/mail_it/foot/*http://it.messenger.yahoo.com
Sabine Cretella wrote:
[[category:Comune d<nowiki>''</nowiki>a Campania]]
Get rid of the <nowiki>.
The only thing that works is take out the '' + the nowiki tag - so we get a category, but the resulting spelling is then plain wrong. It worked well up to the last version of the Mediawiki software ... so something must have been changed there.
Yes, the bug was fixed that allowed <nowiki> in there.
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
Sabine Cretella wrote:
[[category:Comune d<nowiki>''</nowiki>a Campania]]
Get rid of the <nowiki>.
The only thing that works is take out the '' + the nowiki tag - so we get a category, but the resulting spelling is then plain wrong. It worked well up to the last version of the Mediawiki software ... so something must have been changed there.
Yes, the bug was fixed that allowed <nowiki> in there.
Hmmm ... so that was a bug we needed on nap.wikipedia (it was not a bug for us, but a feature we need) ... if I take it out, the apostrophes don't show up anymore - the category/wiki link is shown half in italics and it is no category/wiki link anymore - if I take out the apostrophes, the text is plain wrong ...
We really do need '' as normal chars and not to initiate text written in italics ... we must be able to include this in a wiki link in some way and also in a category link ... is there a way to get a different tag for that - something like <apostrophs> </apostrophs> or whatever?
Best, Sabine
You can use '&#39; instead of ''.
Rotem Liss
Rotem Liss mail@rotemliss.com http://www.rotemliss.com
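The entity workaround above could be applied mechanically. What follows is only a minimal sketch in Python, assuming that replacing the second apostrophe of each pair inside a [[...]] link with the HTML numeric reference &#39; is enough; the helper name escape_double_apostrophes is hypothetical and is not part of MediaWiki or of any bot framework.

```python
import re

# Hypothetical helper for illustration only. It rewrites the text inside
# [[...]] links so that a pair of apostrophes is no longer read as the
# italics marker. Text outside links is left untouched.
LINK = re.compile(r"\[\[(.*?)\]\]")


def escape_double_apostrophes(wikitext: str) -> str:
    def fix(match: re.Match) -> str:
        # Keep the first apostrophe, turn the second into the entity &#39;
        inner = match.group(1).replace("''", "'&#39;")
        return "[[" + inner + "]]"

    return LINK.sub(fix, wikitext)


print(escape_double_apostrophes("[[category:Comune d''a Campania]]"))
```

Italic markup outside of links is deliberately not touched, so ordinary ''italic'' text keeps working.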
Rotem Liss wrote:
You can use '&#39; instead of ''.
Rotem Liss
Thank you Rotem Liss, this should solve our problem - at least I hope so.
Best, Sabine
Sabine Cretella wrote:
We really do need '' as normal chars and not to initiate text written in italics ... we must be able to include this in a wiki link in some way and also in a category link ... is there a way to get a different tag for that - something like <apostrophs> </apostrophs> or whatever?
Since '' can't be round-tripped reliably in wikitext, it's most likely that we'll have to make '' explicitly forbidden in wiki page titles in the course of fixing parser bugs.
I strongly recommend you find another way of representing page titles that doesn't require the use of markup.
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
Sabine Cretella wrote:
We really do need '' as normal chars and not to initiate text written in italics ... we must be able to include this in a wiki link in some way and also in a category link ... is there a way to get a different tag for that - something like <apostrophs> </apostrophs> or whatever?
Since '' can't be round-tripped reliably in wikitext, it's most likely that we'll have to make '' explicitly forbidden in wiki page titles in the course of fixing parser bugs.
I strongly recommend you find another way of representing page titles that doesn't require the use of markup.
You mean a language should change its orthography to comply with MediaWiki software? That sounds really strange ... and what if instead of Neapolitan we were talking about English, German or any other of the "big" languages? Must these then also comply with the software, or would that be the moment when a way is sought to let these languages write their titles correctly? Hmmm ...
So what would you do if the English language had this requirement? Would you also say the whole world then must misspell because the software does not support it?
Ciao, Sabine
Hoi, I do not understand the need for markup in titles. I certainly believe that when markup prevents correct orthography in titles in any language, it means that markup cannot be handled in titles and consequently the markup in titles should not be allowed.
As MediaWiki is about supporting all languages, markup in titles is clearly at best a nice to have and certainly not a must have. Representing orthographies is indeed a MUST have feature.
PS I cross-post to the Wiktionary list. As they have as their aim to include all words in all languages, it affects ALL wiktionary projects.
Thanks, GerardM
_______________________________________________ Wikitech-l mailing list Wikitech-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l
On 5/27/06, GerardM gerard.meijssen@gmail.com wrote:
I do not understand the need for markup in titles. I certainly believe that when markup prevents correct orthography in titles in any language, it means that markup cannot be handled in titles and consequently the markup in titles should not be allowed.
Understand that here you are advocating the requirement that '' and ''' be forbidden from titles (since they are markup). That's the easy answer, and I would certainly agree with that under ideal circumstances.
I don't expect you guys will let me do that, though.
As MediaWiki is about supporting all languages, markup in titles is clearly at best a nice to have and certainly not a must have. Representing orthographies is indeed a MUST have feature.
Here I'm guessing you mean that we should have a workaround, where '' and ''' can be used in titles, but (hopefully, maybe) they will be (somehow?) prevented from being interpreted as markup in things that are titles (consistently, we hope).
That's tricky to do right, as it involves some combination of guessing what people intended (very unreliable and inconsistent, probably) or forcing people to follow obscure escaping conventions (very unreliable to get people to follow them, especially newbies).
-- brion vibber (brion @ pobox.com)
Hoi, I would not advocate a requirement against having '' in titles. I would advocate changing how markup in titles is interpreted, if this is what it takes to prevent correctly spelled titles from being presented in a decidedly wrong way.
Replacing the double apostrophe with one obscure character would be unacceptable from a usability point of view. The Dutch language has the ij character. This is typically written as both an i and a j. The consequence is that when a word like ijsselmeer is at the start of a sentence, it is always spelled incorrectly as Ijsselmeer. All text processors do this incorrectly, even when Dutch is indicated as the language.
What I understand of the usage of the double apostrophe is that it is always used in the middle of a word. This means that this could be fixed by not interpreting "markup" when it is in the middle of a word.
So to repeat, the '' is a valid and necessary structure in titles in the Neapolitan language and it is necessary to allow for its inclusion in titles and text. The <nowiki> strategy is a "make work" strategy that used to work. It was essential to allow for correct orthography; the inclusion of the new functionality broke the Neapolitan wikipedia and consequently needs to be amended or removed.
Thanks, GerardM
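To make the "middle of a word" idea concrete, here is a rough Python sketch. It assumes that "middle of a word" means exactly two apostrophes with a word character directly on both sides; that definition, the MID_WORD pattern and the function name protect_mid_word are illustrative assumptions, not how the MediaWiki parser works.

```python
import re

# Assumption: a pair of apostrophes counts as "mid-word" only when a word
# character stands directly before and after it. A run of three or more
# apostrophes never matches, because an apostrophe is not a word character.
MID_WORD = re.compile(r"(?<=\w)''(?=\w)")


def protect_mid_word(text: str) -> str:
    """Replace mid-word '' with an escaped form so it is not read as italics."""
    return MID_WORD.sub("'&#39;", text)


print(protect_mid_word("Comune d''a Campania"))   # pair inside a word
print(protect_mid_word("''italic'' text"))        # ordinary italics
print(protect_mid_word("half''italic''word"))     # italics starting mid-word
```

The third call shows the cost of such a rule: existing text that starts or ends italics in the middle of a word would be treated as literal apostrophes as well.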
On 5/28/06, GerardM gerard.meijssen@gmail.com wrote:
What I understand of the usage of the double apostrophe is, that it is always used in the middle of a word. This means that this could be fixed by not interpreting "markup" when it is in the middle of a word.
Do you mean in general, or only for page titles? Changing that rule in general would surely break heaps of articles across all the Wikipedias. I imagine various products would have italics for half a word and so forth. But perhaps it could be enabled on a per-Wiki basis?
Other possibilities might be inclusion of a nothing-character (some'<nothing/>'word, or even some'<!---->'word) or a flag whereby the page title maps some other character/sequence of characters (such as the double quote) onto the taboo sequence (two apostrophes). This latter suggestion would be like what we were discussing a month or two ago about fixing lower case characters at the start of titles.
In fact it could be generalised:
{{Titlemap|"|''}} (titlemap|double quote|two apostrophes) could, for example, request that that substitution take place in the title. All pages linking to this one would still have to use double quotes rather than two apostrophes, however. It's a kludge though.
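As a rough illustration of what such a substitution would amount to, here is a small Python sketch. The Titlemap flag is only the proposal above and does not exist in MediaWiki; the function name apply_titlemap and its parameters are hypothetical.

```python
def apply_titlemap(stored_title: str, source: str, target: str) -> str:
    """Return the title to display, with every `source` replaced by `target`.

    Hypothetical model of the proposed {{Titlemap|source|target}} flag:
    the page is stored and linked under `stored_title`, and only the
    displayed title is changed.
    """
    return stored_title.replace(source, target)


# The page would be stored with a double quote and shown with two apostrophes.
print(apply_titlemap('Comune d"a Campania', '"', "''"))
```

The sketch also makes the limitation visible: the mapping only runs from the stored title to the displayed one, so links still have to use the stored form.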
Failing that, I can't think of much. I'm actually surprised that there isn't a way of easily producing the '' sequence other than <nowiki>. I suppose it's a side-effect of the less-than-optimal choice of '' for italics and ''' for bold - if two different characters had been chosen (say __ for bold), then you could produce a '' with '''' and you could produce a __ with ____. Then, bold italics would be either __''bolditalics''__ or ''__italicsbold__'' . But anyway.
Steve
Hoi, I definitely mean changing the rule not only for titles but also for text. The suggestion of allowing for a per-wiki basis would currently break where Interwikis link to a project like the Neapolitan.
This could however be language dependent behaviour; the Multilingual MediaWiki might be able to allow for this kind of specific behaviour. It would however necessitate the indication of an embedded text being in a specific language. This is in essence a great idea because we do have embedded text in other languages.
Thanks, GerardM
On 5/28/06, GerardM gerard.meijssen@gmail.com wrote:
Hoi, I definetly mean not only but also changing the rule for both titles and text. The suggestion of allowing for a per-wiki basis would currently break where Interwikis link to a project like the Neapolitan.
Depends on how it's implemented. If the rule was that links to " (double quotes) were treated as links to '' (two apostrophes), then nothing would be broken - other wikipedias should simply link to the version with two apostrophes. In practice, it's probably fairly easy to set up redirects for all articles with two apostrophes, much as on EN wikipedia, it's pretty common to set up redirects with and without accents like circumflexes for all foreign words.
Steve
Hoi, You have to allow for the way the pywikipedia bot works. There is a whole infrastructure around MediaWiki, and interpreting these things differently will create breakage. Depending on what software you use, double quotes and two apostrophes are displayed in a different way. It is therefore important to keep things as they are in that language. It is imho much better to have language-dependent behaviour.
Thanks, GerardM
Hi,
I didn't understand exactly what you're proposing. Can you clarify?
On 5/28/06, GerardM gerard.meijssen@gmail.com wrote:
Hoi, You have to allow for the way the pywikipedia bot works. There is a whole infrastructure around MediaWiki. and interpreting these things differently depending will create breakage.
Interpreting which things differently? What is the pywikipedia bot?
Depending on what software you use two double quotes and two two apostrophes are displayed in a different way. It is therefore
Yes, I'm not necessarily suggesting ever displaying a double quote as if it was two apostrophes.
important to keep things as they are in that language. It is imho much better to have language dependent behaviour.
What is "language-dependent behaviour" in this context exactly?
Steve
Steve Bennett wrote:
Hi,
I didn't understand exactly what you're proposing. Can you clarify?
On 5/28/06, GerardM gerard.meijssen@gmail.com wrote:
Hoi, You have to allow for the way the pywikipedia bot works. There is a whole infrastructure around MediaWiki. and interpreting these things differently depending will create breakage.
Interpreting which things differently? What is the piwikipedia bot?
The Pywikipedia bot is the (python) software that is used by many if not most of the bot operators on MediaWiki projects. When the MediaWiki software takes a different tack, the pywikipedia software has to follow. What I am suggesting is that changing the behaviour of MediaWiki can cause breakage in the bot environment (it did in the past).
Depending on what software you use two double quotes and two two apostrophes are displayed in a different way. It is therefore
Yes, I'm not necessarily suggesting every displaying a double quote as if it was two apostrophes.
important to keep things as they are in that language. It is imho much better to have language dependent behaviour.
What is "language dependant behaviour" in this context exactly?
Steve
At this moment, a Wikimedia project is associated with a project code. This code has a passing resemblance to language codes. At this moment, a project is associated with certain language aspects; for instance Hebrew, Arabic et al. are written right to left where we typically expect text to run left to right. Gangleri, Nikerabbit and Rob Church have spent quite a lot of effort on fixing many issues in MediaWiki to deal with those. They are doing a stellar job making these necessary fixes to the MediaWiki usability for the right-to-left scripts. These are however primarily script-dependent fixes. There are other issues where the fix is necessary on the level of the language.
There are also issues where there is a need for language-dependent behaviour; there will be a large improvement with the Multilingual MediaWiki, where it will be possible to associate an article with a language. When we can associate the article with a language, the next step would be to mark a text within an article with a language. An example of how this can already be done can be found on the WiktionaryZ main page, where several of the language codes have been associated with language-dependent behaviour. Here you find for instance that the Assamese text is shown in a different way from what is standard.
Yesterday there was a post about people translating English Wikipedia content to Navaho. When we mark content that is decidedly NOT English, it will make the machine translation engine more effective. I would advocate marking text as non-English content once we have the Multilingual MediaWiki; at this moment, however, it may be too early to do this.
Thanks, GerardM
Steve Bennett wrote:
On 5/28/06, GerardM gerard.meijssen@gmail.com wrote:
Hoi, I definetly mean not only but also changing the rule for both titles and text. The suggestion of allowing for a per-wiki basis would currently break where Interwikis link to a project like the Neapolitan.
Depends on how it's implemented. If the rule was that links to " (double quotes) were treated as links to '' (two apostrophes),
That way lies madness.
-- brion vibber (brion @ pobox.com)
On 5/27/06, Sabine Cretella sabine_cretella@yahoo.it wrote:
So what would you do if the English language had this requirement?
If English had had this requirement, I don't think Ward would have chosen '' and ''' for his markup a decade ago. ;)
Seriously though, while we might be able to accommodate this it's going to be tricky to do it right. Expect frequent breakage.
-- brion vibber (brion @ pobox.com)
On Sun, May 28, 2006 at 08:19:32AM +0200, Sabine Cretella wrote:
I strongly recommend you find another way of representing page titles that doesn't require the use of markup.
You mean a language should change its ortography to comply with Mediawiki software? That sounds really strange ... and if instead of Neapolitan we talked about English, German or any other of the "big" languages? Must these then also comply to the software or do we then have the momentum when a way is being searched to have these languages write their titles correctly? Hmmm ...
I walked in a bit late... but are you suggesting that the proper orthographical representation of the language in question requires 2 successive apostrophes in the middle of a word?
Cheers, -- jra
Jay R. Ashworth wrote:
I walked in a bit late... but are you suggesting that the proper orthographical representation of the language in question requires 2 successive apostrophes in the middle of a word?
Yes, this is what she meant. You only need to look at the pages she linked to to see.
On 5/27/06, Sabine Cretella sabine_cretella@yahoo.it wrote:
We really do need '' as normal chars and not to initiate text written in italics ... we must be able to include this in a wiki link in some way and also in a category link ... is there a way to get a different tag for that - something like <apostrophs> </apostrophs> or whatever?
Just out of curiosity, what's the linguistic background to this? I'm presuming nap. is the Napoli dialect/language? And by browsing the site a little, it looks like single apostrophe (') occurs at both the start and ends of words (Chist' and 'a), as well as in the middle (p'o). It also seems that you can have an apostrophe-final word followed by an apostrophe-initial word (Pe' 'ssapé, ncopp' 'e). Does the double apostrophe ('') have a different meaning from that - ' '? If not, would a workaround simply be to add a space there (category:Comune d' 'a Campania)?
Note that all of my examples actually came out of the stub template! :)
Steve
Just out of curiosity, what's the linguistic background to this? I'm presuming nap. is the Napoli dialect/language? And by browsing the site a little, it looks like single apostrophe (') occurs at both the start and ends of words (Chist' and 'a), as well as in the middle (p'o). It also seems that you can have an apostrohpe-final word followed by an apostrophe-initial word (Pe' 'ssapé, ncopp' 'e). Does the double apostrophe ('') have a different meaning from that - ' '? If not, would a work around simply be to add a space there (category:Comune d' 'a Campania)?
Note that all of my examples actually came out of the stub template! :)
Sorry, I cannot answer extensively right now - I have more time in the afternoon.
A space in between would be wrong. We actually have a user who puts that space in between, which we then have to correct. The grammatical background is that the preposition and the article become "one word" and some letters are substituted by ' --> de + la --> d''a
This is similar to the Italian rule where di + la becomes della.
So it would be wrong. Inserting '&#39; seems to work - so we could live with that one - if people insert '' and then find italic text, normally they ask or have a look at other pages - so it creates some questions, but at least it would be a valid solution for us.
Best, Sabine
Might the Unicode versions of ' and " be of help?
I included them at http://nap.wikipedia.org/wiki/Utente:Platonides. Of course, they'd need to be included in the "Inserisci" section.
The table shows a quote, two quotes and the double quotes (both the ASCII and the Unicode ones). My browser doesn't show any difference (they're different at edit time, though).
Note: I made some babel templates, but I'm not sure if the language names are the ones used in Neapolitan; can you check?
On Sun, May 28, 2006 at 10:09:31AM +0200, Sabine Cretella wrote:
A space inbetween would be wrong. We actually have a user who puts that space inbetween where we than have to correct. The grammatical background is that the preposition and the article become "one word" and some letters are substituted by ' --> de + la --> d''a
To answer my own earlier (by a couple minutes -- I will read everything before writing anything) question, yes, there is a hard collision between the current MW wikitext markup and the Neapolitan orthography.
Yeah, that's gonna be hard. No question about it.
Cheers, -- jra
Last month, Sabine Cretella was having trouble with double apostrophes in page and category titles, since they were of course treated as wiki markup for italicizing. A workaround of using '&#39; was suggested and seems to work. Platonides suggested using Unicode single quotes (U+2019) instead of apostrophes, and I meant to suggest using Unicode apostrophes (U+02BC). Phil Boswell suggested using an invisible "breaker" or "joiner" between the two apostrophes, such as U+200D Zero-width Joiner.
And there's at least one more possibility: today I happened to notice that there's also a Unicode double apostrophe character, U+02EE. I'm not sure, but it might be tailor-made for Neapolitan.
On 6/16/06, Steve Summit scs@eskimo.com wrote:
Last month, Sabine Cretella was having trouble with double apostrophes in page and category titles, since they were of course treated as wiki markup for italicizing. A workaround of using '' was suggested and seems to work. Platonides suggested using Unicode single quotes (U+2019) instead of apostrophes, and I meant to suggest using Unicode apostrophes (U+02BC). Phil Boswell suggested using an invisible "breaker" or "joiner" between the two apostrophes, such as U+200D Zero-width Joiner.
Allow me to correct some terminology: U+2019 is not the "Unicode single quote"; it is "right single quotation mark". U+02BC is not the "Unicode apostrophe"; it is "modifier letter apostrophe".
A "modifier letter" is very specifically not a punctuation mark. It is a letter and is marked as such in the Unicode database. For instance if you double-click on a word, this character should always be treated as part of the word. It is an exotic letter used for example in the Hawaiian language. Unicode specifically says "U+2019 is the preferred character for a punctuation apostrophe".
This is a basic misunderstanding which I see repeatedly so it's obviously poorly understood, even by technical people.
And there's at least one more possibility: today I happened to notice that there's also a Unicode double apostrophe character, U+02EE. I'm not sure, but it might be tailor-made for Neopolitan.
Again, there are multiple Unicode double apostrophe characters, but this one is also not punctuation but a letter: "modifier letter double apostrophe".
Andrew Dunbar (hippietrail)
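The names and letter/punctuation status of the characters mentioned in this subthread can be checked against the Unicode database. Here is a minimal sketch using Python's standard unicodedata module; the CANDIDATES list and the describe helper are just illustrative names.

```python
import unicodedata

# The characters discussed in the thread: the ASCII apostrophe, the right
# single quotation mark, the two modifier letters and the zero width joiner.
CANDIDATES = ["\u0027", "\u2019", "\u02bc", "\u02ee", "\u200d"]


def describe(ch: str) -> tuple:
    """Return (code point, official name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in CANDIDATES:
    code, name, category = describe(ch)
    # isalpha() is True for the letter categories (Lu, Ll, Lt, Lm, Lo)
    print(code, name, category, "letter" if ch.isalpha() else "not a letter")
```

A general category starting with "L" marks a letter and one starting with "P" marks punctuation, which is the distinction drawn above between the modifier letters and U+2019.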
Andrew Dunbar wrote:
On 6/16/06, Steve Summit scs@eskimo.com wrote:
Platonides suggested using Unicode single quotes (U+2019) instead of apostrophes, and I meant to suggest using Unicode apostrophes (U+02BC).
Allow me to correct some terminology: U+2019 is not the "Unicode single quote", It is "right single quotation mark" U+02BC is not the "Unicode apostrophe". It is "modifier letter apostrophe".
Sure. I wasn't giving character names there (which is why I didn't capitalize them). The numeric codes made it clear which specific characters I was referring to.
A "modifier letter" is very specifically not a punctuation mark. It is a letter and is marked as such in the Unicode database. For instance if you double-click on a word, this character should always be treated as part of the word.
Aha. So what happens if you double-click on "doesn't"? :-)
And there's at least one more possibility: today I happened to notice that there's also a Unicode double apostrophe character, U+02EE. I'm not sure, but it might be tailor-made for Neapolitan.
Again, there are multiple Unicode double apostrophe characters, but this one is also not punctuation but a letter: "modifier letter double apostrophe".
Just so. And just as U+02BC Modifier Letter Apostrophe is appropriate for use in Hawaiian, there's some chance -- as I said, I'm not sure, but it's at least a possibility -- that U+02EE Modifier Letter Double Apostrophe is appropriate for use in Neapolitan.
On Sat, Jun 17, 2006 at 08:26:12AM -0600, Andrew Dunbar wrote:
And there's at least one more possibility: today I happened to notice that there's also a Unicode double apostrophe character, U+02EE. I'm not sure, but it might be tailor-made for Neapolitan.
Again, there are multiple Unicode double apostrophe characters, but this one is also not punctuation but a letter: "modifier letter double apostrophe".
Which, if I correctly followed all the earlier conversation, means that yes, it's exactly what the Neapolitan people want.
Cheers, -- jra
Jay R. Ashworth wrote:
On Sat, Jun 17, 2006 at 08:26:12AM -0600, Andrew Dunbar wrote:
And there's at least one more possibility: today I happened to notice that there's also a Unicode double apostrophe character, U+02EE. I'm not sure, but it might be tailor-made for Neapolitan.
Again, there are multiple Unicode double apostrophe characters, but this one is also not punctuation but a letter: "modifier letter double apostrophe".
Which, if I correctly followed all the earlier conversation, means that yes, it's exactly what the Neapolitan people want.
Like this you mean?
http://en.wikipedia.org/wiki/User:Phil_Boswell/Sandbox/Double_apostrophe
HTH HAND
Phil Boswell wrote:
Like this you mean? http://en.wikipedia.org/wiki/User:Phil_Boswell/Sandbox/Double_apostrophe
Exactly.
(If it was me, though, I'd use U+200C Zero Width Non-Joiner or maybe U+200B Zero Width Space. I believe the intent of U+200D Zero Width Joiner is that it tries to force ligatures, especially in scripts like Arabic that have lots of ligated characters, and fancy rules for when or when not to ligate them automatically. Clearly the double apostrophe isn't a ligature; it's just something you didn't want joined, in this case by the wiki markup parser.)
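To make the "breaker" idea concrete, here is a small sketch that inserts a zero-width character between adjacent ASCII apostrophes so that the sequence the wiki parser reacts to never appears. The function name `break_double_apostrophes` is made up for illustration, and nothing here is MediaWiki code.

```python
ZWSP = "\u200B"   # Zero Width Space
ZWNJ = "\u200C"   # Zero Width Non-Joiner
ZWJ = "\u200D"    # Zero Width Joiner

def break_double_apostrophes(title, breaker=ZWNJ):
    """Insert an invisible breaker between every pair of adjacent apostrophes."""
    # Loop because runs of three or more apostrophes need several passes.
    while "''" in title:
        title = title.replace("''", "'" + breaker + "'")
    return title

original = "Comune d''a Campania"
broken = break_double_apostrophes(original)
```

One consequence worth keeping in mind: titles built with different breakers are different strings even though they look identical on screen, so a wiki adopting this trick would have to settle on a single breaker character.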
Steve Summit wrote:
Phil Boswell wrote:
Like this you mean? http://en.wikipedia.org/wiki/User:Phil_Boswell/Sandbox/Double_apostrophe
(If it was me, though, I'd use U+200C Zero Width Non-Joiner or maybe U+200B Zero Width Space. I believe the intent of U+200D Zero Width Joiner is that it tries to force ligatures, especially in scripts like Arabic that have lots of ligated characters, and fancy rules for when or when not to ligate them automatically. Clearly the double apostrophe isn't a ligature; it's just something you didn't want joined, in this case by the wiki markup parser.)
And I still say we should just use the escaped form, i.e. character references for the two apostrophes. What we want _is_ two consecutive apostrophes -- we just don't want them _parsed_ by the wiki engine. Thus, they need to be escaped, which is exactly what the character references do.
On 6/19/06, Ilmari Karonen nospam@vyznev.net wrote:
And I still say we should just use the escaped form, i.e. character references for the two apostrophes. What we want _is_ two consecutive apostrophes -- we just don't want them _parsed_ by the wiki engine. Thus, they need to be escaped, which is exactly what the character references do.
Or, as pointed out, ''
I was thinking about possible changes to the MediaWiki parsing engine to accommodate this better (like changing the reinterpretation of '''' from bold+' to ''), but I don't think many of them are good. The main confusion with suggestions like yours is being able to easily talk about the category etc in question:
* Is the name of the category "actually" foo''bar and it just happens to display as foo''bar ?
* Is the name of the category "actually" foo''bar, and we put a page in that category with the hackish wikimarkup [[Category:foo''bar]] ?
Actually, on the MediaWiki question, it *would* theoretically be possible to just ignore all attempted formatting within [[Article]] or [[Category:]] links wouldn't it? Since we don't support formatting in page names (and are unlikely to with that syntax anytime soon), it seems conceivable that [[Category:Foo''bar]] should take that name literally and not attempt to interpret it as italics, which wouldn't work anyway...
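As a rough illustration of the "take the link target literally" idea, here is a toy two-pass renderer. It is not MediaWiki's parser: the regexes, the function names and the placeholder scheme are all assumptions made for this sketch. The naive version runs the italics pass over the whole text; the protected version stashes link targets first so the italics pass never sees them.

```python
import re

LINK = re.compile(r"\[\[([^\[\]|]+)\]\]")   # [[Target]] without a pipe
ITALIC = re.compile(r"''(.+?)''")           # ''text''

def make_anchor(target):
    return f'<a href="/wiki/{target}">{target}</a>'

def render_naive(text):
    """Italics pass first, then links: apostrophes in link targets get eaten."""
    text = ITALIC.sub(r"<i>\1</i>", text)
    return LINK.sub(lambda m: make_anchor(m.group(1)), text)

def render_protected(text):
    """Stash link targets behind placeholders before the italics pass."""
    targets = []

    def stash(match):
        targets.append(match.group(1))
        return f"\x00{len(targets) - 1}\x00"

    text = LINK.sub(stash, text)
    text = ITALIC.sub(r"<i>\1</i>", text)
    return re.sub(r"\x00(\d+)\x00",
                  lambda m: make_anchor(targets[int(m.group(1))]), text)

sample = "[[Category:Foo''bar]] and ''x''"
```

With the sample text, the naive renderer turns the apostrophes inside the link into italics markup, while the protected renderer keeps the target intact and still italicizes the x.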
Steve
Steve Bennett wrote:
- Is the name of the category "actually" foo''bar and it just
happens to display as foo''bar ?
No.
- Is the name of the category "actually" foo''bar, and we put a page
in that category with the hackish wikimarkup [[Category:foo''bar]]
Yes.
Actually, on the MediaWiki question, it *would* theoretically be possible to just ignore all attempted formatting within [[Article]] or [[Category:]] links wouldn't it? Since we don't support formatting in page names (and are unlikely to with that syntax anytime soon), it seems conceivable that [[Category:Foo''bar]] should take that name literally and not attempt to interpret it as italics, which wouldn't work anyway...
Theoretically, but links and bold/italics are handled on separate parser passes. Things get ugly... :)
-- brion vibber (brion @ pobox.com)
On 6/19/06, Brion Vibber brion@pobox.com wrote:
- Is the name of the category "actually" foo''bar, and we put a page
in that category with the hackish wikimarkup [[Category:foo''bar]]
Yes.
Ok, so presuming it's not a category, will page moves to/from '' notation work correctly? Would the hackish notation be required anywhere except when linking to the page via [[...]] syntax? Are there any cases where it would not be possible to link due to some further restrictions on syntax being imposed?
Just checking...:)
Theoretically, but links and bold/italics are handled on separate parser passes. Things get ugly... :)
Heh, I believe you.
Steve
Steve Bennett wrote:
On 6/19/06, Brion Vibber brion@pobox.com wrote:
- Is the name of the category "actually" foo''bar, and we put a page
in that category with the hackish wikimarkup [[Category:foo''bar]]
Yes.
Ok, so presuming it's not a category, will page moves to/from '' notation work correctly?
Maybe, maybe not.
Since titles containing "''" cannot be reliably round-tripped, they need to be made invalid. Unfortunately somebody's pulled out a language that claims to need it, so we may be stuck with such breakage.
-- brion vibber (brion @ pobox.com)
On 6/19/06, Brion Vibber brion@pobox.com wrote:
Since titles containing "''" cannot be reliably round-tripped, they need to be
You mean in the sense that {{PAGETITLE}} etc would return the unescaped name? Could those functions be fixed?
(yes, I admit I seem to be following a pattern of turning every remark into "can I make more work for Brion?") ;)
What other senses does "round-tripping" have here?
Steve
Steve Bennett wrote:
On 6/19/06, Brion Vibber brion@pobox.com wrote:
Since titles containing "''" cannot be reliably round-tripped, they need to be
You mean in the sense that {{PAGETITLE}} etc would return the unescaped name? Could those functions be fixed?
In theory; we'd need to wrap hundreds of calls in some kind of wiki-escaping function for this to work reliably.
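A sketch of what "round-tripping" means here and of what a wiki-escaping wrapper might do. Everything in it is hypothetical: `wiki_escape`, `toy_link_targets` and the choice of the numeric character reference for the apostrophe are assumptions for illustration, not MediaWiki functions.

```python
import html
import re

ITALIC = re.compile(r"''(.+?)''")
LINK = re.compile(r"\[\[([^\[\]|]+)\]\]")

def toy_link_targets(wikitext):
    """Toy parser: run the italics pass, then collect decoded link targets."""
    wikitext = ITALIC.sub(r"<i>\1</i>", wikitext)
    return [html.unescape(target) for target in LINK.findall(wikitext)]

def wiki_escape(title):
    """Hypothetical escaper: hide apostrophes from the italics pass."""
    return title.replace("'", "&#39;")

title = "Comune d''a Campania"

# Emitting the raw title into wikitext does not survive the trip back.
raw_targets = toy_link_targets(f"[[{title}]] ''and'' more")

# Emitting the escaped title does.
escaped_targets = toy_link_targets(f"[[{wiki_escape(title)}]] ''and'' more")
```

The point of the sketch is the cost Brion mentions: every place that emits a title into wikitext would have to go through something like `wiki_escape` for the round trip to be reliable.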
-- brion vibber (brion @ pobox.com)
On Mon, Jun 19, 2006 at 10:50:57AM -0700, Brion Vibber wrote:
Maybe, maybe not.
Since titles containing "''" cannot be reliably round-tripped, they need to be made invalid. Unfortunately somebody's pulled out a language that claims to need it, so we may be stuck with such breakage.
Was it authoritatively decided, then, that no other rendition of that character was acceptable than two sequential ASCII apostrophes?
Cheers, -- jra
I've been investigating this for a while... I understand that per-page popularity data has been disabled. However, I'm trying to make some statistical investigations of Wikipedia, in an attempt to provide some utilities for the selection of articles for the One Encyclopedia Per Child (OEPC) project.
To the point, then. For the English and Simple English Wikipedias - is there any way it could be possible to get per-page popularity data, or a stripped log file from some Squid proxy that this information could be extracted from? I already have a simple Perl utility that could be used to strip a Squid log file down to the information I need (the URL).
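For reference, the kind of stripping described above could look like the following sketch. This is not the Perl utility mentioned; it assumes Squid's native access.log layout, in which the requested URL is the seventh whitespace-separated field, and the two sample lines are invented.

```python
from collections import Counter

def count_urls(lines):
    """Count requested URLs, keeping nothing else from each log line."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip truncated or malformed lines
        counts[fields[6]] += 1
    return counts

# Made-up sample lines in the assumed native format.
sample_log = [
    "1150732800.123 45 10.0.0.1 TCP_MISS/200 12345 GET "
    "http://en.wikipedia.org/wiki/Acerno - DIRECT/10.0.0.2 text/html",
    "1150732801.456 12 10.0.0.3 TCP_HIT/200 12345 GET "
    "http://en.wikipedia.org/wiki/Acerno - NONE/- text/html",
]
popularity = count_urls(sample_log)
```

Only the URL and its count are kept, so client addresses and timestamps never leave the log host.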
Thanks!
Eric Astor wrote:
I've been investigating this for a while... I understand that per-page popularity data has been disabled. However, I'm trying to make some statistical investigations of Wikipedia, in an attempt to provide some utilities for the selection of articles for the One Encyclopedia Per Child (OEPC) project.
To the point, then. For the English and Simple English Wikipedias - is there any way it could be possible to get per-page popularity data, or a stripped log file from some Squid proxy that this information could be extracted from? I already have a simple Perl utility that could be used to strip a Squid log file down to the information I need (the URL).
Sorry, we do not keep such logs due to the significant overhead it would place on the squids (we handle over half a billion requests per day).
-- brion vibber (brion @ pobox.com)
On 6/19/06, Eric Astor eastor1@swarthmore.edu wrote:
I've been investigating this for a while... I understand that per-page popularity data has been disabled. However, I'm trying to make some statistical investigations of Wikipedia, in an attempt to provide some utilities for the selection of articles for the One Encyclopedia Per Child (OEPC) project.
To the point, then. For the English and Simple English Wikipedias - is there any way it could be possible to get per-page popularity data, or a stripped log file from some Squid proxy that this information could be extracted from? I already have a simple Perl utility that could be used to strip a Squid log file down to the information I need (the URL).
The Squids don't log. Generally the feeling is that the information isn't useful. I'd like to see us do at least a periodic sample for this sort of application. This could be accomplished without impact on the production infrastructure by placing a box on a port mirror of one of the squids, sniffing the traffic, and reconstructing the TCP sessions enough to extract the requested URLs. Unfortunately, this would require trusted staff access because raw traffic exposes private data, and as far as I know none of our core devs are interested in this data.
In the absence of this data, you can use pure connectedness to estimate which articles are important. You would just need the pagelinks and page tables to form metrics based on connectedness. We make MySQL dumps of this data available on download.wikimedia.org.
For Wikipedia subsetting, the pagerank algorithm (ideally combined with popularity as the initialization data) is probably the ideal automated approach, because even if a page is not widely read, if it is widely linked you should still probably include it.
A high performance implementation of pagerank is available in the Boost Graph Library.
I've computed the internal pagerank of Wikipedia articles in the past (initialized with neutral values), so if you need help with this I can make myself available.
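For readers who have not met the algorithm, here is a minimal pure-Python sketch of pagerank by power iteration. It is a toy for small graphs, not the Boost Graph Library implementation and not the code used for the computation mentioned above; the function name, the parameters and the four-page link graph are all assumptions. The optional `prior` argument stands in for "popularity as the initialization data".

```python
def pagerank(links, damping=0.85, iterations=50, prior=None):
    """links maps each page to the list of pages it links to.

    prior, if given, maps pages to non-negative weights (e.g. popularity);
    it is normalized to sum to 1. Without it, all pages start equal.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    if prior is None:
        prior = {p: 1.0 for p in pages}
    total = sum(prior.get(p, 0.0) for p in pages)
    prior = {p: prior.get(p, 0.0) / total for p in pages}

    rank = dict(prior)
    for _ in range(iterations):
        new = {p: (1.0 - damping) * prior[p] for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Page with no outgoing links: spread its rank by the prior.
                for q in pages:
                    new[q] += damping * rank[p] * prior[q]
        rank = new
    return rank

# Invented link graph: D links to A, but nothing links to D.
links = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
ranks = pagerank(links)
```

In this toy graph, page D ends up with the lowest score because no page links to it, regardless of how often it might be read.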
If you have a substantial number of human hours available for this purpose, I'd recommend that you get some folks building a hand-made list of core subjects which must be included. (See http://en.wikipedia.org/wiki/Wikipedia:Version_1.0_Editorial_Team for work that's already been done in this area.) Such a list could be used as part of the initialization data for pagerank.
It would likely be useful to begin producing a negative list: a set of articles that you don't want to include because of the focus of your project. For example, you might wish to exclude pop-culture subjects. Be careful about using category data for this purpose: since there is no built-in method to view all the supercategories that an article is a member of, links are sometimes made within the hierarchy which produce confusing results. For example, a great many articles are reachable from every one of our top-level cats.
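The supercategory problem is easy to see with a small traversal. The sketch below walks upward from a page through a parent mapping and collects every category reachable from it; the mapping format, the function name and the sample categories are invented for illustration, and a real run would build the mapping from the category link dumps.

```python
from collections import deque

def all_supercategories(page, parents):
    """Return every category reachable upward from page (cycle-safe)."""
    seen = set()
    queue = deque(parents.get(page, ()))
    while queue:
        category = queue.popleft()
        if category in seen:
            continue  # already visited; also guards against loops
        seen.add(category)
        queue.extend(parents.get(category, ()))
    return seen

# Invented hierarchy containing a loop (Culture -> Music -> Culture).
parents = {
    "Some pop song": ["Songs"],
    "Songs": ["Music"],
    "Music": ["Culture", "Arts"],
    "Arts": ["Culture"],
    "Culture": ["Music"],
}
supercats = all_supercategories("Some pop song", parents)
```

Printing the set for a handful of articles before using categories as an exclusion filter shows quickly whether an innocuous-looking top-level category would sweep in far more than intended.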
Steve Bennett wrote:
- Is the name of the category "actually" foo''bar and it just
happens to display as foo''bar ?
- Is the name of the category "actually" foo''bar, and we put a page
in that category with the hackish wikimarkup [[Category:foo''bar]] ?
The latter, as clearly shown by the fact that [[foo''bar]], [[foo''bar]] and [[foo''bar]] all link to the same page.
On 6/19/06, Steve Bennett stevage@gmail.com wrote:
I was thinking about possible changes to the MediaWiki parsing engine to accommodate this better (like changing the reinterpretation of '''' from bold+' to ''), but I don't think many of them are good. The main confusion with suggestions like yours is being able to easily talk about the category etc in question:
Heh, just yesterday I was going to write that '''' would almost never be used to mean "apostrophe followed by bold". Today, I used it at French Wikipedia in the first sentence of an article:
L''''oeuf mimosa'''...
Nothing better than discovering one's own shortsightedness so soon!
Steve
Steve Bennett-4 wrote:
Heh, just yesterday I was going to write that '''' would almost never be used to mean "apostrophe followed by bold". Today, I used it at French Wikipedia in the first sentence of an article:
L''''oeuf mimosa'''...
Nothing better than discovering one's own shortsightedness so soon!
See this is where using real apostrophes would actually be an advantage, insane mutterings about MS "smart quotes" be damned...
HTH HAND
On 6/20/06, Phil Boswell phil.boswell@gmail.com wrote:
L''''oeuf mimosa'''...
See this is where using real apostrophes would actually be an advantage, insane mutterings about MS "smart quotes" be damned...
I don't follow - real apostrophes as opposed to what? How? Where? Whodunnit?
Steve
Steve Bennett wrote:
On 6/20/06, Phil Boswell phil.boswell@gmail.com wrote:
L''''oeuf mimosa'''...
See this is where using real apostrophes would actually be an advantage, insane mutterings about MS "smart quotes" be damned...
I don't follow - real apostrophes as opposed to what? How? Where? Whodunnit?
You probably knew this much of the story, but:
1. In the beginning there was the ASCII apostrophe, hex 0x27, decimal 39. It was also used as a single quote character, as of course it is in typography. Some computer printers and displays displayed the apostrophe as a slanted glyph, reminiscent of a typographical apostrophe, and some users paired it with the grave accent or "backquote" (0x60, 96) when using it as a quote: `like this'. Other printers and displays used a directionless vertical glyph -- as indeed old typewriters did, since they had no ! character and users were expected to backspace and overstrike a ' and a . to create one. Some users paired the apostrophes when using them as quotes, 'like this', and many computer languages (C, SQL, etc.) used the character this way in character and string constants.
2. When Unicode 1.0 came out, recognizing the overuse of the ASCII apostrophe, it split that one character into three, so to speak:
    hex     dec   UTF8  description
    ---     ---   ----  -----------
    0x27    39    '     Apostrophe-Quote (ambiguous)
    0x02BC  700   ʼ     Modifier Letter Apostrophe
    0x2019  8217  ’     Single Comma Quotation Mark
0x02BC was the "preferred character for apostrophe", and 0x2019 was the "preferred character for closing single quotation mark".
3. Somewhere between Unicode 2.0 and Unicode 3.0, the recommendation was changed so that 0x2019 (whose official name is now "Right Single Quotation Mark") is now the preferred character for both the closing single quotation mark and the apostrophe. Personally I think this is a real shame, but I've turned into a real crank on the subject so I won't rant about it here.
Anyway, the point is that there are now a bunch of apostrophe-like characters which you can use if you (a) want to be typographically correct, or (b) want to reduce ambiguity, or (c) want to play games like the one being discussed here, to duck a special meaning of one particular character, or (d) want to *increase* ambiguity and get into endless philosophical arguments on Wikipedia and Wiktionary trying to decide which character the Manual of Style should recommend for use in ordinary text, or article titles. (Should it be "it's" or "it’s"?)
And the fun doesn't end there, either. Besides the obvious left single quotation mark, there are two other variants for emulating various European typographical styles, and there's also a distinct "prime" character which is preferred for indicating feet and minutes of arc (and which has been debated over on [[Template talk:Coor dms]]), and then there's this new double apostrophe. And then the whole story repeats for double quotes, though at least without the apostrophe angle.
    0x2018  8216  ‘     Left Single Quotation Mark
    0x201A  8218  ‚     Single Low-9 Quotation Mark
    0x201B  8219  ‛     Single High-Reversed-9 Quotation Mark
    0x2032  8242  ′     Prime
    0x02EE  750   ˮ     Modifier Letter Double Apostrophe

    0x0022  34    "     Quotation Mark
    0x201C  8220  “     Left Double Quotation Mark
    0x201D  8221  ”     Right Double Quotation Mark
    0x201E  8222  „     Double Low-9 Quotation Mark
    0x201F  8223  ‟     Double High-Reversed-9 Quotation Mark
    0x2033  8243  ″     Double Prime
(Probably more than you wanted to know. I'll stop now, without even mentioning fullwidth variants, or double prime quotation marks. I've managed to avoid any insane mutterings about "smart quotes", too.)
On 6/20/06, Steve Summit scs@eskimo.com wrote:
I've managed to avoid any insane mutterings about "smart quotes", too.)
Why the reticence? I want to hear them!
Not that I don't have my own: if you think smart quotes make life complicated in English, try writing documents that consist of both French and English. Now, write a document that is about translation between the two: you will want French text that has English text embedded in French smart-quotes (« bad English text »). Sooner or later Word works out that the quoted text is in English, and switches back to English smart-quotes when you close it (« this English text... was too long"). Lovely.
Steve
On Tue, Jun 20, 2006 at 03:28:58PM +0200, Steve Bennett wrote:
On 6/20/06, Steve Summit scs@eskimo.com wrote:
I've managed to avoid any insane mutterings about "smart quotes", too.)
Why the reticence? I want to hear them!
Not that I don't have my own: if you think smart quotes make life complicated in English, try writing documents that consist of both French and English. Now, write a document that is about translation between the two: you will want French text that has English text embedded in French smart-quotes (« bad English text »). Sooner or later Word works out that the quoted text is in English, and switches back to English smart-quotes when you close it (« this English text... was too long"). Lovely.
You might be a geek if (like me) you're roaring in laughter over this paragraph.
(Just like last week, when someone on NANOG itemized the power and cooling requirements for a 450K-server data center, in detail.)
Cheers, -- jra
On 6/17/06, Jay R. Ashworth jra@baylink.com wrote:
On Sat, Jun 17, 2006 at 08:26:12AM -0600, Andrew Dunbar wrote:
And there's at least one more possibility: today I happened to notice that there's also a Unicode double apostrophe character, U+02EE. I'm not sure, but it might be tailor-made for Neapolitan.
Again, there are multiple Unicode double apostrophe characters, but this one is also not punctuation but a letter: "modifier letter double apostrophe".
Which, if I correctly followed all the earlier conversation, means that yes, it's exactly what the Neapolitan people want.
OK. Now I understand the context of this discussion more fully. I do not know much about Neapolitan so I cannot say whether it has a letter which looks like a double quote or whether it has a letter which looks like an apostrophe which can occur as a double letter. That would be an important difference.
What I can say is that at least one indigenous language of Mexico, Amuzgo, uses a letter which looks like an apostrophe which can occur as a double letter. So the situation is certainly possible. I doubt anybody is likely to start an Amuzgo Wikipedia any time soon though considering the small number of speakers.
Andrew Dunbar (hippietrail)
Cheers,
-- jra
Jay R. Ashworth jra@baylink.com Designer Baylink RFC 2100 Ashworth & Associates The Things I Think '87 e24 St Petersburg FL USA http://baylink.pitas.com +1 727 647 1274
A: Because it messes up the order in which people normally read text. Q: Why is top-posting such a bad thing? A: Top-posting. Q: What is the most annoying thing on Usenet and in e-mail?
Andrew Dunbar wrote:
On 6/17/06, Jay R. Ashworth jra@baylink.com wrote:
On Sat, Jun 17, 2006 at 08:26:12AM -0600, Andrew Dunbar wrote:
And there's at least one more possibility: today I happened to notice that there's also a Unicode double apostrophe character, U+02EE. I'm not sure, but it might be tailor-made for Neapolitan.
Again, there are multiple Unicode double apostrophe characters, but this one is also not punctuation but a letter: "modifier letter double apostrophe".
Which, if I correctly followed all the earlier conversation, means that yes, it's exactly what the Neapolitan people want.
OK. Now I understand the context of this discussion more fully. I do not know much about Neapolitan so I cannot say whether it has a letter which looks like a double quote or whether it has a letter which looks like an apostrophe which can occur as a double letter. That would be an important difference.
What I can say is that at least one indigenous language of Mexico, Amuzgo, uses a letter which looks like an apostrophe which can occur as a double letter. So the situation is certainly possible. I doubt anybody is likely to start an Amuzgo Wikipedia any time soon though considering the small number of speakers.
Andrew Dunbar (hippietrail)
Hoi, There may not be an Amuzgo Wikipedia any time soon, but words of the Amuzgo language ( http://www.ethnologue.com/show_language.asp?code=amu ) are quite welcome in any of the Wiktionaries or in WiktionaryZ. This does not depend on a new project being requested or anything. Thanks, GerardM
Sabine Cretella wrote:
Brion Vibber schrieb:
Sabine Cretella wrote:
[[category:Comune d<nowiki>''</nowiki>a Campania]]
Get rid of the <nowiki>. Yes, the bug was fixed that allowed <nowiki> in there.
We really do need '' as normal chars and not to initiate text written in italics ... we must be able to include this in a wiki link in some way and also in a category link
Have you tried using the Unicode Zero-width Joiner? HTML code is &#x200d; (which might need to be &amp;#x200d; depending upon how you are viewing this).
Trying this on enwiki gets me this: http://en.wikipedia.org/w/index.php?title=Category:Comune_d%27%E2%80%8D%27a_...
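The percent-escapes in that URL can be reproduced with Python's standard urllib.parse: %27 is the ASCII apostrophe and %E2%80%8D is the UTF-8 encoding of U+200D. The sketch below covers only the visible part of the title, since the rest of the URL is cut off above; replacing spaces with underscores is done by hand here to mimic how the URL is written.

```python
from urllib.parse import quote, unquote

ZWJ = "\u200D"  # Zero Width Joiner

# Only the portion of the title that is visible in the truncated URL.
title_start = "Comune d'" + ZWJ + "'a"

encoded = quote(title_start.replace(" ", "_"))
decoded = unquote(encoded)
```

Decoding the escaped form gives back a string with the joiner still sitting between the two apostrophes, which is the "odd-looking thing" a font without a suitable glyph may show.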
Since page titles are not displayed using class="Unicode" or any other such dodge, this for me means that there is an odd-looking "thing" between the apostrophes. However this can be fixed by an appropriate use of fonts if available: for example if I wrap the link in {{Unicode}} it displays properly.
HTH HAND