In findColonNoLinks replace:
if( $pos === false) {
with
if( $pos === false || $pos == strlen($str)-1) {
Rationale: currently this input renders without a colon:
;Example:
:Blah blah
I don't know if there is a reported bug for this, but the behaviour has bitten me once or twice, and I don't think there's anything useful in treating a trailing colon as the start of a (non-existent) definition.
Steve
PS It may be preferable to implement this by searching the left($str, strlen($str)-1) or something, but I leave that to people who actually know PHP :)
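A minimal sketch of what the proposed one-line change does. It is written against a hypothetical, heavily simplified stand-in, findDefinitionColon(), not the real Parser::findColonNoLinks: the link and tag handling of the real function is omitted, and $str is assumed to be the text after the leading ";". The second function, also hypothetical, is the "search all but the last character" variant from the PS, expressed with PHP's substr($str, 0, -1). For any input the two return the same result.

```php
<?php
// Hypothetical, simplified stand-in for Parser::findColonNoLinks():
// it only splits on the first colon and ignores links and tags.
// Returns the colon's offset, or false if no definition starts here.
function findDefinitionColon($str, &$before, &$after)
{
    $pos = strpos($str, ':');
    // Proposed change: a colon that is the last character of the term
    // line does not start a definition, so it stays a literal colon.
    if ($pos === false || $pos == strlen($str) - 1) {
        return false;
    }
    $before = substr($str, 0, $pos);
    $after = substr($str, $pos + 1);
    return $pos;
}

// Variant from the PS: search every character except the last one.
function findDefinitionColonAlt($str, &$before, &$after)
{
    $pos = strpos(substr($str, 0, -1), ':');
    if ($pos === false) {
        return false;
    }
    $before = substr($str, 0, $pos);
    $after = substr($str, $pos + 1);
    return $pos;
}
```

With either version ";Example:" keeps its colon, while ";foo:blah:blah" still splits at the first colon and leaves the second one literal.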
If you want to keep the colon, you could use:
;Example<nowiki>:</nowiki>
:Blah blah
I'd think that your suggested change would be too much of a deviation from current syntax - but I defer to the community for further input.
-- Jim R. Wilson (jimbojw)
On Jan 20, 2008 6:26 PM, Steve Bennett stevagewp@gmail.com wrote:
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 21/01/2008, Jim Wilson wilson.jim.r@gmail.com wrote:
If you want to keep the colon, you could use:
;Example<nowiki>:</nowiki>
:Blah blah
I'd think that your suggested change would be too much of a deviation from current syntax - but I defer to the community for further input.
Examples would be good, i.e. pages which have a colon on the same line, then what the writer's intent appears to have been.
- d.
Examples would be good, i.e. pages which have a colon on the same line, then what the writer's intent appears to have been.
That's where the community comes in :)
I am not prepared to back up my opinion at this time with any actual "facts" per se - I don't feel strongly about it either way.
-- Jim
On Jan 21, 2008 12:07 PM, David Gerard dgerard@gmail.com wrote:
On 1/22/08, David Gerard dgerard@gmail.com wrote:
Examples would be good, i.e. pages which have a colon on the same line, then what the writer's intent appears to have been.
Just resurrecting this thread: I think my change is a good one, and if there are any actual arguments against it, I'd like to hear them. What David is asking is a bit much: for me to find examples where a user has entered wikitext which renders as basically nothing, and chosen to leave it like that. More likely, the user would realise that the sequence ;foo: renders in an unhelpful way, and would use a <nowiki>. Perhaps the onus should be on someone else to find these mythical examples to show how they would render incorrectly with the change.
Steve
Would the change also include something to handle the case where a one-liner definition contains an intended literal colon? For example:
;Term: : Definition
In that case, the user really probably intends a colon to trail "Term" in the resulting <dt>, but today that colon gets prepended instead as a literal to the Definition underneath.
-- Jim
On Jan 30, 2008 8:53 AM, Steve Bennett stevagewp@gmail.com wrote:
On 1/31/08, Jim Wilson wilson.jim.r@gmail.com wrote:
Would the change also include something to handle the case where a one-liner definition contains an intended literal colon? For example:
;Term: : Definition
In that case, the user really probably intends a colon to trail "Term" in the resulting <dt>, but today that colon gets prepended instead as a literal to the Definition underneath.
I don't think you can unambiguously state the user's intention there. It could just as easily be:
;Ratio: 3:1
or something. If I'm not mistaken, this is the only place in the syntax where a single character other than pipe has meaning in the middle of plain text like this. Ambiguity is almost unavoidable.
Anyway, I have, as requested, found examples of trailing colons. These from Wikiversity:
Bloom Clock_2006 data.txt:;''[[w:Clematis virginiana|Clematis virginiana]]'' (Devil's darning-needle):
Bloom Clock_2006 data.txt:;''[[w:Rudbeckia fulgida|Rudbeckia fulgida]]'' (Black-eyed Susan, Orange coneflower):
Help_Wiki markup examples.txt:;[[w:newline|Newline]]:
Help_Wiki markup examples.txt:;Dates:
Help_Wiki markup examples.txt:;Example:
Help_Wiki markup examples.txt:;typewriter font:
Help_Wiki markup examples.txt:;Show special character codes:
Wikiversity_Register Language of Interest.txt:;Links to existing Wikiversity Sites:
(underscores in filenames often mean colons, so for example the fourth example is the text ";Dates:" in [[Help:Wiki markup examples]])
Searching en wikisource, 460 examples turn up:
; Article 7 (Education). :
; Article 9 (Freedom of association). :
; Article 10 (Privacy of letters, posts, and telecommunications amended 24 June 1968). :
...
;THE GRAND JURY CHARGES:
;Coro:
;Chorus:
;Estrofa I:
...
;GOLDBERG, SPECIAL TRIAL JUDGE:
;NORMA HOLLOWAY JOHNSON, District Judge:
;Epilogue:
;Editions:
; Speakers:
;What Happens if the Three Elements Are Not Received Together:
;Renewal Registration:
...
In Wikibooks there are 770 examples.
;Characteristics of the ''petiole'':
;Arrangement of the veins ('''venation'''):
;Add:To add two numbers, we take the modulo-2 of the result. Here is a truth table for an add operation:
;Multiply:Multiplication in modulo 2 happens normally, and no new operator needs to be defined. Here is the truth table for a multiplication operation:
;Target Audience:
;Scope:
;Prerequisites:
IMHO, all of these ought to be rendered with a literal final colon. There's no reason to think the author deliberately intended a meaningless character to render as ... nothing.
Steve
I don't think you can unambiguously state the user's intention there. It could just as easily be:
;Ratio: 3:1
In that example, there are non-whitespace characters between the two colons, whereas in my example, it was just whitespace.
So I guess my question is, why should newline be special? That is, why would these two examples render differently:
;Term:
: Definition

;Term: : Definition
-- Jim
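The whitespace distinction Jim draws could be written as an extra rule on top of the trailing-colon change. This is a hypothetical sketch of a behaviour proposed in the thread, not something the MediaWiki parser does; splitTermLine() is an illustrative name, it takes the text after the leading ";", and it returns the term and the definition (null when there is none).

```php
<?php
// Hypothetical rule sketch (not MediaWiki behaviour): split the text after a
// leading ';' into [term, definition]. The definition is null if there is none.
function splitTermLine($str)
{
    $pos = strpos($str, ':');
    // No colon, or only a trailing colon: the whole line is the term,
    // and a trailing colon stays literal.
    if ($pos === false || $pos == strlen($str) - 1) {
        return [$str, null];
    }
    $rest = substr($str, $pos + 1);
    // Jim's case ";Term: : Definition": only whitespace separates the first
    // colon from a second one, so the first colon is kept as part of the term
    // and the second colon acts as the separator.
    if (preg_match('/^\s*:(.*)$/s', $rest, $m)) {
        return [substr($str, 0, $pos + 1), $m[1]];
    }
    // Ordinary case: the first colon separates term and definition.
    return [substr($str, 0, $pos), $rest];
}
```

Under this rule ";Term: : Definition" would give the term "Term:", whereas ";Ratio: 3:1" still splits at the first colon because a non-blank character comes before the second one.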
On Jan 30, 2008 8:49 PM, Steve Bennett stevagewp@gmail.com wrote:
On 2/1/08, Jim Wilson wilson.jim.r@gmail.com wrote:
In that example, there are non-whitespace characters between the two colons, whereas in my example, it was just whitespace.
So I guess my question is, why should newline be special? That is, why would these two examples render differently:
;Term:
: Definition

;Term: : Definition
It's hard for me to argue "why" because personally I think the ;term:definition syntax is pretty crappy. It's the sort of cutesy minimalist syntax that was in vogue when wikis were first invented, but no one would dream of it now. I (and perhaps many people) tend to look at it like this:
;term
(translates into a DT wrapped in a DL)
:definition
(translates into a DD wrapped in a DL)
;term
:definition
(translates as above, except that they're wrapped in the same DL, much like two consecutive #'s or *'s)
;term:definition
Now *this* is the special case, IMHO. Asking why the newline is special is missing the point: newline is always special. Or rather:
*Newline is
:always special
So actually the newline is just behaving as it always does: breaking the end of a list item. It's the colon which has the weird double behaviour: acting as both a newline and list item in one. Personally, I don't see much benefit from this.
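Steve's reading of the list syntax can be sketched as a toy renderer. This is a hypothetical illustration of that model, not the real parser: renderDefinitionLists() handles only top-level ";" and ":" lines (no nesting such as "::", no links, no escaping), merges consecutive list lines into one <dl>, and treats a colon on a ";" line as the special "newline plus :" case. As described in the thread, a trailing colon therefore produces an empty definition and the colon disappears.

```php
<?php
// Toy sketch of the model described above (not the MediaWiki parser):
// ';' lines become <dt>, ':' lines become <dd>, consecutive list lines share
// one <dl>, and ';term:definition' on one line yields a <dt> plus a <dd>.
// Nesting ('::', ';:'), links and HTML escaping are deliberately ignored.
function renderDefinitionLists(array $lines)
{
    $html = '';
    $open = false; // whether a <dl> is currently open
    foreach ($lines as $line) {
        $first = substr($line, 0, 1);
        if ($first !== ';' && $first !== ':') {
            // Any other line ends the current list.
            if ($open) {
                $html .= "</dl>\n";
                $open = false;
            }
            $html .= $line . "\n";
            continue;
        }
        if (!$open) {
            $html .= '<dl>';
            $open = true;
        }
        $text = trim(substr($line, 1));
        if ($first === ':') {
            $html .= "<dd>$text</dd>";
            continue;
        }
        $pos = strpos($text, ':');
        if ($pos === false) {
            $html .= "<dt>$text</dt>";
        } else {
            // The colon acts as "newline + ':'"; a trailing colon therefore
            // yields an empty <dd>, so the colon itself is not shown.
            $html .= '<dt>' . trim(substr($text, 0, $pos)) . '</dt>'
                   . '<dd>' . trim(substr($text, $pos + 1)) . '</dd>';
        }
    }
    if ($open) {
        $html .= "</dl>\n";
    }
    return $html;
}
```

Feeding it [';term', ':definition'] and [';term:definition'] produces the same <dl>, which is the double behaviour of the colon that Steve describes.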
In any case, getting back to the original question re: doubled colons. I've looked through my corpus (versity, source, books, quotes) and find extremely few instances of that:
; CVS Archive : :pserver:anoncvs@libre.adacore.com:/anoncvs/xmlada
; Covalent – directional: : Covalent bonds are between specific atoms and distorting the positions of the atoms will break the bonds.
I think you're right that these are probably intended to trail on the term line, but it doesn't look like a big issue.
Steve
On 1/22/08, Jim Wilson wilson.jim.r@gmail.com wrote:
If you want to keep the colon, you could use:
;Example<nowiki>:</nowiki>
:Blah blah
Sure. The point is that whenever the construct ";foo:" is used at present, it is almost certainly an error, as there is no benefit to a trailing colon. The prevailing attitude to wikitext syntax has generally been to try and pre-empt such "errors" and keep the behaviour intuitive, if I'm not mistaken.
I'd think that your suggested change would be too much of a deviation from current syntax - but I defer to the community for further input.
I really don't think so. For example:
;foo:blah:blah
There the second colon is literal.
So it's quite arguable that the colon is a *separator* that splits the definition from the term ("definiendum"?). If there's no definition, then there's no separator, so it's just a literal.
Incidentally, does anyone have a readily available corpus of wikitext, perhaps from Wikipedia? Something in the format of a bunch of text files would be really convenient.
Steve
On 22/01/2008, Steve Bennett stevagewp@gmail.com wrote:
Incidentally, does anyone have a readily available corpus of wikitext, perhaps from Wikipedia? Something in the format of a bunch of text files would be really convenient.
The usual answer would be to get a dump for analysis in various languages. (Doesn't have to be the latest.)
- d.
On 1/22/08, David Gerard dgerard@gmail.com wrote:
On 22/01/2008, Steve Bennett stevagewp@gmail.com wrote:
Incidentally, does anyone have a readily available corpus of wikitext, perhaps from Wikipedia? Something in the format of a bunch of text files would be really convenient.
The usual answer would be to get a dump for analysis in various languages. (Doesn't have to be the latest.)
Afaik the dumps are in some format that has to be imported into MySQL then exported (or analysed in-DB), no? Hence my request for something a little more convenient...
Steve
On Wed, Jan 23, 2008 at 03:06:10PM +1100, Steve Bennett wrote:
Afaik the dumps are in some format that has to be imported into MySQL then exported (or analysed in-DB), no? Hence my request for something a little more convenient...
The wikitext dumps are in XML format and can be parsed pretty easily as if they were plain text files.
- Carl
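Carl's point can be sketched as a grep-style scan that treats an XML dump as a plain text file and collects ";" lines ending in a colon, the pattern Steve counted on Wikiversity, Wikisource and Wikibooks. Assumptions, all hypothetical: the dump is a file at a path you supply; a list line starts at the beginning of a physical line, so the first line of each page (which follows the opening <text> tag) is not checked; and ";" and ":" appear unescaped, which holds because XML only needs to escape characters such as "<" and "&".

```php
<?php
// Grep-style scan over a wikitext dump read as plain text (a sketch).
// Assumes list lines start at the beginning of a physical line; the first
// line of each page follows a <text ...> tag and is missed by this sketch.

// Yields the lines of a file one at a time, without loading it into memory.
function readLines($path)
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($fh)) !== false) {
            yield rtrim($line, "\r\n");
        }
    } finally {
        fclose($fh);
    }
}

// Returns the ';' lines whose last non-blank character is a colon.
function findTrailingColonTerms($lines)
{
    $hits = [];
    foreach ($lines as $line) {
        if (preg_match('/^;.*:\s*$/', $line)) {
            $hits[] = $line;
        }
    }
    return $hits;
}
```

For example, count(findTrailingColonTerms(readLines('/path/to/dump.xml'))) gives the number of matching lines. For a .bz2 dump, PHP's compress.bzip2:// stream wrapper (bz2 extension) can be passed to fopen() instead of decompressing the file first.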
On 1/23/08, Carl Beckhorn cbeckhorn@fastmail.fm wrote:
The wikitext dumps are in XML format and can be parsed pretty easily as if they were plain text files.
Cool. Looks like the current dump is 3 GB though, is there a subset available?
Steve
Steve Bennett wrote:
On 1/23/08, Carl Beckhorn wrote:
The wikitext dumps are in XML format and can be parsed pretty easily as if they were plain text files.
Cool. Looks like the current dump is 3 GB though, is there a subset available?
Steve
1) Choose the current flavour to avoid having all history.
2) Get the dump of a smaller Wikipedia.
On 1/24/08, Platonides Platonides@gmail.com wrote:
- Choose the current flavour to avoid having all history.
- Get the dump of a smaller Wikipedia.
Yeah, I think in the end I will use a combination of a few smaller en projects and non-en Wikipedias.
Steve