I've just read the past couple of days of discussion, and would like to agree with Merlijn.
One of the points missed is that the pipe trick and many of the other "end cases" are actually pre-processed, not stored in the database.
The easy examples being:
- [[turkey (bird)|]] is stored as [[turkey (bird)|turkey]]
- [[stuff]]ing is stored as [[stuff|stuffing]]
Other such behaviors could be regularized, and not affect the existing articles. Some years back, I made some suggestions in this wise, but they were not accepted.
A case I was concerned with at the time was normalized pre-processing of [[stuff:]] versus [[:stuff]], and [[|stuff]] versus [[stuff|]], and their combinations -- [[:stuff (action)|]]. This is the kind of thing that could most easily be formalized.
In regularizing the grammar, think about how the back-end data could be normalized to a new grammar for editing, and then stored again in the back-end form. For example, the // and ** ideas we've talked about multiple times over the years. No reason that the database couldn't continue to store them as '' and '''. Or better as <i> and <b>!
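A minimal sketch of the round trip described above, assuming the proposed editing syntax is //text// for italics and **text** for bold. That syntax was only ever a proposal, and the regexes and function names here are illustrative, not MediaWiki code. It is deliberately naive: it ignores ''''' bold-italic, nesting and stray apostrophes, which is the ambiguity raised in the reply further down.

```python
import re

# Editing syntax -> storage. The lookbehinds skip any "//" preceded by ":",
# so the "//" in "http://" is never taken as an italic marker.
ITALIC_FRONT = re.compile(r"(?<!:)//(.+?)(?<!:)//")
BOLD_FRONT = re.compile(r"\*\*(.+?)\*\*")

# Storage -> editing syntax. Bold (''') is handled before italic ('')
# because '' is a prefix of '''.
BOLD_BACK = re.compile(r"'''(.+?)'''")
ITALIC_BACK = re.compile(r"''(.+?)''")


def to_storage(text: str) -> str:
    """Convert hypothetical //italic// and **bold** into '' and ''' wikitext."""
    text = BOLD_FRONT.sub(r"'''\1'''", text)
    return ITALIC_FRONT.sub(r"''\1''", text)


def to_editing(text: str) -> str:
    """Convert stored '' and ''' wikitext back into //italic// and **bold**."""
    text = BOLD_BACK.sub(r"**\1**", text)
    return ITALIC_BACK.sub(r"//\1//", text)
```

For simple input, to_editing(to_storage(x)) gives back x, but the stored form cannot always be mapped back unambiguously; that is the cost of keeping '' and ''' in the database.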
If we stick to just front-end parsing, the project might be doable in our lifetimes.
===
And as a final note for the computer scientists, remember that we often use LR(1) and LALR(1) grammars, but RL(1) is also possible! MW syntax has often seemed to me more like RL....
(Yes, back in university we were all required to write a parser -- a year-long project. I've written several for later projects, too. But university was a very long time ago.)
On 11/10/07, William Allen Simpson william.allen.simpson@gmail.com wrote:
I've just read the past couple of days of discussion, and would like to agree with Merlijn.
One of the points missed is that the pipe trick and many of the other "end cases" are actually pre-processed, not stored in the database.
The easy examples being:
- [[turkey (bird)|]] is stored as [[turkey (bird)|turkey]]
- [[stuff]]ing is stored as [[stuff|stuffing]]
Other such behaviors could be regularized, and not affect the existing articles. Some years back, I made some suggestions in this wise, but they were not accepted.
Because they tend to result in features that are hard to discover. Really, the pipe trick shouldn't be an on-save transform. About the only legitimate things that should be are substing (which is explicitly on-save) and signatures/timestamps.
On 11/11/07, William Allen Simpson william.allen.simpson@gmail.com wrote:
In regularizing the grammar, think about how the back-end data could be normalized to a new grammar for editing, and then stored again in the back-end form. For example, the // and ** ideas we've talked about multiple times over the years. No reason that the database couldn't continue to store them as '' and '''.
Heh, take an unambiguous syntax, and save it as an ambiguous syntax. That's genius...
Seriously though, the general idea is sound. IMHO it would apply equally to ISBN: it would make more sense to detect ISBN 123456789 and *at save time* (and possibly with a user warning) replace it with {{ISBN|123456789}}, which would make it clear in the code that it's treated specially.
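A minimal sketch of that save-time rewrite. The {{ISBN}} template name comes from the thread; the function name and the regex are assumptions (no check-digit or ISBN-10/13 length validation), and returning the substitution count is one possible way to support the "possibly with a user warning" step.

```python
import re
from typing import Tuple

# Hypothetical pre-save transform: a bare "ISBN <number>" is rewritten to an
# explicit {{ISBN|<number>}} template call before the text is stored.
# Simplified pattern: digits and hyphens, optionally ending in X.
ISBN_PATTERN = re.compile(r"\bISBN\s+(\d[\d-]*[\dXx])")


def isbn_pre_save(text: str) -> Tuple[str, int]:
    """Return the rewritten text and the number of replacements made.

    Text already using {{ISBN|...}} is left alone, because the pattern
    requires whitespace (not "|") after the ISBN prefix.
    """
    return ISBN_PATTERN.subn(r"{{ISBN|\1}}", text)
```

A non-zero count is the point where an editor could show the user what was changed.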
I think Simetrical is wrong that such "features...are hard to discover". How on earth would you ever discover the current ISBN behaviour?
Steve
On 11/11/07, Steve Bennett stevagewp@gmail.com wrote:
I think Simetrical is wrong that such "features...are hard to discover". How on earth would you ever discover the current ISBN behaviour?
By seeing an existing ISBN link, saying "Hey, how did they do that?", and looking at the page source. That's how people (at least techy people) normally learn how to use various languages, they do it by copying examples. If it got magically converted to a template, you would say "Oh, it's just a template", and use the template. Unless you happened to stumble across a specific discussion of the feature, you would never know about it.
There's a reason, you know, that talk pages tend to have a message at the top telling everyone about ~~~~.
On 11/12/07, Simetrical Simetrical+wikilist@gmail.com wrote:
By seeing an existing ISBN link, saying "Hey, how did they do that?", and looking at the page source. That's how people (at least techy people) normally learn how to use various languages, they do it by copying examples. If it got magically converted to a template, you would say "Oh, it's just a template", and use the template. Unless you happened to stumble across a specific discussion of the feature, you would never know about it.
If typing "ISBN xxx" is automagically converted to "{{ISBN|xxx}}" then there's no harm in people only knowing the latter form, is there...
The same goes for pipe tricks - ultimately all it can do is save you a few keystrokes. You don't get any new functionality by typing [[Foo (blah)|]] that you couldn't get just by typing [[Foo (blah)|Foo]].
There's a reason, you know, that talk pages tend to have a message at the top telling everyone about ~~~~.
That one's a bit different. There's no other way to produce the signature (short of manually typing your linked username and the date and time), and the transformed output bears no relation to the input.
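A simplified sketch of why signatures have to be a pre-save transform: the output depends on who saved the page and when, which the stored text could never recover later. ~~~, ~~~~ and ~~~~~ are MediaWiki's signature shortcuts; the function name and the exact link and timestamp formats below are assumptions loosely modelled on English Wikipedia defaults (real signatures are configurable per user).

```python
import re
from datetime import datetime, timezone


def expand_signatures(text: str, user: str, now: datetime) -> str:
    """Expand ~~~ (name), ~~~~ (name and timestamp), ~~~~~ (timestamp only).

    'now' must be timezone-aware; it is converted to UTC. Formats are
    simplified assumptions, not MediaWiki's exact output.
    """
    utc = now.astimezone(timezone.utc)
    signature = f"[[User:{user}|{user}]]"
    timestamp = f"{utc:%H:%M}, {utc.day} {utc:%B %Y} (UTC)"

    def replace(match: re.Match) -> str:
        count = len(match.group(0))
        if count == 3:
            return signature
        if count == 4:
            return f"{signature} {timestamp}"
        return timestamp  # five tildes

    # The greedy {3,5} quantifier consumes a whole run of three to five tildes.
    return re.sub(r"~{3,5}", replace, text)
```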
Steve
On 11/11/07, Steve Bennett stevagewp@gmail.com wrote:
If typing "ISBN xxx" is automagically converted to "{{ISBN|xxx}}" then there's no harm in people only knowing the latter form, is there...
The same goes for pipe tricks - ultimately all it can do is save you a few keystrokes. You don't get any new functionality by typing [[Foo (blah)|]] that you couldn't get just by typing [[Foo (blah)|Foo]].
Right, but if the functionality exists, we should not go out of our way to obscure it by pre-save transformation. There's no reason not to leave it in the page text itself.
That one's a bit different. There's no other way to produce the signature (short of manually typing your linked username and the date and time), and the transformed output bears no relation to the input.
Exactly why it's impossible in that case to leave it in the page text. There are no other excuses, I don't think.
On 11/12/07, Simetrical Simetrical+wikilist@gmail.com wrote:
Right, but if the functionality exists, we should not go out of our way to obscure it by pre-save transformation. There's no reason not to leave it in the page text itself.
This thread is about separating the logic into two places. "The functionality" could exist in the save logic, or the rendering logic, or conceivably both. I'm suggesting we move it strictly to the save logic. Typing ISBN xxx looks and feels like shorthand, rather than some genuine syntactic rule. Making it a save-time feature akin to autocorrect would be appropriate.
Steve
On 11/11/07, Steve Bennett stevagewp@gmail.com wrote:
This thread is about separating the logic into two places. "The functionality" could exist in the save logic, or the rendering logic, or conceivably both. I'm suggesting we move it strictly to the save logic. Typing ISBN xxx looks and feels like shorthand, rather than some genuine syntactic rule. Making it a save-time feature akin to autocorrect would be appropriate.
I disagree that shorthand features should not be saved in the wikitext, and I know at least a couple of other devs have expressed vaguely similar sentiments in the past.
On 11/12/07, Simetrical Simetrical+wikilist@gmail.com wrote:
I disagree that shorthand features should not be saved in the wikitext, and I know at least a couple of other devs have expressed vaguely similar sentiments in the past.
It might simplify the task of people who want to reuse Wikipedia content in other forms if we could keep the grammar as small as possible.
Steve
Simetrical wrote:
I disagree that shorthand features should not be saved in the wikitext, and I know at least a couple of other devs have expressed vaguely similar sentiments in the past.
I agree with that. All wikitext is "shorthand" for something which could otherwise be represented in HTML. Pre-save transforms should be limited.
-- Jim R. Wilson (jimbojw)
On Nov 11, 2007 6:21 PM, Steve Bennett stevagewp@gmail.com wrote:
On 11/12/07, Simetrical Simetrical+wikilist@gmail.com wrote:
I disagree that shorthand features should not be saved in the wikitext, and I know at least a couple of other devs have expressed vaguely similar sentiments in the past.
It might simplify the task of people who want to reuse Wikipedia content in other forms if we could keep the grammar as small as possible.
Steve
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 11/13/07, Jim Wilson wilson.jim.r@gmail.com wrote:
I agree with that. All wikitext is "shorthand" for something which could otherwise be represented in HTML. Pre-save transforms should be limited.
That's not the "shorthand" sense I mean. Think about any of the language features, whether it's italics, lists, links, __TOC__, parser functions, <gallery>, tables, whatever. They all work on the basis that someone deliberately typed some weird punctuation to tell the parser to treat the text differently.
ISBN is different: the parser deliberately tries to detect text that the user typed naturally ("ISBN 123456789" being the normal, unmarked formatting used in the real world) and treat it specially. It's not a real grammatical feature, it's a deliberate effort to achieve markup with no effort.
The only other feature I can think of that works that way is bare urls: http://foo.com
Anyway, it's not a major issue. There are bigger fish to fry.
Steve
Steve Bennett wrote:
ISBN is different: the parser deliberately tries to detect text that the user typed naturally ("ISBN 123456789" being the normal, unmarked formatting used in the real world) and treat it specially. It's not a real grammatical feature, it's a deliberate effort to achieve markup with no effort.
The only other feature I can think of that works that way is bare urls: http://foo.com
Anyway, it's not a major issue. There are bigger fish to fry.
Steve
It also works for RFCs.
And PMIDs. The RFC bit me and I had to disable it. In E. coli we have a gene named rfc, which caused all sorts of interesting problems when we tried to make wiki pages about it. It would be nice if these were less Easter-egg-like and more configurable for us non-WMF sites.
Come to think of it, upgrades probably turn this back on. Sigh.
JH
On Nov 13, 2007, at 7:05 AM, Platonides wrote:
Steve Bennett wrote:
ISBN is different: the parser deliberately tries to detect text that the user typed naturally ("ISBN 123456789" being the normal, unmarked formatting used in the real world) and treat it specially. It's not a real grammatical feature, it's a deliberate effort to achieve markup with no effort.
The only other feature I can think of that works that way is bare urls: http://foo.com
Anyway, it's not a major issue. There are bigger fish to fry.
Steve
It also works for RFCs.
=====================================
Jim Hu
Associate Professor
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4054
On 14/11/2007, Jim Hu jimhu@tamu.edu wrote:
And PMIDs. The RFC bit me and I had to disable it. In E. coli we have a gene named rfc, which caused all sorts of interesting problems when we tried to make wiki pages about it. It would be nice if these were less Easter-egg-like and more configurable for us non-WMF sites.
Come to think of it, upgrades probably turn this back on. Sigh.
Ouch! I just looked at that part of the parser - I hadn't realised the strings were hardcoded... they should be in the messages files like any other string...
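Both Jim's request and the point about hardcoded strings suggest treating the magic-link prefixes as configuration rather than parser code. A minimal sketch, assuming a per-site table that maps each prefix to a URL template, where None disables that prefix (for example RFC on a wiki with an rfc gene page). The table, the function name and the example.org URLs are hypothetical, not MediaWiki's implementation; the output uses the [url RFC 1234] form the current parser produces.

```python
import re
from typing import Dict, Optional

# Hypothetical per-site table: prefix -> URL template, or None to disable it.
# The URLs are illustrative placeholders, not real link targets.
DEFAULT_MAGIC_LINKS: Dict[str, Optional[str]] = {
    "ISBN": "https://books.example.org/isbn/{id}",
    "RFC": "https://rfc.example.org/rfc{id}",
    "PMID": "https://pubmed.example.org/{id}",
}


def render_magic_links(text: str, config: Dict[str, Optional[str]]) -> str:
    """Turn 'PREFIX 1234' into '[url PREFIX 1234]' for each enabled prefix.

    Simplified: identifiers are plain digits only (no hyphens or trailing X).
    """
    enabled = {prefix: url for prefix, url in config.items() if url is not None}
    if not enabled:
        return text
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, enabled)) + r") (\d+)\b")

    def link(match: re.Match) -> str:
        prefix, ident = match.group(1), match.group(2)
        url = enabled[prefix].format(id=ident)
        return f"[{url} {prefix} {ident}]"

    return pattern.sub(link, text)
```

A site that wants RFC off would pass dict(DEFAULT_MAGIC_LINKS, RFC=None); an upgrade would then have no reason to turn it back on, since the choice lives in site configuration.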
On 11/15/07, Jim Hu jimhu@tamu.edu wrote:
And PMIDs. The RFC bit me and I had to disable it. In E. coli we have a gene named rfc, which caused all sorts of interesting problems when we tried to make wiki pages about it. It would be nice if these were less Easter-egg-like and more configurable for us non-WMF sites.
Yep. There's all sorts of reasons it's a bad idea. Take a look at this page:
http://en.wikipedia.org/wiki/Private_network
Here the RFC magic link is working to its full extent. But notice:
- Every RFC is linked, even the ones that are used more than once
- The RFCs are automatically linked like this: [http:... RFC 1234], rather than RFC 1234[http:...] or even RFC 1234<ref>http...</ref>
Having a piece of hardcoded parser magic dictate style and presentation over the Manual of Style is crappy. And that's in a best-case scenario. We have lots of other templates that do a similar job ({{imdb}} for instance), so it makes very little sense to me to give ISBNs, RFCs and PMIDs (whatever they are) this special treatment.
</rant>
Our new parser will dutifully recognise them, of course. :)
Steve
On 11/15/07, Steve Bennett stevagewp@gmail.com wrote:
Yep. There's all sorts of reasons it's a bad idea. Take a look at this page:
http://en.wikipedia.org/wiki/Private_network
Here the RFC magic link is working to its full extent. But notice:
- Every RFC is linked, even the ones that are used more than once
- The RFCs are automatically linked like this: [http:... RFC 1234], rather than RFC 1234[http:...] or even RFC 1234<ref>http...</ref>
Having a piece of hardcoded parser magic dictate style and presentation over the Manual of Style is crappy. And that's in a best-case scenario. We have lots of other templates that do a similar job ({{imdb}} for instance), so it makes very little sense to me to give ISBNs, RFCs and PMIDs (whatever they are) this special treatment.
</rant>
Would anyone object if I just deleted this functionality, so people could use templates like for everything else? Or would that be a little too drastic? As a historical thing, this stuff was introduced well before templates, I believe, and was subsequently obsoleted by them.
Would anyone object if I just deleted this functionality, so people could use templates like for everything else? Or would that be a little too drastic? As a historical thing, this stuff was introduced well before templates, I believe, and was subsequently obsoleted by them.
I'm sure somebody somewhere would object. At the very least, you should run a bot over the WMF wikis switching any use of the feature to templates. Better yet, provide a conversion script so anyone can run it when updating their installations.
Simetrical wrote:
Would anyone object if I just deleted this functionality, so people could use templates like for everything else? Or would that be a little too drastic? As a historical thing, this stuff was introduced well before templates, I believe, and was subsequently obsoleted by them.
While esoteric, this feature is very nice and saves users lots of work. Removing it makes no sense and will just generate more edits, from bots and from people having to add such templates/links manually. At the very most, introduce a global variable that disables this; however, that seems too much effort for such an insignificant feature.
MinuteElectron.
On 11/17/07, Simetrical Simetrical+wikilist@gmail.com wrote:
Would anyone object if I just deleted this functionality, so people could use templates like for everything else? Or would that be a little too drastic? As a historical thing, this stuff was introduced well before templates, I believe, and was subsequently obsoleted by them.
I once created {{ISBN}} on en, which wrapped the ISBN magic word. It was deleted for redundancy. If this does happen, you should give people a bit of warning...
I can't agree with MinuteElectron that "While esoteric this feature is very nice and saves lots of work for users." How much work can be saved by eliminating 4 keystrokes every time you've looked up and typed in an ISBN?
Steve
Steve Bennett wrote:
users." How much work can be saved by eliminating 4 keystrokes every time you've looked up and typed in an ISBN
It is not necessarily the amount of work, but an inexperienced editor may neglect to put the ISBN template tag in at all, meaning there will either never be one, or someone else has to realise it and edit the article to add one. Many users do not know how to use templates, so if they are forced to use them for such basic functionality, many would either ignore the fact or just not investigate how to fix it. Regardless, why would one remove a feature from the parser? It makes no sense.
MinuteElectron.
On 11/17/07, MinuteElectron minuteelectron@googlemail.com wrote:
It is not necessarily the amount of work, but an inexperienced editor may neglect to put the ISBN template tag in at all, meaning there will either never be one, or someone else has to realise it and edit the
This is terrible justification. An experienced editor may forget to add:
- Sources
- Categories
- Assertion of notability
- Navbox
- Infobox
- Image
- Link to Commons category
- See also links
- Links to other articles in the text
- Stub template
- Footnotes
- Unit conversion templates
The software can't magically fix those. Forgetting to add the *link* to a correctly cited ISBN? Pfft. The least of our worries.
Regardless, why would one remove a feature from the parser? It makes no sense.
Because features that add complexity without a major benefit aren't good. Because you don't always want to link to absolutely every RFC and every ISBN.
Steve
Regardless, Brion's answer was "not at this time".
On 11/18/07, Simetrical Simetrical+wikilist@gmail.com wrote:
Regardless, Brion's answer was "not at this time".
Yep. It doesn't mean that *other* arguments in favour of that construct aren't lousy :) The reasons for retaining the construct are basically a) to avoid changing the grammar we're trying to define, b) to avoid breaking existing wikitext. Not c) because the construct itself is actually valuable.
Steve
William Allen Simpson wrote:
I've just read the past couple of days of discussion, and would like to agree with Merlijn.
One of the points missed is that the pipe trick and many of the other "end cases" are actually pre-processed, not stored in the database.
The easy examples being:
- [[turkey (bird)|]] is stored as [[turkey (bird)|turkey]]
- [[stuff]]ing is stored as [[stuff|stuffing]]
Only the first is actually expanded on save. http://es.wikipedia.org/w/index.php?diff=12747472&oldid=12747041&dif...
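The correction above marks exactly the split this thread is about: the pipe trick rewrites the stored text, while the link trail is only applied when rendering. A minimal sketch of both, assuming a simplified pipe trick that only strips a trailing parenthetical (MediaWiki's real rule handles more cases) and an English-style link trail of lowercase ASCII letters; the function names and HTML output are illustrative.

```python
import re

# Pre-save rule (rewrites stored text): [[Target (qualifier)|]].
PIPE_TRICK = re.compile(r"\[\[([^\[\]|]+)\|\]\]")
# Render-time rule (stored text untouched): unpiped [[target]] plus trailing letters.
LINK_WITH_TRAIL = re.compile(r"\[\[([^\[\]|]+)\]\]([a-z]*)")


def pipe_trick(text: str) -> str:
    """Fill in an empty pipe label by dropping a trailing '(...)' qualifier."""
    def expand(match: re.Match) -> str:
        target = match.group(1)
        label = re.sub(r"\s*\([^()]*\)$", "", target)
        return f"[[{target}|{label}]]"
    return PIPE_TRICK.sub(expand, text)


def render_links(text: str) -> str:
    """Render unpiped links to HTML, absorbing a lowercase link trail into the label."""
    def to_html(match: re.Match) -> str:
        target, trail = match.group(1), match.group(2)
        href = "/wiki/" + target.replace(" ", "_")
        return f'<a href="{href}">{target}{trail}</a>'
    return LINK_WITH_TRAIL.sub(to_html, text)
```

Only pipe_trick would run before saving; render_links leaves the stored [[stuff]]ing untouched, consistent with the observation above.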