nested definition lists

List overview All Threads
Download

newer

older

Stealing error recovery from HTML5...

"Watch me make mistakes"

Sumana Harihareswara

10 Nov 2011 10 Nov '11

2:34 p.m.

https://bugzilla.wikimedia.org/show_bug.cgi?id=6569 Gabriel mentioned that he'd like the list's input on this patch, regarding how we treat nested lists like ; bla : blub -- Sumana Harihareswara Volunteer Development Coordinator Wikimedia Foundation

Show replies by date

Gabriel Wicke

10 Nov 10 Nov

4 p.m.

Sumana, the regular

...

; bla : blub

is actually not the issue. More problematic are for example: ;; bla :: blub *; bla : blub or even the simple ;; bla Right now the behavior is quite inconsistent: http://www.mediawiki.org/wiki/User:GWicke/Definitionlists The bug discussing this is https://bugzilla.wikimedia.org/show_bug.cgi?id=6569 Treating '; bla : blub' as a tightly-bound special-case construct seems to me the simplest way to make this area more consistent while avoiding very ugly syntax. This would mean that *; bla : blub is treated as equivalent to *; bla *: blub and ;; bla :: blub is equivalent to ;; bla ;: :blub What are your preferences on this? Is any of these cases commonly used today? Gabriel

Trevor Parscal

6:34 p.m.

Can we deconstruct the current parser's processing steps and build a set of rules that must be followed? This stikes me as an area where the very few places where this kind of strange mixed style nesting is rare enough we may even be able to introduce a little bit of reform without much ill effect on the general body of wikitext out there. I think we need to get a dump of English Wikipedia and start using a simple PEG parser to scan through it looking for patterns and figuring out how often certain things are used - if ever. Ward Cunninham had a setup that could do this sort of thing on a complete en-wiki dump in like 10-15 minutes, and a fraction of the dump (still tens of thousands of article in size) in under a minute. We supposedly have access to him and his mad science laboratory - now would be a good time to get that going. - Trevor On Thu, Nov 10, 2011 at 8:00 AM, Gabriel Wicke <wicke(a)wikidev.net> wrote:

...

Sumana, the regular

; bla : blub

Gabriel Wicke

11 Nov 11 Nov

12:28 a.m.

Trevor,

...

Can we deconstruct the current parser's processing steps and build a set of rules that must be followed?

I think the commonly-used structures are quite clearly defined, but the behaviour of these strange permutations is quite unspecified. The parser output for the case reported in the bug already changed in the meantime..

...

I think we need to get a dump of English Wikipedia and start using a simple PEG parser to scan through it looking for patterns and figuring out how often certain things are used - if ever.

I just ran an en-wiki article dump through a zcat/tee/grep pipeline: pattern count example ------------------------------------------------------------------ ^ 548498738 (total number of lines) ^; 681495 ^;[^:]+: 153997 ; bla : blub ^[;:*#]+;[^:]+: 3817 *; bla : blub ^;; 2332 ^[:;*#]*;[^:]*:: 41 most probably ;:: ^[;:*#]*;[^:]+:: 17 ;; bla :: blub Nested definition lists are not exactly common. Lines starting with ';;' often appear as comments in code listings. The most common other application appears to be indentation and emphasis. Any change in the produced structure that keeps indentation and bolding should thus avoid breaking pages.

...

Ward Cunninham had a setup that could do this sort of thing on a complete en-wiki dump in like 10-15 minutes, and a fraction of the dump (still tens of thousands of article in size) in under a minute. We supposedly have access to him and his mad science laboratory - now would be a good time to get that going.

Will keep him in mind- we'll need to perform quite a few checks like these while tweaking the parser. A pipeline with two grep patterns and wc -l at the end ran just under 6 minutes on my notebook, so it is actually quite doable. The javascript parser would take quite a bit longer though ;) Cheers, Gabriel

Platonides

10 Nov 10 Nov

9:36 p.m.

...

What are your preferences on this? Is any of these cases commonly used today? Gabriel

I wasn't aware of the special nature of : in the middle of a line if it began with ; or : It's probably quite unused.

Amgine

9:48 p.m.

On 11-11-10 01:36 PM, Platonides wrote:

...

What are your preferences on this? Is any of these cases commonly used today? Gabriel

I wasn't aware of the special nature of : in the middle of a line if it began with ; or : It's probably quite unused.

; dt: dd is used. I've never seen the nested use, but have often wished for it. Amgine

4558

days inactive

4559

days old

wikitext-l@lists.wikimedia.org

Manage subscription

5 comments

5 participants

tags (0)

participants (5)

Amgine
Gabriel Wicke
Platonides
Sumana Harihareswara
Trevor Parscal