No, this is not about a wikitext parser. Rather something much simpler.
Have a look at [1] and you will see rules like:
  n in 0..1
  n is 2
  n mod 10 in 3..4,9 and n mod 100 not in 10..19,70..79,90..99
Long ago when I wanted to compare the plural rules of MediaWiki and CLDR I wrote a parser for the CLDR rule format. Unfortunately my implementation uses regular expressions and eval, which makes it unsuitable for production. Now, writing parsers is not my area of expertise, so can you please point me how to do this properly with PHP. Bonus points if it is also easily adaptable to JavaScript.
[1] http://unicode.org/repos/cldr-tmp/trunk/diff/supplemental/language_plural_ru...
-Niklas
Well, the syntax is:
condition       = and_condition ('or' and_condition)*
and_condition   = relation ('and' relation)*
relation        = is_relation | in_relation | within_relation | 'n' <EOL>
is_relation     = expr 'is' ('not')? value
in_relation     = expr ('not')? 'in' range_list
within_relation = expr ('not')? 'within' range_list
expr            = 'n' ('mod' value)?
range_list      = (range | value) (',' range_list)*
value           = digit+
digit           = 0|1|2|3|4|5|6|7|8|9
range           = value'..'value
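The grammar has only a handful of terminals: the variable n, a few keywords, numbers, and the '..' and ',' punctuation. As a rough sketch of how a hand-written lexer for it could look in JavaScript (the function name tokenize and the {type, value} token shape are assumptions made for this example, not part of CLDR or of any existing library):

```javascript
// Words that may appear in a rule, taken from the grammar above.
const KEYWORDS = new Set(['n', 'is', 'not', 'in', 'within', 'mod', 'and', 'or']);

// Hypothetical helper: turn a rule string into a flat list of tokens.
function tokenize(rule) {
  const tokens = [];
  // Sticky regex: optional whitespace, then a number, '..', ',' or a word.
  const pattern = /\s*(?:(\d+)|(\.\.)|(,)|([a-z]+))/y;
  const end = rule.trimEnd().length; // ignore trailing whitespace
  let pos = 0;
  while (pos < end) {
    pattern.lastIndex = pos;
    const m = pattern.exec(rule);
    if (m === null) {
      throw new SyntaxError('Unexpected input at offset ' + pos);
    }
    if (m[1] !== undefined) {
      tokens.push({ type: 'number', value: parseInt(m[1], 10) });
    } else if (m[2] !== undefined) {
      tokens.push({ type: '..' });
    } else if (m[3] !== undefined) {
      tokens.push({ type: ',' });
    } else if (KEYWORDS.has(m[4])) {
      tokens.push({ type: m[4] });
    } else {
      throw new SyntaxError('Unknown word: ' + m[4]);
    }
    pos = pattern.lastIndex;
  }
  return tokens;
}
```

Because the input is matched against a fixed set of token shapes and nothing is ever passed to eval, malformed or hostile rule strings end in a SyntaxError rather than in executed code.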
Would this one work? http://pear.php.net/package/PHP_ParserGenerator
Domas
On Jun 20, 2012, at 2:02 PM, Niklas Laxström wrote:
> [...] can you please point me how to do this properly with PHP. Bonus points if it is also easily adaptable to JavaScript.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 20/06/12 13:02, Niklas Laxström wrote:
> [...] can you please point me how to do this properly with PHP. Bonus points if it is also easily adaptable to JavaScript.
Have you considered using the `intl` PHP extension? It provides classes that support plural and number formatting from the CLDR. Out of the box :-)
That is of course going to need a lot of rewriting and rethinking of the translatewiki system, but it would definitely be a huge time saver in the long term.
On 20/06/12 21:02, Niklas Laxström wrote:
> [...] can you please point me how to do this properly with PHP. Bonus points if it is also easily adaptable to JavaScript.
For input which is guaranteed to be small, a recursive descent parser is a reasonable choice -- maybe not the fastest method, but easy to understand and fun to write. There's lots of useful reference material available with a web search, e.g.:
http://teaching.idallen.com/cst8152/98w/recursive_decent_parsing.html
-- Tim Starling
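As a rough illustration of the recursive descent approach, here is a minimal JavaScript sketch with one function per rule of the grammar posted earlier in the thread. The name compilePluralRule and its helpers are made up for this example. It also assumes that 'in' matches integers only while 'within' matches any number in the range, and it leaves out the bare `'n' <EOL>` alternative. Instead of eval it builds plain closures.

```javascript
// Hypothetical helper: compile a CLDR plural rule into a predicate n => boolean.
function compilePluralRule(rule) {
  // Lexing: numbers, '..', ',' and words; any other character becomes
  // a one-character token that the parser will reject.
  const tokens = rule.match(/\d+|\.\.|,|[a-z]+|\S/g) || [];
  let pos = 0;
  const peek = () => tokens[pos];
  function accept(tok) {
    if (peek() === tok) { pos++; return true; }
    return false;
  }
  function expect(tok) {
    if (!accept(tok)) throw new SyntaxError(`Expected '${tok}' but found '${peek()}'`);
  }
  // value = digit+
  function value() {
    const tok = tokens[pos++];
    if (tok === undefined || !/^\d+$/.test(tok)) {
      throw new SyntaxError(`Expected a number but found '${tok}'`);
    }
    return parseInt(tok, 10);
  }
  // range_list = (range | value) (',' range_list)*
  function rangeList() {
    const ranges = [];
    do {
      const low = value();
      const high = accept('..') ? value() : low; // a single value is a 1-element range
      ranges.push([low, high]);
    } while (accept(','));
    return ranges;
  }
  // expr = 'n' ('mod' value)?
  function expr() {
    expect('n');
    if (accept('mod')) {
      const m = value();
      return (n) => n % m;
    }
    return (n) => n;
  }
  // relation = is_relation | in_relation | within_relation
  function relation() {
    const operand = expr();
    if (accept('is')) {
      const negated = accept('not');
      const v = value();
      return (n) => (operand(n) === v) !== negated;
    }
    const negated = accept('not');
    const integersOnly = accept('in');
    if (!integersOnly) expect('within');
    const ranges = rangeList();
    return (n) => {
      const x = operand(n);
      const hit = (!integersOnly || Number.isInteger(x)) &&
        ranges.some(([low, high]) => x >= low && x <= high);
      return hit !== negated;
    };
  }
  // and_condition = relation ('and' relation)*
  function andCondition() {
    let test = relation();
    while (accept('and')) {
      const left = test, right = relation();
      test = (n) => left(n) && right(n);
    }
    return test;
  }
  // condition = and_condition ('or' and_condition)*
  function condition() {
    let test = andCondition();
    while (accept('or')) {
      const left = test, right = andCondition();
      test = (n) => left(n) || right(n);
    }
    return test;
  }
  const test = condition();
  if (pos < tokens.length) throw new SyntaxError(`Unexpected '${peek()}'`);
  return test;
}

const few = compilePluralRule('n mod 10 in 3..4,9 and n mod 100 not in 10..19,70..79,90..99');
console.log(few(3), few(13)); // true false
```

Each grammar rule maps onto one small function, so the same structure should carry over to PHP with closures more or less line by line.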
On 06/20/2012 01:02 PM, Niklas Laxström wrote:
> [...] can you please point me how to do this properly with PHP. Bonus points if it is also easily adaptable to JavaScript.
I like the ease of disambiguation in Parsing Expression Grammars (PEG). Most PEG parser generators use memoization to achieve a runtime linear in the input. I have no experience with PEG parser generators for PHP, but am using PEG.js for the Parsoid tokenizer with good results.
If you try a PHP PEG generator, then please let us know about your results!
Gabriel
On Wed, June 20, 2012 20:20, Gabriel Wicke wrote:
> [...] If you try a PHP PEG generator, then please let us know about your results!
A few links for the archive:
* https://en.wikipedia.org/wiki/Comparison_of_parser_generators
* https://github.com/hafriedlander/php-peg (triple licensed, under BSD, MPL and GPL by request)
* http://sourceforge.net/projects/lime-php/ (GPL licensed)
On Jun 20, 2012, at 1:02 PM, Niklas Laxström wrote:
> [...] can you please point me how to do this properly with PHP. Bonus points if it is also easily adaptable to JavaScript.
You may already know this, but santhosh is working on a parser[1] in JavaScript (as a node module, to be specific). I added a test suite to his repository. Ready to be expanded and built upon!
-- Krinkle
On Jun 21, 2012, at 7:13 AM, Krinkle wrote:
> [...] I added a test suite to his repository. [...]
Would be nice if there was an official test suite to use as input for it, so we don't have to maintain the test suite manually.
Also a useful link, the syntax specification: http://unicode.org/reports/tr35/#Language_Plural_Rules
-- Krinkle