Some of you wikitext workers might find this interesting as well ...
Erik
---------- Forwarded message ----------
From: Erik Moeller <erik(a)wikimedia.org>
Date: Fri, Dec 30, 2011 at 5:31 PM
Subject: In-browser offline wiki viewer
To: Using Wikimedia projects and MediaWiki offline
<offline-l(a)lists.wikimedia.org>
Text only for now, but very cool:
http://offline-wiki.googlecode.com/git/app.html
In modern browsers, this gives you access to the most popular WP
articles offline without any additional software.
(via NeilK)
--
Erik Möller
VP of Engineering and Product Development, Wikimedia Foundation
Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate
I pulled out most of the parser-y parts from the parserTests, leaving
behind just tests.
However, the parser is still a bit of a monster object, hence the
deliberately silly name, ParserThingy.
I'm trying to decompose it into a chain, roughly like:
1. wikiText -> tokenize -> tokens
2. tokens -> treeBuilder -> dom-tree
3. dom-tree -> serialize -> wikiDom or HTML
There's a "postprocess" step as well, but I think that makes sense as
part of step 2.
Each step should be individually testable. And the whole enchilada is
like a big composition of initialized objects, e.g.
  wikiTextToWikiDom = new wikiTextToSerialization(
      new wikiTextTokenizer( tokenConfig ),
      new domTreeBuilder( treeConfig ),
      new domTreeToWikiDom( serializationConfig )
  );
  var wikiDom = wikiTextToWikiDom( wikitext );
Just a query on what interfaces people would like:
I'm assuming exceptions are not a good idea, due to Node's async nature,
and there are certain constructs where we are explicitly async:
tokenizing can be streamed, and I assume that when we start doing
lookups to figure out what to do with templates, we'll need to go async
for seconds at a time.
I'm also assuming that 99.99% of the time we want a simple in-out
interface as described above. But for testing and debugging, we want to
instrument what's really going on. And we may want to pass control off
for a while when we bring template parsing into the mix. So that means
that either there are magic values, or there's some way to attach event
listeners to the serializer? Is it okay to attach event listeners to the
serializer without tying them to a specific pipeline of wikitext that's
finding its way through the code?
--
Neil Kandalgaonkar <neilk(a)wikimedia.org>
Hello,
you can now find some notes on the token transform framework and a few
(very rough) ideas for the next steps at
https://www.mediawiki.org/wiki/Future/Parser_development/Token_stream_trans…
Please review / sanity check / comment! Are there things not listed in
that page that we should keep in mind?
Gabriel
Hello,
so far the HTML5 parser integration seems to have turned out quite well.
180 parser tests are now passing, and most of the remaining ones are
about missing functionality in later stages of the parser pipeline.
As of this week, the produced HTML DOM can also be converted to WikiDom
(or close to it). A sample result of the [[en:Barack Obama]] article is
available at http://dev.wikidev.net/gabriel/tmp/obama.wikidom.txt. The
unoptimized parse to WikiDom without template expansions etc currently
takes about 35 seconds on my laptop.
The various moving parts of the setup (and how to try it out) are
described in https://www.mediawiki.org/wiki/Future/Parser_development.
In glorious ASCII, it might look roughly like this:
PEG wiki/HTML tokenizer (could also be any SAX-style parser)
  -> Token stream transformations
  -> HTML5 tree builder
  -> HTML DOM tree
  -> DOM Postprocessors +-> (X)HTML
                        +-> DOMConverter -> WikiDom -> Visual Editor
The tokenizer is built from a completely static grammar, and leaves all
configuration-dependent behavior to later stages. The most interesting bits
happen in token stream transformations, which are dispatched using a
registration mechanism by token type. The order of handlers can be
specified, and early handlers can abort further processing for a token.
Syntax-specific transformations on a token can register for early
processing, so that later transformations on a token can operate on a
normalized version of the token. MediaWiki's special quote handling for
italic/bold for example is implemented in a core extension that
registers handlers for 'quote', 'newline' and the special 'eof' token.
Lists and a simple version of the Cite extension are similarly
implemented. A general emulation of parser hook behavior on top of the
token stream is quite straightforward. Handlers can access both the
collected tokens between tags and the plain source text, based on source
positions noted in the tokens.
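The registration mechanism described above might look roughly like the
following sketch. The class and method names (TokenTransformDispatcher,
register, transform) and the numeric-rank ordering are my assumptions
for illustration, not the actual API:

```javascript
// Hedged sketch of a token transform dispatcher: handlers register by
// token type with an explicit rank, so the order of handlers can be
// specified and early handlers can abort further processing of a token.
class TokenTransformDispatcher {
  constructor() {
    this.handlers = {}; // token type -> list of { rank, handler }
  }

  // Lower rank runs earlier, so syntax-specific normalization can
  // register ahead of more generic transformations on the same token.
  register(type, rank, handler) {
    const list = this.handlers[type] || (this.handlers[type] = []);
    list.push({ rank, handler });
    list.sort((a, b) => a.rank - b.rank);
  }

  // Run all handlers for a token in rank order. A handler returns the
  // (possibly transformed) token, or null to abort further processing.
  transform(token) {
    for (const { handler } of this.handlers[token.type] || []) {
      token = handler(token);
      if (token === null) {
        return null; // an early handler swallowed the token
      }
    }
    return token;
  }
}
```

Under this scheme, something like the quote handling for italic/bold
would register handlers for 'quote', 'newline' and 'eof' tokens, with
early-running handlers emitting normalized tokens for later ones.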
The token transform dispatcher class is prepared for asynchronous
processing of tokens, which is already used in a synchronous fashion for
the back-reference behavior of the italic/bold extension. This ability
to overlap operations on multiple tokens will be very important for
template expansions. Doing template expansions on the token level makes
it possible to render unbalanced templates like the table start / row /
end combinations for viewing, while encapsulating those if the output is
destined for the visual editor. Template expansion is currently WIP.
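The overlapping of asynchronous operations on multiple tokens could be
sketched like this. Since template expansion is still WIP, this is only
a hypothetical illustration under my own assumptions (the token shapes,
the lookupTemplate callback, and the use of Promises are all invented
here):

```javascript
// Hedged sketch: tokens that need a template lookup are expanded
// concurrently, so slow fetches overlap instead of serializing the
// whole stream behind each one. All names are illustrative.
async function expandTokens(tokens, lookupTemplate) {
  // Kick off all lookups at once; plain tokens pass straight through.
  const expansions = tokens.map((token) =>
    token.type === 'template'
      ? lookupTemplate(token.name) // async, may take seconds
      : Promise.resolve(token)
  );
  // Resolution order is preserved per position, keeping the stream
  // in document order even when lookups finish out of order.
  return Promise.all(expansions);
}
```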
That's all for now; looking forward to your thoughts!
Gabriel
Here's the demo, where you can edit some canned texts (but not actual
Wikipedia articles, yet):
http://www.mediawiki.org/wiki/Special:VisualEditorSandbox
Post bugs here:
https://bugzilla.wikimedia.org/enter_bug.cgi?product=MediaWiki%20extensions…
And here's the blog post, which puts it more in context.
http://blog.wikimedia.org/2011/12/13/help-test-the-first-visual-editor-deve…
This new editor was mostly written by Trevor Parscal and Inez
Korczyński, although lots of others have contributed. I've sat a desk
away from them for a few months and I have to say I'm extremely
impressed with what they've put together. If you're expecting Google
Docs, we're not there yet. But the basics are starting to solidify, and
it's getting easier and easier to add cool features.
Hey MediaWiki developers, surely you're not going to let Trevor and Inez
have *all* the fun?
--
Neil Kandalgaonkar <neilk(a)wikimedia.org>