Dear Wikitext experts,
please check out Sztakipedia, a new wiki RTE, at:
http://pedia.sztaki.hu/ (please check the video first, and then the tool itself)
which aims to implement some of the visions you described here:
http://www.mediawiki.org/wiki/Future/Parser_plan (the RTE part)
Some background:
Sztakipedia did not start out as an editor for Wikipedia. It was meant
to be a web-based editor for UIMA-annotated rich content, supported by
background natural language processing.
The tool was functional by the end of 2010, and we wanted a popular
application to demonstrate its features, so we went on to apply it to
wiki editing.
To do that, we built some wiki-specific components:
- After evaluating many existing parsers, we created a new one in JavaCC
- Created a number of content helpers based on DBpedia, such as link
recommendation, infobox recommendation, and infobox editor help
- Integrated external resources to help with editing, such as the Book
Recommendation and the Yahoo-based category recommendation
Sztakipedia is currently in its alpha phase, with many showstoppers
remaining, such as handling cite references properly or editing templates
embedded in templates, etc.
I am aware that you are working on a new syntax, parser and RTE, and that
they will eventually become the official ones for wiki editing
(Sztakipedia is written in Java anyway).
However, I still think that there is much to learn from our project. We will
write a paper on the subject next month, and I would be honored if some
of you read and commented on it. The main contents will be:
- problematic parts of the current wikitext syntax that we struggled with
- usability tricks, like extracting the infobox pages to provide help
for the fields, or showing the abstracts of the articles to be linked
- recommendation and machine learning techniques to support the user, plus
the background theory
Our plan right now is to create an API for our recommendation services
and helpers, plus a MediaWiki JS plugin to bring their results into the
current wiki editor. This way I hope the results of this research -
which started out as rather theoretical - will be used in a real-world
scenario by at least a few people. I hope we will be able to
extend your planned new RTE in the same way in the future.
Please share your thoughts/comments/doubts about Sztakipedia with me.
Also, I wanted to ask a few things:
- Which helper feature do you think is the most wanted:
infobox/category/link recommendation? External data import from
Linked Open Data (like our current Book Recommender, which has
millions of book records in it)? Recommendation of field _values_ for
infoboxes based on the article text? Other?
- How do you measure the performance of a parser? I saw hints about some
300 parser test cases somewhere...
- Which is the best way to mash up external services to support the wiki
editor interface? (If you call an external REST service from JS in MediaWiki,
it runs into cross-site/same-origin restrictions, I'm afraid.)
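As an aside, one possible workaround - purely a sketch, with a made-up
service URL, parameters and response format - is JSONP, which the jQuery
bundled with MediaWiki can issue, provided the external service supports it:

    // Hypothetical call to an external recommendation service via JSONP.
    // The URL, parameters and response fields are invented for illustration.
    $.ajax( {
        url: 'http://recommender.example.org/api/suggest',
        dataType: 'jsonp', // JSONP sidesteps the same-origin restriction
        data: {
            title: mw.config.get( 'wgPageName' ),
            type: 'infobox'
        },
        success: function ( result ) {
            // feed result.suggestions into the editor interface here
        }
    } );

The other common option is a small server-side proxy on the wiki itself, so
that the browser only ever talks to the wiki's own domain.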
Thank you very much,
Best Regards
Mihály Héder
MTA Sztaki,
Budapest, Hungary
Hi all,
we've finally finished our work on our technical report about a Wikitext
Object Model (WOM) for MediaWiki's Wikitext. The report is about an XML
format (XWML) for data exchange and storage of Wikitext pages as well as
the WOM Java interfaces. These interfaces make it possible to access and
modify the semantic content of a Wikitext document in a unified way
without worrying about the concrete Wikitext syntax (while still being able
to preserve it).
More information and the report itself can be found on our Sweble blog:
http://sweble.org/2011/07/wom-an-object-model-for-mediawikis-wikitext
I've also added a page on the Sweble AST, XWML and our WOM in the
MediaWiki wiki:
http://www.mediawiki.org/wiki/Future/AST and
http://www.mediawiki.org/wiki/Future/AST/Sweble
On a related topic, we will present our paper on the Design and
Implementation of the Sweble Wikitext Parser at WikiSym 2011. For those
of you who want to take a peek before the conference, we’ve put a
pre-print version of the paper on our blog:
http://sweble.org/2011/07/design-and-implementation-of-the-sweble-wikitext-…
Cheers
Hannes
A quick summary of the last week in parser/visual editor work by Neil,
Inez, Trevor, and Brion. It would be especially great if people added
Parser Playground tests and TODO tasks towards a collaborative editor.
We released the Wikimedia annual plan, which includes: "Develop Visual
Editor. First opt-in user-facing production usage by December 2011, and
first small wiki default deployment by June 2012."
http://wikimediafoundation.org/wiki/2011-2012_Annual_Plan_Questions_and_Ans…
Neil's been thinking about the tasks necessary in realtime collaborative
editing:
http://www.mediawiki.org/wiki/Future/Real-time_collaboration/Tasks
Please add your ideas & experiences!
Brion added some Parser Playground tests:
http://www.mediawiki.org/w/index.php?title=Special:Code/MediaWiki/author/br…
Please add more!
Trevor's been adding to an SVN module working on the edit surface; his
commits:
http://www.mediawiki.org/wiki/Special:Code/MediaWiki/author/tparscal
Trevor's been pair programming with Inez, whose work has also been on
the editing surface in general:
http://www.mediawiki.org/wiki/Special:Code/MediaWiki/author/inez
Trevor also notes: "I am planning to get some information out this week,
including some visuals of where we are headed, information about where
we are, and some details about what got us here." So look out for that
email within the next few days.
--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
[cross-posted]
----- Original Message -----
> From: "Mark A. Hershberger" <mhershberger(a)wikimedia.org>
> I suppose these are all linked to the parser work that Brion & co are
> currently working on, but with the arrival of the new parser 6 months to a
> year or more away (http://www.mediawiki.org/wiki/Future/Parser_plan ),
> I'd like to get these sorts of parser issues sorted out now.
My particular hobby horse, the last time that {wikitext-l was really active /
I was heavily involved with it} (those two periods are nearly identical, but
not quite), was this question, which that wiki page does not seem to address,
though the Etherpad might. If not, I still think it's a question that's
fundamental to the implementation of a replacement parser, so I'm going to ask
it again, so that everyone's thinking about it as work progresses down that path:
How good is good enough?
How many pages is a replacement parser allowed to break, and still be
certified?
That is: what is the *real* spec for mediawikitext? If we say "the formal
grammar", then we are *guaranteed* to break some articles. That's the
"Right Answer" from up here at 40,000 feet, where I watch (having
the luxury of not being responsible in any way for any of this :-), but
it will involve breaking some eggs.
I bring this back up because, the last time we had this conversation, the
answer was "nope; the new parser will have to be bug-for-bug compatible
with the current one". Or something pretty close to that.
I just think this is a question -- and answer -- that people should be
slowly internalizing as we proceed down this path.
1) Formal Spec
2) Multiple Implementations
3) Test Suite
I don't think it's completely unreasonable that we might have a way to
grind articles against the current parser, and each new parser, and diff
the output. Something like that is the only way I can see that we *will*
be able to tell how close new parsers come, and on which constructs they
break (not that this means that I think The Wikipedia Corpus constitutes
a valid Test Suite :-).
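Purely as a sketch of what I mean - the parser command names below are
invented, not real tools:

    #!/usr/bin/env node
    // Hypothetical grinder: render every article in a directory with two
    // parser commands ("old-parser" and "new-parser" are made-up names)
    // and report which articles come out differently.
    var fs = require( 'fs' );
    var path = require( 'path' );
    var execFileSync = require( 'child_process' ).execFileSync;

    var dir = process.argv[2] || 'articles';
    var differing = 0;

    fs.readdirSync( dir ).forEach( function ( name ) {
        var file = path.join( dir, name );
        var oldHtml = execFileSync( 'old-parser', [ file ] ).toString();
        var newHtml = execFileSync( 'new-parser', [ file ] ).toString();
        if ( oldHtml !== newHtml ) {
            differing++;
            console.log( 'DIFFERS: ' + name );
        }
    } );
    console.log( differing + ' article(s) render differently' );

Run something like that over a dump and you get a crude, but honest,
compatibility number.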
Cheers,
-- jra
--
Jay R. Ashworth Baylink jra(a)baylink.com
Designer The Things I Think RFC 2100
Ashworth & Associates http://baylink.pitas.com 2000 Land Rover DII
St Petersburg FL USA http://photo.imageinc.us +1 727 647 1274
I've retooled the 'Parser Playground' gadget as an extension, which lets us
more easily edit it & keep a master copy up to date. The gadget on
mediawiki.org now loads the JS files from the extension, from an SVN checkout
on toolserver -- handy that! :)
The updated gadget integrates a little better into the WikiEditor toolbar
system, though it's still young and primitive. There's also now a primitive
in-place editing mode when using the PegParser: you can click on any
selectable node in either the preview or inspector panes and get a dialog
box with the reconstructed source of just the piece you clicked on.
When done editing, click OK and it re-parses and drops it back into the
updated document. Spiffy eh? After a while this'll get replaced with the
fancier editing surface systems that Trevor & Inez are working on, but this
gives something to poke in the meantime. ;)
Features, next todo steps, and screenshots on the extension's page:
http://www.mediawiki.org/wiki/Extension:ParserPlayground
Primary next steps will be getting some round-tripping test helpers in
there, making it a little easier to plug a third or customized parser in,
and getting automated tests running from the command line (probably using
node.js). And of course actually expanding templates will start making
things interesting. ;)
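To give a rough idea of what such a round-tripping helper might look like
(the module and function names below are invented, not the actual
ParserPlayground API):

    // Hypothetical node.js round-trip test: parse some wikitext samples,
    // serialize the resulting structure back to wikitext, and check that
    // the original source comes out unchanged.
    var assert = require( 'assert' );
    var parser = require( './peg-parser-stub' ); // invented module name

    var samples = [
        "'''bold''' and ''italic'' text",
        '* a list item\n* another one',
        '[[Main Page|a link]]'
    ];

    samples.forEach( function ( src ) {
        var tree = parser.parse( src );              // wikitext -> structure
        var roundTripped = parser.serialize( tree ); // structure -> wikitext
        assert.strictEqual( roundTripped, src, 'round trip changed: ' + src );
    } );
    console.log( samples.length + ' round trips OK' );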
The actual PEG grammar and the intermediate structure still need a lot more
serious work to go beyond these demo stages, but at the moment I'm more
actively looking at fleshing out the API between the parser/renderer and its
host environment.
-- brion vibber (brion @ pobox.com / brion @ wikimedia.org)
Say, while everybody's trying to figure out a formal grammar, have you had a look at Ward Cunningham's exploratory parsing kit? He gave me a demo at OSBridge, and it's a really handy tool. Basically, it's a web app with an asynchronous C backend. You paste a tentative PEG grammar into a textarea, and it runs through whatever corpus you want, showing you representative instances of how it does or does not match. He was running it against the full English Wikipedia on his laptop, and it took only half an hour or something—with results coming in as they were generated, of course.
Using that, they made a PEG-and-then-some implementation of MW syntax that parses darn near all of Wikipedia: https://github.com/AboutUs/kiwi/blob/master/src/syntax.leg. (I call it "PEG-and-then-some" because it does have a lot of callbacks which might interlock with and affect the rule matching—not sure.)
Cheers,
Erik
Dear all,
I have recently subscribed to this list and I wanted to introduce myself.
I have been working as a student in the 2011 edition of Google
Summer of Code on a MediaWiki parser [1] for the Mozilla Foundation.
My mentor is Erik Rose.
For this purpose, we use a Python PEG parser called Pijnu [2] and
implement a grammar for it [3]. This way, we parse the wikitext into
an abstract syntax tree that we will then transform to HTML or other
formats.
One of the advantages of Pijnu is the simplicity and readability of
the grammar definition [3]. It is not finished yet, but what we have
done so far seems very promising.
Please don't hesitate to give advice or feedback, or even to test it if you wish!
Best regards
[1] https://github.com/peter17/mediawiki-parser
[2] https://github.com/peter17/pijnu
[3] https://github.com/peter17/mediawiki-parser/blob/master/mediawiki.pijnu
--
Peter Potrowl
Hello,
Here is a short update about our progress on the editor.
This week I worked with Trevor and we've added the following functionality to
the editor demo (
http://public.inez.wikia-dev.com/wmf/wikidom/demos/es/index.html):
- displaying cursor after click
- moving cursor with arrows (left/up/right/down)
- typing text
- deleting text
Today we started working on mixed content - supporting the display of not only
raw text, but also text with annotations like bold, italic, images, etc. -
and we figured out that in order to do this we have to change the data
structure that we are using. We came up with an idea for this data structure,
which we are implementing and testing right now. After we confirm that it
works well, we will communicate its details to the rest of the team - probably
at the beginning of next week.
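Just to illustrate the kind of problem (this example is made up and is not
necessarily the structure we are implementing): one common way to represent
such mixed content is plain text plus offset-based annotations, e.g.:

    // Invented example: "Hello world" with "world" in bold and an image
    // node after it.
    var content = {
        text: 'Hello world',
        annotations: [
            { type: 'bold', start: 6, end: 11 }
        ],
        nodes: [
            { type: 'image', offset: 11, attributes: { src: 'Example.jpg' } }
        ]
    };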
Thanks,
Inez
Hello,
I would like to quickly introduce myself to this group.
I'm Inez Korczyński from Wikia (where one of my major projects was the Rich
Text Editor). Since last Friday, and for the next few months, I will be
working with the Visual Editor team. Mainly I'll be working on the frontend
JavaScript part, whose source you can see here:
http://svn.wikimedia.org/viewvc/mediawiki/trunk/parsers/wikidom/
Inez