Dear Wikitext experts,
please check out Sztakipedia, a new wiki RTE, at:
http://pedia.sztaki.hu/ (please check the video first, and then the tool itself)
which aims to implement some of the visions you described here:
http://www.mediawiki.org/wiki/Future/Parser_plan (the RTE part)
Some background:
Sztakipedia did not start out as an editor for Wikipedia. It was meant
to be a web-based editor for UIMA-annotated rich content, supported by
background natural language processing.
The tool was functional by the end of 2010, and we wanted a popular
application to demonstrate its features, so we went on to apply it to
wiki editing.
To do that, we built some wiki-specific components:
- After evaluating many existing parsers, we created a new one in JavaCC
- Created a number of content helpers based on DBpedia, such as link
recommendation, infobox recommendation, and infobox editor help
- Integrated external resources to help with editing, such as the Book
Recommendation and the Yahoo-based category recommendation
Sztakipedia is currently in its alpha phase, with many showstoppers
remaining, such as handling cite references properly or editing templates
embedded in templates, etc.
I am aware that you are working on a new syntax, parser and RTE, and that
they will eventually become the official ones for wiki editing
(Sztakipedia is written in Java anyway).
However, I still think that there is much to learn from our project. We will
write a paper on the subject next month, and I would be honored if some
of you read and commented on it. The main contents will be:
- problematic parts of the current wikitext syntax that we struggled with
- usability tricks, like extracting the infobox pages to provide help
for the fields, or showing the abstracts of the articles to be linked
- recommendation and machine learning techniques to support the user, plus
the background theory
Our plan right now is to create an API for our recommendation services
and helpers, plus a MediaWiki JS plugin to bring their results into the
current wiki editor. This way I hope the results of this research -
which started out as rather theoretical - will be used in a real-world
scenario by at least a few people. I hope we will be able to
extend your planned new RTE in the same way in the future.
Please share your thoughts/comments/doubts about Sztakipedia with me.
Also, I wanted to ask a few things:
- Which helper feature do you think is the most wanted:
infobox/category/link recommendation? External data import from
Linked Open Data (like our current Book Recommender, which has
millions of book records in it)? Recommendation of field _values_ for
infoboxes based on the article text? Other?
- How do you measure the performance of a parser? I saw hints about some
300 parser test cases somewhere...
- Which is the best way to mash up external services to support the wiki
editor interface? (If you call an external REST service from JS in MediaWiki,
it runs into cross-site/same-origin restrictions, I'm afraid.)
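As an aside, one possible workaround - purely a sketch, with a made-up
service URL, parameters and response format - is JSONP, which the jQuery
bundled with MediaWiki can issue, provided the external service supports it:

    // Hypothetical call to an external recommendation service via JSONP.
    // The URL, parameters and response fields are invented for illustration.
    $.ajax( {
        url: 'http://recommender.example.org/api/suggest',
        dataType: 'jsonp', // JSONP sidesteps the same-origin restriction
        data: {
            title: mw.config.get( 'wgPageName' ),
            type: 'infobox'
        },
        success: function ( result ) {
            // feed result.suggestions into the editor interface here
        }
    } );

The other common option is a small server-side proxy on the wiki itself, so
that the browser only ever talks to the wiki's own domain.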
Thank you very much,
Best Regards
Mihály Héder
MTA Sztaki,
Budapest, Hungary
Hi all,
we've finally finished our work on our technical report about a Wikitext
Object Model (WOM) for MediaWiki's Wikitext. The report is about an XML
format (XWML) for data exchange and storage of Wikitext pages as well as
the WOM Java interfaces. These interfaces make it possible to access and
modify the semantic content of a Wikitext document in a unified way
without worrying about the concrete Wikitext syntax (while still being able
to preserve it).
More information and the report itself can be found on our Sweble blog:
http://sweble.org/2011/07/wom-an-object-model-for-mediawikis-wikitext
I've also added a page on the Sweble AST, XWML and our WOM in the
MediaWiki wiki:
http://www.mediawiki.org/wiki/Future/AST and
http://www.mediawiki.org/wiki/Future/AST/Sweble
On a related topic, we will present our paper on the Design and
Implementation of the Sweble Wikitext Parser at WikiSym 2011. For those
of you who want to take a peek before the conference, we’ve put a
pre-print version of the paper on our blog:
http://sweble.org/2011/07/design-and-implementation-of-the-sweble-wikitext-…
Cheers
Hannes
A quick summary of the last week in parser/visual editor work by Neil,
Inez, Trevor, and Brion. It would be especially great if people added
Parser Playground tests and TODO tasks towards a collaborative editor.
We released the Wikimedia annual plan, which includes: "Develop Visual
Editor. First opt-in user-facing production usage by December 2011, and
first small wiki default deployment by June 2012."
http://wikimediafoundation.org/wiki/2011-2012_Annual_Plan_Questions_and_Ans…
Neil's been thinking about the tasks necessary in realtime collaborative
editing:
http://www.mediawiki.org/wiki/Future/Real-time_collaboration/Tasks
Please add your ideas & experiences!
Brion added some Parser Playground tests:
http://www.mediawiki.org/w/index.php?title=Special:Code/MediaWiki/author/br…
Please add more!
Trevor's been adding to an SVN module working on the edit surface; his
commits:
http://www.mediawiki.org/wiki/Special:Code/MediaWiki/author/tparscal
Trevor's been pair programming with Inez, whose work has also been on
the editing surface in general:
http://www.mediawiki.org/wiki/Special:Code/MediaWiki/author/inez
Trevor also notes: "I am planning to get some information out this week,
including some visuals of where we are headed, information about where
we are, and some details about what got us here." So look out for that
email within the next few days.
--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
[cross-posted]
----- Original Message -----
> From: "Mark A. Hershberger" <mhershberger(a)wikimedia.org>
> I suppose these are all linked to the parser work that Brion & co are
> currently working on, but with the arrival of the new parser 6 months to a
> year or more away (http://www.mediawiki.org/wiki/Future/Parser_plan ),
> I'd like to get these sorts of parser issues sorted out now.
My particular hobby horse, the last time that {wikitext-l was really active /
I was heavily involved with it} (those two periods are nearly identical, but
not quite), was this question, which that wiki page does not seem to address,
though the Etherpad might. If not, I still think it's a question that's
fundamental to the implementation of a replacement parser, so I'm going to ask
it again, so that everyone's thinking about it as work progresses down that path:
How good is good enough?
How many pages is a replacement parser allowed to break, and still be
certified?
That is: what is the *real* spec for mediawikitext? If we say "the formal
grammar", then we are *guaranteed* to break some articles. That's the
"Right Answer" from up here at 40,000 feet, where I watch (having
the luxury of not being responsible in any way for any of this :-), but
it will involve breaking some eggs.
I bring this back up because, the last time we had this conversation, the
answer was "nope; the new parser will have to be bug-for-bug compatible
with the current one". Or something pretty close to that.
I just think this is a question -- and answer -- that people should be
slowly internalizing as we proceed down this path.
1) Formal Spec
2) Multiple Implementations
3) Test Suite
I don't think it's completely unreasonable that we might have a way to
grind articles against the current parser, and each new parser, and diff
the output. Something like that is the only way I can see that we *will*
be able to tell how close new parsers come, and on which constructs they
break (not that this means that I think The Wikipedia Corpus constitutes
a valid Test Suite :-).
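Purely as a sketch of what I mean - the parser command names below are
invented, not real tools:

    #!/usr/bin/env node
    // Hypothetical grinder: render every article in a directory with two
    // parser commands ("old-parser" and "new-parser" are made-up names)
    // and report which articles come out differently.
    var fs = require( 'fs' );
    var path = require( 'path' );
    var execFileSync = require( 'child_process' ).execFileSync;

    var dir = process.argv[2] || 'articles';
    var differing = 0;

    fs.readdirSync( dir ).forEach( function ( name ) {
        var file = path.join( dir, name );
        var oldHtml = execFileSync( 'old-parser', [ file ] ).toString();
        var newHtml = execFileSync( 'new-parser', [ file ] ).toString();
        if ( oldHtml !== newHtml ) {
            differing++;
            console.log( 'DIFFERS: ' + name );
        }
    } );
    console.log( differing + ' article(s) render differently' );

Run something like that over a dump and you get a crude, but honest,
compatibility number.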
Cheers,
-- jra
--
Jay R. Ashworth Baylink jra(a)baylink.com
Designer The Things I Think RFC 2100
Ashworth & Associates http://baylink.pitas.com 2000 Land Rover DII
St Petersburg FL USA http://photo.imageinc.us +1 727 647 1274
I've retooled the 'Parser Playground' gadget as an extension, which lets us
more easily edit it & keep a master copy up to date. The gadget on
mediawiki.org now loads the JS files from the extension, from an SVN checkout
on toolserver -- handy that! :)
The updated gadget integrates a little better into the WikiEditor toolbar
system, though it's still young and primitive. There's also now a primitive
in-place editing mode when using the PegParser: you can click on any
selectable node in either the preview or inspector panes and get a dialog
box with the reconstructed source of just the piece you clicked on.
When done editing, click OK and it re-parses and drops it back into the
updated document. Spiffy eh? After a while this'll get replaced with the
fancier editing surface systems that Trevor & Inez are working on, but this
gives something to poke in the meantime. ;)
Features, next todo steps, and screenshots on the extension's page:
http://www.mediawiki.org/wiki/Extension:ParserPlayground
Primary next steps will be getting some round-tripping test helpers in
there, making it a little easier to plug a third or customized parser in,
and getting automated tests running from the command line (probably using
node.js). And of course actually expanding templates will start making
things interesting. ;)
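To give a rough idea of what such a round-tripping helper might look like
(the module and function names below are invented, not the actual
ParserPlayground API):

    // Hypothetical node.js round-trip test: parse some wikitext samples,
    // serialize the resulting structure back to wikitext, and check that
    // the original source comes out unchanged.
    var assert = require( 'assert' );
    var parser = require( './peg-parser-stub' ); // invented module name

    var samples = [
        "'''bold''' and ''italic'' text",
        '* a list item\n* another one',
        '[[Main Page|a link]]'
    ];

    samples.forEach( function ( src ) {
        var tree = parser.parse( src );              // wikitext -> structure
        var roundTripped = parser.serialize( tree ); // structure -> wikitext
        assert.strictEqual( roundTripped, src, 'round trip changed: ' + src );
    } );
    console.log( samples.length + ' round trips OK' );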
The actual PEG grammar and the intermediate structure still need a lot more
serious work to go beyond these demo stages, but at the moment I'm more
actively looking at fleshing out the API between the parser/renderer and its
host environment.
-- brion vibber (brion @ pobox.com / brion @ wikimedia.org)
Say, while everybody's trying to figure out a formal grammar, have you had a look at Ward Cunningham's exploratory parsing kit? He gave me a demo at OSBridge, and it's a really handy tool. Basically, it's a web app with an asynchronous C backend. You paste a tentative PEG grammar into a textarea, and it runs through whatever corpus you want, showing you representative instances of how it does or does not match. He was running it against the full English Wikipedia on his laptop, and it took only half an hour or something—with results coming in as they were generated, of course.
Using that, they made a PEG-and-then-some implementation of MW syntax that parses darn near all of Wikipedia: https://github.com/AboutUs/kiwi/blob/master/src/syntax.leg. (I call it "PEG-and-then-some" because it does have a lot of callbacks which might interlock with and affect the rule matching—not sure.)
Cheers,
Erik
Dear all,
I have recently subscribed to this list and I wanted to introduce myself.
I have been working as a student in the 2011 edition of Google
Summer of Code on a MediaWiki parser [1] for the Mozilla Foundation.
My mentor is Erik Rose.
For this purpose, we use a Python PEG parser called Pijnu [2] and
implement a grammar for it [3]. This way, we parse the wikitext into
an abstract syntax tree that we will then transform to HTML or other
formats.
One of the advantages of Pijnu is the simplicity and readability of
the grammar definition [3]. It is not finished yet, but what we have
done so far seems very promising.
Please don't hesitate to give advice or feedback, or even to test it if you wish!
Best regards
[1] https://github.com/peter17/mediawiki-parser
[2] https://github.com/peter17/pijnu
[3] https://github.com/peter17/mediawiki-parser/blob/master/mediawiki.pijnu
--
Peter Potrowl
Hello,
Here is a short update about our progress on the editor.
This week I worked with Trevor and we've added the following functionality to
the editor demo (
http://public.inez.wikia-dev.com/wmf/wikidom/demos/es/index.html):
- displaying cursor after click
- moving cursor with arrows (left/up/right/down)
- typing text
- deleting text
Today we started working on mixed content - supporting the display of not only
raw text, but also text with annotations like bold, italic, images, etc. -
and we figured out that in order to do this we have to change the data
structure that we are using. We came up with an idea for this data structure,
which we are implementing and testing right now. After we confirm that it
works well, we will communicate its details to the rest of the team - probably
at the beginning of next week.
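Just to illustrate the kind of problem (this example is made up and is not
necessarily the structure we are implementing): one common way to represent
such mixed content is plain text plus offset-based annotations, e.g.:

    // Invented example: "Hello world" with "world" in bold and an image
    // node after it.
    var content = {
        text: 'Hello world',
        annotations: [
            { type: 'bold', start: 6, end: 11 }
        ],
        nodes: [
            { type: 'image', offset: 11, attributes: { src: 'Example.jpg' } }
        ]
    };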
Thanks,
Inez
Hello,
I would like to quickly introduce myself to this group.
I'm Inez Korczyński from Wikia (where one of my major projects was the Rich
Text Editor). Since last Friday, and for the next few months, I will be
working with the Visual Editor team. Mainly I'll be working on the frontend
JavaScript part, whose source you can see here:
http://svn.wikimedia.org/viewvc/mediawiki/trunk/parsers/wikidom/
Inez