Dear Wikitext experts,
please, check out Sztakipedia, a new Wiki RTE at:
http://pedia.sztaki.hu/ (please check the video first, and then the tool itself)
which aims at implementing some of the Visions you described here:
http://www.mediawiki.org/wiki/Future/Parser_plan (the RTE part)
Some background:
Sztakipedia did not start out as an editor for Wikipedia. It was meant
to be a web-based editor for UIMA-annotated rich content, supported
by natural language background processing.
The tool was functional by the end of 2010, and we wanted a popular
application to demonstrate its features, so we went on to apply it to
wiki editing.
To do that, we have built some wiki-specific pieces:
-After checking out many existing parsers, we created a new one in JavaCC
-Created lots of content helpers based on DBpedia, like the link
recommendation, infobox recommendation and infobox editor help
-Integrated external resources to help editing, like the Book
Recommendation or the Yahoo-based category recommendation
Sztakipedia is currently in its alpha phase, with many show-stoppers
remaining, such as handling cite references properly or editing
templates embedded in templates, and so on.
I am aware that you are working on a new syntax, parser and RTE, and
that they will eventually become the official ones for Wiki editing
(Sztakipedia is in Java anyway).
However, I still think that there is much to learn from our project. We will
write a paper on the subject next month, and I would be honored if some
of you read and commented on it. The main contents will be:
-problematic parts of the current wikitext syntax that we struggled with
-usability tricks, like extracting the infobox pages to provide help
for the fields, or showing the abstracts of the articles to be linked
-recommendations and machine learning to support the user, plus the background theory
Our plan right now is to create an API for our recommendation services
and helpers, plus a MediaWiki JS plugin to bring their results into the
current wiki editor. This way I hope the results of this research -
which started out as a rather theoretical one - will be used in a
real-world scenario by at least a few people. I hope we will be able to
extend your planned new RTE the same way in the future.
Please, share with me your thoughts/comments/doubts about Sztakipedia.
Also, I wanted to ask a few things:
-Which helper feature do you consider the most wanted:
infobox/category/link recommendation? External data import from
Linked Open Data (like our Book Recommender right now, which has
millions of book records in it)? Field _value_ recommendation for
infoboxes from the text? Something else?
-How do you measure the performance of a parser? I saw hints about some
300 parser test cases somewhere...
-What is the best way to mash up external services to support the wiki
editor interface? (If you call an external REST service from JS in
MediaWiki, it runs into the same-origin policy's cross-site restrictions, I'm afraid.)
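For instance, something like the following JSONP-style call seems to work
when the external service cooperates, but I am not sure it is the
recommended approach. The endpoint URL and response fields below are made
up for illustration; this is not a real Sztakipedia API:

// Hypothetical example: calling an external recommendation service from a
// MediaWiki gadget or user script via JSONP, which sidesteps the
// same-origin policy because the request goes out through a <script> tag.
$.ajax( {
    url: 'http://recommender.example.org/api/links', // made-up endpoint
    dataType: 'jsonp',
    data: { text: wgTitle }, // wgTitle: standard MediaWiki JS global
    success: function ( result ) {
        // 'suggestions' is an assumed field name in the made-up response
        $.each( result.suggestions || [], function ( i, s ) {
            mw.log( 'Suggested link: ' + s );
        } );
    }
} );

The other options I can think of are a server-side proxy installed on the
wiki, or CORS headers on our service, which only newer browsers honor.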
Thank you very much,
Best Regards
Mihály Héder
MTA Sztaki,
Budapest, Hungary
This past Wednesday, I held a triage focused on issues from the
Wikibooks and Wikisource projects.
** https://bugzilla.wikimedia.org/18861 - Search should give transcluded
text for the page
First, a confession. I'm not an experienced Wikipedian. So when I
saw this bug, I didn't understand the benefit. "Wouldn't you
just need to add namespaces to the search index?" I thought.
Luckily, Roan was in this triage and offered a simple use case:
Wiktionary does crazy things like
{{buildATableOfAllInflections|word|inflectioncase}} — which will
produce a table of all inflections of "word", based on which case
applies to it. So then if you search for an inflection of "word",
say "words", you won't find it.
With the help of this explanation, I was able to understand the
usefulness and what was needed. MWSearch needs to index *expanded*
Wikitext rather than just raw Wikitext.
This would probably also fix that annoying bug where incategory:foo
queries don't work properly with categories added by templates.
This has been proposed as a Summer of Code idea for 2012.
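For anyone who, like me, wanted to see what "expanded" means concretely,
the API's existing expandtemplates action already does this kind of
expansion on demand. A quick illustration from a gadget or script -- using
Roan's made-up template name, not anything MWSearch actually calls:

// Illustration only: fetch the expanded form of some wikitext, so that the
// template call comes back as the full inflection table rather than the
// raw {{...}} invocation.
$.getJSON( wgScriptPath + '/api.php', {
    action: 'expandtemplates',
    format: 'json',
    text: '{{buildATableOfAllInflections|word|inflectioncase}}'
}, function ( data ) {
    var expanded = data.expandtemplates['*'];
    // An indexer fed 'expanded' instead of the raw wikitext would see
    // "words" and the other inflected forms as plain text.
    mw.log( expanded );
} );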
** https://bugzilla.wikimedia.org/28277 - Error when importing pages from
English Wikipedia to Portuguese Wikibooks
This strange bug caused problems on the Portuguese Wikibooks project,
and we were able to reproduce it during triage. I put Helder.wiki's
steps for reproducing this on the bug and hope to find a developer to
work on it soon.
** https://bugzilla.wikimedia.org/189 - please find a solution for a
music module
After some discussion, I decided to close this bug (which has gotten
over 115 comments) and focus any new effort on action items derived
from it, like "Make [[mw:Extension:LilyPond]] safe against DoS attacks"
(https://bugzilla.wikimedia.org/29630).
Sumana is already pointing to this issue as a possible area for volunteers
to work on.
Side note: prior to this triage, the LilyPond Extension existed only
on a MediaWiki page. After the triage, I committed the LilyPond code
to SVN and, almost immediately, it began getting valuable reviews and
updates. http://hexm.de/80
I think this really shows the value of our code review process —
especially as we've improved it over the past year or so.
** https://bugzilla.wikimedia.org/27256 - Correcting content page count at
en.wikibooks and pt.wikibooks
It looks like we spent a bit of time discussing this bug without any
of us being aware of 1.18's new $wgArticleCountMethod. I've updated
the bug with the necessary information.
** https://bugzilla.wikimedia.org/22911 - Install
extension:SubpageSortkey on wikibooks
Bawolff has created an extension to solve Helder.wiki's original
request ("Default 'sort key' for namespaces with subpages should be
customisable"). At this point it simply needs to be reviewed and deployed.
** https://bugzilla.wikimedia.org/15071 - Wikibooks/Wikisource needs
means to associate separate pages with books.
** https://bugzilla.wikimedia.org/2308 - ability to watch "bundles" of
sub-pages
These two bugs and Raylton's Extension:BookManager revolved around
these projects' desire to treat books as entities that can be
manipulated in the same way as wiki pages. They'd like the ability to
watch, delete, or move books, as well as have pages like
Special:RandomBook.
Adding some of these features (watching, for example) to all pages in
a category might help admins in other projects besides these.
** https://bugzilla.wikimedia.org/30666 - Show subpages on page deletion
As Bawolff said, this looks like a sane feature request in general, not
just for Wikibooks. Adrignola pointed to a couple of gadgets that enable
subpage deletion, but the gadgets didn't provide a clean way to undelete
subpages en masse.
This sparked a discussion of some other enhancements that would be
good to have, for example the ability to watch all articles in a
category (https://bugzilla.wikimedia.org/1710).
** https://bugzilla.wikimedia.org/26881 - noinclude tag breaks Proofread
under Internet Explorer
This was on the wishlist for Wikibooks and included a patch. I
committed it (http://mediawiki.org/wiki/Special:Code/MediaWiki/98422).
** https://bugzilla.wikimedia.org/12130 - Edit form eats heading CRs
(leading blank newlines/whitespace) on save/preview
This bug keeps popping up and, while there are work-arounds, the
behavior is non-intuitive. MediaWiki erases the first (and only the
first) blank line each time you click submit or preview. After
everyone in the triage meeting confirmed this, I showed the problem to
Krinkle, who agreed that this should be fixed and left a comment with
an idea of how to fix it.
Next Triage: October 5th -- focus on Fund-raising issues http://hexm.de/81
Bug Triage calendar: http://hexm.de/TriageCal
--
Mark A. Hershberger
Bugmeister
Wikimedia Foundation
mhershberger(a)wikimedia.org
717.271.1084
I'm starting to take some more time away from general MediaWiki 1.18
development and get back to the JavaScript back-end parser
experiments[1] that'll power the initial testing of the visual editor's work
on existing pages.
[1] http://www.mediawiki.org/wiki/Extension:ParserPlayground
I've made some progress[2] on in-tree template expansion, modeled on parts
of PPFrame::expand[3], Parser::argSubstitution[4] and
Parser::braceSubstitution[5] from the current MediaWiki parser.
[2] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/98393
[3]
http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/parser/Prep…
[4]
http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/parser/Pars…
[5]
http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/parser/Pars…
There are two key differences between this in-progress code and how the
stock parser does it:
1) The stock parser expands a preprocessor node tree as it flattens it into
another string (e.g. for later parsing steps on the compound markup) -- the
experimental parser produces another node tree instead.
Later-stage parsing will deal with re-assembling start and end tokens when
producing HTML or other output, but within the tree structure instead of a
flat stream of string tokens -- that'll let the output stage mark where each
piece of output came from in the input, so it can be hooked up to editing
activation.
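As a toy illustration (not the actual ParserPlayground node format, which
is still changing), the difference is roughly:

// Toy illustration only -- not the real data structure.
var src = "Hello [[world]] {{Foo}}";

// Stock-parser style: expanding {{Foo}} flattens everything back into a
// string for the later parsing passes.
var flattened = "Hello [[world]] foo's expanded wikitext";

// Tree style: expansion produces another node tree, and (hypothetical)
// source offsets let later stages map each piece of output back to the
// wikitext that produced it.
var expanded = {
    type: 'root',
    children: [
        { type: 'text',     text: 'Hello ',  srcStart: 0,  srcEnd: 6 },
        { type: 'link',     target: 'world', srcStart: 6,  srcEnd: 15 },
        { type: 'text',     text: ' ',       srcStart: 15, srcEnd: 16 },
        { type: 'template', name: 'Foo',     srcStart: 16, srcEnd: 23,
          children: [ { type: 'text', text: "foo's expanded wikitext" } ] }
    ]
};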
2) Operations like fetching a template from the network are inherently
asynchronous in a JavaScript environment, so the iteration loop uses a
closure function to 'suspend' itself, calling back into the function when
the async operation completes.
Currently this is fairly naive and suspends the entire iterator, meaning
that all work gets serialized and every network round-trip adds wait
time. A smarter next step could be to keep iterating over the other nodes,
then come back to the one that was waiting once its data is ready -- this
could be a win when working with many templates under high network latency.
(Another win could be to pre-fetch things you know you're going to need, in bulk!)
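In skeleton form, the naive suspension is roughly this shape (a simplified
sketch, not the committed code; fetchTemplateAsync stands in for whatever
does the actual network fetch):

// Walk child nodes in order; when one needs a template fetched over the
// network, return out of the loop and resume from the same index in the
// fetch callback. Everything after that node waits.
function expandNodes( nodes, fetchTemplateAsync, onDone ) {
    var out = [];
    function step( i ) {
        while ( i < nodes.length ) {
            var node = nodes[i];
            if ( node.type === 'template' ) {
                // Suspend: the closure over i lets the callback pick up
                // exactly where the loop left off.
                fetchTemplateAsync( node.name, function ( expansion ) {
                    out.push( expansion );
                    step( i + 1 );
                } );
                return;
            }
            out.push( node );
            i++;
        }
        onDone( out );
    }
    step( 0 );
}

The closure is what keeps the 'suspend' cheap -- no thread or continuation
machinery, just a callback that resumes the loop -- but as noted it
serializes everything.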
I'm actually a bit intrigued by the idea of more aggressive asynchronous
work on the PHP-side code now; while latency is usually low within the data
center, things still add up. Working with multiple data centers for failover
and load balancing may make it more likely for Wikimedia's own sites to
experience slow data fetches, and things like InstantCommons[6] can already
require waiting for potentially slow remote data fetches on third-party
sites using shared resources over the internet.
[6] http://www.mediawiki.org/wiki/InstantCommons
Alas, we can't just take JavaScript-modeled async code and port it straight
to PHP, but PHP does let you do socket I/O, select() and so on, and it
could certainly run all sorts of networked requests in the background during
other CPU work.
This is still very much in-progress code (and in-progress data structure),
but do feel free to take a peek over and give feedback. :)
-- brion vibber (brion @ wikimedia.org / brion @ pobox.com)
Hi all,
Not long ago I stumbled upon Brion's presentation at
http://upload.wikimedia.org/wikipedia/commons/d/d5/Parser_and_Editor_-_Haif…
and I think I have something interesting to tell you folks about.
Over a year ago a friend and I started to create an intuitive,
multipurpose and flexible wiki markup syntax. Looking back now, I think
we did a good job (and you can see it for yourself below), but that's not
the main point of this message.
I have written a PHP framework implementing recursive descent processing
of a (wiki) document into a corresponding DOM. The produced tree can be
serialized into a binary format for storage/transmission or dumped
(rendered) into different representations such as HTML or plain text (or
FB2, XML, RTF, etc.), provided each node supports the target format.
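To give a rough idea of the per-node rendering, here is a simplified
sketch in JavaScript-like form; the real framework is PHP and these names
are illustrative, not its actual classes:

// Simplified, illustrative sketch of per-node rendering into different
// target formats; not the framework's real API.
function TextNode( text ) { this.text = text; }
TextNode.prototype.render = function ( format ) {
    return this.text; // plain text looks the same in every target format here
};

function BoldNode( children ) { this.children = children; }
BoldNode.prototype.render = function ( format ) {
    var inner = this.children.map( function ( c ) {
        return c.render( format );
    } ).join( '' );
    switch ( format ) {
        case 'html':  return '<b>' + inner + '</b>';
        case 'plain': return inner;
        default: throw new Error( 'Node does not support format: ' + format );
    }
};

// A document is just the root of such a tree:
var doc = new BoldNode( [ new TextNode( 'hello' ) ] );
doc.render( 'html' );  // "<b>hello</b>"
doc.render( 'plain' ); // "hello"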
Even though the project hasn't got enough members yet, we have developed
several child projects that use the aforementioned framework: a fast
blog engine and a (still incomplete) JavaScript editor. The latter is
particularly interesting because it makes full use of per-node editing
(e.g. text node -> bold fragment -> paragraph -> entire document), and
unlike solutions built on top of HTML or raw wiki code it is 100%
accurate, since it relies on the document tree itself.
What's also interesting about the framework is that it and its
DOM are markup-independent, so if, say, a MediaWiki markup module is
written, it will allow the frontend (the wiki engine or something else)
to operate seamlessly on documents of both the old and new syntaxes.
Moreover, the compatibility module could support rendering into the new
markup, making it transparent to migrate to the new format if necessary.
The project home page is: http://uverse.i-forge.net/wiki
There's a concise but complete description of our markup language and
our plans; you can experiment with the markup there, as well as find
links to related sites and more in-depth articles.
JavaScript editor page: http://uverse.i-forge.net/wiki/demo/Pages/Swacked
The global goal of the project is to create a uniform markup syntax for
all electronic texts and to standardize it to allow third-party
implementations.
--
Regards,
Proger_XP
Thought you might want to see the EditSurface ToDo list that Trevor,
Inez, and Neil are working off:
http://etherpad.wikimedia.org/VisualEditorTodo
Also, Trevor reports from last week & early this week:
* Wrote lots of tests for new code structure and made some changes as needed
* Continuing work on cleaning up and porting old code into new structure
* Got new code structure to render paragraphs and lists
* Working on rendering other things, such as tables
* Redesigning selection system to better support a variety of block types
* Wrote a bunch of code that proved our hypothesis was possible, but then
there were design flaws. We learned a lot from that, redesigned the
architecture, and are now building it right. Made changes to wikidom that
still need to be communicated to the rest of the team.
Neil's been very busy with the upload wizard and timed media handler,
and Brion with code review & fixes ahead of the 1.18 deployment
(example: http://www.mediawiki.org/wiki/User:Brion_VIBBER/Math_fixes ).
So, no news on the Etherpad integration or the parser front.
Also, the Wikimedia Foundation is hiring for a visual editor software
developer, so please spread the word:
http://wikimediafoundation.org/wiki/Job_openings/Software_Developer_Rich_Te…
--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
We're trying to deploy MediaWiki 1.18 on Wikimedia Foundation sites by
mid-September, and get into a faster deployment pace for future
versions. That means we need help reviewing our backlog of unreviewed
code and fixing the FIXMEs. Could you help by reviewing some code, and
maybe even helping fix FIXMEs?
NEW parser-related commits that could use comments (you can click on a
revision number to get to its code review page, where you can comment):
http://www.mediawiki.org/w/index.php?path=%2Ftrunk%2Fphase3%2Fincludes%2Fpa…
FIXMEs that are blocking the release of MediaWiki 1.18:
http://www.mediawiki.org/wiki/MediaWiki_roadmap/1.18/Revision_report
Thanks.
--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
Brion and Trevor just wrote up design documents you'll want to see:
http://www.mediawiki.org/wiki/Visual_editor/software_design "This
document specifies the information models and technologies required to
interact with Wikitext visually.... [t]his project, like this document,
is in a research and design phase." Covers:
* Project status
* Objectives
* Constraints
* Normalization
* Document model
** Elements
** Blocks
** Content
* Transactions
** Block transactions
** Document transactions
** Wiki transactions
* Wikitext Representations
* Linear Addressability
http://www.mediawiki.org/wiki/Wikitext_parser/Stage_1:_Formal_grammar
discussing low-level tokens and structures:
* Tightly-bound tags
* Brace structures
* Loose structures
* Line type tokens
* Free/magic markup
* Character references
* Raw characters
http://www.mediawiki.org/wiki/Wikitext_parser/Stage_2:_Informal_grammar
discussing loose structure assembly and separate nesting levels.
Thanks, guys!
--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation