Dear Wikitext experts,
please check out Sztakipedia, a new wiki RTE, at http://pedia.sztaki.hu/ (watch the video first, then try the tool itself). It aims to implement some of the visions you described here: http://www.mediawiki.org/wiki/Future/Parser_plan (the RTE part).
Some background: Sztakipedia did not start out as an editor for Wikipedia. It was meant to be a web-based editor for UIMA-annotated rich content, supported by natural language processing in the background. The tool was functional by the end of 2010, and we wanted a popular application to demonstrate its features, so we went on to apply it to wiki editing.
To do that, we built some wiki-specific pieces:
- After evaluating many existing parsers, we created a new one in JavaCC
- Created a number of content helpers based on DBpedia, such as link recommendation, infobox recommendation, and infobox editing help
- Integrated external resources to support editing, such as the Book Recommendation and Yahoo-based category recommendation
Sztakipedia is currently in its alpha phase, with several show-stoppers remaining, such as handling cite references properly and editing templates embedded in other templates.
I am aware that you are working on a new syntax, parser, and RTE, and that they will eventually become the official ones for wiki editing (Sztakipedia is written in Java anyway).
However, I still think there is much to learn from our project. We will write a paper on the subject next month, and I would be honored if some of you read and commented on it. The main contents will be:
- problematic parts of the current wikitext syntax that we struggled with
- usability tricks, like extracting the infobox documentation pages to provide help for the fields, and showing the abstracts of the articles to be linked
- recommendations and machine learning to support the user, plus the background theory
Our plan right now is to create an API for our recommendation services and helpers, plus a MediaWiki JS plugin that brings their results into the current wiki editor. This way I hope the results of this research - which started out as rather theoretical - will be used in a real-world scenario by at least a few people. I hope we will be able to extend your planned new RTE the same way in the future.
Please share your thoughts/comments/doubts about Sztakipedia with me.
I also wanted to ask a few things:
- Which helper feature do you consider the most wanted: infobox/category/link recommendation? External data import from Linked Open Data (like our Book Recommender, which currently holds millions of book records)? Field _value_ recommendation for infoboxes based on the text? Something else?
- How do you measure the performance of a parser? I saw hints of some 300 parser test cases somewhere...
- What is the best way to mash up external services to support the wiki editor interface? (If you call an external REST service from JS in MediaWiki, you run into cross-site restrictions, I'm afraid.)
Thank you very much,
Best Regards
Mihály Héder
MTA Sztaki, Budapest, Hungary
This is really interesting. I suggest everyone check it out; it's not going to be what you expect.
The RTE part isn't emphasized in the video as much as its ability to suggest enhancements for an article -- it uses machine learning to intelligently suggest categories, internal and external links, and even infoboxes, and then helps you fill them out. Unlike a lot of other tools, these suggestions actually seem useful, at least in the demo.
On Thu, Jun 9, 2011 at 7:19 AM, Mihály Héder hedermisi@gmail.com wrote:
> Dear Wikitext experts,
> please check out Sztakipedia, a new wiki RTE, at http://pedia.sztaki.hu/ (watch the video first, then try the tool itself)
[snip]
That's really cool! The analysis & interactive suggestion tools are especially interesting; some of them could really aid certain kinds of article editing, as well as other uses like research or more education-focused work, where pulling information from elsewhere in the wiki would be useful.
> Our plan right now is to create an API for our recommendation services and helpers, plus a MediaWiki JS plugin that brings their results into the current wiki editor. This way I hope the results of this research - which started out as rather theoretical - will be used in a real-world scenario by at least a few people. I hope we will be able to extend your planned new RTE the same way in the future.
Awesome! I would definitely love to see that happen; while we're still in the early phases on the parser itself, we're also starting to work on the editor integration API, so somewhere down the road we should have a standardish way to plug some of that in.
> Please share your thoughts/comments/doubts about Sztakipedia with me.
> - Which helper feature do you consider the most wanted: infobox/category/link recommendation? External data import from Linked Open Data (like our Book Recommender, which currently holds millions of book records)? Field _value_ recommendation for infoboxes based on the text? Something else?
We'll probably need to collect more reactions from users, but offhand I can definitely see real use for:
* a helper tool for making references/citations (those book records sound like they could really help in looking up info & formatting a book or periodical reference!)
* suggestion of which infobox-style templates and categories to apply to a page based on content & already-found related pages -- we have such a *huge* array of templates and categories on the big Wikipedias, and it's easy not to know where to look. Having something that suggests 'infobox-city' when you're pretty clearly writing about a city could be a lot more useful than a simple typeahead list... it could even suggest it before the user knows to look for it!
> - How do you measure the performance of a parser? I saw hints of some 300 parser test cases somewhere...
We have parser regression test cases in tests/parser/parserTests.txt (in maintenance/ on older released versions), but for performance testing you'd want to use some real-world articles.
Roughly speaking, you want to get some idea of both _average cases_ and _worst cases_. Overall server load is ruled by the average case, but it's the worst cases -- the slowest pages to render -- that have the most impact on user experience.
* grab a few dozen or a few hundred articles as a general subset (possibly weighted by popularity?)
* grab several of the largest, most complex articles from English Wikipedia
There tends to be at least some overlap here. ;) Large numbers of template invocations, large numbers of images, and large numbers of parser functions & tag extensions are usually the worst cases. Individual templates can also hide a lot of complexity that's not obvious from looking at the source of the base page.
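To make that concrete, here's a rough measurement sketch in JavaScript (Node.js). The parse() stub and the 'sample-articles' directory are placeholders for your own parser and article dump, not a real API:

// Benchmark sketch: time each article, then report the average case
// (which drives server load) and the worst cases (which drive user experience).
var fs = require('fs');
var path = require('path');

function parse(wikitext) {
  // Placeholder: call your real parser here.
  return wikitext.length;
}

var dir = 'sample-articles'; // a dump of real articles, ideally weighted by popularity
var times = [];
fs.readdirSync(dir).forEach(function (name) {
  var text = fs.readFileSync(path.join(dir, name), 'utf8');
  var start = Date.now();
  parse(text);
  times.push({ name: name, ms: Date.now() - start });
});

times.sort(function (a, b) { return b.ms - a.ms; });
var avg = times.reduce(function (s, t) { return s + t.ms; }, 0) / times.length;
console.log('average ms per article:', avg.toFixed(2));
console.log('slowest articles:', times.slice(0, 5));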
> - What is the best way to mash up external services to support the wiki editor interface? (If you call an external REST service from JS in MediaWiki, you run into cross-site restrictions, I'm afraid.)
If the JS can call via JSONP (executing via <script> and using a callback to pass data back to the caller), that should be fine. It's also possible to use cross-origin permission headers, e.g. http://www.w3.org/TR/cors/
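For example, a minimal JSONP helper might look like this; the example URL and the 'callback' parameter name are assumptions, so adjust them to the actual service:

function jsonp(url, onData) {
  // Create a uniquely named global callback for the server to invoke.
  var cbName = 'szpCb' + Math.floor(Math.random() * 1e9);
  var script = document.createElement('script');
  window[cbName] = function (data) {
    delete window[cbName];
    script.parentNode.removeChild(script);
    onData(data);
  };
  // The server is expected to reply with: szpCb123456({...});
  script.src = url + (url.indexOf('?') < 0 ? '?' : '&') + 'callback=' + cbName;
  document.head.appendChild(script);
}

jsonp('http://example.org/suggest?title=Budapest', function (data) {
  console.log(data);
});

Alternatively, if the external service sends an Access-Control-Allow-Origin header, a plain XMLHttpRequest works across origins in CORS-capable browsers.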
-- brion
Hello!
Thanks for the JSONP tip - it is ugly, but it works :) And thanks to everyone for all the encouragement we got.
So we spent the summer coding, and there is now a Sztakipedia toolbar for the standard wiki editor. It offers four kinds of suggestions: page links, categories, infoboxes, and books. It communicates with our server through an API via JSONP callbacks.
Please check out the promo video on YouTube: http://www.youtube.com/watch?v=_0ochjAwMkw
I hope you like it!
Also, if you are interested, check out the architecture of the system: http://pedia.sztaki.hu/?p=40
The plan is that it will be in beta for a month (assuming no complications), and then I hope we manage to get real end users to enable it.
I hope it will be among the first add-ons or plugins for the new editor you are working on. The question is: do you have any plans for how things like this will be integrated?
To help you with specifying this, I can offer a few thoughts based on our experience with this tool:
- It would be super to know some ID of the user in the plugin, even if it is hashed. That way we could store preferences, feed the machine learning algorithms, and make sure we do not offer already-declined suggestions to the same person again.
- To insert anything into the content, we have to know exactly where the right place is. However, our natural language processing algorithms do not work well on wikitext (the problems start with the tokenizer and sentence boundaries). So we have to strip the markup but still remember where the parts originally were. We achieved this with our parser (http://code.google.com/p/sztakipedia-parser/), which stores in the HTML it generates the character position of the corresponding wikitext; see the sketch below for the general idea. I did not really follow the WOM story, but I think it would be really helpful to a) have the original character position in a WOM object, or b) specify the WOM operations by which a plugin can change the content.
- We became pros at JSONP :) but it did hurt a bit...
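As a toy sketch of the position-tracking idea (this is an illustration, not our actual parser): strip some of the markup while recording, for every plain-text character, its offset in the original wikitext, so NLP results can be mapped back:

// Strip [[ ]] brackets and ''/''' emphasis, keeping an offset map.
function stripWithOffsets(wikitext) {
  var plain = '';
  var map = []; // map[i] = wikitext offset of plain.charAt(i)
  for (var i = 0; i < wikitext.length; i++) {
    var c = wikitext.charAt(i);
    if (c === '[' || c === ']') continue; // skip link brackets
    if (c === "'" && wikitext.charAt(i + 1) === "'") { // skip emphasis runs
      while (wikitext.charAt(i + 1) === "'") i++;
      continue;
    }
    map.push(i);
    plain += c;
  }
  return { plain: plain, map: map };
}

// Run NLP over the plain text, then translate a hit back to wikitext:
var res = stripWithOffsets("Budapest is the capital of [[Hungary]].");
var hit = res.plain.indexOf('Hungary'); // position in the plain text
var wikitextPos = res.map[hit];         // position in the original wikitext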
Okay, these were only my quick remarks; I might write a more organized blog post about the issues we had.
But until the new editor is done, we now have a way to deliver smart features into the old one. So please send us tips on how to improve the szp-toolbar! The plan is to collect ideas on the toolbar's Google Code page and hold a vote at some point.
Thanks,
Best Regards
Mihály Héder
MTA Sztaki