Hi!
I am a Perl programmer, and I am trying to make a Perl implementation of
the MediaWiki format parser in order to use it in my projects.
I want this parser to work in exactly the same way as the original PHP
code, and since PHP and Perl have a lot in common syntactically, I decided
to take the original PHP code and change only the things that need to be
changed.
So I've taken the body of preprocessToXml, implemented the full set of
stack classes (the way it should be done in Perl), and reimplemented some
PHP functions that are used in the code (string operations and others).
Things that can't be fixed by defining a function I've fixed right in the
body of the code.
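For example, here is a minimal sketch of the kind of shim I mean
(php_strpos is an illustrative name, not necessarily what my code uses);
PHP's strpos() returns false when the needle is not found, which I map to
undef in Perl:

    # Sketch: a PHP-style strpos() in Perl. Returns the position, or
    # undef where PHP would return false, so callers test with defined().
    sub php_strpos {
        my ( $haystack, $needle, $offset ) = @_;
        $offset //= 0;
        my $pos = index( $haystack, $needle, $offset );
        return $pos < 0 ? undef : $pos;
    }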
This seems to work (although I've skipped processing of HTML tags for
now); it's in quite an experimental state, not even packaged as a module
yet, just ./test.pl with submodules. You can check it here if you want:
https://github.com/dhyannataraj/perl-mediawiki-parser
Why am I writing here?
1. To let you know what I am doing. Maybe somebody is interested in the
same thing.
2. To ask whether you have test cases for the preparser only that check
all the special cases like "====", "{{{text}}" or "{{unfinished", so I can
automatically verify that my preparser and the original one give the same
result. (I've looked in test/ in the source code, but those tests are much
more complex than preparser-only tests.) Maybe there is something I've
missed.
3. I will have to keep the code up to date, which means reimporting it
with each major release. It is possible to write some regexps that will do
most of the work (a sketch follows below), but in some cases the code
could be modified to be more Perl-friendly, e.g. changing comments from
'//' to '#', or changing === false into empty() (if they are equivalent).
If you are willing to cooperate in this area, let me know; then with the
next reimport I would offer you a patch.
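To give an idea of the regexp approach, here is a minimal sketch of such a
pass, run line by line over the PHP source (file names are illustrative; a
real pass would also have to avoid touching string literals):

    # Illustrative only: mechanically rewrite a few PHP spellings into
    # their Perl equivalents; whatever this misses is fixed by hand.
    open my $in,  '<', 'Preprocessor.php' or die $!;
    open my $out, '>', 'Preprocessor.pm'  or die $!;
    while ( my $line = <$in> ) {
        $line =~ s{^(\s*)//}{$1#};        # '//' comments become '#'
        $line =~ s/\belseif\b/elsif/g;    # PHP's elseif -> Perl's elsif
        print {$out} $line;
    }
    close $in;
    close $out;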
So, what do you think about it all?
Please have a look at some graphs visualizing interesting data from our
code review queues in Gerrit, focusing on the key Wikimedia software
projects.[1]
http://korma.wmflabs.org/browser/gerrit_review_queue.html
The queue of open changesets keeps growing. We have open changesets
submitted in every month since March 2012. However, we must be doing
something right, because since last December the median times to update
and resolve submissions have been decreasing.
Looking at http://korma.wmflabs.org/browser/scr.html , one reason for this
improvement might be that the volume of new changesets has also decreased
during the same period. Maybe newer patches get faster reviews? Any ideas?
We need to dig further.
We have created a "hall of shame" (add your preferred smiley here) to
shine a light on the repositories whose open changesets have not seen any
activity for the longest time. The principle is simple: you don't want to
see one of your repos appearing in the top 10.
Many of the _leading_ repos have only a couple of open changesets, and our
hope is that by showing up there, the maintainers will act on them quickly
(e.g. OpenStackManager, fluoride, commons, UserMerge, TorBlock, VipsScaler,
luasandbox...). This will leave the fight for the pole position to the
projects that actually have a real problem dealing with patches received
(DonationInterface, GuidedTour, UploadWizard...).
Who knows, perhaps we should organize "patch days", in the same way that
we have organized bug days in the past (which we want to revive now). We
also want to look at ways to promote the oldest inactive requests. For
instance, what about directing new volunteers there, asking them to submit
code reviews? For a patch that has been waiting in silence for over a
year, any feedback is better than no feedback.
One last detail. Our initial motivation to look at the age of open
changesets by affiliation was to check whether submissions from WMF
employees and independent developers were treated equally. Interestingly,
there are no big differences between these groups. However, there are big
differences between the median age of open WMDE changesets (16.5 days) and
open Wikia changesets (almost 283 days). All this is according to our
estimation of the origin of patches (domain of the submitter's email
address + affiliation submitted by the developers who filled in our
survey).[2]
Your feedback about these metrics is welcome. Please reply here or file
Bugzilla reports directly to Analytics > Tech Community metrics
https://bugzilla.wikimedia.org/buglist.cgi?component=Tech%20community%20met…
(Short link just in case: http://bit.ly/1q0itsl )
[1] https://wikitech.wikimedia.org/wiki/Key_Wikimedia_software_projects
[2]
https://docs.google.com/forms/d/1RFUa2zBAOolw78W-ozJPoYlR2lYbrAOYvOZYgjaAYQ…
--
Quim Gil
Engineering Community Manager @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil
[x-posted]
Hello,
The Language Engineering team will be hosting the next monthly IRC office
hour on Wednesday, April 9, 2014, at 1700 UTC in #wikimedia-office.
We will be discussing our recent work and providing updates on changes to
the translation file format (PHP to JSON) for MediaWiki core and
extensions. As always, we will be taking questions during the session.
Please see below for event details and local time. See you at the office
hour.
Thanks
Runa
Monthly IRC Office Hour:
========================
# Date: April 9, 2014
# Time: 1700 UTC / 1000 PDT (Check local time:
http://www.timeanddate.com/worldclock/fixedtime.html?iso=20140409T1700)
# IRC channel: #wikimedia-office
# Agenda:
1. Translation file format changes
2. Other project updates
3. Q & A (Questions can be sent to me ahead of the event)
--
Language Engineering - Outreach and QA Coordinator
Wikimedia Foundation
Reminder: if you're going to Wikimania in London this year, you can ~~~~
for the sessions you want to attend. That helps the reviewers "decide
which sessions are of high interest". Tech-related stuff is mostly in:
https://wikimania2014.wikimedia.org/wiki/Category:Technology_%26_Infrastruc…
https://wikimania2014.wikimedia.org/wiki/Category:Open_Data_submissions
Stuff that caught my eye includes presentations on "Mesh Sayada, a free
community network for open data and free culture", Kannada digitisation,
TogetherJS, Wikisource infrastructure, an Athena Project update,
structured Wikiquote, site performance, "How the LCA team uses
technology to scale", ridiculous browser bugs, and technical & social
aspects of anonymous editing.
--
Sumana Harihareswara
Senior Technical Writer
Wikimedia Foundation
I've been bothered for a while by the mess we have in resources/jquery/ –
3rd-party libraries, custom libraries we have to maintain, and directly
MW-related code using mediaWiki.* APIs, all mixed together in the same
directory. So I went and audited the .js we have inside resources/jquery/
and have written up an RFC on it:
https://www.mediawiki.org/wiki/Requests_for_comment/Isolate_custom_jQuery_l…
--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]
A new pane has been added[1] to the debug toolbar which visualizes[2]
Profiler output.
This visualization takes all the discrete events recorded by
wfProfileIn/Out, groups them by splitting titles on : and -, and draws a
series of timelines of these events. This was created with the intention
of helping developers to understand what the application is doing at
various points in time. Using this visualization it's reasonably easy to
scroll down to the timeline marked 'query' and see at what point in the
request each query was issued, and what other events were triggered near
that time.
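To make the grouping concrete, here is a rough sketch of the idea (in Perl
for brevity, with invented event titles; the pane itself is not written in
Perl): an event titled 'query: SELECT ...' lands in the 'query' timeline.

    # Sketch: bucket profiler event titles by their first segment,
    # splitting on ':' and '-' (event titles below are invented).
    my @events = ( 'query: SELECT * FROM page', 'query-master: BEGIN',
                   'parse: sometitle' );
    my %timelines;
    for my $title (@events) {
        my ($group) = split /[:-]/, $title, 2;
        push @{ $timelines{$group} }, $title;
    }
    # %timelines now maps 'query' and 'parse' to their events.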
Erik B.
[1] https://gerrit.wikimedia.org/r/#/c/104318/
[2] http://i.imgur.com/i5tefrs.png
I'm a bit concerned that, with all the work we've been doing on the mobile
site for Wikimedia projects, we seem to have been neglecting the 3rd-party
use case for MobileFrontend. I was
curious if any volunteers would like to step up and become a
MobileFrontend 3rd party champion and take ownership of these bugs and
help us improve the software for these types of users.
Essentially this would mean ensuring that the 3rd-party user's voice is
heard and that we keep our code as generic as possible. It would also
ensure that important things like anonymous editing get thought about
before we have the capacity to work on them ourselves (which in turn will
speed up a lot of that development).
We have a ton of bugs around this - these 3 for example:
https://bugzilla.wikimedia.org/show_bug.cgi?id=63328
https://bugzilla.wikimedia.org/show_bug.cgi?id=63459
https://bugzilla.wikimedia.org/show_bug.cgi?id=63458
If anyone wants a reason to code, is interested in mobile, and is getting
sick of their code waiting for review in Gerrit, I urge you to step
forward. I'm happy to mentor, make sure your code gets reviewed, and,
importantly, help get all these bugs fixed.
Message me privately if you are interested, or come visit us in
#wikimedia-mobile on irc.freenode.org.
Long! tl;dr is in the first two paragraphs.
With the merging of https://gerrit.wikimedia.org/r/#/c/122787/ , probably
the largest patch set for MediaWiki ever (+548314, -714438), MediaWiki core
is now using JSON for localisation of interface messages, per a recently
adopted RfC[1]. Thanks Krinkle/Timo for reviewing!
Please be aware that if you have open patch sets touching *.i18n.php
message files or MessagesXx.php files, you will have to update your patch
sets to match the new file layout and format.
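Roughly, a message that used to live as a PHP array entry in an *.i18n.php
file now lives in a per-language JSON file such as i18n/en.json (the
extension and message names below are invented for illustration):

    Old (MyExtension.i18n.php):

        $messages['en'] = array(
            'myextension-desc' => 'Adds an example feature',
        );

    New (i18n/en.json):

        {
            "@metadata": {
                "authors": [ "Example Author" ]
            },
            "myextension-desc": "Adds an example feature"
        }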
In December 2013, the first MediaWiki extensions were migrated to the JSON
format. Today, Antoine/hashar enabled a JSON linter on the jslint job that
runs on many Gerrit repositories' patch sets.
Since last week I've been migrating extensions to JSON i18n, starting with
all the MediaWiki extensions used by Wikimedia. At this time, 1.23wmf20
has about
50% of its extensions using the updated format. Migration of two extensions
is taking a little longer[2], but Matt Flaschen is helping with that, and I
expect that to be resolved soon.
Migration of all extensions has been going very smoothly - it's about 80%
done. With the help of reedy/Sam Reed, Raimond Spekking/Raymond, Niklas
Laxström/Nikerabbit and Adam Wight, so far 427 patch sets related to this
project have already been reviewed and merged[3], 40 await review, and I
expect some 90 more to be submitted before the project is complete.
Thanks also go to Roan Kattouw/Catrope for implementing parts of the RfC
together with Niklas, to Niklas for rewriting LocalisationUpdate to
support the JSON format and more, and to all who helped draft the RfC,
including James Forrester, Santhosh Thottingal, David Chan, Ed Sanders,
Robert Thomas Moen, and anyone who deserves credit but whom I have
forgotten to mention.
Once all migrations are complete, I'll be doing a full export from
translatewiki.net, which will cause a lot of JSON files to be touched, but
will mostly update encoding (full UTF-8) and add a newline at end of file
where missing.
What's next? With this project almost completed, the next order of
business is creating an RfC on where to go with the data that now remains
in the
MessagesXx.php files (like date formatting, fallback, directionality,
namespace names, special page names, etc.) and localisation for special
page names, magic words and namespace names that are still being
implemented using $wgExtensionMessagesDirs. Maybe this is something we
could discuss and prototype during the hackathon. Please let me know if
this is something you'd like to work on.
Again, thanks for the help, and apologies for the inconvenience these
changes may have caused you!
[1] https://www.mediawiki.org/wiki/Requests_for_comment/Localisation_format
[2]
https://gerrit.wikimedia.org/r/#/q/status:open+topic:json-i18n-special,n,z
[3] https://gerrit.wikimedia.org/r/#/q/status:merged+topic:json-i18n,n,z
Cheers!
--
Siebrand Mazeland
Kitano ICT
M: +31 6 50 69 1239
Skype: siebrand