For this week's triage, I worked with Tomasz Finc and Patrick Reilly
to focus on the new MobileFrontend extension. I also held it a few
hours earlier and, as a result, had participation from developers
farther east.
Tomasz was especially interested in getting participation from non-WMF
developers and, as an incentive, he went through the bugs and tagged
the easy ones that would be a good starting point for developers
interested in starting work on MobileFrontend. You can see the
complete list of bugs in this week's etherpad here: http://hexm.de/5x

https://bugzilla.wikimedia.org/24359 Adding an "e-mail to friends" link
to MobileFrontend
Bryan suggested this one was similar to requests that had been made
on the main site in the past, and I initially closed it as such -- as a
duplicate of the "mail this article to someone" feature, Bug #227.
Both Tomasz and MzMcBride objected, though, and re-opened the bug.
Part of Brion's concern for WONTFIXing 227 was that it "opens extra
spamming opportunities" which I'm concerned with as well. I updated
the bug with some steps we should take to minimize spamming
concerns.
https://bugzilla.wikimedia.org/22659 Give projects the opportunity to
add mobile specific JS/CSS
Amir showed up to ask about device-specific CSS and making it
available to wiki admins (e.g. MediaWiki:Android.css). Patrick said
that right now there were only two hard-coded CSS style sheets
(.../iphone.css and .../android.css). I updated the bug with the
requested information.
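To illustrate the idea, here's a hypothetical sketch of how the two
hard-coded stylesheets might be chosen and how a per-wiki override page
could slot in; the real extension's logic and file names may well
differ:

```python
# Hypothetical sketch; the actual MobileFrontend code and paths may differ.
DEVICE_CSS = {
    "iphone": "stylesheets/iphone.css",
    "android": "stylesheets/android.css",
}

def stylesheet_for(device, wiki_overrides=None):
    # Amir's request: let wiki admins override per device, e.g. via a
    # MediaWiki:Android.css page on the wiki itself.
    if wiki_overrides and device in wiki_overrides:
        return wiki_overrides[device]
    return DEVICE_CSS.get(device, "stylesheets/default.css")

print(stylesheet_for("android"))
# stylesheets/android.css
print(stylesheet_for("android", {"android": "MediaWiki:Android.css"}))
# MediaWiki:Android.css
```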
https://bugzilla.wikimedia.org/30118 Custom project icon in
MobileFrontend extension
John Du Hart saw this bug in the list and volunteered to take it
on. As of Thursday night, it looks like he has gotten pretty far
along, if not finished.
http://bugzilla.wikimedia.org/28515 - Dynamic fonts support in mobile
gateway
Amir raised the issue of webfonts for mobile. He also pointed out
that many mobile devices don't support web fonts at all, so the work
Santosh has done on the WebFonts extension doesn't help there. Amir
speculated that the extension could render images of the page with the
proper fonts and send the images to the phone. We'll need to work with
Santosh on this.
After this we got people testing the MobileFrontend bugs that had
already been entered. Tomasz and Patrick used the opportunity to
have people with different handhelds test the bugs. It is my impression
that there is a bit more variety among mobile devices right now than
there is among desktops, so this was a great opportunity to test the new
mobile site on a variety of devices and confirm problems or fixes.
After passing around the OptIn link (http://tinyurl.com/woptin) Tomasz
asked people to check out the following bugs:
http://bugzilla.wikimedia.org/28181 - Request for Kindle to be
redirected to the mobile site
This was one that we didn't get to test, but I'll be sure to request
that Kindle owners (and users of other mobile devices) show up for
our next mobile triage. We'll need you!
http://bugzilla.wikimedia.org/30293 - "View this page on regular
Wikipedia" disables mobile redirect
While testing this bug, we couldn't reproduce it, so we closed it.
However, I discovered https://bugzilla.wikimedia.org/30458 "mobile
link to view images site is flaky" and got someone else to confirm
it.
http://bugzilla.wikimedia.org/30356 - New gateway has broken layout for
opera mini
After some testing -- thanks, Niklas! -- this looked like an
intermittent problem, since we couldn't reproduce it.
We hope to do one of these mobile triages every month, and for future
ones it would be awesome if we could have Kindles, iPads, and maybe
even Nooks, as well as Blackberries, Androids, iPhones and even Nokia
phones.
Thanks for all the help this time!
Mark.
Hi,
(I don’t post often here and I’m not a MW developer but I try to follow,
correct me if I’m wrong.)
I see a couple of things which must be done carefully and deliberately
about page titles<ref>. Currently there is a difference between page_id
and page title, since the page_id is preserved when the title of the
page changes (during a move). So there is currently no canonical page
title associated with a revision, only a page_id; in other words, I
think it is theoretically not possible to retrieve the original page
title of a given past revision (this could be discussed in another
thread). I also have some doubts about retrieving the original page_id
of a revision in very rare cases (with a succession of
deletions/undeletions of some revisions and moves), but I'm not sure of
that.
So introducing a page_title in the revisions (your §1) adds an
interesting new piece of information if you consider it to be the title
as of the date the revision was saved; the title obtained via
page_id->title and the stored page_title can then differ, and the same
goes for the namespace. But this information is not currently available
in the database. That poses the problem of how to define it for
existing revisions in the dumps: use the current page title associated
with the current page_id? If you put the current page_title associated
with the current page_id into each revision, the page_title will change
across dumps every time a move is done. I don't find that semantically
correct, but at least it should be clearly explained. This is the
current behaviour, but since the page_title currently sits outside of
the revision, one implicitly accepts that behaviour as semantically
correct.
In the §2. there is a similar thing for the redirect: currently the
redirect points to a title, not a page_id (if you move the pointed page,
the redirect will point to the new page).
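To illustrate the title problem with a deliberately simplified model
(this is not the real MediaWiki schema, just the shape of the issue):

```python
# Simplified model, NOT the real MediaWiki schema: revisions carry only a
# page_id, and the page row stores only the CURRENT title.
page = {10: {"title": "New_Title"}}  # after a move; the old title is gone
revisions = [
    {"rev_id": 1, "page_id": 10, "text": "first draft"},  # saved under the old title
    {"rev_id": 2, "page_id": 10, "text": "after move"},
]

def title_of(rev):
    # The only title we can recover is the current one; the title the page
    # had when rev_id 1 was saved is not stored anywhere.
    return page[rev["page_id"]]["title"]

print(title_of(revisions[0]))
# New_Title, even though revision 1 predates the move
```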
<ref>: Two years ago I tried to work on an extension to restore an old
revision ideally pixel-for-pixel, but I think it's not (currently)
possible, mainly because of this problem of page titles. There are
other problems, but this is the main one. Others include retrieving old
versions of the templates (related to the title problem), the colour of
links and categories, versions of images, external resources like site
CSS/JS, the status of deleted revisions (displayed or not), finer
things like user preferences and rights, and ultimately differences due
to changes of MW configuration or MW version, etc. (I don't consider a
change of version of the user's browser :) I didn't publish it then
(Sumana was not here to tell me to publish it ;) but I found it again
on my computer, and I'll try to publish and explain it on mw.org.
Sébastien
Thu, 18 Aug 2011 13:30:18 -0400, Diederik van Liere <dvanliere(a)gmail.com>
wrote:
> Hi!
>
> Over the last year, I have been using the Wikipedia XML dumps
> extensively. I used them to conduct the Editor Trends Study [0], and
> the Summer Research Fellows [1] and I have used them over the last
> three months during the Summer of Research. I am proposing some
> changes to the current XML schema based on those experiences.
>
> The current XML schema presents a number of challenges, both for the
> people who are creating the dump files and for the people who are
> consuming them. Challenges include:
>
> 1) The embedded structure of the schema -- a single <page> tag with
> multiple <revision> tags -- makes it very hard to develop an
> incremental dump utility.
> 2) A lot of post processing is required.
> 3) By storing the entire text for each revision, the dump files are
> getting so large that they become unmanageable for most people.
>
>
> 1. Denormalization of the schema
> Instead of having a <page> tag with multiple <revision> tags, I
> propose to just have <revision> tags. Each <revision> tag would
> include a <page_id>, <page_title>, <page_namespace> and
> <page_redirect> tag. This denormalization would make it much easier
> to build an incremental dump utility. You only need to keep track of
> the final revision of each article at the moment of dump creation, and
> then you can create a new incremental dump continuing from the last
> dump. It would also be easier to restore a dump process that crashed.
> Finally, tools like Hadoop would have a much easier time handling this
> XML schema than the current one.
>
>
> 2. Post-processing of data
> Currently, a significant amount of time is required for
> post-processing the data. Some examples include:
> * The title includes the namespace and so to exclude pages from a
> particular namespace requires generating a separate namespace
> variable. Particularly, focusing on the main namespace is tricky
> because that can only be done by checking whether a page does not
> belong to any other namespace (see bug
> https://bugzilla.wikimedia.org/show_bug.cgi?id=27775).
> * The <redirect> tag is currently either True or False; more useful
> would be the article_id of the page to which a page redirects.
> * Revisions within a <page> are sorted by revision_id, but they should
> be sorted by timestamp. The current ordering makes it even harder to
> generate diffs between two revisions (see bug
> https://bugzilla.wikimedia.org/show_bug.cgi?id=27112)
> * Some useful variables in the MySQL database are not yet exposed in
> the XML files. Examples include:
> - Length of revision (part of Mediawiki 1.17)
> - Namespace of article
>
>
> 3. Smaller dump sizes
> The dump files continue to grow as the text of each revision is stored
> in the XML file. Currently, the uncompressed XML dump files of the
> English Wikipedia are about 5.5Tb in size and this will only continue
> to grow. An alternative would be to replace the <text> tag with
> <text_added> and <text_removed> tags. A page can still be
> reconstructed by patching multiple <text_added> and <text_removed>
> tags. We can provide a simple script / tool that would reconstruct the
> full text of an article up to a particular date / revision id. This
> has two advantages:
> 1) The dump files will be significantly smaller
> 2) It will be easier and faster to analyze the types of edits: who is
> adding a template, who is wikifying an edit, who is fixing spelling
> and grammar mistakes.
>
>
> 4. Downsides
> This suggestion is obviously not backwards compatible and it might
> break some tools out there. I think that the upsides (incremental
> backups, Hadoop-ready and smaller sizes) outweigh the downside of
> being backwards incompatible. The current way of dump generation
> cannot continue forever.
>
> [0] http://strategy.wikimedia.org/wiki/Editor_Trends_Study,
> http://strategy.wikimedia.org/wiki/March_2011_Update
> [1] http://blog.wikimedia.org/2011/06/01/summerofresearchannouncement/
>
> I would love to hear your thoughts and comments!
>
> Best,
> Diederik
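The reconstruction tool proposed in §3 of the quoted mail could be
sketched roughly like this. The tag semantics are assumed here (whole
lines added/removed relative to the parent revision, ignoring
position); a real tool would need positional diffs, but this shows the
idea of patching forward from an empty page:

```python
# Assumed semantics for the proposed <text_added>/<text_removed> tags:
# each revision lists whole lines added and removed relative to its parent.
def apply_revision(lines, added, removed):
    # Drop each line listed in <text_removed>, then append the
    # <text_added> lines.
    result = [line for line in lines if line not in removed]
    result.extend(added)
    return result

text = []
text = apply_revision(text, added=["Hello world."], removed=[])
text = apply_revision(text, added=["Hello, world!"], removed=["Hello world."])
print(text)
# ['Hello, world!']
```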
Hi everyone!
I've been working on a mediawiki extension to replace the WP1.0
bot[1]. The major aim is to make it easier to select and export
article selections for various offline collections. It's at a stage
where it could be considered 'feature reasonably complete' (90%
feature complete, now to do the other 90% :) ).
The code is in the repository[2] and is named GPoC[3]. I'm hoping the
devs with more experience in MW extension development will be able to
CR my code and tell me what I can do to make it better. Ultimate aim
is to make this 'deployment quality on WMF servers'. Pointers on what
more needs to be done would be helpful. The commits on MW:CR are
available here[4].
Thanks everyone! :)
[1]: http://en.wikipedia.org/wiki/User:WP_1.0_bot
[2]: http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/GPoC
[3]: It means 'GSoC Proof of Concept', and I'll be glad if someone
could find a better name
[4]: http://www.mediawiki.org/w/index.php?path=trunk%2Fextensions%2FGPoC&title=S…
--
Yuvi Panda T
http://yuvi.in/blog
Right now, wiki markup like the following is in wide use:
{|
|-
|valign=top width=100%|
|}
The bgcolor, cellpadding, cellspacing, valign, align, width, height,
etc. presentational attributes have all been completely removed from
HTML5, and pages using these attributes aren't valid.
We can't expect all the instances of valign and width to disappear
from every wiki on their own. And frankly, in the context of authoring
WikiText, I don't believe the user should have to care about that and
be forced to write a longer style line.
What are people's opinions on the idea of taking these removed
presentational attributes and turning them into sugared parts of
WikiText that are emitted as actual CSS in the output?
The change would essentially mean that this:
|valign=top width=100%|
Would become:
<td style="vertical-align: top; width: 100%;">
Instead of this:
<td valign="top" width="100%">
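The mapping could be sketched roughly like this (a hypothetical helper,
not actual parser code; the align-to-text-align mapping in particular
is my assumption):

```python
# Hypothetical attribute-to-CSS mapping, not actual MediaWiki parser code.
PRESENTATIONAL_TO_CSS = {
    "valign": "vertical-align",
    "align": "text-align",  # assumption: cell alignment maps to text-align
    "width": "width",
    "height": "height",
    "bgcolor": "background-color",
}

def attrs_to_style(attrs):
    """Turn e.g. {'valign': 'top', 'width': '100%'} into a style string."""
    decls = ["{}: {};".format(PRESENTATIONAL_TO_CSS[k], v)
             for k, v in attrs.items() if k in PRESENTATIONAL_TO_CSS]
    return " ".join(decls)

print(attrs_to_style({"valign": "top", "width": "100%"}))
# vertical-align: top; width: 100%;
```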
I can only find one downside. Text browsers like w3m do make use of
valign but don't support CSS, so the change would make the valign
revert to the normal vertically centered alignment.
I should make a few notes:
- This doesn't even affect all text browsers. lynx doesn't display
tables in a tabular form and hence doesn't care what type of alignment
attributes you have.
- This has absolutely nothing to do with web accessibility. Screen
readers output to things like audio and braille, and hence don't
display things visually, so alignment means nothing to them. And the
W3C appears to assume users with poor eyesight are on proper
CSS-capable browsers: standards for web accessibility for such users
seem focused on things like ensuring usability with screen zooms and
larger fonts, rather than expecting users with bad eyesight to use text
browsers.
--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
Conrad Irwin requested deployment of his Transliterator extension two
years ago. Reviews have been slow in coming but they're finally in.
However, looking at Conrad's SVN activity
(http://www.mediawiki.org/wiki/Special:Code/MediaWiki/author/conrad),
I'm not sure he is actually around to implement the changes needed.
https://bugzilla.wikimedia.org/20246
So, two questions: Do the English Wiktionary and French Wiktionary
projects still want the extension installed?
If so, and if Conrad doesn't have the time to respond to the review, is
there anyone who can step in and address the issues raised in
Brion's comments?
Mark.
Hi everyone,
Just wanted to give a quick update on our efforts to provide greater
visibility/transparency to the engineering efforts for the 2011 WMF
fundraiser [0].
For those of you who don't know, the fundraiser engineering team is
implementing some agile methodologies for its development process.
Currently, we're working through our backlog of user stories in two-week
development sprints. I've begun chronicling our sprint 'retrospectives'
[1][2], held at the end of each sprint, where we reflect on what
worked/what didn't work during the previous sprint.
In addition, we'll soon be making regular blog posts at the end of each
sprint highlighting achievements/setbacks/etc. during our development cycles
- you can expect to see the first one soon.
It's my hope that these updates will help provide more visibility into the
wild world of fundraiser engineering and share the successes/failures of our
new development process. Hopefully this will prove valuable for other
developers/teams.
You can find more information about the upcoming 2011 fundraiser (beyond our
engineering efforts) on MetaWiki [3].
Any feedback is welcome!
Thanks,
Arthur
[0] http://www.mediawiki.org/wiki/2011_Wikimedia_fundraiser
[1] http://en.wikipedia.org/wiki/Retrospective#Software_development
[2] http://www.mediawiki.org/wiki/2011_Wikimedia_fundraiser/sprints
[3] http://meta.wikimedia.org/wiki/Fundraising_2011
--
Arthur Richards
Software Engineer
Fundraising/Features/Offline/Mobile
[[User:Awjrichards]]
IRC: awjr
+1-415-839-6885 x6687
On Thu, Aug 11, 2011 at 11:13 PM, Jay Ashworth <jra(a)baylink.com> wrote:
> ----- Original Message -----
> > From: "Platonides" <Platonides(a)gmail.com>
>
> > People usually set it to 5000 and use the 'search in this page'
> > feature of their browser. Which is far from convenient.
> >
> > For a 2 years old bug requesting that needed feature, see
> > https://bugzilla.wikimedia.org/show_bug.cgi?id=20858
>
> Noted. Though I tend, myself, not to get the 'suggested implementation of
> fix' quite so tangled up in the bug report.
>
> How are those messages *implemented*, internally? Are they in a page
> namespace not exposed to the standard system text search? Or are they
> just hardwired in somehow?
>
When loading up a MediaWiki: message page where there is no local page data,
the string from the matching message/language is pulled from the
localization system's base arrays. You can see this in
Article::getContent().
This is reasonably transparent for looking at a page and getting initial
text when editing it, but obviously means that those pages don't exist in:
* page lists
* contributions
* history
* recent changes
* search index
Any message pages that *have* been edited and have actual local content will
show up in all the above, just like other pages.
In principle, the search engine interface could probably be extended to
search through the default messages (in the current language? the site
language? all languages?) when doing searches that include the
MediaWiki: namespace, so you end up with something that feels kind of
like having them all in the base index whether they have been saved or
not. But that doesn't exist at present.
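A rough sketch of the fallback described above (illustrative names
only, not the actual MediaWiki API):

```python
# Illustrative sketch, not the real MediaWiki code: a MediaWiki: message
# page with no locally saved edit falls back to the localization default.
localized_defaults = {"search": "Search", "sitenotice": ""}  # i18n base arrays
local_edits = {"sitenotice": "Donate today!"}  # pages someone actually saved

def get_message_page(name):
    # A locally edited page behaves like any other page (it shows up in
    # history, recent changes, the search index, etc.); an unedited one
    # serves the default string but appears in none of those lists.
    if name in local_edits:
        return local_edits[name]
    return localized_defaults.get(name)

print(get_message_page("sitenotice"))
# Donate today!
print(get_message_page("search"))
# Search
```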
-- brion