What: Git Bug Triage
Where: #wikimedia-dev
http://webchat.freenode.net/?channels=#wikimedia-dev
When: Wednesday, March 28, 2012 at 19:00 UTC http://hexm.de/gu
Etherpad: http://etherpad.wikimedia.org/BugTriage-2012-03
Niklas and others have run into problems since the git migration. After
looking at Chad's schedule (but not yet talking to him directly, so
fingers crossed) I've set up a time for a triage tomorrow to cover these
issues.
Hope to see you there!
Mark.
Hi,
I would like to propose the following idea.
Some time ago we started working on a new virtual cluster known as
Labs (wmflabs.org), whose purpose is to let people develop things and
later move them to production. I believe it would be nice to have an
exactly matching environment (we could probably just extend wmflabs
for that), running on the same platform (a virtual cluster managed
through a web interface, using the Nova extension), with exactly the
same capabilities, but intended to run final products: not a testing
environment like Labs, but a "production" where the stable versions
would live.
Why do we need this?
Wikimedia Labs will in the future offer a cloned copy of the
production databases, which would allow it to run community-managed
tools like http://toolserver.org/~quentinv57/tools/sulinfo.php and
similar. I think it would be best if such tools were developed using
Labs as a testing platform, with the stable version pushed to this
"production", which would only run stable code. In fact, it doesn't
even need to be physically another cluster, just another set of
virtual instances isolated from the testing environment on Labs. The
environment would have restrictions which we don't have on Labs:
people would need to use puppet and gerrit for almost everything, and
root would not be given to everyone (some projects might be
restricted to WMF ops only). That way we could even move all the
stable bots we currently host on wmflabs there, without being afraid
of leaking bot credentials and such (which is the reason the bots
project is restricted at the moment). Applications that ask for
Wikimedia credentials could also be allowed there, since the code
living on this "production" would be subject to review, and projects
that could pose a security risk could be managed by WMF ops only (the
changes could be written by volunteers but would need to be submitted
through gerrit).
We could also move some parts of the current production setup to this
"community managed" environment. I talked to Roan Kattouw in the past
about moving the configuration of the Wikimedia sites to a git
repository, so that volunteers could submit patches to gerrit or
handle bugzilla reports without needing shell access. Changes to the
production config would be merged by operations engineers, so it
would remain completely secure.
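As an illustration, the volunteer workflow could look roughly like
this (the repository name is invented here; the push syntax is
gerrit's standard review workflow):

git clone ssh://gerrit.wikimedia.org:29418/operations/site-config
cd site-config
# edit the wiki configuration files, then:
git commit -a -m "Enable feature X on xxwiki"
git push origin HEAD:refs/for/master
# an operations engineer reviews the change in gerrit and merges it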
In a nutshell:
This environment could be set up on the same platform as wmflabs (no
extra cost, just hard work :)). Stable products (bots, user scripts)
would live there, while Labs would serve only for development and
nothing else.
The production version would live on another domain, like
wikimedia-tools.org or wmtools.org.
Thanks for your comments and responses
On 27.03.2012 13:07, vitalif(a)yourcmc.ru wrote:
>> JSON is the internal serialization format.
>
> You're suggesting to use MediaWiki as a model :)
> What's stopping you from implementing it as a _file_ handler, not _article_
> handler?
Because of the actions I want to be able to perform on them: most importantly
editing, but also diff views for the history, automatic merging to avoid
edit conflicts, etc.
These types of interaction are supported by MediaWiki for "articles", but not
for "files".
In contrast, files are rendered/thumbnailed (we don't need that), get included
in articles with a box and caption (we don't want that), and can be
accessed/downloaded directly as files via HTTP (we definitely don't want that).
So, what we want to do with the structured data fits much better with
MediaWiki's concept of a "page" than with the concept of a "file".
> I mean, _articles_ contain text (now wikitext).
> All non-human readable/editable/diffable data is stored as "files".
But that data WILL be readable/editable/diffable! That's the point! Just not as
text, but as something else, using special viewers, editors, and diff tools.
That's precisely the idea of the ContentHandler.
> Now they are all in the File namespace, but maybe it's much simpler to allow
> storing them in other namespaces and to write file handlers for
> displaying/editing them than to break the idea of an "article"?
How does what I propose break the idea of an article? It just means that
articles do not *necessarily* contain text. And it makes sure that whatever it
is that is contained in the article can still be viewed, edited, and compared in
a meaningful way.
-- daniel
On 27.03.2012 09:33, Oren Bochman wrote:
> 1. JSON - that's not a very reader-friendly format, and not an ideal format
> for the search engine to consume, due to its lack of support for
> metadata and data schemas. XML is universally supported, more human-friendly,
> and supports a schema, which can be useful well beyond this initial use.
JSON is the internal serialization format. It will not be shown to the user or
used to communicate with clients. Unless of course they use JSON for interaction
with the web API, as most do.
The full text search engine will be fed a completely artificial view of the
data. I agree that JSON wouldn't be good for that, though XML would be far worse
still.
As to which format and data model to use to represent Wikidata records
internally: that's a different discussion, independent of the idea of
introducing ContentHandlers to MediaWiki. Please post to wikidata-l about that.
> 2. Be bold, but also be smart and give respect where it is due. Bots and
> everyone else who has written tools for and around MediaWiki, relying on
> basic assumptions about the page structure, would be broken. Many will not
> so readily adapt.
I agree that backwards compatibility is very important, which is why I took care
not to break any code or client using the "old" interface on pages that contain
wikitext (i.e. the standard/legacy case). The current interface (both the web
API and methods in MediaWiki core) will function exactly as before for all
pages that contain wikitext.
For pages not containing wikitext, such code cannot readily function. There are
two options here (currently controlled by a global setting): pretend the page is
empty (the default) or throw an error (probably better in the case of the web
API, but too strict for other uses).
> 3. A project like wikidata - in its infancy - should make every effort to be
> backwards compatible. It would be far wiser to place wikidata into a page
> with wiki source using a custom <xml/> tag or even a <cdata/> xhtml tag.
I strongly disagree with that; it introduces more problems than it solves. Denny
and I decided against this option specifically in light of the experience he
collected with embedding structured data in wikitext in Semantic MediaWiki and
Shortipedia.
But again: that's a different discussion, please post your concerns to wikidata-l.
Regards,
Daniel
On 27.03.2012 02:19, Daniel Friesen wrote:
> Non-wikitext data is supposed to give extensions the ability to do things beyond
> WikiText. The data is always going to be an opaque form controlled by the
> extension.
> I don't think that low level serialized data should be visible at all to
> clients. Even if they know it's there.
The serialized form of the data needs to be visible at least in the XML dump
format. How else could we transfer non-wikitext content between wikis?
Using the serialized form may also make sense for editing via the web API,
though I'm not sure yet what the best way is here:
a) keep using the current general, text-based interface with the serialized form
of the content,
or b) require a specialized editing API for each content type.
Going with a) has the advantage that it will simply work with current API
client code. However, if the client modifies the content and writes it back
without being aware of the format, it may corrupt the data. So perhaps we should
return an error when a client tries to edit a non-wikitext page "the old way".
Option b) is a bit annoying because it means that we have to define a
potentially quite complex mapping between the content model and the API's result
model (nested PHP arrays). This is easy enough for Wikidata, which uses a
JSON-based internal model. But for, say, SVG... well, I guess the specialized
mapping could still be "escaped XML as a string".
Note that if we allow a), we can still allow b) at the same time - for Wikidata,
we will definitely implement a special purpose editing interface that supports
stuff like "add value for language x to property y", etc.
> Just like database schemas change, I expect extensions to also want to alter the
> format of data as they add new features.
Indeed. This is why in addition to a data model identifier, the serialization
format is explicitly tracked in the database and will be present in dumps and
via the web API.
> Also I've thought about something like this for quite a while. One of the things
> I'd really like us to do is start using real metadata even within normal
> WikiText pages. We should really replace in-page [[Category:]] with a real
> string of category metadata. Which we can then use to provide good intuitive
> category interfaces. ([[Category:]] would be left in for templates,
> compatibility, etc...).
That could be implemented using a "multipart" content type. But I don't want to
get into this too deeply - multipart has a lot of cool uses, but it's beyond
what we will do for Wikidata.
> This case especially tells me that raw is not something that should be
> outputting the raw data, but should be something which is implemented by
> whatever implements the normal handling for that serialized data.
You mean action=raw? Yes, I agree. action=raw should not return the actual
serialized format. It should probably return nothing, or an error, for non-text
content. For multipart pages it would just return the "main part", without the
"extensions".
But the entire "multipart" stuff needs more thought. It has a lot of great
applications, but it's beyond the scope of Wikidata, and it has some additional
implications (e.g. can the old editing interface be used to edit "just the text"
while keeping the attachments?).
-- daniel
When playing with moving (cherry-picking) commits from master to one
of the release branches (REL1_19), I noticed that "git cherry-pick"
and "git merge" do not invoke the "commit-msg" hook and therefore
don't add a Change-Id to the commits.
This is normally not a problem for people who are allowed to push
changes without a "Change-Id" to the repository (i.e. trunk
gatekeepers), but it may create problems for committers at large.
1) You can't push the result of such a merge or cherry-pick for
 review (see the sketch below).
2) You can't directly cherry-pick a commit imported from SVN
 into git master (those have no Change-Ids) to one of
 the branches.
3) Merge commits done by gatekeepers are not visible to gerrit
 (you can't find them, for example, when searching by ID).
 Therefore we lose the ability to comment on them if necessary.
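For instance, point 1 typically shows up like this (branch and
revision are placeholders; the error text is roughly what gerrit
emits):

git cherry-pick <revision>             # commit-msg hook is not run
git push origin HEAD:refs/for/REL1_19
# rejected by gerrit with "missing Change-Id in commit message"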
Here is an example how it failed on me today:
https://www.mediawiki.org/w/index.php?title=Git/Workflow&diff=prev&oldid=51…
Sometimes a "git commit -c <original commit>" or "git commit --amend"
are able to fix the issue, because "git commit" DOES invoke the hook.
This will be especially important for developers wanting to propose
their changes into release branches or deployment branches.
This may also bite you if you are using private branches: all commits
there have Change-Ids, but merges will not, and you may have a hard
time pushing it all together for review.
It is also interesting to see how a single Change-Id can represent
multiple review items (I52150208654fa14e02b6d80fb2cff4108089ef6c
is https://gerrit.wikimedia.org/r/3713 and
https://gerrit.wikimedia.org/r/3714).
I think the right workaround right now is to make sure that all
commits (even merges, or things going directly into git master for
some reason) have their gerrit "Change-Id".
With cherry-pick it can be done in two stages:
git cherry-pick -n <revision>   # apply the change without committing
git commit -c <revision>        # reuse the original message; runs the hook
(but you might want to improve the commit message).
Something similar can probably be done for merges
(there is a --no-commit option to git merge).
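An untested sketch, mirroring the cherry-pick recipe above:
git merge --no-commit <branch>
git commit
(plain "git commit" does invoke the hook, so the merge commit gets a
Change-Id).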
There are probably other git commands that create commits
automatically and may have this problem. Fast-forwards are fine, as
they don't produce new commits.
I did some testing today:
https://gerrit.wikimedia.org/r/#q,project:test/mediawiki/core+owner:saper%2…
//Saper
If you want to participate in Google Summer of Code, you must submit an
application via
http://www.google-melange.com/
before April 6, 19:00 UTC. google-melange.com is now open and accepting
applications. "Wikimedia Foundation" is the organization you're
submitting an application to.
--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
Dear everyone,
our current PHPUnit setup uses temporary tables for MySQL; such
tables are visible only to the current connection.
When trying to test core objects (e.g. BackupDumper) that spawn a
separate database connection, this causes trouble, as the new
database connection does not see the temporary tables of the first
connection.
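The underlying MySQL behaviour, as a minimal illustration (database
and table names invented):

$ mysql wikidb        # connection 1, e.g. the PHPUnit test runner
mysql> CREATE TEMPORARY TABLE unittest_page LIKE page;

$ mysql wikidb        # connection 2, e.g. the one BackupDumper opens
mysql> SELECT COUNT(*) FROM unittest_page;
ERROR 1146 (42S02): Table 'wikidb.unittest_page' doesn't exist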
How to test such objects?
Possible ways around might be to
(i) modify the tested object,
(ii) modify the PHPUnit setup,
(iii) force a special load balancer for testing, or
(iv) completely mock the database (when testing non-backend objects).
Any suggestions / preferences?
Kind regards,
Christian
P.S.: As this might affect the discussion: A different incarnation of
the same problem is that our PHPUnit test suite currently does not run
on a MySQL cluster (without passing --use-normal-tables).
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Christian Aistleitner
Gründbergstraße 65a          Email: christian(a)quelltextlich.at
4040 Linz, Austria           Phone: +43 732 / 26 95 63
                             Fax: +43 732 / 26 95 63
                             Homepage: http://quelltextlich.at/
---------------------------------------------------------------
Companies' registry: 360296y in Linz
Hi all,
while working on the Wikipedia mobile app I realized that the URLs we
want to share are sometimes too long, which is a problem when the
number of characters is limited. I think Wikipedia should provide its
own URL shortener service, like goo.gl, bit.ly, etc. There is just an
unofficial service for the English Wikipedia: enwp.org. Petan on IRC
proposed something like this:
wm.org/language/Article
for example
wm.org/en/Prague
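The service would simply map the short form onto the canonical
article URL; hypothetically (wm.org being only a placeholder):

$ curl -sI http://wm.org/en/Prague | grep -i '^Location'
Location: http://en.wikipedia.org/wiki/Prague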
OK, wm.org is not actually available, but we can think of something
similar... What do you think?
Regards,
=.4.S.=
--
=.4ndrea.Stagi.=