What: Git Bug Triage
Where: #wikimedia-dev
http://webchat.freenode.net/?channels=#wikimedia-dev
When: Wednesday, March 28, 2012 at 19:00 UTC http://hexm.de/gu
Etherpad: http://etherpad.wikimedia.org/BugTriage-2012-03
Niklas and others have run into problems since the git migration. After
looking at Chad's schedule (but not yet talking to him directly, so
fingers crossed) I've set up a time for a triage tomorrow to cover these
issues.
Hope to see you there!
Mark.
Hi,
I would like to propose the following idea.
Some time ago we started working on a new virtual cluster known as
Labs (wmflabs.org), whose purpose is to let people develop things and
later move them to production. I believe it would be nice to have an
exactly matching environment (we could probably just extend wmflabs
for that), running on the same platform (a virtual cluster managed
through a web interface, using the Nova extension), with exactly the
same capabilities, but intended to run final products: not a testing
environment like Labs, but a "production" where the stable versions
would live.
Why do we need this?
Wikimedia Labs will in the future offer a cloned copy of the
production databases, which would allow it to run community-managed
tools like http://toolserver.org/~quentinv57/tools/sulinfo.php and
similar. I think it would be best if such tools were developed using
Labs as a testing platform, with the stable version pushed to this
"production", which would only run stable code. In fact, it doesn't
even need to be physically another cluster, just another set of
virtual instances isolated from the testing environment on Labs. The
environment would have restrictions which we don't have on Labs:
people would need to use puppet and gerrit for almost everything, and
root would not be given to everyone (some projects might be
restricted to WMF ops only). That way we could even move all the
stable bots we currently host on wmflabs there, without being afraid
of leaking bot credentials and such (which is the reason the bots
project is restricted at the moment). Applications that ask for
Wikimedia credentials could also be allowed there, since the code
living on this "production" would be subject to review, and projects
that could pose a security risk could be managed by WMF ops only (the
changes could be written by volunteers but would need to be submitted
through gerrit).
We could also move some parts of the current production setup to this
"community managed" environment. I talked to Roan Kattouw in the past
about moving the configuration of the Wikimedia sites to a git
repository, so that volunteers could submit patches to gerrit or
handle bugzilla reports without needing shell access. Changes to the
production config would be merged by operations engineers, so it
would remain completely secure.
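As an illustration, the volunteer workflow could look roughly like
this (the repository name is invented here; the push syntax is
gerrit's standard review workflow):

git clone ssh://gerrit.wikimedia.org:29418/operations/site-config
cd site-config
# edit the wiki configuration files, then:
git commit -a -m "Enable feature X on xxwiki"
git push origin HEAD:refs/for/master
# an operations engineer reviews the change in gerrit and merges it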
In a nutshell:
This environment could be set up on the same platform as wmflabs (no
extra cost, just hard work :)). Stable products (bots, user scripts)
would live there, while Labs would serve only for development and
nothing else.
The production version would live on another domain, like
wikimedia-tools.org or wmtools.org.
Thanks for your comments and responses
On 27.03.2012 13:07, vitalif(a)yourcmc.ru wrote:
>> JSON is the internal serialization format.
>
> You're suggesting to use MediaWiki as a model :)
> What's stopping you from implementing it as a _file_ handler, not _article_
> handler?
Because of the actions I want to be able to perform on them: most importantly
editing, but also diff views for the history, automatic merging to avoid
edit conflicts, etc.
These types of interaction are supported by MediaWiki for "articles", but not
for "files".
In contrast, files are rendered/thumbnailed (we don't need that), get included
in articles with a box and caption (we don't want that), and can be
accessed/downloaded directly as files via HTTP (we definitely don't want that).
So, what we want to do with the structured data fits much better with
MediaWiki's concept of a "page" than with the concept of a "file".
> I mean, _articles_ contain text (now wikitext).
> All non-human readable/editable/diffable data is stored as "files".
But that data WILL be readable/editable/diffable! That's the point! Just not as
text, but as something else, using special viewers, editors, and diff tools.
That's precisely the idea of the ContentHandler.
> Now they are all in the File namespace, but maybe it's much simpler to allow
> storing them in other namespaces and to write file handlers for
> displaying/editing them than to break the idea of an "article"?
How does what I propose break the idea of an article? It just means that
articles do not *necessarily* contain text. And it makes sure that whatever it
is that is contained in the article can still be viewed, edited, and compared in
a meaningful way.
-- daniel
On 27.03.2012 09:33, Oren Bochman wrote:
> 1. JSON - that's not a very reader-friendly format, and not an ideal format
> for the search engine to consume, due to its lack of support for
> metadata and data schemas. XML is universally supported, more human-friendly,
> and supports a schema, which can be useful well beyond this initial use.
JSON is the internal serialization format. It will not be shown to the user or
used to communicate with clients. Unless of course they use JSON for interaction
with the web API, as most do.
The full text search engine will be fed a completely artificial view of the
data. I agree that JSON wouldn't be good for that, though XML would be far worse
still.
As to which format and data model to use to represent Wikidata records
internally: that's a different discussion, independent of the idea of
introducing ContentHandlers to MediaWiki. Please post to wikidata-l about that.
> 2. Be bold, but also be smart and give respect where it is due. Bots and
> everyone else who has written tools for and around MediaWiki, relying on
> basic assumptions about the page structure, would be broken. Many will not
> so readily adapt.
I agree that backwards compatibility is very important, which is why I took care
not to break any code or client using the "old" interface on pages that contain
wikitext (i.e. the standard/legacy case). The current interface (both the web
API and methods in MediaWiki core) will function exactly as before for all
pages that contain wikitext.
For pages not containing wikitext, such code cannot readily function. There are
two options here (currently controlled by a global setting): pretend the page is
empty (the default) or throw an error (probably better in the case of the web
API, but too strict for other uses).
> 3. A project like wikidata - in its infancy - should make every effort to be
> backwards compatible. It would be far wiser to place wikidata into a page
> with wiki source using a custom <xml/> tag or even a <cdata/> xhtml tag.
I strongly disagree with that; it introduces more problems than it solves. Denny
and I decided against this option specifically in light of the experience he
collected with embedding structured data in wikitext in Semantic MediaWiki and
Shortipedia.
But again: that's a different discussion, please post your concerns to wikidata-l.
Regards,
Daniel
On 27.03.2012 02:19, Daniel Friesen wrote:
> Non-wikitext data is supposed to give extensions the ability to do things beyond
> WikiText. The data is always going to be an opaque form controlled by the
> extension.
> I don't think that low level serialized data should be visible at all to
> clients. Even if they know it's there.
The serialized form of the data needs to be visible at least in the XML dump
format. How else could we transfer non-wikitext content between wikis?
Using the serialized form may also make sense for editing via the web API,
though I'm not sure yet what the best way is here:
a) keep using the current general, text-based interface with the serialized form
of the content,
or b) require a specialized editing API for each content type.
Going with a) has the advantage that it will simply work with current API
client code. However, if the client modifies the content and writes it back
without being aware of the format, it may corrupt the data. So perhaps we should
return an error when a client tries to edit a non-wikitext page "the old way".
Option b) is a bit annoying because it means that we have to define a
potentially quite complex mapping between the content model and the API's result
model (nested PHP arrays). This is easy enough for Wikidata, which uses a
JSON-based internal model. But for, say, SVG... well, I guess the specialized
mapping could still be "escaped XML as a string".
Note that if we allow a), we can still allow b) at the same time - for Wikidata,
we will definitely implement a special purpose editing interface that supports
stuff like "add value for language x to property y", etc.
> Just like database schemas change, I expect extensions to also want to alter the
> format of data as they add new features.
Indeed. This is why in addition to a data model identifier, the serialization
format is explicitly tracked in the database and will be present in dumps and
via the web API.
> Also I've thought about something like this for quite a while. One of the things
> I'd really like us to do is start using real metadata even within normal
> WikiText pages. We should really replace in-page [[Category:]] with a real
> string of category metadata. Which we can then use to provide good intuitive
> category interfaces. ([[Category:]] would be left in for templates,
> compatibility, etc...).
That could be implemented using a "multipart" content type. But I don't want to
get into this too deeply - multipart has a lot of cool uses, but it's beyond
what we will do for Wikidata.
> This case especially tells me that raw is not something that should be
> outputting the raw data, but should be something which is implemented by
> whatever implements the normal handling for that serialized data.
You mean action=raw? Yes, I agree. action=raw should not return the actual
serialized format. It should probably return nothing, or an error, for non-text
content. For multipart pages it would just return the "main part", without the
"extensions".
But the entire "multipart" stuff needs more thought. It has a lot of great
applications, but it's beyond the scope of Wikidata, and it has some additional
implications (e.g. can the old editing interface be used to edit "just the text"
while keeping the attachments?).
-- daniel
When playing with moving (cherry-picking) commits from master to one
of the release branches (REL1_19), I noticed that "git cherry-pick"
and "git merge" do not invoke the "commit-msg" hook and therefore
don't add a Change-Id to the commits.
This is normally not a problem for people who are allowed to push
changes without a "Change-Id" to the repository (i.e. trunk
gatekeepers), but it may create problems for committers at large.
1) You can't push the result of such a merge or cherry-pick for
 review (see the sketch below).
2) You can't directly cherry-pick a commit imported from SVN
 into git master (those have no Change-Ids) to one of
 the branches.
3) Merge commits done by gatekeepers are not visible to gerrit
 (you can't find them, for example, when searching by ID).
 Therefore we lose the ability to comment on them if necessary.
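For instance, point 1 typically shows up like this (branch and
revision are placeholders; the error text is roughly what gerrit
emits):

git cherry-pick <revision>             # commit-msg hook is not run
git push origin HEAD:refs/for/REL1_19
# rejected by gerrit with "missing Change-Id in commit message"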
Here is an example how it failed on me today:
https://www.mediawiki.org/w/index.php?title=Git/Workflow&diff=prev&oldid=51…
Sometimes a "git commit -c <original commit>" or "git commit --amend"
are able to fix the issue, because "git commit" DOES invoke the hook.
This will be especially important for developers wanting to propose
their changes into release branches or deployment branches.
This may also bite you if you are using private branches: all commits
there have Change-Ids, but merges will not, and you may have a hard
time pushing it all together for review.
It is also interesting to see how a single Change-Id can represent
multiple review items (I52150208654fa14e02b6d80fb2cff4108089ef6c
is https://gerrit.wikimedia.org/r/3713 and
https://gerrit.wikimedia.org/r/3714).
I think the right workaround right now is to make sure that all
commits (even merges, or things going directly into git master for
some reason) have their gerrit "Change-Id".
With cherry-pick it can be done in two stages:
git cherry-pick -n <revision>   # apply the change without committing
git commit -c <revision>        # reuse the original message; runs the hook
(but you might want to improve the commit message).
Something similar can probably be done for merges
(there is a --no-commit option to git merge).
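An untested sketch, mirroring the cherry-pick recipe above:
git merge --no-commit <branch>
git commit
(plain "git commit" does invoke the hook, so the merge commit gets a
Change-Id).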
There are probably other git commands that create commits
automatically and may have this problem. Fast-forwards are fine, as
they don't produce new commits.
I did some testing today:
https://gerrit.wikimedia.org/r/#q,project:test/mediawiki/core+owner:saper%2…
//Saper
If you want to participate in Google Summer of Code, you must submit an
application via
http://www.google-melange.com/
before April 6, 19:00 UTC. google-melange.com is now open and accepting
applications. "Wikimedia Foundation" is the organization you're
submitting an application to.
--
Sumana Harihareswara
Volunteer Development Coordinator
Wikimedia Foundation
Dear everyone,
our current PHPUnit setup uses temporary tables for MySQL; such
tables are visible only to the current connection.
When trying to test core objects (e.g. BackupDumper) that spawn a
separate database connection, this causes trouble, as the new
database connection does not see the temporary tables of the first
connection.
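The underlying MySQL behaviour, as a minimal illustration (database
and table names invented):

$ mysql wikidb        # connection 1, e.g. the PHPUnit test runner
mysql> CREATE TEMPORARY TABLE unittest_page LIKE page;

$ mysql wikidb        # connection 2, e.g. the one BackupDumper opens
mysql> SELECT COUNT(*) FROM unittest_page;
ERROR 1146 (42S02): Table 'wikidb.unittest_page' doesn't exist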
How to test such objects?
Possible ways around might be to
(i) modify the tested object,
(ii) modify the PHPUnit setup,
(iii) force a special load balancer for testing, or
(iv) completely mock the database (when testing non-backend objects).
Any suggestions / preferences?
Kind regards,
Christian
P.S.: As this might affect the discussion: A different incarnation of
the same problem is that our PHPUnit test suite currently does not run
on a MySQL cluster (without passing --use-normal-tables).
--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
Christian Aistleitner
Gründbergstraße 65a          Email: christian(a)quelltextlich.at
4040 Linz, Austria           Phone: +43 732 / 26 95 63
                             Fax: +43 732 / 26 95 63
                             Homepage: http://quelltextlich.at/
---------------------------------------------------------------
Companies' registry: 360296y in Linz
Hi all,
while working on the Wikipedia mobile app I realized that the URLs we
want to share are sometimes too long, which is a problem when the
number of characters is limited. I think Wikipedia should provide its
own URL shortener service, like goo.gl, bit.ly, etc. There is just an
unofficial service for the English Wikipedia: enwp.org. Petan on IRC
proposed something like this:
wm.org/language/Article
for example
wm.org/en/Prague
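The service would simply map the short form onto the canonical
article URL; hypothetically (wm.org being only a placeholder):

$ curl -sI http://wm.org/en/Prague | grep -i '^Location'
Location: http://en.wikipedia.org/wiki/Prague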
OK, wm.org is not actually available, but we can think of something
similar... What do you think?
Regards,
=.4.S.=
--
=.4ndrea.Stagi.=