Hey,

For some time now, it has been clear that a lot of people have a use for code that can serialize and deserialize the Wikibase DataModel. Anyone interacting with the API or the dumps and doing non-trivial things with the data benefits from being able to use the domain objects provided by the DataModel. While those objects have been reusable for some time already, the code responsible for their serialization is not. As that code also suffers from serious design issues and makes up a sizeable chunk of our technical debt, the idea is to create a shiny new dedicated component that is not bound to Wikibase Repo or Wikibase Client.

As interest has been expressed in contributing to this, I'll briefly outline the general idea, upcoming steps and relevant resources. If you are not interested in contributing, you can stop reading here :)

I have created a new git repo for this component, which can be found at https://github.com/wmde/WikibaseDataModelSerialization

This component should be to the DataModel component what AskSerialization [0] is to the Ask library [1]. The approach and structure should be very similar. A few stubs have been added to illustrate how to organize the code. Autoloading and the test bootstrap are in place, so the tests can be run by executing "phpunit" in the root directory. The existing legacy code for this serialization functionality can be found in Wikibase.git, in lib/serializers.

I myself will be working on a very similar component, which is aimed at solving the technical debt around the serialization code for the format used by the Wikibase Repo data access layer. This code resides in [2] and follows much the same approach as the new component for the serialization code dealing with the public format. Once the most critical issues that the code in [2] is meant to solve have been tackled, I will likely start work on the public format component (WikibaseDataModelSerialization) as well.

Things to keep in mind when contributing to this component:

* https://www.mediawiki.org/wiki/Wikibase/Coding_conventions
* Almost no code should know about concrete serializer classes. Program against the interfaces. I.e., when the constructor of a serializer for a higher level object (e.g. SnakSerializer) takes a collaborator for a lower level one (e.g. DataValueSerializer), type hint against the interface (e.g. "Serializer $dataValueSerializer").
* Unit tests should be provided for all code, along with round trip tests for matching serializers and deserializers, and high level serialization and deserialization integration tests.
* Write clean tests with descriptive method names, e.g. as done in https://github.com/wmde/WikibaseInternalSerialization/blob/master/tests/unit/Deserializers/SnakDeserializerTest.php
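
To make the interface and round trip points above a bit more concrete, here is a minimal sketch. Everything in it is made up for illustration: the Serializer and Deserializer interfaces are simplified stand-ins declared locally, the Example* classes are hypothetical, and the array layout is not the actual Wikibase format. It only shows the shape of the idea: the higher level serializer receives its collaborator via the interface, and a round trip through a matching serializer and deserializer should give back an equal object.

```php
<?php

// Hypothetical stand-in interfaces, declared here so the sketch is self-contained.
// Their names and signatures are assumptions, not the component's real API.
interface Serializer {
	public function serialize( $object );
}

interface Deserializer {
	public function deserialize( $serialization );
}

// Simplified stand-in for a DataModel object; not the real Snak class.
class ExampleSnak {
	public $propertyId;
	public $dataValue;

	public function __construct( $propertyId, $dataValue ) {
		$this->propertyId = $propertyId;
		$this->dataValue = $dataValue;
	}
}

// Lower level collaborators (hypothetical).
class ExampleStringValueSerializer implements Serializer {
	public function serialize( $object ) {
		return array( 'type' => 'string', 'value' => (string)$object );
	}
}

class ExampleStringValueDeserializer implements Deserializer {
	public function deserialize( $serialization ) {
		return $serialization['value'];
	}
}

// Higher level serializer: it only knows the Serializer interface,
// never the concrete class of its collaborator.
class ExampleSnakSerializer implements Serializer {
	private $dataValueSerializer;

	public function __construct( Serializer $dataValueSerializer ) {
		$this->dataValueSerializer = $dataValueSerializer;
	}

	public function serialize( $object ) {
		return array(
			'property' => $object->propertyId,
			'datavalue' => $this->dataValueSerializer->serialize( $object->dataValue ),
		);
	}
}

// Matching deserializer, again type hinting against the interface.
class ExampleSnakDeserializer implements Deserializer {
	private $dataValueDeserializer;

	public function __construct( Deserializer $dataValueDeserializer ) {
		$this->dataValueDeserializer = $dataValueDeserializer;
	}

	public function deserialize( $serialization ) {
		return new ExampleSnak(
			$serialization['property'],
			$this->dataValueDeserializer->deserialize( $serialization['datavalue'] )
		);
	}
}

// Round trip check: serialize, deserialize, compare with the original.
$serializer = new ExampleSnakSerializer( new ExampleStringValueSerializer() );
$deserializer = new ExampleSnakDeserializer( new ExampleStringValueDeserializer() );

$snak = new ExampleSnak( 'P42', 'kittens' );
$serialization = $serializer->serialize( $snak );
$roundTripped = $deserializer->deserialize( $serialization );

// Loose object comparison: same class and equal property values.
var_dump( $roundTripped == $snak );
```

In an actual PHPUnit test the final comparison would be an assertEquals call in a descriptively named test method. Because the collaborator is injected via the interface, a unit test for the higher level serializer can also hand it a mock instead of a real lower level serializer.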

[0] https://github.com/wmde/AskSerialization
[1] https://github.com/wmde/Ask
[2] https://github.com/wmde/WikibaseInternalSerialization

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--