Hey,
For some time now, it has been clear that many people have a use for code that can serialize and deserialize the Wikibase DataModel. Anyone interacting with the API or the dumps and doing non-trivial things with the data benefits from being able to use the domain objects provided by the DataModel. While those objects have been reusable for some time already, the code responsible for serialization is not. As that code also suffers from serious design issues and makes up a sizeable chunk of our technical debt, the idea is to create a shiny new dedicated component that is not bound to Wikibase Repo or Wikibase Client.
As interest has been expressed in contributing to this, I'll briefly outline the general idea, upcoming steps and relevant resources. If you are not interested in contributing, you can stop reading here :)
I have created a new git repo for this component, which can be found here: https://github.com/wmde/WikibaseDataModelSerialization
This component should be to the DataModel component what AskSerialization [0] is to the Ask library [1]. The approach and structure should be very similar. A few stubs have been added to illustrate how to organize the code, and autoloading and the test bootstrap are in place, so the tests can be run by executing "phpunit" in the root directory. The existing legacy code for this serialization functionality can be found in Wikibase.git, in lib/serializers.
I myself will be working on a very similar component which is aimed at solving the technical debt around the serialization code for the format used by the Wikibase Repo data access layer. This code resides in [2] and follows much the same approach as the new component for the serialization code dealing with the public format. Once the most critical issues this new code will solve are tackled, I will likely start work on the former component.
Things to keep in mind when contributing to this component:
* https://www.mediawiki.org/wiki/Wikibase/Coding_conventions
* Almost no code should know concrete instances of serializers. Program against the interfaces. I.e., when the constructor of a serializer for a higher level object (e.g. SnakSerializer) takes a collaborator for a lower level one (e.g. DataValueSerializer), type hint against the interface (i.e. "Serializer $dataValueSerializer").
* Unit tests should be provided for all code, along with round trip tests for matching serializers and deserializers, and high level serialization and deserialization integration tests.
* Write clean tests with descriptive method names, e.g. as done in https://github.com/wmde/WikibaseInternalSerialization/blob/master/tests/unit...
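To make the point about programming against interfaces concrete, here is a minimal sketch. It is not the component's actual code: the Serializer interface as written here, the Example* class names and the array layout are all assumptions made up for illustration. The only thing taken from the guideline above is that the constructor type hints against the interface rather than a concrete class.

```php
<?php
// Hypothetical minimal interface; the real component may define it differently.
interface Serializer {
	public function serialize( $object );
}

// Stand-in for a lower level serializer. The output layout is made up.
class ExampleDataValueSerializer implements Serializer {
	public function serialize( $object ) {
		return array( 'value' => $object, 'type' => gettype( $object ) );
	}
}

// Stand-in for a higher level serializer that needs a collaborator.
class ExampleSnakSerializer implements Serializer {
	private $dataValueSerializer;

	// Type hint against the interface, not against ExampleDataValueSerializer.
	public function __construct( Serializer $dataValueSerializer ) {
		$this->dataValueSerializer = $dataValueSerializer;
	}

	public function serialize( $object ) {
		return array(
			'property' => $object['property'],
			'datavalue' => $this->dataValueSerializer->serialize( $object['value'] ),
		);
	}
}

$serializer = new ExampleSnakSerializer( new ExampleDataValueSerializer() );
$result = $serializer->serialize( array( 'property' => 'P42', 'value' => 'foo' ) );
```

Because ExampleSnakSerializer only knows the interface, the collaborator can be swapped (for instance by a test double) without touching the class.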
[0] https://github.com/wmde/AskSerialization
[1] https://github.com/wmde/Ask
[2] https://github.com/wmde/WikibaseInternalSerialization
Cheers
-- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. ~=[,,_,,]:3 --
Hey,
IRC just made it clear I forgot to touch one important point: how to contribute :)
At this point the code is only on GitHub. The workflow there is to create a fork, push your modifications to the fork (optionally on a branch), and send a pull request. There is no mirror on gerrit yet, though if anyone prefers that workflow, feel free to set things up and point me to your commits once submitted for review.
Cheers
-- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. ~=[,,_,,]:3 --
> * Almost no code should know concrete instances of serializers. Program against the interfaces. I.e., when the constructor of a serializer for a higher level object (e.g. SnakSerializer) takes a collaborator for a lower level one (e.g. DataValueSerializer), type hint against the interface (i.e. "Serializer $dataValueSerializer").
Hmm, this causes some other design issues. We will have a SnakSerializer which has to take a $dataValueSerializer. Of course we should not specify a concrete type because of flexibility, but a problem of this solution is that wrong Serializers that cannot handle DataValues will only throw an exception at runtime but do not show a warning when "compiling". Do you have any suggestions how we can avoid those issues?
Best regards, Bene*
Hey,
> Hmm, this causes some other design issues. We will have a SnakSerializer which has to take a $dataValueSerializer. Of course we should not specify a concrete type because of flexibility, but a problem of this solution is that wrong Serializers that cannot handle DataValues will only throw an exception at runtime but do not show a warning when "compiling". Do you have any suggestions how we can avoid those issues?
Good observation. This, a decrease in type safety, is a typical trade-off one makes when creating boundaries. This means the code constructing a SnakSerializer is responsible for making sure it is feeding in a Serializer implementation that can serialize data values. A small price to pay for the decoupling and flexibility gained.
Also note that not a lot of code should be constructing these service objects, and know about the dependencies. In fact, this likely should only be done in one place, which is a factory provided by the library. An example of this can be seen here: https://github.com/wmde/AskSerialization/blob/master/src/Ask/DeserializerFac...
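As a rough sketch of what such a factory could look like (this is not the AskSerialization factory linked above; the Serializer interface, all Example* names and the method name newSnakSerializer are assumptions for illustration), the wiring of concrete classes lives in a single place and callers only ever see the interface:

```php
<?php
// Hypothetical minimal interface; the real component may define it differently.
interface Serializer {
	public function serialize( $object );
}

class ExampleDataValueSerializer implements Serializer {
	public function serialize( $object ) {
		return array( 'value' => $object );
	}
}

class ExampleSnakSerializer implements Serializer {
	private $dataValueSerializer;

	public function __construct( Serializer $dataValueSerializer ) {
		$this->dataValueSerializer = $dataValueSerializer;
	}

	public function serialize( $object ) {
		return array( 'datavalue' => $this->dataValueSerializer->serialize( $object ) );
	}
}

// The only place that knows which concrete classes fit together.
class ExampleSerializerFactory {
	/** @return Serializer */
	public function newSnakSerializer() {
		return new ExampleSnakSerializer( new ExampleDataValueSerializer() );
	}
}

$factory = new ExampleSerializerFactory();
$snakSerializer = $factory->newSnakSerializer();
$output = $snakSerializer->serialize( 'foo' );
```

Users of the library then never construct ExampleSnakSerializer themselves, so they cannot pass it an unsuitable collaborator.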
Another important thing to realize is that if one writes the right component and integration tests, any incorrect wiring up of the serializers will be found.
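A small sketch of how a round trip check surfaces bad wiring. It uses plain PHP rather than PHPUnit so that it runs standalone, and every name in it (the two interfaces as written, the Example* classes, the roundTrip helper and the array layout) is hypothetical. A mismatched pair satisfies the type hints and only fails once it is exercised, which is exactly what such a test does:

```php
<?php
// Hypothetical minimal interfaces; the real component may define them differently.
interface Serializer {
	public function serialize( $object );
}

interface Deserializer {
	public function deserialize( $serialization );
}

class ExampleStringSerializer implements Serializer {
	public function serialize( $object ) {
		return array( 'type' => 'string', 'value' => $object );
	}
}

// Produces a format the deserializer below does not understand.
class ExampleNumberSerializer implements Serializer {
	public function serialize( $object ) {
		return array( 'type' => 'number', 'value' => $object );
	}
}

class ExampleStringDeserializer implements Deserializer {
	public function deserialize( $serialization ) {
		if ( !is_array( $serialization )
			|| !isset( $serialization['type'] )
			|| $serialization['type'] !== 'string'
		) {
			throw new InvalidArgumentException( 'Unsupported serialization' );
		}
		return $serialization['value'];
	}
}

// Serialize, then deserialize again; the result should equal the input.
function roundTrip( Serializer $serializer, Deserializer $deserializer, $object ) {
	return $deserializer->deserialize( $serializer->serialize( $object ) );
}

// Correct wiring: the value survives the round trip.
$roundTripped = roundTrip( new ExampleStringSerializer(), new ExampleStringDeserializer(), 'foo' );

// Wrong wiring: passes the type hints, fails as soon as the test runs it.
$wiringErrorCaught = false;
try {
	roundTrip( new ExampleNumberSerializer(), new ExampleStringDeserializer(), 'foo' );
} catch ( InvalidArgumentException $e ) {
	$wiringErrorCaught = true;
}
```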
Cheers
-- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. ~=[,,_,,]:3 --
Hi,
>> Hmm, this causes some other design issues. We will have a SnakSerializer which has to take a $dataValueSerializer. Of course we should not specify a concrete type because of flexibility, but a problem of this solution is that wrong Serializers that cannot handle DataValues will only throw an exception at runtime but do not show a warning when "compiling". Do you have any suggestions how we can avoid those issues?
> Good observation. This, a decrease in type safety, is a typical trade-off one makes when creating boundaries. This means the code constructing a SnakSerializer is responsible for making sure it is feeding in a Serializer implementation that can serialize data values. A small price to pay for the decoupling and flexibility gained.
Ok, this sounds reasonable.
> Also note that not a lot of code should be constructing these service objects, and know about the dependencies. In fact, this likely should only be done in one place, which is a factory provided by the library. An example of this can be seen here: https://github.com/wmde/AskSerialization/blob/master/src/Ask/DeserializerFac...
Thanks for the example. I think this will be the next step after we have finished the serializers and deserializers.
> Another important thing to realize is that if one writes the right component and integration tests, any incorrect wiring up of the serializers will be found.
I see that if we write good tests all problems with types will be found at least when running the tests. This sounds very reasonable. Thank you for your explanations. :-)
Best regards, Bene*
Hey all,
I am happy to announce the first release of the WikibaseDataModelSerialization component!
https://github.com/wmde/WikibaseDataModelSerialization/blob/master/README.md
Many kudos to Tpt, who did most of the work in migrating our serialization code to this new format!
This release provides serializers and deserializers for the main Wikibase DataModel objects. Nearly all code is covered by unit and integration tests, and addshore has tried it successfully against the wikidata.org API. (See [0] and [1].) This means it is in good shape, though not everything has been implemented yet.
The next step for this component is to reach full feature-parity with our old serialization and deserialization code in Wikibase Lib, so this old code can be replaced. Once this has happened, we will create the 1.0 release.
[0] https://www.irccloud.com/pastebin/t5ewbv7W
[1] https://github.com/addwiki/wikibase-api/blob/f0403aab8c2121df25bd349fbf85e86...
Cheers
-- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. ~=[,,_,,]:3 --
wikidata-tech@lists.wikimedia.org