Nothing has changed on my side; go ahead, as long as it is possible to keep the git history.

2013/7/7 Jeroen De Dauw <jeroendedauw@gmail.com>
Hey,

A month ago we had some discussion on our internal list about having only a single component per git repo, and I now want to follow up on it. Since there is no reason to keep this on the internal list, I'm doing it here and including the original mail.

WikibaseQueryEngine and WikibaseQuery have been split off, and, more excitingly, we finally managed to do the same with the DataModel component. Now that the dust has settled from those changes, I think it is time to look at the next one: DataValues.

Right now this repo contains 6 components, so splitting it up means 5 new repos. We can do this one by one, though it probably makes sense to do them close together. I suggest starting with ValueView and DataTypes, as these are currently the most awkward dependency-wise.

Any objections to starting work on this next week (i.e. the one that starts tomorrow)? Do read my original mail below first if you have not already done so.


Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--


---------- Forwarded message ----------
From: Jeroen De Dauw <jeroendedauw@gmail.com>
Date: 9 June 2013 01:28
Subject: One component per repository
To: Wikidata-intern <wikidata-intern@wikimedia.de>


Hey,

As you all know, right now we have git repositories containing multiple components. There are two main reasons for this:

* We started with different repos for client, lib and repo and then had a lot of hassle moving code between them. This reason seems mainly historical at this point. This kind of code move now happens only very rarely, at least in all components other than client, lib and repo. It might still be going on to some extent in those three, though I suspect it has decreased since the start of the project, probably because we have defined the component boundaries more clearly. In fact, I think the primary reason we ended up moving so many things between these components was that those boundaries were so poorly defined.

* Installation hassle. Burdening regular users of the software with knowing about all the dependencies and their required versions, and with having to load them in a specific order in LocalSettings, is a bad idea. Ideally they should not have to know about the dependencies at all. So putting everything in a single repo is a reasonable idea if you lack a package management system. Luckily we are now very close to having Composer work for Wikibase, which enables us to have as many packages as we deem fit without bothering users with them. So this will not only keep the installation process from becoming more complex, it will also reduce its current complexity (since right now people do need to care about the dependencies: Diff, Ask, DataValues, ...).

In short, the main reasons for having multiple components per repo no longer apply in our current situation. OTOH there are some clear disadvantages:

* People in the PHP world tend to assume there is only one component per git repo. This is reflected in mainstream tools such as Composer and TravisCI having this assumption built in. It is not possible (AFAIK) to properly use Composer if you have all your packages in a single repo. For instance, it is currently not possible to list ValueParsers in the package repositories as a component distinct from DataValues. It is also not easily doable to have Travis do a build for each component; you'd have to add a two-tier build process, which is a lot of added complexity.

* Since the packages contain multiple components, it is harder to keep track of changes to a single component, or to make use of just one component without having to care about the others. Reusability and the likelihood of outside contributions decrease if your package contains a ton of stuff that is irrelevant to a person who just wants to use one of the components in it.

* The notion that we cannot create new repositories for new components promotes harmful practices. New components just get put somewhere based on what currently needs them, and may later need to be moved closer to the root of the dependency tree. A good example of this is the ValueView code. If I'm not mistaken, some of it was originally in repo, then in lib (same package but different component), and then in the DataValues repo. The last step was made because we happened to be loading the DataValues repo everywhere we need this JavaScript code. However, ValueView certainly does not belong in the same repo. The two components do very different things and are even written in different languages, so they will have different consumers, different contributors and different reasons for change.

* A less clear and explicit dependency structure. Having one component per repo makes broken dependencies easier to detect and makes the individual components easier to understand.

...

Given that, I suggest we stop the earlier pattern of putting multiple components in the same repo, and work on moving misplaced components into new repos. I will be doing this for the Database, Query and QueryEngine components in the coming days, as these are not needed by any functionality currently enabled anywhere. I suggest we do the same for the components in DataValues and for DataModel. We can probably leave lib, repo and client as they are, at least for now. Those are not really reusable in their current form anyway, and they are the hardest ones to improve. Plus we might want to tackle fixing issues with lib first. Putting DataModel in its own repo is currently blocked by remaining legacy dependencies, which need to be resolved first. That leaves the components in the DataValues repo as the place to start. I suggest we tackle that as soon as everyone on the team has a clearer idea of how this will work with Composer, and once there are no new commits on Gerrit against the relevant component(s).

"Oh no, we will end up with a million git repos now!" Not really. We have 14 components right now, and this number has not increased a lot lately. The only recent additions are Database and QueryEngine. We might end up with a few more over time, but not hundreds, or even dozens.

"I still don't want to care about 14 different repos!" The situation for developers will not get worse (otherwise I would not be a fan of the idea to begin with), and will actually improve in places. For instance, when working on the QueryEngine component, you will no longer need all the unrelated client, lib, repo, datatypes, valueparsers, valuevalidators, valueformatters and valueview code. If someone breaks the build for one of those, you can happily not care, since that code is not relevant to the work you are doing. You'll also have better CI support. And if you are not working on any of the components you depend on, for instance when working only on client, you can use Composer much like a regular user and not give a damn about all the indirect dependencies.

</walloftext><other people raging>

Cheers

--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--


_______________________________________________
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech




--
Daniel Werner
Software Engineer

Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. (030) 219 158 26-0

http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 B. Recognized as a charitable organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.