WikibaseLib is a horrible kitchen sink, and I don't want to add more to the mess. So I want to put the usage tracking code into sensible packages. However, I'm a bit at a loss as to how to best split the different responsibilities into packages. Here are some of the communication needs we have, implying which code needs to be shared between repo and client:
The client needs to:
* load entity data
** need to share entity storage code
** but should not know about EntityContent
** and should have no write access
* look up properties by label, and look up labels of items
** need to share term storage code
** no need for write access
** no need for code for constraint checks, etc.
** should not have related maintenance scripts or schema update code
* look up data types for properties
** need to share property info storage code
** no need for write access
** should not have related maintenance scripts or schema update code
* load change details
** need to share change table storage code and value objects
** no need for write access
** no need for dispatching logic
** also should not have schema update code
* look up sitelinks by page title
** need to share link table storage code
** no need for write access
** should not have related maintenance scripts or schema update code
* update notification subscriptions
** need to share subscription storage code
** should not have related maintenance scripts or schema update code
So, there are 6 things the client and the repo both need to access. But the write logic, or at least the maintenance logic, should not be bundled with the leaner "read only" package. So I see 12 new packages... dependency hell.
So, what to do? Have 6 read level packages, and stuff the maintenance logic into a single package not used by the client? Also ugly.
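To make the read-only versus read-write split above concrete, here is a minimal sketch. It is not actual Wikibase code: Python is used only for illustration, and every name in it (`LabelLookup`, `LabelStore`, `InMemoryLabelStore`, `render_label`) is hypothetical. The idea is that client code is typed against the narrow read interface, while the wider interface with write access is something only the repo would depend on.

```python
from typing import Dict, Optional, Protocol, Tuple


class LabelLookup(Protocol):
    """Read-only view: all a client would need (hypothetical interface)."""

    def get_label(self, entity_id: str, language: str) -> Optional[str]: ...


class LabelStore(LabelLookup, Protocol):
    """Read-write view: only the repo would depend on this (hypothetical)."""

    def set_label(self, entity_id: str, language: str, label: str) -> None: ...


class InMemoryLabelStore:
    """Toy implementation that satisfies both interfaces structurally."""

    def __init__(self) -> None:
        self._labels: Dict[Tuple[str, str], str] = {}

    def get_label(self, entity_id: str, language: str) -> Optional[str]:
        return self._labels.get((entity_id, language))

    def set_label(self, entity_id: str, language: str, label: str) -> None:
        self._labels[(entity_id, language)] = label


def render_label(lookup: LabelLookup, entity_id: str, language: str) -> str:
    """Client-side code: sees only the read interface, falls back to the ID."""
    label = lookup.get_label(entity_id, language)
    return label if label is not None else entity_id
```

With this shape, the packaging question becomes where each interface lives, not how many storage implementations exist: the read interface can sit in a small shared package while the implementation and the write interface stay with the repo.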
Ideas?
-- daniel
Hey,
Great you are looking into this Daniel!
> So, there are 6 things the client and the repo both need to access.
These 6 things listed here do not translate into 6 packages. That has to be considered more carefully.
> But the write logic, or at least the maintenance logic, should not be bundled with the leaner "read only" package.
I disagree with this being a rule or even something that is extremely important. In the end, this is not something we care about directly, and only do to avoid certain pains. I'm highly sceptical that such splitting is warranted for everything, and suggest one first looks at the interface segregation issues that plague the data access code.
Doing this well requires taking into account many more factors than have been outlined, and is probably better done incrementally than by trying to tackle all these different aspects at once. This makes me think asking the question in this format on the list is not the best way forward. An in-person discussion focusing on a single component is much better IMO.
Cheers
-- Jeroen De Dauw - http://www.bn2vs.com Software craftsmanship advocate Evil software architect at Wikimedia Germany ~=[,,_,,]:3
On 01.09.2014 20:27, Jeroen De Dauw wrote:
> Hey,
> Great you are looking into this Daniel!
>> So, there are 6 things the client and the repo both need to access.
> These 6 things listed here do not translate into 6 packages. That has to be considered more carefully.
I absolutely agree. My intention was to point out the naive approach as a baseline, as food for thought.
Btw, some interfaces that are currently in lib or repo would be useful to have in the storage level components. Where should these go? A separate WikibaseStorageInterfaces package?
>> But the write logic, or at least the maintenance logic, should not be bundled with the leaner "read only" package.
> I disagree with this being a rule or even something that is extremely important. In the end, this is not something we care about directly, and only do to avoid certain pains.
We'd at least need different include files for repo-side and client-side usage (so schema updates can be hooked in where appropriate). It feels kind of ugly to have that in the actual library. Also, running a maintenance script on the "wrong" side of things will simply fail.
So it's not totally critical, but it does seem confusing to me to lump that together.
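As a rough illustration of the "different include files" point, here is a minimal sketch in which a shared library offers one entry point per side and only one of them hooks in schema updates. Everything here is hypothetical (`HookRegistry`, `load_for_client`, `load_for_repo`, the subscription table update), and which side actually owns which table is an assumption made for the example, not a statement about Wikibase.

```python
from typing import Callable, List

# A schema update is modelled as a callable that reports what it did.
SchemaUpdate = Callable[[], str]


class HookRegistry:
    """Hypothetical stand-in for the host application's hook system."""

    def __init__(self) -> None:
        self.schema_updates: List[SchemaUpdate] = []

    def register_schema_update(self, update: SchemaUpdate) -> None:
        self.schema_updates.append(update)

    def run_schema_updates(self) -> List[str]:
        return [update() for update in self.schema_updates]


def create_subscription_table() -> str:
    """Placeholder for a real schema change (hypothetical)."""
    return "created subscription table"


def load_for_client(registry: HookRegistry) -> None:
    """Client-side entry point: read access only, no schema updates hooked in."""
    return None


def load_for_repo(registry: HookRegistry) -> None:
    """Repo-side entry point: assumed here to own the table, so it hooks in the update."""
    registry.register_schema_update(create_subscription_table)
```

The awkward part Daniel describes is visible even in this toy version: the library has to know about both sides, and calling the wrong entry point silently changes which maintenance code runs.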
> I'm highly sceptical that such splitting is warranted for everything, and suggest one first looks at the interface segregation issues that plague the data access code.
I agree.
> Doing this well requires taking into account many more factors than have been outlined, and is probably better done in a more incremental fashion than trying to tackle all these different aspects at once.
I agree. But I also think it's useful to have a broad overview.
> This makes me think asking the question in this format on the list is not the best way forward. In person discussion focusing on a single component is much better IMO.
I often find a face to face discussion useful for resolving a particular issue, but for collecting ideas and getting an overview, a broader discussion on a mailing list is useful in my experience.
Perhaps we can tackle some of this in the "splitting Wikibase.git" discussion today.
-- daniel
> I want to put the usage tracking code into sensible packages.
Can you tell me more about this usage tracking? What exactly are you collecting and where do you publish reports and/or the raw data?
Lukas
On Tue, Sep 2, 2014 at 7:35 AM, Lukas Benedix lukas.benedix@fu-berlin.de wrote:
> Can you tell me more about this usage tracking? What exactly are you collecting and where do you publish reports and/or the raw data?
https://www.wikidata.org/wiki/Wikidata:Development_plan#Data_usage_tracking
Cheers Lydia
wikidata-tech@lists.wikimedia.org