Hey Steffen and Andy,
Continuing what I started on Twitter here, as some more characters might be
helpful :)
It seems that both our projects (FLOW3 and Wikidata) are in a similar
situation. We are using Gerrit as CR tool, and TravisCI to run our tests.
And we both want to have Travis run tests for all patchsets submitted to
Gerrit, and then +1 or -1 on verified based on the build passing or
failing. To what extent have you gotten such a thing to work on your
project? Is there code available anywhere? If both projects can use the
same code for this, I'd be happy to contribute to what you already have.
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
Hey,
For some time now, it has been clear that a lot of people have use for the
code that can serialize and deserialize the Wikibase DataModel. Anyone
interacting with the API or the dumps and doing non-trivial things with the
data benefits from being able to use the domain objects provided by the
DataModel. While those objects themselves have been reusable for some time, the
code responsible for serialization is not. As the involved code also
suffers from serious design issues and is a sizeable chunk of our technical
debt, the idea is to create a new shiny dedicated component that is not
bound to Wikibase Repo or Wikibase Client.
As interest has been expressed in contributing to this, I'll briefly
outline the general idea, upcoming steps and relevant resources. If you are
not interested in contributing, you can stop reading here :)
I have created a new git repo for this component, which can be found here
https://github.com/wmde/WikibaseDataModelSerialization
This component should be to the DataModel component what AskSerialization
[0] is to the Ask library [1]. The approach and structure should be very
similar. A few stubs have been added to illustrate how to organize the
code, and autoloading and test bootstrap are in place, so the tests can be
run by executing "phpunit" in the root directory. The existing legacy code
for this serialization functionality can be found in Wikibase.git, in
lib/serializers.
I myself will be working on a very similar component which is aimed at
solving the technical debt around the serialization code for the format
used by the Wikibase Repo data access layer. This code resides in [2] and
follows much the same approach as the new component for the serialization
code dealing with the public format. Once the most critical issues this new
code will solve are tackled, I will likely start work on the former
component.
Things to keep in mind when contributing to this component:
* https://www.mediawiki.org/wiki/Wikibase/Coding_conventions
* Almost no code should know about concrete serializer implementations.
Program against the interfaces. That is, when the constructor of a serializer
for a higher level object (e.g. SnakSerializer) takes a collaborator for a
lower level one (e.g. DataValueSerializer), type hint against the interface
("Serializer $dataValueSerializer"). See the sketch below this list.
* Unit tests should be provided for all code, along with round trip tests for
matching serializers and deserializers, as well as high level serialization
and deserialization integration tests.
* Write clean tests with descriptive method names, e.g. as done in
https://github.com/wmde/WikibaseInternalSerialization/blob/master/tests/uni…
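To illustrate the second point, here is a rough sketch of what such a
serializer could look like (the class names come from the examples above, but
the method name and the exact array structure are just assumptions, not a
prescription for the actual component):

class SnakSerializer implements Serializer {

	private $dataValueSerializer;

	/**
	 * The collaborator is type hinted against the Serializer interface,
	 * so this class never knows which concrete serializer it gets.
	 */
	public function __construct( Serializer $dataValueSerializer ) {
		$this->dataValueSerializer = $dataValueSerializer;
	}

	public function serialize( $snak ) {
		// Delegate the data value part to the injected collaborator.
		return array(
			'snaktype' => 'value',
			'datavalue' => $this->dataValueSerializer->serialize( $snak->getDataValue() ),
		);
	}
}

Wiring the concrete implementations together then happens in a single place
(e.g. a factory), rather than being spread over the serializers themselves.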
[0] https://github.com/wmde/AskSerialization
[1] https://github.com/wmde/Ask
[2] https://github.com/wmde/WikibaseInternalSerialization
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
Hi there,
From the October 24th Wikidata/WMF checkin, there's this note:
== October 24 ==
* deployment from single repo - ok now? Can we get a commitment to try this
for the next deployment?
** Yes all good to go. Let's do this for the deployment on test on Oct 31st.
What is the status of this? I don't see that (single WikiData repo
manually managed by the wikidata team) in our production and labs
configs.
Why am I bringing this up now? Because there are planned changes to how
WD will do dependencies (what are dependencies where, I guess) that will
break the Beta Cluster if done without coordination. Jeroen is asking
for help so he can work on these changes.
The proposed option that I'm not happy about: turn off automatic
WikiData updates on Beta Cluster. If this has to happen, then we need
someone on WMF's side who's comfortable with Beta Cluster to be
around/help. Antoine is out on vacation (it is Dec 24th ;) ), but maybe
Timo could help? I'm not sure, honestly.
The option I want to be done: actually implement the single/manual repo
so that there aren't these breakages on Beta/Production.
I don't want to do the first option because it is a stop-gap solution
that is only necessary because the right thing hasn't been done (since
at least Oct 24th?).
What are the needed next steps from WikiData and WMF to finish this?
I see one change from Katie (thanks Katie!):
https://gerrit.wikimedia.org/r/#/c/95996/
* but unless I'm mistaken, that's just one piece of the puzzle.
Is the single repo created in gerrit? If not, we (WMF) should get that
done ASAP (considering holidays).
Thanks all, especially Jeroen for helping clarify some things.
Greg
--
| Greg Grossmeier GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg A18D 1138 8E47 FAC8 1C7D |
Hey,
I’m happy to announce the 0.6 release of Wikibase DataModel. This is the
first real release of this component. The awesome thing about this release
is that it finally makes the component ready for third party usage.
You can find a short description of Wikibase DataModel, some history and
highlights for this release at
http://www.bn2vs.com/blog/2013/12/23/wikibase-datamodel-released/
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
Hello guys,
we have had a discussion about how entity redirects should be
implemented and there is also an overview on meta.wikimedia.org [1]. Now
the people I discussed with agreed that the second option [2] is the
best one and that we should use it. However, no further feedback has come in
over the past week, so I suppose that everyone is fine with #2.
My questions concern the further development process of this feature. If
nobody objects within 24 hours, I will add a note to the page and would like
to start discussing in more detail how the feature should be implemented.
Furthermore, I would also like to implement some first changes to speed up
getting this feature in. Please give me some feedback on what the next steps
should look like and what has to be discussed before we start writing code.
Best regards,
Bene
[1]
https://meta.wikimedia.org/wiki/Wikidata/Development/Entity_redirect_after_…
[2] Implement redirects as a "mode" any *EntityContent* may have
I'm about to sign off for the holidays, until January 6th, so here's a quick
heads up:
For investigating sporadic failures of test cases, I have created a branch of
wikibase on github, which has travis set up for testing:
https://github.com/wmde/Wikibase/tree/fixtravis
https://travis-ci.org/wmde/Wikibase
This branch contains quite a few fixes/improvements to test cases. It would be
good to have them on gerrit soon.
The following tests were identified as (probably) using hard coded entity IDs in
an unhealthy way, but I didn't get around to fixing them yet:
repo/tests/phpunit/includes/api/SetClaimTest.php
repo/tests/phpunit/includes/api/SetQualifierTest.php
repo/tests/phpunit/includes/api/SetReferenceTest.php
repo/tests/phpunit/includes/api/SetSiteLinkTest.php
They should probably be fixed along the same lines as MergeItemsTest, using the
new EntityTestHelper::injectIds method to inject "real" ids for placeholders in
the data the providers return.
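Roughly, the pattern would be something like the following (written from
memory, so check MergeItemsTest for the actual placeholder syntax and the
exact signature of injectIds):

	public function provideSetClaim() {
		// The provider returns placeholders instead of hard coded IDs;
		// the placeholder syntax here is only illustrative.
		return array(
			array( array( 'entity' => '%SomeItem%' ) ),
		);
	}

	/**
	 * @dataProvider provideSetClaim
	 */
	public function testSetClaim( array $params ) {
		// Replace the placeholders with the IDs of the entities actually
		// created for this test run, as MergeItemsTest does.
		EntityTestHelper::injectIds( $params );

		// ... the rest of the test then works on real IDs.
	}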
Cheers!
Daniel
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Hello all!
Once more, I'm taking a whack at implementing support for entity redirects -
that is, we want Q1234 to become a redirect to Q2345 after we merged Q1234 into
Q2345.
My question is how to best model this.
First off, a quick primer of the PHP classes involved:
* We have the Entity base class, with Item and Property deriving from it, for
modeling entities.
* As "glue" for treating entities as MediaWiki page content, we have the
EntityContent base class, and ItemContent and PropertyContent deriving from that
(along with their handler/factory classes, EntityHandler, ItemHandler and
PropertyHandler).
Presently, each type of Entity is assigned a MediaWiki namespace, and all pages
in that namespace have the corresponding content type.
Now, there are various ways to model redirects. I have tried a few:
1) first I tried implementing redirects as a "mode" any Entity may have. The
advantage is that redirects can seamlessly occur anywhere "real" entities can
occur, e.g. in dumps, diffs, undo operations, etc. However, many operations that
are defined on "real" entities are not defined on redirects (e.g. "set label",
etc), so we would end up with a lot of "if redirect, do this, else, do that"
checks all over the code base.
2) second I tried implementing redirects as a "mode" any EntityContent may have,
following the idea that redirects are really a MediaWiki concept, and should be
implemented outside the Wikibase data model. This still requires a lot of "if
redirect, do this, else, do that" checks, but only in places dealing with
EntityContent, not all places dealing with Entities.
3) third I tried using a separate entity type for representing redirects: a
RedirectContent points to an EntityContent, there is no "mode", no need for
extra checks, no chance for confusion. However, we break a few basic
assumptions, most importantly the assumption that all pages in an entity
namespace contain an EntityContent of the respective type.
None of these solutions seems satisfactory, for the reasons indicated above and
some additional considerations explained below.
4) This led me to come full circle and consider another option: make redirects
a special *type* of Entity, besides Item and Property. This would again allow
straightforward diffs (and thus undo operations) between the redirected and the
previous, un-redirected version of a page; also, it would not compromise code
that operates on Item or Property objects. But since there are still many
operations defined for Entity that make no sense for redirects, we would need to
insert another class into the hierarchy (let's call it LabeledEntity) which
would be a base class for Item and Property, but not for Redirect.
This would require quite a bit of refactoring, and would introduce redirects as
a "first order citizen" into the Wikibase data model.
Which of the four options do you prefer? Or can you think of another, better,
option?
Below are some more points to consider when deciding on a design:
In addition to the OO design questions outlined above, one important question
is how to handle redirects in the wb_entity_per_page table; that table contains
a 1:1 mapping of entity_id <-> page_id. It is used, among other things, for
listing all entities and for checking whether an entity exists. How should
redirects be represented in that table? They could be:
a) omitted (making it hard to list redirects),
b) or use the redirect's page ID (maintaining the 1:1 nature, but pointing to a
page that actually does not contain an entity),
c) or use the target entity's page ID (breaking the 1:1 nature of the table, but
associating the ID with the accurate place).
For b) and c), a new column would be needed to mark redirects as such.
When deciding on how to represent redirects internally, we need to consider how
they should be handled externally. Of course, when asking for an entity that got
redirected, we should get the target entity's content, be it using the "regular"
HTML page interface, the API, the linked data interface, etc.
In RDF dumps, we would probably want redirects to show up as owl:sameAs
relations (or something similar that allows one URI to be indicated as the
primary one). Should redirects also be present in JSON dumps? If so, should it
also be possible to retrieve the JSON representation of redirects from the API,
in places where usually "real" entities would be?
-- daniel
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
We just discussed how to best make sure that any pre- or post-deployment tasks
required by some core changes get done. We agreed on the following:
If a change you submit to gerrit needs some pre- or post-deployment task to be
performed (e.g. applying a database patch, changing a setting, purging some
cache, etc), mention this in the commit message, using the DEPLOYMENT keyword
(at the start of a line starting a new paragraph). E.g.:
DEPLOYMENT: frob the xyzzy when deploying this change!
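In context, a full commit message might then look something like this (made-up
change, just to show where the keyword goes):

Add the xyzzy to the frobnication module

The xyzzy is needed so that frobnication keeps working after the
flux capacitor update.

DEPLOYMENT: frob the xyzzy when deploying this change!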
I have written a brief deployment checklist that also reflects this new convention:
https://meta.wikimedia.org/wiki/Wikidata/Deployment
Please extend/modify as appropriate.
-- daniel
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Am 10.12.2013 22:38, schrieb Brad Jorsch (Anomie):
> Looking at the code, ParserCache::getOptionsKey() is used to get the
> memc key which has a list of parser option names actually used when
> parsing the page. So for example, if a page uses only math and
> thumbsize while being parsed, the value would be array( 'math',
> 'thumbsize' ).
Am 11.12.2013 02:35, schrieb Tim Starling:
> No, the set of options which fragment the cache is the same for all
> users. So if the user language is included in that set of options,
> then users with different languages will get different parser cache
> objects.
Ah, right, thanks! Got myself confused there.
The thing is: we are changing what's in the list of relevant options. Before the
deployment, there was nothing in it, while with the new code, the user language
should be there. I suppose that means we need to purge these "pointers".
Would bumping wgCacheEpoch be sufficient for that? Note that we don't care much
about purging the actual parser cache entries, we want to purge the "pointer"
entries in the cache.
>> We just tried to enable the use of the parser cache for wikidata, and it failed,
>> resulting in page content being shown in random languages.
>
> That's probably because you incorrectly used $wgLang or
> RequestContext::getLanguage(). The user language for the parser is the
> one you get from ParserOptions::getUserLangObj().
Oh, thanks for that hint! Seems our code is inconsistent about this, using the
language from the parser options in some places, the one from the context in
others. Need to fix that!
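For the record, the pattern we need to move to is roughly this (a sketch, not
actual Wikibase code; the rendering helper at the end is made up):

// Wrong: takes the language from the request context, which the parser
// cache knows nothing about:
// $lang = RequestContext::getMain()->getLanguage();

// Right: ask the ParserOptions. This also records 'userlang' as a used
// option (via the access callback mentioned below), so it ends up in the
// parser cache key.
$lang = $parserOptions->getUserLangObj();
$html = $this->getHtmlForLanguage( $entity, $lang ); // made-up helper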
> It's not necessary to call ParserOutput::recordOption().
> ParserOptions::getUserLangObj() will call it for you (via
> onAccessCallback).
Oh great, magic hidden information flow :)
Thanks for the info, I'll get hacking on it!
-- daniel
Hi.
I (rather urgently) need some input from someone who understands how parser
caching works. (Rob: please forward as appropriate).
tl;dr:
what is the intention behind the current implementation of
ParserCache::getOptionsKey()? It's based on the page ID only, not taking into
account any options. This seems to imply that all users share the same parser
cache key, ignoring all options that may impact cached content. Is that
correct/intended? If so, why all the trouble with ParserOutput::recordOption, etc?
Background:
We just tried to enable the use of the parser cache for wikidata, and it failed,
resulting in page content being shown in random languages.
I tried to split the parser cache by user language using
ParserOutput::recordOption to include userlang in the cache key. When tested
locally, and also on our test system, that seemed to work fine (which seems
strange now, looking at the code of getOptionsKey()).
On the live site however, it failed.
Judging by its name, getOptionsKey should generate a key that includes all
options relevant to caching page content in the parser cache. But it seems it
forces the same parser cache entry for all users. Is this intended?
Possible fix:
ParserCache::getOptionsKey could delegate to ContentHandler::getOptionsKey,
which could then be used to override the default behavior. Would that be a
sensible approach?
And if so, would it be feasible to push out such a change before the holidays?
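To make that idea concrete, something along these lines (the ContentHandler
method is entirely hypothetical, this is just the shape of the proposal):

// In ParserCache, replacing the current page-ID-only implementation:
function getOptionsKey( $page ) {
	// Delegate the key construction to the content handler, so content
	// types that need to fragment the cache differently can override it.
	return $page->getContentHandler()->getParserCacheOptionsKey( $page );
}

ContentHandler would provide a default implementation that keeps today's
behavior (a key based on the page ID only).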
Thanks,
Daniel
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.