Hi everyone,
In the current deployment, we made a breaking change to
wikibase.RepoApi. We will do another breaking change to it in the next
deployment. If you use RepoApi on wikidata.org or any Wikidata client
wiki, please read these instructions on how to migrate your code.
== Use case 1: RepoApi.get, RepoApi.post ==
You exclusively use RepoApi.get or RepoApi.post.
In that case you can, and now have to, use a mw.Api instance directly.
If you are on wikidata.org, just use the following:
> var api = new mw.Api();
> // api.get( { … } );
> // api.post( { … } );
If you are on a Wikidata client wiki, use the new
wikibase.client.getMwApiForRepo:
> mw.loader.using( [ 'wikibase.client.getMwApiForRepo' ] ).done( function() {
> var api = wikibase.client.getMwApiForRepo();
> // api.get( { … } );
> // api.post( { … } );
> } );
In this case, I would encourage you to look into the features RepoApi
provides. Chances are RepoApi could build your API call for you.
== Use case 2: RepoApi's wrapper methods ==
You use one of RepoApi.createEntity, RepoApi.editEntity,
RepoApi.formatValue, RepoApi.getEntities, RepoApi.getEntitiesByPage,
RepoApi.parseValue, RepoApi.searchEntities,
RepoApi.setLabel, RepoApi.setDescription, RepoApi.setAliases,
RepoApi.setClaim, RepoApi.createClaim, RepoApi.removeClaim,
RepoApi.getClaims, RepoApi.setClaimValue, RepoApi.setReference,
RepoApi.removeReferences, RepoApi.setSitelink, or RepoApi.mergeItems.
You should continue using RepoApi, but you now have to provide it with a
mw.Api instance.
If you are on wikidata.org, just use the following:
> mw.loader.using( [ 'wikibase.RepoApi' ] ).done( function() {
> var mwApi = new mw.Api();
> var repoApi = new wikibase.RepoApi( mwApi );
> // mwApi.get( { … } );
> // mwApi.post( { … } );
> // repoApi.setClaim( … );
> } );
If you are on a Wikidata client wiki, use the new
wikibase.client.getMwApiForRepo:
> mw.loader.using( [ 'wikibase.client.getMwApiForRepo' ] ).done( function() {
> var mwApi = wikibase.client.getMwApiForRepo();
> var repoApi = new wikibase.RepoApi( mwApi );
> // mwApi.get( { … } );
> // mwApi.post( { … } );
> // repoApi.setClaim( … );
> } );
If you have any questions, feel free to contact me.
Bye,
Adrian
Hi all,
In the dump file
wikidatawiki-20140912-pages-articles.xml.bz2
I seem to find some items with the key "description" and some with
"descriptions".
For example, near the beginning of the file:
Q15 seems to have key "description"
Q17 seems to have key "descriptions"
This is rather unhelpful when running e.g. my stats script.
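For now, code reading the dump has to guard against both variants,
roughly like this (a PHP sketch for illustration; $entityJson is assumed
to hold the JSON text of one entity):
> $item = json_decode( $entityJson, true );
> // Accept both key variants until the dump settles on one:
> $descriptions = array();
> if ( isset( $item['descriptions'] ) ) {
>     $descriptions = $item['descriptions'];
> } elseif ( isset( $item['description'] ) ) {
>     $descriptions = $item['description'];
> }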
a) Can someone please confirm that I'm not crazy? I mean, in this instance.
b) Is this a bug, or a feature?
c) If a bug, is it already fixed for the next dump? Which key will it be?
(If a feature: why?)
Thanks,
Magnus
Hey,
I just noticed this commit [0], which gets rid of a pile of direct
BasicEntityIdParser usages for performance reasons.
It also addresses a problem which is apparently more widespread than I
thought: using an EntityIdParser that only works for Item IDs and Property
IDs. Unless you are only dealing with Item IDs or with Property IDs, an
EntityIdParser should always be injected. This allows the thing
constructing the object graph to add support for all required entity types,
and gives extensions to Wikibase Repo (or Wikibase Client) a chance to
register new entity types.
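To illustrate the pattern (a rough sketch with made-up class and method
names, not actual Wikibase code, apart from the EntityIdParser
interface):
> use Wikibase\DataModel\Entity\EntityIdParser;
>
> class ExampleIdLabeler {
>
>     private $idParser;
>
>     // Inject the parser instead of newing up a BasicEntityIdParser here,
>     // so the code wiring the object graph decides which entity types
>     // are supported (including ones registered by extensions).
>     public function __construct( EntityIdParser $idParser ) {
>         $this->idParser = $idParser;
>     }
>
>     public function describe( $serializedId ) {
>         $id = $this->idParser->parse( $serializedId );
>         return $id->getEntityType() . ': ' . $id->getSerialization();
>     }
>
> }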
Of course this also means that no new code that introduces such occurrences
should be allowed through review, even if it contains a "fix this later"
TODO (for new code there is no excuse to do it wrong).
[0] https://gerrit.wikimedia.org/r/#/c/167136/
Cheers
--
Jeroen De Dauw - http://www.bn2vs.com
Software craftsmanship advocate
Evil software architect at Wikimedia Germany
~=[,,_,,]:3
Hey,
I was wondering if we still use PHP serialization in our change
replication mechanism. (We need to be very careful when making changes
to the objects in WB DM if that is the case.) Looking at the code, I
discovered we have a changesAsJson setting, presumably introduced to
migrate away from the PHP serialization. Has that migration happened?
Can we get rid of the setting and the old PHP serialize code?
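(To illustrate the concern with a made-up sketch, not actual Wikibase
code: serialize() bakes class and property names into the payload, so
refactoring WB DM can break unserialization of already-queued changes,
while a JSON payload only fixes the keys we explicitly chose.)
> class SiteLink {
>     public $siteId = 'enwiki';
>     public $pageName = 'Germany';
> }
>
> echo serialize( new SiteLink() );
> // O:8:"SiteLink":2:{s:6:"siteId";s:6:"enwiki";s:8:"pageName";s:7:"Germany";}
> // Rename the class and unserialize() yields a __PHP_Incomplete_Class;
> // rename a property and its data silently lands on the old, unknown name.
>
> echo json_encode( array( 'site' => 'enwiki', 'title' => 'Germany' ) );
> // {"site":"enwiki","title":"Germany"} - decoupled from class internals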
Cheers
--
Jeroen De Dauw - http://www.bn2vs.com
Software craftsmanship advocate
Evil software architect at Wikimedia Germany
~=[,,_,,]:3
Hey,
In the current JSON, statements are grouped by property ("statement
groups") and the statements in each group are in a list (ordered). The
statement groups, however, are in a map (unordered) and no extra order
information is given [1].
Should we consider this the official definition? When we defined the
original datamodel, statements were ordered (and there were no groups by
property), but now we have the groups, and abandoning statement-group
order makes sense for Reasonator-style UIs that find a good order
automatically. However, this should then also match the internal
datamodel implementation as used, e.g., for diffs (if a bot edits an
item and statement order is not part of the data, it might randomly
reorder statement groups).
Tools that work on the JSON as it is now cannot guarantee any order of
statement groups. In Java, the order you get can actually differ from
run to run. So we should change our interfaces in WDTK from List to Set
(otherwise our tests fail as soon as we read more than one statement).
But before doing this, I would like to know if this is really the
future, or if there are plans to keep the order of statement groups and
to extend the JSON instead.
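To illustrate (a small made-up PHP fragment, not WDTK code): once the
claims map is decoded, nothing in the data fixes an order between the
groups, so whatever order a consumer sees is an artifact of its parser.
> $json = '{"claims":{"P31":[{"id":"Q1$a"}],"P17":[{"id":"Q1$b"}]}}';
> $item = json_decode( $json, true );
>
> // PHP happens to preserve the textual order of the keys (P31, P17),
> // but JSON object members are unordered by spec, so a Java Map (or
> // any other parser) may hand back the groups in a different order.
> foreach ( $item['claims'] as $propertyId => $statements ) {
>     echo $propertyId, ': ', count( $statements ), " statement(s)\n";
> }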
Cheers,
Markus
[1] The current JSON has the following ordered elements:
* statements in one statement group (the collection of all statements
with the same property) are stored as a list (ordered)
* statement qualifiers are a map (no order), but extra order information
is given ("qualifiers-order")
* the "list of references" of a statement is indeed a list (ordered)
* the snaks of each reference are a map (no order), but extra order
information is given ("snaks-order")
* aliases of each language are stored in a list (ordered)
What is not ordered:
* statement groups are stored in a map (from property ids to lists of
statements with this property); no order is expressed between statements
with different main properties.
* labels, descriptions, alias lists (map from language to content)
--
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/
Hi Everyone,
Below are some chat logs between Daniel and me.
<hoo> aude: Around?
<DanielK_WMDE__> hoo: aude is having dinner with the multimedia team
<hoo> I feared that :S
<hoo> DanielK_WMDE__: Did you talk about the Q183 issues today?
* hoo came home late today...
<DanielK_WMDE__> no.
<DanielK_WMDE__> hoo: actually - Thiemo was doing some benchmarking
earlier, together with aude. Might have been that
<hoo> I see... but the issue is (sadly) more complex
<DanielK_WMDE__> regular work is pretty much zero right now though. we
have been in sessions with the multimedia folks all day
<hoo> some revisions of that item segfault, some fatal and some throw an
exception
<DanielK_WMDE__> bah
<DanielK_WMDE__> this happens since the switch to DataModel 1.0, right?
<hoo> That's a good question, actually
<DanielK_WMDE__> sigh
<hoo> not sure why shit hit the fan exactly now and not earlier
<DanielK_WMDE__> JeroenDeDauw: care to look into that?
<DanielK_WMDE__> hoo: can you collect your findings somewhere, and mail
a link to wikidata-tech?
<hoo> DanielK_WMDE__: Well, everything is bugzillad
<hoo> just search for Q183
<hoo> might be that it's spread across various products, though
(Wikimedia and Wikidata repo)
<hoo> This problem has so many parts :S
<hoo> But the one that old revisions sometimes can't be viewed/
unserialized(?) is the most minor one IMO
* James_F|Away is now known as James_F
<DanielK_WMDE__> but it seems like all of them are related to the change
in the DataModel and/or the serialization format
<DanielK_WMDE__> hoo: my feeling is that we should (partially) go back
to deferred unstubbing: have stub implementations of StatementList, etc.,
that would only instantiate the full structure when needed.
<DanielK_WMDE__> JeroenDeDauw: what do you think?
<hoo> Probably, yes
[...]
I'm posting them here in order to coordinate the efforts to solve this a
bit. So if you investigate anything, have any findings, or are working
on something, please let the others (and especially me) know.
Right now Tim is trying to help with the segfaults, but that's only one
of the problems we are seeing here.
Also:
<TimStarling> "value":"\u0b9a\u0bc6\u0bb0\u0bc1\u0bae\u0ba9\u0bbf"
<TimStarling> very efficient encoding there
<TimStarling> well, using UTF-8 only reduces it from 736163 to 721591
<ori> it's JSON, though the JSON specification doesn't require that you
encode code points outside the ASCII range; it simply allows it
<ori>
http://php.net/manual/en/json.constants.php#constant.json-unescaped-unicode
<ori> I think FormatJson::encode() supports a similar option, and it's
compatible with PHP 5.3
<TimStarling> yes, that's what I just tried, reduces oldid 158433886
from 736163 to 721591 bytes
<TimStarling> I haven't reproduced that crash in eval.php yet
<ori> hoo: FormatJson::encode( $value, /* $pretty = */ false,
FormatJson::UTF8_OK );
I think we should do this, provided all major JSON implementations
support it (which I guess they do). But, as Tim points out, this alone
is not going to help very much.
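For reference, the difference in plain PHP (JSON_UNESCAPED_UNICODE needs
PHP 5.4; FormatJson::UTF8_OK is, as ori says, the MediaWiki wrapper that
also works on 5.3):
> $value = 'செருமனி'; // the Tamil label from the log above
> echo json_encode( $value );
> // "\u0b9a\u0bc6\u0bb0\u0bc1\u0bae\u0ba9\u0bbf"
> echo json_encode( $value, JSON_UNESCAPED_UNICODE );
> // "செருமனி" - raw UTF-8, about half the bytes for this string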
Cheers,
Marius
Hi,
As some of you may have noticed, we recently had major trouble[0] with
Wikidata's Q183 (the item about Germany). The immediate cause of these
problems was the sheer size of that item.
Below I'm going to explain what led to the problems and how I was able
to work around them for now.
As can be seen in the item's edit history[1], the item hadn't been
edited between early July and the point when HHVM was enabled on
Wikidata. I can't tell for sure, but I presume the item had been
uneditable since July because of its size.
While the item's size presumably prevented editing, viewing it and
using it in the client never was a noticeable problem, at least not on
a large scale.
After HHVM was enabled on Wikidata it became possible to change the
item again, because HHVM is much faster than Zend PHP. This was done
once[1] and led to the item internally being converted to our new
serialization format (this can be seen in the huge size change of that
small content change). The item then became a problem for the slower
Zend PHP (in both the client and the repo) because it was much larger
than the old version.
As explained above, the item's size led to out-of-memory errors, PHP
segfaults, exceptions, and various other problems, causing major
disruption on both Wikidata and the Wikipedias. Because of the impact,
I was forced to find a way around the immediate problems in a timely
manner.
I started mitigating the problems by simply undoing the change to the
new serialization format, restoring the version from July 8[1]. To my
own surprise even that version didn't render, probably because our
DataModel is less efficient now than it used to be.
After that I pulled an older (and thus smaller) revision[2] from the
edit history which I could view (with oldid=120566337). But sadly this
revision, although much smaller than what we used to have, didn't
always render either (it may have worked sometimes and probably caused
a little less trouble in other components).
In the end I was forced to revert to revision 116786096[3], which has
far less data than the item used to hold. Using that revision, the item
can be rendered again and can also be used by other components again
(e.g. in the Wikipedias).
As any change to the item would again trigger a conversion to the new
serialization format (which would also make the stored data much
bigger), Marc-André Pelletier (coren) and I decided that the item needs
to be frozen in a working state, so that it can't be altered (that also
makes sure we can go back to the version with the most data once this
has been solved). Because of that we chose to superprotect the item to
make sure it can't be edited by anyone[4].
The above steps work around the problems caused in all components for
now, but of course they are only a temporary countermeasure and we will
need to fix this properly.
Cheers,
Marius
[0]: https://bugzilla.wikimedia.org/show_bug.cgi?id=71519 and others
[1]: https://www.wikidata.org/wiki/Q183?action=history
[2]: https://www.wikidata.org/w/index.php?title=Q183&oldid=120566337
[3]: https://www.wikidata.org/w/index.php?title=Q183&oldid=116786096
[4]:
https://www.wikidata.org/w/index.php?oldid=162225140#Temporary_protection_o…
We currently use memcached to share cached objects across wikis, most
importantly, entity objects (like data items). Ori suggested we should look into
alternatives. This is what he wrote:
[21:15] <ori> I was wondering if you think the way you use memcached is optimal
(this sounds like a loaded question but I mean it sincerely). And if not, I was
going to propose that you identify an optimal distributed object store, and I
was also going to offer to help push for procurement and deployment of such a
service on the WMF cluster.
[21:17] <ori> memcached is a bit of a black box. it is very difficult to get
comprehensible metrics about how much space and bandwidth you're utilizing,
especially when your data is mixed up with everything else that goes into memcached
[21:18] <ori> and the fact that you're serializing objects using php serialize()
rather than simple values makes it even harder, because it means that you can
only really poke around from php with wikidata code available
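To make that last point concrete (a made-up sketch using the PECL
memcached client, not our actual caching code):
> $cache = new Memcached();
> $cache->addServer( '127.0.0.1', 11211 );
>
> $entity = array( 'id' => 'Q64', 'labels' => array( 'en' => 'Berlin' ) );
>
> // serialize(): the raw cache value embeds PHP class/property names,
> // so only PHP code with the matching classes loaded can decode it.
> $cache->set( 'wikibase:entity:Q64', serialize( (object)$entity ) );
>
> // A neutral encoding: any tool that can talk to memcached can
> // inspect the value.
> $cache->set( 'wikibase:entity:Q64', json_encode( $entity ) );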
Just food for thought, for now... any suggestions for a shared object store?
In any case, thanks for looking into this, Ori!
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Hey all,
I'm happy to announce the 1.1 release of Wikibase DataModel. This
release contains only new features:
* The `Property` constructor now accepts an optional `StatementList`
parameter
* Added `Property::getStatements` and `Property::setStatements`
* Added `PropertyIdProvider` interface
* Added `ByPropertyIdGrouper`
* Added `BestStatementsFinder`
* Added `EntityPatcher` and `EntityPatcherStrategy`
* Added `StatementList::getAllSnaks` to use instead of `Entity::getAllSnaks`
* The `Statement` constructor now also accepts a `Claim` parameter
* Added `Statement::setClaim`
* The `Reference` constructor now accepts a `Snak` array
* Added `ReferenceList::addNewReference`
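For instance, statements on properties can now be handled much like on
items (a quick sketch; consult the release notes linked below for the
exact signatures):
> use Wikibase\DataModel\Entity\Property;
> use Wikibase\DataModel\Entity\PropertyId;
> use Wikibase\DataModel\Snak\PropertyNoValueSnak;
> use Wikibase\DataModel\Statement\StatementList;
>
> $property = Property::newFromType( 'string' );
>
> // New in 1.1: properties hold a StatementList, like items do.
> $statements = new StatementList();
> $statements->addNewStatement( new PropertyNoValueSnak( new PropertyId( 'P42' ) ) );
> $property->setStatements( $statements );
>
> echo $property->getStatements()->count();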
Internal changes were also made to improve quality; this is reflected,
for instance, here:
https://scrutinizer-ci.com/g/wmde/WikibaseDataModel/reports/
More information on Wikibase DataModel, including installation instructions
and release notes, can be found at https://github.com/wmde/WikibaseDataModel
Cheers
--
Jeroen De Dauw - http://www.bn2vs.com
Software craftsmanship advocate
Evil software architect at Wikimedia Germany
~=[,,_,,]:3