Hi!
> The best place for this kind of question would be the wikidata-tech mailing list
> <wikidata-tech(a)lists.wikimedia.org>. It would probably be a good idea if you
> (and whoever else deals with wikidata on the technical level) were subscribed
> there. It's pretty low traffic.
Thanks, I've sent the subscription request and added the list to the CC.
Still learning the right places to go for things :)
> Statement IDs are GUIDs (with the Item ID prefixed), and they do not change when
> the Statement changes (otherwise, they would be hashes, not IDs - References are
> currently handled by hash).
From the export/import point of view, I think I'd prefer immutable
claims (i.e. the ID changes each time the claim changes) as they are
easier to handle, but since that is not the case, I can switch to using
a content hash instead. The performance impact (time spent calculating
the hashes) should not be too big.
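To illustrate what I mean, a minimal sketch of such a content hash in
Python (the layout follows the Wikibase JSON format, but only the 'id'
field holding the GUID is assumed; the function name is mine):

    import hashlib
    import json

    def claim_hash(claim):
        # Drop the GUID so only the content is hashed; two logically
        # identical claims then produce the same digest.
        content = {k: v for k, v in claim.items() if k != 'id'}
        canonical = json.dumps(content, sort_keys=True,
                               separators=(',', ':'))
        return hashlib.sha1(canonical.encode('utf-8')).hexdigest()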
> One thing that would be rather easy to do is to make JSON dumps of just the
items that changed in the last X hours. But that wouldn't tell you which
> statements changed.
I think for imports the best thing would be to have real diffs - i.e. a
list of claims/item fields that were added/removed/changed - but if
that's not feasible, a list of changed items would be great too. We may
want this even more frequently than hourly. Item data is not that big,
so loading it and running the diff manually would still be workable. It
would be slightly slower for big items (since each claim of the item
has to be examined) and requires maintaining an additional data
structure to efficiently enumerate the claims, but it should still be
workable.
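Roughly, the manual diff I have in mind (building on the claim_hash
sketch above; 'claims' maps property IDs to statement lists as in the
JSON dump format, everything else is illustrative):

    def diff_claims(old_item, new_item):
        # Index every statement by its GUID.
        def index(item):
            return {c['id']: claim_hash(c)
                    for claims in item.get('claims', {}).values()
                    for c in claims}
        old, new = index(old_item), index(new_item)
        added = [g for g in new if g not in old]
        removed = [g for g in old if g not in new]
        changed = [g for g in new if g in old and old[g] != new[g]]
        return added, removed, changed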
Thanks,
Stas
Hi, today I've enabled MobileFrontend on wikidata.org. So far, it's still
considered a trial, so no automatic redirection for mobile devices happens.
While it was primarily needed to satisfy a dependency in WikiGrok, this
is as good a chance as any to revisit the topic of having a mobile UI
for Wikidata. The news is both good and bad: while claims are unstyled
and therefore look broken, they don't take up the whole desktop
screen's width, which is a good sign that a bit of CSS should fix it. I
think that even just viewing Wikidata from mobile would be really
awesome.
Compare for yourself: https://www.wikidata.org/wiki/Q2 vs.
https://m.wikidata.org/wiki/Q2
--
Best regards,
Max Semenik ([[User:MaxSem]])
Facebook just published this summary of a summit for database researchers
held at Menlo Park last September. I recommend it. It contains a clear
and concise description of Facebook's data infrastructure, and a
discussion of the open problems they are thinking about, which is even
more interesting.
https://research.facebook.com/blog/1522692927972019/facebook-s-top-open-dat…
To whet your appetite, here are the problems (the summaries are mostly
my own paraphrase):
* Mobile: How should the shift toward mobile devices affect Facebook’s data
infrastructure?
* Reducing replication: How can we reduce the number of round trips between
the application and data layers?
* Impact of Caching on Availability (aka "oh no, we just restarted
memcached"): How do we harness the efficiency gains provided by caching
without being brought to our knees by a sudden drop in cache hit rate?
* Sampling at logging time in a distributed environment: How should we
sample log streams if we want to maintain accuracy and flexibility to
answer post-hoc queries?
* Trading storage space and CPU: TL;DR: gzip --best or gzip --fast?
* Reliability of pipelines: A pipeline is less reliable than any of its
  parts, because overall reliability is the product of the per-stage
  reliabilities. A pipeline composed of two systems, each 0.999
  reliable, is 0.999^2 ≈ 0.998 reliable, and ten such stages drop to
  roughly 0.990. Much sadness. What to do? (See the snippet after this
  list.)
* Globally distributed warehouse: consistency models and synchronization
problems.
* Time series correlation and anomaly detection: AKA: I want an alert for
that massive memcached bytes_out spike that doesn't also wake me up with
false positives at 2AM.
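Since the reliability arithmetic above is just a product, a trivial
snippet to play with (pure illustration, all names mine):

    from functools import reduce

    def pipeline_reliability(stages):
        # Serial pipeline: overall reliability is the product of the
        # per-stage reliabilities.
        return reduce(lambda a, b: a * b, stages, 1.0)

    print(pipeline_reliability([0.999, 0.999]))  # -> 0.998001
    print(pipeline_reliability([0.999] * 10))    # -> ~0.990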
Hello,
I have a list of place names and want to find the corresponding
Wikidata item for each name. The list includes "Köln" and "Düsseldorf",
but also parts of towns which are recorded as compounds of the superior
administrative entity and the district, like
"Schmallenberg-Westernbödefeld" or "Kerpen-Manheim".
If I look these up via the Wikidata API with the wbsearchentities
action, I have no problems with "Köln" and the like [1], but I won't
get any results for compounds, see e.g. [2], although both strings are
part of the label and the description of a Wikidata item.
Via the Wikidata web interface I get the right result, though. [3]
I have looked for quite some time but couldn't find a way to query
Wikidata programmatically and get results similar to the website
search. Thus, my question is:
Is there a way to query Wikidata via an API over both the label fields
and the description?
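For reference, a minimal script showing what I have tried; the second
call, via the generic full-text module list=search, is only a guess at
what might match the website behaviour - I have not verified that it
covers descriptions (helper names are mine):

    import json
    import urllib.parse
    import urllib.request

    API = 'https://www.wikidata.org/w/api.php'

    def api_get(**params):
        params['format'] = 'json'
        url = API + '?' + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    # The entity search I have been using; finds "Köln" but returns
    # nothing for the compound names:
    print(api_get(action='wbsearchentities', search='Kerpen-Manheim',
                  language='de')['search'])

    # The generic full-text search, which might hit the same index as
    # the website search box:
    print(api_get(action='query', list='search',
                  srsearch='Kerpen Manheim')['query']['search'])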
Background
I am working at the North Rhine-Westphalian Library Service Center
(hbz) and we are currently building a new website for the North
Rhine-Westphalian bibliography. [4] This bibliography collects
articles, books and other media about places in the German federal
state of North Rhine-Westphalia. Each record contains a string which
indicates which place a resource is about. As soon as we have those
links to Wikidata, we will think about how to link from a place's
Wikipedia page to a list of bibliographic resources about that place.
See the GitHub issue on this particular problem at [5].
All the best
Adrian
[1]
https://www.wikidata.org/w/api.php?action=wbsearchentities&search=Köln&lang…
[2]
https://www.wikidata.org/w/api.php?action=wbsearchentities&search=Kerpen%20…
[3] https://www.wikidata.org/w/index.php?search=Kerpen+Manheim
[4] http://lobid.org/nwbib
[5] https://github.com/hbz/nwbib/issues/42
--
Adrian Pohl
hbz - Hochschulbibliothekszentrum des Landes NRW
Tel: (+49)(0)221 - 400 75 235
http://www.hbz-nrw.de
There are currently ~500 item-to-item links on Wikidata where the "target
item" is a redirect.
Is there a bot resolving those? Should the merge API do that automatically?
Or the merge script on site? Or Wikidata itself, after, say, a day of not
reverting the merge?
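For anyone scripting a fix, a minimal sketch of finding where such a
link should point, using the standard redirect resolution of
action=query rather than anything Wikibase-specific (function name and
usage are illustrative):

    import json
    import urllib.parse
    import urllib.request

    API = 'https://www.wikidata.org/w/api.php'

    def resolve_redirect(item_id):
        # Returns the redirect target of an item, or None if the item
        # is not a redirect.
        params = urllib.parse.urlencode({
            'action': 'query',
            'titles': item_id,
            'redirects': 1,
            'format': 'json',
        })
        with urllib.request.urlopen(API + '?' + params) as resp:
            data = json.load(resp)
        for r in data['query'].get('redirects', []):
            if r['from'] == item_id:
                return r['to']
        return None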
Hey!
Here are a few performance-relevant changes I think should get merged before we
branch next week:
https://gerrit.wikimedia.org/r/#/c/170961/ "Determine update actions based on
usage aspects." <--- the last bit missing for usage tracking
https://gerrit.wikimedia.org/r/#/c/176650/ "Use wb_terms table for label
lookup." <--- should improve memory consumption a lot, and possibly also speed.
https://gerrit.wikimedia.org/r/#/c/167224/ "Defer entity deserialization" <---
should reduce memory footprint and improve the speed of trivial operations like
checking whether something is a redirect (a generic sketch of the pattern is
below).
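Not the actual Wikibase code, but a rough sketch of what deferring
deserialization buys (all names are illustrative):

    import json

    class LazyEntity:
        # Keeps the raw JSON blob and only parses it on first access,
        # so cheap checks never pay the full deserialization cost.

        def __init__(self, blob):
            self._blob = blob
            self._data = None

        def is_redirect(self):
            # Cheap substring probe on the raw blob; just an
            # illustration of skipping the full parse.
            return '"redirect"' in self._blob

        @property
        def data(self):
            if self._data is None:
                self._data = json.loads(self._blob)  # parsed at most once
            return self._data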
Are there any other performance improvements that we should get in? I imagine
that this will be the last time we branch until the third week of January.
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
Hi Everyone,
just wanted to post a quick summary of what I did today in order to
significantly reduce the SiteStore-related memcached traffic.
I stumbled upon this comment
https://phabricator.wikimedia.org/T58602#808530 and thus poked a bit at
when we load sites from memcached. During that I found that we were
still loading the sites basically all the time. To get that number
down, I uploaded the following patches, which have already been
reviewed, merged and even deployed (thanks for the reviews, Daniel and
Katie):
* Don't lookup Sites from mc for the 'languageLinkSiteGroup' setting:
https://gerrit.wikimedia.org/r/177419
* Don't load all sites for LangLinkHandler:
https://gerrit.wikimedia.org/r/177429
* Don't access sites on WikibaseClient::getEntityIdForTitle:
https://gerrit.wikimedia.org/r/177434
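The common idea behind these patches, as a generic sketch (not the
actual SiteStore interface; names are mine):

    class SiteLookup:
        # Looks up single sites on demand and memoizes them, instead
        # of materializing the full site list from memcached on every
        # request.

        def __init__(self, fetch_site):
            # fetch_site: maps a global site id to a site object,
            # e.g. backed by memcached or the sites table.
            self._fetch_site = fetch_site
            self._memo = {}

        def get_site(self, global_id):
            if global_id not in self._memo:
                self._memo[global_id] = self._fetch_site(global_id)
            return self._memo[global_id]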
That led to a noticeable memcached traffic change (see attachment).
Memcached traffic graphs:
https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Memcached%20eqiad&m=cpu_report&r=hour&s=by%20name&hc=4&mc=2&st=1417641558&g=network_report&z=large
I still have https://gerrit.wikimedia.org/r/177416 in review, which
slightly changes the behavior of the other-projects sidebar, but I
think that this change also has quite some potential to reduce
memcached traffic even further. It would be great if we could get that
ready for backporting by Monday.
Cheers,
Marius