I'm looking into ways to use tabular data in SPARQL queries but could not
find anything on that.
My motivation here comes in part from the timeout limits: the basic idea
would be to split queries that typically time out into sets of queries that
do not time out and that, if their results were aggregated, would yield the
results expected from the original query had it not timed out.
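To make the splitting idea concrete, here is a minimal sketch, assuming the
public WDQS endpoint and a query that can be partitioned by a VALUES block;
the query, the class list, the chunk size and the user agent are purely
illustrative:

    import requests

    WDQS = "https://query.wikidata.org/sparql"
    HEADERS = {"User-Agent": "QuerySplitSketch/0.1 (someone@example.org)"}

    # Template for one partial query; each chunk fills the VALUES block.
    QUERY_TEMPLATE = """
    SELECT ?item ?date WHERE {{
      VALUES ?class {{ {chunk} }}
      ?item wdt:P31 ?class ;
            wdt:P577 ?date .
    }}
    """

    # Illustrative partition of the problem space (here: a list of classes).
    classes = ["wd:Q13442814", "wd:Q571", "wd:Q11424", "wd:Q3331189"]

    results = []
    # Run several small queries that stay under the timeout instead of one
    # big query, then aggregate the partial result sets client-side.
    for i in range(0, len(classes), 2):
        chunk = " ".join(classes[i:i + 2])
        r = requests.get(
            WDQS,
            params={"query": QUERY_TEMPLATE.format(chunk=chunk),
                    "format": "json"},
            headers=HEADERS,
            timeout=60,
        )
        r.raise_for_status()
        results.extend(r.json()["results"]["bindings"])

    print(len(results), "rows aggregated from the partial queries")

Whether the aggregated result really equals the result of the original query
depends on how it is split: partitioning along a disjoint VALUES list is
safe, while LIMIT/OFFSET pagination needs a stable ORDER BY.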
The second line of motivation is keeping track of how things develop over
time, which would be interesting both for content and maintenance queries
and for the usage of things like classes, references, lexemes or properties.
I would appreciate any pointers or thoughts on the matter.
We are currently dealing with a bot overloading the Wikidata Query
Service. This bot does not look actively malicious, but it creates
enough load to disrupt the service. As a stopgap measure, we had to
deny access to all bots using the default python-requests user agent.
As a reminder, any bot should use a user agent that allows it to be
identified. If you have trouble accessing WDQS, please check that you are
following those guidelines.
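For example, with the requests library the default "python-requests/x.y.z"
user agent can be replaced by one that identifies the bot and its operator;
the bot name, URL and e-mail address below are placeholders:

    import requests

    headers = {
        "User-Agent": "MyWikidataBot/1.0 "
                      "(https://example.org/mybot; mybot@example.org)"
    }
    response = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1",
                "format": "json"},
        headers=headers,
        timeout=60,
    )
    print(response.status_code)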
More information and a proper incident report will be communicated as
soon as we are on top of things again.
Thanks for your understanding!
Engineering Manager, Search Platform
UTC+2 / CEST
This is an important announcement for all the tool builders and maintainers
who access Wikidata’s data by *directly querying the Labs database replicas*.
In May-June 2019, the Wikidata development team will drop the wb_terms
table from the database in favor of a new, optimized schema. Over the
years, this table has become too big, causing various issues.
This change requires the tools using wb_terms to be updated. Developers and
maintainers will need to *adapt their code* to the new schema before the
migration starts, and switch to the new code once the migration begins.
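As an illustration of the kind of change involved, here is a hedged sketch
comparing a label lookup against the old wb_terms table with a hypothetical
equivalent against the new normalized term store. The new table and column
names follow the schema described in the Phabricator task and may differ in
detail, and the toolforge connection helper is an assumption:

    import toolforge  # assumes the Toolforge 'toolforge' helper library

    conn = toolforge.connect('wikidatawiki')  # Wiki Replica connection

    # Old schema: everything lives in the single, denormalized wb_terms table.
    OLD_QUERY = """
    SELECT term_text
    FROM wb_terms
    WHERE term_entity_id = %s
      AND term_entity_type = 'item'
      AND term_type = 'label'
      AND term_language = 'en'
    """

    # Hypothetical equivalent on the normalized schema: labels are reached via
    # wbt_item_terms -> wbt_term_in_lang -> wbt_text_in_lang -> wbt_text.
    NEW_QUERY = """
    SELECT wbx_text
    FROM wbt_item_terms
    JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
    JOIN wbt_type         ON wbtl_type_id = wby_id AND wby_name = 'label'
    JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
                          AND wbxl_language = 'en'
    JOIN wbt_text         ON wbxl_text_id = wbx_id
    WHERE wbit_item_id = %s
    """

    with conn.cursor() as cur:
        cur.execute(OLD_QUERY, (42,))  # numeric part of the entity ID (Q42)
        print(cur.fetchall())
        cur.execute(NEW_QUERY, (42,))
        print(cur.fetchall())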
The migration will start on *May 29th*. On May 15th, a test system will be
available for you to test your code.
Since the table is used by plenty of external tools, we are setting up a
process to make sure that the change can be done together with the
developers and maintainers, without causing issues or breaking tools. Most
of the documentation and updates will take place on Phabricator:
- In this Phabricator task <https://phabricator.wikimedia.org/T221764>,
you can find a description of the changes and the process, and you can ask
for more details or for help in the comments. This is also where updates
will be announced if necessary.
- On the Tool Builders Migration board
you will find all the details about the migration and how to update your
tool <https://phabricator.wikimedia.org/T221765>, and you can add your own
tool there.
- If you need to discuss with the Wikidata developers or get more
specific help, we have set up two dedicated IRC meetings and a session at
the Wikimedia Hackathon. More information can be found in this task.
We are aware that this change will require you to make some significant
changes in your code, and we are willing to help you as much as our
resources allow. We hope that you will understand that this change is being
made to avoid bigger issues in the near future.
Note that this change does not impact Wikibase instances outside of
Wikidata; a dedicated migration plan and announcement for those will follow.
We strongly encourage you not to wait until the last minute to make the
changes in your code. If you have any questions or issues, we will be happy
to help.
In order to keep the discussions in one place, please ask questions or
raise issues directly in the Phabricator task and board.
Thanks for your understanding,
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/029/42207.
The performance of the query update is getting worse. Questions about this
have been raised before. I do remember quality replies like "it is not
exponential, so there is no problem". However, here we are, and there is a
problem.
The problem is that I run batch jobs, batch jobs that do not run. I
have the impression that they are put in some kind of suspended animation
by a person. These jobs are submitted through the SourceMD tool by Magnus,
and Magnus is well known for being responsive to suggestions on how he can
improve his tools. So do not use as an argument that there is something
wrong with these jobs. At most it is acceptable for these runs to be put on
some kind of hold for the duration of a crisis, and then there has to be a
release.
At the same time I notice that the reports indicating multiple items with
the same ORCID iD include items that should have been picked up by earlier
reports. I notice that the query does not pick up existing items with an
ORCID iD and creates new ones. For me this is an indication that the Query
Service is not keeping up with the updates.
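For illustration, the duplicate check in question presumably boils down to
a lookup like the one below (a sketch, not SourceMD's actual code, assuming
the ORCID iD property P496 and the public WDQS endpoint); when the query
service lags behind recent edits, such a lookup misses freshly created items
and a duplicate gets made:

    import requests

    WDQS = "https://query.wikidata.org/sparql"
    HEADERS = {"User-Agent": "OrcidDedupSketch/0.1 (someone@example.org)"}

    def items_with_orcid(orcid):
        """Return items whose ORCID iD (P496) matches the given value."""
        query = 'SELECT ?item WHERE {{ ?item wdt:P496 "{}" . }}'.format(orcid)
        r = requests.get(WDQS,
                         params={"query": query, "format": "json"},
                         headers=HEADERS, timeout=60)
        r.raise_for_status()
        return [b["item"]["value"] for b in r.json()["results"]["bindings"]]

    existing = items_with_orcid("0000-0002-1825-0097")  # placeholder ORCID
    if existing:
        print("Existing item(s):", existing)
    else:
        print("Nothing found; a new item would be created, "
              "a duplicate if this result is merely stale.")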
There is talk on the Wiki that there is no point in having fixed
descriptions in anything but English. What caused this discussion is the
sheer amount of updates needed just for one language. At the London
Wikimania this perceived need for fixed descriptions was discussed
vis-à-vis automated descriptions, and as I recall the only argument for having
them at all was "standards" in relation to dumps. Yes, automated
descriptions may be cached and included in a dump.
I have been asked to write for the ORCID blog and thereby, in effect, plug
the relevance of the Scholia presentation for scientists. When I do, the
number of jobs like the ones I run will mushroom. That is why I have not
put anything forward so far: we cannot cope as it is.
The issues I see are:
* again, to what extent can we grow our content, both for query and update,
in the short, medium and long term
* will batch jobs like mine be able to complete
* can we handle the attention when scholars discover how relevant Scholia
is for them and the subjects they care about
* do we care that the motivation of volunteers relies on the availability
of sufficient performance to do the tasks they care about
Does somebody know the minimal hardware requirements (disk size and
RAM) for loading the Wikidata dump into Blazegraph?
The downloaded dump file wikidata-20190513-all-BETA.ttl is 379G.
The bigdata.jnl file, which stores all the triple data in Blazegraph,
is 478G and still growing.
I have a 1T disk, but it is almost full now.