When checking the quality of the latest bot edits on disease terms, I am seeing some strange results from WDQS. The numbers of statements with rank Normal and rank Deprecated don't add up to the number added from the original source.
I ran the following query:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX wikibase: <http://wikiba.se/ontology#>

SELECT DISTINCT ?diseases ?doid WHERE {
  ?diseases p:P699 ?doid .
  ?doid wikibase:rank wikibase:NormalRank .
  ?doid wikibase:rank wikibase:DeprecatedRank .
}
I expected no results, since the query asks for statements that carry both rank normal and rank deprecated, and a statement should have exactly one rank. However, I got 2041 tuples [1].
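A more general form of the same check (a minimal sketch, not the query behind [1]) binds both ranks and keeps only the statement nodes where they differ:

PREFIX p: <http://www.wikidata.org/prop/>
PREFIX wikibase: <http://wikiba.se/ontology#>

# Any result row is a statement node carrying two different ranks,
# which a consistent store should never contain: each statement has
# exactly one rank.
SELECT DISTINCT ?diseases ?doid ?rank1 ?rank2 WHERE {
  ?diseases p:P699 ?doid .
  ?doid wikibase:rank ?rank1 .
  ?doid wikibase:rank ?rank2 .
  FILTER(?rank1 != ?rank2)
}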
Andra

[1] http://tinyurl.com/pgre6gh
Hey Andra,
It seems like at least the first example was a normal-ranked statement that was converted to a deprecated one only recently. [1] In theory, that change should have been applied on the query service as well, but maybe there are issues with changing ranks: perhaps only new ranks are added, without the old ones being removed.
Best regards,
Bene
[1] https://www.wikidata.org/w/index.php?title=Q7234435&diff=275081442&o...
Hi!
> I expected no results, since the query asks for statements that carry both rank normal and rank deprecated. However, I got 2041 tuples [1].
This may be linked to https://phabricator.wikimedia.org/T116622. It looks like there is some bug in updating the data; I'm looking into it and will report as soon as I find something.

-- Stas Malyshev smalyshev@wikimedia.org
Greetings all,

Regarding this issue: I am having trouble getting WDQ to find recently created items. An item created on Thursday the 19th (Q21514037) is still not found by WDQ, although it can be retrieved through WDQS.

-Tim
On 23/11/2015 18:18, Timothy Putman wrote:
> An item created on Thursday the 19th (Q21514037) is still not found by WDQ, although it can be retrieved through WDQS.
See https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team#Lag
and https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team#WDQ_not_...
Short summary: A bot made so many edits so quickly that the normal update signalling fell over.
Magnus's plan now is to completely re-initialise the WDQ data-store from the Wikidata dump created today.
But I don't know how long that will take.
-- James.
Well, my import code chokes on the last two JSON dumps (16th and 23rd). As it fails about half an hour or so in, debugging is ... inefficient. Unless something has changed with the dump itself (a new data type, say) and someone tells me, it will be quite some time (days, weeks) until I figure it out.
On Mon, Nov 23, 2015 at 10:54 PM, Magnus Manske magnusmanske@googlemail.com wrote:
> Well, my import code chokes on the last two JSON dumps (16th and 23rd). As it fails about half an hour or so in, debugging is ... inefficient.
To update everyone here as well: Magnus has been able to pinpoint the problem and fix the tools. They're catching up again. The issue was the extremely big pages that have recently been created for research papers: https://www.wikidata.org/wiki/Special:LongPages
Cheers,
Lydia
On 25.11.2015 16:05, Lydia Pintscher wrote:
> Magnus has been able to pinpoint the problem and fix the tools. They're catching up again. The issue was the extremely big pages that have recently been created for research papers.
Thanks for explaining. That would explain why we did not see any problems or unusual behaviour in Wikidata Toolkit. I guess Java simply does not care how long pages are, as long as they are not very big in absolute terms.
Markus
It was the "absolute terms" problem here ;-)
On 27.11.2015 15:22, Magnus Manske wrote:
> It was the "absolute terms" problem here ;-)
But 3 MB of uncompressed string data does not seem that big in absolute terms; or are you referring to something else? (I got this number from the long-pages special page.) Parsing a 3 MB string may need some extra memory, but the data you get in the end should not be much bigger than the original string, should it?
Markus
The "absolute" was the char[] size, which I had set to ~1MB back in the day. Subsequent use of STL string type does support any memory-fitting string.