Relaying a question from a brief discussion on Twitter [1], I am curious to
hear how people feel about the idea of creating a a "SPARQL query example"
property for properties, modeled after "Wikidata property example" [2]?
This would allow people to discover queries that exemplify how the property
is used in practice. Does the approach make sense or would it stretch too
much the scope of properties of properties? Are there better ways to
reference SPARQL examples and bring them closer to their source?
Dario
[1] https://twitter.com/ReaderMeter/status/768101464572997632
[2] https://www.wikidata.org/wiki/Property:P1855
Hi!
I was thinking recently about various data processing scenarios in
wikidata and there's one case we don't have a good coverage for I think.
TLDR: One of the things I think we might do to make it easier to work
with data is having ntriples (line-based) RDF dump format available.
If you need to process a lot of data (like all enwiki sitelinks, etc.)
then the Query Service is not very efficient there, due to limits and
sheer volume of data. We could increase limits but not by much - I don't
think we can allow a 30-minute processing task to hog the resources of
the service to itself. We have some ways to mitigate this, in theory,
but in practice they'll take time to be implemented and deployed.
The other approach would be to do dump processing. Which would work in
most scenarios but the problem is that we have two forms of dump right
now - JSON and TTL (Turtle) and both are not easy to process without
tools with deep understanding of the formats. For JSON, we have Wikidata
Toolkit but it can't ingest RDF/Turtle, and also has some entry barrier
to get everything running even when operation that needs to be done is
trivial.
So I was thinking - what if we had also ntriples RDF dump? The
difference between ntriples and Turtle is that ntriples is line-based
and fully expanded - which means every line can be understood on its own
without needing any context. This enables to process the dump using the
most basic text processing tools or any software that can read a line of
text and apply regexp to it. The downside of ntriples is it's really
verbose, but compression will take care of most of it, and storing
another 10-15G or so should not be a huge deal. Also, current code
already knows how to generate ntriples dump (in fact, almost all unit
tests internally use this format) - we just need to create a job that
actually generates it.
Of course, with right tools you can generate ntriples dump from both
Turtle one and JSON one (Wikidata toolkit can do the latter, IIRC) but
it's one more moving part which makes it harder and introduces potential
for inconsistencies and surprises.
So, what do you think - would having ntriples RDF dump for wikidata help
things?
--
Stas Malyshev
smalyshev(a)wikimedia.org
Hi,
I'm working on a project to assess Wikidata item quality. As part of this,
to begin with, I'm trying to get basic statistics on Wikidata items, which
requires work both on item level and over several items together.
For example "I want to know the number of statements on an average with
dead external reference links".
What would be the best way(programatically) to do so: scanning all the
items using api, or a subset of them or downloading the dump offline and
working on it?
-Thanks,
Sumit Asthana,
B.Tech Final Year,
Dept. of CSE,
IIT Patna
Hi,
A short mail to report a bug in the organization of the wikidata workshop held today during the wikiconvention in Paris[1]. Weeks in advance I convinced a few people to attend so that we can work on our topic of interest: Software and Free Software in particular[2]. There is no better context than being surrounded by seasoned wikidata contributors to improve and contribute at the same time. The description was appealing:
"After a one hour training, we will create groups according to the desire of the participants: contribute, discover and install useful gadgets..."
When the audience was asked for their preferences, I raised my hand and happily declared "Contributing to the Software and FLOSS projects !". Much to my surprise this was quickly dismissed: "we're only going to learn about tools, we create groups to use and learn about tools, not to work on a specific project". My interest for the Software project is known to both Harmonia Amanda and Ash_Crow, to the extent that I discussed with them my intent to recruit people to participate in the workshop for that particular topic[3], well in advance. The focus of our interest was also made clear in the participant list[4].
Of course the rebuttal was not enough to discourage us from contributing to wikidata :-) We found another room and did useful work together, just without the pleasure of being part of the group.
When facing such a minor disappointment, I suppose there is not much cause for concern. But I thought it would be useful to report what appears to be a glitch in the organization so that it can be fixed. This is not a pleasant thing to write or read, but what would be life without a few mistakes ?
In conclusion I would like to thank everyone for a wonderful first experience, specially Harmonia Amanda and Ash_Crow who do a tremendous work with wikidata.
Cheers
[1] https://meta.wikimedia.org/wiki/WikiConvention_francophone/2016/Programme/A…
[2] https://www.wikidata.org/wiki/Wikidata:WikiProject_Informatics/Softwarehttps://www.wikidata.org/wiki/Wikidata:WikiProject_Informatics/FLOSS
[3] 15 july 2016 <dachary> Ash_Crow: pour https://meta.wikimedia.org/wiki/WikiConvention_francophone/2016/Programme/A… nous serons au moins trois avec pour centre d'intérêt https://www.wikidata.org/wiki/Wikidata:WikiProject_Informatics/FLOSS et https://www.wikidata.org/wiki/Wikidata:WikiProject_Informatics/Software . J'espère convaincre deux autres personnes mais c'est pas gagné ... l'attrait des vacances...
[4] https://meta.wikimedia.org/wiki/WikiConvention_francophone/2016/Programme/A…
--
Loïc Dachary, Artisan Logiciel Libre
Hey all,
We just deployed a change that changed default sensitivity of ORES review
tool from "hard" to "soft" (meaning recall would drop from 0.9 to 0.75 but
percentage of false positives drops too). You are still able to change it
back in your preferences (Recent changes tab).
Please come to us for any issues or questions.
Best
Hello all!
After a brief period for final comments (thanks everyone for your input!), the
Stable Interface Policy is now official. You can read it here:
<https://www.wikidata.org/wiki/Wikidata:Stable_Interface_Policy>
This policy is intended to give authors of software that accesses Wikidata a
guide to what interfaces and formats they can rely on, and which things can
change without warning.
The policy is a statement of intent given by us, the Wikidata development team,
regarding the software running on the site. It does not apply to any content
maintained by the Wikidata community.
--
Daniel Kinzler
Senior Software Developer
Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.
I am trying to create new items by supplying a large-ish JSON structure,
but I keep getting "The serialization is invalid". Sadly, that does not
tell me which part is invalid.
Can anyone see what's wrong? (I suspect the references, but I don't want to
create unreferenced testing items, and test.wikidata.org has different
properties apparently...)
https://tools.wmflabs.org/paste/view/7b2aab11
Cheers,
Magnus
Dear Wikidata community,
we are happy to announce the next release of COOL-WD: a Completeness Tool
for Wikidata, packed with new features, inspired by the engaging discussion
with you all on the first release [1].
The main purpose of COOL-WD is to allow to create and manage completeness
information about Wikidata, such as "Complete for all Switzerland's
cantons" and "Complete for all of Obama's children". While previously one
had to access an external tool to add and view Wikidata completeness
information at http://cool-wd.inf.unibz.it/, now a user script is available
to enable adding and viewing completeness information directly inside
Wikidata [2].
When the script is enabled, properties annotated as complete are marked by
a green box, while all other properties are marked in yellow. To state that
a certain property is complete, one can simply click on the yellow box to
make it turn green. To add a reference URL for the completeness, one can
click the small '(i)' icon next to the property name to add the reference
URL. An example screenshot is available at [3].
Several other new features are:
- Completeness analytics: show the progress in data completion wrt class of
objects of interest (http://cool-wd.inf.unibz.it/?p=aggregation), example
screenshot at [4].
- Query completeness diagnostics: give an explanation (which completeness
statements are used, and how) whenever (in)-complete query answers are
given (http://cool-wd.inf.unibz.it/?p=query), example screenshot at [5].
- Linked data publication of completeness statements, for instance, RDF
description of the completeness statement of all cantons in Switzerland:
http://cool-wd.inf.unibz.it/resource/statement-Q39-P150
- RDF dump of over 10,000 completeness statements in COOL-WD collected from
various sources: http://completeness.inf.unibz.it/rdf-export/
Last but not least, a description of these features is to appear as a paper
at COLD 2016 Workshop, which currently can be downloaded [6].
The tool is still a prototype, so we very much look forward to your
feedback regarding how useful you consider the tool, and your ideas for
conceptual or technical improvements! Our COOL-WD project chat is also
available at:
https://www.wikidata.org/wiki/Wikidata:Project_chat#A_gadget_for_Wikidata_c…
Best,
Fariz, Simon, Rido, and Werner
(Free University of Bozen-Bolzano, Italy)
[1] https://lists.wikimedia.org/pipermail/wikidata/2016-March/008319.html
[2] https://www.wikidata.org/wiki/User:Fadirra/coolwd.js
[3] http://completeness.inf.unibz.it/coolwd-screenshots/gadget.JPG
[4] http://completeness.inf.unibz.it/coolwd-screenshots/analytics.png
[5] http://completeness.inf.unibz.it/coolwd-screenshots/diagnostics.png
[6]
http://completeness.inf.unibz.it/coolwd-screenshots/paper_cameraReady.pdf