Hi,
I'm looking into ways to use tabular data like
https://commons.wikimedia.org/wiki/Data:Zika-institutions-test.tab
in SPARQL queries but could not find anything on that.
Part of my motivation comes from the query time-out limits: the basic
idea would be to split a query that typically times out into a set of
queries that do not, such that their aggregated results yield what the
original query would have returned had it not timed out.
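To make the splitting idea concrete, here is a minimal sketch of one such
sub-query (untested, with purely illustrative slice boundaries, relying on
the prefixes predefined by the Wikidata Query Service): it restricts the
matched items to a numeric ID range, so that the same query can be run
several times with different ranges and the result sets merged afterwards,
for example into a tabular data page.

  # One slice of a larger query: only items whose numeric ID falls into a
  # given range are considered. Running the same query with different
  # ranges and merging the results afterwards should approximate the full
  # query without hitting the timeout.
  SELECT ?item ?itemLabel WHERE {
    ?item wdt:P31 wd:Q5 .                      # example restriction: instances of human
    BIND(xsd:integer(STRAFTER(STR(?item), "Q")) AS ?id)
    FILTER(?id >= 1000000 && ?id < 2000000)    # placeholder slice boundaries
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  }

Whether a given slice actually stays below the timeout of course depends on
how the query optimizer handles the filter, so the slicing criterion may
need to be something cheaper to evaluate.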
The second line of motivation is keeping track of how things develop over
time, which would be interesting both for content and maintenance queries
and for following the usage of things like classes, references, lexemes or
properties.
I would appreciate any pointers or thoughts on the matter.
Thanks,
Daniel
Hello,
The following primary database masters will be switched over during the
next few weeks (more details at https://phabricator.wikimedia.org/T230788):
Impact:
*Writes will be blocked*
*Reads will remain unaffected*
These are the concrete days, hours and affected wikis:
* s8: 10th Sept from 05:00-05:30 UTC. The affected wiki is: wikidatawiki -
tracking task: T230762
* s2: 17th Sept from 05:00-05:30 UTC. The list of affected wikis is at:
https://raw.githubusercontent.com/wikimedia/operations-mediawiki-config/mas…
- tracking task: T230785
* s3: 24th Sept from 05:00-05:30 UTC. The list of affected wikis is at:
https://raw.githubusercontent.com/wikimedia/operations-mediawiki-config/mas…
- tracking task: T230783
* s4: 26th Sept from 05:00-05:30 UTC. The affected wiki is: commonswiki -
tracking task: T230784
If everything goes well, we do not expect to use the full 30 minutes of
read-only time, but rather just a few minutes.
We will send an email on the day of each failover, before and after it is
done.
Sorry for any inconvenience this might cause.
Hi all!
I really try not to spam the list too much with pointers to my work on the
Abstract Wikipedia, but this one is probably also interesting for Wikidata
contributors. It is the draft of a chapter submitted to Koerner and
Reagle's Wikipedia@20 book, and talks about knowledge diversity in the
light of centralisation through projects such as Wikidata.
The public commenting phase is open until July 19, and comments are very welcome:
"Collaborating on the sum of all knowledge across languages"
About the book: https://meta.wikimedia.org/wiki/Wikipedia@20
Link to chapter: https://wikipedia20.pubpub.org/pub/vyf7ksah
Cheers,
Denny
Dear all,
I thank you for your efforts. I invite you to see my Wikimania report at https://meta.m.wikimedia.org/wiki/Wikimedia_France/Micro-financement/Wikima…. I am still waiting for the video of my session, entitled "Wikidata and Health: Current situation and perspectives".
Yours Sincerely,
Houcemeddine Turki (he/him)
Medical Student, Faculty of Medicine of Sfax, University of Sfax, Tunisia
Undergraduate Researcher, UR12SP36
GLAM and Education Coordinator, Wikimedia TN User Group
Member, WikiResearch Tunisia
Member, Wiki Project Med
Member, WikiIndaba Steering Committee
Member, Wikimedia and Library User Group Steering Committee
Co-Founder, WikiLingua Maghreb
Founder, TunSci
____________________
+21629499418
Hello,
As the importance of Wikidata increases, so do the demands on the quality
of the data. I would like to put the following proposal up for discussion.
Two basic ideas:
1. Each Wikidata page (item) is scored after each edit. This score should
express different dimensions of data quality in a form that can be grasped
quickly.
2. A property is created via which the item refers to the score value.
Certain qualifiers can be used for a more detailed description (e.g. time
of calculation, algorithm used to calculate the score value, etc.).
The score value could be calculated either within Wikibase after each data
change or "externally" by a bot. Among other things, the calculation could
draw on the number of constraints, the completeness of references, the
degree of completeness in relation to the underlying ontology, etc. There are already
some interesting discussions on the question of data quality which can be
used here ( see https://www.wikidata.org/wiki/Wikidata:Item_quality;
https://www.wikidata.org/wiki/Wikidata:WikiProject_Data_Quality, etc).
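To illustrate the second idea, here is a minimal sketch of such a query,
assuming a purely hypothetical property P9999 ("quality score") with a
0-100 scale and using "point in time" (P585) as an example qualifier for
the time of calculation:

  # Sketch: return items about humans whose (hypothetical) quality score is
  # at or above a chosen threshold, together with the time the score was
  # last calculated.
  SELECT ?item ?score ?calculated WHERE {
    ?item wdt:P31 wd:Q5 .                              # example: items about humans
    ?item p:P9999 ?scoreStatement .                    # P9999 = hypothetical "quality score" property
    ?scoreStatement ps:P9999 ?score .
    OPTIONAL { ?scoreStatement pq:P585 ?calculated . } # P585 = point in time, used as a qualifier
    FILTER(?score >= 80)                               # placeholder threshold
  }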
Advantages
- Users get a quick overview of the quality of a page (item).
- SPARQL can be used to query only those items that meet a certain
quality level.
- The idea would probably be relatively easy to implement.
Disadvantages:
- In a way, the data model is abused by generating statements that no
longer describe the item itself, but rather describe the representation
of this item in Wikidata.
- Additional computing power must be provided for the regular
calculation of all changed items.
- The score only describes the quality of a page; if it is insufficient,
improvements still have to be made manually.
I would now be interested in the following:
1. Is this idea suitable to effectively help solve existing quality
problems?
2. Which quality dimensions should the score value represent?
3. Which quality dimensions can be calculated with reasonable effort?
4. How should they be calculated and represented?
5. What is the most suitable way to further discuss and implement this
idea?
Many thanks in advance.
Uwe Jung (UJung <https://www.wikidata.org/wiki/User:UJung>)
www.archivfuehrer-kolonialzeit.de/thesaurus
Hello,
thank you very much for your contributions and comments. I would agree
with most of your remarks without hesitation.
But I would like to clarify some things again:
- The importance of Wikidata grows with its acceptance by an
"unspecialized" audience. This also includes a lot of people who decide
about project funds or donations. As a rule, they have little time to
inform themselves sufficiently about the problems of measuring data
quality. In these hectic times, it is unfortunately common for such an
audience to demand solutions that are as simple and quick to analyse as
possible. (I will leave the last sentence here as a hypothesis.) I think
it is important to try to meet these expectations.
- Recoin is known to me. And yes, it addresses only the dimension of
*relative* completeness. At present, however, it is primarily aimed at
people who enter data manually, so it remains invisible or unusable for
many others. To stick with the idea: would it not be possible to calculate
a one- or multi-dimensional value from the Recoin information, which could
then be stored on the item as a literal via a property "relative
completeness" (a sketch of such a query follows at the end of this list)?
The advantage would be that this value can be queried via SPARQL together
with the item. Decision-makers from a field such as "jam science" could
thus gain an overview of how complete the data from this field are in
Wikidata and for which data completion projects funds may still have to be
provided. As described in my last article, a single property "relative
completeness" is not sufficient to describe data quality.
- I am sorry if I expressed it in a misleading way. I am using this
mailing list to get feedback on an idea. It may be "my" idea (or not), but
it is far from being "my" project. However, if the idea should ever be
realized by anyone in any way, I would be interested in making my own
modest contribution.
- It's true that the number of current Wikidata items is hard to imagine.
If a single instance needed only one minute per item to calculate the
different quality scores, it would take about 113 years (roughly 60
million minutes) to cover them all. The fact that many items are modified
over and over again and therefore have to be recalculated is not yet taken
into account in that estimate. The implemented approach would therefore
have to use strategies that make the first results visible with less
effort. One possibility is to concentrate initially on the part of the
data that is actually being used. This brings us to the question of
dynamic quality.
- People need support so that they can use the data and find and fix its
flaws. In the foreseeable future there will not be enough volunteers to
manually check all 60 million items for errors. This is another reason why
information about the quality of the data should be queryable together
with the data.
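As announced above, here is a sketch of how such a stored value might be
queried, again with a purely hypothetical property P9998 ("relative
completeness") and an arbitrary example class:

  # Sketch: average the (hypothetical) relative-completeness values over
  # all items of one class, to give a decision-maker a rough overview of
  # how complete that field currently is in Wikidata.
  SELECT (COUNT(?item) AS ?items) (AVG(?completeness) AS ?avgCompleteness) WHERE {
    ?item wdt:P31 wd:Q5 .              # example class: human (placeholder for the field of interest)
    ?item wdt:P9998 ?completeness .    # P9998 = hypothetical "relative completeness" property
  }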
Thanks
Uwe Jung
Dear all,
I thank you for your efforts. I am planning to begin writing a survey about wiki sites and to publish it in an appropriate journal... This survey will cover the software used (MediaWiki or other), the size, reference support, topic... I am looking for contributors to this project. Anyone who has published two papers about wikis in a research journal is invited to join the initiative.
Yours Sincerely,
Houcemeddine Turki (he/him)
Medical Student, Faculty of Medicine of Sfax, University of Sfax, Tunisia
Undergraduate Researcher, UR12SP36
GLAM and Education Coordinator, Wikimedia TN User Group
Member, WikiResearch Tunisia
Member, Wiki Project Med
Member, WikiIndaba Steering Committee
Member, Wikimedia and Library User Group Steering Committee
Co-Founder, WikiLingua Maghreb
Founder, TunSci
_____________________
+21629499418