Hi Tom,
I think the example you have there is actually linked up properly at the
moment?
https://en.wikipedia.org/wiki/SPATA5 is about both the gene and the protein,
as are most Wikipedia articles of this nature. It is linked to the gene
item, which is the modeling we encourage:
https://www.wikidata.org/wiki/Q18052679
The protein item is not linked to a Wikipedia article, again
following our preferred pattern.
For the moment... _our_ merge problem seems to be mostly resolved.
Correcting the sitelinks on the non-English Wikipedias in a big batch
seemed to help slow the flow dramatically. We have also introduced some
flexibility into the Lua code that produces infobox_gene on Wikipedia. It
can handle most of the possible situations (e.g. Wikipedia linked to
protein, Wikipedia linked to gene) automatically, so that helps prevent
visible disasters.
On the main issue you raise about merges, I'm a little on the fence.
Generally I'm opposed to putting constraints in place that slow people down
- e.g. we have a lot of manual merge work that needs to be done in the
medical arena and I do appreciate that the current process is pretty fast.
I guess I would advocate a focus on making the interface more vehemently
educational as a first step. E.g. lots of 'are you sure' etc. forms to
click through but ultimately still letting people get their work done
without enforcing an approval process.
-Ben
On Tue, Jun 14, 2016 at 10:53 AM, Tom Morris <tfmorris(a)gmail.com> wrote:
Bad merges have been mentioned a couple of times recently, and I think one
of the contexts was Ben's gene/protein work.
I think there are two general issues here which could be improved:
1. Merging is too easy. Because splitting/unmerging is much harder than
merging, particularly after additional edits, the process should be biased
to make merging more difficult.
2. The impedance mismatch between Wikidata and Wikipedias tempts
Wikipedians who are new to Wikidata to do the wrong thing.
The second is a community education issue which will hopefully improve
over time, but the first could be improved, in my opinion, by requiring
more than one person to approve a merge. The Freebase scheme was that
duplicate topics could be flagged for merge by anyone, but instead of
merging, they'd be placed in a queue for voting. Unanimous votes would
cause merges to be automatically processed. Conflicting votes would get
bumped to a second level queue for manual handling. This wasn't foolproof,
but caught a lot of the naive "these two things have the same name, so they
must be the same thing" merge proposals by newbies. There are lots of
variations that could be implemented, but the general idea is to get more
than one pair of eyes involved.
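
To make the voting flow concrete, here is a minimal Python sketch of the
scheme described above. It is only an illustration: the names
(MergeProposal, cast_vote, Status) are hypothetical, and the two-vote
threshold and the handling of unanimous "no" votes are assumptions, not
details of how Freebase actually implemented it.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"      # still waiting for enough votes
    MERGED = "merged"        # unanimous approval, merge processed automatically
    REJECTED = "rejected"    # unanimous rejection (assumed behaviour)
    ESCALATED = "escalated"  # conflicting votes, second-level manual queue


@dataclass
class MergeProposal:
    source: str                                 # item proposed to be merged away
    target: str                                 # item proposed to be kept
    votes: dict = field(default_factory=dict)   # voter name -> True/False
    status: Status = Status.PENDING


def cast_vote(proposal, voter, approve, min_votes=2):
    """Record one vote and update the proposal status.

    min_votes is an assumed threshold; the original description does not
    say how many votes were required.
    """
    if proposal.status is not Status.PENDING:
        return proposal.status  # already decided or escalated
    proposal.votes[voter] = approve
    if len(set(proposal.votes.values())) > 1:
        # Voters disagree: bump to the manual-handling queue.
        proposal.status = Status.ESCALATED
    elif len(proposal.votes) >= min_votes:
        # Every vote so far agrees, and there are enough of them.
        proposal.status = Status.MERGED if approve else Status.REJECTED
    return proposal.status


# A gene/protein pair flagged as duplicates by one person...
proposal = MergeProposal(source="Q21207860", target="Q18052679")
cast_vote(proposal, "newcomer", True)
# ...is sent to manual handling once a second voter disagrees.
cast_vote(proposal, "reviewer", False)
print(proposal.status)
```

The point of the sketch is that no single vote can trigger a merge: a
proposal stays pending until a second person has looked at it.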
A specific instance of the structural impedance mismatch is enwiki's
handling of genes & proteins. Sometimes they have a page for each, but
often they have a single page that deals with both or, worse, a page whose
text says it's about the protein, but where the page includes a gene infobox.
This unanswered RFC from Oct 2015 asks whether protein & gene should be
merged:
https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Oxytocin_and_OX…
I recently ran across a similar situation where this Wikidata gene SPATA5
https://www.wikidata.org/wiki/Q18052679 is linked to an enwiki page about
the associated protein https://en.wikipedia.org/wiki/SPATA5, while the
Wikidata protein is not linked to any wikis:
https://www.wikidata.org/wiki/Q21207860
These differences in handling make the reconciliation process very
difficult and the resulting errors encourage erroneous merges. The
gene/protein case probably needs multiple fixes, but making merges harder
would help.
Tom
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata