So far no argument is given why the primary sources tool WOULD work. People will be interested in curating Wikidata. People will not be interested in checking the primary sources tool one item / statement at a time. It is a numbers game; there is too much to do in this way.

I know that comparing Wikidata against other sources is a different tool. It does, however, provide a sane way of working on data. When used in an iterative way it provides a clean process to integrate this data effectively. It keeps our community involved and concentrated on things where human effort makes a difference.

I have asked time and again for arguments why the primary sources tool would work. Arguably it does not function at all and the statistics prove this. I have argued for a different approach and, as there are no arguments, there is silence. I do not want to rubbish the work of others, but given that it is the only route, the official route, to import data from other sources, at some stage there is no alternative.

At some stage a tool like the primary sources tool becomes a liability. In my mind it certainly is when we have the announced tool for comparing data against sources. When the only argument for the primary sources tool is the effort people / companies put in there, it is a pitiful argument. Pity for the people involved but that is it.

On 28 September 2015 at 21:36, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:

Why do you spend so much energy on criticising the work of other volunteers and companies that want to help Wikidata? Switching off Primary Sources would not achieve any progress towards what you want. I have made some proposals in my email on what else could be done to speed things up. You could work on realising some of these ideas, you could propose other activities to the community, or you could just help elsewhere on Wikidata. Focussing on a tool you don't like and don't want to use will not make you (or the rest of us) happy.


On 28.09.2015 20:01, Gerard Meijssen wrote:

Sorry, I disagree with your analysis. The fundamental issue is not
quality and it is not the size of our community. The issue is that we
have our priorities wrong. As far as I am concerned the "primary sources
tool" is a wrong approach for a dataset like Freebase or DBpedia.

What we should concentrate on is finding likely issues that exist in
Wikidata. Make people aware of them and have a proper workflow that will
point people to the things they care about. When I care about "polders"
show me content where another source disagrees with what we have. As I
care about "polders" I will spend time on it BECAUSE I care and am
invited to resolve issues. I will be challenged because every item I
touch has an issue. I do not mind doing this when the data in Wikidata
differs from DBpedia, Freebase or whatever. My time is well spent. THAT
is why I will be challenged, that is why I will be willing to work on this.

I will not do this for new data in the primary sources tool. At most I
will give it a glance and accept it. I would only do this where data in
the primary sources tool differs. That however is exactly the same
scenario that I just described.

I am not willing to look at data in Wikidata Freebase or DBpedia in the
primary sources tool one item/statement at a time; we know that they are
of a similar quality as Wikidata. The percentages make it a waste of
time. With iterative comparisons of other sources we will find the
booboos easy enough. We will spend the time of our communities
effectively and we will increase quality and quality and community.

The approach of the primary sources tool is wrong. It should only be
about linking data and defining how this is done.

The problem is indeed with the community. Its time is wasted and it is
much more effective for me to add new data than work on data that is
already in the primary sources tool.

On 28 September 2015 at 16:52, Markus Krötzsch
<markus@semantic-mediawiki.org> wrote:


    Hi Gerard, hi all,

    The key misunderstanding here is that the main issue with the
    Freebase import would be data quality. It is actually community
    support. The goal of the current slow import process is for the
    Wikidata community to "adopt" the Freebase data. It's not about
    "storing" the data somewhere, but about finding a way to maintain it
    in the future.

    The import statistics show that Wikidata does not currently have
    enough community power for a quick import. This is regrettable, but
    not something that we can fix by dumping in more data that will then
    be orphaned.

    Freebase people: this is not a small amount of data for our young
    community. We really need your help to digest this huge amount of
    data! I am absolutely convinced from the emails I saw here that none
    of the former Freebase editors on this list would support low
    quality standards. They have fought hard to fix errors and avoid
    issues coming into their data for a long time.

    Nobody believes that either Freebase or Wikidata can ever be free of
    errors, and this is really not the point of this discussion at all
    [1]. The experienced community managers among us know that it is not
    about the amount of data you have. Data is cheap and easy to get,
    even free data with very high quality. But the value proposition of
    Wikidata is not that it can provide storage space for a lot of data --
    it is that we have a functioning community that can maintain it. For
    the Freebase data donation, we do not seem to have this community
    yet. We need to find a way to engage people to do this. Ideas are

    What I can see from the statistics, however, is that some users (and
    I cannot say if they are "Freebase users" or "Wikidata users" ;-)
    are putting a lot of effort into integrating the data already. This
    is great, and we should thank these people because they are the ones
    who are now working on what we are just talking about here. In
    addition, we should think about ways of engaging more community in
    this. Some ideas:

    (1) Find a way to clean and import some statements using bots. Maybe
    there are cases where Freebase already had a working import
    infrastructure that could be migrated to Wikidata? This would also
    solve the community support problem in one way. We just need to
    import the maintenance infrastructure together with the data.

    (2) Find a way to expose specific suggestions to more people. The
    Wikidata Games have attracted so many contributions. Could some of
    the Freebase data be solved in this way, with a dedicated UI?

    (3) Organise Freebase edit-a-thons where people come together to
    work through a bunch of suggested statements.

    (4) Form wiki projects that discuss a particular topic domain in
    Freebase and how it could be imported faster using (1)-(3) or any
    other idea.

    (5) Connect to existing Wiki projects to make them aware of valuable
    data they might take from Freebase.

    Freebase is a much better resource than many other data resources we
    are already using with similar approaches as (1)-(5) above, and yet
    it seems many people are waiting for Google alone to come up with a



    [1] Gerard, if you think otherwise, please let us know which error
    rates you think are typical or acceptable for Freebase and Wikidata,
    respectively. Without giving actual numbers you just produce empty
    strawman arguments (for example: claiming that anyone would think
    that Wikidata is better quality than Freebase and then refuting this
    point, which nobody is trying to make). See

    On 26.09.2015 18:31, Gerard Meijssen wrote:

        When you analyse the statistics, it shows how bad the current
        state of affairs is. Slightly over one thousandth of the content
        of the primary sources tool has been included.

        Markus, Lydia and myself agree that the content of Freebase may be
        improved. Where we differ is that the same can be said for
        Wikidata. It is not much better and by including the data from
        Freebase we have a much improved coverage of facts. The same can
        be said for the content of DBpedia and probably other sources as
        well.

        I seriously hate this procrastination and the denial of the
        efforts of others. It is one type of discrimination that is
        utterly deplorable.

        We should concentrate on comparing Wikidata with other sources
        that are
        maintained. We should do this repeatedly and concentrate on
        that seek the differences and provide workflows that help our
        to improve what we have. What we have is the sum of all available
        knowledge and by splitting it up, we are weakened as a result.

        On 26 September 2015 at 03:32, Thad Guidry <thadguidry@gmail.com> wrote:

             Also, Freebase users themselves who did daily, weekly work....
             some were passing users, some tried harder, but made lots of
             entries (battling against our Experts at times).  We could
             provide a list of those sorta community blacklisted users
             whose data submissions should probably not be trusted.

             +1 for looking at better maintained specific properties.
             +1 for being cautious for some Freebase usernames and their
             +1 for trusting wholesale all of the Freebase Experts
             We policed each other quite well.

             +ThadGuidry <https://www.google.com/+ThadGuidry>

             On Fri, Sep 25, 2015 at 11:45 AM, Jason Douglas
             <jasondouglas@google.com> wrote:

                 > It would indeed be interesting to see which percentage of
                 > proposals are being approved (and stay in Wikidata after a
                 > while), and whether there is a pattern (100% approval on
                 > some type of fact that could then be merged more quickly;
                 > or very low approval on something else that would maybe
                 > better revisited for mapping errors or other systematic
                 > problems).

                 +1, I think that's your best bet. Specific properties were
                 much better maintained than others -- identify those that
                 meet the bar for wholesale import and leave the rest to the
                 primary sources tool.

                 On Thu, Sep 24, 2015 at 4:03 PM Markus Krötzsch
                 <markus@semantic-mediawiki.org> wrote:

                     On 24.09.2015 23:48, James Heald wrote:
                      > Has anybody actually done an assessment on Freebase and
                      > its reliability?
                      > Is it *really* too unreliable to import wholesale?

                       From experience with the Primary Sources tool
                     the quality is
                     mixed. Some things it proposes are really very
        valuable, but
                     things are also just wrong. I added a few very
        useful facts
                     and fitting
                     references based on the suggestions, but I also
                     others. Not
                     sure what the success rate is for the cases I
        looked at, but
                     my feeling
                     is that some kind of "supervised import" approach
        is really
                     needed when
                     considering the total amount of facts.

                     An issue is that it is often fairly hard to tell if a
                     suggestion is true
                     or not (mainly in cases where no references are
        suggested to
                     check). In
                     other cases, I am just not sure if a fact is
        correct for the
                     used. For example, I recently ended up accepting
                     Husband" for Lovell Telescope (Q555130), but to be
        honest I
                     am not sure
                     that this is correct: he was the leading engineer
                     to design
                     the telescope, which seems different from an
        architect; no
                     official web
                     site uses the word "architect" it seems; I could
        not find a
                     property though, and it seemed "good enough" to
        accept it
                     (as opposed to
                     the post code of the location of this structure, which
                     apparently was
                     just wrong).

                      > Are there any stats/progress graphs as to how the actual
                      > import is in fact going?

                     It would indeed be interesting to see which percentage of
                     proposals are being approved (and stay in Wikidata after a
                     while), and whether there is a pattern (100% approval on
                     some type of fact that could then be merged more quickly;
                     or very low approval on something else that would maybe
                     better revisited for mapping errors or other systematic
                     problems).


                      >    -- James.
                      > On 24/09/2015 19:35, Lydia Pintscher wrote:
                      >> On Thu, Sep 24, 2015 at 8:31 PM, Tom Morris
                      >> <tfmorris@gmail.com> wrote:
                      >>>> This is to add MusicBrainz to the primary source tool,
                      >>>> not anything else?
                      >>> It's apparently worse than that (which I hadn't realized
                      >>> until I re-read the transcript).  It sounds like it's just
                      >>> going to generate little warning icons for "bad" facts and
                      >>> not lead to the recording of any new facts at all.
                      >>> 17:22:33 <Lydia_WMDE> we'll also work on
        getting the
                      >>> deployed that
                      >>> will help with checking against 3rd party
                      >>> 17:23:33 <Lydia_WMDE> the result of constraint
                     and checks
                      >>> against 3rd
                      >>> party databases will then be used to display
                     indicators next to a
                      >>> statement in case it is problematic
                      >>> 17:23:47 <Lydia_WMDE> i hope this way more people
                     become aware of
                      >>> issues and
                      >>> can help fix them
                      >>> 17:24:35 <sjoerddebruin> Do you have any names of
                     databases that are
                      >>> supported? :)
                      >>> 17:24:59 <Lydia_WMDE> sjoerddebruin: in the first
                     version the german
                      >>> national library. it can be extended later
                      >>> I know Freebase is deemed to be nasty and
                     but is MusicBrainz
                      >>> considered trustworthy enough to import
        directly or
                     will its facts
                      >>> need to
                      >>> be dripped through the primary source soda
        straw one at
                     a time too?
                      >> The primary sources tool and the extension that helps us
                      >> check against other databases are two independent things.
                      >> Imports from MusicBrainz have been happening for a very
                      >> long time already.
                      >> Cheers
                      >> Lydia
                      > _______________________________________________
                      > Wikidata mailing list
                      > Wikidata@lists.wikimedia.org
