I think more fundamentally there is the issue that Wikidata doesn't serve end users well because the end users are not paying for it. (Contrast an NGO that does things for people in Africa without asking them what they want with a commercial operation that will fly or die based on its ability to serve the identified needs of Africans.)

I am by no means a market fundamentalist, but when you look at Amazon.com you see a virtuous circle: small incremental improvements that make the store better put money on the bottom line, career advancement is linked to customer success, and so on. Over time the incremental changes snowball. (Alternatively, we could get exponential convergence instead of expansion.)

I was looking around for API management solutions, and they all address things like "creating stubs for the end user", "increasing developer engagement", "converting XML to JSON and vice versa", and the always dubious idea that adding a proxy server of some kind on the public internet will help you meet an SLA. None of them support the minimum viable product function of charging people to use the API at a basic level. If you talk to the sales people, maybe they will help you with a "monetization engine" (who knows if it puts ads in the results), but you will pay at least as much per month for that feature as the Silk Road spent on software development (unfortunately earning it back in the form of marked bitcoins).
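
To make concrete how small a feature "charging people to use the API" is, here is a hypothetical sketch of per-key metering and invoicing in Python with Flask; the key list, price, and route names are made up for illustration and are not taken from any vendor's product:

    # Hypothetical sketch only: per-key call metering so usage can be
    # invoiced at the end of the month.  Keys, price, and routes are
    # made-up examples.
    from collections import Counter
    from flask import Flask, request, abort, jsonify

    app = Flask(__name__)
    PRICE_PER_CALL = 0.05                    # the famous five cents
    api_keys = {"acme-123", "example-456"}   # keys you have sold
    usage = Counter()                        # calls per key this billing period

    @app.before_request
    def meter():
        key = request.headers.get("X-Api-Key")
        if key not in api_keys:
            abort(401)                       # no key, no data
        usage[key] += 1                      # one more call on the invoice

    @app.route("/v1/entity/<entity_id>")
    def entity(entity_id):
        # ... look the entity up in your own data store here ...
        return jsonify({"id": entity_id})

    @app.route("/v1/invoice/<key>")
    def invoice(key):
        return jsonify({"calls": usage[key],
                        "amount_usd": round(usage[key] * PRICE_PER_CALL, 2)})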

And the API management vendors are dealing with big-name companies like Target and Clorox; all of these companies, avaricious and smart about money as they are, are not charging people for their APIs.

If you are not the customer,  you are the product.

"End user" is a fuzzy word though because that Dutch guy who is interested in Polders is not the ordinary end user,  although you practically need to bring people like that into things like Wikidata because you need their curation.  Another tough problem is that we all have our specialties,  so one person really needs a good database of wine regions,  another one ski areas,  another one cares about books and another couldn't care less about books but is into video games.  (The person who wants to contribute or pay for improvements for area Z does not care about area Y)

Freebase was not particularly successful at getting unpaid help to improve its database because of these fundamental economics. You might make the case that friction in the form of "this data format is different from everything else" or "the UI sucks" or "the rest of the world hasn't caught up with us on tooling" was the main problem, but people would have overcome those problems if the motivation existed.

Anyhow, there is this funny little thing where the gap between "5 cents" and free is bigger than the gap between "5 cents" and $1,000, so you have the Bloombergs and Elseviers of the world charging $1,000 for what somebody could provide for much less. This problem exists for the human-readable web, and so far advertising has been the answer, but it has not been solved for open data.



On Mon, Sep 28, 2015 at 2:01 PM, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:
Hoi,

Sorry, but I disagree with your analysis. The fundamental issue is not quality, and it is not the size of our community. The issue is that we have our priorities wrong. As far as I am concerned, the "primary sources tool" is the wrong approach for a dataset like Freebase or DBpedia.

What we should concentrate on is finding likely issues that exist in Wikidata, making people aware of them, and having a proper workflow that will point people to the things they care about. When I care about "polders", show me content where another source disagrees with what we have. Because I care about "polders", I will spend time on it BECAUSE I care and am invited to resolve issues. I will be challenged because every item I touch has an issue. I do not mind doing this when the data in Wikidata differs from DBpedia, Freebase or whatever. My time is well spent. THAT is why I will be challenged, and that is why I will be willing to work on this.
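
For concreteness, a rough sketch of that kind of comparison workflow (this is not an existing tool; the class, properties, endpoints and the 2% threshold are assumptions chosen purely for illustration, using population figures for Dutch municipalities as the example topic):

    # Sketch only: list items where Wikidata and DBpedia disagree, so a
    # volunteer who cares about the topic only ever sees the conflicts.
    # The identifiers (Q2039348, P1082, populationTotal) and the 2%
    # threshold are assumptions for illustration.
    from SPARQLWrapper import SPARQLWrapper, JSON

    def rows(endpoint, query):
        client = SPARQLWrapper(endpoint)
        client.setQuery(query)
        client.setReturnFormat(JSON)
        return client.query().convert()["results"]["bindings"]

    # Wikidata populations, keyed by English Wikipedia title.
    wd = {r["title"]["value"]: float(r["pop"]["value"])
          for r in rows("https://query.wikidata.org/sparql", """
            SELECT ?title ?pop WHERE {
              ?item wdt:P31 wd:Q2039348 ; wdt:P1082 ?pop .
              ?article schema:about ?item ;
                       schema:isPartOf <https://en.wikipedia.org/> ;
                       schema:name ?title .
            }""")}

    # DBpedia populations for the same topic, keyed by resource name.
    db = {r["res"]["value"].rsplit("/", 1)[-1].replace("_", " "):
              float(r["pop"]["value"])
          for r in rows("https://dbpedia.org/sparql", """
            SELECT ?res ?pop WHERE {
              ?res <http://purl.org/dc/terms/subject>
                   <http://dbpedia.org/resource/Category:Municipalities_of_the_Netherlands> ;
                   <http://dbpedia.org/ontology/populationTotal> ?pop .
            }""")}

    # Only the disagreements are worth a volunteer's time.
    for name in sorted(wd.keys() & db.keys()):
        if abs(wd[name] - db[name]) > 0.02 * max(wd[name], db[name]):
            print(name, "Wikidata:", wd[name], "DBpedia:", db[name])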

I will not do this for new data in the primary sources tool; at most I will give it a glance and accept it. I would only do this where the data in the primary sources tool differs from what we have; that, however, is exactly the scenario I just described.

I am not willing to look at data from Freebase or DBpedia in the primary sources tool one item or statement at a time; we know that they are of a similar quality to Wikidata. The percentages make it a waste of time. With iterative comparisons against other sources we will find the booboos easily enough. We will spend the time of our communities effectively, and we will increase both quality and community.

The approach of the primary sources tool is wrong. It should only be about linking data and defining how this is done.

The problem is indeed with the community. Its time is wasted, and it is much more effective for me to add new data than to work on data that is already in the primary sources tool.
Thanks,
       GerardM

On 28 September 2015 at 16:52, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
Hi Gerard, hi all,

The key misunderstanding here is that the main issue with the Freebase import would be data quality. It is actually community support. The goal of the current slow import process is for the Wikidata community to "adopt" the Freebase data. It's not about "storing" the data somewhere, but about finding a way to maintain it in the future.

The import statistics show that Wikidata does not currently have enough community power for a quick import. This is regrettable, but not something that we can fix by dumping in more data that will then be orphaned.

Freebase people: this is not a small amount of data for our young community. We really need your help to digest this huge amount of data! I am absolutely convinced from the emails I have seen here that none of the former Freebase editors on this list would support low quality standards. They have fought hard for a long time to fix errors and keep issues out of their data.

Nobody believes that either Freebase or Wikidata can ever be free of errors, and this is really not the point of this discussion at all [1]. The experienced community managers among us know that it is not about the amount of data you have. Data is cheap and easy to get, even free data of very high quality. But the value proposition of Wikidata is not that it can provide storage space for a lot of data -- it is that we have a functioning community that can maintain it. For the Freebase data donation, we do not seem to have this community yet. We need to find a way to engage people to do this. Ideas are welcome.

What I can see from the statistics, however, is that some users (and I cannot say if they are "Freebase users" or "Wikidata users" ;-) are putting a lot of effort into integrating the data already. This is great, and we should thank these people, because they are the ones who are now working on what we are just talking about here. In addition, we should think about ways of engaging more of the community in this. Some ideas:

(1) Find a way to clean and import some statements using bots. Maybe there are cases where Freebase already had a working import infrastructure that could be migrated to Wikidata? This would also solve the community support problem in one way: we just need to import the maintenance infrastructure together with the data. (A minimal bot sketch follows the list below.)

(2) Find a way to expose specific suggestions to more people. The Wikidata Games have attracted so many contributions. Could some of the Freebase data be handled in this way, with a dedicated UI?

(3) Organise Freebase edit-a-thons where people come together to work through a bunch of suggested statements.

(4) Form wiki projects that discuss a particular topic domain in Freebase and how it could be imported faster using (1)-(3) or any other idea.

(5) Connect to existing Wiki projects to make them aware of valuable data they might take from Freebase.
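
To illustrate (1): a minimal sketch of what a supervised bot import could look like, assuming pywikibot is configured for Wikidata and that the Freebase dump has already been mapped to (item, property, value) candidates by a separate step -- the mapping is the hard part and is not shown, and the identifiers in the example call are placeholders:

    # Sketch only: add a single item-valued statement, skipping items
    # that already have a value for the property (a crude form of
    # supervision).  Assumes a working pywikibot setup with a bot account.
    import pywikibot

    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()

    def add_statement(qid, pid, target_qid, summary):
        item = pywikibot.ItemPage(repo, qid)
        item.get()
        if pid in item.claims:        # never overwrite existing data
            return False
        claim = pywikibot.Claim(repo, pid)
        claim.setTarget(pywikibot.ItemPage(repo, target_qid))
        item.addClaim(claim, summary=summary)
        return True

    # Placeholder identifiers, not real mapped Freebase data:
    # add_statement("Q555130", "P84", "Q123456",
    #               "import reviewed Freebase statement")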

Freebase is a much better resource than many other data resources we are already using with approaches similar to (1)-(5) above, and yet it seems many people are waiting for Google alone to come up with a solution.

Cheers,

Markus

[1] Gerard, if you think otherwise, please let us know which error rates you think are typical or acceptable for Freebase and Wikidata, respectively. Without giving actual numbers you just produce empty strawman arguments (for example: claiming that someone thinks Wikidata is of better quality than Freebase and then refuting a point that nobody is trying to make). See https://en.wikipedia.org/wiki/Straw_man


On 26.09.2015 18:31, Gerard Meijssen wrote:
Hoi,
When you analyse the statistics, it shows how bad the current state of
affairs is: slightly more than one thousandth of the content of the
primary sources tool has been included.

Markus, Lydia and I agree that the content of Freebase may be improved.
Where we differ is that the same can be said for Wikidata: it is not
much better, and by including the data from Freebase we gain much
improved coverage of facts. The same can be said for the content of
DBpedia, and probably other sources as well.

I seriously hate this procrastination and the denial of the efforts of
others. It is one type of discrimination that is utterly deplorable.

We should concentrate on comparing Wikidata with other sources that are
maintained. We should do this repeatedly and concentrate on workflows
that surface the differences and help our community improve what we
have. What we have is the sum of all available knowledge, and by
splitting it up we are weakened as a result.
Thanks,
       GerardM

On 26 September 2015 at 03:32, Thad Guidry <thadguidry@gmail.com
<mailto:thadguidry@gmail.com>> wrote:

    Also, Freebase users themselves who did daily or weekly work.... some
    were passing users, some tried harder but made lots of erroneous
    entries (battling against our Experts at times).  We could probably
    provide a list of those sorta community-blacklisted users whose data
    submissions should probably not be trusted.

    +1 for looking at better maintained specific properties.
    +1 for being cautious for some Freebase usernames and their entries.
    +1 for trusting wholesale all of the Freebase Experts' submissions.
    We policed each other quite well.



    Thad
    +ThadGuidry <https://www.google.com/+ThadGuidry>

    On Fri, Sep 25, 2015 at 11:45 AM, Jason Douglas
    <jasondouglas@google.com <mailto:jasondouglas@google.com>> wrote:

        > It would indeed be interesting to see which percentage of proposals are
        > being approved (and stay in Wikidata after a while), and whether there
        > is a pattern (100% approval on some type of fact that could then be
        > merged more quickly; or very low approval on something else that would
        > maybe better revisited for mapping errors or other systematic problems).

        +1, I think that's your best bet. Specific properties were much
        better maintained than others -- identify those that meet the
        bar for wholesale import and leave the rest to the primary
        sources tool.

        On Thu, Sep 24, 2015 at 4:03 PM Markus Krötzsch
        <markus@semantic-mediawiki.org
        <mailto:markus@semantic-mediawiki.org>> wrote:

            On 24.09.2015 23:48, James Heald wrote:
             > Has anybody actually done an assessment on Freebase and
            its reliability?
             >
             > Is it *really* too unreliable to import wholesale?

              From experience with the Primary Sources tool proposals,
            the quality is
            mixed. Some things it proposes are really very valuable, but
            other
            things are also just wrong. I added a few very useful facts
            and fitting
            references based on the suggestions, but I also rejected
            others. Not
            sure what the success rate is for the cases I looked at, but
            my feeling
            is that some kind of "supervised import" approach is really
            needed when
            considering the total amount of facts.

            An issue is that it is often fairly hard to tell if a
            suggestion is true
            or not (mainly in cases where no references are suggested to
            check). In
            other cases, I am just not sure if a fact is correct for the
            property
            used. For example, I recently ended up accepting "architect:
            Charles
            Husband" for Lovell Telescope (Q555130), but to be honest I
            am not sure
            that this is correct: he was the leading engineer contracted
            to design
            the telescope, which seems different from an architect; no
            official web
            site uses the word "architect" it seems; I could not find a
            better
            property though, and it seemed "good enough" to accept it
            (as opposed to
            the post code of the location of this structure, which
            apparently was
            just wrong).

             >
             > Are there any stats/progress graphs as to how the actual
            import is in
             > fact going?

            It would indeed be interesting to see which percentage of
            proposals are
            being approved (and stay in Wikidata after a while), and
            whether there
            is a pattern (100% approval on some type of fact that could
            then be
            merged more quickly; or very low approval on something else
            that would
            maybe better revisited for mapping errors or other
            systematic problems).

            Markus


             >
             >    -- James.
             >
             >
             > On 24/09/2015 19:35, Lydia Pintscher wrote:
             >> On Thu, Sep 24, 2015 at 8:31 PM, Tom Morris
            <tfmorris@gmail.com <mailto:tfmorris@gmail.com>> wrote:
             >>>> This is to add MusicBrainz to the primary source tool,
            not anything
             >>>> else?
             >>>
             >>>
             >>> It's apparently worse than that (which I hadn't
            realized until I
             >>> re-read the
             >>> transcript).  It sounds like it's just going to
            generate little warning
             >>> icons for "bad" facts and not lead to the recording of
            any new facts
             >>> at all.
             >>>
             >>> 17:22:33 <Lydia_WMDE> we'll also work on getting the
            extension
             >>> deployed that
             >>> will help with checking against 3rd party databases
             >>> 17:23:33 <Lydia_WMDE> the result of constraint checks
            and checks
             >>> against 3rd
             >>> party databases will then be used to display little
            indicators next to a
             >>> statement in case it is problematic
             >>> 17:23:47 <Lydia_WMDE> i hope this way more people
            become aware of
             >>> issues and
             >>> can help fix them
             >>> 17:24:35 <sjoerddebruin> Do you have any names of
            databases that are
             >>> supported? :)
             >>> 17:24:59 <Lydia_WMDE> sjoerddebruin: in the first
            version the german
             >>> national library. it can be extended later
             >>>
             >>>
             >>> I know Freebase is deemed to be nasty and unreliable,
            but is MusicBrainz
             >>> considered trustworthy enough to import directly or
            will its facts
             >>> need to
             >>> be dripped through the primary source soda straw one at
            a time too?
             >>
             >> The primary sources tool and the extension that helps us
            check against
             >> other databases are two independent things.
             >> Imports from Musicbrainz have been happening for a very
             >> long time already.
             >>
             >>
             >> Cheers
             >> Lydia
             >>
             >
             >

--
Paul Houle

Applying Schemas for Natural Language Processing, Distributed Systems, Classification and Text Mining and Data Lakes

(607) 539 6254    paul.houle on Skype   ontology2@gmail.com

:BaseKB -- Query Freebase Data With SPARQL

Legal Entity Identifier Lookup

Join our Data Lakes group on LinkedIn