Gerard,

    I should probably keep my mouth shut about this but I am so offended by what you say that I am not.

I am not an academic.  The people behind Wikidata are.

I am a professional programmer who has spent a lot of time being the guy who finishes what other people started;  I typically come on when a project has been two years late for two years and I do whatever it takes to get the product in front of the customer.

I know that building a system in PHP to do what Wikidata does is the road to hell,  not because I hate PHP but because I have done it myself and learned it through experience.

I first heard about Wikidata at SemTech in San Francisco and I was told very directly that they were not interested in working with anybody who was experienced with putting data from generic databases in front of users,  because they had worked so hard to get academic positions and get a grant from the Allen Institute,  and it is more cost-effective and more compatible with academic advancement to hire a bunch of young people who don't know anything but will follow orders.

If you hire 3x the people you need and have good management you can make that work;  just the fact that the project has a heavyweight project manager is a very good sign.  I mean that is how the CMM 5 shops in India do it,  and perhaps they have done that because actually Wikidata has succeeded quite well from a software engineering perspective.

Now so far as RDF and SPARQL go if you'd seen my history you'd see I am an American in the Winston Churchill sense that I've tried everything except for the right thing and finally settled on it.  I really had my conversion when I discovered I could take data from Freebase and put it through something more like a reconstruction than a transformation,  convert it to RDF and I could write SPARQL queries that "just worked".

RDF* and SPARQL* do not come from an academic background but from a commercial organization that expects to make money by satisfying people's needs and it is being supported by a number of other commercial organizations.  See

http://wiki.bigdata.com/wiki/index.php/Reification_Done_Right

This is something that builds on everything successful about RDF and SPARQL and adds the "missing links" that it takes to implement data wikis.  If somebody was starting Wikidata today or if the kind of billionaire who buys sports teams the way I might buy a game console wanted to fund an effort to keep Freebase going,  RDF*/SPARQL* is the way to do it.

What happens when you build a half-baked system from scratch and don't know what algebra you are using is that you run into problems that get progressively worse and you wind up like the woman who ate the cat because she ate the rat and so forth.  If I had a dime for every time I had to fix up some application where people could not figure out how to make primary keys that are unique,  or for every boss who didn't want me to take my time to understand a race condition and wished I would be like the guy who made the race conditions,  just trying random things until it sorta works,  I would be a billionaire and I would take Freebase over and fix all the things that are wrong with it.  (Which are really not that bad,  but the fixes never happened because Google didn't have any incentive to,  say,  improve the book database.)

Or would I?

A structural problem with open data is that people are NOT paying for it.  If you were paying for it,  the publishers of the data would have an incentive to serve the PEOPLE who are using it.  Wikidata is playing to the whims of a few rich people and it could disappear at any time when those people get tired of it or decide they have what they want and don't want to make it any easier for competitors to follow them.

Most conventional forms of academic funding that come from governments have the same problems.  I mean,  you get your grant,  you publish a paper,  and it doesn't particularly matter whether what you did worked or not.  There is also the perpetual "project" orientation which is not suitable for things like arXiv.org or DBpedia which are really programs or operations.  You can have a very hard time finding $400k a year for something that is high impact (e.g. 50,000 scientists use it every day),  while next door there is somebody who got $5 million to make something that goes nowhere (e.g. the postdoc wants to use Hadoop to process the server logs but the number of hits is small enough you could do it by hand.)

In terms of producing a usable product Wikidata has made some very good progress in terms of having data that is clean (if not copious),  but in terms of having a usable query facility or dump files that are easy to work with,  it is still behind DBpedia.  I mean,  you can do a lot with the DBpedia files with grep and awk and tools like that and it is not that hard to load it into a triple store and you have SPARQL 1.1 which is eminently practical because you can use your whole bag of tricks that you use with relational databases.
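
To make the grep-and-awk point concrete, here is a minimal sketch of the same kind of line-oriented filtering in Python. It assumes the dump is in N-Triples form, one statement per line, with whitespace between subject, predicate and object. The three sample lines are illustrative and not copied from a real DBpedia dump, and `filter_by_predicate` is a hypothetical helper name. It is not a full N-Triples parser.

```python
import io

# Illustrative N-Triples lines (assumption: not copied from an actual dump).
SAMPLE = """\
<http://dbpedia.org/resource/Berlin> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/Germany> .
<http://dbpedia.org/resource/Berlin> <http://www.w3.org/2000/01/rdf-schema#label> "Berlin"@en .
<http://dbpedia.org/resource/Paris> <http://dbpedia.org/ontology/country> <http://dbpedia.org/resource/France> .
"""


def filter_by_predicate(lines, predicate):
    """Yield (subject, object) for each line whose second field equals `predicate`."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subject and predicate contain no whitespace, so two splits are enough;
        # the remainder is the object plus the trailing " ." terminator.
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        subject, pred, rest = parts
        if pred != predicate:
            continue
        obj = rest.rstrip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()
        yield subject, obj


COUNTRY = "<http://dbpedia.org/ontology/country>"
pairs = list(filter_by_predicate(io.StringIO(SAMPLE), COUNTRY))
for subject, obj in pairs:
    print(subject, obj)
```

Because each statement sits on its own line,  the same loop works on a real dump by passing an open file object instead of the `io.StringIO` wrapper.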

In the big picture though,  quality is in the eye of the end user,  so it only happens if you have a closed feedback loop where the end user has a major influence on the behavior of the producer.  The possibility of making more money if you do a better job and (even more so) of going out of business if you fail to do so is certainly a powerful way to close that loop.

The trouble is that most people interested in open data seem to think their time is worth nothing and other people's time is worth nothing and aren't interested in paying even a small amount for services so the producers throw stuff that almost works over the wall.  I don't think it would be all that difficult for me to do for Wikidata what I did for Freebase but I am not doing it because you aren't going to pay for it.

[... GOES BACK TO WORK ON A "SKUNK WORKS" PROJECT THAT JUST MIGHT PAY OFF]


On Fri, Feb 20, 2015 at 8:09 AM, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:
Hoi,
I have waited for some time to reply. First of all, Wikidata is not your average data repository. It would not be as relevant as it is if it were not for the fact that it links Wikipedia articles of any language to statements on items.

This is the essence of Wikidata. After that we can all complain about the fallacies of Wikidata. I have my pet peeves and they are not your RDF, SPARQL and stuff. That is mostly stuff for academics; its use is largely academic and not useful on the level where I want progress. Exposing this information to PEOPLE is what I am after and by and large they do not live in the ivory towers where RDF and SPARQL live.

I am delighted to learn that a production grade replacement of WDQ is being worked on. I am delighted that a front-end (JavaScript?) developer is being sought. That is what it takes to bring the sum of all knowledge to all people. It is in enriching the data in Wikidata, not in yet another pet project, where we can make a difference, because that is what the people will see. When SPARQL is available with Wikidata data, do wonder how you would serve all the readers of Wikipedia. Does SPARQL sparkle enough when it is challenged in this way?
Thanks,
     GerardM

On 18 February 2015 at 21:25, Paul Houle <ontology2@gmail.com> wrote:
What bugs me about it is that Wikidata has gone down the same road as Freebase and Neo4J in the sense of developing something ad-hoc that is not well understood.

I understand the motivations that lead there,  because there are requirements to meet that standards don't necessarily satisfy,  plus Wikidata really is doing ambitious things in the sense of capturing provenance information.

Perhaps it has come a little too late to help with Wikidata but it seems to me that RDF* and SPARQL* have a lot to offer for "data wikis" in that you can view data as plain ordinary RDF and query with SPARQL but you can also attach provenance and other metadata in a sane way with sweet syntax for writing it in Turtle or querying it in other ways.
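
The idea of viewing the data as plain statements while still attaching provenance to individual statements can be sketched with a toy model. The snippet below is not RDF* and uses no RDF library; it only assumes that a statement may itself appear as the subject of another statement. The `ex:` names and the helpers `plain_view` and `metadata_for` are hypothetical placeholders.

```python
from typing import NamedTuple, Union


class Triple(NamedTuple):
    """A statement; the subject may be another statement (toy model only)."""
    subject: Union[str, "Triple"]
    predicate: str
    obj: Union[str, int]


# A plain statement, plus a statement *about* that statement (its provenance).
base = Triple("ex:bob", "ex:age", 23)
graph = {
    base,
    Triple(base, "ex:source", "ex:census"),
}


def plain_view(g):
    """Return only ordinary statements, ignoring statement-level metadata."""
    return {t for t in g if not isinstance(t.subject, Triple)}


def metadata_for(g, statement):
    """Return (predicate, object) pairs attached to one statement."""
    return {(t.predicate, t.obj) for t in g if t.subject == statement}


print(plain_view(graph))
print(metadata_for(graph, base))
```

A consumer that only wants the facts calls `plain_view`,  while one that cares about provenance asks for the metadata of a given statement; both read the same set of statements.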

Another way of thinking about it is that RDF* is formalizing the property graph model which has always been ad hoc in products like Neo4J.  I can say that knowing what algebra you are implementing helps a lot in getting the tools to work right.  So you not only have SPARQL queries as a possibility but also languages like Gremlin and Cypher and this is all pretty exciting.  It is also exciting that vendors are getting on board with this and we are going to see some stuff that is crazy scalable (way past 10^12 facts on commodity hardware) very soon.




On Tue, Feb 17, 2015 at 12:20 PM, Jeroen De Dauw <jeroendedauw@gmail.com> wrote:
Hey,

As Lydia mentioned, we obviously do not actively discourage outside contributions, and will gladly listen to suggestions on how we can do better. That being said, we are actively taking steps to make it easier for developers not already part of the community to start contributing.

For instance, we created a website about our software itself [0], which lists the MediaWiki extensions and the different libraries [1] we created. For most of our libraries, you can just clone the code and run composer install. And then you're all set. You can make changes, run the tests and submit them back. It is perhaps a different workflow than what you as a MediaWiki developer are used to, though quite a bit simpler. Furthermore, we've been quite progressive in adopting practices and tools from the wider PHP community.

I definitely do not disagree with you that some things could, and should, be improved. Like you I'd like to see the Wikibase git repository and naming of the extensions be aligned more, since it indeed is confusing. Increased API stability, especially the JavaScript one, is something else on my wish-list, amongst a lot of other things. There are always reasons why things are the way they are now and why they did not improve yet. So I suggest looking at specific pain points and seeing how things can be improved there. This will get us much further than looking at the general state, concluding people do not want third party contributions, and then protesting against that.

Cheers

--
Jeroen De Dauw - http://www.bn2vs.com
Software craftsmanship advocate
Evil software architect at Wikimedia Germany
~=[,,_,,]:3


_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l




--
Paul Houle
Expert on Freebase, DBpedia, Hadoop and RDF
(607) 539 6254    paul.houle on Skype   ontology2@gmail.com

--
Paul Houle
Expert on Freebase, DBpedia, Hadoop and RDF
(607) 539 6254    paul.houle on Skype   ontology2@gmail.com
http://legalentityidentifier.info/lei/lookup