Hi Denny,
Thanks! I'm not sure yet how accurate it will be. If it doesn't meet expectations, I might need to look at optimizing the model, trying different metrics, and so on; I haven't really thought about those yet.
> what do the two properties without a value mean here?
Let me explain those. You can take a look at the wiki pages mentioned on this page - https://github.com/nilesh-c/wikidata-entity-suggester
Currently, things like these are stored on the recommendation engine:
100,32,7
60,151,7
...
56,152----10256,12
...
In the first kind, you can see pairs of <item> and <property>, along with a relative affinity of 7. Now, suppose we have lots of "city" items and their respective properties. Say someone tries adding another item that is a city. As they begin adding properties to that item (properties that generally belong to a city, of course), the entity suggester will suggest "similar" properties, irrespective of whether they enter any values for them or not. We are not even talking about "values" here. If the user *does* add values, better recommendations are fetched. This is primarily about fetching recommendations for "properties".
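As a rough illustration of the two row kinds shown above, here is a minimal Python sketch that splits a row into its parts. It assumes the fields are item ID, then either a bare property ID or a property----value pair, then the affinity; the `Row` type and `parse_row` name are made up for this example and are not part of the Entity Suggester code.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    item: int
    prop: int
    value: Optional[int]  # None for a plain item-property row
    affinity: float


def parse_row(line: str) -> Row:
    """Parse 'item,property,affinity' or 'item,property----value,affinity'.

    The field order is an assumption based on the sample rows in this mail.
    """
    item, feature, affinity = line.strip().split(",")
    prop, sep, value = feature.partition("----")
    return Row(int(item), int(prop), int(value) if sep else None, float(affinity))


rows = [parse_row(s) for s in ("100,32,7", "60,151,7", "56,152----10256,12")]
for row in rows:
    print(row)
```

Keeping `value` optional lets both kinds of rows share one record type, so later code only needs to check whether `value` is `None`.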
So, if someone starts adding a new city called "Wonderland" and adds properties like "is in the administrative unit http://www.wikidata.org/wiki/Property:P131" or "head of local government http://www.wikidata.org/wiki/Property:P6", the suggester will tell the user that probably "country http://www.wikidata.org/wiki/Property:P17" and "flag image http://www.wikidata.org/wiki/Property:P41" are some properties that he/she should add. At least that's the idea.
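To make the "Wonderland" idea concrete, here is a toy Python sketch. It does not use Myrrix or its model; it swaps in plain co-occurrence counting as a stand-in, just to show the shape of "properties in, suggested properties out". The item names and the `suggest_properties` function are hypothetical; only the property IDs 131, 6, 17 and 41 come from the example above.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def suggest_properties(
    known_items: Dict[str, Set[int]],
    entered: Iterable[int],
    top_n: int = 3,
) -> List[int]:
    """Rank properties that co-occur with the ones already entered.

    Each known item votes for its other properties, weighted by how many
    of the entered properties it shares. This is a stand-in technique,
    not the matrix-factorization model the real engine builds.
    """
    entered_set = set(entered)
    scores: Counter = Counter()
    for props in known_items.values():
        overlap = len(props & entered_set)
        if overlap == 0:
            continue
        for p in props - entered_set:
            scores[p] += overlap
    # Highest score first; ties broken by property ID for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [p for p, _ in ranked[:top_n]]


# Hypothetical existing items and the property IDs they carry.
items = {
    "city_a": {131, 6, 17, 41},
    "city_b": {131, 17, 41},
    "city_c": {131, 6, 17},
    "other_x": {19, 21},  # unrelated item, illustrative IDs only
}

print(suggest_properties(items, [131, 6]))  # prints [17, 41]
```

With P131 and P6 entered, the toy data ranks P17 (country) ahead of P41 (flag image), which matches the kind of answer described above.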
Now, suggesting values: the current implementation of value suggestion is just a side addition and might not be very accurate. What I intend to add afterwards is something like this: after the user enters input like 41,32,45----462347,.... and so on, they ask for "value" suggestions for property 31, i.e. suggesting "values" for properties.
So, in brief, what currently happens: Suggest "property-value" mappings for a new "item". Suggest "properties" for a new "item". (A new item means an anonymous item: an item without an ID, yet to be added.)
What I need to add: Suggest "values" for a "property". (This is exactly what you were expecting.)
In essence, we combine these 3 types of recommendations and do some magic. I hope this helps you to understand it better. :)
> How quickly are updates processed by the backend, any idea?
On an Intel Core i5-2500K quad-core machine with 4 GB RAM, this dataset (17th April - wikidatawiki-20130417-pages-meta-current.xml.bz2, http://dumps.wikimedia.org/wikidatawiki/20130417/wikidatawiki-20130417-pages-meta-current.xml.bz2):
Data points (pairs) - 8360275
Items - 1965516
Properties and Property-Value pairs - 686318
-- took about 45 mins to build the CSV files, and 15-20 mins to build the Myrrix model, so about 1 hour in total. Parallelizing the CSV file building will probably bring that time down a bit, though I'm not certain.
Adding new data (items, properties etc.) at runtime is pretty much instantaneous: adding a batch of 1000 data points will probably take about 1 sec, and adding 10 data points about 100 ms (including the PHP client's time and all). These are just estimates from what I've experienced; I haven't done any proper benchmarks myself.
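For what it's worth, the two rough figures above can be turned into a back-of-the-envelope cost split. The Python sketch below assumes the time per request is a fixed overhead plus a constant per-data-point cost; that linear model is my assumption, not something that has been measured, and the inputs are the same unbenchmarked estimates.

```python
from typing import Tuple


def fit_linear_cost(n1: int, t1_ms: float, n2: int, t2_ms: float) -> Tuple[float, float]:
    """Solve t = fixed + per_point * n through two (n, t) estimates.

    Assumes cost grows linearly with batch size, which is only a guess.
    """
    per_point = (t2_ms - t1_ms) / (n2 - n1)
    fixed = t1_ms - per_point * n1
    return fixed, per_point


# 10 points in ~100 ms and 1000 points in ~1000 ms, as estimated above.
fixed_ms, per_point_ms = fit_linear_cost(10, 100.0, 1000, 1000.0)
print(round(fixed_ms, 1), round(per_point_ms, 2))  # prints 90.9 0.91
```

Under that assumption, most of the time for a small batch would be per-request overhead (roughly 91 ms) rather than the data points themselves (a bit under 1 ms each), so batching updates should pay off. Proper benchmarks would be needed to confirm this.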
Cheers, Nilesh
On Wed, May 22, 2013 at 5:23 PM, Denny Vrandečić <denny.vrandecic@wikimedia.de> wrote:
Awesome, that looks already pretty promising!
I am not completely sure I understand a few things:
107----4167410
106
107----215627
156
what do the two properties without a value mean here?
I would have expected:
107----4167410
107----215627
and now ask for suggested values for 31, or for suggested properties to add.
But these are already details. The results seem pretty promising.
How quickly are updates processed by the backend, any idea?
2013/5/21 Nilesh Chakraborty <nilesh@nileshc.com>
Hello,
I have some updates on the Entity Suggester prototype. Here are the two repos:
As it stands now, deployment-wise, I have a single Java war file that's deployed on Tomcat. And there's a PHP client that can be used from PHP code to push data into or fetch suggestions from that engine.
I have made a simple, crude demo that you can access here - http://home.nileshc.com/wesTest.php. You can find the code for it in the wes-php-client repo. It's hosted on my home desktop temporarily. I am having some non-technical problems with the VPS I'm managing and customer support is working on it. After it starts to work, I may try deploying this to the VPS. So, if you have to face an embarrassing 404 page, I'm really sorry, I'll be working on it. If it stays up, well and good. :) http://home.nileshc.com/wesTest.php
You can give it a bunch of property IDs, or a bunch of property-value pairs, or a mix of both; select the type of recommendation and hit "Get suggestions!" :) Feedback is much appreciated.
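As an illustration of the mixed input described above, here is a small Python sketch that separates bare property IDs from property-value pairs. It assumes entries are whitespace-separated and that a pair is written as property----value, as in the samples elsewhere in this thread; `parse_query` is a hypothetical helper, not a function from the PHP client or the demo page.

```python
from typing import List, Tuple


def parse_query(text: str) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Split mixed input into bare property IDs and (property, value) pairs.

    Assumes whitespace-separated entries and '----' as the pair separator.
    """
    properties: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for token in text.split():
        prop, sep, value = token.partition("----")
        if sep:
            pairs.append((int(prop), int(value)))
        else:
            properties.append(int(prop))
    return properties, pairs


props, pairs = parse_query("107----4167410 106 107----215627 156")
print(props)  # prints [106, 156]
print(pairs)  # prints [(107, 4167410), (107, 215627)]
```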
Cheers, Nilesh
On Tue, May 14, 2013 at 2:36 AM, Matthew Flaschen <mflaschen@wikimedia.org> wrote:
On 05/13/2013 04:28 PM, Nilesh Chakraborty wrote:
Hi Matt,
Yes, you're right, they are available as separately licensed downloads. Only the stand-alone "Serving Layer" is needed for the Entity Suggester. It's licensed under Apache v. 2.0. Since I'm using the software as-is, without any code modifications, I suppose it's compatible with what Wikidata would allow?
Apache 2.0-licensed software should be fine, even if you do need/want to modify it.
Matt Flaschen
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
-- A quest eternal, a life so small! So don't just play the guitar, build one.
You can also email me at contact@nileshc.com or visit my website http://www.nileshc.com/
-- Project director Wikidata Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin Tel. +49-30-219 158 26-0 | http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Registered in the register of associations of the Amtsgericht (district court) Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin (tax office for corporations), tax number 27/681/51985.