Hi everyone,
I'm working on a prototype for the Wikidata Entity Suggester (Bug #46555, https://bugzilla.wikimedia.org/show_bug.cgi?id=46555). As of now, it is a command-line client, written entirely in Java, that fetches recommendations from a Myrrix server layer.
Please take a look at the GitHub repository (https://github.com/nilesh-c/wikidata-entity-suggester/). I would really appreciate it if you could take the time to go through the README and give me some much-needed feedback. Any questions or suggestions are welcome. If you're curious, you can set up the whole thing on your own machine.
Check out a few examples too: https://github.com/nilesh-c/wikidata-entity-suggester/wiki/Examples
It can suggest properties and values for new/not-yet-created items (and also currently present items), if it's given a few properties/values as input data.
I intend to write a REST API and/or a simple PHP frontend for it before I set it up on a remote VPS, so that everyone can test it out. Some experimentation and quality optimization is also due.
Cheers, Nilesh (User Page - https://www.mediawiki.org/wiki/User:Nilesh.c)
That's awesome!
Two things:
* How set are you on a Java-based solution? We would prefer PHP in order to make it more likely to be deployed.
* Could you provide a link to a running demo?
Cheers, Denny
--
A quest eternal, a life so small! So don't just play the guitar, build one.
You can also email me at contact@nileshc.com or visit my website http://www.nileshc.com/
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Thank you! :)
Simply put, there are two main components here:
1. The recommendation engine (Myrrix). It only needs a .jar file run as a daemon, and that is done. It is easier than deploying Lucene.
2. The recommendation client. Myrrix has a rich Java API, which my current code uses to provide recommendations. The actual client-side code that uses the Java API is less than 150 LOC, not counting a couple of classes that are injected into the Myrrix daemon.

Now I have three options here:
i) Write a PHP wrapper over (2).
ii) Expose (2) as a REST-based API that can easily be used from PHP code.
iii) Completely replace (2) with PHP code.

(i) and (ii) are feasible options. But (iii) would mean rewriting in PHP quite a large amount of functionality that is already in the Java API. And I can't see any gains from going with (iii), since it wouldn't really help deployment any more than (i) or (ii).
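To make option (ii) concrete, here is a minimal sketch of what a thin REST layer in front of the recommendation client might look like. It is written in Python purely for illustration and is not the Entity Suggester's actual API: the `/suggest` path, the `properties` query parameter, and the stubbed `fetch_suggestions` function are all hypothetical stand-ins for the Java client code in (2).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs


def fetch_suggestions(properties):
    """Hypothetical stand-in for the client code (2) that queries the engine."""
    # A real implementation would ask the recommendation engine; this stub
    # just echoes the input so the HTTP plumbing can be exercised.
    return [{"property": p, "score": 1.0} for p in properties]


def handle_suggest(url):
    """Turn a request URL into a (status_code, JSON body) pair."""
    parsed = urlparse(url)
    if parsed.path != "/suggest":
        return 404, json.dumps({"error": "unknown path"})
    raw = parse_qs(parsed.query).get("properties", [""])[0]
    properties = [p for p in raw.split(",") if p]
    if not properties:
        return 400, json.dumps({"error": "no properties given"})
    return 200, json.dumps({"suggestions": fetch_suggestions(properties)})


class SuggestHandler(BaseHTTPRequestHandler):
    """Minimal GET-only handler that delegates to handle_suggest."""

    def do_GET(self):
        status, body = handle_suggest(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Run the sketch locally; the port is an arbitrary choice."""
    HTTPServer(("localhost", port), SuggestHandler).serve_forever()
```

With something like this running, PHP code would only need a plain HTTP GET (for example `/suggest?properties=107,106`) and a JSON decode, which is why (ii) does not require porting any of the Java logic.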
I am a bit busy with my university exams; I will try to deploy this on a VPS, update the repo with some PHP code and a link to the demo, and share it here in a couple of days.
Cheers, Nilesh
On Mon, May 13, 2013 at 4:02 PM, Denny Vrandečić <denny.vrandecic@wikimedia.de> wrote:
--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 B. Recognized as a charitable organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
On 13.05.2013 12:32, Denny Vrandečić wrote:
That's awesome!
Two things:
- how set are you on a Java-based solution? We would prefer PHP in order to
make it more likely to be deployed.
Just saw that I never replied to this.
I think running Java core on the Wikimedia cluster isn't a problem.
Deploying a servlet however may not be so easy, though probably possible as long as it's internal.
Can someone from ops weigh in on this?
-- daniel
On 05/13/2013 06:11 AM, Nilesh Chakraborty wrote:
Hi everyone,
I'm working on a prototype for the Wikidata Entity Suggester (Bug #46555, https://bugzilla.wikimedia.org/show_bug.cgi?id=46555). As of now, it is a command-line client, completely written in Java, that fetches recommendations from a Myrrix server layer.
Is all the Myrrix code you're using open source? It looks like only the "Serving Layer" is, but they also have a proprietary "Computation Layer".
Matt Flaschen
Hi Matt,
Yes, you're right, they are available as separately licensed downloads. Only the stand-alone "Serving Layer" is needed for the Entity Suggester. It's licensed under Apache v. 2.0. Since I'm using the software as-is, without any code modifications, I suppose it's compatible with what Wikidata would allow?
Given the amount of data in the data dump, we won't need a Hadoop cluster with multiple machines. The proprietary "Computation Layer" is only needed for heavy-weight distributed processing.
Cheers, Nilesh
On 05/13/2013 04:28 PM, Nilesh Chakraborty wrote:
Hi Matt,
Yes, you're right, they are available as separately licensed downloads. Only the stand-alone "Serving Layer" is needed for the Entity Suggester. It's licensed under Apache v. 2.0. Since I'm using the software as-is, without any code modifications, I suppose it's compatible with what Wikidata would allow?
Apache 2.0-licensed software should be fine, even if you do need/want to modify it.
Matt Flaschen
Hello,
I have some updates on the Entity Suggester prototype. Here are the two repos:
1. https://github.com/nilesh-c/wikidata-entity-suggester
2. https://github.com/nilesh-c/wes-php-client
As it stands now, deployment-wise, I have a single Java war file that's deployed on Tomcat. And there's a PHP client that can be used from PHP code to push data into or fetch suggestions from that engine.
I have made a simple, crude demo that you can access here: http://home.nileshc.com/wesTest.php. You can find the code for it in the wes-php-client repo. It's hosted on my home desktop temporarily. I am having some non-technical problems with the VPS I'm managing, and customer support is working on it. Once that is sorted out, I may try deploying this to the VPS. So, if you have to face an embarrassing 404 page, I'm really sorry, I'll be working on it. If it stays up, well and good. :)
You can give it a bunch of property IDs, or a bunch of property-value pairs, or a mix of both; select the type of recommendation and hit "Get suggestions!" :) Feedback is much appreciated.
Cheers, Nilesh
On 05/21/2013 05:29 PM, Nilesh Chakraborty wrote:
You can give it a bunch of property IDs, or a bunch of property-value pairs, or a mix of both; select the type of recommendation and hit "Get suggestions!" :) Feedback is much appreciated.
It would be good to show the original properties in a separate section on the result screen, so you can compare what you provided to what was suggested.
All should be linked to Wikidata for convenience.
Matt Flaschen
Thanks for the idea! I'll add those pretty soon, shouldn't take much effort.
Cheers, Nilesh
I'm taking the demo offline for a few hours. It will be back up again in a day's time.
Cheers, Nilesh
Awesome, that already looks pretty promising!
I am not completely sure I understand a few things:
107----4167410
106
107----215627
156
what do the two properties without a value mean here?
I would have expected:
107----4167410
107----215627
and now ask for suggested values for 31, or for suggested properties to add.
But these are already details. The results seem pretty promising.
How quickly are updates processed by the backend, any idea?
Hi Denny,
Thanks! I am not sure how accurate it will be. If it doesn't meet expectations, I might need to think about optimizing the model, trying different metrics, etc.; I haven't really thought about those yet.
what do the two properties without a value mean here?
Let me explain those. You can take a look at the wiki pages mentioned on this page - https://github.com/nilesh-c/wikidata-entity-suggester
Currently, things like these are stored on the recommendation engine:
100,32,7
60,151,7
...
56,152----10256,12
...
In the first kind, each row pairs an <item> with a <property>, followed by the relative affinity (7 here). Now, suppose we have lots of "city" items and their respective properties, and someone adds another item that is a city. As they begin adding properties to that item (properties that generally belong to a city, of course), the entity suggester will suggest "similar" properties, irrespective of whether they enter any values for them or not. We are not even talking about "values" here. If the user *does* add values, better recommendations are fetched. This is primarily about fetching recommendations for "properties".
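The stored rows can be read mechanically. The sketch below assumes, from the examples alone, that each row is `item,feature,affinity`, that a feature is either a bare property ID or `property----value`, and that all IDs are integers. The `DataPoint` type and `parse_row` function are hypothetical names used for illustration, not part of the Entity Suggester code.

```python
from typing import NamedTuple, Optional


class DataPoint(NamedTuple):
    item: int
    prop: int
    value: Optional[int]  # None when the row carries a bare property
    affinity: float


def parse_row(row: str) -> DataPoint:
    """Parse 'item,property,affinity' or 'item,property----value,affinity'."""
    item, feature, affinity = row.strip().split(",")
    # partition() returns an empty separator when '----' is absent,
    # which is how a bare property is told apart from a property-value pair.
    prop, sep, value = feature.partition("----")
    return DataPoint(int(item), int(prop), int(value) if sep else None, float(affinity))


rows = ["100,32,7", "60,151,7", "56,152----10256,12"]
for point in map(parse_row, rows):
    print(point)
```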
So, if someone starts adding a new city called "Wonderland" and adds properties like "is in the administrative unit" (http://www.wikidata.org/wiki/Property:P131) or "head of local government" (http://www.wikidata.org/wiki/Property:P6), the suggester will tell the user that "country" (http://www.wikidata.org/wiki/Property:P17) and "flag image" (http://www.wikidata.org/wiki/Property:P41) are probably some properties that he/she should add. At least that's the idea.
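The "Wonderland" idea can be illustrated with a toy co-occurrence counter. This is a deliberately simpler, swapped-in technique and not what Myrrix does internally; the three "city" items and their item IDs are made up, and only the property IDs (P131, P6, P17, P41) come from the paragraph above.

```python
from collections import Counter
from itertools import permutations


def build_cooccurrence(items):
    """Count, for each property, how often every other property appears with it."""
    cooc = {}
    for props in items.values():
        for a, b in permutations(set(props), 2):
            cooc.setdefault(a, Counter())[b] += 1
    return cooc


def suggest_properties(cooc, given, how_many=3):
    """Rank properties that co-occur with the given ones but are not yet present."""
    scores = Counter()
    for prop in given:
        for other, count in cooc.get(prop, {}).items():
            if other not in given:
                scores[other] += count
    return [prop for prop, _ in scores.most_common(how_many)]


# Made-up "city" items, keyed by a hypothetical item ID.
cities = {
    1: ["P131", "P6", "P17", "P41"],
    2: ["P131", "P17", "P41"],
    3: ["P131", "P6", "P17"],
}
cooc = build_cooccurrence(cities)
print(suggest_properties(cooc, {"P131", "P6"}))  # ['P17', 'P41']
```

A new item that so far has only P131 and P6 gets P17 ranked first because it appears alongside both in every sample city, with P41 second.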
Now, suggesting values: the current implementation of suggesting values is just a side addition, and it might not be really accurate. What I intend to add afterwards is something like this: after the user enters input like 41,32,45----462347,... they want "value" suggestions for property 31, i.e. suggesting "values" for properties.
So, in brief, what currently happens:
- Suggest "property-value" mappings to a new "item".
- Suggest "properties" to a new "item".
(A new item means an anonymous item: an item without an ID, yet to be added.)

What I need to add:
- Suggest "value" to a "property". (This is exactly what you were expecting.)
In essence, we combine these 3 types of recommendations and do some magic. I hope this helps you to understand it better. :)
How quickly are updates processed by the backend, any idea?
On an Intel Core i5-2500K quad-core machine with 4 GB RAM, this dataset (17th April, wikidatawiki-20130417-pages-meta-current.xml.bz2, http://dumps.wikimedia.org/wikidatawiki/20130417/wikidatawiki-20130417-pages-meta-current.xml.bz2):

Data points (pairs) - 8360275
Items - 1965516
Properties and Property-Value pairs - 686318

took about 45 mins to build the CSV files, and 15-20 mins to build the Myrrix model. So it's about 1 hour in total. Parallelizing the CSV file building will probably bring that time down a bit, though I'm not certain.
Adding new data (items, properties, etc.) at runtime is pretty much instantaneous: adding a batch of 1000 data points will probably take about 1 sec, and adding 10 data points about 100 ms (including the PHP client's time and all). These are just estimates from what I've experienced; I haven't done any proper benchmarks myself.
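For a sense of scale, the figures quoted above can be turned into rough rates. This is only arithmetic on the estimates in the message (no new measurements), and the variable names are illustrative.

```python
# Figures quoted in the message above; all are rough estimates, not benchmarks.
csv_build_min = 45
model_build_min = (15, 20)  # lower and upper estimate
data_points = 8_360_275

# Total time for a full rebuild: CSV generation plus model building.
total_min = tuple(csv_build_min + m for m in model_build_min)
print(f"full rebuild: {total_min[0]}-{total_min[1]} minutes")

# Average rate while building the CSV files.
points_per_sec = data_points / (csv_build_min * 60)
print(f"CSV build: about {points_per_sec:.0f} data points per second")

# Per-point cost of runtime updates for the two batch sizes mentioned.
for batch, seconds in ((1000, 1.0), (10, 0.1)):
    print(f"batch of {batch}: {seconds / batch * 1000:.0f} ms per data point")
```

If the estimates hold, small batches cost roughly ten times more per data point than large ones, which would be consistent with a fixed per-request overhead, though only a proper benchmark could confirm that.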
Cheers, Nilesh
wikitech-l@lists.wikimedia.org