Hey everyone,
I have the feeling it would be good to make an official introduction. Nilesh has been working on the Wikidata entity suggester. There is now a team of students who are working on the entity suggester to get it finished and ready for production as part of their bachelor project. It would be good if you could work together and coordinate on the public wikidata-tech list. I'm sure that with all of you working together we can provide the Wikidata community with the great entity suggester they are waiting for. Virginia and co.: are you still having issues with the data import? Maybe Nilesh can help you with that as a first good step.
Cheers, Lydia
Hello Nilesh!
Good to hear from you. I was off for a couple of days, and asked Lydia to make introductions. Thanks Lydia!
A quick heads up:
The architecture we have discussed with the team at the HPI is a bit different from what we designed for the GSoC project. The idea is to have a MediaWiki extension that relies directly on the data in a MySQL table and generates suggestions from that. It does not care where the data comes from, so the database table(s) serve as an interface between the "front" (MediaWiki) part and the "back" (data analysis) part. This has two advantages: 1) front and back are decoupled and only have to agree on the structure and interpretation of the data in the database (this is the current TODO); 2) no new services need to be deployed in the public-facing subnet.
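To make that concrete, here is a rough sketch of what the interface could look like. All table and column names here are made up on my part - agreeing on the real structure is exactly the TODO above - and sqlite3 merely stands in for MySQL to keep the sketch self-contained and runnable:

    # Sketch only: table name, columns, and property IDs are all hypothetical;
    # sqlite3 stands in for MySQL so the example runs self-contained.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE suggestions (
        property_id  INTEGER NOT NULL,  -- property already on the item
        suggested_id INTEGER NOT NULL,  -- property to suggest next
        score        REAL NOT NULL      -- whatever the analysis side computes
    )""")

    # The "back" (data analysis) part only ever writes rows...
    db.executemany("INSERT INTO suggestions VALUES (?, ?, ?)",
                   [(31, 569, 0.82), (31, 570, 0.71), (31, 21, 0.65)])

    # ...and the "front" (MediaWiki extension) only ever reads them:
    top = db.execute("SELECT suggested_id, score FROM suggestions "
                     "WHERE property_id = ? ORDER BY score DESC LIMIT 2", (31,))
    print(top.fetchall())  # [(569, 0.82), (570, 0.71)]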
I think your expertise with data ingestion could help the folks at the HPI quite a bit. Also, the modular architecture allows for data analysis components to be swapped out easily, and we would like to try and compare different approaches for data analysis. One based on Hadoop and/or Myrrix could well be an option - though I'm not sure whether Myrrix would be very useful, since the actual generation of suggestions from the pre-processed data would already be covered.
This is just an idea; I think you can best figure things out among yourselves.
Cheers, Daniel
Hey Daniel!
> The architecture we have discussed with the team at the HPI is a bit different from what we designed for the GSoC project. The idea is to have a MediaWiki extension that relies directly on the data in a MySQL table and generates suggestions from that. It does not care where the data comes from, so the database table(s) serve as an interface between the "front" (MediaWiki) part and the "back" (data analysis) part. This has two advantages: 1) front and back are decoupled and only have to agree on the structure and interpretation of the data in the database (this is the current TODO); 2) no new services need to be deployed in the public-facing subnet.
This is great. Makes me wonder, "why didn't I think of this!" Less coupling!
> I think your expertise with data ingestion could help the folks at the HPI quite a bit. Also, the modular architecture allows for data analysis components to be swapped out easily, and we would like to try and compare different approaches for data analysis.
Brilliant.
> One based on Hadoop and/or Myrrix could well be an option - though I'm not sure whether Myrrix would be very useful, since the actual generation of suggestions from the pre-processed data would already be covered.
I see. So in this case we don't need real-time fetching of suggestions from a Java web service. Rather, the backend part (the data analysis component) will be something that parses the datasets and performs analyses (collaborative filtering, or anything else) to generate data that is pushed directly to a MySQL database, in a format agreed upon by both the frontend API and the data analysis module.
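Roughly along these lines, I imagine (a toy sketch - the co-occurrence counting is just a placeholder for whatever analysis we actually pick, and the table format is the one still to be agreed upon):

    # Toy sketch of the "back" part: parse items, count property co-occurrence,
    # turn the counts into scores, and hand the rows to the shared MySQL table.
    # The analysis itself is a placeholder; real items come from a dump parse.
    from collections import Counter
    from itertools import permutations

    # Stand-in for a parsed dump: each item reduced to its set of property IDs.
    items = [{31, 569, 570}, {31, 569, 21}, {31, 21}]

    pair_counts = Counter()
    prop_counts = Counter()
    for props in items:
        prop_counts.update(props)
        pair_counts.update(permutations(props, 2))  # ordered (seen, suggested)

    # score = how often b appears given a appears (a crude CF-style signal)
    rows = [(a, b, n / prop_counts[a]) for (a, b), n in pair_counts.items()]

    # In production this would be one bulk INSERT into MySQL; the frontend
    # never sees anything but the finished rows.
    for row in sorted(rows, key=lambda r: -r[2])[:3]:
        print(row)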
You're right, Myrrix won't need to run as a service. We can still use it as a command-line program to generate suggestions and store them in the MySQL DB (or use Mahout, or any other machine learning library we decide on). Hadoop is just for making things faster.
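And if we end up skipping Myrrix/Mahout altogether, even a hand-rolled item-item similarity would fit the same slot - a toy stand-in for what they'd compute in a batch run, on invented miniature data:

    # Toy stand-in for a batch Myrrix/Mahout run: plain item-item cosine
    # similarity over "which items use which property", computed offline.
    # Only the precomputed top-N rows would ever reach the MySQL table.
    from math import sqrt

    # usage[property] = set of items using it (hypothetical miniature data)
    usage = {31: {1, 2, 3}, 569: {1, 2}, 570: {1}, 21: {2, 3}}

    def cosine(p, q):
        return len(usage[p] & usage[q]) / sqrt(len(usage[p]) * len(usage[q]))

    for p in usage:
        ranked = sorted((q for q in usage if q != p),
                        key=lambda q: cosine(p, q), reverse=True)
        print(p, "->", ranked[:2])  # these top-N pairs would be INSERTed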
Let's have a proper discussion about this on IRC and get a bit of planning done - get everyone on the same page, including which data analysis methods we'd like to explore. This week I have some time on my hands from 29th Nov to 1st Dec, and I'm free from 13th Dec (holidays!!).
Cheers, Nilesh
That sounds very interesting. Would a use case be, on the creation of a new article in Wikipedia, to suggest an existing Wikidata entry that it might be connected to?
On Mon, Nov 25, 2013 at 11:41 PM, rupert THURNER rupert.thurner@gmail.com wrote:
> That sounds very interesting. Would a use case be, on the creation of a new article in Wikipedia, to suggest an existing Wikidata entry that it might be connected to?
No, this is unrelated.
Cheers, Lydia
Hello everyone,
It would be a pleasure to work with the team. This is the first time I'm hearing of this development, probably because I've been a bit out of the loop since last month: I've been busy with academic assignments, projects, and exams in college, among other things, and haven't been able to work on the entity suggester during this period.
After 13th December I'll be at my leisure and intend to get back to working on the entity suggester. I think it's a good idea to set a date and have an IRC meeting with the student team, Lydia, Daniel, et al. to figure out where the project stands, what needs to be done (enumerating the development goals - whether anything should be changed, added, or removed - and the deployment goals), and what the timeline will be.
I'm looking forward to this.
Cheers, Nilesh