Hi Lydia,
I am currently drafting my proposal; I will submit it within a few hours, once the initial version is complete.
I installed mediawiki-vagrant on my PC, and it went quite smoothly. I could do all the usual things through the browser, and I logged into the MySQL server to examine the database schema.
I also began to clone the wikidata-vagrant repo (https://github.com/SilkeMeyer/wikidata-vagrant). But it seems that the 'git submodule update --init' step will take a long time - if I'm not mistaken, it's a huge download (not counting the vagrant up command, which alone takes around 1.25 hours to download everything). I wanted to clarify something before downloading it all.
Since the entity suggester will be working with Wikidata, it will obviously need access to the whole live dataset from the database (not the XML dump) to make the recommendations. I tried searching for database access APIs or high-level REST APIs for Wikidata, but couldn't figure out how to do that. Could you point me to the proper documentation?
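To make the question concrete, this is roughly what I have in mind - an untested Python sketch against the public api.php endpoint (the entity id Q42 and the trimmed response shape below are just illustrative; no live request is actually made here):

```python
# Rough sketch: fetching one item's statements through the MediaWiki API
# (action=wbgetentities). Only the request URL is built and a trimmed,
# made-up response body is parsed, so this runs offline.
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"

def entity_url(entity_id):
    """Build a wbgetentities request URL for a single item."""
    params = {"action": "wbgetentities", "ids": entity_id, "format": "json"}
    return API + "?" + urlencode(params)

def list_properties(entity_json):
    """Return the property ids used in an entity's claims."""
    claims = entity_json.get("claims", {})
    return sorted(claims.keys())

print(entity_url("Q42"))

# A trimmed, illustrative response body (real ones carry full statements):
sample = {"claims": {"P31": [], "P19": []}}
print(list_properties(sample))  # ['P19', 'P31']
```

I realise fetching items one at a time like this won't scale to the whole dataset, which is exactly why I'm asking what the intended bulk-access route is.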
Also, what is the best way to add a few .jar files to wikidata and run them with custom commands (e.g. nohup java blah.jar --blah blah, running them as daemons)? I can of course set this up on my development box inside VirtualBox - what I want to know is how to "integrate" it into the system, so that any other user can download vagrant and wikidata and have the jars ready and running. What is the proper development workflow for this?
Thanks, Nilesh
On Sun, Apr 28, 2013 at 3:01 AM, Nilesh Chakraborty nilesh@nileshc.com wrote:
Awesome. Got it.
I see what you mean, great, thank you. :)
Cheers, Nilesh On Apr 28, 2013 2:56 AM, "Lydia Pintscher" lydia.pintscher@wikimedia.de wrote:
On Sat, Apr 27, 2013 at 11:14 PM, Nilesh Chakraborty nilesh@nileshc.com wrote:
Hi Lydia,
That helps a lot, and makes it way more interesting. Rather than being a one-size-fits-all solution, it seems to me that each property, or each type of property (e.g. different relationships), will need individual attention and different methods/metrics for recommendation.
The examples you gave - continents, sex, relations like father/son, uncle/aunt/spouse, or place-oriented properties like place of birth, country of citizenship, ethnic group, etc. - each type has a certain pattern to it (if a person was born in the US, the US should be one of the countries he was a citizen of; US census/ethnicity statistics might be used to predict ethnic group, and so on). I'm already starting to chalk out a few patterns and how they can be used for recommendation. In my proposal, should I go into detail about these? Or should I just give a few examples and explain how the algorithms would work, to convey the idea?
Give some examples and how you'd handle them. You definitely don't need to have it for all properties. What's important is giving an idea about how you'd tackle the problem. Give the reader the impression that you know what you are talking about and can handle the larger problem.
Also: don't make the system too intelligent - like having it know about US census data, for example. Keep it simple and stupid for now. Things like "property A is usually used with value X, Y or Z" should cover a lot already and are likely enough for most cases.
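To illustrate what I mean by simple, here is a quick Python sketch of that frequency-based idea (the triples and the property id P21 below are made up, just to show the counting):

```python
# Minimal "keep it simple" suggester: count which values each property
# takes across items, then suggest the most frequent values.
from collections import Counter, defaultdict

def build_value_counts(statements):
    """statements: iterable of (item, property, value) triples."""
    counts = defaultdict(Counter)
    for _item, prop, value in statements:
        counts[prop][value] += 1
    return counts

def suggest_values(counts, prop, n=3):
    """Top-n most common values seen with the given property."""
    return [value for value, _ in counts[prop].most_common(n)]

# Made-up toy data, just to show the mechanics:
data = [
    ("Q1", "P21", "male"), ("Q2", "P21", "female"),
    ("Q3", "P21", "male"), ("Q4", "P21", "male"),
]
counts = build_value_counts(data)
print(suggest_values(counts, "P21"))  # ['male', 'female']
```

Nothing smarter than counting - and that already gets you sensible suggestions for a lot of properties.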
Cheers Lydia
-- Lydia Pintscher - http://about.me/lydia.pintscher Community Communications for Technical Projects
Wikimedia Deutschland e.V. Obentrautstr. 72 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as charitable by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l