> With a brief discussion about preserving privacy in aggregate data,
> randomizing test and control samples, and a tweak to allow web forms
> on pages that are aware of your Wikipedia user ID, we could have a
> simple projects-wide survey completed within a month. Let's make this
> a priority and make such a thing happen -- then figure out how to
> optimize future iterations.
> The latest discussions on meta are here:
SJ, great to hear you welcome the survey. After Wikimania 2005 the project
stalled, because I had too many other WM obligations and a not so good winter.
Wikimania 2006 gave the project new élan, and now someone else will code it.
Technical design has started but needs some more work:
I'll make a mockup input script for the form generator.
Programming will start in a reasonable time frame, see Kevin's earlier post.
I'm not so sure this will take only one month :(
1 Timing
Best after single login is active, in a few weeks' time?
2 Anonymisation of results
May need some more thinking; this is a sensitive matter.
We had a heated debate about this in Frankfurt; we'll probably get into it again
when we have a proof of concept and more people show up to give feedback.
3 Translation issues
A Wikimedia-wide survey needs to be held in many languages to reduce bias
where opinions are asked.
4 Results should be script-processable, e.g. no free-format feedback.
Thus all answers should be on a numeric scale or from a predefined set
(e.g. country numbers instead of country names in all sorts of esoteric
languages that no script can handle).
Because of 3, a survey form needs to be built dynamically:
no English/German/Japanese/etc. texts intermingled with PHP code.
That would be a maintenance nightmare.
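To make that concrete, here is a rough sketch of the separation I have in mind. Everything in it (names, the sample question, the languages) is invented for illustration, and the real generator would of course be PHP inside MediaWiki, but the principle is language-neutral: questions and answer codes are defined once as data, translators maintain separate per-language message tables, the form is generated from both at run time, and only the numeric codes land in the result file.

  # Illustrative sketch only: question data, translations and code kept apart.
  QUESTIONS = [
      {"id": "edit_freq",           # stable id, stored with the results
       "answers": [1, 2, 3, 4]},    # numeric codes, script-processable
  ]

  MESSAGES = {  # one table per language, maintained by translators
      "en": {"edit_freq": "How often do you edit?",
             "edit_freq.1": "daily", "edit_freq.2": "weekly",
             "edit_freq.3": "monthly", "edit_freq.4": "less often"},
      "de": {"edit_freq": "Wie oft bearbeitest du?",
             "edit_freq.1": "täglich", "edit_freq.2": "wöchentlich",
             "edit_freq.3": "monatlich", "edit_freq.4": "seltener"},
  }

  def render_form(lang):
      # Build the survey form for one language from the data above.
      msg = MESSAGES[lang]
      lines = []
      for q in QUESTIONS:
          lines.append(msg[q["id"]])
          for code in q["answers"]:
              # the value a respondent submits is the code, never the label
              lines.append("  (%d) %s" % (code, msg["%s.%d" % (q["id"], code)]))
      return "\n".join(lines)

  print(render_form("de"))

Adding a language then means adding one message table; the script that processes the results never needs to know which language the respondent saw.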
Depending on how much time the programmer can spend on the project, we could
probably show an alpha version about 4 weeks after he starts.
Then we can start the major discussion on the final questions (this will work
better when people have an alpha version to play with),
and finally freeze questions and invite translators.
I would be happy if we did a major survey in November/December.
Please don't ask for quick hacks. I know all this sounds like an invitation
for some self-proclaimed code magician to make something barely functional
in a weekend, pronounce the job done and then leave the 'dirty details'
(usually 80% of what needs to be done) for others to clean up. I'd rather
see to it that the first version is usable and a good platform for future
reuse and extension.
Erik Zachte :)
I want to introduce myself - I'm Rut Jesus (vulpeto) - Portuguese, but
studying in Copenhagen. I am starting a PhD under the title 'Cooperation in
Emergent Cognition of Socio-Technological Networks'. It's quite
interdisciplinary and it can (still) go in many different directions.
I've studied Physics and Philosophy before - and now I'm again at their
crossroads: at the Center for Philosophy of Nature and Science Studies
(humanities'ish) which is at the Niels Bohr Institute (physics'ish).
For now I am mostly interested in being an observer - but certainly asking
questions, perhaps conducting interviews, engaging in discussions, etc. - and
later I would also like to collaborate, especially in bringing some of the
discussions/good ideas/practices to Wikipedias smaller than the English one
(probably Esperanto, perhaps Portuguese, perhaps Danish).
A virtual hug.
Hello everybody!
I'm a graduate student at the Institute of Political Studies, Lyon, France,
and I'm writing my MA thesis about the philosophical and political grounds of
Wikipedia. The subject sounds broad, it surely is, and I'll try to make it
more precise in the future. I hope this mailing list will be a source of
help and discovery for all of us. Lastly, please excuse my English, I'll try
to improve it! ;)
We would like to announce a new research paper that uses Wikipedia
for computing semantic relatedness of natural language texts.
Evgeniy Gabrilovich and Shaul Markovitch (2007).
''Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis''
Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI)
Hyderabad, India, January 2007
Computing semantic relatedness of natural language texts requires
access to vast amounts of common-sense and domain-specific world
knowledge. We propose Explicit Semantic Analysis (ESA), a novel
method that represents the meaning of texts in a high-dimensional
space of concepts derived from Wikipedia. We use machine learning
techniques to explicitly represent the meaning of any text as a
weighted vector of Wikipedia-based concepts. Assessing the
relatedness of texts in this space amounts to comparing the
corresponding vectors using conventional metrics (e.g., cosine).
Compared with the previous state of the art, using ESA results in
substantial improvements in correlation of computed relatedness
scores with human judgments: from r=0.56 to 0.75 for individual
words and from r=0.60 to 0.72 for texts. Importantly, due to the use
of natural concepts, the ESA model is easy to explain to human users.
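For readers who want to see the comparison step concretely: once two texts have been mapped to weighted vectors of Wikipedia concepts, assessing their relatedness is a standard vector operation. The sketch below uses invented toy vectors; producing real ESA vectors requires the machine-learned text-to-concept mapping described in the paper.

  import math

  def cosine(u, v):
      # Cosine similarity of two sparse concept vectors (dict: concept -> weight).
      dot = sum(w * v.get(c, 0.0) for c, w in u.items())
      norm_u = math.sqrt(sum(w * w for w in u.values()))
      norm_v = math.sqrt(sum(w * w for w in v.values()))
      return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

  # Toy, hand-made weights; a real ESA mapper would produce vectors over
  # many thousands of Wikipedia concepts.
  text_a = {"Bank": 0.9, "Loan": 0.7, "Interest rate": 0.5}
  text_b = {"River": 0.8, "Flood": 0.6, "Bank": 0.3}

  print(cosine(text_a, text_b))   # low score: the two texts are barely related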
Ph.D. student in Computer Science
Department of Computer Science, Technion - Israel Institute of Technology
Technion City, Haifa 32000, Israel
Email: gabr(a)cs.technion.ac.il WWW: http://www.cs.technion.ac.il/~gabr