I meant not so much "topics", as I wrote in the email subject, but rather "types", as I wrote in the email body. Sorry for the confusion.
We are getting serious about analyzing how people translate articles. Basically, all articles are worth translating, but we may find, for example, that Wikipedia has 60% biographies, 30% articles about places and 10% articles about math, while of the translated articles 80% are about places and biographies and math are 10% each. If this turns out to be the case, we may want to understand why people don't translate biographies and math articles more: are they simply less interesting? Is it harder for some social reason? For some technical reason? If there is something we can do to make translation easier, we may want to do it.
This example is, of course, highly simplified and the numbers are completely made up, but I hope that it explains the intention.
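One simple way to compare such numbers is to divide each type's share among translated articles by its share among all articles: a ratio below 1 means the type is translated less often than its prevalence would suggest, and a ratio above 1 means more often. A minimal sketch using the same invented percentages; the function name `translation_ratio` is hypothetical:

```python
# Invented shares from the example above: fraction of all articles
# and fraction of translated articles, per article type.
all_articles = {"biographies": 0.60, "places": 0.30, "math": 0.10}
translated = {"biographies": 0.10, "places": 0.80, "math": 0.10}


def translation_ratio(overall: dict[str, float], translated_share: dict[str, float]) -> dict[str, float]:
    """Share among translated articles divided by share among all articles.

    A ratio > 1 means the type is over-represented among translations,
    a ratio < 1 means it is under-represented. Types with a zero overall
    share are skipped to avoid dividing by zero.
    """
    return {t: translated_share.get(t, 0.0) / share for t, share in overall.items() if share > 0}


if __name__ == "__main__":
    for article_type, ratio in sorted(translation_ratio(all_articles, translated).items()):
        print(f"{article_type}: {ratio:.2f}")
```

With these invented numbers, biographies come out at about 0.17, places at about 2.67, and math at exactly 1.0, so in this toy example math is translated in proportion to its share and only biographies are under-represented.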
Now when I say "biographies", "articles about places" and "articles about math", it's immediately clear and intuitive to a person what I'm talking about. I am asking whether there is a known easy way for software to understand such things.
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
2014-03-18 14:34 GMT+02:00 Ziko van Dijk zvandijk@gmail.com:
Hello Amir,
The question for me would be: what do you want to use the classification for? Depending on that, you can get very different answers. The biography of Otto von Bismarck may belong in the category "history", and the biography of Justin Bieber in "entertainment".
Kind regards,
Ziko
2014-03-18 8:31 GMT+01:00 Maik Anderka maik.anderka@uni-paderborn.de:
Dear Amir,
Two years ago, we used Wikipedia categories to analyze the distribution of articles over a set of main topics. We took the 24 direct subcategories of "Category:Main topic classifications" as the main topics.
For further information, see Section 4.2 in this paper:
http://www.uni-weimar.de/medien/webis/publications/papers/stein_2012d.pdf
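A rough sketch of the general idea of mapping an article to a fixed set of top-level topics by walking up the category graph. It is not necessarily the procedure described in the paper: it assumes a tiny hand-made graph in place of real Wikipedia category data, and the graph, the category names and the function `nearest_main_topics` are all hypothetical. The article is assigned to whichever main topics are reachable in the fewest parent steps (breadth-first search); ties are all kept, which is how one biography can land in several topics at once.

```python
from collections import deque

# Hypothetical toy data: each page or category maps to its parent categories.
PARENTS: dict[str, list[str]] = {
    "Otto von Bismarck": ["Category:Chancellors of Germany"],
    "Category:Chancellors of Germany": ["Category:German politicians", "Category:History of Germany"],
    "Category:German politicians": ["Category:Politics"],
    "Category:History of Germany": ["Category:History"],
    "Category:Politics": [],
    "Category:History": [],
}

# Hypothetical stand-ins for the direct subcategories of "Category:Main topic classifications".
MAIN_TOPICS = {"Category:History", "Category:Politics"}


def nearest_main_topics(page: str, parents: dict[str, list[str]], main_topics: set[str]) -> set[str]:
    """Return the main topics reachable from `page` in the fewest parent steps (BFS)."""
    seen = {page}
    frontier = deque([(page, 0)])
    best_depth = None
    found: set[str] = set()
    while frontier:
        node, depth = frontier.popleft()
        if best_depth is not None and depth > best_depth:
            break  # BFS pops nodes in depth order, so nothing closer remains
        if node in main_topics:
            found.add(node)
            best_depth = depth
            continue  # do not climb above a main topic
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                frontier.append((parent, depth + 1))
    return found
```

Here `nearest_main_topics("Otto von Bismarck", PARENTS, MAIN_TOPICS)` returns the set `{"Category:History", "Category:Politics"}`. On the real category graph, long and loosely related parent chains mean such a search can also reach unrelated top categories, so a depth limit or a manual check would probably be needed.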
Best regards, Maik
-- Maik Anderka Research Group "Knowledge-Based Systems" Department of Computer Science University of Paderborn, Germany http://www.uni-paderborn.de/cs/ag-klbue
On 17.03.2014 16:21, Amir E. Aharoni wrote:
Hello,
Is there any known easy way to classify Wikipedia articles into a relatively small number of types?
By "relatively small" I mean no more than twenty, and by "types" I mean things that are intuitively clear to readers, for example:
- Biographies
- Articles about scientific phenomena (can be sub-grouped into math, astronomy, physics, geology, medicine)
- Articles about works of art (paintings, movies, books, records, statues)
- Articles about places
- Articles about historical events
- Articles about biological species
- Articles that mostly present data, such as demography or results of competitions (sports, elections, game shows)
There are a few more, but not many. I hope that you get the idea.
We have categories, but I'm not sure that it's easy to use categories for such things because of the very loose category structure. For example, [[Eurovision 2007]] is somewhere under [[Category:Humans]], even though it's not an article about a human.
Such information could be useful for studying the types of articles that different people write. In particular, I thought about it in the context of analyzing the types of articles that people are translating now (manually) and will translate in the future using ContentTranslation, which is in its early stages of development.
Thanks,
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l