If you’re not interested in actual topic extraction, a good heuristic for identifying high-level topic areas is to rely on WikiProjects on the English Wikipedia and then use the language links from Wikidata to apply them to other languages. That won’t immediately cover articles that exist in only one language, but it’s the most effective heuristic I can think of for your use case.
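
To make the idea concrete, here is a rough sketch of the propagation step, nothing more. It assumes you have already collected two things by whatever means suit you: a mapping from English article titles to their WikiProjects, and the Wikidata sitelinks for each item. The sample data, the titles and the function name propagate_topics are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample data. In practice the WikiProject assignments would come
# from the English Wikipedia and the sitelinks from Wikidata items.
ENWIKI_PROJECTS = {
    "Albert Einstein": {"Biography", "Physics"},
    "Eurovision Song Contest 2007": {"Eurovision"},
}

# One dict per Wikidata item, mapping wiki code -> article title on that wiki.
SITELINKS = [
    {"enwiki": "Albert Einstein", "dewiki": "Albert Einstein"},
    {"enwiki": "Eurovision Song Contest 2007",
     "frwiki": "Concours Eurovision de la chanson 2007"},
    {"dewiki": "Beispielartikel"},  # made-up article with no English version
]


def propagate_topics(enwiki_projects, sitelinks):
    """Copy English WikiProject labels to every linked article.

    Returns (topics, unlabeled): topics maps wiki code -> title -> set of
    projects; unlabeled lists (wiki, title) pairs that could not be labeled
    because the item has no English article or that article has no project.
    """
    topics = defaultdict(dict)
    unlabeled = []
    for item in sitelinks:
        en_title = item.get("enwiki")
        projects = enwiki_projects.get(en_title) if en_title else None
        for wiki, title in item.items():
            if projects:
                topics[wiki][title] = set(projects)
            else:
                unlabeled.append((wiki, title))
    return dict(topics), unlabeled


topics, unlabeled = propagate_topics(ENWIKI_PROJECTS, SITELINKS)
```

The unlabeled list is where the limitation mentioned above shows up: articles with no English counterpart get no topic from this heuristic and would need some other treatment.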

Dario

On Mar 17, 2014, at 8:21 AM, Amir E. Aharoni <amir.aharoni@mail.huji.ac.il> wrote:

Hallo,

Is there any known easy way to classify Wikipedia articles into a relatively small number of types?

By "relatively small" I mean no more than twenty, and by "types" I mean things that are intuitively clear to readers, for example:
* Biographies
* Articles about scientific phenomena (can be sub-grouped into math, astronomy, physics, geology, medicine)
* Articles about works of art (paintings, movies, books, records, statues)
* Articles about places
* Articles about historical events
* Articles about biological species
* Articles that mostly present data, such as demography or results of competitions (sports, elections, game shows)

There are a few more, but not many. I hope that you get the idea.

We have categories, but I'm not sure that it's easy to use categories for such things because of the very loose category structure. For example, [[Eurovision 2007]] is somewhere under [[Category:Humans]], even though it's not an article about a human.

Such information can be useful for studying the types of articles that different people write. In particular, I thought about it in the context of analyzing the types of articles that people are translating now (manually) and will translate in the future using ContentTranslation, which is in its early stages of development.

Thanks,

--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
‪“We're living in pieces,
I want to live in peace.” – T. Moore‬
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l