On Sun, Jun 13, 2010 at 5:55 AM, 杨杰 <xtyangjie(a)gmail.com> wrote:
> 1. What is the category (or categories) of a web page (an article)?
> E.g., once I have these two pieces of information, that is enough:
> a. Web page P1 belongs to category C1;
> b. Category C1 is under two parent categories, CC1 and CC2, and each
> of those has its own chain of parent categories.
> Then I can build a tree whose leaves are the web pages.
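You can use the API [1] with "prop=categories" to query the categories
of any page. Category pages are pages too, so the same query on C1
returns its parent categories. Alternatively, you could obtain a
database dump [2] and query the `categorylinks` table.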
[1] http://en.wikipedia.org/w/api.php
[2] http://dumps.wikimedia.org/backup-index.html
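
A rough sketch of the API route, in case it helps. The function names,
the User-Agent string and the depth limit are my own placeholders, and
I assume the default JSON response layout (pages keyed by page id).
Continuation of long result lists is not handled.

```python
import json
import urllib.parse
import urllib.request

# Endpoint from [1]; HTTPS is an assumption of this sketch.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query(title):
    """Return the API URL asking for the categories of one page."""
    params = {
        "action": "query",
        "prop": "categories",
        "titles": title,
        "cllimit": "max",
        "format": "json",
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_categories(response):
    """Extract category titles from a decoded prop=categories response."""
    pages = response.get("query", {}).get("pages", {})
    titles = []
    for page in pages.values():
        # Pages without categories have no "categories" key at all.
        for cat in page.get("categories", []):
            titles.append(cat["title"])
    return titles


def fetch_categories(title):
    """Ask the live API for the categories of one page (network call)."""
    request = urllib.request.Request(
        build_query(title),
        headers={"User-Agent": "category-sketch/0.1 (placeholder)"},
    )
    with urllib.request.urlopen(request) as resp:
        return parse_categories(json.load(resp))


def parent_chains(title, fetch=fetch_categories, max_depth=3, path=frozenset()):
    """Return nested dicts {category: {parent: {...}}} above a page.

    `path` holds the titles on the current chain so that category
    cycles are cut instead of recursing forever.
    """
    if max_depth == 0 or title in path:
        return {}
    path = path | {title}
    return {
        cat: parent_chains(cat, fetch, max_depth - 1, path)
        for cat in fetch(title)
    }
```

Calling parent_chains("Some article") walks upward from the page; to
get the tree with pages as leaves you would invert the result. Passing
your own `fetch` makes it easy to try this without touching the network.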
> 2. How do people at Wikipedia handle categorization across the huge
> number of articles, for example the categorization method, levels,
> or inheritance between categories?
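Categories are stored in MySQL; see the `category` [3] and
`categorylinks` [4] tables. There is no separate hierarchy table: a
subcategory is simply a category page that has its own rows in
`categorylinks`.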
[3] http://www.mediawiki.org/wiki/Manual:Category_table
[4] http://www.mediawiki.org/wiki/Manual:Categorylinks_table
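
If you go the dump route, the lookup is a join between `page` and
`categorylinks`. Below is a small sketch that uses an in-memory SQLite
database as a stand-in for MySQL. It assumes the `cl_from` (page id)
and `cl_to` (category title without the "Category:" prefix) columns
described in [4]; the rows and the helper names are made up for
illustration, and a real dump has more columns.

```python
import sqlite3

ARTICLE_NS = 0    # main (article) namespace
CATEGORY_NS = 14  # namespace number of category pages


def make_demo_db():
    """Build a tiny stand-in for the `page` and `categorylinks` tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_title TEXT
        );
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
        """
    )
    # Made-up rows matching the P1 / C1 / CC1 / CC2 example.
    db.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [
            (1, ARTICLE_NS, "P1"),
            (2, CATEGORY_NS, "C1"),
            (3, CATEGORY_NS, "CC1"),
            (4, CATEGORY_NS, "CC2"),
        ],
    )
    db.executemany(
        "INSERT INTO categorylinks VALUES (?, ?)",
        [(1, "C1"), (2, "CC1"), (2, "CC2")],
    )
    return db


def categories_of(db, namespace, title):
    """Return the categories a page (or a category page) belongs to."""
    rows = db.execute(
        "SELECT cl.cl_to FROM categorylinks AS cl "
        "JOIN page AS p ON p.page_id = cl.cl_from "
        "WHERE p.page_namespace = ? AND p.page_title = ? "
        "ORDER BY cl.cl_to",
        (namespace, title),
    )
    return [row[0] for row in rows]
```

With the demo data, categories_of(db, ARTICLE_NS, "P1") gives the
article's category, and the same call with CATEGORY_NS and "C1" gives
the parents of C1, so repeating it walks up each chain.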
> Department of Computer Science and Technology,
> Xi’an Jiaotong University
I'm in Xi'an too :P
--
Jimmy Xu