priyank bagrecha wrote:
I am trying to pull up a list of pageids from the snapshot, which belong to a specific category. Basically I am trying to pull up pages which are on book portal pages. I looked at the snapshot mysql db to see which tables i can use. but the fields of tables category and category_links didnt make any sense to me in regard to what they stand for. so i was wondering if somebody could help me with the sql.
Example category: "Mathematics" Example namespace: "Portal" (100)
SELECT page_namespace, page_title FROM page JOIN categorylinks ON cl_from = page_id WHERE cl_to = 'Mathematics' AND page_namespace = 100 AND page_is_redirect = 0 /* optional */;
* http://www.mediawiki.org/wiki/Manual:Page_table * http://www.mediawiki.org/wiki/Manual:Categorylinks_table * http://www.mediawiki.org/wiki/Manual:Category_table
You don't really need the category table in this case because you just want relationships (links). If you wanted meta-data about particular categories, you could (left) join against the category table on cat_name = cl_to.
In order to figure out which namespace ID (integer) corresponds to your target namespace, you can check the wiki configuration (LocalSettings.php and DefaultSettings.php) or the wiki's API.
MZMcBride