Also, important categories like Computer Architecture, Human-based computation, Programming language theory, Software Engineering, and Theory of Computation are missing from the subcategories of Areas of Computer Science.

Regards,
Shubhanshu Mishra
Research Assistant,
iSchool at University of Illinois at Urbana-Champaign
--------------------------------------------------
Website: http://shubhanshu.com
LinkedIn Profile: http://www.linkedin.com/in/shubhanshumishra

On Wed, Nov 1, 2017 at 10:42 AM, Shubhanshu Mishra <shubhanshumishra@gmail.com> wrote:

Hi,

When using the Wikipedia dump files, I am unable to find many categories and pages in the dump. E.g., under the Areas_of_computer_science category I get only 13 subcategories and 2 pages instead of 17 subcategories and 2 pages. Furthermore, one page, "Computational_creativity", is not present as a subcategory.

I am using the following Wikipedia dump files to extract the categorylinks and page details:

1.6G Sep 21 00:45 enwiki-20170920-page.sql.gz
21M Sep 21 00:45 enwiki-20170920-category.sql.gz
113M Sep 21 00:55 enwiki-20170920-redirect.sql.gz
2.2G Sep 21 03:10 enwiki-20170920-categorylinks.sql.gz
221M Sep 21 03:13 enwiki-20170920-page_props.sql.gz

I use https://github.com/napsternxg/WikiUtils to parse the sql.gz dump files, but I also tried searching the sql.gz files directly and couldn't find any entry for 16300571 in the page.sql.gz or category.sql.gz files.
16300571 supposedly refers to the Computational_creativity page, as the following categories are linked to this page:

16300571 'All_NPOV_disputes' 'page'
16300571 'All_articles_needing_additional_references' 'page'
16300571 'All_articles_with_dead_external_links' 'page'
16300571 'All_articles_with_unsourced_statements' 'page'
16300571 'Areas_of_computer_science' 'page'
16300571 'Articles_needing_additional_references_from_May_2013' 'page'
16300571 'Articles_with_French-language_external_links' 'page'
16300571 'Articles_with_dead_external_links_from_November_2016' 'page'
16300571 'Articles_with_permanently_dead_external_links' 'page'
16300571 'Articles_with_unsourced_statements_from_April_2015' 'page'
16300571 'Articles_with_unsourced_statements_from_April_2016' 'page'
16300571 'Articles_with_unsourced_statements_from_December_2015' 'page'
16300571 'Articles_with_unsourced_statements_from_January_2010' 'page'
16300571 'Articles_with_unsourced_statements_from_October_2016' 'page'
16300571 'Artificial_intelligence' 'page'
16300571 'Arts' 'page'
16300571 'CS1_maint:_Extra_text:_authors_list' 'page'
16300571 'Cognitive_psychology' 'page'
16300571 'Computational_fields_of_study' 'page'
16300571 'Creativity_techniques' 'page'
16300571 'NPOV_disputes_from_January_2013' 'page'
16300571 'Philosophical_movements' 'page'
16300571 'Webarchive_template_wayback_links' 'page'
16300571 'Wikipedia_articles_needing_clarification_from_November_2008' 'page'

More details can be found at: https://twitter.com/TheShubhanshu/status/925736635572072449

Is there something I am doing wrong, or are these rows just missing from the dumps?

Regards,
Shubhanshu Mishra
Research Assistant,
iSchool at University of Illinois at Urbana-Champaign
--------------------------------------------------
Website: http://shubhanshu.com
LinkedIn Profile: http://www.linkedin.com/in/shubhanshumishra
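
For anyone who wants to reproduce the check without WikiUtils, here is a minimal Python sketch of how one might scan a sql.gz dump for rows whose leading column equals a given ID. It is not the WikiUtils implementation; the function names `iter_rows` and `find_rows` are hypothetical. It assumes the dump uses the usual mysqldump layout of one `INSERT INTO ... VALUES (...),(...);` statement per line, and that column 0 is the ID of interest (page_id in the page table, cl_from in categorylinks). Backslash escapes inside strings are handled only roughly: the escaped character is kept literally.

```python
import gzip


def iter_rows(line):
    """Yield each row of one mysqldump INSERT line as a list of strings."""
    start = line.find(" VALUES ")
    if not line.startswith("INSERT INTO") or start < 0:
        return
    i, n = start + len(" VALUES "), len(line)
    row, field = [], []
    in_str = in_row = False
    while i < n:
        ch = line[i]
        if in_str:
            if ch == "\\" and i + 1 < n:
                # Simplification: keep the escaped character as-is.
                field.append(line[i + 1])
                i += 1
            elif ch == "'":
                in_str = False
            else:
                field.append(ch)
        elif ch == "'":
            in_str = True
        elif ch == "(" and not in_row:
            in_row, row, field = True, [], []
        elif ch == "," and in_row:
            row.append("".join(field))
            field = []
        elif ch == ")" and in_row:
            row.append("".join(field))
            yield row
            in_row = False
        elif in_row:
            field.append(ch)
        i += 1


def find_rows(path, column, value):
    """Yield rows from a gzipped SQL dump whose given column equals value."""
    # errors="replace" because some dump columns hold raw binary data.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            for row in iter_rows(line):
                if len(row) > column and row[column] == value:
                    yield row
```

For example, `list(find_rows("enwiki-20170920-page.sql.gz", 0, "16300571"))` returns an empty list if no row in that file has 16300571 in its first column. Running the same call against the categorylinks file should list the rows shown above.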
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics