Is there any way to distinguish between categories like History, or Literature for example, and what I would think of as categories that are used for internal housekeeping like "Unprintworthy_redirects" or "Nonindexed_pages"? They're not hidden categories, but conceptually there is a clear difference between housekeeping categories and categories that define fields of knowledge. But is there anything in the tables that distinguishes them?
Thanks,
Robert
Both of the categories you mentioned *are* hidden, so I think you can use that.
Petr Onderka [[en:User:Svick]]
On Wed, Feb 13, 2013 at 6:24 PM, Robert Crowe robert@ourwebhome.com wrote:
Is there any way to distinguish between categories like History, or Literature for example, and what I would think of as categories that are used for internal housekeeping like "Unprintworthy_redirects" or "Nonindexed_pages"? They're not hidden categories, but conceptually there is a clear difference between housekeeping categories and categories that define fields of knowledge. But is there anything in the tables that distinguishes them?
Thanks,
Robert
Sorry, those were poorly chosen examples. Here are some better ones:
Copy_to_Wikimedia_Commons_(bot-assessed)
Stub-Class_biography_articles
Automatically_assessed_biography_articles
WikiProject_Disambiguation_pages
Creative_Commons_Attribution-ShareAlike_3.0_files
I don't think that any of these are hidden, at least by looking for them in page_props. These are all from the enwiki-20121201 dump.
Thanks,
Robert
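The page_props lookup mentioned above can be sketched roughly as follows. This is only an illustration, assuming the `page` and `page_props` tables from the dump have already been parsed into Python tuples; the sample rows and the helper name `hidden_category_titles` are hypothetical. It relies on MediaWiki marking hidden categories with `pp_propname = 'hiddencat'` on the category page itself (namespace 14).

```python
CATEGORY_NS = 14  # MediaWiki namespace number for Category: pages


def hidden_category_titles(pages, page_props):
    """Return titles of category pages flagged with the 'hiddencat' property.

    pages:      iterable of (page_id, page_namespace, page_title)
    page_props: iterable of (pp_page, pp_propname, pp_value)
    """
    # Collect the ids of all pages that carry the 'hiddencat' property.
    hidden_ids = {pp_page for pp_page, name, _ in page_props if name == "hiddencat"}
    # Keep only category pages whose id is in that set.
    return {
        title
        for page_id, ns, title in pages
        if ns == CATEGORY_NS and page_id in hidden_ids
    }


# Hypothetical sample rows, not taken from a real dump.
pages = [
    (1, 14, "History"),
    (2, 14, "Unprintworthy_redirects"),
    (3, 0, "History"),
]
page_props = [
    (2, "hiddencat", ""),
    (3, "defaultsort", "History"),
]

hidden = hidden_category_titles(pages, page_props)
```

A category title that is missing from the result is either not hidden or has no row in `page_props` at all, so an empty lookup by title alone does not distinguish the two cases.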
-----Original Message----- From: Petr Onderka [mailto:gsvick@gmail.com] Sent: Wednesday, February 13, 2013 9:34 AM To: Robert Crowe Cc: xmldatadumps-l@lists.wikimedia.org Subject: Re: [Xmldatadumps-l] Housekeeping categories?
Both of the categories you mentioned *are* hidden, so I think you can use that.
Petr Onderka [[en:User:Svick]]
I don't think there's any simple or reliable way: your only option is probably to traverse the whole category tree and check whether a category is not a (sub-){1,1000000}category of https://en.wikipedia.org/wiki/Category:Articles or equivalent... and hope there are not too many loops!
Nemo
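A traversal of this kind could look roughly like the sketch below. It assumes the subcategory links have already been extracted into (parent, child) title pairs, for example from categorylinks rows whose member page is itself a category; the function name `descendants` and the sample edges are hypothetical. The visited set is what keeps loops in the category graph from causing an endless walk.

```python
from collections import defaultdict, deque


def descendants(root, subcat_edges):
    """Return every category other than root reachable via subcategory links.

    subcat_edges: iterable of (parent_title, child_title) pairs.
    """
    children = defaultdict(set)
    for parent, child in subcat_edges:
        children[parent].add(child)

    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children[current]:
            if child not in seen:  # already-visited nodes are skipped, so cycles terminate
                seen.add(child)
                queue.append(child)
    seen.discard(root)
    return seen


# Hypothetical edges, including a loop between two content categories.
edges = [
    ("Articles", "History"),
    ("History", "Ancient_history"),
    ("Ancient_history", "History"),
    ("Wikipedia_administration", "Stub-Class_biography_articles"),
]

content = descendants("Articles", edges)
```

Any category that never shows up in `content` would then be a candidate housekeeping category, subject to the caveats raised in this thread.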
Copy_to_Wikimedia_Commons_(bot-assessed) Creative_Commons_Attribution-ShareAlike_3.0_files
These contain only files.
Stub-Class_biography_articles Automatically_assessed_biography_articles WikiProject_Disambiguation_pages
These contain only talk pages.
Also, all of them are indirect subcategories of Category:Wikipedia administration.
Neither of these might be easy for you to find out, but I think at least one of them should be doable.
and hope there are not too many loops!
Just 2500 of them. [1]
[1]: http://en.wikipedia.org/wiki/Wikipedia:Dump_reports/Category_cycles
Petr Onderka [[en:User:Svick]]
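The "contains only files" and "contains only talk pages" checks could be approximated as in the sketch below. It assumes membership pairs shaped like (cl_from, cl_to) from categorylinks and a page id to namespace mapping built from the page table; the function names and sample rows are hypothetical. It uses the MediaWiki conventions that File: is namespace 6 and that talk namespaces have odd numbers.

```python
FILE_NS = 6  # MediaWiki namespace number for File: pages


def is_talk_namespace(ns):
    """Talk namespaces have odd, positive numbers (1 = Talk, 3 = User talk, ...)."""
    return ns > 0 and ns % 2 == 1


def member_kinds(category, memberships, page_namespace):
    """Return the kinds of pages ('file', 'talk', 'other') found in a category.

    memberships:    iterable of (page_id, category_title) pairs
    page_namespace: dict mapping page_id to its namespace number
    """
    kinds = set()
    for page_id, cat in memberships:
        if cat != category:
            continue
        ns = page_namespace[page_id]
        if ns == FILE_NS:
            kinds.add("file")
        elif is_talk_namespace(ns):
            kinds.add("talk")
        else:
            kinds.add("other")
    return kinds


def contains_only_files_or_talk(category, memberships, page_namespace):
    """Heuristic only: True if the category is non-empty and holds nothing else."""
    kinds = member_kinds(category, memberships, page_namespace)
    return bool(kinds) and kinds <= {"file", "talk"}


# Hypothetical sample data, not taken from a real dump.
page_namespace = {10: 6, 11: 6, 20: 1, 30: 0}
memberships = [
    (10, "Creative_Commons_Attribution-ShareAlike_3.0_files"),
    (11, "Creative_Commons_Attribution-ShareAlike_3.0_files"),
    (20, "Stub-Class_biography_articles"),
    (30, "History"),
]
```

With the sample rows, the two maintenance categories pass the check and `History` does not, because it contains a main-namespace page.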
So I guess ideally we would create a table that lists all the subcategories of Category:Wikipedia administration. Are there any subcategories that are not housekeeping? I could try to write some code for that, but I don't have any idea what the check-in process is like to include it in the build target.
Thanks,
Robert
-----Original Message----- From: Petr Onderka [mailto:gsvick@gmail.com] Sent: Wednesday, February 13, 2013 1:14 PM To: Robert Crowe Cc: xmldatadumps-l@lists.wikimedia.org Subject: Re: [Xmldatadumps-l] Housekeeping categories?
Copy_to_Wikimedia_Commons_(bot-assessed) Creative_Commons_Attribution-ShareAlike_3.0_files
These contain only files.
Stub-Class_biography_articles Automatically_assessed_biography_articles WikiProject_Disambiguation_pages
These contain only talk pages.
Also, all of them are indirect subcategories of Category:Wikipedia administration.
Neither of these might be easy for you to find out, but I think at least one of them should be doable.
and hope there are not too many loops!
Just 2500 of them. [1]
[1]: http://en.wikipedia.org/wiki/Wikipedia:Dump_reports/Category_cycles
Petr Onderka [[en:User:Svick]]
I tried finding all the subcategories of Category:Wikipedia_administration, but unfortunately that includes many non-administration categories also.
Will administration categories be limited to those that contain only:
- Categories
- Files
- Talk pages
Or are there other defining characteristics? Or are administration categories simply any category that is not a descendant of Category:Articles?
Thanks,
Robert
Robert Crowe, 23/02/2013 21:58:
I tried finding all the subcategories of Category:Wikipedia_administration, but unfortunately that includes many non-administration categories also.
Will administration categories be limited to those that contain only:
- Categories
- Files
- Talk pages
Or are there other defining characteristics?
There are. I doubt there's a precise definition, though.
Or are administration categories simply any category that is not a descendant of Category:Articles?
In theory, yes. But the category tree is not even a tree, so all sorts of things can happen...
Nemo
So if I start at Category:Articles and recurse "downward" into all category members, will it include all pages that are not administration pages? Or do I have to also recurse "upward", keeping in mind that it's a directed graph and not a tree?
Thanks,
Robert
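On the downward versus upward question, one general graph property may help: following subcategory links downward from a root already reaches every category that has that root as an ancestor, so an upward pass from each category would yield the same set. What a non-tree graph does allow is a category sitting under both roots. The sketch below illustrates that overlap; the `reachable` helper and the sample graph are hypothetical and only meant as an illustration.

```python
def reachable(root, children):
    """Return all categories reachable from root via subcategory links.

    children: dict mapping a category title to its direct subcategories.
    """
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:  # visited check makes the walk safe on cyclic graphs
                seen.add(child)
                stack.append(child)
    return seen


# Hypothetical graph in which one category has parents under both roots.
children = {
    "Articles": ["History"],
    "History": ["History_stubs"],
    "Wikipedia_administration": ["Stub_categories"],
    "Stub_categories": ["History_stubs"],
}

content = reachable("Articles", children)
admin = reachable("Wikipedia_administration", children)

ambiguous = content & admin      # under both roots, needs a manual or heuristic decision
content_only = content - admin   # under Category:Articles only
```

How to treat the `ambiguous` set is a policy choice that the graph alone does not settle.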
-----Original Message----- From: Federico Leva (Nemo) [mailto:nemowiki@gmail.com] Sent: Saturday, February 23, 2013 1:17 PM To: Robert Crowe Cc: xmldatadumps-l@lists.wikimedia.org Subject: Re: [Xmldatadumps-l] Housekeeping categories?
Robert Crowe, 23/02/2013 21:58:
I tried finding all the subcategories of Category:Wikipedia_administration, but unfortunately that includes many non-administration categories also.
Will administration categories be limited to those that contain only:
- Categories
- Files
- Talk pages
Or are there other defining characteristics?
There are. I doubt there's a precise definition, though.
Or are administration categories simply any category that is not a descendant of Category:Articles?
In theory, yes. But the category tree is not even a tree, so all sorts of things can happen...
Nemo
On article pages, the only non-hidden administrative categories should be stub categories and "Living people", over which there is a dispute.
On 23/02/2013 21:31, Robert Crowe wrote:
So if I start at Category:Articles and recurse "downward" into all category members, will it include all pages that are not administration pages? Or do I have to also recurse "upward", keeping in mind that it's a directed graph and not a tree?
Thanks,
Robert
Xmldatadumps-l mailing list Xmldatadumps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l