[WikiEN-l] Category analysis with perl
Anthony DiPierro
wikilegal at inbox.org
Mon Jun 5 10:45:32 UTC 2006
On 6/5/06, Steve Bennett <stevagewp at gmail.com> wrote:
> On 6/5/06, Anthony DiPierro <wikilegal at inbox.org> wrote:
> > You could always put "See also: [[:Category:The Beatles]]" (I think
> > that's the syntax) in the description for [[Category:British rock
> > bands]].
>
> That's probably not bad. The page would have a basic structure like:
>
> Description
> Related categories <-- new
> Subcategories
> Articles in this category
>
> People *want* to put "related categories", but they break the category
> system if they make them subcats. We need to channel that desire.
>
> Steve
While playing around with perl and [[Category:Airports]] last night, I
noticed an instance of what will probably be another common case:
[[Category:Lists of airports]] is a subcategory. There are probably
enough instances of that sort of thing that we would have to make an
exception (in the interest of consensus), but the cleaner solution
would be to go with something like the related categories idea above.
I should also note that [[Category:Airports]] itself is treated as a
theme for articles, while its subcategories are treated as attributes.
(Actually, now that I look at it directly on Wikipedia, I see that
"Airport lounges" and "Airport operators" are also subcategories. I
didn't notice that in my original, very fast skim of the tree, but
looking back, it *was* there.)
If anyone wants to take a look at my "tree" for airports, or at my perl
script (which is really simplistic), let me know where I can upload
them. To run the script you need to download and import two MySQL
database dumps, enwiki-20060518-categorylinks.sql (290 megs) and
enwiki-20060518-page.sql (354 megs). The "tree" looks like this:
Airports_by_country
*Airports_in_Afghanistan
*#Bagram_Air_Base
*#Kabul_International_Airport
*Airports_in_Albania
*#Rinas_Mother_Teresa_Airport
[...]
*Airports_in_Australia
**Royal_Australian_Air_Force_bases
***Former_RAAF_Bases
***#RAAF_Station_Archerfield
***#RAAF_Station_Bairnsdale
***#RAAF_Base_Rathmines
***#RAAF_Station_Tocumwal
**#RAAF_Base_Amberley
**#RAAF_Bare_Bases
Are air force bases airports? I'd say so.
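The script itself isn't posted here, so what follows is only a rough sketch of how a tree in that format could be produced. It is not the script described above, and it's in Python rather than perl. It assumes the dumps have been loaded into tables with MediaWiki's `page` (`page_id`, `page_namespace`, `page_title`) and `categorylinks` (`cl_from`, `cl_to`) columns; an in-memory SQLite database with a few hypothetical rows stands in for the real MySQL import. Category pages live in namespace 14, which is how subcategories are told apart from articles. The names `build_db`, `members` and `tree_lines` are made up for illustration.

```python
import sqlite3

CATEGORY_NS = 14  # MediaWiki namespace number for Category: pages
ARTICLE_NS = 0    # main (article) namespace


def build_db():
    """Build a tiny in-memory stand-in for the imported page/categorylinks tables.

    Only the columns used below are modelled, and the rows are hypothetical
    sample data, not rows from the real enwiki dumps.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY,"
               " page_namespace INTEGER, page_title TEXT)")
    db.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
    db.executemany("INSERT INTO page VALUES (?, ?, ?)", [
        (1, CATEGORY_NS, "Airports_in_Afghanistan"),
        (2, ARTICLE_NS, "Bagram_Air_Base"),
        (3, ARTICLE_NS, "Kabul_International_Airport"),
        (4, CATEGORY_NS, "Airports_in_Albania"),
    ])
    # cl_from is the member page's id; cl_to is the category title without
    # the "Category:" prefix.
    db.executemany("INSERT INTO categorylinks VALUES (?, ?)", [
        (1, "Airports_by_country"),
        (4, "Airports_by_country"),
        (2, "Airports_in_Afghanistan"),
        (3, "Airports_in_Afghanistan"),
    ])
    return db


def members(db, category):
    """Return (subcategories, articles) of a category, each sorted by title.

    Members in other namespaces (images, templates, ...) are ignored.
    """
    rows = db.execute(
        "SELECT p.page_namespace, p.page_title"
        " FROM categorylinks cl JOIN page p ON p.page_id = cl.cl_from"
        " WHERE cl.cl_to = ? ORDER BY p.page_title",
        (category,),
    ).fetchall()
    subcats = [title for ns, title in rows if ns == CATEGORY_NS]
    articles = [title for ns, title in rows if ns == ARTICLE_NS]
    return subcats, articles


def tree_lines(db, category, depth=0, max_depth=10):
    """Render a category's contents in the '*' / '#' style of the tree.

    Each subcategory is printed with depth+1 stars and expanded before the
    category's own articles, which get depth stars plus '#'. A subcategory
    at depth max_depth is listed but not expanded.
    """
    stars = "*" * depth
    lines = []
    subcats, articles = members(db, category)
    for sub in subcats:
        lines.append(stars + "*" + sub)
        if depth + 1 < max_depth:
            lines.extend(tree_lines(db, sub, depth + 1, max_depth))
    lines.extend(stars + "#" + title for title in articles)
    return lines


if __name__ == "__main__":
    db = build_db()
    print("Airports_by_country")
    print("\n".join(tree_lines(db, "Airports_by_country")))
```

Ordering by page_title is just a way to keep the output stable in this sketch; the live category pages may order members differently.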
I also tried making a tree for [[Category:Buildings and structures]]
(a parent category of Airports). That one grew quite messy, and my
script bailed out at its limit of 10 levels of recursion, so I haven't
really analysed it much. I'll try raising the limit to 20 or 25 and see
what happens.
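On the recursion limit: a depth cap alone can't tell a genuinely deep hierarchy from a loop, where a category ends up (directly or indirectly) as its own subcategory, and nothing in the categorylinks table itself rules such loops out. One possible tweak, sketched in Python under the assumption that the subcategory links have already been loaded into a dict (the `walk` function and the sample graph are hypothetical), is to track the categories on the current path and flag a repeat as a cycle instead of descending into it.

```python
from typing import Dict, FrozenSet, Iterator, List, Tuple


def walk(graph: Dict[str, List[str]], root: str,
         max_depth: int) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, subcategory, status) for every subcategory reached.

    graph maps a category title to the titles of its subcategories.
    status is "ok" (listed and expanded), "cycle" (already an ancestor on
    the current path, so not expanded again) or "depth-limit" (has
    subcategories of its own but sits at max_depth, so not expanded).
    """
    def visit(cat: str, depth: int,
              ancestors: FrozenSet[str]) -> Iterator[Tuple[int, str, str]]:
        for sub in graph.get(cat, []):
            if sub in ancestors:
                yield depth + 1, sub, "cycle"
            elif depth + 1 >= max_depth and graph.get(sub):
                yield depth + 1, sub, "depth-limit"
            else:
                yield depth + 1, sub, "ok"
                yield from visit(sub, depth + 1, ancestors | {sub})

    yield from visit(root, 0, frozenset({root}))


if __name__ == "__main__":
    # Hypothetical graph: A -> B -> C -> A is a loop, D is a leaf.
    graph = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
    for depth, cat, status in walk(graph, "A", max_depth=10):
        print("*" * depth + cat, status)
```

Because only the current path is tracked, a category reachable through two different parents still shows up under both, which is what you'd want in a tree view; the check only stops a branch from looping back on itself.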
Anthony