On 9/25/06, Aerik Sylvan aerik@thesylvans.com wrote: [snip]
I got several sets of results in under 1 second (the lowest time being .3 seconds), one query returned in 8 seconds and another in 36 seconds.
[snip]
.. 8 seconds... 36 seconds... ?
Why do we keep shooting ourselves in the head with a toy database?
Here is whats possible with postgresql (this is all hot cache, but it would all be hot cache on production too.. the page table just isn't that big)...
First some setup:
wikidb=> alter table page add column cats text[]; ALTER TABLE wikidb=> update page set cats = array(select cl_to from categorylinks where cl_from=page_id); UPDATE create index page_cat_idx on page using gin(cats); CREATE INDEX
Okay, lets see what this baby can do:
wikidb=> explain analyze select page_title,page_namespace,cats from page where cats @> '{Living_people,ABBA_members}'; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on page (cost=104.37..17426.81 rows=4760 width=119) (actual time=11.900..11.912 rows=4 loops=1) Recheck Cond: (cats @> '{Living_people,ABBA_members}'::pg_catalog.text[]) -> Bitmap Index Scan on page_cat_idx (cost=0.00..104.37 rows=4760 width=0) (actual time=11.886..11.886 rows=4 loops=1) Index Cond: (cats @> '{Living_people,ABBA_members}'::pg_catalog.text[]) Total runtime: 11.973 ms (5 rows)
11.973ms, cool, and the results:
wikidb=> select page_title,page_namespace,cats from page where cats @> '{Living_people,ABBA_members}'; page_title | page_namespace | cats --------------------+----------------+------------------------------------------------------------------------------------------------------------------------------------------------------- Benny_Andersson | 0 | {1946_births,ABBA_members,Living_people,Musical_theatre_composers,Pop_pianists,Swedish_pianists,Swedish_songwriters} Björn_Ulvaeus | 0 | {1945_births,ABBA_members,Humanists,Living_people,Musical_theatre_composers,People_from_Gothenburg,Swedish_songwriters} Anni-Frid_Lyngstad | 0 | {1945_births,ABBA_members,Anni-Frid_Lyngstad,German_nobility,Living_people,Norwegian_musicians,Princesses,Swedish_female_singers,Swedish_pop_singers} Agnetha_Fältskog | 0 | {1950_births,ABBA_members,Agnetha_Fältskog,Living_people,People_from_Jönköping,Swedish_female_singers,Swedish_pop_singers} (4 rows)
Lets be mean:
wikidb=> select count(1) from page where cats @> '{Living_people}'; count -------- 112153 (1 row) Total runtime: 1156.697 ms
And mean some more:
wikidb=> select count(1) from page where cats @> '{GFDL_images}'; count -------- 100803 (1 row) Total runtime: 987.029 ms
If we tried the following in mysql it might take hours:
wikidb=> select count(1) from page where cats @> '{Living_people,GFDL_images}'; count ------- 0 (1 row) Total runtime: 25.201 ms
Hmm... 25ms < hours.
Enough people...
wikidb=> select count(1) from page where cats @> '{Unprintworthy_redirects}'; count ------- 14602 (1 row) Total runtime: 69.335 ms wikidb=> select count(1) from page where cats @> '{Unprintworthy_redirects,Redirects_from_plurals}'; count ------- 2415 (1 row) Total runtime: 12.002 ms
Hmm, what redirects from plurs are print worthy? :)
wikidb=> select page_namespace,page_title from page where cats @> '{Redirects_from_plurals}' and not cats @> '{Unprintworthy_redirects}' ; page_namespace | page_title ----------------+--------------------------------------------- 0 | Guide_dogs 4 | WikiProject_Red_Link_Recovery/Pluralisation 0 | Assistance_dogs 0 | Hearing_dogs (4 rows) Total runtime: 3.867 ms
Okay, how about some useful trivia:
wikidb=> select page_title from page where cats @> '{1982_deaths,American_novelists}'; page_title ------------------------- Ayn_Rand Philip_K._Dick Djuna_Barnes John_Cheever John_Gardner William_R._Burnett William_P._McGivern Theresa_Hak_Kyung_Cha Margaret_Culkin_Banning Beirne_Lay,_Jr. (10 rows) Total runtime: 0.417 ms
Wow, realistic queries go quick ... I bet PHP takes longer to *start* than that took to complete..
wikidb=> select page_title from page where cats @> '{American_philosophers,American_physicists}'; page_title ------------ (0 rows) Total runtime: 0.205 ms
So... submilisecond on normal queries.. Worst case at a second.. Cases which are evil for inner join are no issue..
Hmmmmm.......
How else do little choices like our database platform hold us back?