On 9/25/06, Aerik Sylvan <aerik(a)thesylvans.com> wrote:
[snip]
I got several sets of results in under 1 second (the
lowest time being
.3 seconds), one query returned in 8 seconds and another in 36
seconds.
[snip]
..
8 seconds...
36 seconds...
?
Why do we keep shooting ourselves in the head with a toy database?
Here is whats possible with postgresql (this is all hot cache, but it
would all be hot cache on production too.. the page table just isn't
that big)...
First some setup:
wikidb=> alter table page add column cats text[];
ALTER TABLE
wikidb=> update page set cats = array(select cl_to from categorylinks
where cl_from=page_id);
UPDATE
create index page_cat_idx on page using gin(cats);
CREATE INDEX
Okay, lets see what this baby can do:
wikidb=> explain analyze select page_title,page_namespace,cats from
page where cats @> '{Living_people,ABBA_members}';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on page (cost=104.37..17426.81 rows=4760 width=119)
(actual time=11.900..11.912 rows=4 loops=1)
Recheck Cond: (cats @> '{Living_people,ABBA_members}'::pg_catalog.text[])
-> Bitmap Index Scan on page_cat_idx (cost=0.00..104.37 rows=4760
width=0) (actual time=11.886..11.886 rows=4 loops=1)
Index Cond: (cats @>
'{Living_people,ABBA_members}'::pg_catalog.text[])
Total runtime: 11.973 ms
(5 rows)
11.973ms, cool, and the results:
wikidb=> select page_title,page_namespace,cats from page where cats @>
'{Living_people,ABBA_members}';
page_title | page_namespace |
cats
--------------------+----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------
Benny_Andersson | 0 |
{1946_births,ABBA_members,Living_people,Musical_theatre_composers,Pop_pianists,Swedish_pianists,Swedish_songwriters}
Björn_Ulvaeus | 0 |
{1945_births,ABBA_members,Humanists,Living_people,Musical_theatre_composers,People_from_Gothenburg,Swedish_songwriters}
Anni-Frid_Lyngstad | 0 |
{1945_births,ABBA_members,Anni-Frid_Lyngstad,German_nobility,Living_people,Norwegian_musicians,Princesses,Swedish_female_singers,Swedish_pop_singers}
Agnetha_Fältskog | 0 |
{1950_births,ABBA_members,Agnetha_Fältskog,Living_people,People_from_Jönköping,Swedish_female_singers,Swedish_pop_singers}
(4 rows)
Lets be mean:
wikidb=> select count(1) from page where cats @> '{Living_people}';
count
--------
112153
(1 row)
Total runtime: 1156.697 ms
And mean some more:
wikidb=> select count(1) from page where cats @> '{GFDL_images}';
count
--------
100803
(1 row)
Total runtime: 987.029 ms
If we tried the following in mysql it might take hours:
wikidb=> select count(1) from page where cats @>
'{Living_people,GFDL_images}';
count
-------
0
(1 row)
Total runtime: 25.201 ms
Hmm... 25ms < hours.
Enough people...
wikidb=> select count(1) from page where cats @>
'{Unprintworthy_redirects}';
count
-------
14602
(1 row) Total runtime: 69.335 ms
wikidb=> select count(1) from page where cats @>
'{Unprintworthy_redirects,Redirects_from_plurals}';
count
-------
2415
(1 row)
Total runtime: 12.002 ms
Hmm, what redirects from plurs are print worthy? :)
wikidb=> select page_namespace,page_title from page where cats @>
'{Redirects_from_plurals}' and not cats @> '{Unprintworthy_redirects}'
;
page_namespace | page_title
----------------+---------------------------------------------
0 | Guide_dogs
4 | WikiProject_Red_Link_Recovery/Pluralisation
0 | Assistance_dogs
0 | Hearing_dogs
(4 rows)
Total runtime: 3.867 ms
Okay, how about some useful trivia:
wikidb=> select page_title from page where cats @>
'{1982_deaths,American_novelists}';
page_title
-------------------------
Ayn_Rand
Philip_K._Dick
Djuna_Barnes
John_Cheever
John_Gardner
William_R._Burnett
William_P._McGivern
Theresa_Hak_Kyung_Cha
Margaret_Culkin_Banning
Beirne_Lay,_Jr.
(10 rows)
Total runtime: 0.417 ms
Wow, realistic queries go quick ... I bet PHP takes longer to *start*
than that took to complete..
wikidb=> select page_title from page where cats @>
'{American_philosophers,American_physicists}';
page_title
------------
(0 rows)
Total runtime: 0.205 ms
So...
submilisecond on normal queries.. Worst case at a second.. Cases which
are evil for inner join are no issue..
Hmmmmm.......
How else do little choices like our database platform hold us back?