Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

25 Nov 2009

Hi!

...
  Please read my comment over again: "I can't imagine this is a query you want to run over and over again. If it is, you'd probably want to use partitioning."
Which would make sense if no other queries are being run :)
With PG, though, you can define an index on a smaller subset of rows (a partial index), which may be better than partitioning.
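A partial index stores only the rows that match a WHERE predicate, so it can be much smaller than an index over the whole table. Below is a minimal sketch. It uses SQLite through Python's standard sqlite3 module as a stand-in for PostgreSQL, since both accept `CREATE INDEX ... WHERE`. The table, columns, and index name are hypothetical.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN output joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


def build_demo() -> sqlite3.Connection:
    """Build an in-memory table with a partial index (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revision ("
        " rev_id INTEGER PRIMARY KEY,"
        " rev_page INTEGER NOT NULL,"
        " rev_minor INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO revision (rev_page, rev_minor) VALUES (?, ?)",
        [(i % 100, int(i % 10 == 0)) for i in range(1000)],
    )
    # Partial index: only rows with rev_minor = 0 are indexed.
    conn.execute(
        "CREATE INDEX rev_major_page ON revision (rev_page) WHERE rev_minor = 0"
    )
    return conn


conn = build_demo()
# This query repeats the index predicate, so the partial index is usable.
print(query_plan(conn, "SELECT rev_id FROM revision WHERE rev_page = ? AND rev_minor = 0", (42,)))
# Without the predicate, the planner cannot use the partial index.
print(query_plan(conn, "SELECT rev_id FROM revision WHERE rev_page = ?", (42,)))
```

The planner can use a partial index only when the query's WHERE clause implies the index predicate. That is why the second query falls back to scanning the table.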

...
  The word "DELETE" does not appear anywhere on that page I referred to. The examples on the page are all SELECTs. Try again.
Argh, damn terminology; I was thinking about partition drops.
Anyway, those SELECTs are 'faster' if they hit the partition key, but
then again, people usually use PKs as their partition keys, so it doesn't
really matter :-)

Partitions will make queries faster for people who don't have indexes
(that is actually the major use case for people doing data warehousing).

...
  I suspect you either know the answers to these questions or can easily look them up.
Oh well, PG added collation support to CREATE DATABASE in 8.4, and
those collations still rely on the operating system's ones, which aren't
exactly perfect (how many applications actually use system collations?).

...
   Is there a particular problem you're having with them which is unsuitable for Wikipedia?
*shrug* Wrong native collations? Not using locale-specific character
rules in unique matching (haha, I could use this argument the other way
around when talking about MySQL support :), etc.

...
   Does Wikipedia not use a separate database for each language?
In PG terminology, that would be a 'separate schema', which doesn't really
support separate charsets/collations. Of course, using separate
DBs/instances is what we do now.

...
  Sorry, I can't reproduce your error: 
Because you didn't read what I wrote. I wrote that I was using a
language-specific collation :) A generic collation will also fail on other
characters (e.g. š will be mapped to s, when it should be treated as a
separate letter).
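The uniqueness problem with a generic, accent-folding collation can be shown in a few lines. Below is a minimal sketch, assuming SQLite via Python's sqlite3 module with a collation function registered from Python. This is not PostgreSQL's or MySQL's collation machinery. The folding rule (decompose, then drop combining marks) is a hypothetical stand-in for a "generic" collation, and the table name is illustrative.

```python
import sqlite3
import unicodedata


def fold_accents(s: str) -> str:
    """Simplified 'generic' folding: NFD-decompose, then drop combining marks (š -> s)."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def generic_collation(a: str, b: str) -> int:
    # Accent-insensitive three-way comparison; hypothetical, not a real server collation.
    fa, fb = fold_accents(a), fold_accents(b)
    return (fa > fb) - (fa < fb)


def try_insert(collation: str, titles: list[str]) -> list[str]:
    """Insert titles into a UNIQUE column declared with `collation`; return the rejected ones."""
    conn = sqlite3.connect(":memory:")
    conn.create_collation("generic", generic_collation)
    # `collation` is only ever one of our fixed names, never user input.
    conn.execute(f"CREATE TABLE page (title TEXT COLLATE {collation} UNIQUE)")
    rejected = []
    for title in titles:
        try:
            conn.execute("INSERT INTO page (title) VALUES (?)", (title,))
        except sqlite3.IntegrityError:
            # The UNIQUE constraint compares values using the column's collation.
            rejected.append(title)
    conn.close()
    return rejected


print(try_insert("BINARY", ["suo", "šuo"]))   # -> []
print(try_insert("generic", ["suo", "šuo"]))  # -> ['šuo']
```

Binary comparison keeps the two titles distinct. However, it sorts š by byte value (after z), not where a language that treats š as a separate letter would place it. Fixing that needs a real language-specific collation.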

...
  I suspect operator error, but if you want to submit your bug to http://www.postgresql.org/support/submitbug I'm sure someone will go over it with you.
It was a collation error, not operator error. I just showed it to
illustrate my point: there's quite a lot of work involved in getting
collations to work (which usually means building locales yourself).
Do note that once you have indexes in place, any locale change is
really painful and requires a full database rebuild.
One of the reasons we're still 'binary' is that nobody really wants to own
the pain of maintaining charsets server-side. At our scale, it is a much
bigger project than most people see.
Of course, one may just choose to believe that there's a silver bullet
for everything.

Cheers,
Domas

P.S. Where is PG's replication? How does it deal with DDL? :)
P.P.S. Anyone running PG in production on a big website?
