On Wed, Nov 25, 2009 at 2:22 AM, Domas Mituzas <midom.lists@gmail.com> wrote:
Hi!
Please read my comment over again: "I can't imagine this is a query you want to run over and over again. If it is, you'd probably want to use partitioning."
Which would make sense if no other queries are being run :)
It'd make sense if most of your queries used one partition or the other, and not both. Kind of like Wikipedia's history/current tables, which are effectively partitioned, except that the partitioning is done in the PHP (and has to be reimplemented in any other language whose software touches the database) instead of in the database itself using rules and triggers (where it would be accessible to software written in any language).
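For what it's worth, here's a minimal sketch of what the in-database version could look like on the PostgreSQL of that era, using table inheritance plus a routing trigger. The table and column names are invented for illustration; this isn't MediaWiki's actual schema:

    -- Parent table plus two children; the CHECK constraints let the
    -- planner skip the irrelevant child when constraint_exclusion is on.
    CREATE TABLE revision (
        rev_id     integer NOT NULL,
        rev_page   integer NOT NULL,
        is_current boolean NOT NULL
    );
    CREATE TABLE revision_current (CHECK (is_current))     INHERITS (revision);
    CREATE TABLE revision_history (CHECK (NOT is_current)) INHERITS (revision);

    -- Route inserts on the parent into the appropriate child.
    CREATE FUNCTION revision_insert() RETURNS trigger AS $$
    BEGIN
        IF NEW.is_current THEN
            INSERT INTO revision_current VALUES (NEW.*);
        ELSE
            INSERT INTO revision_history VALUES (NEW.*);
        END IF;
        RETURN NULL;  -- the row has already been routed to a child
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER revision_insert_trg
        BEFORE INSERT ON revision
        FOR EACH ROW EXECUTE PROCEDURE revision_insert();

Any client, in any language, then just inserts into and selects from revision, and a query with WHERE is_current only scans revision_current.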
With PG, though, you can define an index on a smaller subset, which may be better than partitioning.
Not in this case. You want to physically move the data so you can access fewer pages, not just index it; PG doesn't move data just because you create an index on it.
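To make the distinction concrete, using the hypothetical table from the sketch above: a partial index such as

    CREATE INDEX revision_current_idx
        ON revision (rev_page) WHERE is_current;

makes index lookups on the current rows cheaper, but the heap pages underneath still interleave current and historical rows, so a query reading many matching rows pulls in just as many table pages as before. Only partitioning (or CLUSTER) actually segregates the rows on disk.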
The word "DELETE" does not appear anywhere on that page I referred to. The examples on the page are all SELECTs. Try again.
Argh, damn terminology, I was thinking about partition drops. Anyway, those SELECTs are 'faster' if they hit the partition key, but then people usually use PKs as their partition keys, so it doesn't really matter :-)
Reread my messages now that you realize you were confused the first time: "In MySQL, you could achieve the same thing through clustering, however." MySQL clusters on the primary key. This is great, or at least it would be great if it didn't mean MySQL locks the whole table every time you use any DDL. Still no "create index concurrently", right?
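(PostgreSQL has had CREATE INDEX CONCURRENTLY since 8.2. A sketch, again on the hypothetical table above:

    -- Build an index without blocking concurrent writes to the table:
    CREATE INDEX CONCURRENTLY rev_page_idx ON revision (rev_page);

    -- Physically reordering the heap along an index is a separate,
    -- explicit step, and this one *does* take an exclusive lock:
    CLUSTER revision USING rev_page_idx;

Unlike InnoDB's implicit primary-key clustering, CLUSTER is a one-shot reorganization; it isn't maintained as later rows are written.)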
Is there a particular problem you're having with them which is unsuitable for Wikipedia?
*shrug*, wrong native collations?
It's not PostgreSQL's fault if you've got buggy locales installed on your system.
Not using locale-specific character equality in unique matching (haha, I could use this argument in the opposite direction when talking about MySQL support :), etc.
Exactly my point: by default, PostgreSQL does precisely what Wikipedia wants with respect to uniqueness matching. You can still do it the other way by making an index on an expression, but in the case of Wikipedia, which is what I thought we were talking about, there's no need for that.
MySQL, on the other hand, doesn't give you the option. You either choose binary and lose the collation, or you drop binary and have to drop the unique key constraint, because collation-equal titles would then collide.
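To spell out the PostgreSQL side with a hypothetical table (not Wikipedia's real schema): sorting and uniqueness matching are decoupled, so you can have locale-aware ORDER BY and still choose how strict the unique constraint is:

    CREATE TABLE page (
        page_title text NOT NULL
    );

    -- Default behaviour: uniqueness on the exact string, while
    -- ORDER BY page_title still follows the database's locale.
    CREATE UNIQUE INDEX page_title_key ON page (page_title);

    -- The "other way": fold case in the uniqueness check (a rough
    -- approximation of collation-insensitive matching) via a unique
    -- index on an expression.
    CREATE UNIQUE INDEX page_title_ci_key ON page (lower(page_title));

(You'd create one or the other, not both; the point is that the choice exists.)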
Sorry, I can't reproduce your error:
Because you didn't read what I wrote. I wrote that I was using a language-specific collation :)
Maybe you were using a *broken* language-specific collation? Do you get the same error when you use a locale-aware sorting program, like GNU sort?
It was a collation error, not operator error. I just showed it to illustrate my point: there's quite some work involved in getting working collations (which usually involves building locales yourself). Do note that once you have indexes in place, any locale change is really painful and requires a full database rebuild. One of the reasons we're still 'binary' is that nobody really wants to own the pain of maintaining charsets server-side. It is a much bigger project than most people see, at our scale. Of course, one may just choose to believe that there's a silver bullet for everything.
The thing is, I never claimed there was a silver bullet. You asked for *any reason* to use PostgreSQL. I gave one. Just one, because that's all you asked for.
I'm sure you could find someone to "own the pain of maintaining charsets server-side". Has anyone asked Gerard M if he knows somebody? If not, the WMF should hire someone, because the comment in the bug submission is right: it's just embarrassing for an encyclopedia not to be able to sort things correctly. Just using a generic locale would be orders of magnitude better than using a binary collation, and you could keep the current hacks and kludges in place in each language until a proper locale for that language is written.