On Mon, Dec 09, 2002 at 11:46:30AM -0800, Jonathan Walther wrote:
On Mon, Dec 09, 2002 at 08:22:42PM +0100, Tomasz Wegrzanowski wrote:
We have done it twice, without much benefit. I'm all for the evolutionary way.
Yes, but we were limited by MySQL each time. Have you used Postgres or Oracle or Sybase before?
Yes.
I have a hard time expressing briefly how great SQL92 compliance is compared to the subset that MySQL supports. Using MySQL for this is walking the long way around the bay when you could paddle the Postgres canoe directly across it in a fraction of the time.
I'm more concerned about row-level locking, really. Could you try making the minimal number of changes necessary to get Wikipedia running on Postgres, so we can see how much better locking would help? Then we could make it even faster.
Most important current issues are:
- poor markup
- no media independence
- 15-or-so-pass parser instead of good hierarchical LALR parser
- slow database
- poor mirroring ability
- no dedicated offline client
I'm currently working on the first two (well, three).
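To make the "hierarchical parser" item concrete, here is a minimal sketch of a single-pass parser that builds a tree instead of running many substitution passes over the text. It is a toy: it uses recursive descent rather than LALR, it only knows the ''italic'' and '''bold''' quote markers, and the names `tokenize`, `parse` and `parse_wikitext` as well as the `(tag, children)` node shape are assumptions made up for this illustration, not anything from the existing script.

```python
import re

# Quote markers; the three-quote form is listed first so it wins over two quotes.
TOKEN = re.compile(r"('''|'')")


def tokenize(text):
    # Split into quote markers and plain text runs, dropping empty strings.
    return [t for t in TOKEN.split(text) if t]


def parse(tokens, pos=0, closer=None):
    """Parse tokens from pos until `closer` is seen; return (children, next_pos)."""
    children = []
    while pos < len(tokens):
        tok = tokens[pos]
        if tok == closer:
            # End of the span we are currently inside.
            return children, pos + 1
        if tok in ("'''", "''"):
            # A marker that is not our closer opens a nested span.
            tag = "bold" if tok == "'''" else "italic"
            inner, pos = parse(tokens, pos + 1, closer=tok)
            children.append((tag, inner))
        else:
            children.append(tok)
            pos += 1
    # Input ended: an unclosed span simply runs to the end of the text.
    return children, pos


def parse_wikitext(text):
    children, _ = parse(tokenize(text))
    return ("doc", children)


tree = parse_wikitext("a '''b ''c'' d''' e")
```

Nesting falls out of the recursion, which is the part a chain of independent regex passes tends to get wrong. A real grammar would also have to deal with links, lists, tables and badly nested markup, none of which is attempted here.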
I agree and I would love to see your thoughts on all of these issues, and how best to solve them. The last two items should be fairly simple, if we require users to install Apache on their local boxes.
Not really. SQL databases are hard to mirror - nothing close to rsync comes for free with them.
How can we improve markup? I too feel it might be lacking somehow.
Now I'm implementing <math> tags, see all these "TeX, version X" threads on wikitech-l for details.
What do you mean by media independence?
Mainly being able to export pure-HTML versions and plaintext dict versions.
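A rough sketch of what that could look like, assuming articles are first parsed into a tree: each output medium then becomes one small function over the same tree. The `(tag, children)` node shape, the `TREE` sample, the `HTML_TAGS` table and both function names are hypothetical, chosen only for this example.

```python
import html

# Hand-built sample tree; strings are text leaves, tuples are (tag, children).
TREE = ("doc", ["a ", ("bold", ["b ", ("italic", ["c"]), " d"]), " e"])

# Assumed mapping from node tags to HTML element names.
HTML_TAGS = {"doc": "p", "bold": "b", "italic": "i"}


def to_html(node):
    # Text leaves are escaped; inner nodes wrap their rendered children.
    if isinstance(node, str):
        return html.escape(node)
    tag, children = node
    inner = "".join(to_html(child) for child in children)
    name = HTML_TAGS[tag]
    return f"<{name}>{inner}</{name}>"


def to_plaintext(node):
    # The plaintext writer ignores markup nodes and keeps only the text.
    if isinstance(node, str):
        return node
    _, children = node
    return "".join(to_plaintext(child) for child in children)


print(to_html(TREE))       # <p>a <b>b <i>c</i> d</b> e</p>
print(to_plaintext(TREE))  # a b c d e
```

Adding another medium (a dict-format dump, say) would mean adding one more function of this kind, without touching the parser.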
Rewriting it in C won't help with any of these issues.
By using C, we can use lex and yacc to make an excellent LALR parser.
You would have a hard time, as C doesn't offer good text processing or symbolic tree processing.
texvc is in ocaml now.
Postgres will solve the slow database issue.
Moving the current script to Postgres is easy, modulo the FULLTEXT index. Better think about what to do with that one.
No, Postgres really does require a full redesign to reap the benefits.
Usually 80% of the benefit comes with 20% of the effort, provided of course that you start with the right 20%.
But how do you plan to implement searching in Postgres?
Oh, and think about how you would implement database mirroring, as this is what we lost most in the move from the filesystem database (rsync) to MySQL.
Replication (database mirroring) comes with Postgres; we have a choice of the dbmirror and dbbalancer contributed modules. We can choose synchronous or asynchronous; I recommend asynchronous for speed.
How do they compare to rsync?