Lars Aronsson wrote:
>>> The "Astronomer" page contains an extraordinary number of links,
>>> mostly to "year" pages. Each link on a page requires a database
>>> lookup (a quick one, but still a lookup). "Current events" had
If this is true, I think the code should be rewritten to use an in-core
hash table that contains (as keys) the names of all existing pages. It
would only be used to determine whether a [[link]] should be underlined
or a ?-link. This in-core data structure would not be shared between
threads of a PHP application (at least not if load balancing over
multiple hosts), but it could be recreated from the database (select
name from cur) every few minutes or so (and whenever a reload or post
is served). Pages that are created between these intervals will falsely
have ?-links for a short while, but that's no big deal.
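
A minimal sketch of that idea in PHP, assuming the old mysql_* API. The
function and variable names here are hypothetical, not actual Wikipedia
code, and note that keeping the array alive across requests would itself
need some shared-memory mechanism:

    $existingPages = array();   # keys are page titles, values are true
    $lastRefresh   = 0;

    # Rebuild the in-core table from the database.
    function refreshPageTable() {
        global $existingPages, $lastRefresh;
        $existingPages = array();
        $res = mysql_query("SELECT cur_title FROM cur");
        while ($row = mysql_fetch_row($res)) {
            $existingPages[$row[0]] = true;
        }
        $lastRefresh = time();
    }

    # Decide underline vs. ?-link without touching the database.
    function pageExists($title) {
        global $existingPages, $lastRefresh;
        if (time() - $lastRefresh > 300) {   # refresh every five minutes
            refreshPageTable();
        }
        return isset($existingPages[$title]);
    }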
Hmm, maybe.
Or hits would be read from the hash, but misses would force a read from
the database. Avoiding a roundtrip down to the database sounds like a
great saving to me.
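
A sketch of that hybrid, reusing the hypothetical table from above:
trust hash hits, but fall through to the database on a miss, so a
freshly created page loses its ?-link immediately:

    function pageExistsHybrid($title) {
        global $existingPages;
        if (isset($existingPages[$title])) {
            return true;   # hash hit: no database roundtrip
        }
        # Hash miss: the page may have been created since the last
        # refresh, so check the database before emitting a ?-link.
        $res = mysql_query("SELECT 1 FROM cur WHERE cur_title='"
                           . mysql_escape_string($title) . "'");
        return mysql_num_rows($res) > 0;
    }

Note that genuinely nonexistent pages still cost a roundtrip each, so a
page full of ?-links gains nothing from the hash.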
As Neil suggested, couldn't these checks be done in just one big query
rather than a lot of separate ones? I.e., something like:

SELECT cur_title
FROM cur
WHERE cur_title IN ('Scientist', 'Research', 'Astronomy', 'Astrophysics', [...])

or would that just get more complicated?
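
For illustration, the rendering code might batch things roughly like
this (hypothetical function names; the caller then checks
isset($found[$title]) per link, so the whole page costs one roundtrip):

    # $linkTitles: all [[link]] targets found while parsing the page.
    function batchPageExists($linkTitles) {
        if (count($linkTitles) == 0) {
            return array();   # avoid an empty, invalid IN ()
        }
        $quoted = array();
        foreach ($linkTitles as $t) {
            $quoted[] = "'" . mysql_escape_string($t) . "'";
        }
        $res = mysql_query("SELECT cur_title FROM cur WHERE cur_title IN ("
                           . implode(',', $quoted) . ")");
        $found = array();
        while ($row = mysql_fetch_row($res)) {
            $found[$row[0]] = true;
        }
        return $found;   # isset($found[$t]) => underline, else ?-link
    }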
I agree. Roll-your-own hash tables are hard to scale, and a bad idea
unless nothing else works. Keeping everything in SQL will make it much
easier to move to a 3-tier Wikipedia (with a separate database
server/server cluster and multiple page servers) at a later date.
I _think_ that the big-query approach should be _lots_ faster: for,
say, 300 links, it will incur only one lot of query-round-trip and
parsing overhead, instead of 300 lots.
Neil
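
A rough way to sanity-check that claim, as a hypothetical
micro-benchmark sketch (the titles are dummies and no timings are
claimed here):

    # Old-style microtime() returns "usec sec"; convert to milliseconds.
    function now_ms() {
        list($usec, $sec) = explode(' ', microtime());
        return ($sec + $usec) * 1000.0;
    }

    $titles = array();
    for ($i = 0; $i < 300; $i++) {
        $titles[] = "Page_$i";   # dummy link targets
    }

    $t0 = now_ms();
    foreach ($titles as $t) {
        mysql_query("SELECT cur_title FROM cur WHERE cur_title='$t'");
    }
    $t1 = now_ms();
    mysql_query("SELECT cur_title FROM cur WHERE cur_title IN ('"
                . implode("','", $titles) . "')");
    $t2 = now_ms();
    printf("300 single queries: %.1f ms; one IN() query: %.1f ms\n",
           $t1 - $t0, $t2 - $t1);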