Things seem more or less o.k. now. I'm running rebuildLinks.php -- it will be running
for several more hours, it seems.
Magnus Manske wrote:
> > -----Original Message-----
> > From: wikitech-l-admin(a)nupedia.com
> > [mailto:wikitech-l-admin@nupedia.com]On Behalf Of Jimmy Wales
> > Sent: Tuesday, May 14, 2002 7:23 PM
> > To: wikitech-l(a)nupedia.com
> > Subject: [Wikitech-l] Re: Link update script
> >
> >
> > I've switched to the CVS version and corrected a few bugs. But the
> > site is sluggish again, and load is creeping upward. I wonder if
> > getOtherNamespaces is the issue again?
>
> I don't think so - it should be fixed so it only checks for a single article
> (talk or the original topic) now. In fact, I cut the time for page rendering
> down by 20% compared to the patched version (without the talk links) of
> yours.
>
> Anyway, currently the 'pedia seems to run fast and stable. What does the top
> command say? Is the "really big process" bug back??
>
> Magnus
>
I've switched to the CVS version and corrected a few bugs. But the
site is sluggish again, and load is creeping upward. I wonder if
getOtherNamespaces is the issue again?
Magnus Manske wrote:
> When you've fixed the current version, please run the link update script. I
> just fixed the database so it will work properly (I hope;)
>
> The script is in the CVS, run it with
> php updateLinks.php
> (or whatever it is actually called)
>
> Attention: This will take several minutes to complete! You might consider
> taking the site offline for the time being.
>
> Magnus
>
Jason: The SQL for updating the cache is
UPDATE cur SET cur_cache="",cur_timestamp=cur_timestamp
as Brion mailed earlier. Please run that one.
Jimbo: The script that rebuilds the links is in the CVS. You probably checked
it out already. I think it is called "rebuildLinks.php" or something
similar...
This will run for several minutes! *Maybe* you should take the site offline for
that, so the database doesn't get confused between user edits and that
script.
What I said about the Recent Changes is about the header ("Track the most
recent changes..."), *not* the list of recent changes! It is a brief text,
but it is loaded quite often, and each time, the database is checked for all
those links in that text.
That's it for now,
Magnus
People are justifiably annoyed sometimes when we make technical
changes that affect them, without sufficient notice. Our defense is
valid in the current case, i.e. there was a crisis going on, and we
had to do *something*. But still, people are right to ask to be
notified.
It would be helpful if the Recent Changes text were editable. In the
old software, the text at the top of the Recent Changes page was just
another wiki text. We could (and did) edit it from time to time to
keep people updated on technical issues. This is a great place for
it, because that text is viewed by all serious contributors many times
a day.
--Jimbo
Less than two hours ago, I found the page
http://www.wikipedia.com/wiki/Sweden blank again. So I clicked Edit,
then Save, to restore its full contents. This has happened before.
See the page's History.
Is this "the caching bug"? How does it work? Wouldn't the obsolete
cached copy be destroyed the first time I did this? Why does the
blank page come back?
Is caching really necessary for performance now? When caching was
activated, I thought several other functions were disabled at the same
time. Do we know the impact on response times of not using caching?
--
Lars Aronsson
<lars(a)aronsson.se>
tel +46-70-7891609
http://aronsson.se/ http://elektrosmog.nu/ http://susning.nu/
I know wikitech-l is probably working on this, but I want to express my
dismay at the fact that the most recently saved version of my user page
(and lots of other pages) is something from back in February. That's kind
of scary.
Larry
Have a look at the history of [[Pim Fortuyn]]
http://www.wikipedia.com/wiki/Pim_Fortuyn&action=history
All versions are properly listed and stored correctly as far as I can tell.
However, starting with the version of Tuesday May 7, 8:59, the diffs
are screwed up. They do not compare to the next version, but to very
old versions.
I noticed that on the Talk page, Tim Shell explained one of his
edits to the main article early Tuesday morning, and I remember seeing
that edit (taking out the label "right wing"). This change however
does not show up on the main page's history page.
Does anybody have an idea what is going on here? Are similar things
happening on other pages?
Axel
I finished working on the speedup for now. Some of my test data (Athlon
1800):
ab -n100 -c8 (for the Biology article):
* uncached: ca. 11 requests/sec.
* cached: ca. 20 requests/sec.
This is about 20% faster compared to what I started with this morning.
The real time killer is the {{NUMBEROFARTICLES}} variable on the main page.
Its use increases the time per page by a factor of about 10! Only a change
of the database can help here.
Another thing that takes a lot of time is the header of the Recent Changes
page. I guess since all the links there have to be checked every time the
page is displayed, it increases the load on the database quite a bit. On my
(short) local Recent Changes page, the header text alone is responsible for
about 20% of the total rendering time. We should probably force the Recent
Changes page *not* to look for existing articles in the header text.
Everyone, please test the changes I made (especially the login procedure)
for bugs, so Jimbo can use the current version ASAP.
Magnus
On Sun, 2002-05-12 at 08:32, Jason Richey wrote:
> Brion L. VIBBER wrote:
> > No idea. Maybe it's the same reason that THERE IS A TABLE MISSING FROM
> > THE DATABASE? The "unlinked" table is still inaccessible... "Can't open
> > file: 'unlinked.MYD'. (errno: 145)"
> >
> > I most sincerely hope that there's a backup of the database more recent
> > than March 21.
>
> The file is, in fact, still there. I don't know how much that means
> if it is unreadable...
I don't know, but it still doesn't work. :(
The error message on every page load seems to have been suppressed for
now, but:
* Explicit SQL queries to the unlinked table give the already mentioned
error, "Can't open file: 'unlinked.MYD'. (errno: 145)".
* special:WantedPages comes up with nothing, and notes PHP errors on the
lines that deal with the query to the unlinked table.
* Doing a "Pages that link here" gives PHP errors on the lines that
deal with the request to the unlinked table, and thus no longer can give
results for pages that don't exist yet.
* Saving a new page may also have some problems in this vein, since it
should be checking the unlinked table for pages that linked to the newly
created page and thus need their caches cleared to mark the link as
existing. (in wikiPage::ensureExistence())
-- brion vibber (brion @ pobox.com)
(PS: I've noticed that replies to my messages lately seem to be coming
directly to me rather than to the list. Do I have a freak Reply-To
header that I'm not aware of or something?)
On Sun, 2002-05-12 at 08:17, Magnus Manske wrote:
> > > I suggest to add two new fields to the database (cur table)
> > > * cur_namespace (varchar) so articles can be distinguished by namespace
> > > without using the "LIKE" MySQL query (this was mentioned before)
> >
> > Should there be a separate title-without-namespace field as well, or
> > would cur_title now be that?
>
> I think a namespace field would do fine for now, as most of the slow queries
> look for the namespace.
Well, the statistics checks do I suppose, but the watchlist and
list-all-namespaces functions check for everything with the same
non-namespace portion. No?
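
To make the two kinds of query concrete, here is a minimal sketch in Python with an in-memory SQLite table standing in for MySQL. It assumes cur_title would hold the title without its namespace prefix (the open question above); the table layout, the index, the sample rows, and both helper functions are hypothetical and only illustrate that neither query needs LIKE once the namespace has its own column.

```python
import sqlite3

# Hypothetical miniature of the cur table; the real schema has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cur ("
    " cur_namespace TEXT NOT NULL,"  # proposed new field, '' for the main namespace
    " cur_title TEXT NOT NULL,"      # assumed here to be the title WITHOUT namespace
    " cur_text TEXT)"
)
conn.execute("CREATE INDEX cur_ns_title ON cur (cur_namespace, cur_title)")
conn.executemany(
    "INSERT INTO cur VALUES (?, ?, ?)",
    [
        ("", "Sweden", "..."),
        ("Talk", "Sweden", "..."),
        ("User", "Sweden", "..."),
        ("", "Biology", "..."),
    ],
)


def count_in_namespace(namespace):
    """Statistics-style query: exact match on the namespace column."""
    row = conn.execute(
        "SELECT COUNT(*) FROM cur WHERE cur_namespace = ?", (namespace,)
    ).fetchone()
    return row[0]


def namespaces_for_title(title):
    """Watchlist-style query: every namespace holding a page with this title."""
    rows = conn.execute(
        "SELECT cur_namespace FROM cur WHERE cur_title = ? ORDER BY cur_namespace",
        (title,),
    ).fetchall()
    return [r[0] for r in rows]
```

If cur_title kept the full prefixed title instead, the second query would still need string matching, which is the concern raised above.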
> > > * cur_is_redirect (boolean) so #REDIRECT pages can be excluded likewise
> >
> > Not a bad idea! Might it be even easier to make it a string field which
> > contains the name of the page being redirected to? This would save
> > parsing the #REDIRECT [[x]] line or relying on the link tables when
> > doing mass checks in, eg, "links to this page".
>
> I agree. A boolean field might be faster, but not by much.
>
>
> Concerning the {{NUMBEROFARTICLES}} function, we should implement a
> cur_has_commas (boolean) field while we're at it. It seems to be the best
> way of telling articles apart from stubs. Or, we should name it
> cur_is_article, in case we come up with a better method some day.
Or, we could just update the value once a day. :)
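
A rough sketch of the "update once a day" idea, in Python for brevity. The class name, the compute callback, and the injectable clock are all hypothetical; the point is only that the expensive count runs at most once per interval and every other page view reuses the stored number.

```python
import time


class CachedArticleCount:
    """Keeps an expensive count and recomputes it at most once per interval."""

    def __init__(self, compute, max_age_seconds=86400, clock=time.time):
        self._compute = compute          # the slow query, supplied by the caller
        self._max_age = max_age_seconds  # one day by default
        self._clock = clock              # injectable so the logic can be tested
        self._value = None
        self._computed_at = None

    def get(self):
        now = self._clock()
        stale = (
            self._computed_at is None
            or now - self._computed_at >= self._max_age
        )
        if stale:
            self._value = self._compute()
            self._computed_at = now
        return self._value
```

The displayed number can lag behind by up to a day, which seems acceptable for a figure on the main page.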
Actually, that reminds me: should we add a timestamp field to indicate
when cur_cache was last filled? A simple check of that against
cur_timestamp would have prevented our present cache-of-ancient-
version-is-displayed-instead-of-recent-one troubles.
It would then be possible to expire caches on certain pages, such as
those containing {{variables}}.
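
A minimal sketch of that check, in Python for brevity. The field name cur_cache_timestamp, the dict standing in for a database row, and the render callback are assumptions for illustration; timestamps are assumed to be YYYYMMDDHHMMSS strings, which compare correctly as plain strings.

```python
def cache_is_fresh(cur_timestamp, cur_cache_timestamp):
    """The cache is usable only if it was filled at or after the last edit."""
    if cur_cache_timestamp is None:
        return False
    return cur_cache_timestamp >= cur_timestamp


def render_page(row, render, now):
    """Return cached HTML when fresh; otherwise re-render and refill the cache.

    row    -- dict standing in for one row of the cur table
    render -- function turning wiki text into HTML (hypothetical)
    now    -- current time as a YYYYMMDDHHMMSS string
    """
    if row.get("cur_cache") and cache_is_fresh(
        row["cur_timestamp"], row.get("cur_cache_timestamp")
    ):
        return row["cur_cache"]
    html = render(row["cur_text"])
    row["cur_cache"] = html
    row["cur_cache_timestamp"] = now  # proposed new field
    return html
```

With this in place, a cache left over from February would be rejected as soon as cur_timestamp showed a later edit, instead of being served in place of the current text.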
-- brion vibber (brion @ pobox.com)