We've had a couple of odd errors cropping up in the last couple of days:
* something tried to update a record in the 'recemtchanges' table
* something tried to run a 'peplaceinternallinks' function
These are both single-bit changes in strings, which is rather suspicious.
The 'recemtchanges' error was on srv68, as recorded in the dberrorlog a number
of times over a couple of days:
Sun Jan 8 16:56:02 UTC 2006 srv68 RecentChange::markPatrolled
10.0.0.101 1146 Table 'nlwiki.recemtchanges' doesn't exist (10.0.0.101)
UPDATE `recemtchanges` SET rc_patrolled = '1' WHERE rc_id = '2429501'
When I checked it, the source code looked correct on that system; so it could be
either a bit flip in main memory (bad!), or some mysterious corrupting bug
somewhere else along the way...
There are no hits in /var/log/mcelog on srv68, which _should_ have listed any
detected memory parity errors, allegedly.
I'm not sure where the 'peplaceinternallinks' one happened; I don't think
our PHP error log files were working properly, and somewhere along the line we
stopped patching PHP to include the hostname in error output to clients. But it
was reported multiple times until we restarted the web servers to clear whatever
held the bogus cached code in memory.
I've set PHP to log errors to the syslog now; these then go to the log server on
zwinger and so can be collected and listed per-server in the syslog there.
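For the record, the knobs involved are roughly the following; exact file
locations and the syslog selector vary a bit per box, so treat this as a
sketch:

    ; php.ini on the apaches: log errors, and send them to syslog
    log_errors = On
    error_log = syslog

    # /etc/syslog.conf: forward to the central log host
    *.*     @zwinger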
-- brion vibber (brion @ pobox.com)
Well, that attempt at forwarding didn't go too well, so here's the same
message from the NANOG list, quoted below, inline...
-- Neil
>
> Hello,
>
> In June 2005 LACNIC received two new IP blocks from IANA (189/8 and
> 190/8). Before starting to make allocations in them, we conducted
> some reachability tests with valuable help from this community.
>
> Those tests showed that the great majority of transit providers were
> not blocking those network ranges.
>
> But unfortunately it turned out that there are still some end-node
> networks which are blocking them.
>
> Contact has been attempted by the affected LACNIC members, by their
> upstreams and also by LACNIC itself since the end of November 2005,
> with not much success so far.
>
> So we kindly ask you all to revise your filters, and if the
> administrators of any of the following networks/address ranges could
> contact us offline, we would appreciate it:
>
> American Express
> Oracle
> Wikipedia
> HostDime.com, Inc
> American Airlines Inc.
> Gannett Co
> Adslzone.net
> 82.223.64.0/18
>
> Thanks
> Ricardo Patara
> RSG Manager
> --
> | LACNIC
> | http://LACNIC.NET
>
Hello,
I asked for the removal of messages containing {{ from the en.wikipedia skin
(Mediawiki:aboutsite, Mediawiki:copyright, Mediawiki:pagetitle),
and we gained 7% in performance. In these hard times, when we're again
somewhat overloaded, even small bits count.
If you're a sysop on a big wiki and notice a message that is used in the
skin and has {{ in it, do not hesitate to remove it. In the future we
might hack things so that the 10 (or more) largest wikis definitely won't
be able to use {{ in skin messages.
Every 1% saved results in multiple kilo$ in savings.. ;-)
Domas
I am from zh-wikipedia. There was a problem with the system on January 6th.
Since then, thousands of articles have not been included in their category
pages; each of these articles would have to be edited in order to show up in
the categories again, which would be a big task. Can anyone purge the data so
that these pages get into their category pages again? Thanks a lot.
ffaarr
I've been doing a lot of thinking lately about globals and their place in
MediaWiki in the long term. I rewrote globals.txt to reflect the fact that
PHP does not love globals, in fact the need for a declaration to bring
globals into the local scope puts it among the more global-hostile languages.
In many cases, use of globals obscures data flow and makes classes less
flexible, inhibiting reuse. This is patently true in the case of $wgTitle
and $wgArticle, the existence of which encourages lazy programmers to write
code which fails in the common case where more than one of these objects
exist. At present, these two objects are almost exclusively used in the
output phase, so it would make sense to make them members of OutputPage or
Skin instead of globals.
The most extreme anti-global architecture would be one involving application
objects:
$mw = new MediaWiki;
$mw->executeWebRequest();
The application object could theoretically be passed to most class
constructors, providing a form of global context. That, however, would make
writing new classes a bit tedious. In my experience, it turns out to be
easier to make the application object a global, and pull it in wherever it
is needed. This would have advantages when MediaWiki needs to be embedded as
a library, since it keeps the global scope cleaner, but it's not really more
flexible than what we're doing now.
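To make that concrete, a class needing context would just do something like
the following; this is only a sketch, and getOutput() is an invented accessor
on the application object:

    class SomeOutputThing {
        function render() {
            // Pull the application object in from the global scope rather
            // than receiving it through the constructor.
            global $mw;
            $out = $mw->getOutput();   // invented accessor, for illustration
            $out->addHTML( $this->mText );
        }
    }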
After some thinking, I was forced to admit that there are some cases where
globals make sense, from a data flow perspective. The clearest example is
caching. A cache should have the widest possible scope. If you have two
application objects, you would want them to share the same caches if
possible. Indeed, it's better if different threads, processes and even
servers can share their caches.
There are, however, disadvantages to using global variables for this or any
other similar purpose. The problem is that the use of global variables
inhibits lazy initialisation. The familiar solution is to use an accessor
function, and indeed this approach has already been implemented in several
places in MediaWiki. I would like to make such accessor functions more
pervasive.
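The shape of such an accessor is simple enough; the names here are invented
for illustration, it's the lazy-initialisation pattern that matters:

    function wfGetMainCache() {
        global $wgMainCache;              // invented global, for illustration
        if ( is_null( $wgMainCache ) ) {
            // Only pay the construction cost the first time something in
            // the request actually asks for the cache.
            $wgMainCache = new SomeCacheClass();
        }
        return $wgMainCache;
    }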
There is also the problem that the global namespace is somewhat crowded.
Using a global function for an accessor just moves this problem to somewhere
else. The alternative is to use a static class member as an accessor. This
concept is well known, and where the static object is the only one ever
needed, the object is called a singleton. The PHP 5 manual recommends
calling the accessor function singleton(), and I'll go along with that
despite personally preferring getInstance().
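In sketch form (not the actual code I'm converting, just the pattern), the
static-member accessor looks like this:

    class LinkCache {
        private static $instance;

        public static function singleton() {
            // Construct on first use; later calls return the same object.
            if ( !isset( self::$instance ) ) {
                self::$instance = new LinkCache;
            }
            return self::$instance;
        }
    }

    $cache = LinkCache::singleton();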
The disadvantage to the singleton pattern is that it requires the class name
to be hard-coded throughout the code base, removing some flexibility. We
could get around that by having base classes construct derived classes, if
you don't mind the dependency implications.
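For instance (again just a sketch, with an invented configuration variable),
the base class might pick the concrete subclass itself:

    class Cache {
        private static $instance;

        public static function singleton() {
            if ( !isset( self::$instance ) ) {
                // The base class picks the concrete subclass from
                // configuration, so call sites never hard-code it.
                global $wgCacheClass;     // invented setting, for illustration
                self::$instance = new $wgCacheClass;
            }
            return self::$instance;
        }
    }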
I'm currently working on converting $wgLinkCache to a singleton pattern, and
I also have a few other objects in my sights. But I still don't know exactly
how far we want to go with this. What do we want our long-term architecture
to be?
What should we do with the User class? $wgUser is used very heavily. If not
global, the scope of the object would have to be very wide. There are a few
applications for multiple user objects, but they don't really interfere with
the use of $wgUser elsewhere.
Another tricky case is configuration. There are about 300
configuration-related globals; it might be nice to encapsulate them purely
from a namespace perspective. We already have a SiteConfiguration object,
and on Wikimedia sites, this object has a configuration array which is
extracted into the global namespace. Should we just use it directly instead?
The conversion cost would obviously be high.
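Roughly speaking, the choice is between these two styles; the method names
here are illustrative rather than the exact SiteConfiguration interface:

    // Current arrangement: dump the per-wiki settings array into the
    // global namespace in one go, then use $wgSitename etc. as globals.
    extract( $wgConf->getAll( $wgDBname ) );

    // Using the object directly instead:
    $sitename = $wgConf->get( 'wgSitename', $wgDBname );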
There might also be some need for encapsulating configuration from a data
flow perspective. setupGlobals() in dumpHTML.inc could perhaps be made a bit
more elegant.
Should objects such as $wgUser and $wgConf be members of an application
object? Should the application object be global? Some other heavily-used
globals are $wgLang, $wgContLang, $wgOut and $wgParser. What should we do
with them?
We need to be guided by our applications, and choose the simplest
architecture which supports all of them. Are we interested in:
* Embedding? Need to avoid namespace pollution.
* Per-wiki daemons to do background tasks? Need a means for periodically
refreshing configuration and caches.
* A daemon that responds to requests for multiple wikis? Needs multiple
language objects, and a caching system which discriminates between different
wikis.
I'm interested in daemon (or servlet) applications because of the efficiency
implications.
-- Tim Starling
You should not be inheriting from SpecialPage in most situations. Keep
everything inside the reflectively called function and make sure your
Special Page's title is added to the message cache before you register
the Special Page. More discussion here: [[meta:Talk:Writing a new
special page]]
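In other words, the extension file should look roughly like this; the names
are placeholders, adjust them for your page:

    $wgExtensionFunctions[] = 'wfSetupMyPage';

    function wfSetupMyPage() {
        global $wgMessageCache;
        // The title goes into the message cache *before* the page is registered.
        $wgMessageCache->addMessage( 'mypage', 'My page' );
        SpecialPage::addPage( new SpecialPage( 'MyPage', '', true, 'wfSpecialMyPage' ) );
    }

    function wfSpecialMyPage( $par ) {
        global $wgOut;
        // All the actual work lives in this reflectively called function.
        $wgOut->addWikiText( 'Hello from [[Special:MyPage]].' );
    }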
--
Edward Z. Yang Personal: edwardzyang(a)thewritingpot.com
SN:Ambush Commander Website: http://www.thewritingpot.com/
GPGKey:0x869C48DA http://www.thewritingpot.com/gpgpubkey.asc
3FA8 E9A9 7385 B691 A6FC B3CB A933 BE7D 869C 48DA
Hi, regarding your Image-Batch-Upload question, I developed an extension
that allows MediaWiki to wire in already uploaded images in a local
folder. This allows you to use FTP to upload all your files, and then
hook them in without the HTTP interface. Alternatively, you can upload a
single tarball and decompress it, then upload the files (even faster but
a bit more clumsy). It's not very well-documented/tested code and it has
problems with wikis that have magic_quotes enabled and filenames with
single quotes in them (I have a patch that fixes that though), so I'm
not ready to release it yet, but if you're interested, send me an email
and I'll hook you up.
It should be *a lot* faster than using pywikipediabot, although if you
already got it to work, no sense beating a dead horse. I use the script
to upload particularly large files, where FTP serves me much better.
--
Edward Z. Yang Personal: edwardzyang(a)thewritingpot.com
SN:Ambush Commander Website: http://www.thewritingpot.com/
GPGKey:0x869C48DA http://www.thewritingpot.com/gpgpubkey.asc
3FA8 E9A9 7385 B691 A6FC B3CB A933 BE7D 869C 48DA
Edward,
I don't have the answer to your questions about making new pages
directly in the database, but I accomplished the same thing in a
SpecialPage I wrote by creating the content and a new Article from
within the SpecialPage code itself. The SpecialPage defines the form,
and posts back to itself. That way you are still in context with all
the globals you need to make new pages easily. My only difficulty
with this is that the $newArticle->insertNewArticle method ends by
telling the new Article to display itself, yanking you out of the
SpecialPage code.
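In outline it's something like this (simplified, with placeholder form field
names and text):

    function doSubmit() {
        global $wgRequest;
        // Build the new page from the posted form data.
        $title = Title::newFromText( $wgRequest->getText( 'pagename' ) );
        $article = new Article( $title );
        // insertNewArticle() saves the page -- and then shows it, which is
        // what pulls you out of the special page at the end.
        $article->insertNewArticle( $wgRequest->getText( 'pagetext' ),
            'Created from the special page', false /* minor */, false /* watch */ );
    }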
My SpecialPage is derived from BoardVote.php, which used to be
referenced as an example. You can find it here:
http://cvs.sourceforge.net/viewcvs.py/wikipedia/extensions/BoardVote/
BoardVote.php?rev=1.3&only_with_tag=MAIN&view=markup
John
Hi,
I wrote a little PHP object that takes
* a list of article IDs
* a list of categories
and returns those article IDs which belong to these categories *or their
subcategories*.
The algorithm is written to minimize the number of read queries when
parsing the category tree. The maximum number of read queries is the depth of
the "tallest" category tree in the set of articles (times two: one query for
the "parents", one for their page_ids), which IMHO should mostly be below 10
(I guess it's usually 5 or less, but I have no data to back that up).
So, if the tree depth for an article is 8, and I add more articles to
search for which all have a depth of 8 or less, no additional database
queries are necessary. The individual query will grow in size, though.
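To illustrate the idea (this is a simplified sketch, not the actual
categoryfinder code): $layer starts as the page_ids of the categories an
article sits in, $targets is the set of category names we're filtering on,
and each pass resolves a whole layer of parents with two queries. The real
thing also has to remember already-visited categories to survive category
loops.

    $dbr = wfGetDB( DB_SLAVE );
    $found = false;
    while ( count( $layer ) && !$found ) {
        // Query 1: parent category names of everything in this layer
        $res = $dbr->select( 'categorylinks', 'cl_to',
            array( 'cl_from' => $layer ), 'categoryfinder sketch' );
        $parents = array();
        while ( $row = $dbr->fetchObject( $res ) ) {
            if ( in_array( $row->cl_to, $targets ) ) {
                $found = true;  // the article is under one of the wanted categories
            }
            $parents[] = $row->cl_to;
        }
        // Query 2: turn those names into page_ids so the next pass can
        // look up *their* parents in turn
        $layer = array();
        if ( count( $parents ) && !$found ) {
            $res = $dbr->select( 'page', 'page_id',
                array( 'page_namespace' => NS_CATEGORY, 'page_title' => $parents ),
                'categoryfinder sketch' );
            while ( $row = $dbr->fetchObject( $res ) ) {
                $layer[] = $row->page_id;
            }
        }
    }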
I intend to use it on the Tasks feature search, which is why I put it
into that extension ("extensions" module, "Tasks/categoryfinder.php").
But I hope this can be used for other searches as well. The idea is,
instead of searching for, say, 25 articles, to internally search for
more (e.g. a few hundred), then filter that set through the
categoryfinder, until there are 25 matching articles.
Before I start implementing the actual search interface, can anyone tell
me if that would put too much stress on the DB slaves? Keep in mind that
limiting several searches to "articles in [[Category:Physics]] and its
subcategories" appears to be immensely useful, so it might be well worth
the DB stress from a user standpoint. (Damn you, users! ;-)
Magnus