The new server (Altus 1000E) that I'm loaning to Wikipedia (I had
ordered this a few weeks ago for Bomis) did not arrive on Friday, but
is scheduled to arrive today. Also, extra RAM for it is scheduled to
arrive today.
Assuming both arrive as scheduled, of course, Jason will be installing
both machines (geoffrin plus the new one) for Wikipedia to use later
this afternoon.
Geoffrin, as of my last conversation with Jason on Saturday, is still not 100% healthy.
It will run without errors with 3 gig of RAM, but not with 4 gig of
RAM. Jason has tried a number of different things, but what we're
most likely to do at some point is simply send it back to Penguin and
insist that they fix it or replace it or refund our money.
However, the warranty is for 3 years, so there's no particular hurry
on that front. What we might do in the meantime is populate it with 3
gig of RAM and install it for use. So this would give Wikipedia 2
dual opteron machines in working order, to see us through until the
new cluster arrives here and is installed. (Scheduled ship date is
late January, and presumably it'll take a week or so to get it
installed and working. We'd rather take *longer* to do that, so that
we can make sure it's right before switching, but of course if
Wikipedia is still limping at that time, we'll just do it.)
--Jimbo
Timwi wrote:
>Discussion should really be taking place on Talk pages. This has irked
>me for a long time; the Village Pump should really be on a Talk page, too.
For some pages, the whole point of the page is discussion. Its /subject/ is
discussion. In those cases, having discussion on that page is perfectly fine.
But even so, subpage functionality would really help to move that discussion
off the main Wikipedia pages onto subpages and talk pages. That would greatly
reduce edit conflicts and the huge sizes of certain pages.
-- Daniel Mayer (aka mav)
Hello,
I am new to wikitech-l, so I don't know what has already been said on the
topic of "single-sign-on" (yes, I know I could read the archive, and I tried,
but it is too long for today :-)...), but here are some thoughts from me...
If I've understood correctly, we have 3 main problems:
1. There can exist users on different lang-WPs with the same name.
2. There is no (good) way to find out automatically which accounts with the
same name on different lang-WPs belong to different persons and which
belong to the same person.
3. We currently have no good solution for the first problem, and the longer
we wait, the larger the problem gets.
So we should first think about solving the third problem. This should be very
easy, since we just have to create a new account in a new database for each
account in the old databases. Each account gets a new ID, keeps its old name,
and records which lang-WP it comes from and which ID belongs to it there (plus
all the other data belonging to the account).
If someone creates a new account, he/she has to take a name which is not yet
used in the new DB. A new account creates a new entry in the new database
without a pointer to a lang-WP and an old ID.
The lang-WPs then have to use the new user DB (and before that they have to be
migrated to it, meaning the IDs have to be changed accordingly).
Once this is done, the old user DBs can be thrown away. (But we still need the
rows with the pointers to the old lang-WPs and the corresponding old IDs in
the new database, as I will explain below.)
Thus we have the following situation: we have one single new user database,
where only old users are allowed to have accounts with the same name as other
users (or to have more than one account with the same name).
These users have one big problem: they have to use their new ID to log in, so
they have an incentive to resolve their name conflicts as described below.
Now we have everything we wanted, and only the small name-conflict problem
described above remains.
In the second step we have time to solve the problem with the names.
We could automatically send e-mails to all accounts which have no unique name
(if the addresses are known), telling them which (new) IDs belong to the old
accounts (by telling them which lang-WP they are from, we could even tell
them their old ID: ah, for this we still need that information!) and invite
them to merge the accounts which belong to them. (Of course this needs a
special form where you have to enter the new IDs and the old (and new)
password for each ID. Merging accounts could be a new feature, of course.)
We could even give them (and all others) the possibility to change their name
(which should also be a new feature), but of course only to unique names
which do not yet exist. This feature is somewhat special, since for several
reasons the old name should be reserved for a longer time, i.e. it should not
be possible for other persons to choose it during this time (but this is
another topic).
Thus the new user database can be cleaned up considerably. Users who do not
want to change their name to resolve a name conflict with other persons can
mark their account(s) as "under conflict". These may become the hard problems
later.
After a while, each account which is not marked as "under conflict" but still
has a name conflict with another account is changed automatically to a new
name (it gets an additional number at the end of it...).
This may also resolve some "under conflict" accounts, since all the other
accounts with the same name may have been renamed because they were not marked
"under conflict".
What to do with the rest can be discussed later...
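To make the two steps concrete, here is a minimal sketch in Python. All record layouts and function names are invented for illustration, not a proposed MediaWiki schema: per-wiki accounts are copied into one unified list, then duplicate names not marked "under conflict" get a numeric suffix.

```python
# Illustrative sketch only: record layouts and function names are
# invented, not a proposed MediaWiki schema.

def merge_accounts(old_dbs):
    """Copy every per-wiki account into one unified user list.

    old_dbs maps a wiki code to a list of (old_id, name) pairs.
    Each unified account remembers which wiki and old ID it came from."""
    unified = []
    new_id = 1
    for wiki, accounts in old_dbs.items():
        for old_id, name in accounts:
            unified.append({"id": new_id, "name": name, "wiki": wiki,
                            "old_id": old_id, "under_conflict": False})
            new_id += 1
    return unified

def resolve_conflicts(unified):
    """Auto-rename duplicate names not marked "under conflict" by
    appending a number. A real implementation would also have to check
    that the generated name does not itself collide with another one."""
    seen = {}
    for acct in unified:
        name = acct["name"]
        if name not in seen:
            seen[name] = 1
        elif not acct["under_conflict"]:
            seen[name] += 1
            acct["name"] = name + str(seen[name])
    return unified

dbs = {"de": [(1, "Alice"), (2, "Bob")], "en": [(7, "Alice")]}
accounts = resolve_conflicts(merge_accounts(dbs))
print([a["name"] for a in accounts])  # the en: Alice becomes Alice2
```

The interesting design question is exactly the one raised above: renaming is only safe for accounts that have not asserted a claim to the name, which is what the "under conflict" flag protects.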
Regards
Ivo Köthnig
Brion, et al.
As far as I understand, the watch list query has a date/time range
condition on the revisions table (which is VERY big). Most DB servers don't
use non-clustered indexes for range conditions. There are two ways to solve
this:
1. Change the revision table clustered index to timestamp (and primary key
to timestamp+ID). The ID should be kept only for timestamp collisions.
2. Store, in each user's watch list, the cut-off revision ID for each
article.
I think that #1 is the easiest and will give significant improvement.
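For readers following along, the query shape can be sketched with SQLite from Python. This is purely illustrative: the table and column names are invented, and SQLite's planner is not MySQL's (which is exactly the point of contention about non-clustered indexes and range conditions). EXPLAIN QUERY PLAN shows whether the timestamp range is served by an index:

```python
import sqlite3

# Invented schema, for illustration only: a revision table with a
# secondary index on the timestamp, queried with a date/time range.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (rev_id INTEGER, rev_timestamp TEXT)")
conn.execute("CREATE INDEX rev_ts ON revision (rev_timestamp)")
conn.executemany("INSERT INTO revision VALUES (?, ?)",
                 [(i, "20040115%06d" % i) for i in range(1000)])

# The plan tells us whether the range condition uses the rev_ts index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM revision "
    "WHERE rev_timestamp BETWEEN '20040115000100' AND '20040115000200'"
).fetchall()
print(plan)
```

Whether a given server walks the index or scans the table for such a range is an optimizer decision, which is why clustering the table on the timestamp (option #1) makes the range cheap regardless.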
Meir :->
-----Original Message-----
From: Brion Vibber [mailto:brion@pobox.com]
Sent: Thursday, January 15, 2004 8:13 PM
To: Mendelovich Meir
Cc: 'Peter Gervai'
Subject: Re: Wikitech-l Digest, Vol 6, Issue 45
On Jan 15, 2004, at 05:20, Mendelovich Meir wrote:
> Brion,
> Small question: What is the primary key of the revision history
> table? Is it the timestamp? Is it clustered?
The primary key is the revision ID number.
There are also indexes by namespace/title/timestamp and by timestamp
alone.
-- brion vibber (brion @ pobox.com)
On the nl: wikipedia we are discussing the introduction of some custom
{{msg:xxx}} templates. During that discussion it occurred to me that this
may be another namespace with two goals: internal messages for the
mediawiki software, plus templates for in the wikitext. If we make a new
message MediaWiki:weg on nl:, or MediaWiki:abc, who is to say that this
will not conflict with a future built-in message? Do we need two
separate namespaces for these features?
Regards,
Rob Hooft
--
Rob W.W. Hooft || rob(a)hooft.net || http://www.hooft.net/people/rob/
Could somebody enable subpages for the Wikipedia namespace on at least the
English Wikipedia? Some discussion-type pages like [[Wikipedia:Conflicts
between users]] would really benefit from that.
http://en.wikipedia.org/wiki/Wikipedia:Conflicts_between_users
-- Daniel Mayer (aka mav)
Brion has recently added code to store articles in the 'old' SQL table in
compressed format, so I will need to adjust the scripts for the
international stats.
I spent several hours on it, and despite some useful tips from Brion I
can't get those article data inflated; all I get is a Z_DATA_ERROR (-3).
Brion sent me a small sample of the articles in the fr: 'old' dump in
compressed raw format, without escape sequences and other fields, just
article data. Even this I could not tackle.
Brion wrote:
> Here's a zip file containing the raw bytes of compressed old_text from
> the first up to 100 columns in the table:
> http://leuksman.com/misc/raw.zip
> They do decompress with gzdeflate() in PHP.
Here is my test script
#!/usr/bin/perl
use CGI::Carp qw(fatalsToBrowser);
use Compress::Zlib;
$path = "raw/" ;
for ($i = 1 ; $i <= 100 ; $i++)
{ &ReadFile ($i) ; }
exit ;

sub ReadFile
{
  $file_in = $path . "old-" . $i . ".raw" ;
  open (FILE_IN, "<", $file_in)
    or die ("Input file " . $file_in . " could not be opened.") ;
  binmode FILE_IN ;

  local $/ = undef ;     # slurp the whole file at once; reading line by
  $article = <FILE_IN> ; # line and chomping would strip newline bytes
  close (FILE_IN) ;      # out of the binary data

  ($refinf, $status) = inflateInit() ;  # fresh stream for each file
  ($article2, $status) = $refinf->inflate ($article) ;
  if ($status == Z_OK) # Z_OK = 0
  { print "$i:OK: " . substr ($article2,0,50) . "\n" ; }
  else
  { print "$i:Unzip error: $status\n" ; } # Z_DATA_ERROR = -3
}
Can someone help me out with this?
I can deflate/inflate dummy texts, so libraries are all in place.
(I use ActivePerl 5.8, on Windows)
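One likely culprit (an educated guess, not a confirmed diagnosis): PHP's gzdeflate() emits a raw deflate stream without the two-byte zlib header, while Compress::Zlib's inflateInit() by default expects a zlib-wrapped stream, and so reports Z_DATA_ERROR. In Perl the fix would be inflateInit(-WindowBits => -MAX_WBITS). The same distinction, sketched in Python:

```python
import zlib

text = b"some article text " * 10

# PHP's gzdeflate() produces a raw deflate stream with no zlib header
# or checksum; a compressobj with negative window bits does the same.
co = zlib.compressobj(9, zlib.DEFLATED, -15)
raw = co.compress(text) + co.flush()

# A decompressor expecting the default zlib wrapper reports a data
# error, because the 2-byte header it looks for is not there.
try:
    zlib.decompress(raw)
except zlib.error as err:
    print("default inflate fails:", err)

# Telling the decompressor the stream is headerless (wbits = -15) works.
do = zlib.decompressobj(-15)
print(do.decompress(raw) + do.flush() == text)
```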
-------------------------------------------------------------
There is a second problem, possibly trivial once the problem above has
been solved (well, actually I hope the problem above is a trivial
oversight of mine too):
The SQL dump contains escape sequences:
A small section of fr: old dump, new style, that Brion sent me contains
\Z: 3541 times
\\: 3497
\": 3428
\n: 3598
\r: 3550
\0: 3190
\Z is not listed on http://www.mysql.com/doc/en/String_syntax.html
I could not find any other doc referring to it.
\z is listed, so maybe upper/lower makes no difference, but I doubt it.
Anyone encountered this before?
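For what it's worth, in MySQL's string syntax \Z stands for ASCII 26 (Ctrl-Z), which the dump escapes because that byte means end-of-file on Windows; upper and lower case are not interchangeable here. A small Python sketch (a hypothetical helper, not part of any existing tooling) that undoes the escape sequences counted above:

```python
# MySQL dump escape sequences and the bytes they stand for.
# \Z is ASCII 26 (Ctrl-Z), escaped because it means EOF on Windows.
MYSQL_ESCAPES = {
    b"\\0": b"\x00",
    b"\\n": b"\n",
    b"\\r": b"\r",
    b"\\Z": b"\x1a",
    b'\\"': b'"',
    b"\\'": b"'",
    b"\\\\": b"\\",
}

def unescape_mysql(data: bytes) -> bytes:
    """Replace each two-byte escape sequence, scanning left to right so
    that an escaped backslash is never re-interpreted."""
    out = bytearray()
    i = 0
    while i < len(data):
        pair = data[i:i + 2]
        if pair in MYSQL_ESCAPES:
            out += MYSQL_ESCAPES[pair]
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

print(unescape_mysql(rb"a\nb\Zc\\d"))
```

Note that the unescaping has to happen before inflating, since the compressed bytes themselves contain \n, \r, \0 and Ctrl-Z values that the dump had to escape.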
Thanks for any help.
Erik Zachte