We get more and more complaints from German Wikipedians about the unavailability of the German-language Wikipedia. So, what's the status here right now? The hardware that went down at Christmas time should be replaced by now?!? Or are we still running on that low-performing fail-over system, and if so, why?
Uli
On Jan 14, 2004, at 09:25, Ulrich Fuchs wrote:
We get more and more complaints from German Wikipedians about the unavailability of the German-language Wikipedia. So, what's the status here right now? The hardware that went down at Christmas time should be replaced by now?!?
It should have been, yeah.
The replacement motherboard apparently didn't solve everything though, and it's been sent back again.
Or are we still running on that low-performing fail-over system, and if so, why?
Yes.
-- brion vibber (brion @ pobox.com)
The replacement motherboard apparently didn't solve everything though, and it's been sent back again.
So when do you expect a new motherboard to arrive? These downtimes drive our contributors away, and are blocking us in getting new ones. I think this is really a serious issue. Since there are more than enough funds available now, I cannot understand why that much time is needed to get the system back to the status we had at the beginning of December. Sorry if that sounds a bit angry, but the German-language Wikipedia has been more or less unusable for three weeks now.
Uli
Ulrich Fuchs wrote:
The replacement motherboard apparently didn't solve everything though, and it's been sent back again.
So when do you expect a new motherboard to arrive? These downtimes drive our contributors away, and are blocking us in getting new ones. I think this is really a serious issue. Since there are more than enough funds available now, I cannot understand why that much time is needed to get the system back to the status we had at the beginning of December. Sorry if that sounds a bit angry, but the German-language Wikipedia has been more or less unusable for three weeks now.
Keep in mind that some of the December troubles might have been avoided if the new hardware had gotten more testing before being put online, so let's not be so impatient this time around. All the Wikipedias are slow, it's not just the German one.
Stan
In message 400588E3.1040805@apple.com, Stan Shebs shebs-2kanFRK1NckAvxtiuMwx3w@public.gmane.org writes
Ulrich Fuchs wrote:
The replacement motherboard apparently didn't solve everything though, and it's been sent back again.
So when do you expect a new motherboard to arrive? These downtimes drive our contributors away, and are blocking us in getting new ones. I think this is really a serious issue. Since there are more than enough funds available now, I cannot understand why that much time is needed to get the system back to the status we had at the beginning of December. Sorry if that sounds a bit angry, but the German-language Wikipedia has been more or less unusable for three weeks now.
Keep in mind that some of the December troubles might have been avoided if the new hardware had gotten more testing before being put online, so let's not be so impatient this time around. All the Wikipedias are slow, it's not just the German one.
Actually I'm finding the Welsh Wikipedia is running pretty well these last few days...
Keep in mind that some of the December troubles might have been avoided if the new hardware had gotten more testing before being put online, so let's not be so impatient this time around. All the Wikipedias are slow, it's not just the German one.
I don't think you have to do much hardware testing. Hardware is either working or it's not. Some errors may show up only after the hardware has been running for some weeks. OK. But if there's an error, hardware as simple as what we are running right now should be replaceable within days.
We need to overcome the current problems fast with a solution that isn't for eternity, but that works for now. It's a bad idea to wait for the new hardware, since *this* hardware and *setup* needs to be tested, probably software must be rewritten and so on. I do not believe the new server farm will be up and running smoothly before the end of February.
I've got the feeling that everyone right now is concerned with and waiting for the new hardware, and forgetting the one we have. As I said, we get complaints from contributors, we are losing contributors. Today we got a mail from the PhD assistant of a German university professor announcing he and his boss would like to write an article. What do I tell them? "Sure, we'd like to have it, but keep in mind to save it well on your hard disc, since the upload will probably fail. Or please wait until the end of February; probably we will have new hardware then, but probably we will not?" That's ridiculous. This guy will turn away faster than you can watch.
Uli
On Jan 14, 2004, at 11:00, Ulrich Fuchs wrote:
We need to overcome the current problems fast with a solution that isn't for eternity, but that works for now.
Ulrich, do you have any suggestions? If so, please stop dancing around and give them. I've been asking for help on wikitech-l for days and mostly getting useless diatribes about stuff we should get in the future.
-- brion vibber (brion @ pobox.com)
Am Mittwoch, 14. Januar 2004 20:04 schrieb Brion Vibber:
On Jan 14, 2004, at 11:00, Ulrich Fuchs wrote:
We need to overcome the current problems fast with a solution that isn't for eternity, but that works for now.
Ulrich, do you have any suggestions? If so, please stop dancing around and give them.
Brion, I do not "dance around" - you told me the new mainboard wasn't working, so you sent it back. OK. But my question was, when will the replacement arrive? That's a phone call to the supplier, and probably pressing the supplier a bit. If he can't assure you that he will deliver a new mainboard within, let's say, four days, go to another one. I do not know too much about the old configuration, but obviously it wasn't such high-tech stuff that it has months of order lead time.
As far as I can see, we had a working and well-performing solution in mid-December. Please bear with me for not reading wikitech; I've got too much to do on the editing side. But from my point of view, all that needs to be done is to buy exact replacements for the failed hardware we had in December. And I don't think that should need that much time, given the funds we (still) have.
Uli
Ulrich Fuchs wrote:
Brion, I do not "dance around" - you told me the new mainboard wasn't working, so you sent it back. OK. But my question was, when will the replacement arrive?
The replacement arrived, and Jason tested it, and it had the same RAM errors. The problem is mysterious, but anyhow we are at the point where we are likely going to send the machine back and have them fix it and return it. I haven't spoken to Jason this morning, so I don't know the exact status of that situation.
As far as I can see, we had a working and well-performing solution in mid-December. Please bear with me for not reading wikitech; I've got too much to do on the editing side. But from my point of view, all that needs to be done is to buy exact replacements for the failed hardware we had in December. And I don't think that should need that much time, given the funds we (still) have.
I'd like to invite you to join wikitech-l, because we're well aware of the things you're talking about and doing what we can. It doesn't make sense for us to buy a lot of new hardware at the existing facility when we are going to move. So, I'm loaning everything I've got available to Wikipedia just as fast as I can make it available.
--Jimbo
On Wed, Jan 14, 2004 at 11:35:57AM -0800, Jimmy Wales wrote:
... So, I'm loaning everything I've got available to Wikipedia just as fast as I can make it available.
...and still it's you to blame. Gawd, I like end users. :-)
Oh, by the way, THANKS for all the efforts and contributions you continuously make to keep things running. I'm sure most of us lurkers appreciate and value your and Brion's efforts without saying so on the list every time someone loudly complains.
Peter
The replacement arrived, and Jason tested it, and it had the same RAM errors. The problem is mysterious, but anyhow we are at the point where we are likely going to send the machine back and have them fix it and return it. I haven't spoken to Jason this morning, so I don't know the exact status of that situation.
What about that status today?
Uli
Ulrich Fuchs wrote:
The replacement arrived, and Jason tested it, and it had the same RAM errors. The problem is mysterious, but anyhow we are at the point where we are likely going to send the machine back and have them fix it and return it. I haven't spoken to Jason this morning, so I don't know the exact status of that situation.
What about that status today?
The new machine should arrive later today at Jason's house. He intends to do a burn-in on that one, including memtest86.
Geoffrin continues to give errors on some memory tests, in every different configuration that Jason tests. He has 2 motherboards and lots of sticks of RAM. But we are currently not 100% sure what the exact situation is. Here is what we know:
1. As shipped, Geoffrin gave errors in test #8 of memtest86. Jason has determined that this was due to the memory being in the wrong slots as shipped. This is Penguin's fault, of course, but on the other hand, the motherboard manual contains contradictory information, so it's easy to understand the error.
2. In various configurations consistent with the correct placement of memory, Geoffrin gives sporadic errors in test #11. These are listed as "ECC" errors, which Jason believes may mean that these are o.k., i.e. errors that were caught and fixed by the ECC stuff in the RAM. Or, it may mean something else. He was researching that yesterday afternoon.
3. Therefore, it is quite possible that Geoffrin will run just fine in the current configuration. Certainly, it no longer gives errors in test #8. It is also possible that Geoffrin will *not* run just fine in the current configuration. He's doing testing, but there's no way to be certain, really certain, until we see what happens.
4. As a result, we're still planning to loan this other machine to Wikipedia as a backup for Geoffrin. We do not know what time it will arrive at Jason's house. Also, this new machine has only 1 gig of RAM, and I'm purchasing more to go into it. There's some question as to whether we should wait until Monday when that RAM arrives before Jason drives to San Diego to install all this stuff, or if he should go today. (And going today is contingent on what time the new machine comes in, anyway.)
--Jimbo
On Fri, Jan 16, 2004 at 04:26:57AM -0800, Jimmy Wales wrote:
- In various configurations consistent with the correct placement of memory, Geoffrin gives sporadic errors in test #11. These are listed as "ECC" errors, which Jason believes may mean that these are o.k., i.e. errors that were caught and fixed by the ECC stuff in the RAM. Or, it may mean something else. He was researching that yesterday afternoon.
I would use these only if they're out of the guarantee period. Any errors mean it's not perfect, and you should change them if they're still covered by the seller. (Well, at least we do that in good old Europe, but maybe industry of the USA is not that advanced ;->.)
My $0.02.
grin
http://www.brianism.org/wikipedia.htm
(dated 20 January 2004!)
for the background, see http://en.wikipedia.org/wiki/Talk:Brianism
--Optim
On Jan 16, 2004, at 12:20 PM, Nikos-Optim wrote:
http://www.brianism.org/wikipedia.htm
(dated 20 January 2004!)
for the background, see http://en.wikipedia.org/wiki/Talk:Brianism
Why are you bringing this to the list? Not meant rudely, I just don't understand what's failing on-site, mostly because I haven't been following it.
Peter
--- Funding for this program comes from Borders without Doctors: The Bookstore Chain That Sounds Like a Charity. --Harry Shearer, Le Show
Peter Gervai wrote:
- In various configurations consistent with the correct placement of memory, Geoffrin gives sporadic errors in test #11. These are listed as "ECC" errors, which Jason believes may mean that these are o.k., i.e. errors that were caught and fixed by the ECC stuff in the RAM. Or, it may mean something else. He was researching that yesterday afternoon.
I would use these only if they're out of the guarantee period. Any errors mean it's not perfect, and you should change them if they're still covered by the seller. (Well, at least we do that in good old Europe, but maybe industry of the USA is not that advanced ;->.)
They are covered by warranty, and so our current plan is to return the entire machine to Penguin with the insistence that they get it 100% right before they return it to us. But this will have to wait for another week or so, i.e. until our new datacenter is up and running so that Geoffrin is not needed.
--Jimbo
Ulrich, I'm doing what I can. I have a new server on order for Bomis, a dual Opteron, that I'm going to loan to Wikipedia when it comes in. That server ships tomorrow, which means that Jason should have it on Friday. He'll install it soon after, either on the weekend if he feels like working on the weekend (ick) or Monday.
I've got the feeling that everyone right now is concerned with and waiting for the new hardware, and forgetting the one we have.
Not me! I am doing what I can to help out.
This server will be a dual Opteron 240, an Altus 1000E. Jason will stuff it with as much RAM as he can. I think it will make an enormous difference.
--Jimbo
wikipedia-l@lists.wikimedia.org