[Maps-l] Server admin procedures wrt ptolemy and ortelius

Marcin Cieslak saper at saper.info
Thu Sep 17 23:50:02 UTC 2009


Mark Bergsma wrote:
> Ævar Arnfjörð Bjarmason wrote:
> Please keep in mind that ptolemy and ortelius are meant to be WMF
> production boxes. That means they're (also) managed by the Wikimedia Ops
> team. I think that for the near future we're happy to let you play with
> the boxes and experiment with what OSM/integration software/architecture
> works best. But eventually, when these maps are integrated into our core
> web sites, the servers and software will need to be managed by WMF as
> well as you guys. Especially since you volunteers might lose interest in
> the long run... :)

I think being "production" is very good - we will be covered by
monitoring from the very beginning :-)

> That means:
> 
> - Please work with us; keep us informed

So far, the only changes on ptolemy:
- LOM firmware update by river
- Sun STK RAID INT firmware update from 5.2-0 (15825) to 5.2-0 (16732)
- Tool to manage STK RAID INT installed in /usr/sbin/arcconf (Version
6.10 (B17551) from the Intel website). I think that Adaptec's version
(Version 6.10 (B18359)) is a bit more informative. What are you using on
other servers?

- Installed as dependencies for arcconf:
libstdc++5
gcc-3.3-base
libgcc1
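
To make build numbers like the ones quoted above easy to compare across
servers, here is a minimal sketch in Python. It assumes the strings keep
the "version (build)" shape shown in this mail; the function names are
made up for illustration and are not part of arcconf or any Sun/Adaptec
tool.

```python
import re

# Matches strings such as "5.2-0 (16732)" or "6.10 (B17551)":
# a version label followed by a build number in parentheses.
_BUILD_RE = re.compile(r"^\s*(?P<version>\S+)\s*\(B?(?P<build>\d+)\)\s*$")


def parse_build(text):
    """Return (version_label, build_number) for a 'version (build)' string."""
    match = _BUILD_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return match.group("version"), int(match.group("build"))


def is_newer_build(candidate, installed):
    """True if both share a version label and candidate has a higher build."""
    cand_version, cand_build = parse_build(candidate)
    inst_version, inst_build = parse_build(installed)
    if cand_version != inst_version:
        # Hypothetical policy: refuse to guess an ordering between labels.
        raise ValueError("version labels differ; compare them by hand")
    return cand_build > inst_build
```

For example, is_newer_build("5.2-0 (16732)", "5.2-0 (15825)") compares
the two firmware builds mentioned above by build number only.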

> - Put documentation in our documentation wiki,
> http://wikitech.wikimedia.org. If you need access, please contact me and
> I'll get you set up.

Can you create accounts for Aude and myself (Saper)? Is Ævar there as well?

Looking at
http://wikitech.wikimedia.org/view/Platform-specific_documentation
how different is the Sun Fire X4250?

> - Logging of server actions can be done on #wikimedia-tech using the log
> bot. Just use "!log <message>" in the channel, it will work. Put the
> server-name in the line.

Cool, thanks.
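
So that we always remember the server name, a small helper could build
the line before it is pasted into the channel. This is only a sketch:
the "server: message" layout is my assumption (the only stated
requirement is that the server name appears in the line), and
format_log_line is a hypothetical name.

```python
def format_log_line(server, message):
    """Build a single-line '!log' entry with the server name up front."""
    server = server.strip()
    # Collapse newlines and repeated spaces so the entry stays on one line.
    message = " ".join(message.split())
    if not server or not message:
        raise ValueError("both server and message are required")
    # Assumed layout: "!log <server>: <message>"
    return f"!log {server}: {message}"
```

Calling format_log_line("ptolemy", "installed arcconf in /usr/sbin")
gives one line that can be pasted into #wikimedia-tech.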

> - If you have any problems/issues/needs related to managing the servers
> in general (RAID controller/driver issues?), as opposed to OSM software
> specific things, then certainly ask us! Chances are we've already solved
> it or have a certain way of doing things, and there is no need for you
> to reinvent the wheel. :)

Yes, here are my questions:

(1) It has been reported that the RAID controller has serious stability
problems (it causes kernel abends). I think this should be fixed by the
new firmware OR the new driver, see below.

(2) What are the kernel upgrade procedures on the WMF servers?

(3) What is the OS upgrade procedure on the WMF servers?

(4) /home/saper/raid/linux_x86_x64_driver_v1.1.5-2463 contains Linux
driver version 2463 for Sun STK RAID INT that we probably should be
running. I can do that given (2) above :)

(5) I asked on #ts-admins about management console access. That would
help with performing changes to the kernel and partitioning; see the
next points for what needs to be done from there.

(6) I think we should reconfigure RAID - for now, I would like to put
the current filesystem on a single RAID 1 pair of drives. It's root, so
I think this shouldn't be done from the running system. I think we can
disband the current RAID 10 setup for now; we will be testing one or two
possible RAID setups for Postgres as soon as we have space.

(7) I'd love to have the OS repartitioned - small /, large /usr,
mid-large /var, small /tmp in the traditional UNIX way. All of this on
the RAID 1 volume created in (6).

(8) It would be nice to have a different OS (FreeBSD or Solaris), but
I understand that you'd probably like to have a uniform setup across
WMF and I think I can live with it. It would be nice to have information
re (3) though if we stick to Ubuntu.

I think I could do (4)...(7) myself given access to the management
console and some way to netboot/CD-boot from there. This leads me to:

(9) I've seen this:
http://wikitech.wikimedia.org/view/Automated_installation
Do you have some kind of minimal netboot/recovery system to be invoked
from LOM to do stuff like total repartitioning?
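
To put rough numbers on the RAID change in (6): a RAID 1 pair gives the
usable space of one drive, and RAID 10 gives half of the raw capacity.
The sketch below just does that arithmetic. The drive count and size in
the usage note are placeholders, not ptolemy's actual configuration, and
the function names are hypothetical.

```python
def raid1_usable(drive_size_gb, drives=2):
    """RAID 1 mirrors every drive, so usable space equals one drive."""
    if drives < 2:
        raise ValueError("RAID 1 needs at least two drives")
    return drive_size_gb


def raid10_usable(drive_size_gb, drives):
    """RAID 10 stripes across mirrored pairs: half of the raw capacity."""
    if drives < 4 or drives % 2 != 0:
        raise ValueError("RAID 10 needs an even number of drives, at least four")
    return drive_size_gb * drives // 2


def drives_left_after_root_pair(total_drives):
    """Drives still free for Postgres test arrays once root sits on a RAID 1 pair."""
    if total_drives < 2:
        raise ValueError("not enough drives for a RAID 1 pair")
    return total_drives - 2
```

With placeholder values of 8 drives at 100 GB each, the current style of
setup would be raid10_usable(100, 8), the root pair would be
raid1_usable(100), and drives_left_after_root_pair(8) drives would remain
for testing Postgres layouts.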

> Ptolemy and ortelius are, in the long run, *not* meant to be used by
> toolserver users. Those boxes are explicitly separate. You can't run a
> production database when users are running all kinds of inefficient and
> uncoordinated queries on it. :) For now it doesn't matter, but keep this
> in mind.
> 
> Cassini is a toolserver, and managed by Wikimedia Germany. They do
> things differently than WMF, coordinate with them to see what works there.

I hope that we can have a joint project on maps and use resources
efficiently. For example, we might not have space for the full OSM
database anywhere other than on ptolemy. However, I think we can find
a way to provide production-level stability and stay within our resource
base. Besides, I have no objections to having exactly the same
production/monitoring features on cassini as well.

Uff, that's all from me for now :)

-- 
              << Marcin Cieslak // saper at saper.info >>
