Jeremy,
The wiki family files you linked to are very interesting. If I go the
wiki family route, I'd do a much simpler, scaled-down version of that,
since I only have five very similar wikis (they're essentially just
different language versions of the same thing, though their content is
managed by different groups of editors, not by me). I'd seen the files at
https://noc.wikimedia.org/conf/ previously, so I had a rough idea of that
approach.
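To make the "scaled down" idea concrete, here is a minimal sketch of the per-wiki lookup a small family needs. It is written in Python purely to show the logic; MediaWiki itself is configured in PHP, and this is not how the Wikimedia configuration is implemented. The hostname pattern, language codes, database names, and upload paths are all hypothetical placeholders.

```python
# Illustrative only: MediaWiki is configured in PHP (LocalSettings.php).
# Every hostname, database name, and path below is a made-up placeholder.

LANGUAGES = ["en", "de", "fr", "es", "ja"]  # five hypothetical language wikis


def wiki_settings(host: str) -> dict:
    """Derive per-wiki settings from the request hostname.

    Assumes hosts look like "<lang>.wiki.example.org".
    """
    lang = host.split(".", 1)[0].lower()
    if lang not in LANGUAGES:
        raise ValueError(f"unknown wiki host: {host}")
    return {
        "language": lang,
        "db_name": f"wiki_{lang}",            # one database per language
        "upload_dir": f"/srv/uploads/{lang}",  # one upload tree per language
    }


if __name__ == "__main__":
    for lang in LANGUAGES:
        print(wiki_settings(f"{lang}.wiki.example.org"))
```

The point of the sketch is that everything shared lives in one place, and only the few values that differ per language are derived from the hostname.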
As for your suggestions about the uploaded files, I should first reiterate
that currently I have a single storage host and that is my primary concern.
Using rsync for backups is fine, but not to keep multiple production
storage hosts synchronized. Based on some more reading today, I think that
would take moving to a clustered filesystem such as GFS2
<http://en.wikipedia.org/wiki/GFS2>. I am curious, though, about the server
architecture, and about what MediaWiki configuration would be needed to use
the file servers and image scalers shown in this diagram:
<http://upload.wikimedia.org/wikipedia/commons/d/d8/Wikimedia-servers-2010-12-28.svg>
Justin
On Tue, Oct 28, 2014 at 6:19 AM, Jeremy Baron <jeremy(a)tuxmachine.com> wrote:
On Tue, Oct 28, 2014 at 8:33 AM, Justin Lloyd <jclbugz(a)gmail.com> wrote:
Also, unless I'm missing something or being dense (it is late here), rsync
simply wouldn't work. The upload directories are constantly being accessed,
and a file written through one web server could easily be accessed
immediately afterwards through another. With four web servers (and possibly
more or even fewer if I were to add AWS Auto Scaling into the mix), keeping
them all identical when writes could go through any of them would be pretty
much impossible.
Well, you don't have to limit yourself to having the file uploads
visible in only one part of the filesystem. (So this could work even
if you don't have dedicated storage hosts: MediaWiki uses the NFS mount,
and an unrelated vhost serves static files out of the rsync target.)
But you could also do like WMF (and it sounds like you already have
dedicated storage boxes?):
files are fetched from one hostname/varnish cluster/storage cluster
and HTML/etc. comes from a completely separate hostname/varnish
cluster/php cluster.
The 4 webservers mount the rsync master over NFS, same as now. Writes and
file description page rendering run over NFS.
New webservers (or vhosts on existing webservers, or webservers on the
storage hosts directly) serve images read-only from the local copy of
the files propagated by the rsync cron (or by whatever other means).
you could autoscale for the 4 webservers that don't have local images
at all (just NFS) and then either build an initial rsync into the
scale up process for storage hosts or do that scaling manually.
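
A minimal sketch of the one-way propagation step described above, written as the kind of script a cron job could run on the rsync master. The replica hostnames and the upload path are hypothetical (the thread names neither), and it assumes rsync over SSH is already set up between the hosts. It only mirrors master to replicas and never syncs in the other direction.

```python
import subprocess

# Hypothetical names: none of these hosts or paths come from the thread.
MASTER_UPLOADS = "/srv/uploads"
REPLICA_HOSTS = ["img1.example.org", "img2.example.org"]


def build_rsync_cmd(src_dir: str, host: str, dest_dir: str) -> list[str]:
    """Build a one-way mirror command from the master to one replica."""
    return [
        "rsync",
        "-a",        # archive mode: recurse, preserve times and permissions
        "--delete",  # drop replica files that were deleted on the master
        src_dir.rstrip("/") + "/",          # trailing slash: copy contents
        f"{host}:{dest_dir.rstrip('/')}/",  # remote destination over SSH
    ]


def sync_all(dry_run: bool = True) -> list[list[str]]:
    """Return the commands for every replica; run them unless dry_run."""
    cmds = [build_rsync_cmd(MASTER_UPLOADS, h, MASTER_UPLOADS)
            for h in REPLICA_HOSTS]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    for cmd in sync_all(dry_run=True):
        print(" ".join(cmd))
```

A file uploaded just after a run is not on the replicas until the next run, so the read-only image hosts lag the master by up to one cron interval. That is why writes and file description pages stay on NFS in this layout.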
Anyway, this is all just to address the immediate SPOF quickly. Longer
term, maybe figure out a way to use S3 or something. (You could actually
do that already: your rsync cron could instead be a copy-to-S3 cron. But
you would still have a SPOF in the same places as the config described
above, e.g. file description pages, uploading, deleting, etc.)
-Jeremy
_______________________________________________
MediaWiki-l mailing list
To unsubscribe, go to:
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l