Brion Vibber wrote:
> There's been some sample code submitted for running uploads
> through the clamav virus scanner; I'll try and get this integrated this
> weekend, and we can then enable a bunch of formats with greater confidence.
That would be great, thank you! If anyone else wants to have a look, the
sample code can be tested here:
http://area23.brightbyte.de/checkfile-test.php
I would however like to add another suggestion: show an extra warning on
the description page of media in a potentially dangerous format (or
rather, for any non-image, non-sound format). The message could read
something like this:
"This file may contain executable code that might damage your system.
If you download this file and open it on your computer, potentially
harmfult contents may be run. Please make sure you know what you
are doing. The Wikimedia Foundation does not take any resposibility for
the contents of this file or any harm it may do to your system."
This is not only aimed at macro viruses hidden in .doc files etc. (which
the virus scanner will hopefully find on upload), but mainly at "trivial
trojans" like batch files containing a "format C:" or some such.
In that context, it might also be good to block all files with the
extensions .exe, .bat, .cmd, .reg, and .js, as well as any files starting
with a shebang (#!), regardless of the guessed MIME type. That is, there
should be a list of forbidden extensions in addition to the list of
forbidden MIME types (or does that already exist?), and an additional
check for the shebang. If you like, I could add that to the sample
code, but it's trivial.
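For illustration, a rough sketch of what such a check could look like in
PHP (the function and variable names here are made up for the example,
not taken from the actual upload code):

  <?php
  // Hypothetical blacklist check for uploads; illustrative only.
  $forbiddenExtensions = array( 'exe', 'bat', 'cmd', 'reg', 'js' );

  function isForbiddenUpload( $filename, $tempPath ) {
      global $forbiddenExtensions;

      // Reject by extension, regardless of the guessed MIME type.
      $ext = strtolower( substr( strrchr( $filename, '.' ), 1 ) );
      if ( in_array( $ext, $forbiddenExtensions ) ) {
          return true;
      }

      // Reject anything starting with a shebang (#!), since it is
      // most likely an executable script.
      $f = fopen( $tempPath, 'rb' );
      $head = fread( $f, 2 );
      fclose( $f );
      return $head == '#!';
  }
  ?>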
Thank you,
daniel
Hi,
I'm a wikipedian and have helped on a few articles. I've noticed the
new categorization feature and have a suggestion.
I happen to have professional experience with automated categorization
software, and think you could use it to automatically populate
categories with relevant articles.
The basic approach:
- Prepare a dataset
- Specify a taxonomy to be learned, e.g. the current category taxonomy.
- Populate the taxonomy with example documents, e.g. a sample of
existing articles from each category.
- "Learn a odel" using a statistical text classifier.
- This requires a fair amount of server time.. say a day or so on a
beefy server with lots of RAM.
- Evaluate the results and select well modelled categories.
- Set up a classification server that funnels all Wikipedia articles
through classifiers for the selected categories.
- Those articles/nodes that are a strong match for a category get
annotated with a special tag showing that an automated categorization
has been made, and a link back to the category is added to the
article page.
- The category lists all such articles/nodes, sorted by confidence, size, etc.
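To make the "learn a model" step concrete, here is a toy naive Bayes
text classifier in PHP. All the names here are invented for
illustration; a real system like ReelTwo's is of course far more
sophisticated:

  <?php
  // Toy naive Bayes classifier: count word frequencies per category
  // from example documents, then score new text against each category.

  function tokenize( $text ) {
      return preg_split( '/[^a-z0-9]+/', strtolower( $text ), -1,
                         PREG_SPLIT_NO_EMPTY );
  }

  // $examples maps category name => array of example document texts.
  function trainModel( $examples ) {
      $model = array( 'words' => array(), 'docs' => array(),
                      'totalDocs' => 0, 'vocab' => array() );
      foreach ( $examples as $cat => $docs ) {
          $model['docs'][$cat] = count( $docs );
          $model['totalDocs'] += count( $docs );
          $model['words'][$cat] = array();
          foreach ( $docs as $doc ) {
              foreach ( tokenize( $doc ) as $w ) {
                  if ( !isset( $model['words'][$cat][$w] ) ) {
                      $model['words'][$cat][$w] = 0;
                  }
                  $model['words'][$cat][$w]++;
                  $model['vocab'][$w] = true;
              }
          }
      }
      return $model;
  }

  // Returns array( most likely category, log-probability score ).
  function classify( $model, $text ) {
      $vocabSize = count( $model['vocab'] );
      $best = null;
      $bestScore = 0;
      foreach ( $model['docs'] as $cat => $nDocs ) {
          // Log prior plus log likelihoods with Laplace smoothing.
          $score = log( $nDocs / $model['totalDocs'] );
          $catTotal = array_sum( $model['words'][$cat] );
          foreach ( tokenize( $text ) as $w ) {
              $c = isset( $model['words'][$cat][$w] )
                 ? $model['words'][$cat][$w] : 0;
              $score += log( ( $c + 1 ) / ( $catTotal + $vocabSize ) );
          }
          if ( $best === null || $score > $bestScore ) {
              $bestScore = $score;
              $best = $cat;
          }
      }
      return array( $best, $bestScore );
  }
  ?>

Articles whose best score clears a chosen confidence threshold would
then get the special tag and the link back to the category.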
If this sounds interesting, I may be able to get a donated copy of
ReelTwo's classification system:
http://reeltwo.com/products.html
for permanent use by Wikipedia. It's very high-performance.
ReelTwo is my old company. It's a small data-mining shop. The staff are
all supporters of OSS and enjoy using Wikipedia; I think they'd
appreciate the opportunity to contribute, and would also be proud to be
associated with it. As would I.
Please let me know if you have any questions.
Cheers,
Pablo Mayrgundter
freality.org/~pablo
Hello, I'm Alejandro Sánchez, erchache on IRC, system administrator
(sure? ;P) of Enciclopedia Libre, a fork of es.wikipedia.org.
We are planning to expand our system. We are in negotiations with some
universities in Spain to build a global server farm. I'm thinking of
setting up a Linux Virtual Server, on one machine or more, in IP-IP
encapsulation mode, so that we can connect any machine to our farm
without problems from differences in network connectivity.
I have never bought a "real" server before and I don't know all the
aspects of this. For this reason I'm asking this list for requirements
on:
- Linux Virtual Server.
- Apache + PHP + memcached (a MediaWiki-side sketch follows below).
- MySQL server (master).
- MySQL slave.
- Squid.
And a good network filesystem to interconnect all the machines, ours
and the others.
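For the Apache + PHP + memcached part, the MediaWiki side is mostly a
matter of LocalSettings.php. A minimal sketch; I'm assuming the
$wgUseMemCached / $wgMemCachedServers variable names here, so check the
DefaultSettings.php of your MediaWiki version, and note the hosts and
ports are examples only:

  <?php
  # Point MediaWiki at a pool of memcached instances (example hosts).
  $wgUseMemCached     = true;
  $wgMemCachedServers = array( '10.0.0.10:11000', '10.0.0.11:11000' );
  ?>

The MySQL slave can be handled by MediaWiki's load balancing as well,
but the exact configuration depends on the version, so I won't guess
at it here.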
See http://enciclopedia.us.es/webalizer for real statistics of our
site; hangs and breakdowns are included ;D
I'm very paranoid about security, but I don't want to set up a system
that only ends up running DoS attacks for crackers, you know? ;P
All suggestions are welcome, except to join es.wikipedia... sorry :(
All the users of our community voted on this explicitly...
We are also planning to talk with all the universities that join us
about adding MediaWiki to their curricula in computer science,
literature, and more, to bring more programmers and users to our
projects, both Wikipedia and Enciclopedia Libre.
Well, this is all for the moment; I'm waiting for your replies.
I've converted the MonoBook template to a PHP function which doesn't
require PHPTAL. This has a few advantages:
* Since it doesn't need to store a compiled template, this should
greatly reduce the amount of problem reports we get due to unwritable
temp directories, safe mode oddities, etc.
* We won't need to bundle a second version of PHPTAL for PHP 5 support:
the pure PHP version works on both PHP 4 and PHP 5.
* We can save a few hundred kb from the distribution by removing the
bundled PHPTAL and PEAR core. Since we're adding a crapload of support
data files for Unicode and Chinese script conversion, this is nice.
* No one could read the template code anyway, why not make it uglier? ;)
In my crude tests, loads of a short wiki page with the PHP version are
5% faster than the PHPTAL version on PHP 5.0.2 with a PHPTAL 1.0.0 dev
snapshot (no accelerator/opcode cache). This is probably due to time
spent loading up PHPTAL and PEAR include files rather than the
execution of the compiled template.
The new SkinTemplate.php is a slightly altered version of the prior
SkinPHPTal.php; it does pretty much all the same stuff, but sends it to
a different final output path. The new SkinPHPTal.php now inherits that
code and interfaces to either PHPTAL 0.7 (on PHP 4) or 1.0 (on PHP 5).
So, if you want to make a new custom PHPTAL-based skin it can still be
done.
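For anyone curious what a template-as-a-PHP-function looks like, the
general shape is roughly this (a simplified sketch, not the actual
MonoBook code; the class name and data keys are examples):

  <?php
  // Simplified sketch of a pure-PHP skin template. The skin code
  // fills in $data, then execute() emits the HTML directly instead
  // of going through a compiled PHPTAL template.
  class ExampleTemplate {
      var $data = array();

      function set( $name, $value ) {
          $this->data[$name] = $value;
      }

      function execute() {
          echo '<h1>' . htmlspecialchars( $this->data['title'] ) . '</h1>';
          echo '<div id="bodyContent">' . $this->data['bodytext'] . '</div>';
      }
  }
  ?>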
-- brion vibber (brion @ pobox.com)
Hi,
is there a howto available for comfortably transferring an existing
UseMod wiki (meaning its contents) to a newly installed MediaWiki,
besides copying each single article manually and adjusting the links?
About 1000 articles are in question.
Very useful would be some way of influencing the "UseMod article name"
-> "MediaWiki article name" mapping, so an old article
"NamingConventions" can be chosen to become "Naming Conventions" in
MediaWiki, both in the title and in the links in any article where they
appear.
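To illustrate what I mean, a hypothetical PHP helper like this (an
untested sketch, not an existing MediaWiki tool) would cover both the
titles and the CamelCase links inside article text:

  <?php
  // Turn a UseMod CamelCase title into a spaced MediaWiki title,
  // e.g. "NamingConventions" -> "Naming Conventions".
  function spaceOutTitle( $usemodTitle ) {
      return preg_replace( '/([a-z0-9])([A-Z])/', '$1 $2', $usemodTitle );
  }

  function rewriteLinkMatch( $m ) {
      return '[[' . spaceOutTitle( $m[0] ) . ']]';
  }

  // Rewrite bare WikiWords in article text into [[...]] links.
  function rewriteLinks( $text ) {
      return preg_replace_callback( '/\b(?:[A-Z][a-z0-9]+){2,}\b/',
                                    'rewriteLinkMatch', $text );
  }
  ?>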
Philipp
For a while we experimented with a bulletin-board discussion site
(running on phpBB) at http://boards.wikimedia.org/ . It hasn't been
used since August, as far as I can see.
A serious security vulnerability has been announced in phpBB recently;
rather than add to our maintenance burden for an unused site I'm taking
the boards offline. If there's a sudden demand for it, we could upgrade
it and put it back online.
-- brion vibber (brion @ pobox.com)
I noticed that you can no longer upload a lot of file types, like
zip, gz, doc, xls, sxw, and sxc.
I understand that some types of files are not allowed for legal and
system security reasons. But that is not the case here. Everyone is
grateful to the system developers for creating and maintaining the
Wikipedia software. But whether or not to use those file types is an
editorial decision that has to be made by every Wikipedia. It is not
something that may be decided by the system developers.
On behalf of the Dutch Wikipedia, I most strongly request a freer
upload facility, like a whitelist that can be edited, or an override.
Do not ignore this.
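What I have in mind is something like this in LocalSettings.php,
assuming a whitelist variable along the lines of $wgFileExtensions
exists in the current code (I am not sure of the exact name, so take
this as a sketch):

  <?php
  # Sketch: per-wiki upload whitelist; the variable name and its
  # availability depend on the MediaWiki version.
  $wgFileExtensions = array( 'png', 'gif', 'jpg', 'jpeg', 'ogg',
                             'zip', 'gz', 'doc', 'xls', 'sxw', 'sxc' );
  ?>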
--
[[w:nl:gebruiker:walter]]
The manual states that another way is to perform a search (as you
should have done before) with the Go button, and on coming up with
nothing, pressing "create the article".
When I click on Go to create a new page, I don't see the "create the
article" link. How can I enable it?
Just a note of warning for those of you using MySQL 4.1: changes in the
new charset options may result in mysqldump outputting bogus data into
backups which can't be restored without data loss.*
This may affect some Unicode text, and certainly can irretrievably
corrupt compressed old revision text (using $wgCompressRevisions
option). If you're using MySQL 4.1, you should probably examine and
test your backup dumps to make sure they can be restored and used
successfully.
Passing an option like --default-character-set=latin1 may stop
mysqldump from trying to 'convert' (and thus corrupt) your data. (If
your server is not set to the defaults, this may or may not be the
correct value for you.) In the future hopefully we'll be able to play
nicer with the new character set settings, but for now MediaWiki
follows prior practice for older versions of MySQL where there was (and
remains) no ability to correctly indicate the charset used in a
particular database, table, or field.
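For example, something like this (the user and database names are
placeholders; the flag is the important part):

  mysqldump --default-character-set=latin1 -u wikiuser -p wikidb > dump.sql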
* Specifically, a default "latin-1" to UTF-8 conversion silently
corrupts all bytes with the values 0x81, 0x8d, 0x8f, 0x90, or 0x9d by
turning them into literal question marks. The question marks cannot be
returned to their original byte values when the data is re-imported.
-- brion vibber (brion @ pobox.com)