Wikitech-l May 2003

wikitech-l@lists.wikimedia.org

59 participants
147 discussions

Countering the vandals who attack users (not just pages)
by David A. Wheeler 18 Jun '03

Sadly, it appears that there are some hurtful vandals out there who are attacking the people trying to counter them. For example, User:Zoe has just posted that she's abandoning her efforts to counter vandals; see http://www.wikipedia.org/wiki/User%3AZoe, which begins: "I'm tired of fighting, I'm tired of arguing, I'm tired of being called names." The last straw seems to have been an edit by No-Fx to Zoe's user page, in which No-Fx made it appear that Zoe was "into oral sex". I don't know enough about this situation to be sure it is an example of this, but I am concerned about the long-term dangers if it starts a trend. Attacks on users and sysops, particularly highly dedicated ones, are much more dangerous to the Wikipedia than simple attacks on a few pages. If these kinds of attacks cause people to stop weeding out bad pages or vandals for fear of retribution, the project is doomed.

Is there any way the software could be modified to make it harder for vandals to counter-attack the people who are trying to remove vandalism? At the least, why not let the User:NAME pages be editable ONLY by NAME? The "User_talk:" spaces need to be editable in some way, but I don't see a need for others to "fix" someone's User: space; it's not critical that that content be fixed, and there are advantages to having some areas that are "precious" to each user.

Here's a more controversial idea: perhaps some information relating to the deletion of pages and the banning of users should be hidden from non-sysops. For example, since "delete" can only be done by sysops, why not just tell non-sysops that a deletion occurred, but not WHICH sysop did it? By the same token, perhaps some discussion areas should be readable and writeable only by sysops, in particular a discussion area for discussing whether to ban someone. Perhaps there could be a way for anyone (non-sysop) to suggest that someone be banned without having their name revealed to non-sysops. Since real deletes and banning can only be done by sysops anyway, and sysops are trusted, there's no reason this information MUST be public.

A related idea might be to modify the "talk" system so that it's more like a bulletin board, with threaded messages and a clear identification of who wrote each one (click on "reply" to reply to that item, maybe in a threaded way). That way, any message is clearly identified with its REAL author. A side effect would be that attribution would happen automatically (no more forgetting ~~~~). When people discuss things, they couldn't make it appear that someone else made an outrageous or nasty statement. The goal here would be to prevent people from attacking each other, or at least to limit the effectiveness of such attacks.

Thoughts?
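
The user-page proposal and the anonymous deletion notice both amount to simple rules. Below is a minimal sketch of them in Perl; the sub names `can_edit` and `deletion_log_line`, the message wording, and the assumption that page titles and user names are already normalized the same way are all hypothetical. This only illustrates the proposed logic and is not how the wiki software actually checks permissions.

```perl
use strict;
use warnings;

# Hypothetical rule: a page in the User: namespace (including subpages such
# as "User:Zoe/Notes") may only be edited by the user it belongs to.
# "User_talk:" pages and all other namespaces stay open to everyone.
# Assumes $editor and the title use the same spelling of the user name.
sub can_edit {
    my ($editor, $title) = @_;
    if ($title =~ m{^User:([^/]+)}) {
        my $owner = $1;
        return $editor eq $owner ? 1 : 0;
    }
    return 1;
}

# Hypothetical deletion notice: sysops see who deleted the page,
# everyone else only learns that a deletion happened.
sub deletion_log_line {
    my ($viewer_is_sysop, $page, $sysop) = @_;
    return $viewer_is_sysop
        ? "$page was deleted by $sysop"
        : "$page was deleted by a sysop";
}
```

Whether sysops should be exempt from `can_edit` (for example, to revert vandalism on a user page) is left open here.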

Static html
by Alfio Puglisi 12 Jun '03

After some delays and bug-hunting, my script for the static HTML versions is in acceptable shape. Here you can see an example, built from an SQL file from a few weeks ago (don't try the Search box!!! I explain below):
http://www.arcetri.astro.it/~puglisi/wiki/dump/ma/main_page.html
Please don't DoS the connection; it's not a very fast line.

Interested parties can find the script here (renamed to .txt due to a server misconfiguration):
http://www.arcetri.astro.it/~puglisi/wiki/wiki2static.txt
Use a wide terminal for this one. Everything (HTML code included) is in one single file. The whitespace may look weird because I use 4-space tabs. There's no need to tell me you don't like the coding style, I already know :-)))

Some issues:
- the topbar links do not work (known bug :-). The Edit link goes to the online Wikipedia site.
- interlanguage links are ignored.
- some wiki markup is not recognized yet.
- no images are present (of course!)
- filenames should be OK for most filesystems that are not limited to "8.3" names (max 63 chars, only a-z, 0-9 and underscore).
- despite the two-letter subdirectories, some of them have over 4,000 files in them!
- Time: the script takes more than 2 hours on my 1.3 GHz Athlon...
- Size: this dump is about 800MB (the tar.gz is just 110MB). I think I can bring it down to 600-650MB with a bit of trimming and by eliminating unnecessary redirects. BUT, without some form of compression, the English Wikipedia will soon overflow a single CD. Maybe we should target DVDs? :-)
- Images: no images are present here. AFAIK, each of them has an SQL record (which my script skips), but the actual image data is not included. How many megabytes of images do we have? I think it will be impossible to store the full images on a CD. It's certainly possible on a DVD. Maybe a low-res version could be included on a CD.
- Search: I tried a JavaScript search that worked well for small databases: it's basically a big array of strings (article titles and filenames) with a few lines of code that do a regexp match against them. For full-sized databases like this one, the search page becomes an 8-megabyte monster that takes forever to process (IE grabs 100 MB of memory and stops there; Opera is even worse). I'll see if I can find a different solution.

Enough for now. While I carry on development, any input is welcome.
Ciao, Alfio
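
The filename constraints listed above (lowercase a-z, 0-9 and underscore, at most 63 characters, two-letter subdirectories) can be illustrated with a small mapping from article title to file path. This is a minimal sketch reconstructed only from those constraints and the example URL `dump/ma/main_page.html`; `title_to_path` is a hypothetical name, and the exact truncation point (here the base name, excluding ".html") is an assumption, not the actual behaviour of wiki2static.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of the naming scheme described in the post:
# lowercase the title, turn every character outside a-z/0-9 into "_",
# keep at most 63 characters, and store the file in a subdirectory named
# after its first two characters.
sub title_to_path {
    my ($title) = @_;
    my $name = lc $title;
    $name =~ s/[^a-z0-9]/_/g;      # only a-z, 0-9 and underscore survive
    $name = substr($name, 0, 63);  # assumed: limit applies to the base name
    my $dir = substr($name, 0, 2); # two-letter subdirectory
    return "$dir/$name.html";
}

print title_to_path("Main Page"), "\n";   # prints ma/main_page.html
```

Because every byte outside a-z and 0-9 becomes `_`, distinct titles (for instance ones differing only in accented letters or punctuation) can map to the same file, so a full converter would need some form of collision handling.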

request for a redirect
by giskart＠gmx.net 02 Jun '03

Can somebody set up a redirect from http://nds.wikipedia.org and http://www.nds.wikipedia.org to http://za.wikipedia.com? The Plattdüütsch Wikipedia is currently using an old UseMod wiki with the wrong language code. With the redirect, they can spread the correct URL even while they do not yet have a Phase III wiki.
Giskart

Re: Fair use
by Daniel Mayer 02 Jun '03

Marco wrote:
> No, you lose much more. You cannot easily combine the content of two
> "free" encyclopedias and get something that is "free". You cannot copy
> images from the English Wikipedia to the German Wikipedia anymore because
> the "fair use" right does not work this way in Germany.

What? Since when has the German Wikipedia moved to a German-based server? Well, I know for a fact that it hasn't, so German law has no bearing on the legality of having "fair use" (per US law) images on the German Wikipedia. However, people who are subject to German law may be legally barred from uploading such images. But there are plenty of German-speaking Wikipedians living outside of Germany who can do this.
-- Daniel Mayer (aka mav)

3 new InterWiki prefixes
by erik_moeller＠gmx.de 01 Jun '03

I have added three InterWiki prefixes:
*PageHistory
*UserContributions
*BackLinks
because we currently do not support parameters in the [[Special:]] namespace, and this was the lazy way to provide a much-needed quick fix. Among other things, this allows us to put #REDIRECT [[UserContributions:Username]] on the user page of a known vandal, making it easier to fix his edits from Recent Changes (RC). As you might expect, these InterWiki links point to en:. I pondered picking names like EnHistory, EnContris etc., but I wanted something intuitive. If other languages want the same functionality, local equivalents such as "SeitenHistorie", "BenutzerBeitraege" and "LinksAuf" can easily be added. Note that if we change the functionality of [[Special:]], things like [[Special:MovePage->nul|Click here]] also become possible unless we specifically forbid them.
Regards, Erik
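
Since [[Special:]] pages cannot take parameters, each new prefix effectively bakes one parameter into a URL. A minimal sketch of that expansion, assuming a table that maps each prefix to a URL template in which `$1` stands for the link target; the host `en.wiki.example`, the URL paths, and `expand_interwiki` are hypothetical placeholders, not the real InterWiki entries.

```perl
use strict;
use warnings;

# Hypothetical interwiki table: prefix => URL template ($1 = link target).
# The real entries point to en:, with URL patterns that may differ.
my %interwiki = (
    PageHistory       => 'http://en.wiki.example/history?title=$1',
    UserContributions => 'http://en.wiki.example/contributions?target=$1',
    BackLinks         => 'http://en.wiki.example/backlinks?target=$1',
);

# Expand the first [[Prefix:Target]] link in $link into a URL,
# or return undef if there is no link or the prefix is unknown.
sub expand_interwiki {
    my ($link) = @_;
    my ($prefix, $target) = $link =~ /\[\[([^:\]]+):([^\]|]+)/
        or return undef;
    my $template = $interwiki{$prefix} or return undef;
    $target =~ s/ /_/g;    # wiki titles use underscores for spaces
    # percent-escape everything else that is not URL-safe (byte-wise)
    $target =~ s/([^A-Za-z0-9_.\-])/sprintf('%%%02X', ord $1)/ge;
    (my $url = $template) =~ s/\$1/$target/;
    return $url;
}
```

With this table, `#REDIRECT [[UserContributions:Some User]]` would lead to the contributions URL for `Some_User`; the parameter lives in the template rather than in the Special: page name.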

Bug in Search
by Thomas Corell 31 May '03

If you enter something like "5 AND Dezember" (tested on the German Wikipedia), you get:

1064: You have an error in your SQL syntax. Check the manual that corresponds to your MySQL server version for the right syntax to use near 'AND (MATCH (si_title) AGAINST ('dezember')) ) AND cur_namespace":

SELECT cur_id,cur_namespace,cur_title,cur_text FROM cur,searchindex WHERE cur_id=si_page AND ( AND (MATCH (si_title) AGAINST ('dezember')) ) AND cur_namespace IN (0) LIMIT 0, 20

I see two ANDs one after the other, which means that $this->mTextcond is empty (in the source code). The error occurs with any single character as a search term, not only with numbers. Someone familiar with SearchEngine.php should take a look.
-- Smurf smurf(a)AdamAnt.mud.de
Anthill inside!
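
The doubled AND suggests the WHERE clause is built by concatenating conditions even when one of them is empty. A minimal sketch of the usual guard, written in Perl rather than the PHP of SearchEngine.php and using hypothetical condition strings: discard empty conditions before joining them. This illustrates the technique only; it is not a patch for the actual code.

```perl
use strict;
use warnings;

# Join only the non-empty conditions with AND, so that a condition which
# ended up empty (e.g. because its search term was discarded) cannot leave
# a dangling "AND" behind. Falls back to an always-true clause if nothing
# is left (an assumption; the real code might instead reject the search).
sub build_where {
    my @conds = grep { defined $_ && length $_ } @_;
    return @conds ? join(' AND ', map { "($_)" } @conds) : '1=1';
}
```

For the failing search, `build_where('', q{MATCH (si_title) AGAINST ('dezember')})` yields only the title condition instead of `( AND (MATCH ...) )`.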

InnoDB monitor -- how to turn off?
by Brion Vibber 31 May '03

Is somebody deliberately turning on the InnoDB monitor, or is some setting turning it on automatically? It dumps data to the MySQL error log file every 15 seconds, listing a bunch of status information and every transaction that has been done since the last dump. That comes to several hundred megabytes of log file after a few days, which can only be freed from the disk by deleting the log file and restarting MySQL. So if anyone knows a way to keep it from starting up, that would be nice. :)
-- brion vibber (brion @ pobox.com)

RE: Static html
by Erik Zachte 29 May '03

Hi Alfio,

I looked at your code. Nice job. Superficially it may seem we did almost the same job, but the overlap is minimal. My Perl script addresses a lot of issues that are only relevant in a Palm/Pocket PC/TomeRaider environment. Your version has quite a lot of code that is specific to a static HTML version. Still, there are some areas where we can help each other.

You mentioned Unicode support as an open issue. Coincidentally, I was looking into this issue the past few days while preparing a TomeRaider version of the Esperanto Wikipedia, which would be unreadable without it. You will also find below the UTF-8 coding scheme on which this is based.

Here is some Perl code to translate UTF-8 multi-byte sequences into HTML numeric character references of the form &#nnn;

    # unicode -> html character codes &#nnnn;
    # ($entry holds the text being converted, as raw UTF-8 bytes)
    $entry =~ s/([\x80-\xFF]+)/&UnicodeToHtml($1)/ge ;

    sub UnicodeToHtml
    {
      my $text = shift ;
      my $html = "" ;
      my ($c, $byte, $ord, $unicode, $bytes) ;

      for ($c = 0 ; $c < length ($text) ; $c++)
      {
        $byte = substr ($text,$c,1) ; # optimize with regexp ?
        $ord = ord ($byte) ;
        if ($ord < 128) # plain ascii character
        { $html .= $byte ; } # (will not occur in this script)
        else
        {
          # the lead byte tells how many bytes the sequence has
          if    ($ord < 224) { $bytes = 2 ; }
          elsif ($ord < 240) { $bytes = 3 ; }
          elsif ($ord < 248) { $bytes = 4 ; }
          elsif ($ord < 252) { $bytes = 5 ; }
          else               { $bytes = 6 ; }
          $unicode = substr ($text,$c,$bytes) ;
          $html .= &UnicodeToHtmlTag ($unicode) ;
          $c += $bytes - 1 ; # skip the continuation bytes
        }
      }
      return ($html) ;
    }

    sub UnicodeToHtmlTag
    {
      my $unicode = shift ;
      my $char = substr ($unicode,0,1) ;
      my $ord = ord ($char) ;
      my ($c, $value) ;

      if ($ord < 128) # plain ascii character
      { return ($unicode) ; } # (will not occur in this script)
      else
      {
        # remove the length marker bits (110, 1110, 11110, ...) from the lead byte
        if    ($ord >= 252) { $value = $ord - 252 ; }
        elsif ($ord >= 248) { $value = $ord - 248 ; }
        elsif ($ord >= 240) { $value = $ord - 240 ; }
        elsif ($ord >= 224) { $value = $ord - 224 ; }
        else                { $value = $ord - 192 ; }
        # each continuation byte (10xxxxxx) adds 6 more bits
        for ($c = 1 ; $c < length ($unicode) ; $c++)
        { $value = $value * 64 + ord (substr ($unicode,$c,1)) - 128 ; }
        return ("\&\#" . $value . ";") ;
      }
    }

Found this somewhere on the web:

    #UTF-8 works as follows:
    #ENCODING
    #  The following byte sequences are used to represent a character.
    #  The sequence to be used depends on the UCS code number of the character:
    #
    #  0x00000000 - 0x0000007F:
    #      0xxxxxxx
    #
    #  0x00000080 - 0x000007FF:
    #      110xxxxx 10xxxxxx
    #
    #  0x00000800 - 0x0000FFFF:
    #      1110xxxx 10xxxxxx 10xxxxxx
    #
    #  0x00010000 - 0x001FFFFF:
    #      11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    #
    #  0x00200000 - 0x03FFFFFF:
    #      111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
    #
    #  0x04000000 - 0x7FFFFFFF:
    #      1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
    #
    #  The xxx bit positions are filled with the bits of the
    #  character code number in binary representation. Only the
    #  shortest possible multibyte sequence which can represent
    #  the code number of the character can be used.

By the way, I enjoyed your contribution about Ant Power.
If you have any questions or suggestions you can reach me at xxx(a)chello.nl !spam: read xxx as epzachte
Cheers, Erik Zachte
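
To cross-check the UTF-8 table and the lead-byte offsets, here is a small sketch with two hypothetical helpers. `codepoint_to_utf8` builds the byte sequence for a code point by following the table (only the 1- to 4-byte forms), and `utf8_to_ncr` produces the same kind of &#nnn; output as UnicodeToHtml but with a different technique: Perl's core `utf8::decode` (Perl 5.8 or later) followed by a per-character substitution.

```perl
use strict;
use warnings;

# Encode one code point as UTF-8 bytes, following the table above
# (1- to 4-byte forms only): lead byte carries the length marker,
# each continuation byte is 10xxxxxx with 6 payload bits.
sub codepoint_to_utf8 {
    my ($cp) = @_;
    return chr($cp) if $cp < 0x80;
    return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F))
        if $cp < 0x800;
    return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F))
        if $cp < 0x10000;
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}

# Convert a UTF-8 byte string to ASCII text with &#nnn; references,
# letting Perl's decoder do the bit arithmetic.
sub utf8_to_ncr {
    my ($bytes) = @_;
    my $text = $bytes;
    utf8::decode($text) or die "input is not valid UTF-8\n";
    $text =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $text;
}

print utf8_to_ncr(codepoint_to_utf8(0x20AC)), "\n";   # prints &#8364;
```

For the euro sign (U+20AC = 8364) the bytes are E2 82 AC. The lead byte 0xE2 = 226 must lose 224 to leave its 4 payload bits (value 2), giving (2*64 + 2)*64 + 44 = 8364; an offset of 222 would leave 4 and yield 16556 instead.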

RE: Unicode and polices...Have you tried this?
by Erik Zachte 29 May '03

> If you run IE6 and right click on any web page, you will get a drop down menu with "encoding" as an entry. Follow the arrow to a long list of encodings. In my case, I chose Japanese and it was installed on demand, in under a minute. Then I left "Encoding" set to "Autoselect."

I tried this but it did not work for me. I remember that when I installed XP and then ran the 'Windows Update' wizard, I clicked 'Remove' for all foreign language packages (a little short-sighted, looking back). Maybe this explains why. I could not find how to undo this. There is also an "Enable Install On Demand (Explorer)" checkbox in Explorer -> Options -> Advanced (unchecked by default, or because of my actions above). Enabling this did not help me either.

Finally I found a link in the Wikipedia to "Alan Wood's Unicode Resources": http://www.alanwood.net/unicode/
Lots of info and useful links there. He notes that Microsoft has some very complete TrueType fonts, which are only shipped with MS Office. I copied the Arial Unicode font (Arialuni.ttf, 24 MB) from another machine running Office and all was well.
Erik Zachte

Re: [Wikitech-l] Unicode and polices...Have you tried this?
by rose.parks＠att.net 29 May '03

Hi,
I just got a new computer with Windows XP, and I was also wondering where the old "Input Methods" for foreign languages were. If you run IE6 and right-click on any web page, you will get a drop-down menu with "Encoding" as an entry. Follow the arrow to a long list of encodings. In my case, I chose Japanese and it was installed on demand, in under a minute. Then I left "Encoding" set to "Autoselect." If you are aware of this already, apologies...
As Ever, Ruth Ifcher

> On Tue, 27 May 2003 12:32:19 +0900, Guillaume Blanchard
> <gblanchard(a)arcsy.co.jp> gave utterance to the following:
>
> <older attribution for the >> was snipped by Guillaume>
> >> So... perhaps I understood nothing, but do you think
> >> Opera 5 is not accepting Unicode because of missing
> >> fonts, or does it just not tolerate it at all?
> >
> > I think it is both problems. Even if your browser can handle Unicode, you
> > can't see characters that are not defined in your font. I'm using MS Arial
> > Unicode with IE 6.0 and I am still not able to see 100% of Unicode
> > characters. In my case I think it's only a font problem. You can go to this
> > page and see what percentage of characters you can see:
> > http://www.columbia.edu/kermit/utf8.html (it's a UTF-8 sample page).
>
> Opera 5 has no Unicode support; Opera 6 was the Unicode rewrite.
> Both Opera (6+) and Mozilla support Unicode natively; the only thing you
> have to do to get it working is to install an appropriate font.
> However, even if you have the font, IE doesn't display some writing systems
> until you "install support" by downloading a large patch to your operating
> system. (A fully multilingual installation of IE6 weighs in at around 85MB)
>
> --
> Richard Grevers
> I hate Victor Hugo said Les miserably
