For a project with some friends, I have been looking for a while for
a way to build a text collaboratively. A wiki answers that need, so I
tried to set up a local wiki from scratch, using Wikipedia's scripts.
Let me clarify that up to yesterday morning I had no experience
with HTTP servers, PHP, MySQL, or Wikipedia's scripts. OK, it's
quite arrogant (or courageous, or plain dumb) to try to jump so
high from so low, so please be kind to me... Moreover, I use a
not-so-young PC (Pentium III, 450 MHz)...
I must say that things went quite well! I managed to get everything
I wanted (I can browse the database and modify pages), which is a
pleasant surprise. Most of the hurdles were negotiated with the
various help and doc files, and by looking in the PHP scripts.
In the process I noticed a few things, and one malfunction with page modifications remains.
Before going further, here are the bare bones of what I tried
to set up:
OS: MS Windows ME (no comment, please)
Browser: MS IE 5.50...
Server: Xitami (couldn't manage to install MS PWS - it might not be
possible on Win ME - and couldn't find a simple install of
Apache...); no problem at all, either installing or running it.
PHP: version 4.2.3 for Win32; a few problems, solved by tinkering in
php.ini, in particular setting the CGI redirect check off.
MySQL: 3.25.55 for Windows; no problem at all, either installing or
running it.
Wikipedia: the latest versions of the files found on SourceForge,
downloaded one by one (how do I dump the whole CVS tree??)
Points to note:
a) The script createdb.php fails if the database already exists. This
is due to the table user_newtalk not being dropped in buildTables.inc.
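A sketch of the kind of change I mean, so that a second run drops the leftover table before recreating it. This is not the actual buildTables.inc code: the credentials and the table list are only placeholders/assumptions.

<?php
// Sketch only, not the real buildTables.inc code: drop user_newtalk along
// with the other tables before recreating them, so that createdb.php can
// be re-run on an existing database.
mysql_connect( "localhost", "wikiuser", "password" ) or die( mysql_error() );
mysql_select_db( "wikidb" ) or die( mysql_error() );

$tablesToDrop = array( "user", "user_newtalk", "cur", "old", "searchindex" );
foreach ( $tablesToDrop as $t ) {
    mysql_query( "DROP TABLE IF EXISTS $t" )
        or die( "Could not drop table $t: " . mysql_error() );
}
?>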
b) The database obtained by starting from an empty directory and
running createdb.php is such that the search function always reports
that it found nothing.
Running rebuildindex.php fixes that. Maybe this is the normal
policy, and rebuildindex.php must be run regularly??
c) I loaded the current state of the French articles (smaller than
the English ones, and I'm French, by the way). There again, the basic
functions run correctly, except for search.
I tried to run rebuildindex.php, but it aborts, apparently because of
some timer in the server (the message indicates an overload). Is there
any way to do that without going through the server? For instance, a
MySQL script would be handy!
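What I imagine is something like the script below, run with the command-line PHP binary ("php -q rebuildindex_cli.php") so the web server's timer never comes into play. This is only a rough, untested sketch: the credentials, the file name and the searchindex column names (si_page, si_title, si_text) are guesses taken from buildTables.sql, and the real rebuildindex.php also strips wiki markup before indexing, which is skipped here.

<?php
// Rough command-line sketch, NOT the real rebuildindex.php: repopulate the
// searchindex table directly from the cur table, bypassing the web server.
mysql_connect( "localhost", "wikiuser", "password" ) or die( mysql_error() );
mysql_select_db( "wikidb" ) or die( mysql_error() );

mysql_query( "DELETE FROM searchindex" );
$res = mysql_query( "SELECT cur_id, cur_title, cur_text FROM cur" );
while ( $row = mysql_fetch_assoc( $res ) ) {
    $title = mysql_escape_string( strtolower( $row['cur_title'] ) );
    $text  = mysql_escape_string( strtolower( $row['cur_text'] ) );
    mysql_query( "INSERT INTO searchindex (si_page, si_title, si_text)
        VALUES ({$row['cur_id']}, '$title', '$text')" )
        or die( mysql_error() );
}
?>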
d) When I edit a page, the submit yields a blank page after a long
time. This is likely a problem in the PHP scripts, but it is beyond
my competence.
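The only idea I have is a generic PHP trick (nothing specific to the wiki scripts): a blank page often means a fatal error with display_errors switched off, so putting something like this near the top of wiki.phtml should at least make the error visible:

<?php
// Generic debugging aid, not part of the wiki code: show all errors in the
// browser instead of silently producing an empty page.
error_reporting( E_ALL );
ini_set( 'display_errors', '1' );
?>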
e) After submitting a new page, I can see the modification only
after deleting the IE cache and reloading wiki.phtml. I set the
browser to check for the most recent version on every page load, but
the behaviour is the same.
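If it helps, these are the standard HTTP headers that normally keep IE from showing a stale copy; whether the wiki scripts already send them, and where, I don't know, so this is just a hint, not a fix:

<?php
// Sketch: sent before any other output, these headers tell the browser not
// to reuse a cached copy of the page.
header( "Cache-Control: no-cache, must-revalidate" );
header( "Pragma: no-cache" );
header( "Expires: Mon, 01 Jan 2001 00:00:00 GMT" );
?>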
Thanks in advance for any help!
That said, congratulations to all of you (and also to Xitami,
PHP and MySQL). It is quite impressive that the code and docs
allowed someone with my limited experience to get so far in so little
time!
M. Mouly
In case you hadn't heard, MySQL 4.0.12 was officially declared
"production quality" by MySQL AB. This will be the first thing I
test with the test suite.
--
Lee Daniel Crocker <lee(a)piclab.com> <http://www.piclab.com/lee/>
"All inventions or works of authorship original to me, herein and past,
are placed irrevocably in the public domain, and may be used or modified
for any purpose, without permission, attribution, or notification."--LDC
Hi.
I'm working on getting the Wikipedia wiki software fully installed for
http://www.consumerium.org/wiki/
and I just got the upload setting configured so that uploading works,
but...
When I checked where it put the test file, I noticed that the PNG I had
uploaded had its permissions set to -rwxr-xr-x,
which is not a good thing.
Imagine:
1. Upload whack_the_database.php
2. Point your browser to uploadpath/whack_the_database.php, assuming it
has access to LocalSettings.php
I heard from taw on #wikipedia that the upload code should make the
files _not executable_, which is not what it did.
He tracked the bug down to move_uploaded_file(
$wpUploadTempName, $wgSavedFile ) or somewhere near it.
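Something along these lines is what taw seems to have in mind (just a sketch around the move_uploaded_file() call, using the variable names from above, not a patch against the actual upload code):

<?php
// Sketch of the kind of fix taw suggests: after moving the uploaded file
// into place, strip the execute (and group/other write) bits so an uploaded
// .php file cannot simply be executed in place.
if ( move_uploaded_file( $wpUploadTempName, $wgSavedFile ) ) {
    chmod( $wgSavedFile, 0644 );
}
?>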
Could someone take a look at this?
My CVS checkout is dated 22.2.2003.
regards, Juho Heikkurinen
The test suite is coming along nicely, and I'm now to the point of
just filling in each test case. There's one part I'm working on now
that just reads pages, makes sure their HTML parses, and then does a
few simple regular expression matches on the output. Right now it
just looks for things like DOCTYPE, the META ROBOTS tag, the DIVs,
and such, and looks for the absence of things like <FONT and
onclick=.
Help me fill out these two lists with lots of good output tests:
i.e., tell me what you would expect to see on every page, and
what you would not want to see on any page (preferably as a regular
expression, but if you can just describe in words, I can code it).
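To make it concrete, here is roughly the kind of check I mean. This is illustration only, not the actual test suite code; the page URL is a placeholder and $html just stands for one fetched page.

<?php
// Fetch one page (placeholder URL) and run "must appear" / "must not
// appear" regular-expression checks against its HTML.
$html = implode( "", file( "http://localhost/wiki.phtml?title=Main_Page" ) );

$mustMatch = array(
    '/<!DOCTYPE/i',               // page declares a document type
    '/<meta\s+name="robots"/i',   // META ROBOTS tag present
    '/<div\b/i',                  // layout uses DIVs
);
$mustNotMatch = array(
    '/<font\b/i',                 // no FONT tags
    '/onclick=/i',                // no inline click handlers
);
foreach ( $mustMatch as $re ) {
    if ( !preg_match( $re, $html ) ) print "MISSING: $re\n";
}
foreach ( $mustNotMatch as $re ) {
    if ( preg_match( $re, $html ) ) print "FORBIDDEN: $re\n";
}
?>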
I'm copying this to the list at large, because I think non-techies
will be able to help here as well.
--
Lee Daniel Crocker <lee(a)piclab.com> <http://www.piclab.com/lee/>
"All inventions or works of authorship original to me, herein and past,
are placed irrevocably in the public domain, and may be used or modified
for any purpose, without permission, attribution, or notification."--LDC
Hello,
because of the load the special pages produce, some of them are
disabled between 14:00 and 2:00 UTC. Unfortunately only one of the pages
(wantedpages) works with a cache file, and for most (European) users the
other pages are only available in the early morning or during working hours.
At first I thought about writing patches to enable caching
for the other pages too, but after some time I came up with a few more
possibilities. I think it helps to discuss them before writing code
that ends up in the wastebasket.
1. We hope that the server load will improve someday and that the scripts
(SQL queries) will become so fast that we don't need any caching, so every
byte spent on it would be wasted. I don't think that will happen.
2. We keep the current system and extend it to some more pages. This would
at least have the advantage of providing information about lonelypages,
shortpages, ... even if it is, in the worst case, 12 hours old. It should
be quick to implement, because the code is at hand and only some copy and
paste is needed.
3. We assume that we can improve the queries, but they will still take a
long time to run, and we want to prevent several of them from running in
parallel. In that case I think we need a system that fills the cache for
these pages at regular intervals: the queries would run every half hour
or so, and every user would get the cached output.
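For option 3, I am thinking of something like the script below, run from cron every half hour. It is only a sketch: the credentials and the cache path are placeholders, and the query shown (a crude "short pages" list using only cur_title and cur_text) just stands in for the real, expensive special-page queries.

<?php
// Sketch of option 3: run one expensive special-page query offline and
// write the result to a cache file that the special page then serves.
mysql_connect( "localhost", "wikiuser", "password" ) or die( mysql_error() );
mysql_select_db( "wikidb" ) or die( mysql_error() );

$res = mysql_query( "SELECT cur_title FROM cur
    WHERE cur_namespace = 0 AND LENGTH(cur_text) < 50 LIMIT 500" );
$out = "";
while ( $row = mysql_fetch_row( $res ) ) {
    $out .= $row[0] . "\n";
}
// Write to a temporary file first and rename it, so a reader never sees a
// half-written cache file.
$tmp = "/var/cache/wiki/shortpages.cache.new";
$fp = fopen( $tmp, "w" );
fwrite( $fp, $out );
fclose( $fp );
rename( $tmp, "/var/cache/wiki/shortpages.cache" );
?>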
I look forward to your comments,
Smurf
--
------------------------- Anthill inside! ---------------------------
I'm in the process of trying to upgrade the scripts on my
Disinfopedia. Right now it's running the Wikipedia scripts which I
downloaded about three months ago. I'm having some trouble and would
appreciate answers to a few questions.
(1) My immediate *reason* for wanting to upgrade is that my web host
says the existing script "has driven the server load to unbearable
proportions and even caused the server to crash which required a
manual reboot." We got a significant upward spike in usage on Sunday,
which probably accounts for the problem. My first question is: Have
changes made during the past three months improved the efficiency of
the scripts in ways that might reduce the server load? If not, maybe
there's no need for me to upgrade the scripts at this time.
(2) I used CVS to install the new scripts at a temporary URL for
testing purposes, but I haven't been able to get them working. I
compared my current database schema to the schema in buildTables.sql,
and it looks like a number of changes have been made, including:
* In table "user," the field "user_newtalk" has been dropped.
* A table "user_newtalk" has been added.
* In table "cur," the fields "cur_ind_title" and "cur_ind_text" have
been dropped.
* In table "math," the field "math_html_conservative" has been
renamed to "math_html_conservativeness," and a new field
"math_mathml" has been added.
* A table "searchindex" has been added.
I'm assuming that these differences account for the failure of the
current scripts to run my existing database. I could use SQL queries
to add and delete the necessary fields, but I'm nervous about
screwing something up. What's the best way to proceed?
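In case it is indeed just the schema, this is the kind of migration script I have in mind, built only from the differences listed above. I would of course run it against a copy of the database first; the column types are guesses that should be checked against buildTables.sql, and searchindex would still need to be filled by rebuildindex.php afterwards.

<?php
// Migration sketch derived only from the schema differences listed above.
// Column types are guesses; check them against buildTables.sql and run
// this on a copy of the database first.
mysql_connect( "localhost", "wikiuser", "password" ) or die( mysql_error() );
mysql_select_db( "wikidb" ) or die( mysql_error() );

$updates = array(
    "ALTER TABLE user DROP COLUMN user_newtalk",
    "CREATE TABLE user_newtalk (
        user_id INT(5) NOT NULL DEFAULT 0,
        user_ip VARCHAR(40) NOT NULL DEFAULT '',
        KEY (user_id), KEY (user_ip) )",
    "ALTER TABLE cur DROP COLUMN cur_ind_title, DROP COLUMN cur_ind_text",
    "ALTER TABLE math CHANGE math_html_conservative
        math_html_conservativeness TINYINT(1) NOT NULL",
    "ALTER TABLE math ADD COLUMN math_mathml TEXT",
    "CREATE TABLE searchindex (
        si_page INT(8) UNSIGNED NOT NULL,
        si_title VARCHAR(255) NOT NULL DEFAULT '',
        si_text MEDIUMTEXT NOT NULL,
        UNIQUE KEY (si_page) )",
);
foreach ( $updates as $sql ) {
    mysql_query( $sql ) or print( "Failed: $sql\n" . mysql_error() . "\n" );
}
?>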
Mil gracias!
--
--------------------------------
| Sheldon Rampton
| Editor, PR Watch (www.prwatch.org)
| Author of books including:
| Friends In Deed: The Story of US-Nicaragua Sister Cities
| Toxic Sludge Is Good For You
| Mad Cow USA
| Trust Us, We're Experts
--------------------------------
From the IRC chatroom where I always hang out, I believe I've picked up
that the database is being queried to serve the current articles. If so,
Wikipedia needs to cache (dump) out the current versions of pages to files
so as to create the smallest load possible. It's probably a good idea to
dump out the edit page as well.
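A very rough sketch of what I mean follows. It dumps raw wikitext rather than rendered HTML, and the credentials and output path are just placeholders.

<?php
// Sketch: dump the current version of every main-namespace article from
// the cur table to one file per page, so reads need not hit the database.
mysql_connect( "localhost", "wikiuser", "password" ) or die( mysql_error() );
mysql_select_db( "wikidb" ) or die( mysql_error() );

$res = mysql_query( "SELECT cur_title, cur_text FROM cur WHERE cur_namespace = 0" );
while ( $row = mysql_fetch_assoc( $res ) ) {
    $file = "/var/cache/wiki/articles/" . urlencode( $row['cur_title'] ) . ".txt";
    $fp = fopen( $file, "w" );
    fwrite( $fp, $row['cur_text'] );
    fclose( $fp );
}
?>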
I'm [[user_talk:hfastedge]].
Hello,
after adding a date() output to the 'perfdisabled' message (thanks to
Magnus for installing the patch), I got an interesting result this
morning on Spezial:Lonelypages:
Versuchen Sie es bitte zwischen 02:00 und 14:00 UTC noch einmal
(Aktuelle Serverzeit : 08:08:43 UTC).
Translation (short): Try again between 2:00 and 14:00 UTC. (Current
server time: 08:08 UTC).
I looked in the source but did not find anything that changes the
$wgMiserMode variable; I think it is modified externally. And I think
I read on various pages that the server itself uses UTC. Still, the
output is mystifying.
So can someone please give me a hint as to what's wrong?
(a) Using plain date() is bad, because the server doesn't use UTC.
(b) The external change of the $wgMiserMode variable happens not at 2
and 14 UTC but at, for example, 14 and 2 UTC.
(c) Something completely different.
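A two-line test for hypothesis (a), since date() formats the server's local time while gmdate() formats UTC; if the two lines differ, the 'perfdisabled' patch should probably use gmdate() instead:

<?php
// date() uses the server's local timezone, gmdate() uses UTC.
echo "date():   " . date( "H:i:s" ) . "\n";
echo "gmdate(): " . gmdate( "H:i:s" ) . "\n";
?>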
Smurf
--
Tower: Flight A723, come in on nine o'clock.
A723: Sorry tower, can you give us another hint? We have only digital
watches!
------------------------- Anthill inside! ---------------------------
> Message: 7
> Date: Tue, 11 Mar 2003 12:19:36 -0800 (PST)
> From: Brion Vibber <vibber(a)aludra.usc.edu>
> Subject: Re: [Wikitech-l] Re: what's going on with wikipedia ?
> To: wikitech-l(a)wikipedia.org
> Reply-To: wikitech-l(a)wikipedia.org
>
> On Tue, 11 Mar 2003, Lee Daniel Crocker wrote:
> > It appears we're being Googled this morning. Googlebot is very
> > well-behaved, and I'm not sure if that's the problem or not, but
>
> Googlebot is fairly light (several seconds to 30 seconds between requests,
> and follows our robots.txt restrictions - it's getting articles, not
> millions of diffs or contribs pages. It's only a fraction of total pages
> being served). I have written them an e-mail asking if it's possible to
> restrict the spidering to off-peak hours, though.
>
> -- brion vibber (brion @ pobox.com)
>
Well Brion, off-peak hours are a bit of a problem with an international
website, aren't they?
When Germany goes to lunch (12:00 CET, Central European Time), the
people in San Francisco are coming home from the bar (3:00 AM PST,
Pacific Standard Time).
So I think Google cannot really do anything about it, except treat every
sub-domain according to its timezone (otherwise people in Europe will
ALWAYS have a slow Wikipedia, because Google thinks that is off-peak
time).
Another idea, which might or might not work, is the Apache module
mod_throttle:
http://www.snert.com/Software/mod_throttle/
You could set a general minimum idle time between requests, or you could
assign penalties to DB-heavy documents. Of course this would still make
things slower for some people, but at least the server would take the load
without coming close to a crash.
Cheers
Leo