For a project with some friends, I have been looking for a while for
a way to build a text collaboratively. A wiki answers that need, so I
tried to set up a local wiki from scratch, using Wikipedia's scripts.
Let me clarify that up to yesterday morning I had no experience
with HTTP servers, PHP, MySQL, or Wikipedia's scripts. OK, it's
quite arrogant (or courageous, or plain dumb) to try to jump so
high from so low, so please be kind to me... Moreover, I use a
not-so-young PC (Pentium III, 450 MHz)...
I must say that things went quite well! I managed to get everything
I wanted (I can browse the database and modify pages), which is a
pleasant surprise. Most of the hurdles were negotiated with the
various help and doc files, and by looking in the PHP scripts.
In the process I noticed a few things, and one malfunction with page modifications remains.
Before going further, here are the bare bones of what I tried
to set up:
OS: MS Windows ME (no comment, please)
Browser: MS IE 5.50...
Server: Xitami (couldn't manage to install MS PWS - it might not be
possible on Win ME - and couldn't find a simple install of
Apache...); no problem at all, either installing or running it.
PHP: version 4.2.3 for Win32; a few problems, solved by tinkering in
php.ini, in particular setting the CGI redirect check off.
MySQL: 3.25.55 for Windows; no problem at all, either installing or
running it.
Wikipedia: the latest versions of the files found on SourceForge,
downloaded one by one (how do I dump the whole CVS tree??)
Points to note:
a) The script createdb.php fails if the database already exists. This
is due to the table user_newtalk not being dropped in buildTables.inc.
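A sketch of the kind of change I mean, so that a second run drops the leftover table before recreating it. This is not the actual buildTables.inc code: the credentials and the table list are only placeholders/assumptions.

<?php
// Sketch only, not the real buildTables.inc code: drop user_newtalk along
// with the other tables before recreating them, so that createdb.php can
// be re-run on an existing database.
mysql_connect( "localhost", "wikiuser", "password" ) or die( mysql_error() );
mysql_select_db( "wikidb" ) or die( mysql_error() );

$tablesToDrop = array( "user", "user_newtalk", "cur", "old", "searchindex" );
foreach ( $tablesToDrop as $t ) {
    mysql_query( "DROP TABLE IF EXISTS $t" )
        or die( "Could not drop table $t: " . mysql_error() );
}
?>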
b) The database obtained by starting from an empty directory and
running createdb.php is such that the search function always reports
that it found nothing.
Running rebuildindex.php fixes that. Maybe this is the normal
policy, and rebuildindex.php must be run regularly??
c) I loaded the current state of the French articles (smaller than
the English ones, and I'm French, by the way). There again, the basic
functions run correctly, except for search.
I tried to run rebuildindex.php, but it aborts, apparently because of
some timer in the server (the message indicates an overload). Is there
any way to do that without going through the server? For instance, a
MySQL script would be handy!
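What I imagine is something like the script below, run with the command-line PHP binary ("php -q rebuildindex_cli.php") so the web server's timer never comes into play. This is only a rough, untested sketch: the credentials, the file name and the searchindex column names (si_page, si_title, si_text) are guesses taken from buildTables.sql, and the real rebuildindex.php also strips wiki markup before indexing, which is skipped here.

<?php
// Rough command-line sketch, NOT the real rebuildindex.php: repopulate the
// searchindex table directly from the cur table, bypassing the web server.
mysql_connect( "localhost", "wikiuser", "password" ) or die( mysql_error() );
mysql_select_db( "wikidb" ) or die( mysql_error() );

mysql_query( "DELETE FROM searchindex" );
$res = mysql_query( "SELECT cur_id, cur_title, cur_text FROM cur" );
while ( $row = mysql_fetch_assoc( $res ) ) {
    $title = mysql_escape_string( strtolower( $row['cur_title'] ) );
    $text  = mysql_escape_string( strtolower( $row['cur_text'] ) );
    mysql_query( "INSERT INTO searchindex (si_page, si_title, si_text)
        VALUES ({$row['cur_id']}, '$title', '$text')" )
        or die( mysql_error() );
}
?>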
d) When I edit a page, the submit yields a blank page after a long
time. This is likely a problem in the PHP scripts, but it is beyond
my competence.
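The only idea I have is a generic PHP trick (nothing specific to the wiki scripts): a blank page often means a fatal error with display_errors switched off, so putting something like this near the top of wiki.phtml should at least make the error visible:

<?php
// Generic debugging aid, not part of the wiki code: show all errors in the
// browser instead of silently producing an empty page.
error_reporting( E_ALL );
ini_set( 'display_errors', '1' );
?>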
e) After submitting a new page, I can see the modification only
after deleting the IE cache and reloading wiki.phtml. I set the
browser to check for the most recent version on every page load, but
the behaviour is the same.
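If it helps, these are the standard HTTP headers that normally keep IE from showing a stale copy; whether the wiki scripts already send them, and where, I don't know, so this is just a hint, not a fix:

<?php
// Sketch: sent before any other output, these headers tell the browser not
// to reuse a cached copy of the page.
header( "Cache-Control: no-cache, must-revalidate" );
header( "Pragma: no-cache" );
header( "Expires: Mon, 01 Jan 2001 00:00:00 GMT" );
?>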
Thanks in advance for any help!
That said, congratulations to all of you (and also to Xitami,
PHP and MySQL). It is quite impressive that the code and docs
allowed someone with my limited experience to get so far in so little
time!
M. Mouly
In case you hadn't heard, MySQL 4.0.12 was officially declared
"production quality" by MySQL AB. This will be the first thing I
test with the test suite.
--
Lee Daniel Crocker <lee(a)piclab.com> <http://www.piclab.com/lee/>
"All inventions or works of authorship original to me, herein and past,
are placed irrevocably in the public domain, and may be used or modified
for any purpose, without permission, attribution, or notification."--LDC
Hi.
I'm working on getting the Wikipedia wiki software fully installed for
http://www.consumerium.org/wiki/
and I just got the upload setting configured so that uploading works,
but...
When I checked where it put the test file, I noticed that the PNG I had
uploaded had its permissions set to -rwxr-xr-x,
which is not a good thing.
Imagine:
1. Upload whack_the_database.php
2. Point your browser to uploadpath/whack_the_database.php, assuming it
has access to LocalSettings.php
I heard from taw on #wikipedia that the upload code should make the
files _not executable_, which is not what it did.
He tracked the bug down to move_uploaded_file(
$wpUploadTempName, $wgSavedFile ) or somewhere near it.
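Something along these lines is what taw seems to have in mind (just a sketch around the move_uploaded_file() call, using the variable names from above, not a patch against the actual upload code):

<?php
// Sketch of the kind of fix taw suggests: after moving the uploaded file
// into place, strip the execute (and group/other write) bits so an uploaded
// .php file cannot simply be executed in place.
if ( move_uploaded_file( $wpUploadTempName, $wgSavedFile ) ) {
    chmod( $wgSavedFile, 0644 );
}
?>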
Could someone take a look at this?
My CVS checkout is dated 22.2.2003.
regards, Juho Heikkurinen
The test suite is coming along nicely, and I'm now to the point of
just filling in each test case. There's one part I'm working on now
that just reads pages, makes sure their HTML parses, and then does a
few simple regular expression matches on the output. Right now it
just looks for things like DOCTYPE, the META ROBOTS tag, the DIVs,
and such, and looks for the absence of things like <FONT and
onclick=.
Help me fill out these two lists with lots of good output tests:
i.e., tell me what you would expect to see on every page, and
what you would not want to see on any page (preferably as a regular
expression, but if you can just describe in words, I can code it).
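To make it concrete, here is roughly the kind of check I mean. This is illustration only, not the actual test suite code; the page URL is a placeholder and $html just stands for one fetched page.

<?php
// Fetch one page (placeholder URL) and run "must appear" / "must not
// appear" regular-expression checks against its HTML.
$html = implode( "", file( "http://localhost/wiki.phtml?title=Main_Page" ) );

$mustMatch = array(
    '/<!DOCTYPE/i',               // page declares a document type
    '/<meta\s+name="robots"/i',   // META ROBOTS tag present
    '/<div\b/i',                  // layout uses DIVs
);
$mustNotMatch = array(
    '/<font\b/i',                 // no FONT tags
    '/onclick=/i',                // no inline click handlers
);
foreach ( $mustMatch as $re ) {
    if ( !preg_match( $re, $html ) ) print "MISSING: $re\n";
}
foreach ( $mustNotMatch as $re ) {
    if ( preg_match( $re, $html ) ) print "FORBIDDEN: $re\n";
}
?>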
I'm copying this to the list at large, because I think non-techies
will be able to help here as well.
--
Lee Daniel Crocker <lee(a)piclab.com> <http://www.piclab.com/lee/>
"All inventions or works of authorship original to me, herein and past,
are placed irrevocably in the public domain, and may be used or modified
for any purpose, without permission, attribution, or notification."--LDC
Hello,
because of the load the special pages produce, some of them are
disabled between 14:00 and 2:00 UTC. Unfortunately only one of the pages
(wantedpages) works with a cache file, and for most (European) users the
other pages are only available in the early morning or during working hours.
At first I thought about writing patches to enable caching
for the other pages too, but after some time I came up with a few more
possibilities. I think it helps to discuss them before writing code
that ends up in the wastebasket.
1. We hope that the server load will improve someday and that the scripts
(SQL queries) will become so fast that we don't need any caching, so every
byte spent on it would be wasted. I don't think that will happen.
2. We keep the current system and extend it to some more pages. This would
at least have the advantage of providing information about lonelypages,
shortpages, ... even if it is, in the worst case, 12 hours old. It should
be quick to implement, because the code is at hand and only some copy and
paste is needed.
3. We assume that we can improve the queries, but they will still take a
long time to run, and we want to prevent several of them from running in
parallel. In that case I think we need a system that fills the cache for
these pages at regular intervals: the queries would run every half hour
or so, and every user would get the cached output.
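For option 3, I am thinking of something like the script below, run from cron every half hour. It is only a sketch: the credentials and the cache path are placeholders, and the query shown (a crude "short pages" list using only cur_title and cur_text) just stands in for the real, expensive special-page queries.

<?php
// Sketch of option 3: run one expensive special-page query offline and
// write the result to a cache file that the special page then serves.
mysql_connect( "localhost", "wikiuser", "password" ) or die( mysql_error() );
mysql_select_db( "wikidb" ) or die( mysql_error() );

$res = mysql_query( "SELECT cur_title FROM cur
    WHERE cur_namespace = 0 AND LENGTH(cur_text) < 50 LIMIT 500" );
$out = "";
while ( $row = mysql_fetch_row( $res ) ) {
    $out .= $row[0] . "\n";
}
// Write to a temporary file first and rename it, so a reader never sees a
// half-written cache file.
$tmp = "/var/cache/wiki/shortpages.cache.new";
$fp = fopen( $tmp, "w" );
fwrite( $fp, $out );
fclose( $fp );
rename( $tmp, "/var/cache/wiki/shortpages.cache" );
?>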
I look forward to your comments,
Smurf
--
------------------------- Anthill inside! ---------------------------
I'm in the process of trying to upgrade the scripts on my
Disinfopedia. Right now it's running the Wikipedia scripts which I
downloaded about three months ago. I'm having some trouble and would
appreciate answers to a few questions.
(1) My immediate *reason* for wanting to upgrade is that my web host
says the existing script "has driven the server load to unbearable
proportions and even caused the server to crash which required a
manual reboot." We got a significant upward spike in usage on Sunday,
which probably accounts for the problem. My first question is: Have
changes made during the past three months improved the efficiency of
the scripts in ways that might reduce the server load? If not, maybe
there's no need for me to upgrade the scripts at this time.
(2) I used CVS to install the new scripts at a temporary URL for
testing purposes, but I haven't been able to get them working. I
compared my current database schema to the schema in buildTables.sql,
and it looks like a number of changes have been made, including:
* In table "user," the field "user_newtalk" has been dropped.
* A table "user_newtalk" has been added.
* In table "cur," the fields "cur_ind_title" and "cur_ind_text" have
been dropped.
* In table "math," the field "math_html_conservative" has been
renamed to "math_html_conservativeness," and a new field
"math_mathml" has been added.
* A table "searchindex" has been added.
I'm assuming that these differences account for the failure of the
current scripts to run my existing database. I could use SQL queries
to add and delete the necessary fields, but I'm nervous about
screwing something up. What's the best way to proceed?
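In case it is indeed just the schema, this is the kind of migration script I have in mind, built only from the differences listed above. I would of course run it against a copy of the database first; the column types are guesses that should be checked against buildTables.sql, and searchindex would still need to be filled by rebuildindex.php afterwards.

<?php
// Migration sketch derived only from the schema differences listed above.
// Column types are guesses; check them against buildTables.sql and run
// this on a copy of the database first.
mysql_connect( "localhost", "wikiuser", "password" ) or die( mysql_error() );
mysql_select_db( "wikidb" ) or die( mysql_error() );

$updates = array(
    "ALTER TABLE user DROP COLUMN user_newtalk",
    "CREATE TABLE user_newtalk (
        user_id INT(5) NOT NULL DEFAULT 0,
        user_ip VARCHAR(40) NOT NULL DEFAULT '',
        KEY (user_id), KEY (user_ip) )",
    "ALTER TABLE cur DROP COLUMN cur_ind_title, DROP COLUMN cur_ind_text",
    "ALTER TABLE math CHANGE math_html_conservative
        math_html_conservativeness TINYINT(1) NOT NULL",
    "ALTER TABLE math ADD COLUMN math_mathml TEXT",
    "CREATE TABLE searchindex (
        si_page INT(8) UNSIGNED NOT NULL,
        si_title VARCHAR(255) NOT NULL DEFAULT '',
        si_text MEDIUMTEXT NOT NULL,
        UNIQUE KEY (si_page) )",
);
foreach ( $updates as $sql ) {
    mysql_query( $sql ) or print( "Failed: $sql\n" . mysql_error() . "\n" );
}
?>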
Mil gracias!
--
--------------------------------
| Sheldon Rampton
| Editor, PR Watch (www.prwatch.org)
| Author of books including:
| Friends In Deed: The Story of US-Nicaragua Sister Cities
| Toxic Sludge Is Good For You
| Mad Cow USA
| Trust Us, We're Experts
--------------------------------
From the IRC chatroom where I always hang out, I believe I've picked up
that the database is being queried to serve the current articles. If so,
Wikipedia needs to cache (dump) out the current versions of pages to files
so as to create the smallest load possible. It's probably a good idea to
dump out the edit page as well.
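A very rough sketch of what I mean follows. It dumps raw wikitext rather than rendered HTML, and the credentials and output path are just placeholders.

<?php
// Sketch: dump the current version of every main-namespace article from
// the cur table to one file per page, so reads need not hit the database.
mysql_connect( "localhost", "wikiuser", "password" ) or die( mysql_error() );
mysql_select_db( "wikidb" ) or die( mysql_error() );

$res = mysql_query( "SELECT cur_title, cur_text FROM cur WHERE cur_namespace = 0" );
while ( $row = mysql_fetch_assoc( $res ) ) {
    $file = "/var/cache/wiki/articles/" . urlencode( $row['cur_title'] ) . ".txt";
    $fp = fopen( $file, "w" );
    fwrite( $fp, $row['cur_text'] );
    fclose( $fp );
}
?>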
I'm [[user_talk:hfastedge]].
Hello,
after adding a date() output to the 'perfdisabled' message (thanks to
Magnus for installing the patch), I got an interesting result this
morning on Spezial:Lonelypages:
Versuchen Sie es bitte zwischen 02:00 und 14:00 UTC noch einmal
(Aktuelle Serverzeit : 08:08:43 UTC).
Translation (short): Try again between 2:00 and 14:00 UTC. (Current
server time: 08:08 UTC).
I looked in the source but did not find anything that changes the
$wgMiserMode variable; I think it is modified externally. And I think
I read on various pages that the server itself uses UTC. Still, the
output is mystifying.
So can someone please give me a hint as to what's wrong?
(a) Using plain date() is bad, because the server doesn't use UTC.
(b) The external change of the $wgMiserMode variable happens not at 2
and 14 UTC but at, for example, 14 and 2 UTC.
(c) Something completely different.
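A two-line test for hypothesis (a), since date() formats the server's local time while gmdate() formats UTC; if the two lines differ, the 'perfdisabled' patch should probably use gmdate() instead:

<?php
// date() uses the server's local timezone, gmdate() uses UTC.
echo "date():   " . date( "H:i:s" ) . "\n";
echo "gmdate(): " . gmdate( "H:i:s" ) . "\n";
?>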
Smurf
--
Tower: Flight A723, come in on nine o'clock.
A723: Sorry tower, can you give us another hint? We have only digital
watches!
------------------------- Anthill inside! ---------------------------
> Message: 7
> Date: Tue, 11 Mar 2003 12:19:36 -0800 (PST)
> From: Brion Vibber <vibber(a)aludra.usc.edu>
> Subject: Re: [Wikitech-l] Re: what's going on with wikipedia ?
> To: wikitech-l(a)wikipedia.org
> Reply-To: wikitech-l(a)wikipedia.org
>
> On Tue, 11 Mar 2003, Lee Daniel Crocker wrote:
> > It appears we're being Googled this morning. Googlebot is very
> > well-behaved, and I'm not sure if that's the problem or not, but
>
> Googlebot is fairly light (several seconds to 30 seconds between requests,
> and follows our robots.txt restrictions - it's getting articles, not
> millions of diffs or contribs pages. It's only a fraction of total pages
> being served). I have written them an e-mail asking if it's possible to
> restrict the spidering to off-peak hours, though.
>
> -- brion vibber (brion @ pobox.com)
>
Well Brion, off-peak hours are a bit of a problem with an international
website, aren't they?
When Germany goes to lunch (12:00 CET, Central European Time), the
people in San Francisco are coming home from the bar (3:00 AM PST,
Pacific Standard Time).
So I think Google cannot really do anything about it, except treat every
sub-domain according to its timezone (otherwise people in Europe will
ALWAYS have a slow Wikipedia, because Google thinks that is off-peak
time).
Another idea, which might or might not work, is the Apache module
mod_throttle:
http://www.snert.com/Software/mod_throttle/
You could set a general minimum idle time between requests, or you could
assign penalties to DB-heavy documents. Of course this would still make
things slower for some people, but at least the server would take the load
without coming close to a crash.
Cheers
Leo