Jimmy Wales <jwales(a)bomis.com> writes:
> I've installed my new faster wikipedia search engine.
When you said "faster" you weren't kidding were you. Nice job.
--
Gareth Owen
# UseModWiki joins each record's keys and values with a one-byte
# separator plus a nesting-level digit.
$FS = "\xb3";       # the field separator byte
$FS1 = $FS . "1";   # separates top-level page fields
$FS2 = $FS . "2";   # separates section fields
$FS3 = $FS . "3";   # separates text fields
# Read the full contents of the .db file into $dbfile
open(DB, "<$dbfilename") or die "can't open $dbfilename: $!";
$dbfile = join('', <DB>);
close(DB);
%Page = split(/$FS1/, $dbfile, -1);
%Section = split(/$FS2/, $Page{'text_default'}, -1);
%Text = split(/$FS3/, $Section{'data'}, -1);
$pagetext = $Text{'text'}; # the wiki text of the page
I got this from Clifford Adams's UseModWiki website. It's highly
useful. I'm using it to provide a snippet of each page in the
search engine output.
Tomorrow I will use it to full-text index the entire site. This
will be mondo cool.
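A full-text index over the page texts can be as simple as an inverted index mapping each word to the set of pages containing it. Here is a minimal sketch (in Python rather than the wiki's Perl, and all names are illustrative, not from the actual search-engine code):

```python
import re
from collections import defaultdict

def build_index(pages):
    """pages: dict of title -> page text. Returns word -> set of titles."""
    index = defaultdict(set)
    for title, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(title)
    return index

def search(index, query):
    """Return the titles whose text contains every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    hits = index.get(words[0], set()).copy()
    for w in words[1:]:
        hits &= index.get(w, set())
    return hits
```

Building the index is one pass over every page; each query is then just set intersections, which is what makes it fast.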
--
*************************************************
* http://www.nupedia.com/ *
* The Ever Expanding Free Encyclopedia *
*************************************************
I've installed my new faster wikipedia search engine.
This searches only the titles of the articles, not the body. You can still search
the body by clicking onward to use the original search engine.
It's marginally clever. Search for 'jargon file' and 'poker jargon' and the
results are sort of good.
If I had a perl function with the following specifications, I could easily turn this
into a much better full-text search engine...
$PageText = &GetPageText("Computer_Jargon");
This ought to be simple, but for the life of me I have been unable to figure out how
to parse Cliff's *.db files.
I suppose I could key off of the html cache files, but it sure would be nice to be
able to update the database straight from the horse's mouth.
My code is or will be released under the GPL by, oh, Friday let's say. In the
meantime, I'm too embarrassed to let anyone see it. I took a search engine program
or two from some other site or sites that I run and hacked them mercilessly. There
are all kinds of useless variables and stuff.
I'll clean it all up and release it on Friday, but it's nothing very special.
--
>There are plans afoot to spin off Nupedia and Wikipedia into a non-profit
>organization. This is by no means certain, but I think that there is much
>to recommend it.
>
501(c)(3) non-profits in the U.S. are severely limited in the kinds of
political speech they can make, especially regarding candidates and
issues on the ballot. Not necessarily something I would expect
wikipedia to be doing anyway, but since it *is* something anyone can
edit, you might want to look at the possibility very carefully. Even
some of the comments on various /Talk pages might be enough to cause a
challenge to the 501(c)(3) tax exemption; I am not sure; I am not a
lawyer; you might want to look into visiting one. :-) Just a
warning, not that the sky is falling, but that you should look into it
before making a decision.
Regards,
KQ
1) Magnus's PHP version of the Wikipedia is now available at:
http://php.wikipedia.com/
I added additional links to other MySQL/PHP wikis to that page.
2) I think the name should not change. It would wreck all my external
links.
3) What advantages or disadvantages might accrue if we
(Wikipedia/Nupedia) were listed at SourceForge?
4) I just want to say thanks to Magnus for the work, and Larry for
keeping the rest of us in line.
Mike Dill
mikedill(a)nupedia.com
Here's my understanding of the edit lock problem...
Whenever someone is actively in the process of writing their changes to disk,
the software creates a temporary lock directory on the disk to let other instances
of the CGI process know that this one is writing. You don't
want two processes writing to the same file at the same time, because you would
lose one person's changes and possibly corrupt the entire file.
Notice that the edit lock is only there when you are *actively writing your changes
to disk*. This isn't while you are typing them, but only for that instant after
you hit 'submit' and before you get the next screen back.
The problem is that -- for mysterious reasons -- the script sometimes dies and leaves
the edit lock hanging. All other processes after that refuse to write changes to disk
because they think that someone has the edit lock.
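For concreteness, this kind of whole-site lock can be sketched as follows (in Python rather than the wiki's Perl; the path and function names are made up for the example). A directory is used because mkdir is atomic: exactly one process can create it, so two writers can never both believe they hold the lock.

```python
import errno
import os
import time

LOCK_DIR = "/tmp/wiki_main_lock"   # hypothetical path for the example

def acquire_lock(tries=10, wait=0.5):
    """Try to take the site-wide write lock; give up after `tries` attempts."""
    for _ in range(tries):
        try:
            os.mkdir(LOCK_DIR)     # atomic: succeeds for exactly one process
            return True
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
            time.sleep(wait)       # someone else is writing; wait and retry
    return False                   # lock looks stuck

def release_lock():
    os.rmdir(LOCK_DIR)
```

If the writer dies between acquire and release, the directory stays behind and every later writer gives up and returns False, which is exactly the hung-lock failure described above.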
There are several possible solutions. My own personal solution is wild. I just
wouldn't use a lock at all. You run the risk of collisions, but these are so rare
*in my experience* as to be worth ignoring.
Another solution is to have 'per-file' edit locks. This is more elegant, because
there isn't any particular reason to lock the ENTIRE system down just because one
person is writing to one file. If Larry is writing to file A and I am writing to file
B, there is no reason to lock anything. The main downside to this solution is that
it is still possible for an edit lock to get hung. The good thing is that only one
page would be affected, not the entire site.
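That per-file variant, with a stale-lock timeout bolted on, can be sketched like this (again Python with invented names; the five-minute threshold is an arbitrary assumption). A lock left behind by a dead process gets reclaimed once it looks abandoned, so only one page stalls, and only for a bounded time:

```python
import errno
import os
import time

STALE_AFTER = 300   # seconds; arbitrary threshold for "the writer died"

def lock_page(page_name, lock_root="/tmp/wiki_locks"):
    """Acquire a lock for one page only, reclaiming it if it looks abandoned."""
    os.makedirs(lock_root, exist_ok=True)
    lock_dir = os.path.join(lock_root, page_name + ".lock")
    while True:
        try:
            os.mkdir(lock_dir)      # atomic: only one writer succeeds
            return lock_dir
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
        # Someone holds the lock. If it is old, assume its owner died.
        try:
            if time.time() - os.path.getmtime(lock_dir) > STALE_AFTER:
                os.rmdir(lock_dir)  # break the stale lock...
        except OSError:
            pass                    # ...unless another waiter beat us to it
        time.sleep(0.2)             # then retry

def unlock_page(lock_dir):
    os.rmdir(lock_dir)
```

The reclaim step has a small race of its own (two waiters can both try to remove a stale lock), but mkdir stays atomic, so exactly one of them ends up holding the new lock.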
Another solution is to find the reason why the script is leaving the lock, and fix
that bug. In my experience, this will be impossible. CGI scripts do in fact die
for no apparent reason sometimes, and it's very hard to trap/debug/etc. I mean, it
is theoretically possible, but as a practical matter I find that something is always
left hanging.
One of my own personal design philosophies is that everything should fail gracefully.
That is, I accept that my own software sucks and will in fact break. This is a matter
of humility before the gods of software. The best that I can hope for is that when my
software does choke, it will do so gracefully. There will always be bugs and things
that don't work right, but as long as they don't destroy the universe, there's hope.
:-)
--Jimbo
--
Magnus's PHP version of the Wikipedia is now available at:
http://php.wikipedia.com/
You can also find another PHP and MySQL version of the WikiWiki software in
use at:
http://phpwiki.sourceforge.net/phpwiki/
Perhaps we might be able to get some ideas and/or code from these guys. I
don't know what license Magnus's version has, but the phpwiki at the above
link is GPLed, and has some useful features which we might want to implement
on our own PHP wiki. One such feature, which has been very useful to me on
the current UseModWiki based version of WikiPedia is DIFF.
Anyway, I now see that the code for Magnus's version is up, so I can look
more closely at the differences...
-----Original Message-----
From: lsanger(a)ross.bomis.com [mailto:lsanger@ross.bomis.com]
Sent: Saturday, August 25, 2001 3:05 PM
To: 'wikipedia-l(a)nupedia.com'
Subject: RE: [Wikipedia-l] PHP Wikipedia, Part 2
If anyone ever uploads an Internet-accessible version of the wiki, I'd
sure like to see it.
Larry
On Sat, 25 Aug 2001, Mark Christensen wrote:
> I've been tooling around on the php wikipedia, and it is very nice!
> Congratulations.
>
> Great job with the statistics page, it'll be very useful.
>
> The parser works well with the couple pages I imported, generally I'm very
> impressed.
>
> Are you planning to post the code for this on one of the php.wikipedia.com
> pages as you mentioned in a previous e-mail? I'd be very interested in
> seeing it.
>
> The one problem I see thus far is that UseModWiki allows subpage links on a
> subpage to other subpages of the main page. I know that's not the best
> description, but in the original wikipedia software a [[/subtalk]] link on
> [[MainPage/talk]] would lead to [[MainPage/subtalk]]. We may not want to
> emulate this, as it is certainly not intuitive, but there are a lot of
> pages, like poker, with subpages that link to other subpages -- either our
> parser needs to automatically translate these links or the PHP wikipedia
> should deal with them in the same way as UseModWiki.
>
> Yours
> Mark Christensen
>
> -----Original Message-----
> From: Magnus Manske [mailto:Magnus.Manske@epost.de]
> Sent: Saturday, August 25, 2001 3:30 AM
> To: wikipedia-l@nupedia.com
> Subject: [Wikipedia-l] PHP Wikipedia, Part 2
>
>
> Jason is activating php.wikipedia.com for the script to test, which should
> be working later today. So, soon you can flood me with bug reports ;)
>
> Some points that were mentioned on the list while I was asleep:
> - Larry, I don't oppose CVS as such, I just thought why bother...
> So, I wouldn't mind a CVS at all.
>
> - Edit locks : I thought they'd protect a page that is edited for a certain
> time, e.g., 5 minutes, so there won't be two edits of the same text at the
> same time. Now that I know it's only for writing, I am glad to not have
> wasted time in implementing such a thing in my script ;)
> The MySQL server will take care of the write-at-the-same-time problem, for
> sure.
>
> - /Talk pages : Changing the standard text for new documents so they'd have
> a /Talk page should do it, right? I could also have the parser look for
> "/Talk" and append it if necessary in a "top-level" article.
>
> - Conversion to SQL format : The easiest way I can think of is a script that
> goes through all articles in the current wikipedia and generates a complete
> article text in chronological order (oldest first). After each "version" is
> generated, a variant of my script can store it in the DB. That would ensure
> identical data. Anyone want to write a "generation" script?
>
> - Lame names : How about "Aide-Pikiw" (wikipedia spelled backwards)? That
> must be the lamest, for sure? ;)
>
> Magnus
>
> [Wikipedia-l]
> To manage your subscription to this list, please go here:
> http://www.nupedia.com/mailman/listinfo/wikipedia-l
I've been tooling around on the php wikipedia, and it is very nice!
Congratulations.
Great job with the statistics page, it'll be very useful.
The parser works well with the couple pages I imported, generally I'm very
impressed.
Are you planning to post the code for this on one of the php.wikipedia.com
pages as you mentioned in a previous e-mail? I'd be very interested in
seeing it.
The one problem I see thus far is that UseModWiki allows subpage links on a
subpage to other subpages of the main page. I know that's not the best
description, but in the original wikipedia software a [[/subtalk]] link on
[[MainPage/talk]] would lead to [[MainPage/subtalk]]. We may not want to
emulate this, as it is certainly not intuitive, but there are a lot of
pages, like poker, with subpages that link to other subpages -- either our
parser needs to automatically translate these links or the PHP wikipedia
should deal with them in the same way as UseModWiki.
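The resolution rule described above fits in a few lines. This is a sketch (Python, with a made-up function name, not code from either wiki): a link starting with "/" is resolved against the top-level part of the current page's name.

```python
def resolve_link(current_page, link):
    """Resolve a UseModWiki-style subpage link against the current page.

    A link that begins with "/" attaches to the *main* page, i.e. the part
    of the current page's name before the first "/", not to the current
    subpage itself.
    """
    if link.startswith("/"):
        main_page = current_page.split("/", 1)[0]  # "MainPage/talk" -> "MainPage"
        return main_page + link
    return link
```

So a [[/subtalk]] link on MainPage/talk resolves to MainPage/subtalk, matching the UseModWiki behavior the PHP parser would need to emulate.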
Yours
Mark Christensen
-----Original Message-----
From: Magnus Manske [mailto:Magnus.Manske@epost.de]
Sent: Saturday, August 25, 2001 3:30 AM
To: wikipedia-l@nupedia.com
Subject: [Wikipedia-l] PHP Wikipedia, Part 2
Jason is activating php.wikipedia.com for the script to test, which should
be working later today. So, soon you can flood me with bug reports ;)
Some points that were mentioned on the list while I was asleep:
- Larry, I don't oppose CVS as such, I just thought why bother...
So, I wouldn't mind a CVS at all.
- Edit locks : I thought they'd protect a page that is edited for a certain
time, e.g., 5 minutes, so there won't be two edits of the same text at the
same time. Now that I know it's only for writing, I am glad to not have
wasted time in implementing such a thing in my script ;)
The MySQL server will take care of the write-at-the-same-time problem, for
sure.
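The database will indeed serialize two simultaneous writes, but the later one still silently overwrites the earlier. A version-column check (an optimistic-locking sketch, shown here with SQLite so it is self-contained; the table and column names are invented) turns that into a detectable edit conflict:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE page (title TEXT PRIMARY KEY, body TEXT, version INTEGER)")
con.execute("INSERT INTO page VALUES ('HomePage', 'original text', 1)")

def save_page(title, new_body, version_seen):
    """Write the page only if nobody has saved since we loaded it."""
    cur = con.execute(
        "UPDATE page SET body = ?, version = version + 1 "
        "WHERE title = ? AND version = ?",
        (new_body, title, version_seen))
    con.commit()
    return cur.rowcount == 1    # 0 rows changed: someone saved before us
```

Two editors both load version 1; the first save succeeds, and the second fails and can be asked to merge instead of clobbering.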
- /Talk pages : Changing the standard text for new documents so they'd have
a /Talk page should do it, right? I could also have the parser look for
"/Talk" and append it if necessary in a "top-level" article.
- Conversion to SQL format : The easiest way I can think of is a script that
goes through all articles in the current wikipedia and generates a complete
article text in chronological order (oldest first). After each "version" is
generated, a variant of my script can store it in the DB. That would ensure
identical data. Anyone want to write a "generation" script?
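Such a generation script might look like the sketch below (Python; the directory layout, the store_revision callback, and the handling of old revisions are all assumptions, not real UseModWiki or PHP-script APIs). It mirrors the three-level field splitting from the Perl snippet earlier in this thread:

```python
import os

FS = "\xb3"  # UseModWiki's one-byte field separator

def split_fields(raw, level):
    """Equivalent of Perl's  %H = split(/$FS . $level/, $raw):
    the pieces alternate key, value, key, value..."""
    parts = raw.split(FS + level)
    return dict(zip(parts[0::2], parts[1::2]))

def page_text(record):
    """Unpack one .db record down to its wiki text."""
    page = split_fields(record, "1")
    section = split_fields(page["text_default"], "2")
    text = split_fields(section["data"], "3")
    return text["text"]

def migrate(data_root, store_revision):
    """Walk every *.db page file and hand (title, text) to store_revision,
    an assumed callback that would insert one row into the new database.
    Older revisions live in separate UseMod 'keep' files and are skipped
    here; a full conversion would feed those through first, oldest first."""
    for dirpath, _dirs, files in os.walk(data_root):
        for name in sorted(files):
            if name.endswith(".db"):
                raw = open(os.path.join(dirpath, name), encoding="latin-1").read()
                store_revision(name[:-3], page_text(raw))
```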
- Lame names : How about "Aide-Pikiw" (wikipedia spelled backwards)? That
must be the lamest, for sure? ;)
Magnus