I'm not sure if this is the right place to ask, but I've just written a bot,
and Wikipedia policy says it should be approved before I set it loose.
I recently moved [[Brisbane, Queensland]] to [[Brisbane]], and then started
fixing the links to those pages. I got through about 100 pages and I wasn't
even a third of the way there, and I thought "it would be easier to just
write a bot". So I did.
The bot takes a list of article titles, and runs a Perl search and replace
operation on the HTML-escaped wikitext. The idea is to manually extract the
list of titles from Special:Whatlinkshere, and then replace all links to the
old page with links to the new page.
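A minimal sketch of that replacement step, written in Python rather than the bot's Perl and not taken from the actual Timbot source. The function name and the regular expression are assumptions for illustration, and the sketch ignores HTML escaping and first-letter case differences in titles:

```python
import re

def retarget_links(wikitext, old_title, new_title):
    """Point [[old_title]] and [[old_title|label]] links at new_title."""
    # Match the old title exactly, with an optional piped label.
    pattern = re.compile(r"\[\[" + re.escape(old_title) + r"(\|[^\]]*)?\]\]")

    def swap(match):
        label = match.group(1)
        if label is None:
            # No label given: pipe the old title so the visible text is unchanged.
            label = "|" + old_title
        return "[[" + new_title + label + "]]"

    return pattern.sub(swap, wikitext)
```

Called with "Brisbane, Queensland" as the old title and "Brisbane" as the new one, a bare link keeps its visible text through a piped label, and a link that already has a label keeps that label.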
It will be running from username "Timbot". I've tested it on
test.wikipedia.org. It runs in "trickle" mode, i.e. like this:
while (<ARTICLES>) {
    chomp($_);
    DoPage($_);
    # Sleep for whatever remains of $delay since the previous edit.
    my $elapsed = time - $lastTime;
    if ($elapsed < $delay) {
        sleep($delay - $elapsed);
    }
    $lastTime = time;
}
I'm open to suggestions as to the setting of $delay. If anyone wants to see
the rest of the code, I'll send it by private email.
-- Tim Starling.
_________________________________________________________________
----------
From: "Tim Starling" <ts4294967296(a)hotmail.com>
Date: Sun, 30 Mar 2003 10:56:46 +1000
To: fredbaud(a)ctelco.net
Subject: Re: Code for Timbot
>Please send me the code for Timbot.
No.
_________________________________________________________________
Hello,
A closer look at the maintenance directory in CVS, especially
BuildTables.[inc|sql], showed differences between those two files
(e.g. table user, user_touched).
Could someone tell me what the real tables look like? It would make
porting Wikipedia to e.g. PostgreSQL easier.
Smurf
--
--- Anthill Inside! ---
>But if this is something that will need to be done a lot, then we
>probably should look at doing it directly to the backend database,
>because that will use less server bandwidth and not clutter up the
>recent changes and the article histories.
The kind of task I'm doing is done by hand fairly regularly on Wikipedia,
although perhaps on a smaller scale per job. It would be nice to provide a
proper "Special:" interface to allow contributors (perhaps just sysops) to
be able to do this more easily. However, I'm not holding my breath waiting.
I can always do it by hand (or pretend I'm doing it by hand).
As for the impact of this particular bot -- I think it will be on the scale
of hundreds (not thousands) of edits. I'm aware of the general suspicion
regarding bots around here -- I'll only use it when it's really necessary
(i.e. important enough to have been done by hand).
>And yes, I'd like to see the code before you unleash it.
Done.
-- Tim Starling
_________________________________________________________________
On Saturday 29 March 2003 12:02 am, Lee Daniel Crocker wrote:
> Looks like you've done your homework. To run the bot the first time,
> I'd set the delay to something nice and long like 30 seconds, and run
> it at some off-peak hour, and go to RecentChanges and check its work
> while it runs.
>
> But if this is something that will need to be done a lot, then we
> probably should look at doing it directly to the backend database,
> because that will use less server bandwidth and not clutter up the
> recent changes and the article histories.
>
> And yes, I'd like to see the code before you unleash it.
Eh? Why not just "approve" the bot so that any edits made by user:Timbot do
not show up in RC? Then Tim can run the bot at full speed and not bug
anybody. But a developer should take a look at the code first, as you say.
--mav
Hi Erik
> > In order to demonstrate the capability of our product across an entire
> > encyclopaedia we have fully integrated it with Wikipedia. The advantage for
> > Wikipedia users is that each time they view an article from the
> > encyclopedia they are provided with an automatically generated list of
> > related Wikipedia articles.
>
>Hmmm, how is this information more accurate than that provided by the
>"What links here" feature (which enumerates backlinks to an article)?
InfoWrangler doesn't necessarily provide more "accurate" links to related
material, but it will provide a more thorough set of links without the
author having to physically construct (or maintain) them.
In the case of the Gawain page in Wikipedia there are three entries in the
"What links here" page (as viewed on the main site, we will have a look at
generating the links for our install). They are:
Solar Deity
Prince Valiant
King Arthur
By contrast InfoWrangler has listed 20 links (which is a configured maximum
in this demonstration) that are all on topic.
King Arthur
Sir Galahad
Sir Gawain and the Green Knight
Tintagel, Cornwall, England
Lancelot
Monty Python and the Holy Grail
Arthur, Prince of Wales
Prince Valiant
Talk:Dubious historical resources
Arthur of Britain
Arthurian legend
Arthur
Guinevere
Uther Pendragon
Round Table
Literature cycle
Merlin
Bodmin Moor
Holy Grail
Monty Python and the Holy Grail/Black Knight
The notable inclusion is the third entry "Sir Gawain and the Green
Knight". This doesn't appear in the back links page as neither author
thought to provide a link on the name Sir Gawain. InfoWrangler generated
links are not intended to replace links/back links, but to complement them.
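The gap described above, where a page mentions a name without linking it, can be illustrated with a short Python sketch. This is not InfoWrangler's method, which is not described here. The function names and the three sample page texts are invented for illustration, and a plain substring match stands in for whatever analysis a real tool would perform:

```python
import re

# Capture the target of each [[...]] link, ignoring any label or section part.
LINK = re.compile(r"\[\[([^\]|#]+)")

def backlinks(pages, title):
    """Pages whose wikitext contains an explicit [[title]] link."""
    return sorted(name for name, text in pages.items()
                  if title in (t.strip() for t in LINK.findall(text)))

def mentions(pages, title):
    """Other pages whose text contains the title at all, linked or not."""
    return sorted(name for name, text in pages.items()
                  if name != title and title.lower() in text.lower())

# Invented sample data, not real article text.
pages = {
    "King Arthur": "[[Gawain]] was a knight of the [[Round Table]].",
    "Sir Gawain and the Green Knight": "A poem about Sir Gawain.",
    "Gawain": "Nephew of [[King Arthur]].",
}
```

With this sample data, backlinks(pages, "Gawain") finds only "King Arthur", while mentions(pages, "Gawain") also finds "Sir Gawain and the Green Knight", which names Gawain without linking to the article.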
These links are generated automatically, without the need for any user
input. If I were to add another article related to the topic of Sir Gawain
(and didn't know that the original existed), then InfoWrangler would notice
the association and link the docs up. No human intervention required. No
maintenance issue. No fore-knowledge required.
This issue has been under discussion recently on the
gmane.science.linguistics.wikipedia.misc list in the post "most
excellent!". This post discussed a requirement for a feature that would
show a Wikipedia author pages related to the one they are working on. If
I am reading that post right, I think that we are providing that kind of
feature.
>Incidentally, "What links here" doesn't work on your site, probably
>because the links table has not been generated.
Thanks for pointing that out, we will set it up.
>In any case, do you plan to keep up the site? A Wikipedia mirror would
>certainly be nice.
We do intend to keep the demonstration on-line. Making it a proper mirror
to www.wikipedia.org is an option, but may require that we upgrade the
hardware hosting it (and pay for more bandwidth as well, depending on
traffic). There are also technical issues of synchronisation to be
addressed. We are happy to discuss this further.
Regards - Langdon
----------------------------------------------------
Managing Director
Object Positive
Sydney, Australia
Ph: + 61 2 9659 2344
Fx: + 61 2 9659 2355
http://www.infowrangler.com
I'd like to do some tinkering with the Wikipedia skins and maybe come
up with a couple of different layouts, but the HTML seems to be
scattered across several different PHP files. Is there a recommended
easy way to do this (ideally in a WYSIWYG HTML editor like
Dreamweaver)?
--
--------------------------------
| Sheldon Rampton
| Editor, PR Watch (www.prwatch.org)
| Author of books including:
| Friends In Deed: The Story of US-Nicaragua Sister Cities
| Toxic Sludge Is Good For You
| Mad Cow USA
| Trust Us, We're Experts
--------------------------------
I have made the ANNOUNCE-L list stop advertising itself on
http://www.wikipedia.org/mailman/listinfo. Also, I have added
wikitech-l and wikipedia-l as subscribers to the list.
To my knowledge (and from looking at the archives), this list has
never been used, except to discuss whether or not it should be used
for announcements. This being the case, I was getting tired of having
to discard the spam messages that come in on it. I'm hoping that by
removing the list from those advertised on the listinfo page, I will
reduce the amount of junk that comes in on the list.
This list was originally meant as THE place for wikipedia
announcements (software changes, policy changes, etc.). So, it only
makes sense that all lists that might be interested in such
information should be on the memberlist. Of course, none of this
matters unless people start using it.
I'm happy either way.
--
"Jason C. Richey" <jasonr(a)bomis.com>
Dear Wikipedia Technical Team,
Object Positive is a developer of innovative information management tools
based in Sydney, Australia. We have just launched a new product which uses
a "smart" technology to improve the users' experience when using
information systems like Wikipedia.
Our flagship product is InfoWrangler Server. InfoWrangler Server is an
application which discovers additional information within a set of data. If
a user finds some information they are interested in, InfoWrangler can
assist them in finding more information that is relevant - information
which cannot easily be found by conventional search processes.
In order to demonstrate the capability of our product across an entire
encyclopaedia we have fully integrated it with Wikipedia. The advantage for
Wikipedia users is that each time they view an article from the
encyclopedia they are provided with an automatically generated list of
related Wikipedia articles.
To view a demonstration of InfoWrangler Server working with Wikipedia
please visit the demo page of our web site:
http://www.infowrangler.com/demo.html
We would appreciate any feedback you could provide on this enhancement to
Wikipedia. If you feel that it is of value then we would be happy to
investigate ways in which we may be able to work with Wikipedia moving
forward.
Looking forward to your comments.
Regards - Langdon Stevenson
----------------------------------------------------
Managing Director
Object Positive
Sydney, Australia
Ph: + 61 2 9659 2344
Fx: + 61 2 9659 2355
http://www.infowrangler.com
> Message: 10
> Date: Thu, 27 Mar 2003 11:24:07 -0600
> From: Nick Reinking <nick(a)twoevils.org>
> To: wikitech-l(a)wikipedia.org
> Subject: Re: [Wikitech-l] wikipedia dead again
> Reply-To: wikitech-l(a)wikipedia.org
>
> >
> > Only to measure my expectations for wikipedia ;) But keep in mind that
> > reusing serials is not part of the pg concept, whereas as far as I know
> > MySQL does reuse them. E.g. deleting articles may spend a serial.
>
> Fair enough, I do believe you're right. Still, deleted articles are not
> terribly common, and I think we should design this with maximum
> performance in mind. If we hit some odd 1.8B articles in the future, we
> can always figure out a way to change it to serial8. :)
>
> Speaking of maximum performance... I have a question concerning our
> implementation. What will be more important in the future, clean and
> efficient code, or backwards compatibility? Especially when you
> consider the reverse_timestamp hacks everywhere that won't be needed in
> MySQL4 (or PostgreSQL), I would think that we should just drop MySQL3
> support (especially considering that it is easy to upgrade, and nobody
> will be using it in a year or two). But, maybe I'm crazy - anybody else
> have any comments?
>
> --
> Nick Reinking -- eschewing obfuscation since 1981 -- Minneapolis, MN
>
In accordance with Lee Daniel Crocker's comments, I would say that you
shouldn't really care that much about backward compatibility. As long as
there are some scripts which can handle import/export from older
versions (which you need to write anyway), this isn't really a problem.
Of course, people like me, who use wikipedia as their primary wiki because
it is "so cool", will always know how to install it, as long as the
requirements are written down somewhere.
So: Go for it. Upgrade your code.
Maybe use some kind of database abstraction (like
http://php.weblogs.com/ADODB). Although it might require some more hacks,
you can in theory support nearly any DB backend.
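To make the abstraction idea concrete, here is a minimal sketch in Python rather than PHP. It does not use ADODB and does not reflect the real Wikipedia schema. The class names, the "article" table and its columns are all hypothetical, and SQLite stands in for whichever backend is actually used:

```python
import sqlite3
from abc import ABC, abstractmethod

class ArticleStore(ABC):
    """Backend-neutral interface; the calling code only sees these methods."""

    @abstractmethod
    def save(self, title, text):
        ...

    @abstractmethod
    def load(self, title):
        ...

class SQLiteArticleStore(ArticleStore):
    """One possible backend; another class could wrap MySQL or PostgreSQL."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        # Hypothetical table layout, for illustration only.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS article "
            "(title TEXT PRIMARY KEY, body TEXT)")

    def save(self, title, text):
        self.conn.execute(
            "INSERT OR REPLACE INTO article (title, body) VALUES (?, ?)",
            (title, text))
        self.conn.commit()

    def load(self, title):
        row = self.conn.execute(
            "SELECT body FROM article WHERE title = ?", (title,)).fetchone()
        return row[0] if row else None
```

Backend-specific SQL stays inside one class per database, so supporting another backend means adding a class rather than touching every query in the application.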
Cheers
Leo