Hello,
I am known as Hashar on IRC and fr.wikipedia.
I am currently developing a PHP script to help me update interwiki links (also known as interlanguage links). I am posting here so that the community knows what I am doing and, mainly, to prevent any server crash that my script might cause.
Here is what I am doing:
The script runs on my local workstation, using PHP and my cable connection in France. It works as follows:
1/ ask the user for an article to check
2/ retrieve the article and parse it for interwiki links
3/ retrieve one of the linked articles and parse it for interwiki links
4/ repeat 3 until every article linked from 2 has been parsed
5/ display, for each wikipedia, the number of interwiki links and a list of them
It doesn't handle redirects yet.
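The five steps above can be sketched roughly as follows. This is only an illustration in Python (the language of the pywikipediabot discussed later in this thread), not the PHP script itself. The function names, the regular expression and the `fetch` callback are all assumptions made for the example, and the sketch ignores redirects just as the script does.

```python
import re

# Language codes taken from the script output in this post (subset for brevity).
LANGS = {"da", "de", "en", "es", "eo", "fr", "ja", "nl", "pl", "ro", "sv", "zh"}

# Matches [[xx:Title]] where xx is a 2-3 letter lowercase prefix.
INTERWIKI_RE = re.compile(r"\[\[([a-z]{2,3}):([^\]|]+)\]\]")


def parse_interwiki(wikitext):
    """Return {lang: title} for every interwiki link found in wikitext."""
    links = {}
    for lang, title in INTERWIKI_RE.findall(wikitext):
        if lang in LANGS:
            links[lang] = title.strip()
    return links


def collect_interwiki(start_lang, title, fetch):
    """Steps 2-4: parse the start article, then each article it links to.

    fetch(lang, title) is a caller-supplied function returning raw wikitext;
    it is a placeholder here, not part of any real library.
    """
    results = {start_lang: parse_interwiki(fetch(start_lang, title))}
    for lang, linked_title in results[start_lang].items():
        results[lang] = parse_interwiki(fetch(lang, linked_title))
    return results


def report(results):
    """Step 5: one line per wikipedia with the link count and the links."""
    lines = []
    for lang, links in results.items():
        listing = " ".join(f"[[{l}:{t}]]" for l, t in links.items())
        lines.append(f"{lang} ({len(links)} interwiki): {listing}".rstrip())
    return lines
```

Passing `fetch` in as a parameter keeps the parsing logic separate from the network access, so the same code can be tried on saved page text before it is pointed at a live server.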
Here is the output of a test I ran this morning (while Americans are asleep and Europeans are not yet awake, so as to minimize risks). I was browsing the site while it was running and didn't notice any slowdown.
----- SCRIPT OUTPUT -----
Enter the name of an article on the fr wikipédia.
Wikilinks will be checked on (ar|ms|bs|cs|cy|da|de|el|en|es|eo|fr|fy|hi|hr|he|ko|hu|ml|nah|nl|ja|pl|ro|ru|sq|sk|sl|sr|sv|tr|zh) wikipedias.
Espagne [OK]
Current wikilink(s) for this article:
fr (11 interwiki): [[da:Spanien]] [[de:Spanien]] [[en:Spain]] [[es:España]] [[eo:Hispanio]] [[nl:Spanje]] [[ja:スペイン]] [[pl:Hiszpania]] [[ro:Spania]] [[sv:Spanien]] [[zh:西班牙]]
Number and wikilinks on the linked wikis:
----------------------------------------------------------------------
da (11 interwiki): [[en:Spain]] [[de:Spanien]] [[eo:Hispanio]] [[es:España]] [[fr:Espagne]] [[ja:スペイン]] [[nl:Spanje]] [[pl:Hiszpania]] [[ro:Spania]] [[sv:Spanien]] [[zh:西班牙]]
de (11 interwiki): [[da:Spanien]] [[en:Spain]] [[eo:Hispanio]] [[es:España]] [[fr:Espagne]] [[ja:スペイン]] [[nl:Spanje]] [[pl:Hiszpania]] [[ro:Spania]] [[sv:Spanien]] [[zh:%E8%A5%BF%E7%8F%AD%E7%89%99]]
es (10 interwiki): [[da:Spanien]] [[de:Spanien]] [[en:Spain]] [[eo:Hispanio]] [[fr:Espagne]] [[ja:スペイン]] [[nl:Spanje]] [[pl:Hiszpania]] [[sv:Spanien]] [[zh:西班牙]]
eo (8 interwiki): [[de:Spanien]] [[en:Spain]] [[fr:Espagne]] [[es:España]] [[ja:スペイン]] [[nl:Spanje]] [[pl:Hiszpania]] [[ro:Spania]]
ja (0 interwiki): doesn't got any interwiki link.
----- END SCRIPT OUTPUT -----
NB: I removed a lot of entries to make things clearer.
From this output we can see that there are at most 11 interwiki links. The eo wiki is missing 3 of them, the es wiki is missing one, and ja doesn't seem to be linked at all (must be a script bug :p ).
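The comparison made by eye above could also be automated. The sketch below is an assumption about how that might look, not part of the script: it takes the language codes from the output shown in this post and reports, for each wiki, which languages it does not link to. The helper name `missing_links` is made up for the example.

```python
def missing_links(linked):
    """For each wiki, list the languages it should link to but does not.

    linked maps a language code to the language codes its article links to.
    The expected set for a wiki is every language seen anywhere in linked,
    minus the wiki itself.
    """
    all_langs = set(linked)
    for langs in linked.values():
        all_langs.update(langs)
    return {
        lang: sorted(all_langs - set(langs) - {lang})
        for lang, langs in linked.items()
    }


# Language codes copied from the script output in this post.
linked = {
    "fr": "da de en es eo nl ja pl ro sv zh".split(),
    "da": "en de eo es fr ja nl pl ro sv zh".split(),
    "de": "da en eo es fr ja nl pl ro sv zh".split(),
    "es": "da de en eo fr ja nl pl sv zh".split(),
    "eo": "de en fr es ja nl pl ro".split(),
    "ja": [],
}

missing = missing_links(linked)
print(missing["es"])  # the one link es lacks
print(missing["eo"])  # the three links eo lacks
```

On this data the result agrees with the counts given above: one missing link for es and three for eo.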
Before I put more work into this idea, what do you think of it? Do you find it useful? Should it be shared with operators on the other wikipedias?
Follow up either here or on my talk page.
:0°
-- Ashar Voultoiz [[Hashar]] @ fr.wikipedia.org
Ashar Voultoiz wrote:
Hello,
I am known as Hashar on irc and fr.wikipedia.
I am currently developing a PHP script to help me update interwiki links (also known as interlanguage links). I am posting here so that the community knows what I am doing and, mainly, to prevent any server crash that my script might cause.
What you describe is one of the robots implemented in the pywikipediabot, available via sourceforge.net: http://sourceforge.net/projects/pywikipediabot
That software does handle redirects and a lot of other wikipedia-intricacies. If you have any other wishes, we can have a look at them. You're welcome to join the team and bring in your experience.
Regards,
Rob
On Sun, 02 Nov 2003 16:08:12 +0100, Rob Hooft wrote:
What you describe is one of the robots implemented in the pywikipediabot, available via sourceforge.net: http://sourceforge.net/projects/pywikipediabot
That software does handle redirects and a lot of other wikipedia-intricacies. If you have any other wishes, we can have a look at them. You're welcome to join the team and bring in your experience.
Regards,
Rob
Well, I only started my script last night :0) I guess it is time for me to install Python and learn it :)
I will come back to the list once I have practiced with the Python bot.
cheers,
On Sun, 2 Nov 2003, Rob Hooft wrote:
What you describe is one of the robots implemented in the pywikipediabot, available via sourceforge.net: http://sourceforge.net/projects/pywikipediabot
That software does handle redirects and a lot of other wikipedia-intricacies. If you have any other wishes, we can have a look at them. You're welcome to join the team and bring in your experience.
Having some experience with Rob's bot (and done some minor coding on it too), I can say that I can recommend it. It is currently being used on nl: (by Rob and me) and on da: (by Christian List), while I have also made a run of it on fy:. For other languages, I would like to hear which languages would allow the usage of such a bot. If wanted, I can do a trial run on a smaller number of pages, so people can see whether it is running correctly.
The best thing would be to have an operator (or more) for each language, but if nobody is willing & able, I would not mind doing several other languages as well.
There are also some more bots there, among them a bot that is used to help in doing disambiguation (this one I have been running on en: under username Robbot). It also contains a library (wikipedia.py) so it can be used for programming other automatic or machine-aided edits on Wikipedia as well.
Andre Engels
On Mon, 3 Nov 2003 11:25:56 +0100 (CET), Andre Engels wrote:
Having some experience with Rob's bot (and done some minor coding on it too), I can say that I can recommend it. It is currently being used on nl: (by Rob and me) and on da: (by Christian List), while I have also made a run of it on fy:. For other languages, I would like to hear which languages would allow the usage of such a bot. If wanted, I can do a trial run on a smaller number of pages, so people can see whether it is running correctly.
The best thing would be to have an operator (or more) for each language, but if nobody is willing & able, I would not mind doing several other languages as well.
There are also some more bots there, among them a bot that is used to help in doing disambiguation (this one I have been running on en: under username Robbot). It also contains a library (wikipedia.py) so it can be used for programming other automatic or machine-aided edits on Wikipedia as well.
Andre Engels
Hello Andre,
I am going to discuss that bot with the fr: sysops. Currently the robots.txt doesn't allow Rob's bot. I think we will run some tests :)
Maybe we can set up a page on meta to organize and discuss this?
cheers,
On Tue, 4 Nov 2003, Ashar Voultoiz wrote:
I am going to discuss that bot with the fr: sysops. Currently the robots.txt doesn't allow Rob's bot. I think we will run some tests :)
Unless it has been changed since Saturday, the bot actually is allowed - it has loaded many French pages to check the Dutch ones.
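Whether a given robots.txt admits a particular bot can be checked offline with Python's standard `urllib.robotparser`. The sketch below is only an illustration: the rules in `EXAMPLE_ROBOTS_TXT` and the name `BadBot` are invented for the example and are not the real fr.wikipedia.org file. Note also that robots.txt is advisory, so a server can still refuse a request by other means.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, made up for illustration; not the real site file.
EXAMPLE_ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /w/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


article = "http://fr.wikipedia.org/wiki/Espagne"
script = "http://fr.wikipedia.org/w/wiki.phtml"

print(is_allowed(EXAMPLE_ROBOTS_TXT, "BadBot", article))
print(is_allowed(EXAMPLE_ROBOTS_TXT, "HasharBot", article))
print(is_allowed(EXAMPLE_ROBOTS_TXT, "HasharBot", script))
```

Parsing the text directly, rather than letting the parser download it, makes it possible to test a proposed rule set before it is deployed.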
Maybe we can set up a page on meta to organize and discuss this?
That's not a bad idea, although I would have to change my habits to use it - currently I hardly visit meta at all (only when I have to put something on Brion's todo list).
Andre Engels
On Tue, 4 Nov 2003 10:53:38 +0100 (CET), Andre Engels wrote:
Unless it has been changed since Saturday, the bot actually is allowed - it has loaded many French pages to check the Dutch ones.
Hello,
Well, here is my latest attempt:
In wikipedia.py I set up the user agent like this:
conn.putheader("User-Agent", "HasharBot (pywikipedia.py)")
I then execute login.py:
------------
Logging in to fr.wikipedia.org
username: HasharBot
password: xxxxxx
403 Forbidden
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>403 Forbidden</TITLE>
</HEAD><BODY>
<H1>Forbidden</H1>
You don't have permission to access /w/wiki.phtml on this server.<P>
<HR>
<ADDRESS>Apache/1.3.28 Server at fr.wikipedia.org Port 80</ADDRESS>
</BODY></HTML>
Hm. Did something go wrong? Wrong password?
----
The URL and parameters seem OK. I also tried with a Mozilla user agent but got the same 403 :p
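One way to make sure every request carries the header is to build all requests through a single helper. The sketch below uses Python's standard `urllib` rather than the bot's own connection code, so it is an assumption about the approach, not a copy of wikipedia.py or login.py. The form field names are hypothetical, and the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

USER_AGENT = "HasharBot (pywikipedia.py)"  # value quoted in the message above


def build_request(url, fields=None, user_agent=USER_AGENT):
    """Build a request that always carries an explicit User-Agent header.

    fields, if given, is sent as a form-encoded POST body.
    """
    data = urlencode(fields).encode("utf-8") if fields else None
    return Request(url, data=data, headers={"User-Agent": user_agent})


# Hypothetical login form fields; the real parameter names may differ.
req = build_request(
    "http://fr.wikipedia.org/w/wiki.phtml",
    {"name": "HasharBot", "password": "xxxxxx"},
)
print(req.get_method(), req.get_header("User-agent"))
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`. Any script that builds its own connection instead of going through such a helper will not send the header.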
On Wed, 5 Nov 2003 01:44:16 +0100, Ashar Voultoiz wrote:
Hello,
Well, here is my latest attempt:
In wikipedia.py I set up the user agent like this:
conn.putheader("User-Agent", "HasharBot (pywikipedia.py)")
I then execute login.py:
Logging in to fr.wikipedia.org
username: HasharBot
password: xxxxxx
403 Forbidden
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>403 Forbidden</TITLE>
</HEAD><BODY>
<H1>Forbidden</H1>
You don't have permission to access /w/wiki.phtml on this server.<P>
<HR>
<ADDRESS>Apache/1.3.28 Server at fr.wikipedia.org Port 80</ADDRESS>
</BODY></HTML>
Hm. Did something go wrong? Wrong password?
----
The URL and parameters seem OK. I also tried with a Mozilla user agent but got the same 403 :p
Hello again,
I got it working correctly. I will post fr:-specific usage notes on the sourceforge.net tracker (features / bugs) :0)
The bot is registered as HasharBot on fr.wikipedia. It is currently being tested and will then need approval from the fr: community.
cheers,
On Wed, 5 Nov 2003, Ashar Voultoiz wrote:
Hello,
Well, here is my latest attempt:
In wikipedia.py I set up the user agent like this:
conn.putheader("User-Agent", "HasharBot (pywikipedia.py)")
I then execute login.py:
(snip)
The problem is that login.py does not use the connection facilities of wikipedia.py, so it did not send the user agent you set there. Christian List has now changed login.py so that it does send a User-Agent header; please download login.py again and retry.
Andre Engels
Andre Engels wrote:
The problem is that login.py does not use the connection facilities of wikipedia.py, so it did not send the user agent you set there. Christian List has now changed login.py so that it does send a User-Agent header; please download login.py again and retry.
Andre Engels
I advise anyone using the bot to subscribe to the pywikipediabot mailing list on sourceforge:
http://sourceforge.net/mail/?group_id=93107
Keeping discussions about the bot on that forum keeps everyone updated without "bot"hering the true wikimedia developers.
Regards,
Rob Hooft
On Tue, 4 Nov 2003 10:53:38 +0100 (CET), Andre Engels wrote:
On Tue, 4 Nov 2003, Ashar Voultoiz wrote:
<snip>
Maybe we can set up a page on meta to organize and discuss this?
That's not a bad idea, although I would have to change my habits to use it - currently I hardly visit meta at all (only when I have to put something on Brion's todo list).
Andre Engels
Follow-up to: http://meta.wikipedia.org/wiki/Interwiki_bot
:0)
wikitech-l@lists.wikimedia.org