Every Monday I publish a new weekly botting list from my Interwiki link
checker tool.
After publishing, I run the list with my bot, FlaBot. At the end of the week
I have a tool to find out whether an article is now linked in both
languages in the wiki.
If it is not, perhaps because my bot can't handle it in autonomous mode, I
will post a list with all the still-missing entries from my database.
Here is the batch list for botting:
python interwiki.py -warnfile:warning_bot_rebot_need.log -lang:af
python interwiki.py -warnfile:warning_bot_rebot_need.log -lang:es
python interwiki.py -warnfile:warning_bot_rebot_need.log -lang:fi
python interwiki.py -warnfile:warning_bot_rebot_need.log -lang:tr
python interwiki.py -warnfile:warning_bot_rebot_need.log -lang:ca
python interwiki.py -warnfile:warning_bot_rebot_need.log -lang:da
python interwiki.py -warnfile:warning_bot_rebot_need.log -lang:de
python interwiki.py -warnfile:warning_bot_rebot_need.log -lang:nds
python interwiki.py -warnfile:warning_bot_rebot_need.log -lang:en
python interwiki.py -warnfile:warning_bot_rebot_need.log -lang:nl
python interwiki.py -warnfile:warning_bot_rebot_need.log -lang:no
python interwiki.py -warnfile:warning_bot_rebot_need.log -lang:sv
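For convenience, the same batch could also be driven from a small Python
loop; this is just a sketch equivalent to the commands above (same warnfile,
same language codes):

import subprocess

# Run interwiki.py once per language against the shared warnfile.
LANGS = ['af', 'es', 'fi', 'tr', 'ca', 'da', 'de', 'nds', 'en', 'nl', 'no', 'sv']

for lang in LANGS:
    subprocess.call(['python', 'interwiki.py',
                     '-warnfile:warning_bot_rebot_need.log',
                     '-lang:%s' % lang])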
You can get the data here:
http://www.flacus.de/wikipedia/Interwiki-Link-Checker/bot-reb.php
At the moment it is week 49. My bot is processing week 48.
The list above has entries up to week 47.
See you next week ;-)
--
[[:de:Benutzer:Flacus]][[:de:Benutzer:FlaBot]]
http://www.flacus.de/wikipedia/Interwiki-Link-Checker/
Hi,
the main directory keeps filling up with scripts. To keep some order, I
would suggest moving some older scripts to an archive directory. That's
why I want to know which of the following scripts are still in use. I
don't want to offend anyone; if you think one of the scripts in the list
is still in use or under development, please simply say so.
are-identical.py
http://tools.wikimedia.de/~flacus/IWLC/start.php works much better
brackethttp.py
I don't think anyone still uses it
check_extern.py
Replaced by weblinkchecker.py
copy_table.py
Too much work to maintain it
editarticle.py
No longer maintained, but we should re-use parts of it for other
scripts.
extract_names.py
Doesn't write the file format expected by most scripts.
find.py
Never worked; we might also consider deleting it.
getimages.py
imagetransfer.py can do everything this can do
pagefromfile.py
Should either be updated or moved to the archive
saveHTML.py
No longer maintained and maybe also no longer used
sqldump.py
All scripts have been changed and now use xmldump.py instead
translator.py
Part of copy_table.py
us-states.py
Can be archived, unless someone is still using it
vertexgen.py
Needs comments, also in interwiki.py. At the moment its purpose is
unclear.
WdT.py and WdTXMLParser.py
No longer used/maintained.
windows_chars.py
I don't think there are any ISO 8859-1 wikis left anywhere, are
there?
Daniel
The getReferences() function needs to be rewritten because of the recent
change to the "What links here" page: entries are now marked with
'(inclusion)' when they are transcluded as templates, and the current regex
counts those as redirects.
I am currently in the process of rewriting this function, but in case anyone
wants to beat me to it, I suggest using the following all-encompassing
regular expression:
re.compile('<li><a href=".*?" title=".*?">(.*?)</a> *\(*(inclusion|redirect page)*\)*.*?</li>')
group(1) will give you the title, and group(2) will be either:
'', 'inclusion', or 'redirect page'
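For what it's worth, a small sketch of how the pattern could be applied to
the HTML of a "What links here" page (the function and variable names here
are illustrative, not existing pywikipedia code):

import re

references_pattern = re.compile(
    r'<li><a href=".*?" title=".*?">(.*?)</a> *\(*(inclusion|redirect page)*\)*.*?</li>')

def parse_references(html):
    # Yield (title, kind) tuples; kind is '', 'inclusion' or 'redirect page'.
    for match in references_pattern.finditer(html):
        title = match.group(1)
        kind = match.group(2) or ''  # group(2) is None when neither marker matched
        yield title, kind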
--
Jason Y. Lee
AKA AllyUnion
[[Wikipedia:Sandbox]] does not exist in projects other than
Wikipedia.
** 1 ** with [[Wikipedia:Sandbox]]
======Post-processing [[pt:Alexis de Tocqueville]]======
Updating links on page [[he:?????? ??-??????]].
Changes to be made: Adding: it
+ [[it:Alexis de Tocqueville]]
NOTE: Updating live wiki...
Getting a page to check if we're logged in on wikiquote:he
Getting page to get a token.
Getting page [[he:Wikipedia:Sandbox]]
Sleeping for 8.5 seconds
Retrieving MediaWiki messages for wikiquote:he
Parsing MediaWiki messages
WARNING: No text area found on
he.wikiquote.org/w/index.php?title=Wikipedia%3ASandbox&action=edit.
Maybe the server is down. Retrying in 1 minutes...
Traceback (most recent call last):
File "interwiki.py", line 1330, in ?
bot.run()
File "interwiki.py", line 1114, in run
self.queryStep()
File "interwiki.py", line 1093, in queryStep
subj.finish(self)
File "interwiki.py", line 762, in finish
if self.replaceLinks(page, new, sa):
File "interwiki.py", line 858, in replaceLinks
status, reason, data = pl.put(newtext, comment = wikipedia.translate(pl.site().lang, msg)[0] + mods)
File "C:\Python24\wikipedia.py", line 677, in put
return self.putPage(newtext, comment, watchArticle, minorEdit, newPage, self.site().getToken(sysop = sysop), sysop = sysop)
File "C:\Python24\wikipedia.py", line 2559, in getToken
Page(self, "Wikipedia:Sandbox").get(force = True, sysop = sysop)
File "C:\Python24\wikipedia.py", line 351, in get
self._contents, self._isWatched, self.editRestriction = self.getEditPage(get_redirect = get_redirect, throttle = throttle, sysop = sysop)
File "C:\Python24\wikipedia.py", line 448, in getEditPage
i2 = re.search('</textarea>', text).start()
AttributeError: 'NoneType' object has no attribute 'start'
** 2 ** with Non-existing page
======Post-processing [[pt:Alexis de Tocqueville]]======
Updating links on page [[he:?????? ??-??????]].
Changes to be made: Adding: it
+ [[it:Alexis de Tocqueville]]
NOTE: Updating live wiki...
Getting a page to check if we're logged in on wikiquote:he
Getting page to get a token.
Getting page [[he:Non-existing page]]
Changing page [[he:?????? ??-??????]]
Updating links on page [[en:Alexis de Tocqueville]].
Changes to be made: Adding: it
+ [[it:Alexis de Tocqueville]]
NOTE: Performing a recursive query first to save time....
NOTE: Nothing left to do 2
NOTE: Updating live wiki...
Getting a page to check if we're logged in on wikiquote:en
...
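For what it's worth, the first traceback suggests where a fix could go:
getToken() in wikipedia.py (line 2559 above) fetches a hard-coded
[[Wikipedia:Sandbox]] page, which simply does not exist on non-Wikipedia
projects such as Wikiquote. A hypothetical sketch, fetching a page that
exists on every project instead (the names mediawiki_message and _token
are assumptions, not the actual attributes):

def getToken(self, sysop=False):
    # The main page exists on every project, unlike a per-project sandbox;
    # its title is available from the MediaWiki messages that the bot
    # already retrieves ("Retrieving MediaWiki messages" in the log above).
    mainpage = self.mediawiki_message('mainpage')
    Page(self, mainpage).get(force=True, sysop=sysop)
    return self._token  # assumed to be stashed during the page fetch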
Leonardo Gregianin
Q) Bot usage help! (admin) bot usage questions:
1. user:A tagged *{{no license|~~~~~}}* on "image:sample.jpg", which was
uploaded by user:B.
2. user:Cbot tags *{{subst:Image copyright|Image:sample.jpg}} --~~~~* on
user_talk:B, and automatically changes *{{no license|~~~~~}}* to *{{no
license notified by bot|~~~~~}}*. What is the bot command?
3. user:Cbot automatically inserts a speedy deletion tag on
Image:sample.jpg after 7 days. What is the bot command?
4. user:Dbot (admin) automatically deletes all "image:..." pages in the
relevant speedy deletion category. What is the bot command?
--WonYong <http://en.wikipedia.org/wiki/User:WonYong> 13:41, 30 December
2005 (UTC)
Source: 'http://en.wikipedia.org/wiki/Wikipedia:Village_pump_%28technical%29'
--
WonYong
I've been wondering about some kind of parser to add to the Python Wikipedia
project, such that it knows how to handle transwiki links as well as
trans-interwiki links. This email contains my thoughts on the matter;
please feel free to correct me, add to it, or elaborate further.
I felt it necessary to send this email so that we can all agree on several
points, which will help me or someone else pin down the problem
properly.
----
I've come to realize some assumptions:
* Each wikilink will typically not use more than 2-3 colons; anything
above that is usually redundant.
Example: [[q:fr:w:en:Test]] on the English Wikipedia does process properly,
and following such a link will take you to the English Wikiquote then to
the French Wikiquote then back to the English Wikipedia.
Problem: Is there a simple and easy way to test for a namespace?
Discussion: Not easily, I think (see the sketch after this list).
* The last part of the wikilink is ALWAYS the article title.
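On the namespace question above, one possible test, assuming the Site
object can report the namespace names its family file knows (the exact
accessors here are assumptions):

def looks_like_namespace(site, prefix):
    # Return True if prefix matches a known namespace name on this site.
    prefix = prefix.strip().lower()
    for ns_number in range(-2, 16):  # the standard MediaWiki namespaces
        name = site.namespace(ns_number)
        if name and name.lower() == prefix:
            return True
    return False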
----
Most language codes are two characters long, but there are some exceptions:
als, ang, arc, ast, cho, chr, chy, csb, fur, haw, jbo, mus, nah, nds, roa-rup,
simple, tlh, tokipona, tpi, tum, zh-min-nan, zh-cn, zh-tw, minnan,
& zh-cfr
I noticed in the family file there is another one: bug
There may be more on that list that I'm unaware of... It's likely safe to
assume the same language codes apply across the other project families,
even if those wikis may not exist yet. If they don't, they will likely
exist in the future.
----
Okay, here is a list of cases, based on colon count:
1 colon:
1) Article namespace with no leading character in front
2) Interwiki link
3) Namespace preceding the colon
2 colons:
1a) Interwiki link + Namespace (untranslated in English)
1b) Interwiki link + Namespace (translated in the Interwiki link's language)
2) Transwiki link + Interwiki link
3) Transwiki link + Namespace (may have to consider different names for
the "Project" namespace)
3 colons:
1) Transwiki link + Interwiki link + Namespace (translated/untranslated)
2) Interwiki link + Transwiki link + Interwiki link (stupid, but possible)
3) Interwiki link + Transwiki link + Namespace link (stupid, but possible)
4) Transwiki link + Interwiki link + Interwiki link (stupid, but possible)
5) Transwiki link X3 (stupid, but possible)
6) Transwiki link + Transwiki link + Namespace link (stupid, but possible)
4+ colons:
Any combination above
Possible solution(s):
* Create a function specifically to determine transwiki links
* Create a function specifically to determine interwiki links based on
transwiki link information
* Create a function specifically to determine namespace links based on
transwiki and interwiki link information
* Develop a class that uses the information from:
http://meta.wikimedia.org/wiki/Interwiki_map
* Develop a class for conversion only for the current available families
-- ignore the rest
----
If we split anything between '[[' and ']]' using the ':' as the separator,
we know the following to be true:
If the list has size 1, then it has no interwiki links, no category links,
and no transwiki links. We also know that [0] is the name of the article.
No matter what the situation of the split, index of -1 will always point to
the name of the article.
Now, the matter is: In what order should we proceed?
Should we scan forwards or backwards?
In what order should we look for links?
1) Transwiki, interwiki, namespace
2) namespace, interwiki, transwiki
3) interwiki, namespace, transwiki
etc.
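To make the ordering question concrete, here is a minimal sketch of one
possible classifier that scans the colon-separated parts from the left.
The predicates is_family, is_language_code and is_namespace are
hypothetical; real code would consult the family files and the namespace
tables:

def classify_link(link_text):
    # Split the '[[...]]' contents on ':' and label each prefix part.
    parts = link_text.strip('[]').split(':')
    title = parts[-1]  # the last part is always the article title
    labels = []
    for index in range(len(parts) - 1):
        part = parts[index]
        if is_family(part):
            labels.append(('transwiki', part))
        elif is_language_code(part):
            labels.append(('interwiki', part))
        elif is_namespace(part):
            labels.append(('namespace', part))
        else:
            # Unknown prefix: everything from here on is the title itself.
            title = ':'.join(parts[index:])
            break
    return labels, title

For [[q:fr:w:en:Test]] this would yield the label list [('transwiki', 'q'),
('interwiki', 'fr'), ('transwiki', 'w'), ('interwiki', 'en')] and the
title 'Test'.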
----
One thing is certain: the regular expression for this will be extremely
long. If we do manage to resolve this, then our parser for wikilinks should
be able to handle anything we throw at it, and it would make any related
bugs regarding linkedPages() and getRedirectPage() easier to fix. One thing
I have a problem with is that getRedirectPage() returns a string object
rather than a Page object. But it is obvious why it has to return a string:
the target could be any of the situations I've described above.
The principal reason I'm concerned about this matter is that I'm in the
process of developing a notification bot. Unfortunately, I've run into
several users who have, in their wisdom, decided to redirect their user
pages to either a different project or a different language, and sometimes
a combination of both. So I've been thinking of a way to properly parse the
information from getRedirectPage() so that I can pass the correct
parameters to the Site class.
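Combining the classifier sketched earlier with the Site class, the
notification-bot case could hypothetically look like this (getSite-style
construction and the family.name attribute are assumptions here):

def resolve_redirect_target(current_site, target):
    # Turn a cross-project redirect target into a (site, title) pair.
    labels, title = classify_link('[[%s]]' % target)
    lang = current_site.lang
    family = current_site.family.name
    for kind, value in labels:
        if kind == 'interwiki':
            lang = value  # the last language prefix wins
        elif kind == 'transwiki':
            family = value  # the last project prefix wins
    return wikipedia.getSite(lang, family), title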
Thoughts, anyone?
--
Jason Y. Lee
AKA AllyUnion
I'd like to request a -namespace option for the interwiki.py script. I see
that the allpages() function in wikipedia.py supports this, but I don't
know Python well enough (or at all) to add it myself. This would be
useful, e.g., to add/update interwiki links to/on templates and
categories.
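Not an implementation, just a sketch of what the option could look like:
parse "-namespace:N" from the command line and filter the generated pages
by namespace number before interwiki processing (argument handling is
simplified here; Page.namespace() follows pywikipedia conventions):

import sys

namespace = None
for arg in sys.argv[1:]:
    if arg.startswith('-namespace:'):
        namespace = int(arg[len('-namespace:'):])

def in_namespace(pages, ns):
    # Yield only pages in namespace ns; ns of None means all pages.
    for page in pages:
        if ns is None or page.namespace() == ns:
            yield page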
At the Icelandic Wikipedia we put <noinclude> after the first paragraph of
an article and </noinclude> at the very end of it, so that we can put
{{:Article}} on the article's main category page to get a summary of the
topic. However, interwiki.py converts something like:
"
[[en:Topic]]
</noinclude>
"
into
"
</noinclude>
[[fr:Topico]]
[[en:Topic]]
...
"
which means that the interwikis get transcluded onto the category page.
Is there a way to make the bot place the added interwiki links where it
found the first one, instead of putting them all at the bottom?
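A rough sketch of the requested behaviour (not what interwiki.py does
today): insert the new language links right after the first existing
interwiki link instead of appending them at the bottom. The pattern below
is naive; real code would have to distinguish interwiki prefixes from
other namespaces such as categories:

import re

def insert_interwiki(text, new_links):
    first = re.search(r'\[\[[a-z\-]+:[^\]]+\]\]', text)
    if first is None:
        # No existing interwiki link: fall back to appending at the end.
        return text + '\n' + '\n'.join(new_links)
    pos = first.end()
    return text[:pos] + '\n' + '\n'.join(new_links) + text[pos:]

In the example above this would leave the new [[fr:Topico]] link before
the closing </noinclude> tag, next to [[en:Topic]].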