Happy Monday,
There are strange people who make links like this (kind of URL-encoded?):
[[Második világháború#Partrasz.C3.A1ll.C3.A1s Szic.C3.ADli.C3.A1ban
.28Huskey hadm.C5.B1velet.29|Huskey hadműveletben]]
So the section title must have been copied from the URL.
Do we have a ready tool to fix these?
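If no ready-made tool turns up, a small helper could undo this encoding in a bot run. MediaWiki's legacy section anchors escape bytes as ".XX" rather than "%XX", so one sketch (the function name is my own, and it will misfire on titles that legitimately contain a dot followed by two uppercase hex digits):

```python
import re
from urllib.parse import unquote

def decode_section_anchor(anchor):
    """Decode a MediaWiki legacy ('.XX'-escaped) section anchor."""
    # Turn '.C3.A1'-style escapes into '%C3%A1', then percent-decode as UTF-8.
    percent = re.sub(r'\.([0-9A-F]{2})', r'%\1', anchor)
    return unquote(percent, encoding='utf-8')

print(decode_section_anchor(
    'Partrasz.C3.A1ll.C3.A1s Szic.C3.ADli.C3.A1ban .28Huskey hadm.C5.B1velet.29'))
# → Partraszállás Szicíliában (Huskey hadművelet)
```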
--
Bináris
Forgot to "send to list".
-------- Forwarded Message --------
Subject: Re: [pywikibot] Which username for Wikidata test ?
Date: Fri, 30 Jun 2017 00:30:56 +0900
From: Yongmin H. <lists(a)revi.pe.kr>
Organization: Wikimedia
To: Jean-Baptiste Pressac <Jean-Baptiste.Pressac(a)univ-brest.fr>
Hi,
Are you sure you have an account on test.wikidata.org? I cannot find it in
[[Special:ListUsers]][1]. Even if you have SUL, you need to visit the
wiki once for the account to be auto-created.
[1]:
https://test.wikidata.org/w/index.php?title=Special%3AListUsers&username=Tr…
Thanks,
PS: Mailing list archive is available here.
https://lists.wikimedia.org/pipermail/pywikibot/
On 2017-06-30 00:15, Jean-Baptiste Pressac wrote:
> Hello,
>
> I created a /user-config.py/ with /generate_user_files.py/ to use
> Wikidata test (mylang = 'test'). But when I try to log in, I get this
> error message:
>
> pywikibot.exceptions.NoUsername: Username 'trucmuche' does not exist on
> wikidata:test
>
> Where trucmuche is my usual account on Wikidata. Is there a special
> username for Wikidata test?
>
> What is the URL of Wikidata test?
>
> Thanks,
>
> PS: Is there a way to search the forum archives?
>
> --
> Jean-Baptiste Pressac
>
> Database processing and analysis
> Production and distribution of digital corpora
>
> Centre de Recherche Bretonne et Celtique
> Unité mixte de service (UMS) 3554
> 20 rue Duquesne
> CS 93837
> 29238 Brest cedex 3
>
> tel : +33 (0)2 98 01 68 95
> fax : +33 (0)2 98 01 63 93
>
>
>
> _______________________________________________
> pywikibot mailing list
> pywikibot(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/pywikibot
>
--
Yongmin Hong
https://wp.revi.blog
Please note that this is a list-only address and any non-mailing-list
mails will be treated as spam.
Please use https://encrypt.to/0x947f156f16250de39788c3c35b625da5beff197a
Hello,
I created a /user-config.py/ with /generate_user_files.py/ to use
Wikidata test (mylang = 'test'). But when I try to log in, I get this
error message:
pywikibot.exceptions.NoUsername: Username 'trucmuche' does not exist on
wikidata:test
Where trucmuche is my usual account on Wikidata. Is there a special
username for Wikidata test?
What is the URL of Wikidata test?
Thanks,
PS: Is there a way to search the forum archives?
--
Jean-Baptiste Pressac
Database processing and analysis
Production and distribution of digital corpora
Centre de Recherche Bretonne et Celtique
Unité mixte de service (UMS) 3554
20 rue Duquesne
CS 93837
29238 Brest cedex 3
tel: +33 (0)2 98 01 68 95
fax: +33 (0)2 98 01 63 93
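For reference, pointing pywikibot at test.wikidata.org ("Wikidata test") is normally a matter of the family/mylang pair in user-config.py; a minimal sketch, where the username is just a placeholder for whatever account you registered:

```python
# user-config.py (sketch)
family = 'wikidata'
mylang = 'test'  # selects test.wikidata.org within the wikidata family
usernames['wikidata']['test'] = 'trucmuche'
```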
Thank you very much for your answer, Merlijn. I don't have plans to switch
to Python 3 in the near future. :)
Your second and third solutions work fine. I'm sticking with the third
solution for now. Thanks again for your help.
Regards,
Dan
Saturday, 24 June 2017 18:43:46, Merlijn van Deen (valhallasw) <valhallasw(a)arctus.nl> wrote:
Hi Dan,
On 23 June 2017 at 19:41, Dan <dan15i(a)yahoo.com> wrote:
Hi.
Does PWB have issues with decoding URL strings?
Nothing in your example suggests it does:
test1 = urllib.unquote(m)
test2 = urllib.unquote_plus(m)
test3 = m.decode('utf8')
test4 = m.encode('utf8')
These are all questions of what the Python built-in urllib module does. On
Python 2 its behavior is a bit odd, and I think this is what is causing
your issue.
In your example, m = u'%c3%85', i.e., a unicode string with the text
"%C3%85". URL-decoding this should yield two bytes, C3 and 85, i.e., the
UTF-8 representation of Å.
However, Python 2 interprets u'%c3%85' as a unicode string with the
characters U+00C3 U+0085, i.e., Ã followed by an unprintable control
character. There is no clean way to fix the situation once we have ended
up there.
Now -- how to solve this?
- The most obvious solution is 'Use Python 3', where the unquote function
correctly processes the string.
- Another option is to turn your URL into a bytestring first, i.e.,
m = m.encode('utf-8'), then call unquote, then decode the string again.
- As you already have a dependency on pywikibot, the last option is to use
pywikibot.page.url2unicode, which works correctly even on Python 2.
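On Python 3 the byte-level pipeline can be written out explicitly with the standard library; a small sketch of what "decode to bytes, then interpret as UTF-8" looks like:

```python
from urllib.parse import unquote, unquote_to_bytes

m = '%c3%85'
# Decode to raw bytes first, then interpret those bytes as UTF-8 --
# the same steps as the encode/unquote/decode workaround on Python 2.
raw = unquote_to_bytes(m)   # b'\xc3\x85'
print(raw.decode('utf-8'))  # → Å
# On Python 3, unquote() already does this correctly in one step:
print(unquote(m))           # → Å
```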
Best,
Merlijn
Hi.
Does PWB have issues with decoding URL strings?
Try this script:
from __future__ import absolute_import, unicode_literals
import re, urllib
import pywikibot
mylist = \
[
u"Åge Hovengen",
u"Åge Konradsen",
u"Åge Ramberg",
]
for a in mylist:
    ssite = pywikibot.getSite("en")
    spage = pywikibot.Page(ssite, a)
    text = spage.get()
    m0 = re.search(ur"\{\{\s*Stortingetbio\s*\|\s*(?:id=)?\s*([^\s}\|]+)\s*[\|\}]", text, flags=re.IGNORECASE)
    if m0:
        m = m0.group(1)
        test1 = urllib.unquote(m)
        test2 = urllib.unquote_plus(m)
        test3 = m.decode('utf8')
        test4 = m.encode('utf8')
        pywikibot.output(test1)
        pywikibot.output(test2)
        pywikibot.output(test3)
        pywikibot.output(test4)
It doesn't decode %c3%85 to Å for me, while on http://repl.it/Izdw/2 you
can see that pure Python can decode that string with urllib.unquote and
urllib.unquote_plus. Is this a PWB bug or what?
Hello,
I wonder if some of you could maybe take a look at
https://phabricator.wikimedia.org/T119791 and the archivebot.py script in
general?
It'd be good if the script supported some other features, such as
different n=x archiving and immediate archiving when a certain template is
present in a thread.
Best regards, M.
Hello,
I have a script which should add a template to articles which were created
by the ContentTranslation tool (the template has parameters which depend
on the language and revision which were used as the source; this is the
reason why I use a separate script). It may be found at
https://github.com/urbanecm/addPrekladCT/blob/master/addmissing.py. The
script works perfectly on my local PC and on the bastion host, but I can't
get it to work on the grid.
The script itself is run by *python3 addmissing.py -always -file:pages.txt
-search:'-insource:/\{\{[Pp]řeklad/'* and requires the pages.txt file and
the preklads.txt file at https://tools.wmflabs.org/urbanecmbot/test/preklads.txt.
The first contains the pages that should be processed and acts as the
generator; the second is something like a database with the exact
templates which should be inserted. Both files are attached as examples.
When I try to run it on the Tool Labs bastion, everything works as it
should. When I send the script to the grid, it does not work (see the
sample output below). Why? Can somebody help me with it?
Thank you in advance,
Martin Urbanec / Urbanecm
; Output
urbanecm@tools-bastion-02 ~/Documents/cswiki/addPrekladCT
$ cat test.sh
python3 addmissing.py -always -file:pages.txt
-search:'-insource:/\{\{[Pp]řeklad/'
urbanecm@tools-bastion-02 ~/Documents/cswiki/addPrekladCT
$ jsub bash test.sh
Your job 6201363 ("bash") has been submitted
urbanecm@tools-bastion-02 ~/Documents/cswiki/addPrekladCT
$ qstat
job-ID prior name user state submit/start at queue
slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
6201363 0.30000 bash urbanecm r 06/16/2017 18:14:42
task(a)tools-exec-1404.eqiad.wmf 1
urbanecm@tools-bastion-02 ~/Documents/cswiki/addPrekladCT
$ ls ~/bash.*
/home/urbanecm/bash.err /home/urbanecm/bash.out
urbanecm@tools-bastion-02 ~/Documents/cswiki/addPrekladCT
$ cat ~/bash.*
Traceback (most recent call last):
File "addmissing.py", line 223, in <module>
main()
File "addmissing.py", line 183, in main
local_args = pywikibot.handle_args(args)
File "/shared/pywikipedia/core/pywikibot/bot.py", line 954, in handle_args
writeToCommandLogFile()
File "/shared/pywikipedia/core/pywikibot/bot.py", line 1128, in
writeToCommandLogFile
command_log_file.write(s + os.linesep)
File "/usr/lib/python3.4/codecs.py", line 711, in write
return self.writer.write(data)
File "/usr/lib/python3.4/codecs.py", line 368, in write
data, consumed = self.encode(object, self.errors)
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc5' in
position 67: surrogates not allowed
CRITICAL: Closing network session.
<class 'UnicodeEncodeError'>
urbanecm@tools-bastion-02 ~/Documents/cswiki/addPrekladCT
$
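The '\udcc5' in the traceback is a telltale sign of 'surrogateescape' decoding: if the grid runs the job under a C/POSIX (ASCII) locale, Python 3 decodes command-line arguments with that error handler, and the UTF-8 bytes of 'ř' (C5 99) become lone surrogates that cannot be written back out as UTF-8. A minimal reproduction with the standard library (the locale explanation is my best guess at the cause):

```python
# Simulate argv decoding under an ASCII locale with surrogateescape,
# which is what Python 3 does for undecodable command-line bytes.
arg = 'řeklad'.encode('utf-8').decode('ascii', 'surrogateescape')
print(repr(arg))        # the 'ř' bytes became '\udcc5\udc99'
try:
    arg.encode('utf-8')  # what the command log writer attempts
except UnicodeEncodeError as e:
    print(e.reason)      # → surrogates not allowed
# Re-applying surrogateescape restores the original bytes and text:
print(arg.encode('ascii', 'surrogateescape').decode('utf-8'))  # → řeklad
```

If that is the cause, a common workaround is to export a UTF-8 locale (e.g. LANG=en_US.UTF-8) in the job script before invoking Python.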
Hi everyone,
At the recent hackathon in Vienna we talked about the large number of
changes still open and how to get the flow going again. We currently have
over 300 open changes going back to 2014 (
https://gerrit.wikimedia.org/r/#/q/status:open+project:pywikibot/core ).
A change is in Gerrit because the developer wants code review to get it
merged. Code review might not be a lot of fun, and this huge backlog makes
it worse.
A lot of the changes have issues preventing a merge:
* Merge conflict, needs to be rebased
* Not verified, tests fail
* Code review -1, -2
My proposal is to abandon the changes we're not going to work on anyway
and focus our attention on the changes we do want to get merged. I
understand that some changes in which people invested a lot of time and
effort will get abandoned, but I think the benefit of getting the code
review process back on track is higher. Abandoned changes are not gone,
we can always open them again.
I ask everyone who has (a lot of) old open changes to have a look at
them and make the decision: Pick it up or abandon. If the change is
linked to a phabricator task, it would be nice to update the task too.
Thank you,
Maarten
Hi folks,
I've added a patch [1] for the new EventStreams web service [2], which
will replace RCStream soon. The new library part is ready for review (and
two of my scripts use it for a long-term test). I've added a test suite,
but unfortunately it fails due to the missing installation of the required
sseclient. Could anybody give a hint on how to set up the nose tests so
that this library gets installed?
Thanks a lot
xqt
[1] https://gerrit.wikimedia.org/r/#/c/346164/
[2] https://wikitech.wikimedia.org/wiki/EventStreams
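Regarding the missing sseclient at test time: one common setuptools route is to declare it as a test-time dependency so the test runner can install it. A sketch (whether CI honours tests_require depends on how the suite is invoked; the extras name is my own invention):

```python
# setup.py fragment (sketch): make sseclient available to the test runner
setup(
    # ...existing arguments...
    tests_require=['sseclient'],
    extras_require={'eventstreams': ['sseclient']},
)
```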