There are strange people who make links like this (kind of URL-encoded?):
[[Második világháború#Partrasz.C3.A1ll.C3.A1s Szic.C3.ADli.C3.A1ban .28Huskey hadm.C5.B1velet.29|Huskey hadműveletben]]
So the section title must have been copied from the URL.
Do we have a ready-made tool to fix these?
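MediaWiki appears to build such anchors by percent-encoding the heading and then writing `.` instead of `%`, so `(` becomes `.28` and `á` becomes `.C3.A1`. A minimal sketch of a fixer, assuming uppercase hex and assuming that a `.XX` run which does not decode as valid UTF-8 should be left alone; `decode_anchor` and `fix_links` are hypothetical helper names, not existing pywikibot functions:

```python
import re
from urllib.parse import unquote

# '.XX' where XX is an uppercase hex byte, as in '.C3.A1' or '.28'.
_DOT_BYTE = re.compile(r"\.([0-9A-F]{2})")
# [[Page#Anchor]] or [[Page#Anchor|Label]]; groups: page, anchor, optional '|Label'.
_SECTION_LINK = re.compile(r"\[\[([^\[\]#|]*)#([^\[\]|]+)(\|[^\[\]]*)?\]\]")


def decode_anchor(anchor: str) -> str:
    """Turn a dot-encoded anchor such as 'Szic.C3.ADli.C3.A1ban' back into text."""
    # Rewrite '.XX' as '%XX' so that unquote() can rebuild the UTF-8 bytes.
    percent_form = _DOT_BYTE.sub(r"%\1", anchor)
    try:
        return unquote(percent_form, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        # Not a valid encoded anchor (e.g. a literal '.C3' in a heading): keep it.
        return anchor


def fix_links(wikitext: str) -> str:
    """Decode the anchor part of every [[Page#Anchor|Label]] link in wikitext."""
    def repl(match):
        page, anchor, label = match.group(1), match.group(2), match.group(3) or ""
        return f"[[{page}#{decode_anchor(anchor)}{label}]]"

    return _SECTION_LINK.sub(repl, wikitext)


broken = ("[[Második világháború#Partrasz.C3.A1ll.C3.A1s Szic.C3.ADli.C3.A1ban "
          ".28Huskey hadm.C5.B1velet.29|Huskey hadműveletben]]")
print(fix_links(broken))
# [[Második világháború#Partraszállás Szicíliában (Huskey hadművelet)|Huskey hadműveletben]]
```

Links without an anchor are left untouched, since the pattern requires a `#`.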
From one of my assignments as a bot operator, I have some code that
does template parsing and general text parsing (e.g. Image/File tags).
It does not use regular expressions and can therefore correctly parse nested
templates and other such nasty constructs. I have written it as library
classes, with tests that cover almost all of the code.
I would now really like to contribute that code back to the community.
Would you be interested in adding it to the pywikibot
framework? If yes, can I send the code to someone for review, or
how do you usually operate?
PS: My wiki user page is http://en.wikipedia.org/wiki/User:Hannes_R%C3%B6st
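The advantage of not using regular expressions is that nesting needs a depth counter, which a classic regex cannot keep. A minimal sketch of that idea, not Hannes's actual code: `top_level_templates` and `split_params` are hypothetical names, and the sketch ignores `{{{parameters}}}`, `<nowiki>`, and HTML comments.

```python
def top_level_templates(text: str) -> list[str]:
    """Return the outermost {{...}} spans of text, respecting nesting."""
    found = []
    depth = 0
    start = 0
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i  # an outermost template opens here
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                found.append(text[start:i])  # the outermost template just closed
        else:
            i += 1
    return found


def split_params(template: str) -> list[str]:
    """Split '{{name|a|b}}' on pipes that are not inside nested {{...}} or [[...]]."""
    inner = template[2:-2]
    parts, current = [], []
    depth = 0  # one shared counter for {{ and [[ (a simplification)
    i = 0
    while i < len(inner):
        pair = inner[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]") and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif inner[i] == "|" and depth == 0:
            parts.append("".join(current))  # top-level separator
            current = []
            i += 1
        else:
            current.append(inner[i])
            i += 1
    parts.append("".join(current))
    return parts


text = "Intro {{Infobox|name={{PAGENAME}}|image=[[File:X.png|thumb]]}} and {{stub}}"
for tpl in top_level_templates(text):
    print(split_params(tpl))
# ['Infobox', 'name={{PAGENAME}}', 'image=[[File:X.png|thumb]]']
# ['stub']
```

The pipe inside `[[File:X.png|thumb]]` is not treated as a separator because the counter is above zero at that point.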
Hello, I am again asking whether it is possible for me to get SVN access. (I know it probably wouldn't last long, since SVN will become read-only in the (near?) future, but when that will happen is still unknown.) I use the framework frequently, and I'd like to be able to commit directly when possible, rather than generating patches for the bug tracker. This would include things such as PEP and typo fixes, as well as improvements to wikipedia.DataPage. Thanks.
I had mentioned this in the rewrite roadmap, and noticed it came up on IRC
as well, so I'd like to run this by the mailing list:
User:The Earwig has written a pure-Python (with optional C speedups)
MediaWiki text parser named mwparserfromhell. Currently we have the
textlib library and assorted regexes that do this imperfectly. From my
experience using mwparserfromhell (over 400k successful edits with no
issues), I believe it is ready to be bundled with the framework. I think
it would still be a good idea to keep textlib as a fallback, or for users
who currently use it and don't need to migrate.
As for actually adding it: in the rewrite branch we can just add it as a
dependency in setup.py and then convert the various methods over.
In trunk, I'm guessing we would need to add it as an external. (I'm not
sure how that's actually done.)
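Keeping textlib as a fallback could take the form of an optional import. A sketch that assumes only mwparserfromhell's `parse()` and `Wikicode.filter_templates()`; `template_names` and the fallback regex are illustrative, not existing pywikibot or textlib code:

```python
import re

try:
    import mwparserfromhell  # optional dependency
except ImportError:
    mwparserfromhell = None

# Flat fallback: captures the name in "{{name|" or "{{name}}" and nothing more.
_TEMPLATE_NAME = re.compile(r"\{\{\s*([^{}|]+?)\s*(?:\||\}\})")


def template_names(text: str) -> list[str]:
    """Names of the templates used in text, via mwparserfromhell when available."""
    if mwparserfromhell is not None:
        code = mwparserfromhell.parse(text)
        # filter_templates() also descends into nested templates by default.
        return [str(tpl.name).strip() for tpl in code.filter_templates()]
    # Regex fallback: fine for flat templates, but it knows nothing about
    # nesting or about which parameters belong to which template.
    return _TEMPLATE_NAME.findall(text)


print(template_names("{{Infobox person|name=Ada}} text {{stub}}"))
# ['Infobox person', 'stub']
```

Callers get the same result type either way, so methods can be converted one at a time.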
I am Chinmay, a GSoC intern at Crowdbio (
Our goal is to create a bot to capture gene info in PBB templates (
Wikidata. We will be using the Pywikipedia bot framework. I am unsure whether
to go with trunk or the rewrite branch. I am familiar with trunk and have
queried Wikidata items using it. On the other hand, there seems to be less
documentation for the rewrite branch. I have installed the rewrite as a package
from the latest nightly release, but I am not very familiar with it yet.
For our bot, we will need the usual functions such as creating/updating
items, claims, sources etc. (trunk seems sufficient for these). I have
started coding with trunk, but now I am a bit concerned about the lifetime
of the trunk branch. Is there a chance it may become outdated, so that I
would then have to port to Release?
I am also concerned about the pace of development of the rewrite branch.
From another thread, I noticed changes to editEntity() to create
Should I use trunk or the rewrite branch?
I have been trying to use the rewrite lately, but I ran into something difficult to resolve.
bot = TestBot()
if __name__ == "__main__":
    main()
I run the test with "pwb test".
When I type "字母频率" (the character "频" is not supported by cp950), it shows
"字母?率" instead of the expected "字母频率". Running the same script with
Pywikipedia, however, correctly shows "字母频率".
Stranger still, if I enter "学校", I get the following error:
Traceback (most recent call last):
  File "D:\pywikipedia\pwb.py", line 50, in <module>
  File "D:\pywikipedia\scripts\self\test.py", line 21, in <module>
  File "D:\pywikipedia\scripts\self\test.py", line 17, in main
  File "D:\pywikipedia\scripts\self\test.py", line 13, in run
  File "D:\pywikipedia\pywikibot\bot.py", line 450, in input
    data = ui.input(question, password)
  line 196, in input
    text = unicode(text, config.console_encoding)
UnicodeDecodeError: 'cp950' codec can't decode bytes in position 0-1: illegal multibyte sequence
When I tested this myself, print(self.encoding) in Pywikipedia's
terminal_interface_base.py gives utf-8, while config.console_encoding
in the rewrite's terminal_interface.py gives cp950.
Can somebody help?
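The first symptom can be reproduced with Python's codecs alone: encoding to cp950 with replacement turns "频" into "?", which fits the rewrite using cp950 while trunk uses utf-8, as the two printed values above suggest. `through_console` is a hypothetical helper; this does not show where pywikibot chooses its console encoding.

```python
def through_console(text: str, console_encoding: str) -> str:
    """Simulate a round trip through a console limited to console_encoding."""
    # Characters the codec cannot represent become '?'.
    return text.encode(console_encoding, errors="replace").decode(console_encoding)


print(through_console("字母频率", "cp950"))  # 字母?率  ("频" is not in cp950/Big5)
print(through_console("字母频率", "utf-8"))  # 字母频率
```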
The following issue: as you may have noticed already, I am reorganizing
the whole "externals" part of trunk. This is done now, with the single
exception of a generic patching system, needed e.g. for BeautifulSoup.py.
(As usual) this is no problem under Linux, but becomes a major issue
under Windows.
The mechanism I want to use is the well-known diff/patch duo.
Therefore a "patch" executable/binary (OR Python script) is needed
for every OS. While this is more or less built into Linux, Windows needs
extra attention. This is what I found so far:
- Executables that depend only on the Microsoft C runtime (msvcrt.dll),
  not on an emulation layer like the one provided by Cygwin tools;
  Windows only, not multi-OS.
- A Python script, therefore multi-OS, but it does not support the full
  diff/patch "command set" (e.g. it cannot create new files).
- Not a command-line tool like "patch" but a Python library/module.
So I am stuck here and need some further knowledge, experience and
personal preferences from your side in order to make a good decision.
In my opinion we should also keep other OSes (beyond Linux, Mac and
Windows) in mind, because they are very close to what we already have.
Thanks for any help, and greetings
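For the pure-Python route, applying a unified diff to one existing file needs little more than parsing hunk headers. A minimal sketch, not one of the tools listed above: `apply_unified_diff` is a hypothetical helper that assumes the diff matches the file exactly (no fuzz or offset search), handles one file per call, and ignores "\ No newline at end of file" markers.

```python
import difflib
import re

# "@@ -start[,len] +start[,len] @@"; a missing length means 1.
_HUNK = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+\d+(?:,\d+)? @@")


def apply_unified_diff(original: list[str], diff: list[str]) -> list[str]:
    """Apply a single-file unified diff to original (lines keep their newlines)."""
    result: list[str] = []
    src = 0  # next unconsumed index in original
    i = 0
    while i < len(diff) and not diff[i].startswith("@@"):
        i += 1  # skip the '---' / '+++' file header
    while i < len(diff):
        match = _HUNK.match(diff[i])
        if match is None:
            raise ValueError(f"expected a hunk header, got {diff[i]!r}")
        start = int(match.group(1))
        length = int(match.group(2)) if match.group(2) is not None else 1
        # Starts are 1-based; an empty hunk names the line *before* the insertion.
        hunk_start = start - 1 if length > 0 else start
        result.extend(original[src:hunk_start])  # copy untouched lines
        src = hunk_start
        i += 1
        while i < len(diff) and not diff[i].startswith("@@"):
            tag, body = diff[i][:1], diff[i][1:]
            if tag in (" ", "-"):
                if src >= len(original) or original[src] != body:
                    raise ValueError(f"hunk does not match line {src + 1}")
                if tag == " ":
                    result.append(body)  # context line: keep it
                src += 1                 # a '-' line is simply dropped
            elif tag == "+":
                result.append(body)      # added line
            elif tag != "\\":            # '\ No newline at end of file' is ignored
                raise ValueError(f"unexpected diff line {diff[i]!r}")
            i += 1
    result.extend(original[src:])
    return result


old = ["import os\n", "print('a')\n", "print('b')\n"]
new = ["import os\n", "import sys\n", "print('a')\n", "print('B')\n"]
patch = list(difflib.unified_diff(old, new, "a/x.py", "b/x.py"))
print("".join(apply_unified_diff(old, patch)), end="")
```

Because a zero-length source hunk is accepted, an empty original can receive a diff that adds lines, but actually creating the file on disk is left to the caller.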
-------- Original message --------
Subject: [Wikidata-tech] imminent breaking change to the API
Date: Sun, 16 Jun 2013 13:43:05 +0200
From: Daniel Kinzler <daniel.kinzler(a)wikimedia.de>
Reply-To: Wikidata technical discussion
Organization: Wikimedia Deutschland e.V.
To: wikidata-tech <wikidata-tech(a)lists.wikimedia.org>, Lydia Pintscher
A quick heads-up:
We have a breaking API change in the pipeline:
<https://gerrit.wikimedia.org/r/#/c/68406/12>. It has not been merged yet, but
I expect it to go into the next branch (to be deployed on June 24, if I'm not mistaken).
It's really a fix for previously underspecified and dangerous behavior:
the editentity module would automatically create a new item when called without
giving an ID of the entity to modify.
With the patch, editentity requires that either the 'id' parameter or the
'new' parameter be set, explicitly stating which entity to modify or,
respectively, what kind of entity to create.
That is, bots that wish to create an item now need to provide the new=item
parameter, instead of just not supplying an id parameter. This affects all bots
that create items.
We could keep the old behavior as a backwards-compatibility (B/C) mode, but it's
dangerous and ill-defined; I think it's better to break some bots now. It's easy to fix anyway.
But it should of course be announced in due time.
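For bot authors, the change comes down to always sending either `id` or `new`. A sketch that only builds the request parameters, assuming the module is reached as `action=wbeditentity`; `editentity_params` is a hypothetical helper, and rejecting a request that sets both `id` and `new` is this sketch's reading of "either ... or", not something stated above.

```python
import json
from typing import Optional


def editentity_params(data: dict, entity_id: Optional[str] = None,
                      new: Optional[str] = None) -> dict:
    """Build wbeditentity request parameters under the explicit id/new rule."""
    # A request must name the entity it edits (id) or the kind of entity it
    # creates (new); simply leaving out id is no longer enough.
    if (entity_id is None) == (new is None):
        raise ValueError("give exactly one of entity_id or new")
    params = {"action": "wbeditentity", "format": "json", "data": json.dumps(data)}
    if entity_id is not None:
        params["id"] = entity_id
    else:
        params["new"] = new  # e.g. "item"
    # A real request also needs an edit token ("token"), omitted here.
    return params


labels = {"labels": {"en": {"language": "en", "value": "Example"}}}
print(editentity_params(labels, new="item")["new"])      # item
print(editentity_params(labels, entity_id="Q42")["id"])  # Q42
```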
Wikidata-tech mailing list
The time has finally come: I'm moving forward with shutting down SVN
and making it a read-only service. As Pywikipedia is the only remaining
consumer of SVN, I wanted to reach out to the community to find
out what everyone wants to do. As I see it, there are three courses of
action Pywikipedia can take:
1) Move to Gerrit
2) Move to Git elsewhere (GitHub, Google Code, etc.)
3) Move to some other SVN service
I'm more than willing to help with any of these choices. The first two would
involve converting the history to Git and importing it into the
destination of choice. Staying with SVN is also possible; I'm
more than happy to provide full SVN dumps if someone wants to set up
that service elsewhere.
What are people's thoughts? I've not come up with a firm date yet, but
coming to consensus sooner rather than later would be nice.