Happy Monday,
There are strange people who make such links (kind of URL-encoded?):
[[Második világháború#Partrasz.C3.A1ll.C3.A1s Szic.C3.ADli.C3.A1ban
.28Huskey hadm.C5.B1velet.29|Huskey hadműveletben]]
So the section title must have been copied from the URL.
Do we have a ready tool to fix these?
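For context, these anchors are MediaWiki's old dot-encoding of the section title's UTF-8 bytes (each byte written as `.XX` instead of `%XX`). A minimal sketch of how such an anchor could be decoded, assuming underscores stand for spaces (hypothetical helper; a real fix would have to guard against literal dots that happen to precede two hex digits):

```python
import re
from urllib.parse import unquote

def decode_section_anchor(anchor):
    """Turn a MediaWiki dot-encoded section anchor
    (e.g. 'Partrasz.C3.A1ll.C3.A1s') back into readable text.

    Underscores become spaces; '.XX' byte escapes become '%XX'
    and are then percent-decoded as UTF-8.
    """
    text = anchor.replace('_', ' ')
    text = re.sub(r'\.([0-9A-F]{2})', r'%\1', text)
    return unquote(text)
```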
--
Bináris
Hi all!
This is an announcement for a significant change to the Wikibase entity
format, which went live at the beginning of September. It potentially affects
clients that process snaks
<https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:Glossary#Snak>.
Internally, Wikibase assigns a *hash* to each snak (which is just the hash
function (Q183427) <https://www.wikidata.org/wiki/Q183427> of an internal
representation of the snak). Those hashes were previously emitted for snaks
that appeared in qualifiers, but not for the main snak or reference snaks
of a statement. With the change, the hashes are emitted for all snaks,
regardless of where they appear. This means that a snak can now look like
this:
{
    "snaktype": "value",
    "property": "P370",
    "hash": "682fdb448ef68669a1b728a5076836da9ac3ffae",
    "datavalue": {
        "value": "some text",
        "type": "string"
    },
    "datatype": "string"
}
The hashes are also added to the HTML output, as an additional class
similar to the statement ID class on statements:
<div class="wikibase-statementview
wikibase-statement-Q4115189$29acf9c6-450a-7612-d206-049f5fe58328">
<!-- … -->
<div class="wikibase-statementview-mainsnak">
<div class="wikibase-snakview
wikibase-snakview-682fdb448ef68669a1b728a5076836da9ac3ffae">
<!-- … -->
</div>
</div></div>
The ultimate goal of this is to make any snak addressable in the DOM, which
is necessary for checking constraints on qualifiers and references (T168532
<https://phabricator.wikimedia.org/T168532>).
It should be noted that, unlike statement IDs, snak hashes are not
identifiers. They are not stable and may change at any time if the
internal representation changes.
Please let us know if you have any comments or objections.
-- Lucas
Relevant tickets:
- T171607 <https://phabricator.wikimedia.org/T171607>
- T171725 <https://phabricator.wikimedia.org/T171725>
Relevant patches:
- https://github.com/wmde/WikibaseDataModelSerialization/pull/233
- https://gerrit.wikimedia.org/r/#/c/374835/
--
Lucas Werkmeister
Software Developer (Intern)
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment.
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/029/42207.
I have corrected the misspelled subject and added one more good reason to
do it. Please reply to this one, not the previous.
Happy Monday to all!
I have had a dream for many years already: when I use replace.py for
correcting grammatical and spelling errors (that's what I do in most of my
time dedicated to botwork), it would be nice and useful to count the
replacements and to extract the old-new pairs for later use. This task
requires changing replaceExcept() in textlib, so for a long time I wasn't
brave enough, as I thought it would be complicated and I was afraid of
community rejection, as often happens when a task is important to
somebody but others don't feel it as such.
But now compat is deserted, so I was brave enough to experiment with my
copy. I still use replace.py in compat for several reasons, not to be
detailed here. I want to show you what I have done and why, and ask for
opinions on whether this is a good direction that could be ported to core.
The benefit of this change is much greater than the pain of it.
Of course, the solution below gives an estimate: it cannot count
modifications made with the built-in editor. Still, it is a good and
useful estimate.
== Example ==
Please have a look at
https://hu.wikipedia.org/wiki/Szerkeszt%C5%91:BinBot/munka#2017._szeptember_4.
The first numerical value in the table is the number of modified pages, the
last one is the total number of replacements. The difference is
astonishing: for some tasks the two numbers are equal, while there is one
where the last number is 12 times as big as the first. I think this is
something worth showing.
== Motivation ==
=== Counting the replacements ===
* Statistics
* Choosing bot tasks by efficiency
* Printing the number of replacements to the screen after each page
increases the safety of the work. Sometimes not every diff is properly
coloured (think of a space, for example), and the work is tiring, so it is
easy to overlook a change, but the number may make the user focus on it.
* Give data for community (e.g. which are dangerous common errors, where we
need further steps)
* Natural curiosity of a bot owner
* Scientific purpose
* etc.
=== Saving the old-new pairs to a file or a wikipage ===
* Preparing new bot tasks, developing fixes and regexes
* Creating lists of common errors for the community
* There is a common spelling error which is quite easy to detect when the
word is [[link]]ed, but almost impossible to detect without linking (due to
the enormous number of false positives). My idea is to save the hits from
the linked version and use them for the unlinked one as a list of errors
rather than a pattern.
* There is another common error which is not worth treating by bot due
to the enormous number of false positives. But if I could save the list
automatically (without modifying pages), it could be revised by volunteers
and used later as a list of errors rather than a pattern.
* Showing this list to users or interested groups in order to teach them
which errors to avoid in the future.
* Scientific purpose
* etc.
== Solution ==
Sorry, I cannot create a diff now, because this directory is not versioned.
However, these four steps are not complicated to follow.
=== textlib.py ===
def replaceExcept(text, old, new, exceptions, caseInsensitive=False,
                  allowoverlap=False, marker='', site=None):
became:
def replaceExcept(text, old, new, exceptions, caseInsensitive=False,
                  allowoverlap=False, marker='', site=None,
                  returnPairs=False):
Just within 80 characters. :-) Since the new argument defaults to False,
calling the function without it is harmless: the behaviour is unchanged for
existing calls.
A new initialization:
pairs = []
At the end of the main if, at the bottom of this branch:
else:
    # We found a valid match. Replace it.
the last line:
    markerpos = match.start() + len(replacement)
became:
    markerpos = match.start() + len(replacement)
    pairs.append((match.group(), replacement))
And at the very end of the method, instead of return text, I now have:
if returnPairs:
    return (text, pairs)
else:
    return text
=== replace.py ===
replaceExcept() is called from doReplacements(). Without going into
details, instead of returning new_text, it will now
return (new_text, replaceList)
where replaceList is a list of (old, new) tuples.
Generally it is not recommended to mix return values with side effects,
such as storing pairs in a list that is global to the method, so I decided
to give back the pairs. The main method of the bot (run()) can handle them
according to the given parameters: increment a counter, save the (old, new)
pairs to a file or a wikipage, or do nothing but the classic replacement
task. It needs some memory, but at any point only the pairs of the current
page are stored. Unless you explicitly create a huge list with all the
occurring pairs, which is not necessary, it won't cause a problem.
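The steps above could be condensed into a self-contained sketch (simplified and hypothetical: the real replaceExcept in textlib also handles exception patterns, markers, case-insensitivity and overlap):

```python
import re

def replace_except(text, old, new, return_pairs=False):
    """Replace every match of the regex `old` with `new`; when
    return_pairs is True, also return the (matched, replacement)
    pairs, so callers can count or log the actual substitutions."""
    pairs = []

    def _substitute(match):
        # Expand backreferences in `new` against this match,
        # and record the concrete (old, new) pair.
        replacement = match.expand(new)
        pairs.append((match.group(), replacement))
        return replacement

    result = re.sub(old, _substitute, text)
    if return_pairs:
        return result, pairs
    return result
```

Because return_pairs defaults to False, existing callers keep getting a plain string back, mirroring the backwards-compatibility argument above.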
--
Bináris
MediaWiki 1.29.0
Apache 2.4.27
MySQL 5.6.37
PHP 5.6.31
phpMyAdmin 4.7.2
Python 3.6.2
Pywikibot (whatever the latest from ToolForge is)
I recently installed the Bitnami MediaWiki WAMP stack, Python, and Pywikibot with the above versions. It is a local installation running on MS Windows 10.
My MediaWiki seems to work fine. I can log in, create, and edit pages. I added a separate bot user, but when I try to set a password on the Special page for bot passwords, I get the following error:
Invalid IP address or range: 0.0.0.0/0 ::/0
I have tried every combination of IP addresses I can imagine (including the default) 0.0.0.0/0 ::/0, but nothing is accepted.
I am new to wikis, python and that sort of stuff. Any suggestions would be greatly appreciated.
Hi everyone!
Nearly two weeks ago, some of our Travis build jobs related to testing
OAuth started to time out.[1]
I think it is related to misconfigured Travis environment variables.
Perhaps someone has changed them?
Unfortunately I don't have access there and I don't know who has. If you're
the one, please take a look at T173498[1]; we probably need to check/reset
the OAuth token values. Besides access to the Travis settings, this
requires knowing the authentication information for the Pywikibot-test
account.[2]
Thanks!
[1]: https://phabricator.wikimedia.org/T173498
[2]: https://phabricator.wikimedia.org/T100797#1321387