Happy Monday,
There are strange people who make links like this (kind of URL-encoded?):
[[Második világháború#Partrasz.C3.A1ll.C3.A1s Szic.C3.ADli.C3.A1ban .28Huskey hadm.C5.B1velet.29|Huskey hadműveletben]]
So the section title must have been copied from the URL.
Do we have a ready tool to fix these?
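For reference, this is MediaWiki's legacy anchor encoding: percent-encoded
UTF-8 with '.' in place of '%'. If no ready tool exists, the decoding step
itself is small; a minimal sketch (hypothetical helper, not checked against
section titles that contain literal dots followed by hex-like characters):

import re
from urllib.parse import unquote

def decode_anchor(anchor):
    # Legacy MediaWiki anchors encode bytes as ".XX" instead of "%XX".
    return unquote(re.sub(r'\.([0-9A-F]{2})', r'%\1', anchor))

print(decode_anchor('Partrasz.C3.A1ll.C3.A1s Szic.C3.ADli.C3.A1ban'))
# -> Partraszállás Szicíliában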
--
Bináris
Hello,
There is a big issue with the removal of empty sections by cosmetic changes.
https://gerrit.wikimedia.org/r/#/c/433914/6/pywikibot/cosmetic_changes.py
It also removes non-empty sections that start and end with HTML comments.
See:
https://fr.wikipedia.org/w/index.php?title=S%C3%A9ries_t%C3%A9l%C3%A9vis%C3%A9es_diffus%C3%A9es_sur_American_Broadcasting_Company&diff=prev&oldid=148771218
This bug is, I think, critical, because every edit made with CC activated
must now be verified.
I don't think that CC should remove comments added by editors.
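For illustration, a hypothetical minimal case modelled on the linked diff
(not the actual page content): the section body starts and ends with HTML
comments but has real content between them, yet the whole section is
stripped as "empty".

# Hypothetical wikitext, for illustration only:
text = ('== Saison 1 ==\n'
        '<!-- comment left by an editor -->\n'
        '{{Some content template}}\n'
        '<!-- another comment -->\n')
# The cosmetic change treats this section as empty and removes it,
# losing both the editors' comments and the content between them.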
I tried to open a bug on Phabricator but don't understand how to do it.
Regards,
Hi Pywikibot people,
FYI. This shouldn't affect Pywikibot in a negative way because it
already properly handles maxlag, but if we run into weird problems on
Wikidata in July it's good to know this has been changed. This will
solve our issues with unexpected ratelimit errors.
Maarten
-------- Forwarded Message --------
Subject: [Wikidata] Wikibase’s maxlag now takes dispatch lag into account
Date: Thu, 28 Jun 2018 14:31:26 +0200
From: Léa Lacroix <lea.lacroix(a)wikimedia.de>
Reply-To: Discussion list for the Wikidata project
<wikidata(a)lists.wikimedia.org>
To: Discussion list for the Wikidata project.
<wikidata(a)lists.wikimedia.org>, wikidata-tech(a)lists.wikimedia.org
/This change impacts people running bots and semi-automated tools to
edit Wikidata./
Hello all,
Based on the previous discussions around the limitation set up to fix the
high dispatch lag on clients, we have come up with a new solution to try.
The database behind Wikidata is replicated to several other database
servers. At each edit, the changes are replicated to these other
servers. There is always a short lag, which is usually less than a
second. If this lag is too high, the other databases can’t synchronize
correctly, which can cause problems for reading and editing Wikidata, or
reusing data on other projects.
If the lag is too high on too many servers, the master database stops
accepting new edits. When the lag is close to the limit, the system
prioritizes human edits and rejects edits from bots, sending back an
error. This limit is set with the maxlag option in the API.
People writing bots can set a maxlag value for their bot. The default
value is 5. This number is used to evaluate two things: the replication
lag between the master database and replicas, and the size of the job
queue.
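As an aside for bot writers: maxlag is an ordinary API parameter, and a
rejected request comes back with error code 'maxlag'. A minimal sketch of
what that looks like at the raw API level (using the requests library;
the query parameters are only illustrative):

import requests

# Send maxlag with the request; if current lag exceeds it, the API
# answers with an error (code 'maxlag') instead of running the action.
r = requests.get('https://www.wikidata.org/w/api.php', params={
    'action': 'query', 'meta': 'siteinfo',
    'maxlag': 5, 'format': 'json',
})
data = r.json()
if data.get('error', {}).get('code') == 'maxlag':
    print('Server lagged, retry later:', data['error']['info'])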
*On Tuesday, July 3rd, maxlag will also evaluate the dispatch lag
between Wikidata and clients (e.g. Wikipedias).*
The dispatch lag is the latency between an edit on Wikidata and the
moment when it’s shown on clients. Its median value is around 2 minutes.
*If you’re running a bot with a standard configuration (maxlag=5), then
when the median dispatch lag is more than 300 seconds, your bot's edits
won't be saved and the API will return an error.*
If this change is impacting your work too much, please let us know by
leaving a comment on this ticket
<https://phabricator.wikimedia.org/T194950>. This is also where you can
ask any questions. You can also change your configuration in order to
increase the maxlag limit.
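For pywikibot users, the threshold can be raised in user-config.py via
the maxlag setting (the value below is only an example, not a
recommendation):

# user-config.py
# Raise the maxlag threshold for this bot; pywikibot sends this value
# with its API requests. The default is 5.
maxlag = 20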
More information: Wikidata dispatch Grafana board
<https://grafana.wikimedia.org/dashboard/db/wikidata-dispatch?refresh=1m&org…>
Thanks for your constructive feedback,
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de <http://www.wikimedia.de>
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
Hello,
after a long time (~6 months) I manually updated my pywikibot (to the
nightly version of 2018-06-06).
But I found a very serious bug:
pwb.py newitem -namespace:0 -unconnectedpages:5000
ignores the -namespace parameter and imports all pages.
How do I set the namespace now? In pagegenerators.py this parameter is
now handled with some exceptions, but newitem.py is not among them.
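A possible interim workaround is to filter the generator yourself in a
small wrapper script (a sketch only; I have not verified this against
the nightly):

import pywikibot
from pywikibot import pagegenerators

site = pywikibot.Site()
# Build the unconnected-pages generator, then restrict it to the main
# namespace ourselves instead of relying on newitem's -namespace handling.
gen = pagegenerators.UnconnectedPageGenerator(site=site, total=5000)
for page in pagegenerators.NamespaceFilterPageGenerator(gen, [0], site=site):
    print(page.title())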
And how and why was this changed? I didn't notice this BREAKING change
in the pywikipedia-bugs mails.
JAnD
Hi all,
I'm getting a strange InvalidTitle error while iterating through each of
the articles in the English Wikipedia's "Unprintworthy redirects" category
using the .articles() function.
In particular, if you run this code:
import pywikibot
site = pywikibot.Site("en", "wikipedia"); site.login()
cat = pywikibot.Category(site, "Category:Unprintworthy redirects")
for each_article in cat.articles(namespaces=(0)):  # note: (0) is the int 0, not a tuple
    print(each_article.title(withNamespace=True), each_article.pageid)
Then it'll run for a while, printing out a bunch of titles and page IDs,
and then crash:
Traceback (most recent call last):
  File "/data/project/apersonbot/test-redir-bann.py", line 5, in <module>
    print(each_article.title(withNamespace=True), each_article.pageid)
  File "/shared/pywikipedia/core/pywikibot/tools/__init__.py", line 1446, in wrapper
    return obj(*__args, **__kw)
  File "/shared/pywikipedia/core/pywikibot/page.py", line 322, in title
    title = self._link.canonical_title()
  File "/shared/pywikipedia/core/pywikibot/page.py", line 5737, in canonical_title
    if self.namespace != Namespace.MAIN:
  File "/shared/pywikipedia/core/pywikibot/page.py", line 5698, in namespace
    self.parse()
  File "/shared/pywikipedia/core/pywikibot/page.py", line 5669, in parse
    raise pywikibot.InvalidTitle("The link does not contain a page "
pywikibot.exceptions.InvalidTitle: The link does not contain a page title
CRITICAL: Closing network session.
Any ideas? I don't think this is expected behavior, but I could be wrong.
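One way to narrow it down is to catch the exception per page and keep
iterating, noting where the bad entry sits (a debugging sketch, not a fix;
other attributes of the broken page may fail for the same reason):

import pywikibot

site = pywikibot.Site('en', 'wikipedia')
cat = pywikibot.Category(site, 'Category:Unprintworthy redirects')
for i, page in enumerate(cat.articles(namespaces=(0,))):
    try:
        print(page.title(withNamespace=True), page.pageid)
    except pywikibot.InvalidTitle as err:
        # Log the offender's position and keep going instead of crashing.
        print('InvalidTitle at item', i, '-', err)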
- Daniel
Hello, just a note to say that, following the request below about GitHub repo access, it has been requested that @Xqt, @Dalba, @Dvorapa and me, @framawiki, get access to the "pywikibot" tool on Toolforge.
https://phabricator.wikimedia.org/T196843
Thanks
Framawiki
> From: "Federico Leva (Nemo)" <nemowiki(a)gmail.com>
> To: pywikibot(a)lists.wikimedia.org
> Subject: [pywikibot] Permission requests for wikimedia/pywikibot repo
> on GitHub
>
> FYI, there's a request for Xqt, Dalba and others to get owner (?)
> permission on the GitHub mirrors of the pywikibot repository (ex
> pywikibot-core), which control some functionality.
>
> <https://phabricator.wikimedia.org/T196810>
>
> Federico
>
A village pump in Hungarian Wikipedia had not been archived for a long time
before we noticed it.
Investigation showed that the bot had hit a warning-type abuse filter upon
copying the text to the archive and saving. In this case the abuse filter
displays a warning and lets the user press Save again if they want to
proceed anyway.
Of course, the archivebot
- did not save the page for the second time (could it?)
- did not skip the problematic section and archive the remainder
- did not let the owner know about the problem.
It just silently failed, with logging being the only action.
So what would the desired behaviour be in a similar case? Please keep in
mind that archivebot is typically run from cron or some other scheduler,
not in interactive mode.
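For discussion, one possible shape of a fix (a sketch only, not current
archivebot code; it relies on APIError's code attribute, and the
notification step is entirely hypothetical):

import pywikibot
from pywikibot.data.api import APIError

def save_archive(page, summary, owner_talk):
    try:
        page.save(summary=summary)
    except APIError as err:
        if err.code.startswith('abusefilter-warning'):
            try:
                # Warning-type filters are supposed to let the same
                # edit through on a second attempt.
                page.save(summary=summary)
                return
            except APIError:
                pass
        # Hypothetical notification: leave a note for the operator.
        owner_talk.text += '\n\nArchiving failed: {}'.format(err.code)
        owner_talk.save(summary='archivebot failure notice')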
To see the log, open https://tools.wmflabs.org/ato/log/archive.txt and
Ctrl-F for:
ERROR: editpage: abusefilter-warning
--
Bináris
Hi all,
If you use EventStreams in pywikibot, the message below is of interest to
you.
My apologies for the slow forward; this message had been kept back in a
mailing list filter.
Best,
Merlijn
---------- Forwarded message ----------
From: Andrew Otto <otto(a)wikimedia.org>
To: "A mailing list for the Analytics Team at WMF and everybody who has an
interest in Wikipedia and analytics." <analytics(a)lists.wikimedia.org>,
Services Mailing List <services(a)lists.wikimedia.org>, Internal discussion
of WMF Research Team <research-internal(a)lists.wikimedia.org>, Research into
Wikimedia content and communities <wiki-research-l(a)lists.wikimedia.org>,
huggle(a)lists.wikimedia.org, pywikibot-announce(a)lists.wikimedia.org,
mediawiki-api-announce(a)lists.wikimedia.org, Operations LIst <
ops(a)lists.wikimedia.org>
Cc:
Bcc:
Date: Tue, 15 May 2018 11:43:29 -0400
Subject: EventStreams offset reset - June 5 2018
Hi all!
*If you are not an active user of the EventStreams service, you can ignore
this email.*
We’re in the process of upgrading
<https://phabricator.wikimedia.org/T152015> the backend infrastructure that
powers the EventStreams service. When we switch EventStreams to the new
infrastructure <https://phabricator.wikimedia.org/T185225>, the ‘offsets’
AKA Last-Event-IDs will change.
Connected EventStreams SSE clients will reconnect, but will not be able
to automatically resume from the exact position in the stream where they
left off. Instead, reconnecting clients will begin consuming from the
latest messages in the stream. This means that connected clients will
likely miss any messages that occur during the reconnect period.
Hopefully this will be a very small number of messages, as your SSE
client should reconnect quickly.
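For context, a minimal consumer sketch (assuming the third-party
sseclient package; the recentchange stream URL is one public example).
After the switch, any stored Last-Event-ID will no longer match a valid
offset, so a reconnect simply resumes at the latest messages:

import json
from sseclient import SSEClient  # third-party package: sseclient

# Consume the public recentchange stream; SSEClient reconnects for us,
# but resumed offsets will be reset after the backend upgrade.
for event in SSEClient('https://stream.wikimedia.org/v2/stream/recentchange'):
    if event.event == 'message' and event.data:
        change = json.loads(event.data)
        print(change['wiki'], change['title'])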
This switch is scheduled to happen on June 5 2018, at around 17:30 UTC.
Let us know if you have any questions.
Thanks!
- Andrew Otto
Senior Systems Engineer, WMF