Hi Daniel,
Changing the loop to the one below shows that the first problematic pageid is
28644448 <https://en.wikipedia.org/wiki/Special:Redirect/page/28644448>,
whose title is the character \x85.
>>> for each_article in cat.articles(namespaces=(0)):
...     try:
...         print(each_article.title(withNamespace=True), each_article.pageid)
...     except pywikibot.exceptions.InvalidTitle:
...         print(each_article.pageid)
...         raise
...
str.strip() removes this character, leaving an empty string, so the
exception is raised. (page.py#L5666-L5670
<https://github.com/wikimedia/pywikibot/blob/16a31c88b67c7af1966ca00ed998db01f76c2adb/pywikibot/page.py#L5666-L5670>
)
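You can confirm the strip() behaviour in a plain Python 3 interpreter without pywikibot. This is only a minimal sketch: it assumes the title is the single code point U+0085 (NEXT LINE), and the variable names are mine for illustration, not taken from page.py.

```python
# Assumption: the page title consists solely of U+0085 (NEXT LINE).
title = "\x85"

# Python 3 classifies U+0085 as whitespace, so strip() removes it.
title_after_strip = title.strip()

print(title.isspace())          # True
print(repr(title_after_strip))  # ''
print(len(title_after_strip))   # 0
```

An empty string after stripping is what leads to the "does not contain a page title" error.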
Regards,
JJ
On Mon, Jun 18, 2018 at 1:23 PM Daniel Glus <danielhglus(a)gmail.com> wrote:
Hi all,
I'm getting a strange InvalidTitle error while iterating through each of
the articles in the English Wikipedia's "Unprintworthy redirects" category
using the .articles() function.
In particular, if you run this code:
import pywikibot
site = pywikibot.Site("en", "wikipedia"); site.login()
cat = pywikibot.Category(site, "Category:Unprintworthy redirects")
for each_article in cat.articles(namespaces=(0)):
    print(each_article.title(withNamespace=True), each_article.pageid)
Then it'll run for a while, printing out a bunch of titles and page IDs,
and then crash:
Traceback (most recent call last):
  File "/data/project/apersonbot/test-redir-bann.py", line 5, in <module>
    print(each_article.title(withNamespace=True), each_article.pageid)
  File "/shared/pywikipedia/core/pywikibot/tools/__init__.py", line 1446, in wrapper
    return obj(*__args, **__kw)
  File "/shared/pywikipedia/core/pywikibot/page.py", line 322, in title
    title = self._link.canonical_title()
  File "/shared/pywikipedia/core/pywikibot/page.py", line 5737, in canonical_title
    if self.namespace != Namespace.MAIN:
  File "/shared/pywikipedia/core/pywikibot/page.py", line 5698, in namespace
    self.parse()
  File "/shared/pywikipedia/core/pywikibot/page.py", line 5669, in parse
    raise pywikibot.InvalidTitle("The link does not contain a page "
pywikibot.exceptions.InvalidTitle: The link does not contain a page title
CRITICAL: Closing network session.
Any ideas? I don't think this is expected behavior, but I could be wrong.
- Daniel
_______________________________________________
pywikibot mailing list
pywikibot(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikibot