pywikibot September 2018

pywikibot@lists.wikimedia.org

3 participants
5 discussions

Helper script for Check Wikipedia Project

by Bináris

For those who want to work with Check Wikipedia:
https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Check_Wikipedia and
https://tools.wmflabs.org/checkwiki/cgi-bin/checkwiki.cgi

I created a script that downloads the page list for a given error type. It can be used in 3 ways:
* Save to a file, then use Pywikibot with -file.
* Upload the contents of the file to a subpage, then use Pywikibot with -links.
* Import as a pagegenerator.

Check Wikipedia is similar to Special:LintErrors, but not the same. As it provides plain title lists for bots and no API, I wrote an HTML scraper, but its output is prepared for use as a pagegenerator in Pywikibot.

Have fun!
https://hu.wikipedia.org/wiki/Szerkeszt%C5%91:BinBot/checkwiki.py
--
Bináris

5 years, 7 months

Replace.py: very slow reading from XML dump

by Bináris

Hi folks,

I still use trunk/compat for many reasons, but as I see the new code at https://github.com/wikimedia/pywikibot/blob/master/scripts/replace.py, the core version must suffer from the same problem. If we use -namespace for namespace filtering, class XmlDumpReplacePageGenerator will go through ALL pages, and only THEN is the result filtered by a namespace generator. This may MULTIPLY the running time in some cases, and it may cost hours or even days for a fix with complicated, slow regexes. I have just checked that the dump does contain namespace information. So why don't we filter during the scan?

I made an experiment. I modified my copy to display the count of articles and the count of matching pages. The replacement was: (ur'(\d)\s*%', ur'\1%'), which seems pretty slow. :-( The bot scanned the latest huwiki dump for 14 hours(!). (Not the whole dump, I used -xmlstart.) It went through 820 thousand pages and found 240+ matches (I displayed every 10th match). Then the bot worked a further 30-40 minutes to check the actual pages from the live wiki, this time with namespace filtering on. (I don't replace in this phase, just save the list, so no human interaction is involved in this time.)

Guess the result! 62 out of 240 remained. This means that the bigger part of these 14 hours went into /dev/null. Now I realize how much time I wasted in the past 10 years. :-( I am sure that passing namespaces to XmlDumpReplacePageGenerator is worth it.
--
Bináris
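To illustrate the idea of filtering during the scan, here is a minimal sketch using only the Python standard library. It is not the code of XmlDumpReplacePageGenerator; the function names `local` and `matching_titles` and the tiny sample dump are made up for illustration, and it assumes the usual MediaWiki export layout where each <page> carries an <ns> element. The point is only that the namespace check happens before the (possibly slow) regex is run.

```python
import io
import re
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit('}', 1)[-1]


def matching_titles(source, pattern, namespaces):
    """Yield titles of pages in the wanted namespaces whose text matches.

    Hypothetical helper: pages outside `namespaces` are skipped before
    the regex is ever applied to their text.
    """
    regex = re.compile(pattern)
    for _event, elem in ET.iterparse(source, events=('end',)):
        if local(elem.tag) != 'page':
            continue
        fields = {local(child.tag): child for child in elem}
        if int(fields['ns'].text) in namespaces:
            text = ''
            for node in fields['revision']:
                if local(node.tag) == 'text':
                    text = node.text or ''
            if regex.search(text):
                yield fields['title'].text
        elem.clear()  # free the memory used by the finished page


# A made-up three-page dump: two articles (ns 0) and one talk page (ns 1).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Alma</title><ns>0</ns>
    <revision><text>50 % of cases</text></revision></page>
  <page><title>Vita:Alma</title><ns>1</ns>
    <revision><text>10 % here too</text></revision></page>
  <page><title>Korte</title><ns>0</ns>
    <revision><text>no match</text></revision></page>
</mediawiki>"""

titles = list(matching_titles(io.BytesIO(SAMPLE.encode('utf-8')),
                              r'(\d)\s*%', {0}))
print(titles)
```

With this sample, the talk page is dropped by the namespace test alone, so the regex only runs on the two articles.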

5 years, 7 months

asynchronous put on core

by Steen Thomassen

Hi

In core, I can queue only 2-3 edits for later saving. How can I increase the number of edits that are waiting to be saved? In compat the number was up to 60.

/Steen

5 years, 7 months

[Pywikipedia-l] Urlencoded section titles

by Bináris

Happy Monday,

There are strange people who make such links (kind of urlencoded?):
[[Második világháború#Partrasz.C3.A1ll.C3.A1s Szic.C3.ADli.C3.A1ban .28Huskey hadm.C5.B1velet.29|Huskey hadműveletben]]
So the section title must have been copied from the URL. Do we have a ready tool to fix these?
--
Bináris
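For what it's worth, a rough sketch of how such a section title could be decoded, assuming the ".XX" groups are UTF-8 bytes written in uppercase hex with a dot in place of the percent sign. The function name `decode_anchor` is made up, and this is not an existing Pywikibot tool. Runs that do not decode to printable UTF-8 are left alone, but a title that legitimately contains something like ".25" would still be changed, so the results need a human check.

```python
import re

# One or more consecutive ".XX" groups (uppercase hex), e.g. ".C3.A1".
_ENCODED_RUN = re.compile(r'(?:\.[0-9A-F]{2})+')


def decode_anchor(section):
    """Turn '.C3.A1'-style byte escapes in a section title back into text."""
    def _decode(match):
        raw = bytes.fromhex(match.group(0).replace('.', ''))
        try:
            text = raw.decode('utf-8')
        except UnicodeDecodeError:
            return match.group(0)  # not valid UTF-8: keep the original
        # Keep the original as well if the result is a control character.
        return text if text.isprintable() else match.group(0)
    return _ENCODED_RUN.sub(_decode, section)


encoded = ('Partrasz.C3.A1ll.C3.A1s Szic.C3.ADli.C3.A1ban '
           '.28Huskey hadm.C5.B1velet.29')
print(decode_anchor(encoded))
```

Only the part after the # in a link would be passed to such a function; the page title and the link label stay untouched.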

5 years, 7 months

Google Code-in 2018: Mentor some smaller tasks for young contributors!

by Andre Klapper

Also see https://phabricator.wikimedia.org/T203122

<tl;dr>: Read https://www.mediawiki.org/wiki/Google_Code-in/Mentors and add your name to the mentors table and start tagging #GCI-2018 tasks. We'll need MANY mentors and MANY tasks, otherwise we cannot make it.

Google Code-in is an annual contest for 13-17 year old students. It will take place from Oct 23 to Dec 13. It's not only about coding: we also need tasks about design, docs, outreach/research, QA.

Last year, 300 students worked on 760 tasks supported by 51 mentors. For some achievements from last round, see https://blog.wikimedia.org/2018/03/20/wikimedia-google-code-in-2017/

While we wait whether Wikimedia will get accepted:
* You have small, self-contained bugs you'd like to see fixed?
* Your documentation needs specific improvements?
* Your user interface has some smaller design issues?
* Your Outreachy/Summer of Code project welcomes small tweaks?
* You'd enjoy helping someone port your template to Lua?
* Your gadget code uses some deprecated API calls?
* You have tasks in mind that welcome some research?

Note that "beginner tasks" (e.g. "Set up Vagrant") and generic tasks are very welcome (like "Choose and fix 2 PHP7 issues from the list in https://phabricator.wikimedia.org/T120336" style).

We also have more than 400 unassigned open #easy tasks listed: https://phabricator.wikimedia.org/maniphest/query/HCyOonSbFn.z/#R
Can you mentor some of those tasks in your area?

Please take a moment to find / update [Phabricator etc.] tasks in your project(s) which would take an experienced contributor 2-3 hours. Read https://www.mediawiki.org/wiki/Google_Code-in/Mentors , ask if you have any questions, and add your name to https://www.mediawiki.org/wiki/Google_Code-in/2018#List_of_Wikimedia_mentors

(If you have mentored before and have a good overview of our infrastructure: We also need more organization admins! See https://www.mediawiki.org/wiki/Google_Code-in/Admins )

Thanks (as we cannot run this without your help),
andre
--
Andre Klapper | Bugwrangler / Developer Advocate
https://blogs.gnome.org/aklapper/

5 years, 7 months
