Wikimedia Sverige has been running the FindingGLAMs project
<https://meta.wikimedia.org/wiki/FindingGLAMs> over the last year which has
identified two areas (amongst others) where technical development is needed
to unblock GLAM contributions. The first is support for Structured Data on
Commons in the major upload tools (of which Pywikibot is a big one), the
second is support for uploading Lexicographical data to Wikidata.
I’ve seen some Pywikibot activity and discussion on both of these issues
but neither seems to have taken off. To try to kick-start either, or both, of
these, I think it would be useful to get some of the active
Pywikibot developers together. I believe that even just a joint schematic
understanding of how these new features should be implemented would be a
good outcome as this would aid actual implementation (as well as subsequent
code review). Since Wikimedia Sverige is also hosting Wikimania this year,
the co-located Hackathon <https://wikimania.wikimedia.org/wiki/2019:Hackathon>
provides an excellent opportunity for this.
If you are attending and interested in joining up for this I’d love to hear
from you. Wikimedia Sverige also has a travel budget which can be used for
bringing a few selected people working on these areas to the Wikimania
Hackathon (this is outside of the scholarship process).
Since our budget is limited I’d kindly ask that if you would be interested
in receiving some financial support for the travel, please contact me off
list and answer the following questions (each answer should be no longer
than 100 words):
What would you be interested in working on?
What relevant experience do you have?
What is the expected travel cost you would need support with?
https://phabricator.wikimedia.org/T223820 - SDC on Commons
https://phabricator.wikimedia.org/T189321 - Lexicographical data
https://phabricator.wikimedia.org/T186200 - Rethink Wikibase data model
implementation (may or may not be a blocker to either of the above)
André Costa (also Lokal_Profil when not wearing my Wikimedia Sverige hat)
André Costa | Chief Operating Officer, Wikimedia Sverige |
Andre.Costa(a)wikimedia.se | +46 (0)733-964574
Support free knowledge, become a member of Wikimedia Sverige.
Read more at blimedlem.wikimedia.se
A new stable release, "3.0.20190722", has been deployed. It is available on PyPI (as a side package, without scripts) or from our repository via the tags "3.0.20190722" and "stable". There was a long time between this stable branch and the previous one; the main reason was the implementation of sitelink badges and of closed-wiki access, which needed some additional work to get all tests passing.
Please note that the latest release is always tagged "stable". For production systems you should always use this branch instead of the master branch, because master is under perpetual development (see T217908 on Phabricator). PAWS is also based on the stable-tagged release.
There are currently 96,697 lines of code.
The following changes come with this release:
Important changes which need operators' attention:
* Deprecation warning: support for Python 2 will be dropped in 2020 (T213287). Please update your Python.
* deprecate test_family, use wikipedia_family: Site('test', 'wikipedia') instead (T228375, T228300). To access the test wiki, instantiate the site object as Site('test', 'wikipedia') instead of Site('test', 'test'). The old form still works for now but may be dropped in a future release; see the short example after this list.
* remove the unimplemented "proxy" variable in config.py. You should remove it from user-config.py too.
* Increase the throttling delay if maxlag >> retry-after (T210606)
* Add "user_agent_description" option in config.py
* APISite.fromDBName works for all known dbnames (T225590, T225723, T226960)
* Make Family.langs property more robust (T226934)
* Remove strategy family
* Handle closed_wikis as read-only (T74674)
* TokenWallet: login automatically
* Add closed_wikis to Family.langs property (T225413)
* Redirect 'mo' site code to 'ro' and remove interwiki_replacement_overrides (T225417, T89451)
* Add support for badges on Wikibase item sitelinks (T128202)
* Remove login.showCaptchaWindow() method
* New parameter supplied in suggest_help function for missing dependencies
* Remove NonMWAPISite class
* Introduce Claim.copy and prevent adding already saved claims (T220131)
* Fix create_short_link method after MediaWiki changes (T223865)
* Validate proofreadpage.IndexPage contents before saving it
* Refactor Link and introduce BaseLink (T66457)
* Count skipped pages in BaseBot class
* 'actionthrottledtext' is a retryable wikibase error (T192912)
* Clear tokens on logout (T222508)
* botirc.IRCBot has been dropped
* Avoid using outdated browseragents (T222959)
* textlib: avoid infinite execution of regex (T222671)
* Add CSRF token in sitelogout() api call (T222508)
* Refactor WikibasePage.get and overriding methods and improve documentation
* Improve title patterns of WikibasePage extensions
* Add support for property creation (T160402)
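As a minimal illustration of the test_family deprecation above (this is just the two forms named in that bullet, nothing more):

    import pywikibot

    # Deprecated: addressing the test wiki through its own family
    # site = pywikibot.Site('test', 'test')

    # Preferred: test.wikipedia.org as part of the wikipedia family
    site = pywikibot.Site('test', 'wikipedia')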
For older changes, refer to HISTORY.rst.
Some changes were also made to scripts, for example:
* A lot of commonscat messages got translatewiki.net (twn) support (T219094)
* script_wui.py was dropped and moved to archive folder (T222759)
* default sign was added to welcome.py (T223044)
* Quit option was added to disambredir.py (T223048)
* Use additional twn messages with checkimages.py (T220178)
* redirect.py no longer fails with a RuntimeError when retrieving missing redirects (T130911)
* Make transferbot.py continue when the target page cannot be edited by bots or does not exist (T223816)
* Make coordinate_import.py work on a set of Wikidata items (T220806)
* Implement -overwrite option in create_categories.py (T220305)
* Enable choosing protect level with check_protection_level in protect.py (T225448)
* Move categories without leaving a redirect, which is the new default behaviour if the suppressredirect right is available (T150093)
* The namespace identifier can be omitted for the -start option of checkimages.py (T217824)
Thanks to all contributors for these code changes, and to all reviewers for their important work checking the code and submitting it to the repository. Thanks also to all bug reporters and everyone who gave their valuable comments on Phabricator tasks.
I am working on a bot that fetches a list of anonymous (IP) editors on fawiki,
uses WHOIS to retrieve more info about each IP, and uses a number of online
APIs to check whether the IP is a proxy.
I would like to improve the code by implementing a CIDR cache, so that if I
run WHOIS on 203.0.113.11 and determine that its ASN range is 203.0.113.0/24, and
I then encounter 203.0.113.57 in the next iteration of my for loop, I can
quickly determine that this IP belongs to the same range and skip the WHOIS
step for it.
The search space would consist of IP ranges like "203.0.113.0 - 203.0.113.255"
(these are the beginning and end IP addresses of the 203.0.113.0/24 range).
Obviously, we can convert these IPs to plain integers (hex being just another notation for the same value) to make comparisons easier.
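For example, with Python's standard ipaddress module (the addresses are the same illustrative documentation-range values as above, not real editors):

    import ipaddress

    start = int(ipaddress.ip_address('203.0.113.0'))    # 3405803776
    end = int(ipaddress.ip_address('203.0.113.255'))    # 3405804031
    ip = int(ipaddress.ip_address('203.0.113.57'))
    print(start <= ip <= end)  # True: the IP falls inside the range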
Given an IP like 203.0.113.11, we need the object to efficiently determine whether it
already has an IP range that encompasses this given IP and, if so, return
the previously cached details for that range. If not, we will store that IP's range and its WHOIS details in the cache.
The part that I am not fully clear about is the following: how can I avoid
having to loop through every range in the cache? Is there a way to create a
hash function that checks two inequality comparisons efficiently?
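One way to avoid both the full scan and the two-inequality hash is to hash the masked network itself: an IPv4 prefix length is at most 32, so you can probe every possible prefix of an address with plain O(1) dict lookups. Here is a minimal sketch of that idea; CidrCache and its methods are hypothetical names I made up for illustration, not anything from Pywikibot:

    import ipaddress

    class CidrCache:
        def __init__(self):
            # Maps an ip_network object to its cached WHOIS details.
            self._ranges = {}

        def add(self, cidr, details):
            self._ranges[ipaddress.ip_network(cidr)] = details

        def lookup(self, ip):
            # Instead of scanning every stored range, mask the address
            # down to each possible prefix length and test for an exact
            # dict hit: at most 33 O(1) probes for IPv4.
            addr = ipaddress.ip_address(ip)
            for prefixlen in range(addr.max_prefixlen, -1, -1):
                network = ipaddress.ip_network(
                    '%s/%d' % (addr, prefixlen), strict=False)
                if network in self._ranges:
                    return self._ranges[network]
            return None

    cache = CidrCache()
    cache.add('203.0.113.0/24', {'asn': 64500, 'proxy': False})
    print(cache.lookup('203.0.113.57'))  # cache hit, WHOIS skipped
    print(cache.lookup('198.51.100.7'))  # None, fall back to WHOIS

If the cache grows very large, a sorted list of range start points probed with bisect, or a radix trie such as the third-party pytricia package, finds the covering range in a single walk; for a per-run cache of a few thousand ranges, the dict probing above should be more than fast enough.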