Pywikipedia-l November 2007

pywikipedia-l@lists.wikimedia.org

29 participants
285 discussions

[ pywikipediabot-Bugs-1826767 ] faulty color bitmask in terminal_interface.py

by SourceForge.net

Bugs item #1826767, was opened at 2007-11-06 03:43
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1826767&group_…

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: General
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: faulty color bitmask in terminal_interface.py

Initial Comment:
in the function getDefaultTextColorInWindows, the color returned to the output system is being bitmasked with 0x0007, and not 0x000f, which is resulting in a color that is "half" of the user's color setting

return csbi.wAttributes & 0x0007
->
return csbi.wAttributes & 0x000f

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1826767&group_…
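The effect of the narrower mask can be sketched in a few lines of Python. In a Windows console attribute word the low four bits hold the foreground color (blue 0x1, green 0x2, red 0x4, intensity 0x8), so a mask of 0x0007 drops the intensity bit. The function name and the sample attribute value below are illustrative assumptions and are not taken from terminal_interface.py.

```python
# Foreground bits of a Windows console attribute word (wAttributes).
FOREGROUND_BLUE = 0x0001
FOREGROUND_GREEN = 0x0002
FOREGROUND_RED = 0x0004
FOREGROUND_INTENSITY = 0x0008
BACKGROUND_BLUE = 0x0010


def foreground_color(attributes, mask):
    """Return the foreground color bits that survive the given mask."""
    return attributes & mask


# Hypothetical console setting: bright yellow text on a blue background.
attributes = (FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_INTENSITY
              | BACKGROUND_BLUE)

old = foreground_color(attributes, 0x0007)  # intensity bit is lost
new = foreground_color(attributes, 0x000F)  # all four foreground bits kept

print(hex(old), hex(new))
```

With the old mask the bright color degrades to its dim counterpart, which matches the "half" color described in the report.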

16 years, 6 months

SVN: [4511] trunk/pywikipedia/

by valhallasw＠svn.wikimedia.org

Revision: 4511
Author: valhallasw
Date: 2007-11-06 10:49:56 +0000 (Tue, 06 Nov 2007)

Log Message:
-----------
Updated ignore properties:
* separate log files -> *.log
* category.dump.bz2 -> *.dump.bz2

Property Changed:
----------------
trunk/pywikipedia/

Property changes on: trunk/pywikipedia
___________________________________________________________________
Name: svn:ignore
   - *.pyc
user-config.py
autonomous_problem.dat
sax_parse_bug.dat
treelang.log
*.dump
warning*.log
*.txt
check_extern.log
throttle.log
login-data
password-file
category.dump.bz2

   + *.pyc
*.log
*.dump
*.dump.bz2
*.txt
user-config.py
autonomous_problem.dat
sax_parse_bug.dat
login-data
password-file
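A quick way to check that the two new wildcard entries still cover everything the removed entries ignored is sketched below. It uses Python's fnmatch as a stand-in for Subversion's own pattern matching, which is an assumption, and the file name warning-nl.log is a made-up example of something the old warning*.log entry would have matched.

```python
from fnmatch import fnmatch

# Wildcard entries added to svn:ignore in r4511.
added = ["*.log", "*.dump.bz2"]

# Files that the removed entries used to ignore. "warning-nl.log" is a
# hypothetical file name matching the old "warning*.log" entry.
sample_files = [
    "treelang.log",
    "warning-nl.log",
    "check_extern.log",
    "throttle.log",
    "category.dump.bz2",
]


def is_ignored(filename, patterns):
    """Return True if any ignore pattern matches the file name."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


# Anything listed here would no longer be ignored after the change.
uncovered = [name for name in sample_files if not is_ignored(name, added)]
print(uncovered)
```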

16 years, 6 months

SVN: [4507] branches/rewrite/pywikibot/data/api.py

by Bryan Tong Minh

This message didn't get through the first time.

On Nov 5, 2007 9:40 PM, Bryan Tong Minh <bryan.tongminh(a)gmail.com> wrote:
> On Nov 5, 2007 9:27 PM, Russell Blau <russblau(a)imapmail.org> wrote:
> > "Merlijn van Deen" <valhallasw(a)arctus.nl> wrote:
> > > Although I appreciate your efforts, I still want to ask you to wait with
> > > any other rewrite commits until you have read, understood, and reacted to
> > > this (fairly long) email.
> >
> > All of the points in your email (which I won't quote at length) are very
> > good ones. I think it would be very helpful for all interested participants
> > to agree on style and approach before coding. (My commit this morning,
> > incidentally, was code that I had already written for my own use, which fit
> > in nicely with what Misza had already done. And there's more where that
> > came from, but I'll wait...)
> >
> > > For the coding style, I propose the following:
> > > * As base: PEP 8 [1].
> >
> > Misza will be disappointed; PEP 8 says to use lowercase with underscores for
> > function and method names. ;)
> >
> Endorse PEP 8 ;)
>
> > > One module per class
> > I'd make an exception to this one to allow including subclasses in the same
> > module, such as Page, ImagePage, and Category.
> >
> I agree with russblau. Additionally, one module per class might give
> problems with cross importing of modules.
>
> > >
> > > thirdly: point c.
> > > This really is not a separate point, but it is important. I think we
> > > should use the API where possible, but having a fallback to the monobook
> > > html parser.
> >
> > Absolutely. The API doesn't do everything that you can do with HTML (yet),
> > and if we want backwards compatibility we have to remember that the API has
> > changed a lot between MW versions.
> >
> Branching is the key. The trunk should always be in sync with the
> trunk of MediaWiki. Additionally each time a MediaWiki version is
> released, the trunk should be forked into a compatible branch.
>
> I also don't see the point of providing a fall back to HTML scraping,
> except when absolutely necessary.
>
> > On unit testing -- it may be difficult to write unit tests for methods that
> > access the wiki, because the value returned will depend on the contents of
> > the wiki at any given time. Maybe if we have a dedicated test wiki with at
> > least some pages that are locked, so that they give predictable values, that
> > would be a way around the problem.
> >
>
> Or an automatically reset wiki or something like that.
>
> Bryan
>

16 years, 6 months

Fwd: SVN: [4507] branches/rewrite/pywikibot/data/api.py

by Bryan Tong Minh

This message didn't get through the first time.

---------- Forwarded message ----------
From: Bryan Tong Minh <bryan.tongminh(a)gmail.com>
Date: Nov 5, 2007 9:36 PM
Subject: Re: [Pywikipedia-l] SVN: [4507] branches/rewrite/pywikibot/data/api.py
To: Merlijn van Deen <valhallasw(a)arctus.nl>

On Nov 5, 2007 5:12 PM, Merlijn van Deen <valhallasw(a)arctus.nl> wrote:
> Although I appreciate your efforts, I still want to ask you to wait with
> any other rewrite commits until you have read, understood, and reacted to
> this (fairly long) email.
>
> The purpose of the rewrite was
> a) to restructure the framework
> b) to have consistent formatting and documentation
> c) to move to the API
> and possibly
> d) to add i18n support
>

I would also like to give some thoughts on the rewrite. As some of you might already know, I have already created a fairly complete framework based on the API and thus have some experience with it.

First of all, the advantage of the API is that you don't need screen scraping, except for the changing content stuff. However, all information that is required from the changing content stuff is available through the API, so the only thing that requires screen scraping is getting information about whether the action was successful.

What is very important is that we clearly separate the different layers that a framework consists of and get rid of functions like replaceExceptInWhatever in the main module. In my opinion a proper framework consists of three separate layers:
* High-level
* Middleware
* Lowware or core

The core functionality should consist of methods to get and put raw page data, such as Page.get, Page.categories, Site.recentchanges, etc. The middleware consists of commonly used functions such as replaceIn, replaceImage, replaceCategory. The high-level software is the bot itself. It performs tasks by calling the functions of the middleware and core.

This separation must be such that one can use only the low-ware part without dependencies on the middle and high ware. Dependencies should only be top down, never bottom up. A real separation would make the code much clearer and easier to maintain.

Related to this, the question of i18n. I strongly believe that low-ware should never output things to stdout or ask something from stdin. Communication to higher layers should happen through the use of return values and exceptions. I18n is part of middleware, or specific to a bot, but never the task of the core.

The core itself can also be divided into sublayers:
* Python equivalents of the functions that the API provides
* Abstract Page/List objects
* Generic API function
* Generic communication layer

The first item comprises the functions that are used from the outside, functions such as Site.recentchanges(), Page.put(). The API has several list/generator functions which behave the same way. An abstract parent class would prevent duplicating code. The generic API function translates a function call to an appropriate HTTP request. The comm layer initiates and handles the connection to the server in an (optionally) persistent fashion.

What is important to consider is where the error correction is put. Some errors are recoverable after retry. The lower in level such a retry is placed, the less code duplication is required. However, retry code placed too low may catch errors that are too generic. An example of this is the slave lag or Retry-After. This is probably something that should be caught in either the second or the third layer. HTTP errors should probably be caught in the third or the fourth layer. User blockage should be detected in either the second or the third layer and propagate through the low and middle ware to the bot, which can optionally handle it or pass it on to the user.

A thing to consider is how fool proof the framework is supposed to be. Pywikipediabot is used by many different users, from advanced users to absolute beginners. Beginners probably want the framework to catch many common exceptions and act for them, while advanced users want to keep things under their own control. Also the use of persistent HTTP connections makes the framework less fool proof. Persistent HTTP connections make an object that uses them automatically unsuitable for sharing between threads. Of course one should always use proper locking when sharing between threads, but we all know that that is something that does not always happen.

So far my thoughts. Thank you for reading it, it's probably a little bit messy and unstructured, but I put it down as it entered my head.

Cheers,
Bryan
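The layering and retry placement described in this message can be sketched roughly as follows. Everything here is a hypothetical illustration under stated assumptions: the class and function names, the exception types and the "action" values are invented for the example and are neither the MediaWiki API nor existing pywikipedia code. The point is only that dependencies run top down, the core never prints, and recoverable errors are retried once in a low layer while other errors propagate to the bot.

```python
import time


class RecoverableError(Exception):
    """Hypothetical error that may succeed on retry (e.g. server lag)."""


class UserBlockedError(Exception):
    """Hypothetical error that is not retried and propagates to the bot."""


def api_request(send, params, retries=3, delay=0.0):
    """Generic API layer: retry recoverable errors, let others propagate."""
    for attempt in range(retries):
        try:
            return send(params)
        except RecoverableError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)


class Page:
    """Core layer: raw get/put only; never prints, never reads stdin."""

    def __init__(self, title, send):
        self.title = title
        self._send = send

    def get(self):
        return api_request(self._send, {"action": "get", "title": self.title})

    def put(self, text):
        return api_request(
            self._send, {"action": "put", "title": self.title, "text": text})


def replace_in(page, old, new):
    """Middleware: a common helper that depends only on the core layer."""
    text = page.get()
    changed = text.replace(old, new)
    if changed != text:
        page.put(changed)
    return changed != text


def make_fake_wiki():
    """In-memory stand-in for the comm layer; fails once, then works."""
    store = {"Sandbox": "teh quick brown fox"}
    calls = {"count": 0}

    def send(params):
        calls["count"] += 1
        if calls["count"] == 1:
            raise RecoverableError("lagged")
        if params["action"] == "get":
            return store[params["title"]]
        store[params["title"]] = params["text"]
        return True

    return store, send


# Bot (high level): the only layer that reports anything to the user.
store, send = make_fake_wiki()
changed = replace_in(Page("Sandbox", send), "teh", "the")
print("changed:", changed)
```

Because the retry loop lives in one low-level function, neither Page nor replace_in needs its own retry code; the trade-off named in the message is that this function must only catch errors that are safe to retry.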

16 years, 6 months

Re: [Pywikipedia-l] SVN: [4507] branches/rewrite/pywikibot/data/api.py

by Russell Blau

"Bryan Tong Minh" <bryan.tongminh(a)gmail.com> wrote:
> On Nov 5, 2007 9:27 PM, Russell Blau <russblau(a)imapmail.org> wrote:
>> "Merlijn van Deen" <valhallasw(a)arctus.nl> wrote:
>> > This really is not a separate point, but it is important. I think we
>> > should use the API where possible, but having a fallback to the
>> > monobook html parser.
>>
>> Absolutely. The API doesn't do everything that you can do with HTML
>> (yet), and if we want backwards compatibility we have to remember that
>> the API has changed a lot between MW versions.
>>
> Branching is the key. The trunk should always be in sync with the
> trunk of MediaWiki. Additionally each time a MediaWiki version is
> released, the trunk should be forked into a compatible branch.
>
> I also don't see the point of providing a fall back to HTML scraping,
> except when absolutely necessary.

This raises an important question. Do we want to continue trying to support every MediaWiki installation, including those that haven't upgraded to more recent versions of MW? There are a number of wikis in the families/ directory now that don't have any API support at all. Personally, I only use my bots on Wikimedia Foundation sites, so it's not an issue for me. But it may be for others. If we decide to go API-only, then users of the non-current MediaWiki installations will have to use the "old" pywikipediabot.

Russ

16 years, 6 months

Re: [Pywikipedia-l] SVN: [4507] branches/rewrite/pywikibot/data/api.py

by Merlijn van Deen

> Revision: 4507
> Author: russblau
> Date: 2007-11-05 15:13:10 +0000 (Mon, 05 Nov 2007)
>
> Log Message:
> -----------
> Generalize query method; add some response processing and error handling.

Although I appreciate your efforts, I still want to ask you to wait with any other rewrite commits until you have read, understood, and reacted to this (fairly long) email.

The purpose of the rewrite was
a) to restructure the framework
b) to have consistent formatting and documentation
c) to move to the API
and possibly
d) to add i18n support

First of all: point b.
Before any new code is written, we should agree on a fixed coding style. The current framework has several coding styles used throughout the framework - and even sometimes in one class: in the Page class, we have got Page.urlname, Page.isAutoTitle and Page.put_async... - I think we should use one style for the entire framework.

For the coding style, I propose the following:
* As base: PEP 8 [1].
* UTF8 encoding for all files (and using u'' for all strings), with UTF8 BOM
* Docstrings mandatory, using Epydoc [2] for describing parameters and return value. Defining the function in this way almost automatically defines unit tests for that function.
* __version__ in every file, and setting the corresponding SVN property.
* One module per class

Note that all code should be written thread safe: we want to use persistent HTTP connections and this means we will be sharing a connection throughout several page objects. If we want to be able to put in a thread, we really need to make sure code is thread safe.

Secondly: point a.
The point of the rewrite was restructuring the framework. Adding structure after writing is much harder than first thinking of structure and then coding along the structure. First of all, we need to split the framework and the bots. Using the name 'pywikibot' for the framework might not be the best idea because of that ;)

Because adding structure afterwards is much harder, I think we should first decide on what modules/classes we want, then define what functions we want and what these functions should do. After that, we can start writing unit tests and functions.

Unit tests are there to help define what code is necessary. In general, first define what a function should do, then write unit tests, then write code to make these tests work, and stop coding when all tests pass. If a function needs to do more, first update the definition, then update the unit tests, then update the code. This makes sure a) that documentation is always up to date with the code and b) that a change does not break existing behavior. For more information about unit testing and how to code using them, please read chapters 13, 14 and 15 of Dive Into Python [3].

For unit testing, we will need one or more installs of MediaWiki; the SVN release used at wikipedia, but maybe also older stable versions - if we want to maintain compatibility.

On a compatibility note; even though we are changing the way the framework works, it should be possible to create a layer that maps functions in the old framework to the new framework.

Thirdly: point c.
This really is not a separate point, but it is important. I think we should use the API where possible, but having a fallback to the monobook html parser.

The API allows neat ways of getting information from many pages at once. We will need a page generator that can handle this information, and that will generate page objects according to the information received. This will mean we need more data on what information we have got already etc.

Finally: point d.
This is not a very important point, but it's kinda interesting. With the new framework, it's easier to restructure existing functions, making translations easier.

I'm sorry for the long email, and I repeat: I really appreciate your efforts, but I really think we should address these issues before starting at - even the most basic - code.

--valhallasw

[1] http://www.python.org/dev/peps/pep-0008/
[2] http://epydoc.sourceforge.net/manual-epytext.html
[3] http://diveintopython.org/unit_testing/index.html
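The "define, then test, then code" order proposed above can be sketched as follows. The function url_name and its behavior are invented for illustration only; they are not the existing Page.urlname. The docstring uses epytext fields (@param, @type, @return, @rtype) as Epydoc [2] describes them, and the test cases are derived from that docstring rather than from the function body.

```python
import unittest


def url_name(title):
    """Return a page title in a form suitable for URLs (illustrative only).

    Leading and trailing whitespace is removed and the remaining spaces
    are replaced by underscores.

    @param title: page title, possibly containing spaces
    @type title: str
    @return: the normalized title
    @rtype: str
    """
    return title.strip().replace(" ", "_")


class UrlNameTest(unittest.TestCase):
    """Tests written from the docstring, one per documented behavior."""

    def test_spaces_become_underscores(self):
        self.assertEqual(url_name("Main Page"), "Main_Page")

    def test_surrounding_whitespace_is_dropped(self):
        self.assertEqual(url_name("  Sandbox "), "Sandbox")


# Run the tests programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(UrlNameTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If the function later needs to do more, the docstring changes first, then a test is added, and only then the body, which keeps the documentation and the code in step.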

16 years, 6 months

Re: [Pywikipedia-l] Rewrite thoughts

by Misza13

Ok now. As I am a complete lamer when it comes to team coding, coding standards etc., I'll take your word for it, but nonetheless have several lame questions.

> The purpose of the rewrite was
> a) to restructure the framework
> b) to have consistent formatting and documentation
> c) to move to the API
> and possibly
> d) to add i18n support
>
> First of all: point b.
> Before any new code is written, we should agree on a fixed coding style.
> The current framework has several coding styles used throughout the
> framework - and even sometimes in one class: in the Page class, we have
> got Page.urlname, Page.isAutoTitle and Page.put_async... - I think we
> should use one style for the entire framework.

If team coding is a democracy then I vote for camelCaseWithFirstWordLowercase. It's a strong standard in Java (which is a close cousin to Python) and is frequent in Python too (take Twisted for an example).

> For the coding style, I propose the following:
> * As base: PEP 8 [1].

Fine. I suggest the most common 1 indentation level = 4 spaces.

> * UTF8 encoding for all files (and using u'' for all strings), with UTF8 BOM

I'll assume gvim does that for me. o_O

> * Docstrings mandatory, using Epydoc [2] for describing parameters and
> return value. Defining the function in this way almost automatically
> defines unit tests for that function.

That's an interesting document - smells of javadoc too. :)

> * One module per class

Java style again? ;) Anyway, let's make sure I understand a "module" correctly, especially in the context of unit tests later on. A module corresponds to one .py file (like http.py or api.py from our example) and a folder of these (data/) is a package, not module? Despite this:

>>> import data
>>> data
<module 'data' from 'data/__init__.py'>

> Note that all code should be written thread safe: we want to use
> persistent HTTP connections and this means we will be sharing a connection
> throughout several page objects. If we want to be able to put in a thread,
> we really need to make sure code is thread safe.

If I may suggest, let's leave data.http.HTTP (or its equivalent) as an object encapsulating a connection and presenting wrappers on GET and POST methods, but in a non-thread-safe manner and instead resolve this at a higher level (Site, perhaps) by keeping a pool of connections or something similar (or maybe write a special HTTPPool handler between Site and HTTP). That's really a matter of deciding whether the framework handles multithreading or whether we tell people that each thread should have its own instance of Site.

> Secondly: point a.
> <snip>
> Because adding structure afterwards is much harder, I think we first
> should decide on what modules/classes we want, then defining what
> functions we want and what these functions should do. After that, we can
> start writing unit tests and functions.

Where do we start? Edit http://www.botwiki.sno.cc/wiki/Rewrite and add classes until someone says enough?

> On a compatibility note; even though we are changing the way the framework
> works, it should be possible to create a layer that maps functions in the
> old framework to the new framework.

Yes, I would assume most methods of current high-level objects (Page, Site, generators) will not change their interface much.

> thirdly: point c.

Yes, <3 API - I think we established that already. :>

> finally: point d.
> This is not a very important point, but it's kinda interesting. With the
> new framework, it's easier to restructure existing functions, making
> translations easier.

Ok now, can you elaborate on this? I don't think I'm getting the point - we already have i18n - bots have localized edit summaries, and the framework knows the localized #REDIRECT and namespace names as well. Are you talking about some whole new level of i18n?

> I'm sorry for the long email, and I repeat: I really appreciate your
> efforts, but I really think we should address these issues before starting
> at - even the most basic - code.

Well, it seems I'm a coder, not a project designer, and it tempts me to stand back, watch the fireworks and return when we actually need some code. But instead, I hope to gain some experience in team programming.

Misza
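The pool idea suggested in this message might look roughly like the sketch below. All names here (Connection, HTTPPool, Site.fetch) are hypothetical stand-ins, the host is a placeholder, and no real network traffic takes place; the sketch only shows how a non-thread-safe connection object can stay simple while a pool between Site and the connection makes sure each connection is used by one thread at a time.

```python
import queue
import threading
from contextlib import contextmanager


class Connection:
    """Hypothetical stand-in for data.http.HTTP; not thread safe itself."""

    def __init__(self, host):
        self.host = host

    def get(self, path):
        # A real implementation would send an HTTP request here.
        return "GET http://%s%s" % (self.host, path)


class HTTPPool:
    """Hands out connections so each is used by one thread at a time."""

    def __init__(self, host, size=2):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(Connection(host))

    @contextmanager
    def connection(self):
        conn = self._idle.get()   # blocks while every connection is busy
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back even if the request failed


class Site:
    """Higher level: callers never touch a connection directly."""

    def __init__(self, host):
        self._pool = HTTPPool(host)

    def fetch(self, path):
        with self._pool.connection() as conn:
            return conn.get(path)


# Five threads share one Site and, through it, two pooled connections.
site = Site("wiki.example.org")
results = []
results_lock = threading.Lock()


def worker(number):
    text = site.fetch("/index.php?page=%d" % number)
    with results_lock:
        results.append(text)


threads = [threading.Thread(target=worker, args=(n,)) for n in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(len(results))
```

The alternative mentioned in the message, one Site instance per thread, needs no pool at all but moves the responsibility to every bot author.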

16 years, 6 months

SVN: [4510] trunk/pywikipedia/

by warddr＠svn.wikimedia.org

Revision: 4510
Author: warddr
Date: 2007-11-05 17:03:48 +0000 (Mon, 05 Nov 2007)

Log Message:
-----------
added category.dump.bz2 to ignore list

Property Changed:
----------------
trunk/pywikipedia/

Property changes on: trunk/pywikipedia
___________________________________________________________________
Name: svn:ignore
   - *.pyc
user-config.py
autonomous_problem.dat
sax_parse_bug.dat
treelang.log
*.dump
warning*.log
*.txt
check_extern.log
throttle.log
login-data
password-file

   + *.pyc
user-config.py
autonomous_problem.dat
sax_parse_bug.dat
treelang.log
*.dump
warning*.log
*.txt
check_extern.log
throttle.log
login-data
password-file
category.dump.bz2

16 years, 6 months

SVN: [4509] trunk/pywikipedia/families/wikibond_family.py

by warddr＠svn.wikimedia.org

Revision: 4509
Author: warddr
Date: 2007-11-05 16:53:21 +0000 (Mon, 05 Nov 2007)

Log Message:
-----------
copy - paste

Modified Paths:
--------------
trunk/pywikipedia/families/wikibond_family.py

Modified: trunk/pywikipedia/families/wikibond_family.py
===================================================================
--- trunk/pywikipedia/families/wikibond_family.py 2007-11-05 16:39:03 UTC (rev 4508)
+++ trunk/pywikipedia/families/wikibond_family.py 2007-11-05 16:53:21 UTC (rev 4509)
@@ -16,7 +16,7 @@
             'nl': [u'WikiBond'],
         }
         self.namespaces[5] = {
-            'botwiki': [u'Overleg WikiBond'],
+            'nl': [u'Overleg WikiBond'],
         }
 
     def path(self, code):
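Why the one-word change matters can be sketched with plain dictionaries. The two tables below mirror the namespace entries before and after this revision; the lookup function namespace_name is a hypothetical helper written for this example, not the framework's own method, and the strings are written without the u'' prefix so the sketch runs on current Python.

```python
# Namespace tables as committed in r4508 and as corrected in r4509.
# Inner keys are language codes; the family only defines the 'nl' code.
namespaces_r4508 = {
    4: {"nl": ["WikiBond"]},
    5: {"botwiki": ["Overleg WikiBond"]},  # key does not match any language
}
namespaces_r4509 = {
    4: {"nl": ["WikiBond"]},
    5: {"nl": ["Overleg WikiBond"]},
}


def namespace_name(namespaces, number, code):
    """Hypothetical lookup: first local namespace name, or None if absent."""
    names = namespaces.get(number, {}).get(code)
    return names[0] if names else None


before = namespace_name(namespaces_r4508, 5, "nl")
after = namespace_name(namespaces_r4509, 5, "nl")
print(before, after)
```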

16 years, 6 months

SVN: [4508] trunk/pywikipedia/families/wikibond_family.py

by warddr＠svn.wikimedia.org

Revision: 4508
Author: warddr
Date: 2007-11-05 16:39:03 +0000 (Mon, 05 Nov 2007)

Log Message:
-----------
+namespacess

Modified Paths:
--------------
trunk/pywikipedia/families/wikibond_family.py

Modified: trunk/pywikipedia/families/wikibond_family.py
===================================================================
--- trunk/pywikipedia/families/wikibond_family.py 2007-11-05 15:13:10 UTC (rev 4507)
+++ trunk/pywikipedia/families/wikibond_family.py 2007-11-05 16:39:03 UTC (rev 4508)
@@ -3,16 +3,21 @@
 import family
 
 # I added this becouse someone asked me to. The url op the wiki: nl.wikibond.org
-#to do: add namespaces
 
 class Family(family.Family):
-
+
     def __init__(self):
         family.Family.__init__(self)
         self.name = 'wikibond'
         self.langs = {
             'nl': 'nl.wikibond.org',
         }
+        self.namespaces[4] = {
+            'nl': [u'WikiBond'],
+        }
+        self.namespaces[5] = {
+            'botwiki': [u'Overleg WikiBond'],
+        }
 
     def path(self, code):
         return '/wikibond/index.php'

16 years, 6 months
