Bugs item #1826767, was opened at 2007-11-06 03:43
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=603138&aid=1826767&group_…
Category: General
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: faulty color bitmask in terminal_interface.py
Initial Comment:
In the function getDefaultTextColorInWindows, the color returned to the output system is masked with 0x0007 instead of 0x000f. This drops the intensity bit, so the result is a color that is "half" of the user's color setting. Suggested fix:
return csbi.wAttributes & 0x0007
->
return csbi.wAttributes & 0x000f
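To illustrate the difference between the two masks, here is a small sketch. The FOREGROUND_* values are the standard Windows console attribute bits; the helper name foreground_color and the sample attribute value are assumptions made for this example only and are not taken from terminal_interface.py.

```python
# Windows console character attributes: the low four bits hold the
# foreground color (blue, green, red, intensity).
FOREGROUND_BLUE = 0x0001
FOREGROUND_GREEN = 0x0002
FOREGROUND_RED = 0x0004
FOREGROUND_INTENSITY = 0x0008


def foreground_color(attributes, mask=0x000F):
    """Return the foreground color bits of a console attribute word."""
    return attributes & mask


# Sample value (an assumption for illustration): bright yellow text.
w_attributes = FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_INTENSITY

old = foreground_color(w_attributes, mask=0x0007)  # intensity bit is lost
new = foreground_color(w_attributes, mask=0x000F)  # all four bits are kept
```

With the 0x0007 mask the bright color comes back as its dim variant, which matches the "half" color described in the report.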
----------------------------------------------------------------------
This message didn't get through the first time.
On Nov 5, 2007 9:40 PM, Bryan Tong Minh <bryan.tongminh(a)gmail.com> wrote:
> On Nov 5, 2007 9:27 PM, Russell Blau <russblau(a)imapmail.org> wrote:
> > "Merlijn van Deen" <valhallasw(a)arctus.nl> wrote:
> > > Although I appreciate your efforts, I still want to ask you to wait with
> > > any other rewrite commits until you have read, understood, and reacted to
> > > this (fairly long) email.
> >
> > All of the points in your email (which I won't quote at length) are very
> > good ones. I think it would be very helpful for all interested participants
> > to agree on style and approach before coding. (My commit this morning,
> > incidentally, was code that I had already written for my own use, which fit
> > in nicely with what Misza had already done. And there's more where that
> > came from, but I'll wait...)
> >
> > > For the coding style, I propose the following:
> > > * As base: PEP 8 [1].
> >
> > Misza will be disappointed; PEP 8 says to use lowercase with underscores for
> > function and method names. ;)
> >
> Endorse PEP 8 ;)
>
>
> > > One module per class
> > I'd make an exception to this one to allow including subclasses in the same
> > module, such as Page, ImagePage, and Category.
> >
> I agree with russblau. Additionally, one module per class might give
> problems with cross importing of modules.
>
> >
> > > thirdly: point c.
> > > This really is not a separate point, but it is important. I think we
> > > should use the API where possible, but having a fallback to the monobook
> > > html parser.
> >
> > Absolutely. The API doesn't do everything that you can do with HTML (yet),
> > and if we want backwards compatibility we have to remember that the API has
> > changed a lot between MW versions.
> >
> Branching is the key. The trunk should always be in sync with the
> trunk of MediaWiki. Additionally each time a MediaWiki version is
> released, the trunk should be forked into a compatible branch.
>
> I also don't see the point of providing a fall back to HTML scraping,
> except when absolutely necessary.
>
> > On unit testing -- it may be difficult to write unit tests for methods that
> > access the wiki, because the value returned will depend on the contents of
> > the wiki at any given time. Maybe if we have a dedicated test wiki with at
> > least some pages that are locked, so that they give predictable values, that
> > would be a way around the problem.
> >
>
> Or a wiki that is reset automatically, or something like that.
>
> Bryan
>
This message didn't get through the first time.
---------- Forwarded message ----------
From: Bryan Tong Minh <bryan.tongminh(a)gmail.com>
Date: Nov 5, 2007 9:36 PM
Subject: Re: [Pywikipedia-l] SVN: [4507] branches/rewrite/pywikibot/data/api.py
To: Merlijn van Deen <valhallasw(a)arctus.nl>
On Nov 5, 2007 5:12 PM, Merlijn van Deen <valhallasw(a)arctus.nl> wrote:
> Although I appreciate your efforts, I still want to ask you to wait with
> any other rewrite commits until you have read, understood, and reacted to
> this (fairly long) email.
>
> The purpose of the rewrite was
> a) to restructure the framework
> b) to have consistent formatting and documentation
> c) to move to the API
> and possibly
> d) to add i18n support
>
I would also like to give some thoughts on the rewrite. As some of you
might already know, I have already created a fairly complete framework
based on the API and thus have some experience with it.
First of all, the advantage of the API is that you don't need screen
scraping, except for actions that change content. However, all
information that those actions require is
available through the API, so the only thing that requires screen
scraping is finding out whether the action was
successful.
What is very important is that we clearly separate the different
layers that a framework consists of and get rid of functions like
replaceExceptInWhatever in the main module. In my opinion a proper
framework consists of three separate layers:
* High-level
* Middleware
* Lowware or core
The core functionality should consist of methods to get and put raw
page data, such as Page.get, Page.categories, Site.recentchanges, etc.
The middleware consists of commonly used functions such as replaceIn,
replaceImage, replaceCategory. The high-level software is the bot
itself. It performs tasks by calling the functions of the middleware
and core.
This separation must be such that one can use the low-ware part on its
own, without dependencies on the middleware and high-level parts.
Dependencies should only be top down, never bottom up. A real separation
would make the code much clearer and easier to maintain.
Related to this, the question of i18n. I strongly believe that
low-ware should never output things to stdout or ask something from
stdin. Communication to higher layers should happen through the use of
return values and exceptions. I18n is part of middleware, or specific
to a bot, but never the task of the core.
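A minimal sketch of this layering, assuming a plain dict as a stand-in for the wiki. All names here (PageNotFound, get_page_text, replace_in, run_bot) are hypothetical and only illustrate the direction of the dependencies and the rule that the core never prints anything.

```python
class PageNotFound(Exception):
    """Raised by the core when a page does not exist (hypothetical name)."""


def get_page_text(store, title):
    """Core: return raw page text or raise; never print or prompt."""
    try:
        return store[title]
    except KeyError:
        raise PageNotFound(title) from None


def replace_in(store, title, old, new):
    """Middleware: a commonly used helper built only on the core."""
    return get_page_text(store, title).replace(old, new)


def run_bot(store, title):
    """High level: the only layer that produces user-facing messages."""
    try:
        return replace_in(store, title, "colour", "color")
    except PageNotFound as err:
        # Any translation of this message would live here, not in the core.
        return "Page %s does not exist" % err
```

The core function can be imported and used without the other two, while the bot depends on both lower layers.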
The core itself can also be divided into sublayers:
* Python equivalents of the functions that the API provides
* Abstract Page/List objects
* Generic API function
* Generic communication layer
The first item comprises the functions that are used from the outside,
such as Site.recentchanges() and Page.put().
The API has several list/generator functions which behave the same
way. An abstract parent class would prevent duplicating code.
The generic API function translates a function call to an appropriate
HTTP request.
The comm layer initiates and handles the connection to the server in
an (optionally) persistent fashion.
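As a rough sketch of the abstract parent class for list/generator functions: the class names and the shape of the fetch callable (a continuation token in, a batch of items plus the next token out) are assumptions for illustration, not the actual API response format.

```python
class ApiList:
    """Abstract parent: shared continuation loop for all list queries."""

    def __init__(self, fetch):
        # fetch(token) -> (items, next_token); next_token is None at the end.
        self._fetch = fetch

    def __iter__(self):
        token = None
        while True:
            items, token = self._fetch(token)
            for item in items:
                yield self.wrap(item)
            if token is None:
                return

    def wrap(self, item):
        """Turn one raw item into a result object; subclasses override."""
        raise NotImplementedError


class RecentChanges(ApiList):
    def wrap(self, item):
        return {"type": "change", "title": item}


class CategoryMembers(ApiList):
    def wrap(self, item):
        return {"type": "member", "title": item}


def fake_fetch(token):
    """Stand-in for a real API request: two batches, then stop."""
    batches = {None: (["A", "B"], "next"), "next": (["C"], None)}
    return batches[token]
```

Both subclasses reuse the same loop, so the continuation handling is written only once.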
What is important to consider is where the error correction is put.
Some errors are recoverable after a retry. The lower the level at which
such a retry is placed, the less code duplication is required. However,
retry code placed too low may end up catching errors that are too generic.
An example of this is the slave lag or Retry-After. This is
probably something that should be caught in either the second or the
third layer. HTTP errors should probably be caught in the third or the
fourth layer. User blockage should be detected in either the second or
the third layer and propagate through the low-ware and middleware to the
bot, which can optionally handle it or pass it on to the user.
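A sketch of a narrowly scoped retry, assuming two hypothetical exception classes. Only the recoverable lag error is retried; everything else, including a user block, propagates unchanged to the caller.

```python
import time


class ServerLagError(Exception):
    """Recoverable: the server asked us to wait (hypothetical name)."""

    def __init__(self, retry_after):
        super().__init__("server lagged, retry after %s s" % retry_after)
        self.retry_after = retry_after


class UserBlockedError(Exception):
    """Not fixed by retrying; must reach the bot (hypothetical name)."""


def call_with_retry(request, max_retries=3, sleep=time.sleep):
    """Call request(); retry only on ServerLagError, up to max_retries."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except ServerLagError as err:
            if attempt == max_retries:
                raise  # give up and let the caller decide
            sleep(err.retry_after)
```

Catching only ServerLagError is what keeps the retry from swallowing errors that are too generic; the sleep parameter exists so the waiting can be replaced in tests.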
A thing to consider is how foolproof the framework is supposed to be.
Pywikipediabot is used by many different users, from advanced users to
absolute beginners. Beginners probably want the framework to catch
many common exceptions and act for them, while advanced users want to
keep things under their own control.
Also, the use of persistent HTTP connections makes the framework less
foolproof. Persistent HTTP connections make an object that uses them
automatically unsuitable for sharing between threads. Of course one
should always use proper locking when sharing between threads, but we
all know that that is something that does not always happen.
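For illustration, "proper locking" around a shared connection could look like the sketch below. SharedConnection and its send callable are hypothetical; the point is only that one request at a time may use the underlying connection.

```python
import threading


class SharedConnection:
    """Wrap a persistent connection so threads take turns using it."""

    def __init__(self, send):
        self._send = send              # callable that does the real I/O
        self._lock = threading.Lock()

    def request(self, path):
        # Only one thread may talk on the connection at any moment.
        with self._lock:
            return self._send(path)
```

The cost is that requests are serialized, which is the trade-off between foolproof behavior and throughput mentioned above.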
So far my thoughts. Thank you for reading it, it's probably a little
bit messy and unstructured, but I put it down as it entered my head.
Cheers,
Bryan
"Bryan Tong Minh" <bryan.tongminh(a)gmail.com> wrote:
> On Nov 5, 2007 9:27 PM, Russell Blau <russblau(a)imapmail.org> wrote:
>> "Merlijn van Deen" <valhallasw(a)arctus.nl> wrote:
>> > This really is not a separate point, but it is important. I think we
>> > should use the API where possible, but having a fallback to the
>> > monobook html parser.
>>
>> Absolutely. The API doesn't do everything that you can do with HTML
>> (yet), and if we want backwards compatibility we have to remember that
>> the API has changed a lot between MW versions.
>>
>>
> Branching is the key. The trunk should always be in sync with the
> trunk of MediaWiki. Additionally each time a MediaWiki version is
> released, the trunk should be forked into a compatible branch.
>
> I also don't see the point of providing a fall back to HTML scraping,
> except when absolutely necessary.
This raises an important question. Do we want to continue trying to support
every MediaWiki installation, including those that haven't upgraded to more
recent versions of MW? There are a number of wikis in the families/
directory now that don't have any API support at all.
Personally, I only use my bots on Wikimedia Foundation sites, so it's not an
issue for me. But it may be for others. If we decide to go API-only, then
users of the non-current MediaWiki installations will have to use the "old"
pywikipediabot.
Russ
> Revision: 4507
> Author: russblau
> Date: 2007-11-05 15:13:10 +0000 (Mon, 05 Nov 2007)
>
> Log Message:
> -----------
> Generalize query method; add some response processing and error
> handling.
Although I appreciate your efforts, I still want to ask you to wait with
any other rewrite commits until you have read, understood, and reacted to
this (fairly long) email.
The purpose of the rewrite was
a) to restructure the framework
b) to have consistent formatting and documentation
c) to move to the API
and possibly
d) to add i18n support
First of all: point b.
Before any new code is written, we should agree on a fixed coding style.
The current framework has several coding styles used throughout the
framework - and even sometimes in one class: in the Page class, we have
got Page.urlname, Page.isAutoTitle and Page.put_async... - I think we
should use one style for the entire framework.
For the coding style, I propose the following:
* As base: PEP 8 [1].
* UTF8 encoding for all files (and using u'' for all strings), with UTF8 BOM
* Docstrings mandatory, using Epydoc [2] for describing parameters and
return value. Defining the function in this way almost automatically
defines unit tests for that function.
* __version__ in every file, and setting the corresponding SVN property.
* One module per class
Note that all code should be written thread safe: we want to use
persistent HTTP connections and this means we will be sharing a connection
throughout several page objects. If we want to be able to put in a thread,
we really need to make sure code is thread safe.
Secondly: point a.
The point of the rewrite was restructuring the framework. Adding structure
after writing is much harder than first thinking of structure and then
coding along the structure.
First of all, we need to split the framework and the bots. Using the name
'pywikibot' for the framework might not be the best idea because of that
;)
Because adding structure afterwards is much harder, I think we first
should decide on what modules/classes we want, then define what
functions we want and what these functions should do. After that, we can
start writing unit tests and functions.
Unit tests are there to help define what code is necessary. In general,
first define what a function should do, then write unit tests, then write
code to make these tests work, and stop coding when all tests pass. If a
function needs to do more, first update the definition, then update the
unit tests, then update the code. This makes sure a) that documentation is
always up to date with the code and b) that a change does not break
existing behavior.
For more information about unit testing and how to code using them, please
read chapter 13, 14 and 15 of Dive Into Python [3].
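A small sketch of that workflow, using a made-up function. The name url_name and its behavior are assumptions for this example: the docstring is the definition, the test case encodes it, and the function body is only as large as the tests require.

```python
import unittest


def url_name(title):
    """Return the title with spaces replaced by underscores."""
    return title.replace(" ", "_")


class UrlNameTest(unittest.TestCase):
    """Tests derived directly from the docstring of url_name."""

    def test_spaces_become_underscores(self):
        self.assertEqual(url_name("Main Page"), "Main_Page")

    def test_title_without_spaces_is_unchanged(self):
        self.assertEqual(url_name("Sandbox"), "Sandbox")
```

Saved in a file, this can be run with "python -m unittest" followed by the module name.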
For unit testing, we will need one or more installs of MediaWiki; the SVN
release used at wikipedia, but maybe also older stable versions - if we
want to maintain compatibility.
On a compatibility note; even though we are changing the way the framework
works, it should be possible to create a layer that maps functions in the
old framework to the new framework.
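One possible shape for such a mapping layer is sketched below. The old name urlname comes from the current Page class; the new name url_name and the use of a deprecation warning are assumptions for illustration.

```python
import warnings


class Page:
    """New-style page object with a thin backwards-compatible alias."""

    def __init__(self, title):
        self._title = title

    def url_name(self):
        """New-style name (hypothetical)."""
        return self._title.replace(" ", "_")

    def urlname(self):
        """Old-style name, kept only so existing bots keep working."""
        warnings.warn("urlname() is deprecated; use url_name()",
                      DeprecationWarning, stacklevel=2)
        return self.url_name()
```

Existing bots keep running, and the warning points their authors at the new name.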
Thirdly: point c.
This really is not a separate point, but it is important. I think we
should use the API where possible, but have a fallback to the monobook
html parser.
The API allows neat ways of getting information from many pages at once.
We will need a page generator that can handle this information, and that
will generate page objects according to the information received. This
will mean we need more data on what information we have got already etc.
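A sketch of a generator that asks for many pages per request. The query callable stands in for a real API call, and the batch size of 50 is an arbitrary assumption.

```python
def pages_in_batches(titles, query, batch_size=50):
    """Yield page data for all titles, one query per batch of titles."""
    batch = []
    for title in titles:
        batch.append(title)
        if len(batch) == batch_size:
            yield from query(batch)
            batch = []
    if batch:
        # Do not forget the last, partially filled batch.
        yield from query(batch)
```

Each yielded item could carry the data that is already known for that page, so a page object built from it does not need to fetch the same information again.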
Finally: point d.
This is not a very important point, but it's kinda interesting. With the
new framework, it's easier to restructure existing functions, making
translations easier.
I'm sorry for the long email, and I repeat: I really appreciate your
efforts, but I really think we should address these issues before starting
at - even the most basic - code.
--valhallasw
[1] http://www.python.org/dev/peps/pep-0008/
[2] http://epydoc.sourceforge.net/manual-epytext.html
[3] http://diveintopython.org/unit_testing/index.html
Ok now. As I am a complete lamer when it comes to team coding, coding
standards etc., I'll take your words for it, but have nonetheless several
lame questions.
> The purpose of the rewrite was
> a) to restructure the framework
> b) to have consistent formatting and documentation
> c) to move to the API
> and possibly
> d) to add i18n support
>
> First of all: point b.
> Before any new code is written, we should agree on a fixed coding style.
> The current framework has several coding styles used throughout the
> framework - and even sometimes in one class: in the Page class, we have
> got Page.urlname, Page.isAutoTitle and Page.put_async... - I think we
> should use one style for the entire framework.
If team coding is a democracy then I vote for
camelCaseWithFirstWordLowercase.
It's a strong standard in Java (which is a close cousin to Python) and is
frequent in Python too (take Twisted for an example).
> For the coding style, I propose the following:
> * As base: PEP 8 [1].
Fine. I suggest the most common 1 indentation level = 4 spaces.
> * UTF8 encoding for all files (and using u'' for all strings), with UTF8 BOM
I'll assume gvim does that for me. o_O
> * Docstrings mandatory, using Epydoc [2] for describing parameters and
> return value. Defining the function in this way almost automatically
> defines unit tests for that function.
That's an interesting document - smells of javadoc too. :)
> * One module per class
Java style again? ;)
Anyway, let's make sure I understand a "module" correctly, especially in the
context of unit tests later on.
A module corresponds to one .py file (like http.py or api.py from our
example) and a folder of these (data/) is a package, not a module? Despite
this:
>>> import data
>>> data
<module 'data' from 'data/__init__.py'>
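For what it's worth, a quick check with a standard-library package (json is used here only as an example) suggests that a package is itself a module object, just one that also has a __path__ attribute:

```python
import types
import json            # a package: a directory containing __init__.py
import json.decoder    # a plain module: a single .py file in that package

# Both are module objects, which is why the interpreter prints "<module ...>".
package_is_module = isinstance(json, types.ModuleType)
submodule_is_module = isinstance(json.decoder, types.ModuleType)

# Only packages carry __path__, the list of directories searched for
# their submodules.
package_has_path = hasattr(json, "__path__")
submodule_has_path = hasattr(json.decoder, "__path__")
```

So "one module per class" would most naturally mean one .py file per class, with data/ being the package that groups them.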
> Note that all code should be written thread safe: we want to use
> persistent HTTP connections and this means we will be sharing a connection
> throughout several page objects. If we want to be able to put in a thread,
> we really need to make sure code is thread safe.
If I may suggest, let's leave data.http.HTTP (or its equivalent) as an
object encapsulating a connection and presenting wrappers on GET and POST
methods, but in a non-thread-safe manner and instead resolve this at a
higher level (Site, perhaps) by keeping a pool of connections or something
similar (or maybe write a special HTTPPool handler between Site and HTTP).
That's really a matter of deciding whether the framework handles
multithreading or whether we tell people that each thread should have its
own instance of Site.
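A rough sketch of the pool idea, with hypothetical names. The connection objects themselves stay non-thread-safe; the pool makes sure each one is used by only one thread at a time.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Hand out connections one at a time (hypothetical HTTPPool)."""

    def __init__(self, factory, size=2):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self):
        conn = self._idle.get()      # blocks while all connections are in use
        try:
            yield conn
        finally:
            self._idle.put(conn)     # hand it back even if the caller raised
```

A Site object could then wrap each request in "with pool.connection() as conn:" without the HTTP class needing any locking of its own.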
> Secondly: point a.
> <snip>
> Because adding structure afterwards is much harder, I think we first
> should decide on what modules/classes we want, then defining what
> functions we want and what these functions should do. After that, we can
> start writing unit tests and functions.
Where do we start? Edit http://www.botwiki.sno.cc/wiki/Rewrite and add
classes until someone says enough?
> On a compatibility note; even though we are changing the way the framework
> works, it should be possible to create a layer that maps functions in the
> old framework to the new framework.
Yes, I would assume most methods of current high-level objects (Page, Site,
generators) will not change their interface much.
> thirdly: point c.
Yes, <3 API - I think we established that already. :>
> finally: point d.
> This is not a very important point, but it's kinda interesting. With the
> new framework, it's easier to restructure existing functions, making
> translations easier.
Ok now, can you elaborate on this? I don't think I'm getting the point - we
already have some i18n - bots have localized edit summaries, and the
framework knows the localized #REDIRECT and namespace names as well.
Are you talking about some whole new level of i18n?
> I'm sorry for the long email, and I repeat: I really appreciate your
> efforts, but I really think we should address these issues before starting
> at - even the most basic - code.
Well, it seems I'm a coder, not project designer and it tempts me to stand
back, watch the fireworks and return when we actually need some code. But
instead, I hope to gain some experience in team programming.
Misza