2008/10/25 Bryan Tong Minh bryan.tongminh@gmail.com:
On Sat, Oct 25, 2008 at 10:05 PM, Brion Vibber brion@wikimedia.org wrote:
If you screen-scrape an HTML *user interface form* for your bot, YOU WILL GET BROKEN BY CHANGES, especially if you don't bother to even TRY to behave like an actual client (which would load all the required form variables from the edit page in the first place).
Indeed. If you are using screenscraping, you should at least bother to run a HTML parser over the edit page. You need it anyway to get the content and other tokens. See http://mwclient.svn.sourceforge.net/viewvc/mwclient/trunk/mwclient/page_nowriteapi.py?revision=45&view=markup for an example.
Good regexes save you the (memory) effort of an HTML parser. I used some really long one to give me all the fields necessary, now I use API for edits.
Marco