A lot of automation stopped working; the symptom is that page edits and creations return a "preview" page instead of completing.
I was able to fix edits by adding wpSection:'' (null) to the form data, but page creation still does the same thing.
How do we find out what was changed (this time, and any given time)?
Please, no lectures on using the API for edits; the edit API has only been available for a month, and there is a HUGE amount of software using the GUI form.
Can we please identify what is wrong, or revert it ASAP? It is seriously borked.
Robert
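One low-tech way to answer "what was changed" is to save the edit page's HTML before and after a suspected change and compare the names of the form fields it offers. A minimal sketch using only Python's standard library; the helper names and the two HTML snippets are hypothetical and simplified, not real MediaWiki output:

```python
from html.parser import HTMLParser

# Tags whose name attribute identifies a form control.
CONTROL_TAGS = {"input", "textarea", "select", "button"}


class FieldNameCollector(HTMLParser):
    """Collect the name attribute of every form control on a page."""

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <input ... /> also reach this method,
        # because HTMLParser's default handle_startendtag calls it.
        if tag in CONTROL_TAGS:
            name = dict(attrs).get("name")
            if name:
                self.names.add(name)


def field_names(page):
    collector = FieldNameCollector()
    collector.feed(page)
    collector.close()
    return collector.names


def diff_fields(old_page, new_page):
    """Return (added, removed) field names between two saved copies of a page."""
    old, new = field_names(old_page), field_names(new_page)
    return sorted(new - old), sorted(old - new)


# Illustrative snippets standing in for two saved copies of an edit page.
before = ('<form><input type="hidden" name="wpSection" value="">'
          '<textarea name="wpTextbox1"></textarea></form>')
after = before.replace(
    "</form>", '<input type="hidden" name="wpStarttime" value="20081025043000"></form>')
print(diff_fields(before, after))  # (['wpStarttime'], [])
```

Comparing names only catches added or removed fields; a field whose meaning changed (for example, one that silently became required) will not show up in the diff.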
The reason may be the new requirement of sending non-empty and non-zero (?) wpStarttime and wpEdittime parameters for all edit requests, including new page creations [1].
[1] http://svn.wikimedia.org/viewvc/mediawiki?view=rev&revision=42037
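For a bot that hand-builds its POST body, a quick diagnostic is whether that body carries both fields with usable values. A minimal sketch, assuming the payload is a plain dict of form fields and that "usable" means non-empty and not "0" as described above; the helper names and timestamp values are hypothetical. Copying the values from the edit page the bot just loaded, rather than inventing them, keeps whatever conflict checking the server does with them meaningful:

```python
# Per the change described above, these hidden fields must be present and
# non-empty (and apparently non-zero) on every edit, including page creation.
REQUIRED_TIMESTAMP_FIELDS = ("wpStarttime", "wpEdittime")


def missing_timestamps(payload):
    """Return the required timestamp fields that are absent, empty or "0"."""
    return [name for name in REQUIRED_TIMESTAMP_FIELDS
            if str(payload.get(name, "")).strip() in ("", "0")]


def copy_timestamps(payload, form_fields):
    """Return a copy of payload with the timestamps taken from the edit form."""
    fixed = dict(payload)
    for name in REQUIRED_TIMESTAMP_FIELDS:
        if name in form_fields:
            fixed[name] = form_fields[name]
    return fixed


# A hand-built payload of the kind that started returning a preview page.
bot_payload = {"wpSection": "", "wpTextbox1": "New page text", "wpSummary": "bot edit"}
print(missing_timestamps(bot_payload))  # ['wpStarttime', 'wpEdittime']

# Hypothetical values standing in for what the edit page supplied.
form_fields = {"wpStarttime": "20081025043000", "wpEdittime": "20081025043000"}
print(missing_timestamps(copy_timestamps(bot_payload, form_fields)))  # []
```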
That would do it. It breaks even today's svn version of the python wikipedia framework.
Did anyone even think that this might break something?
I think you'd best revert that for a while; anyone using pybot is going to have to get re-synced, and anyone using any other page loader is going to have to modify their code. That is going to take a while. (Yes, it would be good to convert to the API now that it is available; that will take even longer ;-)
Robert
Robert Ullmann wrote:
> That would do it. It breaks even today's svn version of the python wikipedia framework.
> Did anyone even think that this might break something?
If you screen-scrape an HTML *user interface form* for your bot, YOU WILL GET BROKEN BY CHANGES, especially if you don't bother to even TRY to behave like an actual client (which would load all the required form variables from the edit page in the first place).
This particular change doesn't seem particularly necessary to me; the affected fields appear before the text field in the form, so should never be missing in an early submission from a browser.
I'll revert it for now...
-- brion
On Sat, Oct 25, 2008 at 10:05 PM, Brion Vibber brion@wikimedia.org wrote:
> If you screen-scrape an HTML *user interface form* for your bot, YOU WILL GET BROKEN BY CHANGES, especially if you don't bother to even TRY to behave like an actual client (which would load all the required form variables from the edit page in the first place).
Indeed. If you are using screenscraping, you should at least bother to run a HTML parser over the edit page. You need it anyway to get the content and other tokens. See http://mwclient.svn.sourceforge.net/viewvc/mwclient/trunk/mwclient/page_nowriteapi.py?revision=45&view=markup for an example.
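Behaving "like an actual client" roughly means: parse the edit form, keep every field it supplies, change only the ones you mean to change, and include exactly one submit button, as a browser does. A minimal sketch with Python's standard library, not the mwclient code linked above; the class and function names are hypothetical, the sample form is simplified and made up, and it ignores disabled controls, file uploads and charset handling:

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlencode


class EditFormParser(HTMLParser):
    """Collect the controls of the first <form> on a page (simplified sketch).

    Handles <input>, <textarea> and <select>/<option selected>. Ignores
    disabled controls, file uploads, <option> without a value attribute,
    and character-set issues.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.fields = {}      # name -> value of each non-button control
        self.buttons = {}     # name -> value of each submit button
        self._in_form = False
        self._done = False
        self._textarea = None
        self._select = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
        if not self._in_form:
            return
        name = a.get("name")
        if tag == "input" and name:
            kind = (a.get("type") or "text").lower()
            value = a.get("value")
            if kind == "submit":
                self.buttons[name] = value or ""
            elif kind in ("checkbox", "radio"):
                if "checked" in a:  # unchecked boxes are not submitted
                    self.fields[name] = "on" if value is None else value
            elif kind not in ("file", "image", "reset", "button"):
                self.fields[name] = value or ""
        elif tag == "textarea" and name:
            self._textarea = name
            self.fields[name] = ""
        elif tag == "select" and name:
            self._select = name
        elif tag == "option" and self._select and "selected" in a:
            self.fields[self._select] = a.get("value") or ""

    def handle_data(self, data):
        if self._textarea:
            self.fields[self._textarea] += data

    def handle_endtag(self, tag):
        if tag == "textarea":
            self._textarea = None
        elif tag == "select":
            self._select = None
        elif tag == "form" and self._in_form:
            self._in_form, self._done = False, True


def build_submission(page, overrides, button="wpSave"):
    """Urlencode every field the form supplied, apply overrides, and add
    exactly one submit button, the way a browser would."""
    parser = EditFormParser()
    parser.feed(page)
    parser.close()
    if button not in parser.buttons:
        raise ValueError(f"form has no submit button named {button!r}")
    fields = dict(parser.fields)
    fields.update(overrides)
    fields[button] = parser.buttons[button]
    return urlencode(fields)


# Illustrative, simplified edit form; real pages carry more hidden fields
# (for example wpEditToken), which this approach picks up automatically.
page = """<form id="editform" method="post" action="/w/index.php?title=Example&amp;action=submit">
<input type="hidden" name="wpSection" value="">
<input type="hidden" name="wpStarttime" value="20081025043000">
<input type="hidden" name="wpEdittime" value="20081025043000">
<textarea name="wpTextbox1">Old text</textarea>
<input type="text" name="wpSummary" value="">
<input type="checkbox" name="wpMinoredit" value="1">
<input type="submit" name="wpSave" value="Save page">
<input type="submit" name="wpPreview" value="Show preview">
</form>"""

body = build_submission(page, {"wpTextbox1": "New text", "wpSummary": "bot edit"})
print(parse_qs(body, keep_blank_values=True))
```

Because the parser copies whatever hidden fields the server sends, a new one such as wpStarttime is submitted back without any code change. Only the fields the bot deliberately overrides are hard-coded.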
2008/10/25 Bryan Tong Minh bryan.tongminh@gmail.com:
> Indeed. If you are using screenscraping, you should at least bother to run a HTML parser over the edit page. You need it anyway to get the content and other tokens. See http://mwclient.svn.sourceforge.net/viewvc/mwclient/trunk/mwclient/page_nowriteapi.py?revision=45&view=markup for an example.
Good regexes save you the (memory) effort of an HTML parser. I used a really long one to get all the necessary fields; now I use the API for edits.
Marco
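The regex route can be sketched as follows. This is a deliberately fragile illustration, not Marco's actual expression: it assumes each hidden field is a single `<input>` tag with quoted attributes, so it misses textareas, checked checkboxes, unquoted values, and any value containing ">". Those are exactly the kinds of assumptions the rest of the thread warns about. The function name is hypothetical:

```python
import html
import re

# One <input ...> tag; attributes are parsed separately, so their order does
# not matter. A ">" inside an attribute value would break this match.
INPUT_TAG = re.compile(r"<input\b[^>]*>", re.IGNORECASE)
# name="value" or name='value'; unquoted values are not recognised.
ATTR = re.compile(r"""([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def hidden_fields(page):
    """Return {name: value} for each <input type="hidden"> found by regex."""
    fields = {}
    for tag in INPUT_TAG.findall(page):
        attrs = {key.lower(): html.unescape(dq or sq)
                 for key, dq, sq in ATTR.findall(tag)}
        if attrs.get("type", "").lower() == "hidden" and "name" in attrs:
            fields[attrs["name"]] = attrs.get("value", "")
    return fields


# Mixed quoting and attribute order (illustrative only).
page = ("<input type='hidden' value=\"20081025043000\" name=\"wpStarttime\" />"
        '<input name="wpSection" type="hidden" value="" />'
        '<input type="submit" name="wpSave" value="Save page" />')
print(hidden_fields(page))  # {'wpStarttime': '20081025043000', 'wpSection': ''}
```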
> If you screen-scrape an HTML *user interface form* for your bot, YOU WILL GET BROKEN BY CHANGES, especially if you don't bother to even TRY to behave like an actual client (which would load all the required form variables from the edit page in the first place).
Look, I understand your frustration, but there wasn't an API for people to use. So lots of stuff (LOTS) was built on the GUI.
(And anyone who used query.php got burned in August; took a bit of time for some of those things to get fixed.)
People wrote code to submit forms with the parameters that were required; they didn't parse the whole effing HTML page.
No matter how awful that is, that was all they could do; there was no API. So the reality is that you are going to have to consider the effects of GUI changes on the very large body of existing code out there.
best, Robert
Robert Ullmann wrote:
> Look, I understand your frustration, but there wasn't an API for people to use. So lots of stuff (LOTS) was built on the GUI.
...and every time the forms change in a trivial way that every compliant client would survive just by submitting back the form fields it's given, people who wrote their bots with bad assumptions complain.
> (And anyone who used query.php got burned in August; took a bit of time for some of those things to get fixed.)
Yeah, people only had a year or so to update their software to the new API!
> People wrote code to submit forms with the parameters that were required; they didn't parse the whole effing HTML page.
This is why they break -- bad assumptions in the client code.
-- brion
Brion,
That's all fine. And in the world we would like to live in, people would write compliant clients, and always use a strict HTML parser, and nothing would ever break.
That isn't the real world.
The world you have, that you have to deal with, is that in the absence of the present API until a year ago, and the edit API a month ago, people did whatever they could make work. And there is a huge amount out there. It may be frustrating as all hell, but it is reality.
For myself, I started with a copy of the python stuff about two years ago, fixing all sorts of cruft along the way. If something broke, I went and looked at the current copy (svn update on a different copy, not what I use). I've replaced various bits with API calls, particularly because doing an edit op when all you want is to *read* the page wikitext is silly.
In this case I looked at it, found the missing wpSection parameter easily, and fixed that. Then couldn't find the other. (Misza hadn't checked it in yet ;-). So I'm not complaining about the stuff I'm doing, but I am concerned about the 99.9% of the users that don't know how to just go fix it. A number of them are on the en.wikt and will turn to me for help ...
I'm sorry to say that it is simply a requirement that the GUI be treated as a stabilized API that cannot be changed without checking the effect on client software, and that this state of affairs will continue until the pybot et al are completely converted to the API, and time enough has passed that people have gotten the API version and are using it. This probably means a year or more.
I know you *really* don't like it. I wouldn't either. But the way it is, it is.
with my best regards, Robert
On Sun, Oct 26, 2008 at 8:04 AM, Robert Ullmann rlullmann@gmail.com wrote:
> I'm sorry to say that it is simply a requirement that the GUI be treated as a stabilized API that cannot be changed without checking the effect on client software,
The client software for the GUI being primarily web browsers, which are all capable of understanding a new hidden field in the form because they parse the HTML.
Stephen Bain wrote:
> The client software for the GUI being primarily web browsers, which are all capable of understanding a new hidden field in the form because they parse the HTML.
Of course, these fields were added about 6 years ago. ;)
But we reserve the right to add, change, rename, etc fields at any time. We have never, ever, EVER made any guarantee of stability in the GUI forms.
-- brion
[snip bla bla bla]
The GUI is not a stable API, never was, never will be. As has always been the case, it has, does, and will change according to changing requirements.
If your client breaks due to making bad assumptions, it was, is, and will forever remain your responsibility to fix it.
That's the way it's been for the last 7 years -- this is nothing new. When your broken client's broken assumptions get broken, you need to update your client to a non-broken version.
As for this alleged "it'll take a year for people to upgrade", I call utter and complete BS. The fact that it's not working will pretty quickly clue users into the need to upgrade from the broken code to a non-broken version.
-- brion
On Sat, Oct 25, 2008 at 5:04 PM, Robert Ullmann rlullmann@gmail.com wrote:
> I'm sorry to say that it is simply a requirement that the GUI be treated as a stabilized API that cannot be changed without checking the effect on client software, and that this state of affairs will continue until the pybot et al are completely converted to the API, and time enough has passed that people have gotten the API version and are using it. This probably means a year or more.
The GUI is served in the format of an HTML form. It will always, guaranteed, be submittable as an HTML form. If you attempt to treat it as some chunk of regex-able text instead of an HTML form, your client *will* break, because it *is not* some chunk of regex-able text. Any client that ignores the standards governing the use of HTML forms and doesn't submit all fields is not something we are interested in supporting. If Pywikipediabot tries to submit the form but refuses to do it properly, it may break, and that's their problem for not doing it properly.
Of course, maybe it's inconvenient to have to parse an HTML form. I actually find it hard to believe that there aren't easily usable frameworks for doing this. But regardless, if your problem is that it's inconvenient to do something correctly, the fix you should request is a more convenient way to do it correctly, not a large and obnoxious (to us) change in the established semantics of the existing manner of submission. What you're asking is for us to inconvenience ourselves and give consideration to non-standard and non-endorsed manners of interacting with our software, for your convenience as an author of tools that interact with it, when there already exist standard and endorsed ways for you to accomplish your goals, and we're not going to do that.
If we had changed the semantics of existing required form fields (e.g. renaming "wpTextbox1"), you could complain with some legitimacy, because before very recently there was no possible way for you to write a bot that would be robust against such changes. But if you had written your bots correctly in the first place, there would have been no problem with hidden fields actually becoming required, because in the semantics of HTML you're already required to submit them, and you're ignoring that, and your tools are going to break because of it, and we aren't sympathetic.
For the future, clearly, you want to use the edit API and avoid all these issues.
On Sat, Oct 25, 2008 at 8:08 PM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
[snip]
> For the future, clearly, you want to use the edit API and avoid all these issues.
Exactly. Roan specifically tries to make the API as non-changing as possible, so bots and other scripts can rely on it as a stable interface to the software. Unlike the GUI, which changes very, very regularly (and has been emphasized in many posts above as a poor way to interact except by browsers), the API rarely changes in a breaking way. When it does change in a breaking or non-backwards-compatible way, I know Roan goes to great lengths to announce this so people are aware of the impending change. However, the API developers try to avoid breaking changes whenever possible (the SiteMatrix API module is a notable case: we're _still_ debating whether to change it in a breaking way to make it more sensible; see bug 14955).
-Chad
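With the edit API, the bot sends one POST to the wiki's api.php with action=edit instead of replaying an HTML form. A minimal sketch that only builds the request body (no network I/O), assuming an edit token has already been obtained from the API (the query for that has differed between MediaWiki versions) and that the parameter names used here (title, text, summary, basetimestamp, starttimestamp, createonly, token) match your wiki's action=edit module; the wiki's own api.php help output is the authority. The function name and token value are hypothetical:

```python
from urllib.parse import parse_qs, urlencode


def edit_request_body(title, text, token, summary="",
                      basetimestamp=None, starttimestamp=None, create_only=False):
    """Build the urlencoded POST body for a MediaWiki API action=edit call.

    This only builds the body; sending it (to the wiki's api.php, with the
    bot's login cookies) is left to whatever HTTP client you already use.
    """
    params = {
        "action": "edit",
        "format": "json",
        "title": title,
        "text": text,
        "summary": summary,
    }
    if basetimestamp:
        # Timestamp of the revision the new text is based on (edit-conflict check).
        params["basetimestamp"] = basetimestamp
    if starttimestamp:
        # When this edit session began (also used for conflict detection).
        params["starttimestamp"] = starttimestamp
    if create_only:
        # Refuse to overwrite the page if it already exists.
        params["createonly"] = "1"
    # The token goes last, so a request truncated in transit lacks it and fails.
    params["token"] = token
    return urlencode(params)


body = edit_request_body("Sandbox", "Hello, world", token="0123abcd+\\",
                         summary="bot test", create_only=True)
fields = parse_qs(body)
print(fields["action"], fields["createonly"])  # ['edit'] ['1']
```

The two optional timestamps play the same role for the API that wpEdittime and wpStarttime play in the HTML form, but they are documented parameters rather than hidden fields that may appear or change without notice.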
Aryeh Gregor wrote:
> Of course, maybe it's inconvenient to have to parse an HTML form. I actually find it hard to believe that there aren't easily usable frameworks for doing this.
I don't know about Python, but for Perl at least HTML::Form will do it just fine (as long as you're careful with charsets; it doesn't handle those quite automatically). Or use WWW::Mechanize, which is a higher-level interface to that and a bunch of other modules.
http://search.cpan.org/dist/WWW-Mechanize/
Actually, a few minutes of Googling suggest it has been ported to Python:
http://wwwsearch.sourceforge.net/mechanize/
Facts, in no particular order:
# Breakages caused by GUI changes are usually fixed in PYWP trunk within a couple of hours when reported properly. I'm not advocating screen-scraping (no one does), but GUI changes are not such a *huge* problem. In fact, they force users to *update* regularly to the latest trunk version, which could even be seen as a blessing.
# Russell Blau is doing a wonderful job on a PYWP branch. He's rewriting the primitives to use the API. Most of the basic functions have been rewritten, and a bunch of unit tests ship with them: only the "famous" big scripts (aka interwiki.py) haven't been transferred to the API... yet. He needs hands, and his last call for help [1] did not bring any new volunteers.
# Some websites are still not using the API and yet do want to use robots. Sigh. They will have to make a choice sooner or later.
[1] http://lists.wikimedia.org/pipermail/pywikipedia-l/2008-October/004407.html
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Sat, Oct 25, 2008 at 9:34 PM, Robert Ullmann rlullmann@gmail.com wrote:
> That would do it. It breaks even today's svn version of the python wikipedia framework.
It was already fixed 8 hours ago by Misza.
Heh, didn't take a year for bots on .de to update to submit baseRevId either.
--------------------------------------------------
From: "Brion Vibber" brion@wikimedia.org
Sent: Saturday, October 25, 2008 7:10 PM
To: "Wikimedia developers" wikitech-l@lists.wikimedia.org
Subject: Re: [Wikitech-l] what changed at 4:30 UTC this morning?