Hi DanB,
Comments inline:
On Thu, Jan 12, 2012 at 8:31 AM, Daniel Barrett danb@vistaprint.com wrote:
As MediaWiki 1.19 is getting ready, I'd like to offer information on how MediaWiki 1.18.0 was the most difficult MW upgrade I've ever been through.
Some background: my team administers an internal wiki at a major company with ~2000 users, over 100 extensions (many of them custom/unreleased), and 100K articles. I've been upgrading MW regularly since 1.11 - every release and patch - and have never had this much trouble before, mainly because of extensions that broke in 1.18. The typical MW upgrade takes me a day or two including regression-testing our extensions. But 1.18 has taken me weeks and I'm still not done.
Ugh...sorry to hear that.
This message is meant to be constructive & helpful, not blameful: it's
quite possible that every issue was "our fault" for not keeping up on exactly which functions & globals were being deprecated, etc. I'd just like to describe what kinds of things broke for a reasonably active wiki run by well-meaning people, and to document how we fixed them.
This is very helpful, thank you.
I understand your frustration on a lot of these points, and I hope we can do better in future releases. A lot of the problems you point out here are issues where we broke backwards compatibility without really good reason to do so. It's a tough balance, because we also want to reduce our technical debt, but I think we're probably too haphazard in our approach to nuking and modifying interfaces.
There's a few things that folks like yourself can do to avoid these surprises:
1. Look through your logs for deprecation warnings now and when you get 1.18 fully running.
2. Start testing 1.19 (trunk) *now* rather than waiting for the release. You may be able to catch a gratuitous interface change while there's still time to revert it, saving yourself the trouble of updating your code and saving others from going through the breakage you're experiencing now.
3. Release the source code to your extension, either directly on our site, or on github/gitorious/wherever in anticipation of being able to mirror your work in our shiny new Git repo. Our devs are generally pretty good about updating extensions that are checked into our repository.
4. If you can't release the source for whatever reason, help write unit tests for the APIs that matter to you, so that you can track when they break or are changed.
5. If you don't have time to help write unit tests, help identify those APIs you'd like to see have unit tests. I don't know if we have a central place to collect "most wanted unit tests", but I'm sure something like that could be started if you're interested in participating at that level.
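As a rough illustration of the first suggestion, the sketch below tallies deprecation warnings in a chunk of log text so the noisiest ones surface first. The function name countDeprecations, the timestamp format, and the sample log lines are all made up for this example; they are assumptions, not MediaWiki's actual log format, so the regular expressions would need adjusting to whatever your logs really contain.

```javascript
// Hypothetical sketch: count deprecation warnings per distinct message.
// The log format handled here is an assumption, not MediaWiki's real format.
function countDeprecations(logText) {
  const counts = new Map();
  for (const line of logText.split('\n')) {
    // Keep only lines that mention "deprecated" (case-insensitive).
    if (!/deprecated/i.test(line)) continue;
    // Drop a leading "YYYY-MM-DD HH:MM:SS " timestamp if one is present.
    const message = line
      .replace(/^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\s+/, '')
      .trim();
    counts.set(message, (counts.get(message) || 0) + 1);
  }
  // Return [message, count] pairs, most frequent first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Invented sample input, for illustration only.
const sampleLog = [
  '2012-01-12 08:31:00 Use of Xml::hidden is deprecated',
  '2012-01-12 08:31:05 Some unrelated notice',
  '2012-01-12 08:32:10 Use of Xml::hidden is deprecated',
  '2012-01-12 08:33:00 Use of $wgMessageCache->addMessage is deprecated',
].join('\n');

console.log(countDeprecations(sampleLog));
```

Running something like this against the logs before and after an upgrade gives a short list of the calls worth fixing first.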
I vaguely remember some of the changes you outline below, and I think some of them even stung us during the 1.18 deploy. I'm interested in understanding better why these changes were made.
More inline:
1. The global variable $action disappeared, breaking a bunch of our
extensions. I switched to $wgRequest->getVal('action').
I'll assume Chad is correct that this was never intended to be a stable global.
2. The removal of Xml::hidden() caused one of our extensions to
break. I switched to Xml::input(..., array('type' => 'hidden'))
This one bit us during the 1.18 release cycle, and it looks like we fixed it for ourselves: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/97784
...but forgot to also put it back in 1.18 (and trunk for that matter). Aaron made that fix in the middle of the rather hectic 1.18 deployment cycle, so I can see why we missed it, but it's still a shame. Given that this will probably also bite us in 1.19, we should probably backport to trunk, and REL1_18 for those people that haven't upgraded yet.
3. A few of our older extensions were not ported to ResourceLoader yet
and were adding JS and CSS via $wgOut->add... calls. They worked in 1.17 and all broke in 1.18. I ported them to use ResourceLoader, but this is not a good solution yet because of bug 31676 (the 32-stylesheet limit of IE, https://bugzilla.wikimedia.org/show_bug.cgi?id=31676) which IMHO is a very serious time-bomb waiting to explode. I hope it makes it into "1.19wmf deployment" as planned.
Is this all versions of IE?
4. Some of our parser tag extensions had a bug, in that they didn't
return a value in the tag callback. (These tags had no visual display.) This didn't cause problems in 1.17 and earlier, but in 1.18.0 it caused a UNIQ.....QINU string to render on the page. I fixed our extensions to return the empty string, and the problem went away.
Yup, it's going to be difficult for us to make MediaWiki releases bug-for-bug compatible on extensions.
5. The removal of $wgMessageCache->addMessage() broke many extensions,
some ours and some from mediawiki.org like SimpleForms. Some fixes just required use of the i18n file. Our more difficult issue was that we were injecting system messages into articles to add tracking categories. On advice from this list (thanks!), we used code patterned after Parser::addTrackingCategory() to inject categories and it works fine, actually much better than what we had.
I see the change here: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/81027
...but it doesn't look like there was much in the way of mailing list discussion about this, and I also don't see that the README was updated when this change was made.
Chad claims this was on a clear path to deprecation, but I'm not so sure it was, based on the history of this page: http://www.mediawiki.org/wiki/Manual:$wgMessageCache
It looks like it was marked for deprecation here: http://www.mediawiki.org/wiki/Special:Code/MediaWiki/52503
...which means it was deprecated starting in 1.16. That's old, but not ancient history. Ubuntu 10.04 ships with MediaWiki 1.15, and that's the current LTS version of Ubuntu. While I understand all of the arguments for downloading our tarball rather than using the installed package, it's reasonable to expect that someone would have the same version of MediaWiki that ships with their LTS box, regardless of where they got it. There may be a better litmus than this, but this one seems to be a good compromise between what seems to be the expectation around these parts ("1.16? that's o-o-o-ld") and other more conservative, less predictable distros (e.g. RHEL and Debian) that are commonly used in production environments.
This bug appears to be one consequence of this: https://bugzilla.wikimedia.org/show_bug.cgi?id=32962
I'm not saying that we can never break backwards compatibility with whatever reasonable litmus we choose, but that we should do so much more reluctantly than we currently do it. There should be a tangible, compelling feature (or complicated bug fix) that results from the breakage, not merely cleanup.
6. The removal of ts_makeSortable() from wikibits.js threw off a bunch
of our JavaScript: we were using the function to sort on a different column than the first one on render, and in extensions that create tables within dialogs. We left the problem unfixed until I can understand the new jQuery UI way of doing things (jquery.ui.sortable.js).
Yup, we ended up getting hit with some table sorting bugs, too, some of which may not be solved on our wikis.
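One way to notice a removal like ts_makeSortable() before it reaches production is a small smoke check that lists which of the global functions your scripts depend on are no longer defined. The sketch below is only an illustration: findMissingFunctions and fakeWindow are hypothetical names, and fakeWindow merely stands in for the browser's window object on an upgraded test wiki.

```javascript
// Return the names in `required` that are not functions on `scope`.
function findMissingFunctions(scope, required) {
  return required.filter(function (name) {
    return typeof scope[name] !== 'function';
  });
}

// Hypothetical stand-in for `window` after an upgrade: addHandler is
// still defined here, ts_makeSortable is not.
const fakeWindow = {
  addHandler: function () {},
};

const missing = findMissingFunctions(fakeWindow, ['addHandler', 'ts_makeSortable']);
console.log(missing);
```

In a real browser you would pass window instead of fakeWindow and keep the list of required names next to the extension that uses them.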
7. Nearly 100% of our customizations to WikiEditor 1.17 broke in
1.18. We had followed the documented rules on mediawiki.org, using extensions, ResourceLoader, etc., and everything worked in 1.17. Nevertheless in 1.18, toolbars and menus disappeared in IE. Menus appeared multiple times instead of once in Firefox. JavaScript objects in one module became undefined in others, even with proper dependencies. Some of these issues are still not worked out, but most were fixed by a variety of changes.
The dust is probably still settling on ResourceLoader in many ways, so it's not too surprising that there are problems here. It may be that there were some changes that could have been postponed or done in a more backwards-compatible way, but without getting into the details, I can't say that confidently.
The fact that these are problems in a specific browser is indicative of problems that may just be tough to avoid. We've found it's hard enough making sure core functionality works between releases in a cross-browser way, let alone trying to make sure arbitrary developer modifications also work.
8. Our MediaWiki:common.js stopped running on the login page. I
realize this was a security fix; it just took me by surprise. Fixed by writing a custom extension using the hook UserLoginForm to inject the few lines of JS we needed, and I'm evaluating other non-JS solutions for more security.
Yeah, those are always going to be ugly.
9. The addHandler() function in JavaScript does not seem to work in
IE8 anymore. We worked around this by using jQuery's "bind" function.
Is there a bug for this problem?
At this point, our test wiki is stable and I am not anticipating any
further large issues, so we should roll out in the next two weeks or so.
Glad to hear this is still on track!
Thanks for reading, and I hope this helps someone,
Very helpful, and I hope this results in a good conversation.
Rob
I think we should reconsider how we deal with backward compatibility. I am one of the people who believe that software should be kept backward compatible as long as doing so isn't technically causing trouble. It isn't hard for a dev to flag a variable or function as deprecated for some reason and then completely drop support for it, but that should not automatically become the standard for every function for which a better alternative can be written. Instead, we should consider rewriting the body of the old code to use the new code, so that the deprecated code keeps working. I don't think it's a good idea to just drop support for all software that uses functions or variables we have just flagged as "deprecated", especially when it comes to parseable output such as the API, XML, JSON, etc. I noticed that the latest release of semantic wiki changed the behavior of its JSON output, which probably broke a lot of tools.
Frequently dropping support for old software is generally bad programmer behavior, even though it makes a dev's life easier. (I know myself that keeping support for old software is annoying when we could concentrate on cool new software and just forget we ever had an old version.) Nevertheless, it has probably become the standard for most of the open source software I know, apart from Linux, which still has very good support for old versions of software, drivers, etc.
Maybe we should consider keeping old extensions, tools, browsers and the like working even with the newest version of the software we work on (like MediaWiki), and think more about the design of our output so that we don't need to change it in the future (and, if we do need to change it, keep it possible to retrieve the deprecated output style as well).
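To make the "rewrite the body of the old code to use the new code" idea concrete, here is a minimal sketch of a deprecation wrapper: the old entry point keeps working by delegating to the new one and warns only once. Everything named here (deprecate, makeHiddenField, the old name hidden) is hypothetical and chosen for illustration; it is not MediaWiki code.

```javascript
// Wrap newFn so that callers of the old name keep working.
// The first call emits a single warning through the supplied `warn` callback.
function deprecate(oldName, newName, newFn, warn) {
  let warned = false;
  return function (...args) {
    if (!warned) {
      warned = true;
      warn(oldName + ' is deprecated; use ' + newName + ' instead');
    }
    // Delegate to the replacement, preserving `this` and all arguments.
    return newFn.apply(this, args);
  };
}

// Hypothetical replacement API: describes a hidden form field.
function makeHiddenField(name, value) {
  return { type: 'hidden', name: name, value: value };
}

// Collect warnings in an array here; a real setup might write to a log.
const warnings = [];
const hidden = deprecate('hidden', 'makeHiddenField', makeHiddenField, function (msg) {
  warnings.push(msg);
});

const first = hidden('token', 'abc');
const second = hidden('token', 'xyz');
console.log(first, second, warnings);
```

The old name costs one small wrapper to keep, and the warning still tells callers where to migrate.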
Rob, thanks for your comprehensive and thoughtful reply.
... bug 31676 (the 32-stylesheet limit of IE, https://bugzilla.wikimedia.org/show_bug.cgi?id=31676) which IMHO is a very serious time-bomb waiting to explode. I hope it makes it into "1.19wmf deployment" as planned.
Is this all versions of IE?
Yes indeed.
The addHandler() function in JavaScript does not seem to work in
IE8 anymore. We worked around this by using jQuery's "bind" function.
Is there a bug for this problem?
I haven't had time to produce a minimal test case to show that it happens, just a huge blob of code, so I haven't filed a bug report. I did mention it in this list and got the advice to use ready() (sorry, I wrote "bind" above):
http://lists.wikimedia.org/pipermail/wikitech-l/2011-December/057166.html
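A minimal test case for the event-binding problem could start from something like the sketch below, which picks an event API by feature detection and can be exercised with plain objects instead of a real browser. This is an assumption-laden illustration: bindEvent and the two fake elements are hypothetical, and the code is neither the wikibits.js addHandler() implementation nor what jQuery does internally.

```javascript
// Bind a handler using whichever event API the element exposes.
// Returns the name of the API used, which makes the choice easy to test.
function bindEvent(element, type, handler) {
  if (element.addEventListener) {
    // Standard DOM event model.
    element.addEventListener(type, handler, false);
    return 'addEventListener';
  }
  if (element.attachEvent) {
    // Legacy IE event model; event names carry an "on" prefix.
    element.attachEvent('on' + type, handler);
    return 'attachEvent';
  }
  throw new Error('No supported event API on element');
}

// Fake elements that record how they were called (stand-ins for DOM nodes).
const calls = [];
const modernElement = {
  addEventListener: function (type) { calls.push(['addEventListener', type]); },
};
const legacyElement = {
  attachEvent: function (type) { calls.push(['attachEvent', type]); },
};

bindEvent(modernElement, 'click', function () {});
bindEvent(legacyElement, 'click', function () {});
console.log(calls);
```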
DanB