Hi all!
In my opinion, MediaWiki’s PHPCS ruleset feels largely rooted in an older version of PHP, where static type declarations (formerly known as “type hints”) did not exist. As we move towards more modern code, I think some rules should be relaxed, and others adjusted. More specifically, I’d like to know if most people agree with the following propositions and conclusion:
Proposition 1: *Some code is sufficiently documented by names and types*, and does not require additional documentation. Cases where additional documentation is required do certainly exist, but they can only be identified by human reviewers, not by automated tools.
You can see this in our existing code wherever a doc comment specifies only a type (with @var, @param, or @return), but no additional text. For example, in CreditsAction https://gerrit.wikimedia.org/g/mediawiki/core/+/de752f45af/includes/actions/CreditsAction.php, nobody needs to be told that the LinkRenderer will be used to render links, or that the UserFactory creates User objects:
class CreditsAction extends FormlessAction {
/** @var LinkRenderer */
private $linkRenderer;
/** @var UserFactory */ private $userFactory;
Likewise, it’s not necessary to explain in great detail that the string returned by LinksTable::getTableName() https://gerrit.wikimedia.org/g/mediawiki/core/+/de752f45af/includes/deferred/LinksUpdate/LinksTable.php#175 is the table name, that the $actor parameter of ActorCache::remove( UserIdentity $actor ) https://gerrit.wikimedia.org/g/mediawiki/core/+/de752f45af/includes/user/ActorCache.php#103 represents the actor to remove from the cache, or what the meaning of the Message $m and returned MessageValue are in Message\Converter::convertMessage() https://gerrit.wikimedia.org/g/mediawiki/core/+/de752f45af/includes/Message/Converter.php#48 :
/**
* Convert a Message to a MessageValue
* @param Message $m
* @return MessageValue
*/ public function convertMessage( Message $m ) {
(I want to clarify that in this last example I’m only talking about the @param and @return tags that already don’t have any prose text. While the method comment “Convert a Message to a MessageValue” might also be considered redundant, I think this would be more contentious, and I’m not advocating for removing that today.)
Proposition 2: *Adding types as static types is generally preferable.* Unlike doc comments, static types are checked at runtime and thus guaranteed to be correct (as long as the code runs at all); the small runtime cost should be partially offset by performance improvements in newer PHP versions, and otherwise considered to be worth it. New code should generally include static types where possible, and existing code may have static types added as part of other work on it. I believe this describes our current development practice as MediaWiki developers.
Note that some older MediaWiki classes are considered unsuitable for static types, and should only be used in comments; this is expected to help in a future migration away from these classes (see T240307#6191788 https://phabricator.wikimedia.org/T240307#6191788).
Proposition 3: *Where types can be losslessly represented as PHP static types, types in doc comments are unnecessary.* When doc comments are considered necessary for actual documentation beyond types, then the type is generally still included (and Phan will check that it matches the static type), but when no further documentation is needed (see proposition 1 above), then the @var, @param, etc. doc comment can be omitted.
Note that depending on the PHP version, not all types can be losslessly represented as PHP static types yet (e.g. union types and mixed both need to wait for PHP 8.0, null and false for PHP 8.2); in such cases, doc comments can remain necessary.
Conclusion: *We should update our PHPCS ruleset to require fewer doc comments.* Exact rules are probably to be decided, depending on how much work we’re willing to put into the sniff implementations (e.g. is it feasible to require /** @param */ doc comments only if a parameter has no static type?), but generally, I argue that we want code such as the following to be allowed by our standard PHPCS ruleset:
class CreditsAction extends FormlessAction {
private LinkRenderer $linkRenderer;
private UserFactory $userFactory;
/** Convert a Message to a MessageValue */
public function convertMessage( Message $m ): MessageValue {
When doc comments are still necessary or at least beneficial because the type alone isn’t enough information, it’s up to humans to decide this while writing the code or point it out during code review.
What do people think about this? :)
PS: In PHP 8, we could abbreviate some of this code even more using constructor property promotion:
class CreditsAction extends FormlessAction {
public function __construct(
Page $page,
IContextSource $context,
private LinkRenderer $linkRenderer,
private UserFactory $userFactory
) {
parent::__construct( $page, $context );
}
(Again, I’m not saying that all code should look like this – but I think we have plenty of existing code that effectively carries no additional information in its documentation, and which could be converted into this form without losing anything.)
Cheers, Lucas
pe 28. lokak. 2022 klo 17.04 Lucas Werkmeister ( lucas.werkmeister@wikimedia.de) kirjoitti:
In my opinion, MediaWiki’s PHPCS ruleset feels largely rooted in an older version of PHP, where static type declarations (formerly known as “type hints”) did not exist. As we move towards more modern code, I think some rules should be relaxed, and others adjusted. More specifically, I’d like to know if most people agree with the following propositions and conclusion:
I support relaxing the phpcs rules by default. We have already disabled some of these rules for new code in the Translate extension (ref https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/Translate/+/refs/heads/master/.phpcs.xml#27) with the same reasoning you gave.
-Niklas
100% agree. Comments that do nothing but repeat what the code already says serve no purpose¹. I regularly remove these but can't when it means PHPCS starts complaining.
Typical example:
/** * @param User $user * @param array $rights * @return bool */ public function isAllowed( User $user, array $rights ): bool {
A comment like this does not even qualify as "documentation", I would argue. It does not contain any information. It's just a copy of the code.
¹ Ok, it can have a purpose: It marks the code as "this is all we have to say", similar to @inheritDoc. And it's easier to add actual documentation when there is already some scaffolding. On the other hand, all IDEs can auto-generate this when it's needed.
I wanted to update the relevant PHPCS rules for a long time but never managed to. The idea is relatively simple: When everything in a method signature does have a type (all parameters as well as a return type) then no sniff should complain about "missing" documentation. Same for class properties.
Later we could even _remove_ comments that don't have any additional information.
Best Thiemo
I agree, a lot of the doc comments become redundant when types are specified in type hints.
Note that doc comments for properties are actually not required, as far as I can tell – you could replace:
/** @var LinkRenderer */ private $linkRenderer;
…with:
private LinkRenderer $linkRenderer;
…today!, and no lint checks are going to stop you. I've been suggesting this for new code since we moved to PHP 7.4, which allows these type hints on properties.
We have already disabled some of these rules for new code in the Translate extension (ref https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/Translate/+/refs/heads/master/.phpcs.xml#27 )
Interesting – we do something similar in some Wikibase codebases (ref https://gerrit.wikimedia.org/g/mediawiki/extensions/Wikibase/+/0a3b07a195/.phpcs.xml#8), but I didn’t know there were other extensions doing the same thing.
Note that doc comments for properties are actually not required, as far as
I can tell
I think that’s specific to MediaWiki core (ref https://gerrit.wikimedia.org/g/mediawiki/core/+/de752f45af/.phpcs.xml#8), and possibly other extensions that disable the MediaWiki.Commenting.PropertyDocumentation.MissingDocumentation(Public|Protected|Private) sniff – out of the box, I believe mediawiki-codesniffer requires property documentation at the moment.
Am Fr., 28. Okt. 2022 um 17:51 Uhr schrieb Bartosz Dziewoński < matma.rex@gmail.com>:
I agree, a lot of the doc comments become redundant when types are specified in type hints.
Note that doc comments for properties are actually not required, as far as I can tell – you could replace:
/** @var LinkRenderer */ private $linkRenderer;
…with:
private LinkRenderer $linkRenderer;
…today!, and no lint checks are going to stop you. I've been suggesting this for new code since we moved to PHP 7.4, which allows these type hints on properties.
-- Bartosz Dziewoński _______________________________________________ Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
Thank you for suggesting this! I agree that type declaration is preferable to type documentation, and that type documentation is often redundant if type declaration is present.
However, we can't always use type declarations. For instance, union types are quite useful, since PHP doesn't have method overloading. And union type declarations will only become available in PHP 8. So we'll have a mix of declared and un-declared parameters and fields for a while. I think we should still require type documentation if there is no type declaration - and of course, if a method has any @param tags, it needs to have all of them.
Also there is the notable exception of the array type. Saying that something is an array is generally insufficient, we should say at least whether it's a list or an associative array, and document the type of the array elements or and/or well-known keys.
And we should be careful that we don't end up discouraging documentation of the meaning of a parameter. The barrier to adding some explanation of the meaning of a parameter is lower if there is already a @param string $name line. If I'd first have to create a doc block, I may just not add the documentation at all. We should still encourage having doc blocks in all but the most trivial cases (simple constructors, getters and setters probably don't need one).
-- daniel
PS: I'm not sure I like constructor argument property promotion... For very simple value objects that might be nice, but generally, I fear it will make it harder to see all fields declared on an object.
Am 28.10.2022 um 16:03 schrieb Lucas Werkmeister:
Hi all!
In my opinion, MediaWiki’s PHPCS ruleset feels largely rooted in an older version of PHP, where static type declarations (formerly known as “type hints”) did not exist. As we move towards more modern code, I think some rules should be relaxed, and others adjusted. More specifically, I’d like to know if most people agree with the following propositions and conclusion:
Proposition 1: */Some/ code is sufficiently documented by names and types*, and does not require additional documentation. Cases where additional documentation is required do certainly exist, but they can only be identified by human reviewers, not by automated tools.
You can see this in our existing code wherever a doc comment specifies only a type (with @var, @param, or @return), but no additional text. For example, in CreditsAction https://gerrit.wikimedia.org/g/mediawiki/core/+/de752f45af/includes/actions/CreditsAction.php, nobody needs to be told that the LinkRenderer will be used to render links, or that the UserFactory creates User objects:
class CreditsAction extends FormlessAction {
/** @var LinkRenderer */
private $linkRenderer;
/** @var UserFactory */
private $userFactory;
Likewise, it’s not necessary to explain in great detail that the string returned by LinksTable::getTableName() https://gerrit.wikimedia.org/g/mediawiki/core/+/de752f45af/includes/deferred/LinksUpdate/LinksTable.php#175 is the table name, that the $actor parameter of ActorCache::remove( UserIdentity $actor ) https://gerrit.wikimedia.org/g/mediawiki/core/+/de752f45af/includes/user/ActorCache.php#103 represents the actor to remove from the cache, or what the meaning of the Message $m and returned MessageValue are in Message\Converter::convertMessage() https://gerrit.wikimedia.org/g/mediawiki/core/+/de752f45af/includes/Message/Converter.php#48:
/**
Convert a Message to a MessageValue
@param Message $m
@return MessageValue
*/
public function convertMessage( Message $m ) {
(I want to clarify that in this last example I’m only talking about the @param and @return tags that already don’t have any prose text. While the method comment “Convert a Message to a MessageValue” might also be considered redundant, I think this would be more contentious, and I’m not advocating for removing that today.)
Proposition 2: *Adding types as static types is generally preferable.* Unlike doc comments, static types are checked at runtime and thus guaranteed to be correct (as long as the code runs at all); the small runtime cost should be partially offset by performance improvements in newer PHP versions, and otherwise considered to be worth it. New code should generally include static types where possible, and existing code may have static types added as part of other work on it. I believe this describes our current development practice as MediaWiki developers.
Note that some older MediaWiki classes are considered unsuitable for static types, and should only be used in comments; this is expected to help in a future migration away from these classes (see T240307#6191788 https://phabricator.wikimedia.org/T240307#6191788).
Proposition 3: *Where types can be losslessly represented as PHP static types, types in doc comments are unnecessary.* When doc comments are considered necessary for actual documentation beyond types, then the type is generally still included (and Phan will check that it matches the static type), but when no further documentation is needed (see proposition 1 above), then the @var, @param, etc. doc comment can be omitted.
Note that depending on the PHP version, not all types can be losslessly represented as PHP static types yet (e.g. union types and mixed both need to wait for PHP 8.0, null and false for PHP 8.2); in such cases, doc comments can remain necessary.
Conclusion: *We should update our PHPCS ruleset to require fewer doc comments.* Exact rules are probably to be decided, depending on how much work we’re willing to put into the sniff implementations (e.g. is it feasible to require /** @param */ doc comments only if a parameter has no static type?), but generally, I argue that we want code such as the following to be allowed by our standard PHPCS ruleset:
class CreditsAction extends FormlessAction {
private LinkRenderer $linkRenderer;
private UserFactory $userFactory;
/** Convert a Message to a MessageValue */
public function convertMessage( Message $m ): MessageValue {
When doc comments are still necessary or at least beneficial because the type alone isn’t enough information, it’s up to humans to decide this while writing the code or point it out during code review.
What do people think about this? :)
PS: In PHP 8, we could abbreviate some of this code even more using constructor property promotion:
class CreditsAction extends FormlessAction {
public function __construct(
Page $page,
IContextSource $context,
private LinkRenderer $linkRenderer,
private UserFactory $userFactory
) {
parent::__construct( $page, $context );
}
(Again, I’m not saying that all code should look like this – but I think we have plenty of existing code that effectively carries no additional information in its documentation, and which could be converted into this form without losing anything.)
Cheers, Lucas
-- Lucas Werkmeister (he/er) Software Engineer
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin Phone: +49 (0)30-577 11 62-0 https://wikimedia.de
Imagine a world in which every single human being can freely share in the sum of all knowledge. Help us to achieve our vision! https://spenden.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
Wikitech-l mailing list --wikitech-l@lists.wikimedia.org To unsubscribe send an email towikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
Hi, A great initiative, thank you!
I am generally in favor of this proposal but just want to give a cautionary tale. It's a bit off-topic but important. Given that there is no actual enforcing mechanism for the documentation typehints, some of them have actually drifted from reality. I caused a UBN bug once by relying on the documentation for the type of a variable. So my request is to avoid mass migration of documentation type hints to php type declaration.
Best
Am So., 30. Okt. 2022 um 14:02 Uhr schrieb Daniel Kinzler < dkinzler@wikimedia.org>:
Thank you for suggesting this! I agree that type declaration is preferable to type documentation, and that type documentation is often redundant if type declaration is present.
However, we can't always use type declarations. For instance, union types are quite useful, since PHP doesn't have method overloading. And union type declarations will only become available in PHP 8. So we'll have a mix of declared and un-declared parameters and fields for a while. I think we should still require type documentation if there is no type declaration - and of course, if a method has any @param tags, it needs to have all of them.
Also there is the notable exception of the array type. Saying that something is an array is generally insufficient, we should say at least whether it's a list or an associative array, and document the type of the array elements or and/or well-known keys. And we should be careful that we don't end up discouraging documentation of the meaning of a parameter. The barrier to adding some explanation of the meaning of a parameter is lower if there is already a @param string $name line. If I'd first have to create a doc block, I may just not add the documentation at all. We should still encourage having doc blocks in all but the most trivial cases (simple constructors, getters and setters probably don't need one).
-- daniel
PS: I'm not sure I like constructor argument property promotion... For very simple value objects that might be nice, but generally, I fear it will make it harder to see all fields declared on an object. Am 28.10.2022 um 16:03 schrieb Lucas Werkmeister:
Hi all!
In my opinion, MediaWiki’s PHPCS ruleset feels largely rooted in an older version of PHP, where static type declarations (formerly known as “type hints”) did not exist. As we move towards more modern code, I think some rules should be relaxed, and others adjusted. More specifically, I’d like to know if most people agree with the following propositions and conclusion:
Proposition 1: *Some code is sufficiently documented by names and types*, and does not require additional documentation. Cases where additional documentation is required do certainly exist, but they can only be identified by human reviewers, not by automated tools.
You can see this in our existing code wherever a doc comment specifies only a type (with @var, @param, or @return), but no additional text. For example, in CreditsAction https://gerrit.wikimedia.org/g/mediawiki/core/+/de752f45af/includes/actions/CreditsAction.php, nobody needs to be told that the LinkRenderer will be used to render links, or that the UserFactory creates User objects:
class CreditsAction extends FormlessAction {
/** @var LinkRenderer */
private $linkRenderer;
/** @var UserFactory */ private $userFactory;
Likewise, it’s not necessary to explain in great detail that the string returned by LinksTable::getTableName() https://gerrit.wikimedia.org/g/mediawiki/core/+/de752f45af/includes/deferred/LinksUpdate/LinksTable.php#175 is the table name, that the $actor parameter of ActorCache::remove( UserIdentity $actor ) https://gerrit.wikimedia.org/g/mediawiki/core/+/de752f45af/includes/user/ActorCache.php#103 represents the actor to remove from the cache, or what the meaning of the Message $m and returned MessageValue are in Message\Converter::convertMessage() https://gerrit.wikimedia.org/g/mediawiki/core/+/de752f45af/includes/Message/Converter.php#48 :
/**
Convert a Message to a MessageValue
@param Message $m
@return MessageValue
*/ public function convertMessage( Message $m ) {
(I want to clarify that in this last example I’m only talking about the @param and @return tags that already don’t have any prose text. While the method comment “Convert a Message to a MessageValue” might also be considered redundant, I think this would be more contentious, and I’m not advocating for removing that today.)
Proposition 2: *Adding types as static types is generally preferable.* Unlike doc comments, static types are checked at runtime and thus guaranteed to be correct (as long as the code runs at all); the small runtime cost should be partially offset by performance improvements in newer PHP versions, and otherwise considered to be worth it. New code should generally include static types where possible, and existing code may have static types added as part of other work on it. I believe this describes our current development practice as MediaWiki developers.
Note that some older MediaWiki classes are considered unsuitable for static types, and should only be used in comments; this is expected to help in a future migration away from these classes (see T240307#6191788 https://phabricator.wikimedia.org/T240307#6191788).
Proposition 3: *Where types can be losslessly represented as PHP static types, types in doc comments are unnecessary.* When doc comments are considered necessary for actual documentation beyond types, then the type is generally still included (and Phan will check that it matches the static type), but when no further documentation is needed (see proposition 1 above), then the @var, @param, etc. doc comment can be omitted.
Note that depending on the PHP version, not all types can be losslessly represented as PHP static types yet (e.g. union types and mixed both need to wait for PHP 8.0, null and false for PHP 8.2); in such cases, doc comments can remain necessary.
Conclusion: *We should update our PHPCS ruleset to require fewer doc comments.* Exact rules are probably to be decided, depending on how much work we’re willing to put into the sniff implementations (e.g. is it feasible to require /** @param */ doc comments only if a parameter has no static type?), but generally, I argue that we want code such as the following to be allowed by our standard PHPCS ruleset:
class CreditsAction extends FormlessAction {
private LinkRenderer $linkRenderer;
private UserFactory $userFactory;
/** Convert a Message to a MessageValue */
public function convertMessage( Message $m ): MessageValue {
When doc comments are still necessary or at least beneficial because the type alone isn’t enough information, it’s up to humans to decide this while writing the code or point it out during code review.
What do people think about this? :)
PS: In PHP 8, we could abbreviate some of this code even more using constructor property promotion:
class CreditsAction extends FormlessAction {
public function __construct(
Page $page, IContextSource $context, private LinkRenderer $linkRenderer, private UserFactory $userFactory
) {
parent::__construct( $page, $context );
}
(Again, I’m not saying that all code should look like this – but I think we have plenty of existing code that effectively carries no additional information in its documentation, and which could be converted into this form without losing anything.)
Cheers, Lucas
-- Lucas Werkmeister (he/er) Software Engineer
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin Phone: +49 (0)30-577 11 62-0 https://wikimedia.de
Imagine a world in which every single human being can freely share in the sum of all knowledge. Help us to achieve our vision! https://spenden.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.orghttps://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
-- Daniel Kinzler Principal Software Engineer, Platform Engineering Wikimedia Foundation
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
+1 to moving towards more type hints
And just to expand on Amir's comment, giving a type hint for return values is usually safe because you can see exactly what types are possible, but parameter type hints can be dangerous unless you look at all usages of the function. Nulls can be especially surprising, passing a null into a "String" parameter for example will throw a fatal TypeError, the hint would need to be "?String" to accept null and distinguishing optional parameters isn't always done in the phpdoc comments.
-Adam
On 10/30/22 19:10, Amir Sarabadani wrote:
Hi, A great initiative, thank you!
I am generally in favor of this proposal but just want to give a cautionary tale. It's a bit off-topic but important. Given that there is no actual enforcing mechanism for the documentation typehints, some of them have actually drifted from reality. I caused a UBN bug once by relying on the documentation for the type of a variable. So my request is to avoid mass migration of documentation type hints to php type declaration.
Best
Am So., 30. Okt. 2022 um 14:02 Uhr schrieb Daniel Kinzler dkinzler@wikimedia.org:
Thank you for suggesting this! I agree that type declaration is preferable to type documentation, and that type documentation is often redundant if type declaration is present. However, we can't always use type declarations. For instance, union types are quite useful, since PHP doesn't have method overloading. And union type declarations will only become available in PHP 8. So we'll have a mix of declared and un-declared parameters and fields for a while. I think we should still require type documentation if there is no type declaration - and of course, if a method has any @param tags, it needs to have all of them. Also there is the notable exception of the array type. Saying that something is an array is generally insufficient, we should say at least whether it's a list or an associative array, and document the type of the array elements or and/or well-known keys. And we should be careful that we don't end up discouraging documentation of the meaning of a parameter. The barrier to adding some explanation of the meaning of a parameter is lower if there is already a @param string $name line. If I'd first have to create a doc block, I may just not add the documentation at all. We should still encourage having doc blocks in all but the most trivial cases (simple constructors, getters and setters probably don't need one). -- daniel PS: I'm not sure I like constructor argument property promotion... For very simple value objects that might be nice, but generally, I fear it will make it harder to see all fields declared on an object. Am 28.10.2022 um 16:03 schrieb Lucas Werkmeister:
Hi all! In my opinion, MediaWiki’s PHPCS ruleset feels largely rooted in an older version of PHP, where static type declarations (formerly known as “type hints”) did not exist. As we move towards more modern code, I think some rules should be relaxed, and others adjusted. More specifically, I’d like to know if most people agree with the following propositions and conclusion: Proposition 1: */Some/ code is sufficiently documented by names and types*, and does not require additional documentation. Cases where additional documentation is required do certainly exist, but they can only be identified by human reviewers, not by automated tools. You can see this in our existing code wherever a doc comment specifies only a type (with @var, @param, or @return), but no additional text. For example, in CreditsAction <https://gerrit.wikimedia.org/g/mediawiki/core/+/de752f45af/includes/actions/CreditsAction.php>, nobody needs to be told that the LinkRenderer will be used to render links, or that the UserFactory creates User objects: class CreditsAction extends FormlessAction { /** @var LinkRenderer */ private $linkRenderer; /** @var UserFactory */ private $userFactory; Likewise, it’s not necessary to explain in great detail that the string returned by LinksTable::getTableName() <https://gerrit.wikimedia.org/g/mediawiki/core/+/de752f45af/includes/deferred/LinksUpdate/LinksTable.php#175> is the table name, that the $actor parameter of ActorCache::remove( UserIdentity $actor ) <https://gerrit.wikimedia.org/g/mediawiki/core/+/de752f45af/includes/user/ActorCache.php#103> represents the actor to remove from the cache, or what the meaning of the Message $m and returned MessageValue are in Message\Converter::convertMessage() <https://gerrit.wikimedia.org/g/mediawiki/core/+/de752f45af/includes/Message/Converter.php#48>: /** * Convert a Message to a MessageValue * @param Message $m * @return MessageValue */ public function convertMessage( Message $m ) { (I want to clarify that in this last example I’m only talking about the @param and @return tags that already don’t have any prose text. While the method comment “Convert a Message to a MessageValue” might also be considered redundant, I think this would be more contentious, and I’m not advocating for removing that today.) Proposition 2: *Adding types as static types is generally preferable.* Unlike doc comments, static types are checked at runtime and thus guaranteed to be correct (as long as the code runs at all); the small runtime cost should be partially offset by performance improvements in newer PHP versions, and otherwise considered to be worth it. New code should generally include static types where possible, and existing code may have static types added as part of other work on it. I believe this describes our current development practice as MediaWiki developers. Note that some older MediaWiki classes are considered unsuitable for static types, and should only be used in comments; this is expected to help in a future migration away from these classes (see T240307#6191788 <https://phabricator.wikimedia.org/T240307#6191788>). Proposition 3: *Where types can be losslessly represented as PHP static types, types in doc comments are unnecessary.* When doc comments are considered necessary for actual documentation beyond types, then the type is generally still included (and Phan will check that it matches the static type), but when no further documentation is needed (see proposition 1 above), then the @var, @param, etc. doc comment can be omitted. Note that depending on the PHP version, not all types can be losslessly represented as PHP static types yet (e.g. union types and mixed both need to wait for PHP 8.0, null and false for PHP 8.2); in such cases, doc comments can remain necessary. Conclusion: *We should update our PHPCS ruleset to require fewer doc comments.* Exact rules are probably to be decided, depending on how much work we’re willing to put into the sniff implementations (e.g. is it feasible to require /** @param */ doc comments only if a parameter has no static type?), but generally, I argue that we want code such as the following to be allowed by our standard PHPCS ruleset: class CreditsAction extends FormlessAction { private LinkRenderer $linkRenderer; private UserFactory $userFactory; /** Convert a Message to a MessageValue */ public function convertMessage( Message $m ): MessageValue { When doc comments are still necessary or at least beneficial because the type alone isn’t enough information, it’s up to humans to decide this while writing the code or point it out during code review. What do people think about this? :) PS: In PHP 8, we could abbreviate some of this code even more using constructor property promotion: class CreditsAction extends FormlessAction { public function __construct( Page $page, IContextSource $context, private LinkRenderer $linkRenderer, private UserFactory $userFactory ) { parent::__construct( $page, $context ); } (Again, I’m not saying that all code should look like this – but I think we have plenty of existing code that effectively carries no additional information in its documentation, and which could be converted into this form without losing anything.) Cheers, Lucas -- Lucas Werkmeister (he/er) Software Engineer Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin Phone: +49 (0)30-577 11 62-0 https://wikimedia.de Imagine a world in which every single human being can freely share in the sum of all knowledge. Help us to achieve our vision! https://spenden.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207. _______________________________________________ Wikitech-l mailing list --wikitech-l@lists.wikimedia.org To unsubscribe send an email towikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
-- Daniel Kinzler Principal Software Engineer, Platform Engineering Wikimedia Foundation _______________________________________________ Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
-- Amir (he/him)
Wikitech-l mailing list --wikitech-l@lists.wikimedia.org To unsubscribe send an email towikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
Daniel Kinzler dkinzler@wikimedia.org writes:
Also there is the notable exception of the array type. Saying that something is an array is generally insufficient, we should say at least whether it’s a list or an associative array, and document the type of the array elements or and/or well-known keys.
+1, there is also \stdClass which we use in a few places. I believe IDEs will attempt to create placeholders for all parameters as well, if you start documenting a single one.
And we should be careful that we don’t end up discouraging documentation of the meaning of a parameter. The barrier to adding some explanation of the meaning of a parameter is lower if there is already a @param string $name line. If I’d first have to create a doc block, I may just not add the documentation at all. We should still encourage having doc blocks in all but the most trivial cases (simple constructors, getters and setters probably don’t need one).
I agree with this as well.
Kosta
– daniel
PS: I’m not sure I like constructor argument property promotion… For very simple value objects that might be nice, but generally, I fear it will make it harder to see all fields declared on an object.
Am 28.10.2022 um 16:03 schrieb Lucas Werkmeister:
Hi all!
In my opinion, MediaWiki’s PHPCS ruleset feels largely rooted in an older version of PHP, where static type declarations (formerly known as “type hints”) did not exist. As we move towards more modern code, I think some rules should be relaxed, and others adjusted. More specifically, I’d like to know if most people agree with the following propositions and conclusion:
Proposition 1: Some code is sufficiently documented by names and types, and does not require additional documentation. Cases where additional documentation is required do certainly exist, but they can only be identified by human reviewers, not by automated tools.
You can see this in our existing code wherever a doc comment specifies only a type (with @var, @param, or @return), but no additional text. For example, in CreditsAction, nobody needs to be told that the LinkRenderer will be used to render links, or that the UserFactory creates User objects:
class CreditsAction extends FormlessAction {
/** @var LinkRenderer */
private $linkRenderer;
/** @var UserFactory */
private $userFactory;
Likewise, it’s not necessary to explain in great detail that the string returned by LinksTable::getTableName() is the table name, that the $actor parameter of ActorCache::remove( UserIdentity $actor ) represents the actor to remove from the cache, or what the meaning of the Message $m and returned MessageValue are in Message\Converter::convertMessage():
/**
Convert a Message to a MessageValue
@param Message $m
@return MessageValue
*/
public function convertMessage( Message $m ) {
(I want to clarify that in this last example I’m only talking about the @param and @return tags that already don’t have any prose text. While the method comment “Convert a Message to a MessageValue” might also be considered redundant, I think this would be more contentious, and I’m not advocating for removing that today.)
Proposition 2: Adding types as static types is generally preferable. Unlike doc comments, static types are checked at runtime and thus guaranteed to be correct (as long as the code runs at all); the small runtime cost should be partially offset by performance improvements in newer PHP versions, and otherwise considered to be worth it. New code should generally include static types where possible, and existing code may have static types added as part of other work on it. I believe this describes our current development practice as MediaWiki developers.
Note that some older MediaWiki classes are considered unsuitable for static types, and should only be used in comments; this is expected to help in a future migration away from these classes (see T240307#6191788).
Proposition 3: Where types can be losslessly represented as PHP static types, types in doc comments are unnecessary. When doc comments are considered necessary for actual documentation beyond types, then the type is generally still included (and Phan will check that it matches the static type), but when no further documentation is needed (see proposition 1 above), then the @var, @param, etc. doc comment can be omitted.
Note that depending on the PHP version, not all types can be losslessly represented as PHP static types yet (e.g. union types and mixed both need to wait for PHP 8.0, null and false for PHP 8.2); in such cases, doc comments can remain necessary.
Conclusion: We should update our PHPCS ruleset to require fewer doc comments. Exact rules are probably to be decided, depending on how much work we’re willing to put into the sniff implementations (e.g. is it feasible to require /** @param */ doc comments only if a parameter has no static type?), but generally, I argue that we want code such as the following to be allowed by our standard PHPCS ruleset:
class CreditsAction extends FormlessAction {
private LinkRenderer $linkRenderer;
private UserFactory $userFactory;
/** Convert a Message to a MessageValue */
public function convertMessage( Message $m ): MessageValue {
When doc comments are still necessary or at least beneficial because the type alone isn’t enough information, it’s up to humans to decide this while writing the code or point it out during code review.
What do people think about this? :)
PS: In PHP 8, we could abbreviate some of this code even more using constructor property promotion:
class CreditsAction extends FormlessAction {
public function __construct(
Page $page, IContextSource $context, private LinkRenderer $linkRenderer, private UserFactory $userFactory
) {
parent::__construct( $page, $context );
}
(Again, I’m not saying that all code should look like this – but I think we have plenty of existing code that effectively carries no additional information in its documentation, and which could be converted into this form without losing anything.)
Cheers, Lucas
– Lucas Werkmeister (he/er) Software Engineer
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin Phone: +49 (0)30-577 11 62-0 https://wikimedia.de
Imagine a world in which every single human being can freely share in the sum of all knowledge. Help us to achieve our vision! https://spenden.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
Wikitech-l mailing list – wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
On 29/10/22 01:03, Lucas Werkmeister wrote:
Proposition 2: *Adding types as static types is generally preferable.* Unlike doc comments, static types are checked at runtime and thus guaranteed to be correct (as long as the code runs at all); the small runtime cost should be partially offset by performance improvements in newer PHP versions, and otherwise considered to be worth it. New code should generally include static types where possible, and existing code may have static types added as part of other work on it. I believe this describes our current development practice as MediaWiki developers.
I generally don't add return type declarations to methods that I add, and I have pushed back on CR requests to add them, except when the exception is reachable, i.e. a TypeError may actually be thrown if the class is misused by an external caller. The reasons for this are clutter and performance.
Clutter, because it's redundant to add a return type declaration when the return type is already in the doc comment. If we stop requiring doc comments as you propose, then fine, add a return type declaration to methods with no doc comment. But if there is a doc comment, an additional return type declaration just pads out the file for no reason.
The performance impact is measurable for hot functions. In gerrit 820244 https://gerrit.wikimedia.org/r/c/mediawiki/core/+/820244 I removed parameter type declarations from a private method for a benchmark improvement of 2%.
I would prefer to return the performance benefits of newer PHP versions to our users, rather than to fully consume them ourselves by increasing abstraction.
-- Tim Starling
Am 10.11.2022 um 03:08 schrieb Tim Starling:
Clutter, because it's redundant to add a return type declaration when the return type is already in the doc comment. If we stop requiring doc comments as you propose, then fine, add a return type declaration to methods with no doc comment. But if there is a doc comment, an additional return type declaration just pads out the file for no reason.
I agree that we shouldn't have redundant doc tags and return type declarations. I would suggest that all methods should have a return type declaration, but should not have a @return doc tag unless there is additional info to be provided. For example, when the declared return type may is array, we may still want to document @return string[] or @return array<int,PageIdentity>.
The performance impact is measurable for hot functions. In gerrit 820244 https://gerrit.wikimedia.org/r/c/mediawiki/core/+/820244 I removed parameter type declarations from a private method for a benchmark improvement of 2%.
This raises an interesting issue, one that has bitten me before: How do we know that a given method is "hot"? Maybe we should establish a @hot or @performance tag to indicate that a given method should be optimized for speed. Is it possible to make phpcs smart enough that it would apply different rules if a method is marked as @hot? In that case, perhaps the use of type hints for parameters and the return type should be discouraged rather than encouraged.
On Tue, 15 Nov 2022, at 11:41, Daniel Kinzler wrote:
Am 10.11.2022 um 03:08 schrieb Tim Starling:
Clutter, because it's redundant to add a return type declaration when the return type is already in the doc comment. If we stop requiring doc comments as you propose, then fine, add a return type declaration to methods with no doc comment. But if there is a doc comment, an additional return type declaration just pads out the file for no reason.
I agree that we shouldn't have redundant doc tags and return type declarations. I would suggest that all methods should have a return type declaration, but should not have a @return doc tag unless there is additional info […]
The performance impact is measurable for hot functions. In gerrit 820244 https://gerrit.wikimedia.org/r/c/mediawiki/core/+/820244 I removed parameter type declarations from a private method for a benchmark improvement of 2%.
This raises an interesting issue, one that has bitten me before: How do we know that a given method is "hot"? Maybe we should establish a @hot or @performance tag to indicate that a given method should be optimized for speed. […]
I think the enforced and automated codesniffer could remain fairly simple: As today, the sniff encourages all methods to have parameter and return types documented in a way that humans, Phan, and IDEs can understand for static analysis to avoid and catch mistakes.
What I propose we change is that instead of enforcing this solely through a mandatory doc comment, enforce it by requiring at least one of them to be present. Either parameters and returns are typed, or a doc block exists. Both may exist, of course.
We've established in this email thread that it can be cluttering (and waste of effort) to require repeating of information when doing so adds no value. It is also my understanding that Phan and IDEs already understand either and both so we don't need them to be aware of which "should" exist.
Is there value in enforcing removal of existing doc blocks after someone has written it? This seems to me like potentially a significant time sink with no return on that other because we enforced it as a new rule. If we agree there is no urgency in removing existing doc blocks or actively blocking CI when someone choose to write a doc block, then afaik we do not need new annotations like "hot" or "performance" or some other tag to surpress warnings about doc blocks.
I do think it is important to preserve author intent when it comes to performance optimisations. However these are by no means limited to this new notion of saving native type overhead. There are all sorts of code optimisations. I believe we typically document these through an inline comment like "Optimization: ..." next to the code in question, in which the need for optimisation and sometimes (if non-obvious) how that optimisation is achieved, are mentioend. That should suffice I think in preserving the use case and e.g. prevent someone from re-introducing typing where it was previously removed for perf reasons.
In other words: Codesniffer helps us avoid unknown types (in docblock and/or native type), and inline comments remind us about past performance optimisations. Do we need more? If so, what is the benefit/usecase for more? What do we risk if we don't?
-- Timo
What Timo writes sounds good to me. I don’t think we need to enforce removal of existing code blocks – that can happen gradually as the code is touched for other reasons instead.
Regarding performance, I think we should limit micro-optimizations like omitting static types to code that we know is hot; for most of our code, I think the benefit of avoiding bugs that can be caught or prevented by static types is more important. We already have some hot code that uses a more performance-oriented code style (e.g. RemexHtml “sometimes use[s] direct member access instead of going through accessors, and manually inline[s] some performance-sensitive code”), but we don’t use that code style everywhere.
Aside: I would also hope that PHP will eventually learn some optimizations that are taken for granted in many other languages, such as inlining setters/getters and other short methods, or (more relevant to this thread) eliminating runtime type checks when they can be statically proven. But I definitely don’t have the skills to push that forward, so I won’t complain about it too much.
I’ll try to find some time soon-ish to look into MediaWiki CodeSniffer and see if some improvements can be implemented without too much trouble.
Am Mi., 16. Nov. 2022 um 01:00 Uhr schrieb Krinkle krinkle@fastmail.com:
On Tue, 15 Nov 2022, at 11:41, Daniel Kinzler wrote:
Am 10.11.2022 um 03:08 schrieb Tim Starling:
Clutter, because it's redundant to add a return type declaration when the return type is already in the doc comment. If we stop requiring doc comments as you propose, then fine, add a return type declaration to methods with no doc comment. But if there is a doc comment, an additional return type declaration just pads out the file for no reason.
I agree that we shouldn't have redundant doc tags and return type declarations. I would suggest that all methods should have a return type declaration, but should not have a @return doc tag unless there is additional info […]
The performance impact is measurable for hot functions. In gerrit 820244 https://gerrit.wikimedia.org/r/c/mediawiki/core/+/820244 I removed parameter type declarations from a private method for a benchmark improvement of 2%.
This raises an interesting issue, one that has bitten me before: How do we know that a given method is "hot"? Maybe we should establish a @hot or @performance tag to indicate that a given method should be optimized for speed. […]
I think the enforced and automated codesniffer could remain fairly simple: As today, the sniff encourages all methods to have parameter and return types documented in a way that humans, Phan, and IDEs can understand for static analysis to avoid and catch mistakes.
What I propose we change is that instead of enforcing this solely through a mandatory doc comment, enforce it by requiring at least one of them to be present. Either parameters and returns are typed, or a doc block exists. Both may exist, of course.
We've established in this email thread that it can be cluttering (and waste of effort) to require repeating of information when doing so adds no value. It is also my understanding that Phan and IDEs already understand either and both so we don't need them to be aware of which "should" exist.
Is there value in enforcing removal of existing doc blocks after someone has written it? This seems to me like potentially a significant time sink with no return on that other because we enforced it as a new rule. If we agree there is no urgency in removing existing doc blocks or actively blocking CI when someone choose to write a doc block, then afaik we do not need new annotations like "hot" or "performance" or some other tag to surpress warnings about doc blocks.
I do think it is important to preserve author intent when it comes to performance optimisations. However these are by no means limited to this new notion of saving native type overhead. There are all sorts of code optimisations. I believe we typically document these through an inline comment like "Optimization: ..." next to the code in question, in which the need for optimisation and sometimes (if non-obvious) how that optimisation is achieved, are mentioend. That should suffice I think in preserving the use case and e.g. prevent someone from re-introducing typing where it was previously removed for perf reasons.
In other words: Codesniffer helps us avoid unknown types (in docblock and/or native type), and inline comments remind us about past performance optimisations. Do we need more? If so, what is the benefit/usecase for more? What do we risk if we don't?
-- Timo
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
I uploaded some Gerrit changes for the code sniffer: chain starts at change 865650 https://gerrit.wikimedia.org/r/c/mediawiki/tools/codesniffer/+/865650. Reviews welcome!
With those changes combined, *fully* typed functions (all parameters + return type) and typed properties are allowed (but not required, of course) to have no doc comment, or to have doc comments without @param / @return. However, partial @param are not allowed, since I think that would be too confusing – if some parameter can’t have a static type yet,¹ then the phpdoc has to list all the params (with their types).
¹: incidentally, PHP 8.2 was released today https://www.php.net/archive/2022.php#2022-12-08-1 (h/t Reedy), with support for the types true, false, and null.
Am Mi., 16. Nov. 2022 um 17:17 Uhr schrieb Lucas Werkmeister < lucas.werkmeister@wikimedia.de>:
What Timo writes sounds good to me. I don’t think we need to enforce removal of existing code blocks – that can happen gradually as the code is touched for other reasons instead.
Regarding performance, I think we should limit micro-optimizations like omitting static types to code that we know is hot; for most of our code, I think the benefit of avoiding bugs that can be caught or prevented by static types is more important. We already have some hot code that uses a more performance-oriented code style (e.g. RemexHtml “sometimes use[s] direct member access instead of going through accessors, and manually inline[s] some performance-sensitive code”), but we don’t use that code style everywhere.
Aside: I would also hope that PHP will eventually learn some optimizations that are taken for granted in many other languages, such as inlining setters/getters and other short methods, or (more relevant to this thread) eliminating runtime type checks when they can be statically proven. But I definitely don’t have the skills to push that forward, so I won’t complain about it too much.
I’ll try to find some time soon-ish to look into MediaWiki CodeSniffer and see if some improvements can be implemented without too much trouble.
Am Mi., 16. Nov. 2022 um 01:00 Uhr schrieb Krinkle krinkle@fastmail.com:
On Tue, 15 Nov 2022, at 11:41, Daniel Kinzler wrote:
Am 10.11.2022 um 03:08 schrieb Tim Starling:
Clutter, because it's redundant to add a return type declaration when the return type is already in the doc comment. If we stop requiring doc comments as you propose, then fine, add a return type declaration to methods with no doc comment. But if there is a doc comment, an additional return type declaration just pads out the file for no reason.
I agree that we shouldn't have redundant doc tags and return type declarations. I would suggest that all methods should have a return type declaration, but should not have a @return doc tag unless there is additional info […]
The performance impact is measurable for hot functions. In gerrit 820244 https://gerrit.wikimedia.org/r/c/mediawiki/core/+/820244 I removed parameter type declarations from a private method for a benchmark improvement of 2%.
This raises an interesting issue, one that has bitten me before: How do we know that a given method is "hot"? Maybe we should establish a @hot or @performance tag to indicate that a given method should be optimized for speed. […]
I think the enforced and automated codesniffer could remain fairly simple: As today, the sniff encourages all methods to have parameter and return types documented in a way that humans, Phan, and IDEs can understand for static analysis to avoid and catch mistakes.
What I propose we change is that instead of enforcing this solely through a mandatory doc comment, enforce it by requiring at least one of them to be present. Either parameters and returns are typed, or a doc block exists. Both may exist, of course.
We've established in this email thread that it can be cluttering (and waste of effort) to require repeating of information when doing so adds no value. It is also my understanding that Phan and IDEs already understand either and both so we don't need them to be aware of which "should" exist.
Is there value in enforcing removal of existing doc blocks after someone has written it? This seems to me like potentially a significant time sink with no return on that other because we enforced it as a new rule. If we agree there is no urgency in removing existing doc blocks or actively blocking CI when someone choose to write a doc block, then afaik we do not need new annotations like "hot" or "performance" or some other tag to surpress warnings about doc blocks.
I do think it is important to preserve author intent when it comes to performance optimisations. However these are by no means limited to this new notion of saving native type overhead. There are all sorts of code optimisations. I believe we typically document these through an inline comment like "Optimization: ..." next to the code in question, in which the need for optimisation and sometimes (if non-obvious) how that optimisation is achieved, are mentioend. That should suffice I think in preserving the use case and e.g. prevent someone from re-introducing typing where it was previously removed for perf reasons.
In other words: Codesniffer helps us avoid unknown types (in docblock and/or native type), and inline comments remind us about past performance optimisations. Do we need more? If so, what is the benefit/usecase for more? What do we risk if we don't?
-- Timo
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-leave@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
-- Lucas Werkmeister (he/er) Software Engineer
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin Phone: +49 (0)30-577 11 62-0 https://wikimedia.de
Imagine a world in which every single human being can freely share in the sum of all knowledge. Help us to achieve our vision! https://spenden.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
wikitech-l@lists.wikimedia.org