RFC: Refactoring the Title object

List overview All Threads
Download

newer

older

Re: [Wikitech-l] [Engineering]...

Wikimedia Bugzilla Guided Bug...

Rob Lanphier

10 Oct 2013 10 Oct '13

10:10 p.m.

Hi folks,

I think Daniel buried the lede here (see his mail below), so I'm mailing this out with a subject line that will hopefully provoke more discussion. :-)

This is an RFC we originally conceived at the Hong Kong Wikimania architecture discussion. The notes from that are here: https://www.mediawiki.org/wiki/Architecture_guidelines/Meetings/Wikimania_20...

There has been discussion on this RFC here, so far only between Tim and Daniel: https://www.mediawiki.org/wiki/Talk:Requests_for_comment/TitleValue

It would be good to get some other voices in that conversation.

Rob ---------- Forwarded message ---------- From: Daniel Kinzler daniel@brightbyte.de Date: Tue, Oct 1, 2013 at 10:18 AM Subject: [Wikitech-l] RFC: TitleValue To: Wikimedia developers wikitech-l@lists.wikimedia.org

[Re-posting, since my original post apparently never got through. Maybe I posted from the wrong email account.]

Hi all!

As discussed at the MediaWiki Architecture session at Wikimania, I have created an RFC for the TitleValue class, which could be used to replace the heavy-weight Title class in many places. The idea is to show case the advantages (and difficulties) of using true "value objects" as opposed to "active records". The idea being that hair should not know how to cut itself.

You can find the proposal here: https://www.mediawiki.org/wiki/Requests_for_comment/TitleValue

Any feedback would be greatly appreciated.

-- daniel

PS: I have included the some parts of the proposal below, to give a quick impression.

------------------------------------------------------------------------------

== Motivation ==

The old Title class is huge and has many dependencies. It relies on global state for things like namespace resolution and permission checks. It requires a database connection for caching.

This makes it hard to use Title objects in a different context, such as unit tests. Which in turn makes it quite difficult to write any clean unit tests (not using any global state) for MediaWiki since Title objects are required as parameters by many classes.

In a more fundamental sense, the fact that Title has so many dependencies, and everything that uses a Title object inherits all of these dependencies, means that the MediaWiki codebase as a whole has highly "tangled" dependencies, and it is very hard to use individual classes separately.

Instead of trying to refactor and redefine the Title class, this proposal suggest to introduce an alternative class that can be used instead of Title object to represent the title of a wiki page. The implementation of the old Title class should be changed to rely on the new code where possible, but its interface and behavior should not change.

== Architecture ==

The proposed architecture consists of three parts, initially:

# The TitleValue class itself. As a value object, this has no knowledge about namespaces, permissions, etc. It does not support normalization either, since that would require knowledge about the local configuration.

# A TitleParser service that has configuration knowledge about namespaces and normalization rules. Any class that needs to turn a string into a TitleValue should require a TitleParser service as a constructor argument (dependency injection). Should that not be possible, a TitleParser can be obtained from a global registry.

# A TitleFormatter service that has configuration knowledge about namespaces and normalization rules. Any class that needs to turn a TitleValue into a string should require a TitleFormatter service as a constructor argument (dependency injection). Should that not be possible, a TitleFormatter can be obtained from a global registry.

_______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Show replies by date

Tim Starling

24 Oct 24 Oct

8:09 a.m.

I'm inclined to accept this RFC, except perhaps with minor changes. It would be nice if I could get some comments on it from someone other than me or Daniel, but if nobody else has anything to say, I will just accept it.

-- Tim Starling

On 11/10/13 03:40, Rob Lanphier wrote:

...

Hi folks,

I think Daniel buried the lede here (see his mail below), so I'm mailing this out with a subject line that will hopefully provoke more discussion. :-)

This is an RFC we originally conceived at the Hong Kong Wikimania architecture discussion. The notes from that are here: https://www.mediawiki.org/wiki/Architecture_guidelines/Meetings/Wikimania_20...

There has been discussion on this RFC here, so far only between Tim and Daniel: https://www.mediawiki.org/wiki/Talk:Requests_for_comment/TitleValue

It would be good to get some other voices in that conversation.

Rob ---------- Forwarded message ---------- From: Daniel Kinzler daniel@brightbyte.de Date: Tue, Oct 1, 2013 at 10:18 AM Subject: [Wikitech-l] RFC: TitleValue To: Wikimedia developers wikitech-l@lists.wikimedia.org

[Re-posting, since my original post apparently never got through. Maybe I posted from the wrong email account.]

Hi all!

As discussed at the MediaWiki Architecture session at Wikimania, I have created an RFC for the TitleValue class, which could be used to replace the heavy-weight Title class in many places. The idea is to show case the advantages (and difficulties) of using true "value objects" as opposed to "active records". The idea being that hair should not know how to cut itself.

You can find the proposal here: https://www.mediawiki.org/wiki/Requests_for_comment/TitleValue

Any feedback would be greatly appreciated.

-- daniel

PS: I have included the some parts of the proposal below, to give a quick impression.

== Motivation ==

The old Title class is huge and has many dependencies. It relies on global state for things like namespace resolution and permission checks. It requires a database connection for caching.

This makes it hard to use Title objects in a different context, such as unit tests. Which in turn makes it quite difficult to write any clean unit tests (not using any global state) for MediaWiki since Title objects are required as parameters by many classes.

In a more fundamental sense, the fact that Title has so many dependencies, and everything that uses a Title object inherits all of these dependencies, means that the MediaWiki codebase as a whole has highly "tangled" dependencies, and it is very hard to use individual classes separately.

Instead of trying to refactor and redefine the Title class, this proposal suggest to introduce an alternative class that can be used instead of Title object to represent the title of a wiki page. The implementation of the old Title class should be changed to rely on the new code where possible, but its interface and behavior should not change.

== Architecture ==

The proposed architecture consists of three parts, initially:

# The TitleValue class itself. As a value object, this has no knowledge about namespaces, permissions, etc. It does not support normalization either, since that would require knowledge about the local configuration.

# A TitleParser service that has configuration knowledge about namespaces and normalization rules. Any class that needs to turn a string into a TitleValue should require a TitleParser service as a constructor argument (dependency injection). Should that not be possible, a TitleParser can be obtained from a global registry.

# A TitleFormatter service that has configuration knowledge about namespaces and normalization rules. Any class that needs to turn a TitleValue into a string should require a TitleFormatter service as a constructor argument (dependency injection). Should that not be possible, a TitleFormatter can be obtained from a global registry.

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Chad

8:24 a.m.

On Wed, Oct 23, 2013 at 7:39 PM, Tim Starling tstarling@wikimedia.orgwrote:

...

I'm inclined to accept this RFC, except perhaps with minor changes. It would be nice if I could get some comments on it from someone other than me or Daniel, but if nobody else has anything to say, I will just accept it.

I've personally not been convinced of this Value/Parser/Formatter pattern but if I'm the only one I'll just keep my big mouth shut.

-Chad

Tyler Romeo

9:14 a.m.

On Wed, Oct 23, 2013 at 10:54 PM, Chad innocentkiller@gmail.com wrote:

...

I've personally not been convinced of this Value/Parser/Formatter pattern but if I'm the only one I'll just keep my big mouth shut

I agree with this as well. The idea behind this RFC is the "hair can't cut itself" pattern. However, a value object needs to be easily serializable. So what representation is used for serializing a TitleValue? It can't be the display title or DB key since that's part of the TitleFormatter class.

*-- * *Tyler Romeo* Stevens Institute of Technology, Class of 2016 Major in Computer Science

Chad

9:34 a.m.

On Wed, Oct 23, 2013 at 8:44 PM, Tyler Romeo tylerromeo@gmail.com wrote:

...

On Wed, Oct 23, 2013 at 10:54 PM, Chad innocentkiller@gmail.com wrote:

...
I've personally not been convinced of this Value/Parser/Formatter pattern but if I'm the only one I'll just keep my big mouth shut

I agree with this as well. The idea behind this RFC is the "hair can't cut itself" pattern.

I wish my hair could cut itself ;-)

...

However, a value object needs to be easily serializable. So what representation is used for serializing a TitleValue? It can't be the display title or DB key since that's part of the TitleFormatter class.

I would assume something like the prefixed text form. As long as the TitleValue knows its namespace and title you can guess the rest.

-Chad

Tyler Romeo

9:55 a.m.

On Thu, Oct 24, 2013 at 12:04 AM, Chad innocentkiller@gmail.com wrote:

...

I would assume something like the prefixed text form. As long as the TitleValue knows its namespace and title you can guess the rest.

Maybe, but the RFC says, "As a value object, this has no knowledge about namespaces, permissions, etc.". I think there comes a point when you have to acknowledge that some properties of Title objects are indeed part of the value object.

*-- * *Tyler Romeo* Stevens Institute of Technology, Class of 2016 Major in Computer Science

Chad

10:09 a.m.

On Wed, Oct 23, 2013 at 9:25 PM, Tyler Romeo tylerromeo@gmail.com wrote:

...

On Thu, Oct 24, 2013 at 12:04 AM, Chad innocentkiller@gmail.com wrote:

...
I would assume something like the prefixed text form. As long as the TitleValue knows its namespace and title you can guess the rest.

Maybe, but the RFC says, "As a value object, this has no knowledge about namespaces, permissions, etc.". I think there comes a point when you have to acknowledge that some properties of Title objects are indeed part of the value object.

I read that as namespaces other than the one you're obviously referring to with a TitleValue--so Category:Foo only knows about Category:, not Project: or User:.

Maybe someone can clarify here?

-Chad

Daniel Friesen

10:57 a.m.

On 2013-10-23 9:39 PM, Chad wrote:

...

On Wed, Oct 23, 2013 at 9:25 PM, Tyler Romeo tylerromeo@gmail.com wrote:

...
On Thu, Oct 24, 2013 at 12:04 AM, Chad innocentkiller@gmail.com wrote:

...
I would assume something like the prefixed text form. As long as the TitleValue knows its namespace and title you can guess the rest.

Maybe, but the RFC says, "As a value object, this has no knowledge about namespaces, permissions, etc.". I think there comes a point when you have to acknowledge that some properties of Title objects are indeed part of the value object.

I read that as namespaces other than the one you're obviously referring to with a TitleValue--so Category:Foo only knows about Category:, not Project: or User:.

Maybe someone can clarify here?

-Chad

I read it as TitleValue doesn't know about texual namespaces like Category:, Project:, or User:. But just contains an integer namespace such as `4`.

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]

Brad Jorsch (Anomie)

9:49 p.m.

On Thu, Oct 24, 2013 at 12:04 AM, Chad innocentkiller@gmail.com wrote:

...

On Wed, Oct 23, 2013 at 8:44 PM, Tyler Romeo tylerromeo@gmail.com wrote:

...
On Wed, Oct 23, 2013 at 10:54 PM, Chad innocentkiller@gmail.com wrote:

...
I've personally not been convinced of this Value/Parser/Formatter

pattern

...
...
but if I'm the only one I'll just keep my big mouth shut

I agree with this as well. The idea behind this RFC is the "hair can't

cut

...
itself" pattern.

I wish my hair could cut itself ;-)

Maybe my hair can't cut itself, but I can cut my own hair without having to find a Barber instance. ;)

Looking at the RFC itself, it does seem to be following the "kingdom of nouns" model Maxsem linked to. And it has a maze of twisty little objects, all different, that tends to concern me in these proposals. Why do we need a separate TitleParser and TitleFormatter? They use the same configuration data to do complementary operations. And then there's talk about making "TitleFormatter" a one-method class that has subclasses for every different way to format a title, and about having a subclass for wikilinks (huh?) which has subclasses for internal and external and interwiki links (wtf?), and so on. High-level methods to actually do anything useful like "move a page" or "parse some wikitext" seem like they are going to need to take dozens of these objects to be able to actually do what they need to do. (And yes, I know the argument is that then the useful function should be split into multiple smaller functions with fewer arguments, to which I reply "you just pushed the problem up the call stack, you didn't solve it").

So then the RFC proposes a "ServiceRegistry" to try to get around this problem of everything requiring dozens of arguments. But after observing that RequestContext already *is* this, it then argues against using RequestContext on the basis of additional dependencies. The logical extension of this argument is that we're not really talking about a "ServiceRegistry" here, we're talking about a "TitleServiceRegistry" and then we'd also have a "RevisionServiceRegistery", "PageServiceRegistry", "UserServiceRegistry", and so on, trying to pretend there is no forest because we can point to all the individual trees. Why not have one ServiceRegistry, maybe even part of RequestContext, and use it preferentially for passing services instead of just as a fallback? Do we really have the need for multiple differently-configured services all over the place in "real" code that we need to be passing them around individually, or is it just for testing where we could as easily solve the problem by creating a registry instance that throws errors for all the services not needed for each test?

Somewhere in this thread someone claimed that we shouldn't have code that processes user input and then does something and then formats the output, presumably because each of those should be their own "tree". So how then does any end user actually do anything? At the top level something *does* have to do all three things, even if it does it by calling into various "trees" and each of those trees has its own branches and each branch has its own twigs and so on.

The claimed problem behind a lot of this is "too many dependencies" making things hard to test and the idea that you can somehow make this go away by dividing everything into tinier and tinier pieces. To some extent this works, but at the cost of making the system as a whole harder to understand because you have to track all the little pieces. I doubt MediaWiki has reached the point of diminishing returns on that, but I'm not really sure that the end-goal envisioned here is the *right* division.

Daniel Friesen

10:02 p.m.

On 2013-10-24 9:19 AM, Brad Jorsch (Anomie) wrote:

...

The claimed problem behind a lot of this is "too many dependencies" making things hard to test and the idea that you can somehow make this go away by dividing everything into tinier and tinier pieces. To some extent this works, but at the cost of making the system as a whole harder to understand because you have to track all the little pieces. I doubt MediaWiki has reached the point of diminishing returns on that, but I'm not really sure that the end-goal envisioned here is the *right* division.

Then there's the potential that individually testing a bunch of components doesn't always match what happens in the actually used code that puts everything together. So in the end you either have to also test the thing as a whole anyways. Or you end up with unit tests that are even less useful than you started because they won't catch regressions they would've originally.

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]

Tim Starling

25 Oct 25 Oct

11:50 a.m.

On 25/10/13 03:32, Daniel Friesen wrote:

...

On 2013-10-24 9:19 AM, Brad Jorsch (Anomie) wrote:

...
The claimed problem behind a lot of this is "too many dependencies" making things hard to test and the idea that you can somehow make this go away by dividing everything into tinier and tinier pieces. To some extent this works, but at the cost of making the system as a whole harder to understand because you have to track all the little pieces. I doubt MediaWiki has reached the point of diminishing returns on that, but I'm not really sure that the end-goal envisioned here is the *right* division.

Then there's the potential that individually testing a bunch of components doesn't always match what happens in the actually used code that puts everything together. So in the end you either have to also test the thing as a whole anyways. Or you end up with unit tests that are even less useful than you started because they won't catch regressions they would've originally.

Intuitively, it seems likely that if we reduce the typical scope and integration level of automated testing, we will end up missing bugs, because bugs which were previously testable due to being within a single unit will now be spread over multiple units. Presumably this effect is offset by an increase in test coverage.

However, I would prefer not to make such decisions by intuition alone. It would be nice to see some solid statistical data on this.

-- Tim Starling

Tim Starling

11:25 a.m.

On 25/10/13 03:19, Brad Jorsch (Anomie) wrote:

...

So then the RFC proposes a "ServiceRegistry" to try to get around this problem of everything requiring dozens of arguments. But after observing that RequestContext already *is* this, it then argues against using RequestContext on the basis of additional dependencies. The logical extension of this argument is that we're not really talking about a "ServiceRegistry" here, we're talking about a "TitleServiceRegistry" and then we'd also have a "RevisionServiceRegistery", "PageServiceRegistry", "UserServiceRegistry", and so on, trying to pretend there is no forest because we can point to all the individual trees. Why not have one ServiceRegistry, maybe even part of RequestContext, and use it preferentially for passing services instead of just as a fallback?

The proposal as I understand it is just for a single ServiceRegistry class, rather than TitleServiceRegistry etc. I raised similar concerns at

https://www.mediawiki.org/wiki/Talk:Requests_for_comment/TitleValue#ServiceRegistry_singleton_33334

Daniel's response was: "You are right that the ServiceRegistry is again bundling all the services in a single object. The point is to restrict access to ServiceRegistry to "bootstrapping code" (e.g. in static hook handlers) only. RequestContext is used all over the code, so all the code is dependent on anything in the RequestContext. That makes me very very reluctant to add more stuff to it."

...

Do we really have the need for multiple differently-configured services all over the place in "real" code that we need to be passing them around individually, or is it just for testing where we could as easily solve the problem by creating a registry instance that throws errors for all the services not needed for each test?

Division of the Title class could help with features such as:

* Dealing with titles in foreign wikis, with a different set of namespaces and first letter capitalisation to the request context wiki

* The URL customisation needs of features like action=render and DumpHTML

...

The claimed problem behind a lot of this is "too many dependencies" making things hard to test and the idea that you can somehow make this go away by dividing everything into tinier and tinier pieces. To some extent this works, but at the cost of making the system as a whole harder to understand because you have to track all the little pieces. I doubt MediaWiki has reached the point of diminishing returns on that, but I'm not really sure that the end-goal envisioned here is the *right* division.

Since the Title class is large to start with (4895 lines) I figure it can tolerate being cut up a few times without the resulting pieces becoming tiny. So, it makes an interesting test case for the approach. We will see the consequences with more clarity once the code reaches Gerrit for review, if we allow it to proceed that far.

-- Tim Starling

Antoine Musso

5:36 p.m.

Le 25/10/13 07:55, Tim Starling a écrit :

...

Since the Title class is large to start with (4895 lines) I figure it can tolerate being cut up a few times without the resulting pieces becoming tiny. So, it makes an interesting test case for the approach. We will see the consequences with more clarity once the code reaches Gerrit for review, if we allow it to proceed that far.

Would it make sense to cut a TitleValue branch as a playground? If we can get some code before the January summit, that would give us all a bit more experience.

-- Antoine "hashar" Musso

Max Semenik

24 Oct 24 Oct

3:10 p.m.

On 24.10.2013, 6:54 Chad wrote:

...

On Wed, Oct 23, 2013 at 7:39 PM, Tim Starling tstarling@wikimedia.orgwrote:

...

...
I'm inclined to accept this RFC, except perhaps with minor changes. It would be nice if I could get some comments on it from someone other than me or Daniel, but if nobody else has anything to say, I will just accept it.

I've personally not been convinced of this Value/Parser/Formatter pattern but if I'm the only one I'll just keep my big mouth shut.

My main problem with this proposal is that it aims to make us all very good typists, by making every popular operation much more verbose.

Now:

$title = Title::newFromText( $text ); if ( $title ) { return $title->getLocalUrl( 'foo=bar' ); } return null;

Proposed:

$tp = new TextTitleParser(); try { $title = $tp->parse( $text ); $tf = new UrlTitleFormatter( $title, 'foo=bar ); return $tf->format(); } catch( MWException ) { return null; }

Which of them is more clear? Writing which of them you're less likely to introduce errors?

Also, http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom-of-nouns.html

-- Best regards, Max Semenik ([[User:MaxSem]])

Jeroen De Dauw

5:49 p.m.

Hey,

Proposed:

...

$tp = new TextTitleParser(); try { $title = $tp->parse( $text ); $tf = new UrlTitleFormatter( $title, 'foo=bar ); return $tf->format(); } catch( MWException ) { return null; }

I hope this is your own interpretation of what would happen on top of what is proposed and not what is actually proposed, since this code snippet has some issues.

Those parser and Formatter objects contain configuration specific state. If you instantiate them like this, you are creating code that is just as bad as Title::newFromText, where this newFromText method relies on similar config specific state. In both cases you are using global state and programming against concretions.

Does the RFC also sate exceptions should be used rather than returned error values? Seems like a quite disjoint concern to me. And if you use exceptions, please use them properly and do not copy-paste MWException all over.

This example also seems rather fabricated and not representative of most use cases. Ignoring the points above, it strikes me as only making sense in a transaction script, ie something which goes from handling user input (such as parsing a title), through domain logic (altering a title), to presentation (ie formatting a title). Given what MW does, it is much more likely any particular piece of code only resides in one layer, rather then 3. Though I can imagine how we have lots of code that for no good reason does this anyway, quite plausibly prompted by the fact that the Title object caters to all 3 layers.

I read that as namespaces other than the one you're obviously referring

...

to with a TitleValue--so Category:Foo only knows about Category:, not Project: or User:.

Maybe someone can clarify here?

It is good to have a value object to not be dependent on configuration, so instances of it do not get messed up if the config is changed. So presumably in this case it'd contain an int, rather then an internationalized sting or whatever.

Also,

...

http://steve-yegge.blogspot.com/2006/03/execution-in-kingdom-of-nouns.html

I stopped reading at

All Java people love "use cases"

...

Perhaps you could summarize what the RFC in question does wrong according to the post you linked?

Cheers

-- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. ~=[,,_,,]:3 --

Tim Starling

25 Oct 25 Oct

10:33 a.m.

On 24/10/13 23:19, Jeroen De Dauw wrote:

...

Hey,

Proposed:

...
$tp = new TextTitleParser(); try { $title = $tp->parse( $text ); $tf = new UrlTitleFormatter( $title, 'foo=bar ); return $tf->format(); } catch( MWException ) { return null; }

I hope this is your own interpretation of what would happen on top of what is proposed and not what is actually proposed, since this code snippet has some issues.

Those parser and Formatter objects contain configuration specific state. If you instantiate them like this, you are creating code that is just as bad as Title::newFromText, where this newFromText method relies on similar config specific state. In both cases you are using global state and programming against concretions.

No, it's not what's proposed, it's proposed to have a thing called a ServiceRegistry which would supply parser/formatter instances. I think the ServiceRegistry itself should be managed by the RequestContext, for lifetime control, whereas Daniel Kinzler thinks it should be a singleton.

...

Does the RFC also sate exceptions should be used rather than returned error values? Seems like a quite disjoint concern to me. And if you use exceptions, please use them properly and do not copy-paste MWException all over.

No, the RFC does not state that exceptions should be used.

Maybe you should both read the RFC.

https://www.mediawiki.org/wiki/Requests_for_comment/TitleValue

-- Tim Starling

Antoine Musso

24 Oct 24 Oct

6:47 p.m.

Le 24/10/13 11:40, Max Semenik a écrit :

...

Now:

$title = Title::newFromText( $text ); if ( $title ) { return $title->getLocalUrl( 'foo=bar' ); } return null;

Proposed:

$tp = new TextTitleParser(); try { $title = $tp->parse( $text ); $tf = new UrlTitleFormatter( $title, 'foo=bar ); return $tf->format(); } catch( MWException ) { return null; }

Since I am lazy, I would want:

return Title::newFromText( $text ) ->getLocalUrl( 'foo=bar ');

Aka make Title::newFromText() to always return an object having the methods neededs, albeit when the $text does not form a title, it would yield a NullTitle objet that has calls returning nothing (or other NullTitle object).

Less fatal errors.

-- Antoine "hashar" Musso

Daniel Kinzler

30 Oct 30 Oct

7:41 p.m.

Am 10.10.2013 18:40, schrieb Rob Lanphier:

...

Hi folks,

I think Daniel buried the lede here (see his mail below), so I'm mailing this out with a subject line that will hopefully provoke more discussion. :-)

Thanks for bumping this, Rob. And thanks to Tim for moderating this discussion so far, and to everyone for the criticism.

And sorry for my late reply, I'm just now catching up on email after a brief vacation and a few days of conferencing.

I'll reply to the various comments in this thread in this mail, to keep the discussion focused. I hope I managed to reply to all the relevant points.

First off, TLDR:

* I maintain that dependency injection is useful, and less painful than one might think. * I'm open to discussion about whether to use a namespace ID or a canonical namespace name in the TitleValue object * I'm interested in discussing how to best slice and dice the parser and formatter services.

So, here are my replies to some comments:

...

I agree with this as well. The idea behind this RFC is the "hair can't cut itself" pattern. However, a value object needs to be easily serializable. So what representation is used for serializing a TitleValue? It can't be the display title or DB key since that's part of the TitleFormatter class.

TitleValue as proposed can be serialized without any problems using standard PHP serialization, or as a JSON structure containing the namespace ID and name string. Or we can come up with something nicer, like $nsid:$title or some such. The current Title object cannot be serialized at all directly.

...

Maybe, but the RFC says, "As a value object, this has no knowledge about namespaces, permissions, etc.". I think there comes a point when you have to acknowledge that some properties of Title objects are indeed part of the value object

The point is avoiding access to config information (which is global state). Namespace names are configuration and require global state for normalization/lookup. This should not be done in a value object.

...

I read it as TitleValue doesn't know about texual namespaces like Category:, Project:, or User:. But just contains an integer namespace such as `4`.

I understand that using the numeric namespace ID is controversial. TitleValue could also require the canonical (!) namespace name to be provided to the constructor, instead of the numeric id. This would make it harder though to use it for DB queries.

...

$title = Title::newFromText( $text ); if ( $title ) { return $title->getLocalUrl( 'foo=bar' ); }

newFromText() uses global state to look up namespaces, interwiki prefixes, title normalization options, etc. If we want to get away from relying on global state, we have to avoid this.

getLocalUrl() uses global state to construct the URL using the wiki's base URL. Again, if we want to have reusable, testable code, this must be avoided.

Sure, global state is convenient. You don't have to think about what information is needed where, you don't have to be explicit about your dependencies. This makes it easier to write code. It makes it very hard to understand, test, reuse or refactor the code, though.

Yes, it's more terse and "easier to read". But it hides information flow and implicit dependencies, making the "easy read" quite misleading. It makes it harder to truly understand what is going on.

...

$tp = new TextTitleParser(); try { $title = $tp->parse( $text ); $tf = new UrlTitleFormatter( $title, 'foo=bar ); return $tf->format(); } catch( MWException ) { return null; }

As Jeroen already pointed out in his reply, you should rarely have the need to turn a page title string into a URL. When generating the URL, you would typically already have a TitleValue object (if not, ask yourself why not).

Catching exceptions should be done as late as possible - ideally, in the presentation layer. Generally, throwing an exception is preferable to returning a "special" value, so the try/catch would not be here in the code.

However, you are right that being explicit about which information and services are needed where means writing more code. That's what "explicit" means. If we agree that it's a good thing to explicitly expose information flow and dependencies, then this implies that we need to actually write the additional (declarative) code.

...

Maybe my hair can't cut itself, but I can cut my own hair without having to find a Barber instance. ;)

But you will need to find a Scissors instance.

...

Why do we need a separate TitleParser and TitleFormatter? They use the same configuration data to do complementary operations. And then there's talk about making "TitleFormatter" a one-method class that has subclasses for every different way to format a title, and about having a subclass for wikilinks (huh?) which has subclasses for internal and external and interwiki links (wtf?), and so on.

I'm very much open to discussion regarding how the parser/renderer services are designed. For example:

* It would probably be fine to have a single class for parsing titles in wiki-link syntax, and generating wiki-link style titles. * I'm reluctant to include functions for generating URLs / HTML links in the same class (this needs additional information / services ). * Beware that TitleValue objects represent *titles*, not links - they do not handle interwiki prefixes and such. An interwiki link would need to be represented by a different object (because of the prefix, but also because of namespace IDs vs. namespace names - there is room for discussion here).

...

High-level methods to actually do anything useful like "move a page" or "parse some wikitext" seem like they are going to need to take dozens of these objects to be able to actually do what they need to do.

I read this as: high level functions rely on multiple services (which is what makes them high level), which we then can control, configure and replace individually. This is a good thing.

...

(And yes, I know the argument is that then the useful function should be split into multiple smaller functions with fewer arguments, to which I reply "you just pushed the problem up the call stack, you didn't solve it").

No - I *distributed* it along the call stack in a sensible way.

...

So then the RFC proposes a "ServiceRegistry" to try to get around this problem of everything requiring dozens of arguments. But after observing that RequestContext already *is* this, it then argues against using RequestContext on the basis of additional dependencies.

The difference is that RequestContext is passed around the code a lot, while a registry should be passed around as little as possible. Access to such a registry should be limited to code that, for some reason, cannot have the services themselves injected.

A lot of code uses (read: depends on) RequestContext. Thus, all that code depends on anything RequestContext depends on. Adding more things to RequestContext means making a lot of code depend on more things, making it harder to test, understand and re-use.

"Kitchen-Sink" objects are bad, I'm trying to avoid making the kitchen sink bigger.

...

Why not have one ServiceRegistry, maybe even part of RequestContext, and use it preferentially for passing services instead of just as a fallback?

Because then everything depends on all the services.

That is, to use or test something that needs e.g. a UserPermissionService, you'd also have to provide all the other services, our you'd at least somehow have to know which services it needs, without this being declared explicitly anywhere.

...

Do we really have the need for multiple differently-configured services all over the place in "real" code that we need to be passing them around individually, or is it just for testing where we could as easily solve the problem by creating a registry instance that throws errors for all the services not needed for each test?

The point is modularity. Modularity makes it easier to re-use code, easier to understand it, to refactor it. Unit tests just expose dependency issues nicely, because they require modular isolation per definition.

But there are numerous examples in the MediaWiki code base where it is obvious that splitting things into services would have been beneficial, and would have avoided duplicate code and annoying workarounds.

The fact that Title needs the database and thus makes all tests need the database is just one glaring example. Another example is that the underlying logic of special pages is often the same as the underlying logic of API modules. But currently, the logic on special pages is mixed up with HTML generation, and the code in API modules is mixed with parameter processing and result generation, making re-use hard and leading to redundant code.

...

At the top level something *does* have to do all three things, even if it does it by calling into various "trees" and each of those trees has its own branches and each branch has its own twigs and so on.

Yes, but it then doesn't have to know *how* these things are done, and the logic for doing these things can easily be replaced.

...

The claimed problem behind a lot of this is "too many dependencies" making things hard to test and the idea that you can somehow make this go away by dividing everything into tinier and tinier pieces. To some extent this works, but at the cost of making the system as a whole harder to understand because you have to track all the little pieces.

You *don't* have to track the pieces, because they are all mentioned explicitly, and your editor (hopefully) allows you to navigate between them easily.

In my experience, what makes a system hard to understand are hidden dependencies and side effects (Title is full of those). Splitting things into mode classes and smaller functions leads to code that is more easily readable.

It increases the number of files, yes. But files are really not the entity we should use to think about software architecture. If the number of files is an issue, something is wrong with the tools we use.

...

I doubt MediaWiki has reached the point of diminishing returns on that, but I'm not really sure that the end-goal envisioned here is the *right* division.

Well, how would *you* re-factor the Title class, then?

...

So in the end you either have to also test the thing as a whole anyways. Or you end up with unit tests that are even less useful than you started because they won't catch regressions they would've originally.

Integration tests are useful, but they are no replacement for unit tests (nor vice versa). One big is the number of code pathes to cover (another is the effort of locating errors).

A heavily simplified example: We have three methods to test, where a() uses b() and b() uses c(). Let's say each of these has 3 code paths to test.

If we can test a(), b(), and c() separately (making a() use a mock version of b(), etc), we need 5+5+5 = 15 test cases for full coverage.

If we use an integration test, we'll need one test case for every possible combination of code pathes - that's 5*5*5 = 225 test cases.

In practice, we'll not do this. We'll test, say, 20 typical cases. Which means a lot of code paths are not covered, and may well lead to problems.

...

Would it make sense to cut a TitleValue branch as a playground? If we can get some code before the January summit, that would give us all a bit more experience.

My experience from introducing the ContentHandler makes me very wary of using a development branch. Rebasing that over and over is going to be very painful.

-- daniel

Martijn Hoekstra

11:02 p.m.

On Oct 30, 2013 3:11 PM, "Daniel Kinzler" daniel@brightbyte.de wrote:

...

Am 10.10.2013 18:40, schrieb Rob Lanphier:

...
Hi folks,

I think Daniel buried the lede here (see his mail below), so I'm mailing this out with a subject line that will hopefully provoke more discussion. :-)

Thanks for bumping this, Rob. And thanks to Tim for moderating this

discussion

...

so far, and to everyone for the criticism.

And sorry for my late reply, I'm just now catching up on email after a

brief

...

vacation and a few days of conferencing.

I'll reply to the various comments in this thread in this mail, to keep

the

...

discussion focused. I hope I managed to reply to all the relevant points.

Snip

...

...
Would it make sense to cut a TitleValue branch as a playground? If we can get some code before the January summit, that would give us all a bit more experience.

My experience from introducing the ContentHandler makes me very wary of

using a

...

development branch. Rebasing that over and over is going to be very

painful.

...

-- daniel

Rebase early, rebase often. At some point integration must take place. Not using a separate branch won't help you there. Anyone working on anything involving the title object that hasn't been merged yet will hate to rebase whenever you'll have merged back to master though. Which kind of solidifies your original point

Martijn.

...

Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Daniel Kinzler

31 Oct 31 Oct

7:22 p.m.

Am 30.10.2013 18:32, schrieb Martijn Hoekstra:

...

Rebase early, rebase often. At some point integration must take place. Not using a separate branch won't help you there. Anyone working on anything involving the title object that hasn't been merged yet will hate to rebase whenever you'll have merged back to master though. Which kind of solidifies your original point

I disagree - introducing the new features/logic in small increments is more likely to expose any issues early on, and will make it a lot easier to avoid stale patches.

The idea is to *not* actually refactor the Title class, but to introduce a light weight alternative, TitleValue. We can then replace usage of Title with usage of TitleObject bit by bit.

-- daniel

Daniel Kinzler

8 p.m.

Am 31.10.2013 14:52, schrieb Daniel Kinzler:

...

The idea is to *not* actually refactor the Title class, but to introduce a light weight alternative, TitleValue. We can then replace usage of Title with usage of TitleObject bit by bit.

That was meant to be "replace Title with TitleValue", of course.

-- daniel

4042

Age (days ago)

4063

Last active (days ago)

wikitech-l@lists.wikimedia.org

20 comments

11 participants

tags (0)

participants (11)

Antoine Musso
Brad Jorsch (Anomie)
Chad
Daniel Friesen
Daniel Kinzler
Jeroen De Dauw
Martijn Hoekstra
Max Semenik
Rob Lanphier
Tim Starling
Tyler Romeo