Improving the ValueValidator interface

List overview All Threads
Download

newer

older

QueryEngine moved to own...

backports needed after last...

Daniel Kinzler

10 Jun 2013 10 Jun '13

12:46 p.m.

I' currently thinking about how to best validate incoming Snaks / DataValues. I have been looking at the ValueValidator interface, and it seems to be somewhat inconvenient. * why do we need the setOptions() method? Shouldn't the validator be fully configured in the constructor? * why is there no canValidate() method? This would be very handy for finding applicable validators for a given object. This seems a bit odd, but where it gets really problematic is the ValueValidatorsObject base class: * why are errors collected in a member? I would expect a validator to be stateless and reusable. * why do we need the Result object? Can't we just throw an exception? The "value with warnings" use case for MediaWiki's status object doesn't seem to apply here. * There is a lot of "convenient stuff" implemented in the base class which i expect many implementations are not going to need, e.g. the allowdValues/forbiddenValues stuff. * options are passed to sub-validators using an option map, changing the configuration and thus behavior of the (sub-)validators on the fly. That seems like asking for trouble. I propose a lighter design: * the ValueValidator interface has two methods, validate() and canValidate(). validate() can throw a validation exception, which contains one or more errors. * Validators are stateless and get configured completely and finally by the constructor. * Validators should generally implement only a single constraint - so instead of a String validator that implements min/max length, min/max value, and a regex check, each of these checks should have their own validator; A CompositeValidator could be used to bundle multiple validations, if needed. * ValueValidatorFactory gets a getApplicableValidators() method that returns all validators that can be used to validate a given object. As far as I can see, the Validator interface isn't yet widely used, so it would be easy enough to change these things. What do you think? -- daniel

Show replies by date

Jeroen De Dauw

12 Jun 12 Jun

3:25 a.m.

Hey, You raise some good questions. * why do we need the setOptions() method? Shouldn't the validator be fully

...

configured in the constructor?

We certainly do not need it. The question is why the interface has it. Looking at current usage, I see no valid reason to have it, as the users are following poor design practices. * why is there no canValidate() method? This would be very handy for finding

...

applicable validators for a given object.

Since there has been no use for this so far. * why are errors collected in a member? I would expect a validator to be

...

stateless and reusable. * why do we need the Result object? Can't we just throw an exception? The "value with warnings" use case for MediaWiki's status object doesn't seem to apply here.

If there is a result object, having a field that contains state for that is perfectly reasonable, and in this case IMO preferred (note how this state is not exposed outside of the class). I know I've been shouting "use Exceptions rather then returning null/false or potentially invalid objects" lately; this is a good example of why one should not apply such a "rule" blindly, as it does not apply everywhere. Exceptions should not be used to implement control flow for expected situations. If one has a simple isValid method, returning a boolean makes perfect sense. Returning true, or throwing an exception would be wrong. In case of the ValueValidators just returning a boolean would not be expressive enough, as the information of which checks (note the plural here) failed is desired, so the user can be informed of why the value did not pass validation. * There is a lot of "convenient stuff" implemented in the base class which i

...

expect many implementations are not going to need, e.g. the allowdValues/forbiddenValues stuff.

This is certainly poor design. Given the whole ValueValidators resulted from refactoring some awful code in which every class did a dozen different things, I'm mildly surprised there are not more such artefacts left. Ideally we could just kill this one, but like with most legacy code, we can't do this without breaking existing functionality. * options are passed to sub-validators using an option map, changing the

...

configuration and thus behavior of the (sub-)validators on the fly. That seems like asking for trouble.

Agree. * the ValueValidator interface has two methods, validate() and

...

canValidate(). validate() can throw a validation exception, which contains one or more errors.

What would this canValidate method take as parameter? Can you give some concrete examples of how this would be used in the Wikibase context? As already mentioned, -2 on using exceptions here. * Validators should generally implement only a single constraint - so

...

instead of a String validator that implements min/max length, min/max value, and a regex check, each of these checks should have their own validator; A CompositeValidator could be used to bundle multiple validations, if needed.

I find that proposal somewhat surprising coming from you :) The current ValueValidators are designed for a use case where there is a single such validator per "parameter type" (ie boolean, float, string, etc). The ValueValidator interface itself does not specify anything about this though, so one is entirely free to make more low level validators where this makes sense. The sub validator system, though poorly implemented, already facilitates the kind of composition you are talking about. Since none of this is, or should be, specified by the ValueValidator interface, it is really a discussion that makes more sense when talking about specific validators, and not the overall design. As far as I can see, the Validator interface isn't yet widely used, so it

...

would be easy enough to change these things.

It is used by the ParamProcessor extension, and by extension by everything that uses that. Given there are probably in the order of a hundred wikis that run this code from master or some random git version, we cannot make breaking changes here without ensuring the code making use of ValueValidators does not break because of them. Even without that, it'd be extremely rude towards the developers of that code. So either we make these changes in a way that continues accommodate existing usescases (which are not identical to those of WD), or we create our own alternative. Perhaps a good mix of avoiding legacy crud and pragmatism is to create fresh implementations of the interface for our usage. Nothing forces us to make use of the existing ones, for which we do not have all that much usage in the Wikidata context to begin with. The only legacy thing we'd have is the setOptions method, which we can easily implement by a method throwing an exception. (And no, this is in fact not a LSP violation.) Compared to the MediaWiki core legacy code we are dealing with, that really is nothing. The one concern I have with creating an alternative is that we will likely be making use of this existing code if we share result format stuff with SMW. I'd be thus unfortunate to duplicate this functionally for us (and it'll be unfortunate in general either way). However if we end up with something clean and reusable, and keep the existing usecase in mind, we ought to not have real troubles later on using it for these existing usecases. Given the reuse of the existing code for us right now is limited anyway, I thus certainly do not object to a fresh start approach either. Cheers -- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. ~=[,,_,,]:3 --

Daniel Kinzler

8:03 a.m.

Am 12.06.2013 05:25, schrieb Jeroen De Dauw:

...

Hey, You raise some good questions. * why do we need the setOptions() method? Shouldn't the validator be fully configured in the constructor? We certainly do not need it. The question is why the interface has it. Looking at current usage, I see no valid reason to have it, as the users are following poor design practices.

I suggest to deprecate it, then.

...

* why are errors collected in a member? I would expect a validator to be stateless and reusable. * why do we need the Result object? Can't we just throw an exception? The "value with warnings" use case for MediaWiki's status object doesn't seem to apply here. If there is a result object, having a field that contains state for that is perfectly reasonable, and in this case IMO preferred (note how this state is not exposed outside of the class).

I strongly disagree. Making the validator stateful is asking for trouble. In order to re-use it, you have to re-set it. And if you happen to re-use it when recursively validating a data structure, your state gets messed up. Collecting the state in a member means using a member to pass information between function scopes. That is a very bad idea. It would be different if the validator was it's own result object, that is, if for every validation, you'd create a new validator instance, use it on the data, and in the end as the validator itself if it's in the OK state. When conflating the validator with the result that way, the validator would be stateful, and should have such a member variable.

...

I know I've been shouting "use Exceptions rather then returning null/false or potentially invalid objects" lately; this is a good example of why one should not apply such a "rule" blindly, as it does not apply everywhere. Exceptions should not be used to implement control flow for expected situations. If one has a simple isValid method, returning a boolean makes perfect sense. Returning true, or throwing an exception would be wrong. In case of the ValueValidators just returning a boolean would not be expressive enough, as the information of which checks (note the plural here) failed is desired, so the user can be informed of why the value did not pass validation.

I have no strong feelings about this. I think an exception would be the easiest and cleanest solution, but perhaps not the nicest and most powerful. The advantage of a result object is that it can collect and report multiple issues. The disadvantage is your validators can not "stack": e.g. with exceptions, the first validator could check that $x is a string, and the second validator could check that $x matches some regular expression. If you try to collect all outcomes of all validators, the second one in this example with cause a warning or runtime exception if $x is not a string. Also, reporting multiple errors at once is often not easy to do in a meaningful way, but that's just a UI issue. Apropos... how can we internationalize the errors reported by validators? Should errors use well defined string keys? Or do we inject some kind of error factory? After all, validation failures often result from user interaction, and thus should be reported in a human-friendly form.

...

* There is a lot of "convenient stuff" implemented in the base class which i expect many implementations are not going to need, e.g. the allowdValues/forbiddenValues stuff. This is certainly poor design. Given the whole ValueValidators resulted from refactoring some awful code in which every class did a dozen different things, I'm mildly surprised there are not more such artefacts left. Ideally we could just kill this one, but like with most legacy code, we can't do this without breaking existing functionality.

I propose to not use the ValueValidatorObject base class for Wikibase. We could make our own, but if we stick to "trivial" validators, we don't even need that.

...

* options are passed to sub-validators using an option map, changing the configuration and thus behavior of the (sub-)validators on the fly. That seems like asking for trouble. Agree.

So, let's at least not do this in Wikibase.

...

* the ValueValidator interface has two methods, validate() and canValidate(). validate() can throw a validation exception, which contains one or more errors. What would this canValidate method take as parameter? Can you give some concrete examples of how this would be used in the Wikibase context?

For recursive validation of complex structures. This would follow the pattern that Tobi S. suggested for serialization/unserialization. If we wanted to validate an entity as a recursive structure, applying validators to Claims, Snaks, DataValues, etc recursively, we could do this by having a registry of validators, where each validator knows which kind of object it can validate. For each object encountered in the structure, we'd ask each validator if it can validate that kind of object, and apply it if so. The issue here though is that we'd need to pass the top level validator to each "inner" validator, either in the constructor or as an additional method argument, to allow the top-level validator (which knows all the specialized validators) to be applied to sub.structures recursively. I think this could be useful, but we don't have an immediate need for this at the moment. So I guess we can just leave it.

...

As already mentioned, -2 on using exceptions here. * Validators should generally implement only a single constraint - so instead of a String validator that implements min/max length, min/max value, and a regex check, each of these checks should have their own validator; A CompositeValidator could be used to bundle multiple validations, if needed. I find that proposal somewhat surprising coming from you :)

I suppose the "pragmatic" coding style I got into working on MediaWiki is not representative of my design preferences :) Maybe I should dig up some of my old Java libs full of nicely chopped and diced functionality to compose and re-use...

...

The current ValueValidators are designed for a use case where there is a single such validator per "parameter type" (ie boolean, float, string, etc).

Fair enough for a baseline, but the StringValidator should probably not implement any and all validations that can be applied to a string.

...

The ValueValidator interface itself does not specify anything about this though, so one is entirely free to make more low level validators where this makes sense.

Of course - in the second section of my mail, I was talking about ValueValidatorsObject, not the interface.

...

As far as I can see, the Validator interface isn't yet widely used, so it would be easy enough to change these things. It is used by the ParamProcessor extension, and by extension by everything that uses that.

[...]

...

Perhaps a good mix of avoiding legacy crud and pragmatism is to create fresh implementations of the interface for our usage. Nothing forces us to make use of the existing ones, for which we do not have all that much usage in the Wikidata context to begin with. The only legacy thing we'd have is the setOptions method, which we can easily implement by a method throwing an exception. (And no, this is in fact not a LSP violation.) Compared to the MediaWiki core legacy code we are dealing with, that really is nothing.

I agree: we should stick with the ValueValidator interface, but we should not use ValueValidatorObject or StringValidator in Wikibase. Not sure about the other default validators, will have to look at them in detail. I also suggest to deprecate much of the functionality of ValueValidatorObject along with the setOptions method, but that has little to do with Wikibase. Implementing setOptions() to throw an exception may work, but I'm curious how that is not an LSP violation. In languages that enforce declaration of exceptions thrown, this kind of violation is even forbidden by the language. I think in our case, setOptions() should just do nothing. After all, it's defined to apply the options known/supported by the respective validator - which in our case would ju8st always be none.

...

The one concern I have with creating an alternative is that we will likely be making use of this existing code if we share result format stuff with SMW. I'd be thus unfortunate to duplicate this functionally for us (and it'll be unfortunate in general either way). However if we end up with something clean and reusable, and keep the existing usecase in mind, we ought to not have real troubles later on using it for these existing usecases. Given the reuse of the existing code for us right now is limited anyway, I thus certainly do not object to a fresh start approach either.

Yea, I think that would work - the only trouble I see is naming conflicts. Do we create a NewBooleanValidator? OR do we rely on namespaces? -- daniel

3971

days inactive

3973

days old

wikidata-tech@lists.wikimedia.org

Manage subscription

2 comments

2 participants

tags (0)

participants (2)

Daniel Kinzler
Jeroen De Dauw