Am 12.06.2013 05:25, schrieb Jeroen De Dauw:
Hey,
You raise some good questions.
* why do we need the setOptions() method? Shouldn't the validator be fully
configured in the constructor?
We certainly do not need it. The question is why the interface has it. Looking
at current usage, I see no valid reason to have it, as the users are following
poor design practices.
I suggest to deprecate it, then.
* why are errors collected in a member? I would
expect a validator to be
stateless and reusable.
* why do we need the Result object? Can't we just throw an exception? The
"value
with warnings" use case for MediaWiki's status object doesn't seem to
apply
here.
If there is a result object, having a field that contains state for that is
perfectly reasonable, and in this case IMO preferred (note how this state is not
exposed outside of the class).
I strongly disagree. Making the validator stateful is asking for trouble. In
order to re-use it, you have to re-set it. And if you happen to re-use it when
recursively validating a data structure, your state gets messed up.
Collecting the state in a member means using a member to pass information
between function scopes. That is a very bad idea.
It would be different if the validator was it's own result object, that is, if
for every validation, you'd create a new validator instance, use it on the data,
and in the end as the validator itself if it's in the OK state. When conflating
the validator with the result that way, the validator would be stateful, and
should have such a member variable.
I know I've been shouting "use Exceptions
rather then returning null/false or
potentially invalid objects" lately; this is a good example of why one should
not apply such a "rule" blindly, as it does not apply everywhere. Exceptions
should not be used to implement control flow for expected situations. If one has
a simple isValid method, returning a boolean makes perfect sense. Returning
true, or throwing an exception would be wrong. In case of the ValueValidators
just returning a boolean would not be expressive enough, as the information of
which checks (note the plural here) failed is desired, so the user can be
informed of why the value did not pass validation.
I have no strong feelings about this. I think an exception would be the easiest
and cleanest solution, but perhaps not the nicest and most powerful.
The advantage of a result object is that it can collect and report multiple issues.
The disadvantage is your validators can not "stack": e.g. with exceptions, the
first validator could check that $x is a string, and the second validator could
check that $x matches some regular expression. If you try to collect all
outcomes of all validators, the second one in this example with cause a warning
or runtime exception if $x is not a string.
Also, reporting multiple errors at once is often not easy to do in a meaningful
way, but that's just a UI issue.
Apropos... how can we internationalize the errors reported by validators? Should
errors use well defined string keys? Or do we inject some kind of error factory?
After all, validation failures often result from user interaction, and thus
should be reported in a human-friendly form.
* There is a lot of "convenient stuff"
implemented in the base class which i
expect many implementations are not going to need, e.g. the
allowdValues/forbiddenValues stuff.
This is certainly poor design. Given the whole ValueValidators resulted from
refactoring some awful code in which every class did a dozen different things,
I'm mildly surprised there are not more such artefacts left. Ideally we could
just kill this one, but like with most legacy code, we can't do this without
breaking existing functionality.
I propose to not use the ValueValidatorObject base class for Wikibase. We could
make our own, but if we stick to "trivial" validators, we don't even need
that.
* options are passed to sub-validators using an
option map, changing the
configuration and thus behavior of the (sub-)validators on the fly. That seems
like asking for trouble.
Agree.
So, let's at least not do this in Wikibase.
* the ValueValidator interface has two methods,
validate() and canValidate().
validate() can throw a validation exception, which contains one or more errors.
What would this canValidate method take as parameter? Can you give some concrete
examples of how this would be used in the Wikibase context?
For recursive validation of complex structures. This would follow the pattern
that Tobi S. suggested for serialization/unserialization.
If we wanted to validate an entity as a recursive structure, applying validators
to Claims, Snaks, DataValues, etc recursively, we could do this by having a
registry of validators, where each validator knows which kind of object it can
validate. For each object encountered in the structure, we'd ask each validator
if it can validate that kind of object, and apply it if so.
The issue here though is that we'd need to pass the top level validator to each
"inner" validator, either in the constructor or as an additional method
argument, to allow the top-level validator (which knows all the specialized
validators) to be applied to sub.structures recursively.
I think this could be useful, but we don't have an immediate need for this at
the moment. So I guess we can just leave it.
As already mentioned, -2 on using exceptions here.
* Validators should generally implement only a single constraint - so instead of
a String validator that implements min/max length, min/max value, and a regex
check, each of these checks should have their own validator; A
CompositeValidator could be used to bundle multiple validations, if needed.
I find that proposal somewhat surprising coming from you :)
I suppose the "pragmatic" coding style I got into working on MediaWiki is not
representative of my design preferences :) Maybe I should dig up some of my old
Java libs full of nicely chopped and diced functionality to compose and re-use...
The current
ValueValidators are designed for a use case where there is a single such
validator per "parameter type" (ie boolean, float, string, etc).
Fair enough for a baseline, but the StringValidator should probably not
implement any and all validations that can be applied to a string.
The
ValueValidator interface itself does not specify anything about this though, so
one is entirely free to make more low level validators where this makes sense.
Of course - in the second section of my mail, I was talking about
ValueValidatorsObject, not the interface.
As far as I can see, the Validator interface
isn't yet widely used, so it would
be easy enough to change these things.
It is used by the ParamProcessor extension, and by extension by everything that
uses that.
[...]
Perhaps a good mix of avoiding legacy crud and
pragmatism is to create fresh
implementations of the interface for our usage. Nothing forces us to make use of
the existing ones, for which we do not have all that much usage in the Wikidata
context to begin with. The only legacy thing we'd have is the setOptions method,
which we can easily implement by a method throwing an exception. (And no, this
is in fact not a LSP violation.) Compared to the MediaWiki core legacy code we
are dealing with, that really is nothing.
I agree: we should stick with the ValueValidator interface, but we should not
use ValueValidatorObject or StringValidator in Wikibase. Not sure about the
other default validators, will have to look at them in detail.
I also suggest to deprecate much of the functionality of ValueValidatorObject
along with the setOptions method, but that has little to do with Wikibase.
Implementing setOptions() to throw an exception may work, but I'm curious how
that is not an LSP violation. In languages that enforce declaration of
exceptions thrown, this kind of violation is even forbidden by the language.
I think in our case, setOptions() should just do nothing. After all, it's
defined to apply the options known/supported by the respective validator - which
in our case would ju8st always be none.
The one concern I have with creating an alternative is
that we will likely be
making use of this existing code if we share result format stuff with SMW. I'd
be thus unfortunate to duplicate this functionally for us (and it'll be
unfortunate in general either way). However if we end up with something clean
and reusable, and keep the existing usecase in mind, we ought to not have real
troubles later on using it for these existing usecases. Given the reuse of the
existing code for us right now is limited anyway, I thus certainly do not object
to a fresh start approach either.
Yea, I think that would work - the only trouble I see is naming conflicts. Do we
create a NewBooleanValidator? OR do we rely on namespaces?
-- daniel