Hey,
You raise some good questions.
* why do we need the setOptions() method? Shouldn't the validator be fully
configured in the constructor?
We certainly do not need it. The question is why the interface has it.
Looking at current usage, I see no valid reason to have it, as the users
are following poor design practices.
* why is there no canValidate() method? This would be very handy for finding
applicable validators for a given object.
Since there has been no use for this so far.
* why are errors collected in a member? I would expect a validator to be
stateless and reusable.
* why do we need the Result object? Can't we just throw an exception? The
"value
with warnings" use case for MediaWiki's status object doesn't seem to
apply here.
If there is a result object, having a field that contains state for that is
perfectly reasonable, and in this case IMO preferred (note how this state
is not exposed outside of the class).
I know I've been shouting "use Exceptions rather then returning null/false
or potentially invalid objects" lately; this is a good example of why one
should not apply such a "rule" blindly, as it does not apply everywhere.
Exceptions should not be used to implement control flow for expected
situations. If one has a simple isValid method, returning a boolean makes
perfect sense. Returning true, or throwing an exception would be wrong. In
case of the ValueValidators just returning a boolean would not be
expressive enough, as the information of which checks (note the plural
here) failed is desired, so the user can be informed of why the value did
not pass validation.
* There is a lot of "convenient stuff" implemented in the base class which i
expect many implementations are not going to need,
e.g. the
allowdValues/forbiddenValues stuff.
This is certainly poor design. Given the whole ValueValidators resulted
from refactoring some awful code in which every class did a dozen different
things, I'm mildly surprised there are not more such artefacts left.
Ideally we could just kill this one, but like with most legacy code, we
can't do this without breaking existing functionality.
* options are passed to sub-validators using an option map, changing the
configuration and thus behavior of the
(sub-)validators on the fly. That
seems
like asking for trouble.
Agree.
* the ValueValidator interface has two methods, validate() and
canValidate().
validate() can throw a validation exception, which contains one or more
errors.
What would this canValidate method take as parameter? Can you give some
concrete examples of how this would be used in the Wikibase context?
As already mentioned, -2 on using exceptions here.
* Validators should generally implement only a single constraint - so
instead of
a String validator that implements min/max length, min/max value, and a
regex
check, each of these checks should have their own validator; A
CompositeValidator could be used to bundle multiple validations, if needed.
I find that proposal somewhat surprising coming from you :) The current
ValueValidators are designed for a use case where there is a single such
validator per "parameter type" (ie boolean, float, string, etc). The
ValueValidator interface itself does not specify anything about this
though, so one is entirely free to make more low level validators where
this makes sense. The sub validator system, though poorly implemented,
already facilitates the kind of composition you are talking about. Since
none of this is, or should be, specified by the ValueValidator interface,
it is really a discussion that makes more sense when talking about specific
validators, and not the overall design.
As far as I can see, the Validator interface isn't yet widely used, so it
would
be easy enough to change these things.
It is used by the ParamProcessor extension, and by extension by everything
that uses that. Given there are probably in the order of a hundred wikis
that run this code from master or some random git version, we cannot make
breaking changes here without ensuring the code making use of
ValueValidators does not break because of them. Even without that, it'd be
extremely rude towards the developers of that code. So either we make these
changes in a way that continues accommodate existing usescases (which are
not identical to those of WD), or we create our own alternative.
Perhaps a good mix of avoiding legacy crud and pragmatism is to create
fresh implementations of the interface for our usage. Nothing forces us to
make use of the existing ones, for which we do not have all that much usage
in the Wikidata context to begin with. The only legacy thing we'd have is
the setOptions method, which we can easily implement by a method throwing
an exception. (And no, this is in fact not a LSP violation.) Compared to
the MediaWiki core legacy code we are dealing with, that really is nothing.
The one concern I have with creating an alternative is that we will likely
be making use of this existing code if we share result format stuff with
SMW. I'd be thus unfortunate to duplicate this functionally for us (and
it'll be unfortunate in general either way). However if we end up with
something clean and reusable, and keep the existing usecase in mind, we
ought to not have real troubles later on using it for these existing
usecases. Given the reuse of the existing code for us right now is limited
anyway, I thus certainly do not object to a fresh start approach either.
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--