> I agree this is a misconception that a copyright license makes any direct
> change to data reliability. But the attribution requirement does somewhat
> indirectly have an impact on it, as it legally enforces traceability.
While true, I don't think it's of much practical use if traceability is
what you are seriously interested in. Imagine Wikidata were CC-BY, so
each piece of data you use from Wikidata now has to be marked as "coming
from Wikidata.Org". What have you gained? Wikidata is huge, and this
mark doesn't even tell you which item it is from, while being completely
satisfactory legally. It is even less useful for actually ensuring the
data is correct or tracing its provenance to primary sources - you'd
still have to find the item and check the references manually (or
automatically, maybe), just as you would for CC0. A CC-BY license would
not add very much on the Wikidata side.
All this while, of course, even with CC0 nothing prevents you from
importing Wikidata data in such a way that each piece of data still
carries the mark "coming from Wikidata". It is not a legal requirement
with CC0, but nothing in CC0 prevents it either. If this meets your
provenance needs, you are free to do it, and the legal requirements of
CC-BY do not improve it for you in any way - they would just force
people who *do not* need to do it to do it anyway.
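To make that concrete, here is a minimal Python sketch of such a voluntary import. It is purely illustrative: the `Sourced` type and the `tag` helper are hypothetical names I made up, not part of any Wikidata tooling. Note that it also records the item ID, which a bare "coming from Wikidata" mark would not give you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sourced:
    """A value plus a voluntary provenance mark (hypothetical type)."""
    value: str
    source: str  # name of the data set the value came from
    item: str    # item ID inside that data set


def tag(value: str, item: str, source: str = "Wikidata") -> Sourced:
    # Attach the mark at import time; no license has to require this.
    return Sourced(value=value, source=source, item=item)


record = tag("Douglas Adams", "Q42")
print(record.source, record.item)
```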
> That is I strongly disagree with the following assertion: "a license
> that requires BY sucks so hard for data [because] attribution
> requirements grow very quickly". To my mind it is equivalent to say that
I think this assertion (that attribution requirements grow) is factually
true. Each piece of data from a CC-BY data set needs to carry
attribution. If your needs require combining several data sets, each of
them needs to carry attribution. This attribution has to be carried
through all data processing pipelines. You may be OK with this growth,
but as I just explained above, these requirements, while being onerous
for people who don't need to trace each piece of data, are still
unsatisfactory in many cases for those who do. So having CC-BY would be
both onerous and useless.
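A rough Python sketch of that growth, under the simplifying assumption that a data set is just some rows plus a set of sources to credit (the `combine` helper and the data set names are made up for illustration):

```python
def combine(*datasets):
    """Merge data sets; the result must credit every contributing source."""
    rows = []
    credits = set()
    for ds in datasets:
        rows.extend(ds["rows"])
        credits |= ds["credits"]  # credits only accumulate, never shrink
    return {"rows": rows, "credits": credits}


a = {"rows": [1, 2], "credits": {"Dataset A"}}
b = {"rows": [3], "credits": {"Dataset B"}}
c = {"rows": [4], "credits": {"Dataset C"}}

ab = combine(a, b)    # first pipeline stage
abc = combine(ab, c)  # a later stage inherits every earlier credit
print(sorted(abc["credits"]))
```

Every stage downstream has to keep the whole credit list around, even though that list still does not say which row came from which source.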
> we will throw away traceability because it is subjectively judged too
> large a burden, without providing any start of evidence that it indeed
> can't be managed, at least with Wikimedia's current resources.
It's not Wikimedia that will be shouldering the burden, it's every user
of Wikimedia data sets.