[WikiEN-l] Long-term searchability of the internet

Tony Sidaway tonysidaway at gmail.com
Wed Jan 19 00:05:03 UTC 2011


On 18 January 2011 10:56, Charles Matthews
<charles.r.matthews at ntlworld.com> wrote:
> On 17/01/2011 15:30, Tony Sidaway wrote:
>> I suppose my problem here is understanding how the discussion goes
>> from <the useful part of the web is expanding faster than we can keep
>> up> to <there is a problem with this>.
>
> I believe the "mission statement" approach to WP would necessarily find
> troubles with this phenomenon. Of course we can take the "sum of all
> knowledge" (online and offline) with a pinch of salt; that's what
> mission statements are for. But notice that the built-in inclusionism of
> addressing the issue that way has the practical effect of forcing us to
> build up expertise and criteria. CSD and notability guidelines are there
> to solve (for example) the issue of "garage bands with a MySpace page
> aren't necessarily encyclopedic", but not that issue alone. Across broad
> areas some sifting goes on.

Well, I think you answered the implicit question: naive "mission
statements" involving terms like "sum of all knowledge" aren't of much
practical use.


>
>> On deep and semantic web, these are useful concepts that will help us
>> to develop more capable data mining tools, but not essential for our
>> task at hand, which is to present a particular subset of structured,
>> organized human knowledge.
> We must look both at the "blue sky research" approach, and the pragmatic
> business of presenting a properly edited and categorised piece of
> hypertext to the world, in real time. If we treat the mining options as
> essentially irrelevant, we are planning our own obsolescence.

No complaints there.  We can continue writing an old-fashioned
encyclopedia or (one day) become more semantically oriented, or
whatever else comes along.

My issue with the semantic option as it stands at present is that it's
incompatible with our current goal.  It would be fine to write a
search engine to wander off and aggregate all cricket statistics to
produce the ultimate cricket encyclopedia, but we can't do that
without a reliable free source. The free semantic infrastructure
doesn't exist, and that's even before we work out how we assess the
reliability of the information from various sources. It's neither
essential for the task in hand, nor is it clear to me that it will
ever be something this project can do.  Perhaps in ten years' time my
qualms will appear laughable, but for now the semantic web hasn't yet
encountered its equivalent of the Codd revolution that has made modern
databases such a doddle, and we're talking about much more ambitious
processes than those described by relational calculus.

>> Knowledge is social. We evaluate data as part of a collaboration
>> (Wikipedia merely provides a framework for exploiting this universal
>> human activity). It is unavoidable and irreducible. There is nowhere
>> online a hidden trove of knowledge that we can use without first
>> exposing it to evaluation. And we already have far more potentially
>> useful data than we can ever evaluate so it's a bit pointless worrying
>> about the invisible net in general. Better to use top down methods to
>> identify likely sources (some of which are currently invisible).
> Well, I agree with the last part, since it fits in with my approach as
> of today. WP can still usefully gobble down existing old reference
> material, and if that is done by making it visible on Wikisource on the
> way, so much the better. Given the reactions of others to this concept,
> I think you'd be wise to admit that "evaluation" is pluralistic in nature.
>

Pluralistic as in social, or pluralistic as in multi-faceted?  Either
way, no argument there.


