[WikiEN-l] Proposal: WikiData

Philip Sandifer snowspinner at gmail.com
Wed Feb 13 21:34:09 UTC 2008


One of the frequent inclusion/deletion arguments has been over "cruft"  
of various sorts - plot summaries, "in popular culture" sections,  
strange but interesting lists ("List of songs that mention the title  
over n times" where n was something weird and large was an old  
favorite), etc. The basic problem in these cases is that while the  
information is often verifiable, it seems somewhat tangential to a  
reasoned and well-organized presentation of major facts on a subject.

On the other hand, it is not irrelevant as such either - quite the  
contrary, the information is often valuable, if not in a strictly  
encyclopedic sense. Certainly the drive to eliminate articles that are  
just plot summaries, while well-intentioned, would serve to destroy a  
useful resource that is not duplicated by other free content projects  
at present.

In print publications such interesting tangents exist in the form of  
sidebars. Open Time or Newsweek and you'll see them all over the place  
- articles will hum merrily along, and off on the side will be small  
explanations, graphs, and other tangential pieces of information. But  
for whatever reason, in the online structure, we've largely declined  
to take advantage of that. As a result, we have a messy structure of  
sub-articles and chunks of what are essentially sidebar data dropped  
into an article. And this affects a wide range of articles. On the one  
hand you have something like [[School Hard]] - an article on an  
episode of Buffy - that is interrupted by a credits section, numerous  
lists, and a huge table of in-universe chronology. On the other you  
have something like [[Democratic Party (United States) presidential  
primaries, 2008]] where a narrative of what happened is abandoned in  
favor of graphs, charts, etc.

In both cases the problem is the same - relevant chunks of data are  
choking out the article. And in the latter case, a fair amount has  
already been done about this - there are already 8 sub-articles  
breaking out lengthy chunks of data from this article.

A quick tour of a number of major topics shows the same result with  
sub-articles. [[Hillary Rodham Clinton presidential campaign, 2008]]  
has [[Political positions of Hillary Rodham Clinton]] - a vital chunk  
of information that still amounts to a list of positions, and is not  
meaningfully an article.

I propose that we need to dramatically rethink how we treat chunks of  
data on Wikipedia. In many cases - from fictional topics to real-world  
ones - there is a large chunk of information that is worth  
presenting, but that does not present well in article form. Our  
current method of spin-off and sub-articles leaves us with a mass of  
articles that often make poor articles even as they contain valuable  
information. (And I would say that [[School Hard]] and [[Political  
positions of Hillary Rodham Clinton]] are articles of more or less  
exactly equal quality.)

I propose that we start an active repository for "sidebar content" -  
large chunks of data, lists, tables, summaries, etc. This could be  
done as a namespace - the Sidebar or Data namespace - or as a separate  
project - WikiData. But in either case, the goal would be the same -  
verifiable information that is useful in researching and learning  
about a topic, but that does not present well in the format of an  
encyclopedic overview of the topic. We'd need to come up with a good  
navigation engine - something, in other words, that avoids the litany  
of mistakes in the category system. But I think that this would let us  
dramatically reconceptualize how we cover a number of topics in a way  
that allows both the depth of (at times idiosyncratic) information  
that is widely recognized as one of our great strengths and the clear,  
well-organized prose that we strive for in encyclopedia articles.

In more practical terms, what I'm imagining would be an article on,  
say, Buffy the Vampire Slayer that had some clear link to data and  
sidebars. Click on it, and a navigational engine comes up that guides  
you through the sidebar content - a list of episodes that one could  
delve into and, from there, get plot summaries, credits, overviews of  
reviews, etc. A list of characters, an overview of critical  
commentaries, heck, a huge link collection of reviews of the series or  
of episodes. In other words, a way of having our article - structured  
with a clear lead section, and specific, well-sourced sections - be  
the top layer of a mass of well-organized content. Something that  
gives us an option for a topic beyond "have an article on it," "don't  
have an article on it," or "throw it into a messy list that doesn't  
quite function as an article."

Thoughts?

-Phil

