[Wikipedia-l] Re: [Wikitech-l] Hyperlink convention

David Friedland david at nohat.net
Tue Oct 5 22:48:33 UTC 2004


Jimmy (Jimbo) Wales wrote:
> David Friedland wrote:
> 
>>Rather than trying to live in the fiction that en-us and en-gb are 
>>equally understandable and mutually compatible, we should admit that 
>>they are different, that those differences can and empirically do cause 
>>problems, and that we should create a solution to solve it.
> 
> Why do you say it is a fiction?  I don't see any real problems with it
> at all.  There must be some examples that would tend to persuade me,
> but color/colour and the like are not very compelling.

Well, let's talk about the million/milliard/billion/billiard/trillion 
fiasco.

See [http://en.wikipedia.org/wiki/1921_in_Germany#Reparations]

It says "a plan was formulated by which Germany was to pay 226 milliard 
gold marks"

Now, I'm a fairly linguistically tuned-in guy, and I'm familiar with the 
word "milliard", but only in the sense that it's a large number that has 
something to do with "million" and "billion". I just quickly asked all 
my co-workers if they knew how much a milliard was, and none of them had 
even heard of the word, let alone knew how much it was (and they all 
have degrees in linguistics).

If Wikipedia is to be maximally accessible, it shouldn't use dialectal 
words like this that are mostly inaccessible to a large group of 
native speakers (in this case, Americans).

But the alternative is not entirely satisfactory to the speakers of 
en-gb and related dialects. To them, a "billion" can be 
1,000,000,000,000 or 1,000,000,000 depending on which system is being 
used. To them, the least ambiguous way to describe the number 
226,000,000,000 is to say "226 milliard", as was done in this article.

So what is the solution? One suggestion would be just to avoid these 
words, of which admittedly there aren't many. "Milliard" and "billion" 
could just be written out as large numbers. The problem with this 
solution is that it sacrifices overall readability. It's much easier to 
read "1 billion" than "1,000,000,000" (did you notice yourself counting 
the zeroes?).

Another example is the verb "slate":

According to 
[http://en.wikipedia.org/wiki/List_of_words_having_different_meanings_in_British_and_American_English],
slate means "to disparage" in en-gb but "to schedule" in en-us. In the 
(admittedly not very good) article 
[http://en.wikipedia.org/wiki/Ash_Mongoose], we have "the idea of giving 
him a tattoo could be seen as a bit mature so the idea was slated". To 
an en-us reader, this sentence will be at best confusing and at worst 
misleading. Was the idea of giving him a tattoo disparaged or scheduled?

I want to note that all my examples are from the American perspective 
because I am American. Also, it is my understanding (although I don't 
have any firm proof of this) that American culture has a stronger 
overall impact on the British than the reverse, due partly to global 
American cultural hegemony and partly to rampant cultural isolationism 
in the US. In other words, my guess is that it is more likely British 
readers will be familiar with American usage than the other way around.

The differences are indeed few and the likelihood of confusion is small, 
but confusion is nevertheless possible. Rather than gloss over these 
problems or embrace them in the name of diversity, we should endeavor to 
eliminate the likelihood of confusing our readers. We will have done the 
reader a great disservice if he or she reads an article and, because of 
dialectal differences, comes away with a misunderstanding of the topic.

In most cases, the potential misunderstanding can be fixed by rewording, 
but as in the milliard example, alternatives are non-ideal. Should we 
settle for a non-ideal solution, or should we create a solution that is 
ideal?

It is a fact that modern browsers can be configured to specify what 
language and dialect they prefer, and this feature _could_ be used to 
help reduce the amount of potential misunderstanding on Wikipedia. Why 
not take advantage?
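
As a rough sketch of how that preference could be read on the server 
side: the browser sends it in the Accept-Language request header, as a 
comma-separated list of language tags with optional q-values. The Python 
below is only my illustration, not existing MediaWiki code; the function 
name, the list of supported dialects, and the en-us fallback are all 
assumptions made for the example.

```python
def preferred_dialect(accept_language, supported=("en-us", "en-gb"), default="en-us"):
    """Pick the supported dialect the reader ranks highest.

    `accept_language` is the raw Accept-Language header value, e.g.
    "en-GB,en-US;q=0.8,en;q=0.5". `supported` and `default` are
    hypothetical choices for this sketch.
    """
    candidates = []
    for position, item in enumerate(accept_language.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        quality = 1.0  # a tag with no q-value counts as q=1
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0  # treat a malformed q-value as "not wanted"
        if tag in supported and quality > 0:
            # Sort by highest quality first, then by order in the header.
            candidates.append((-quality, position, tag))
    return min(candidates)[2] if candidates else default


print(preferred_dialect("en-GB,en-US;q=0.8,en;q=0.5"))
```

A reader whose browser sends no English dialect at all simply gets the 
fallback, so nothing changes for anyone who has not set a preference.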

In my last post, I made a proposal to use templates for dialect marking. 
Applied to the milliard example above, the wikitext

a plan was formulated by which Germany was to pay 226 {{milliard}} gold 
marks

would appear to an en-gb reader as

a plan was formulated by which Germany was to pay 226 milliard gold marks

and to an en-us reader as

a plan was formulated by which Germany was to pay 226 billion gold marks

And if we allow that, who's it going to hurt to allow "The primary 
{{colours}} are red, yellow, and blue."? Certainly a few extra {{ }} 
marks aren't any more burdensome than ''' marks.
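
To make the idea concrete, here is a minimal sketch of the substitution 
in Python. It is not how MediaWiki templates are implemented; the table 
of words, the regular expression, and the function name are assumptions 
I made up for illustration, and it only handles bare {{word}} markers.

```python
import re

# Hypothetical table: one entry per dialect-marked word.
DIALECT_WORDS = {
    "milliard": {"en-gb": "milliard", "en-us": "billion"},
    "colours": {"en-gb": "colours", "en-us": "colors"},
}

# Matches a bare {{word}} marker with no parameters.
TEMPLATE = re.compile(r"\{\{(\w+)\}\}")


def render(wikitext, dialect):
    """Replace each known {{word}} marker with the variant for `dialect`."""

    def substitute(match):
        name = match.group(1)
        variants = DIALECT_WORDS.get(name)
        if variants is None:
            return match.group(0)  # unknown marker: leave it untouched
        # Unknown dialect: fall back to the word as the editor wrote it.
        return variants.get(dialect, name)

    return TEMPLATE.sub(substitute, wikitext)


text = "a plan was formulated by which Germany was to pay 226 {{milliard}} gold marks"
print(render(text, "en-gb"))
print(render(text, "en-us"))
```

Falling back to the word as written means a reader with any other 
dialect setting sees exactly what the editor typed.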

Our primary duty is to inform, and we should create solutions for 
impediments to our duty.

 > I discovered the last time I was in London that what we Americans call
 > 'arugula' is called 'rocket'.  This is a rather preposterous name for
 > a lettuce, in my opinion, but nonetheless, that's what they say.  And
 > I was pleased to have learned about it.
 >
 > http://en.wikipedia.org/wiki/Arugula
 >
 > would have informed me of this quite well.  Would it be better to have
 > used fancy wiki markup to deny me this learning opportunity?

If you were reading about the primary exports of a small region of an 
obscure country and in the list was "rocket", is it possible you could 
have come away from the article believing that the obscure country is a 
major exporter of rockets? Under ideal conditions, yes, these kinds of 
dialectal differences can be illuminating, but under equally likely 
non-ideal conditions, the differences can be confusing and misleading.

 > Where the languages are similar enough to be mutually intelligible, we
 > don't need to do anything.  Where the languages are different enough
 > to cause trouble, we have a delightful teaching opportunity which adds
 > considerable richness to our work.

While I admire the pluck of characterizing inconsistency as richness, I 
think that "down in the trenches" the reality of the differences in 
dialect (mostly between en-us and en-gb, but also, for example, between 
pt-pt and pt-br) is a continuous stream of conflict, debate, confusion, 
and frustration that policy has failed to alleviate.

There exists a technical solution that would alleviate the problem and 
not significantly burden editors. Should we reject this solution on the 
wishful notion that our differences can unite rather than divide us?

- David



