Hi Risker,
The researchers' conclusion, in their own words (see section 4.1,
"Indentation Reliability"), is:
*"Incorrect indentation (i.e., indentation that implies a reply-to relation
with the wrong post) is quite common in longer discussions in the EWDC [the
English Wikipedia Discussion Corpus]."*
I'm responding below to your concerns about their methodology, and taking
the opportunity to clear up some statistical misconceptions that might be
valuable in other contexts too.
On Friday, March 20, 2015, Risker <risker.wp(a)gmail.com> wrote:
On 20 March 2015 at 06:13, Tilman Bayer
<tbayer(a)wikimedia.org> wrote:
On Friday, March 20, 2015, Tilman Bayer <tbayer(a)wikimedia.org> wrote:
Just to throw this in here as one data point: "39% of talk page threads
contain wrong indentations
<https://meta.wikimedia.org/wiki/Research:Newsletter/2014/November#39.25_of_…>"
PS: The result from that paper was actually even worse than that (somewhat
sloppy) headline suggests: the researchers "found that 29 of 74 total turns,
or 39%±14pp of an average thread, had indentation that misidentified the
turn to which they were a reply."
I'm not sure you really read the underlying study, Tilman; the sample size
is so absurdly small that there is no way it is statistically significant.
(550 discussions on 83 article talk pages, in case anyone was wondering;
No, the sample size was actually stated right in the sentence I quoted
above: "74 total turns" (i.e. talk page comments replying to an earlier
comment), together with a ±14pp confidence interval.
And what exactly did you mean here by "statistically significant"? The term
doesn't make mathematical sense when applied to such a measured percentage
in isolation, i.e. without a hypothesis or comparison value. Rather, one
can talk about confidence intervals - a narrower confidence interval means
a more precise estimate.
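For illustration, here is a quick sketch (in Python) of how such an
interval is computed from the paper's raw numbers. Note that the authors
don't spell out their exact method; the plain normal-approximation 95%
interval below comes out somewhat narrower than their ±14pp, so they
presumably used a more conservative method or confidence level.

    import math

    # Raw numbers from the paper: 29 of 74 turns had wrong indentation
    wrong, n = 29, 74
    p_hat = wrong / n  # sample proportion, ~0.392

    # Normal-approximation ("Wald") standard error of a proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # ~0.057

    # 95% interval: p_hat +/- 1.96 standard errors
    half_width = 1.96 * se  # ~0.11, i.e. about +/-11pp
    print(f"{p_hat:.0%} +/- {half_width:.0%}")  # prints: 39% +/- 11%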
The 550 discussions you quoted refer to a different sample within the same
corpus.
the equivalent of about 10 minutes' worth of discussions on enwiki, except
that they are looking at talk pages that may have conversations dating back
10+ years.)
This 10 minutes/10 years comparison and the "absurdly small"/"no way"
rhetoric sound a lot like a common statistical fallacy, namely the erroneous
assumption that it is the "size of the sample as a fraction of the
population" that matters
<http://www.amstat.org/publications/jse/v12n2/smith.html>
("Unless the sample encompasses a substantial portion of the population,
the standard error of an estimator depends on the size of the sample, but
not the size of the population. This is a crucial statistical insight that
students find very counterintuitive").
Granted, if one draws a sample of 74 turns from all turns on all talk pages
made in Wikipedia's history, then it's plausible that the overall population
numbers hundreds of millions. But at such a large population size (or small
sample/population ratio), it is the absolute size of the sample that matters
for the size of the confidence interval - not how large the sample is
compared to the population.
There may be accessible explanations elsewhere that include more of the
math behind it, but perhaps this Khan Academy video
<https://www.youtube.com/watch?v=1JT9oODsClE> helps, which walks one
through a calculation showing how measuring a percentage of 38% in a sample
of just 150 US households (out of 100+ million) already allows one to reject
the null hypothesis that the real percentage among the entire population of
all households is less than 30%. It calls 150 a "large" sample in terms of
the approximation regime used there - which I'm sure you must find
extremely shocking, given that you earlier called a sample of 550 "absurdly
small".
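In case it helps, the video's calculation boils down to a standard
one-sided z-test for a proportion; a minimal sketch with the video's
numbers (38% observed in a sample of 150, tested against 30%):

    import math

    p_hat, n = 0.38, 150  # observed proportion and sample size
    p0 = 0.30             # null hypothesis: the true proportion is 30%

    # Standard error of the sample proportion under the null hypothesis
    se = math.sqrt(p0 * (1 - p0) / n)  # ~0.037

    # z-score: how many standard errors the observation lies above the null
    z = (p_hat - p0) / se  # ~2.14

    # One-sided critical value at the 5% level is 1.645, so we reject
    print(f"z = {z:.2f}, reject at 5% level: {z > 1.645}")

Note that the population size (100+ million households) never enters this
calculation - only the sample size n does.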
For a more thorough derivation, these online lectures
<http://www.stat.berkeley.edu/~stark/SticiGui/Text/confidenceIntervals.htm>
are quite useful (see e.g. the "Conservative confidence intervals for
percentages" section). The "finite population correction
<https://en.wikipedia.org/wiki/Finite_population_correction>" term there is
close to 1 for small sample/population ratios, so the resulting formula for
the confidence interval no longer depends on N, the population size.
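To see concretely how little the population size matters, here is a short
sketch applying that correction factor for a range of population sizes N,
with the sample size and proportion from the paper (n = 74, p ≈ 0.39):

    import math

    n, p = 74, 0.39  # sample size and proportion roughly as in the paper
    base_se = math.sqrt(p * (1 - p) / n)  # standard error ignoring N, ~0.057

    for N in (1_000, 100_000, 10_000_000, 1_000_000_000):
        # Finite population correction: shrinks the SE when n is a
        # sizable fraction of N, and approaches 1 as N grows
        fpc = math.sqrt((N - n) / (N - 1))
        print(f"N = {N:>13,}: correction = {fpc:.4f}, SE = {base_se * fpc:.4f}")

Already at N = 100,000 the correction factor is 0.9996; growing the
population a further ten-thousand-fold changes the confidence interval by
essentially nothing.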
Sure, in the present case the absolute size of the sample (74) wasn't very
big either, and there are other things to consider such as the selection
method (e.g. they actually selected from whole threads "longer than 10
turns each" only, so that's what the percentage relates to). But the
authors did their due diligence and indicated the limitations resulting
from the sample size by including that ±14pp confidence interval. Yes,
that's quite broad, and for a more precise estimate of the real overall
percentage of wrong indentations (39% or 32% or 48%?...) one would need a
larger sample. But it already makes it highly unlikely that this real
percentage is only 1% or 2%, say.
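That last point can be made exact with a binomial tail calculation: if the
real rate were only 2%, seeing 29 or more wrongly indented turns out of 74
would be astronomically unlikely. A sketch:

    import math

    n, observed = 74, 29
    p = 0.02  # hypothetical: the real rate of wrong indentation is only 2%

    # Exact binomial tail: probability of observing 29 or more "wrong"
    # turns out of 74 if each is wrong independently with probability 2%
    tail = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(observed, n + 1))
    print(f"P(X >= 29 | p = 2%) = {tail:.2g}")  # vanishingly small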
Hence I don't see a valid reason to dismiss the authors' conclusion that
incorrect indentation is "quite common", or to deny it is likely to be
applicable to the English Wikipedia as a whole.
To make it concrete, it appears that this was one of the threads in their
sample:
https://en.wikipedia.org/wiki/Talk:Grammatical_tense#Gutted - which
is certainly rife with wrong indentations.
By the way, the analogous talk page corpus for the Simple English Wikipedia
has been published at
https://www.ukp.tu-darmstadt.de/data/discourse-analysis/wikipedia-discussio…
and I'm told that the above corpus for the English Wikipedia is still going
to be published too.
This might be interesting material for further quantitative studies on how
the existing wikitext talk pages are actually used.
And the purpose of the study was to see if this particular manner of
analysing a discussion ("lexical pairs") was useful in identifying who said
what to whom; it's a discussion of the analysis process, not the actual
discussions.
Your point being? I already wrote in the linked summary that this finding
about wrong indentations was a side result of the paper. But that doesn't
make it go away either.
Nonetheless, if you were trying to illustrate that there are communication
benefits in having an easily read flow of discussion,
I actually wasn't talking about that here; being easy on readers is a
separate issue. Rather, this was about being easy on contributors. The
quoted result strongly indicates that many users who comment on wikitext
talk pages struggle to get indentation right, even if it may have become
second nature to veteran editors like you and me. That would be an argument
for building a discussion system where it's easier for commenters to
indicate which statement they are replying to, instead of training them in
colon-counting.
--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB