Metrics - accuracy of Wikipedia articles - Wikimedia-l

List overview All Threads
Download

newer

Metrics - accuracy of Wikipedia articles

older

[Wikimedia Announcements] The...

Cracking Wikipedia

Anthony Cole

7 May 2014 7 May '14

4:17 p.m.

Could someone please point me to all the studies the WMF have conducted into the reliability of Wikipedia's content? I'm particularly interested in the medical content, but would also like to look over the others too. Cheers.

Anthony Cole http://en.wikipedia.org/wiki/User_talk:Anthonyhcole

Show replies by date

Risker

7 May 7 May

5:59 p.m.

On 7 May 2014 16:17, Anthony Cole ahcoleecu@gmail.com wrote:

...

Could someone please point me to all the studies the WMF have conducted into the reliability of Wikipedia's content? I'm particularly interested in the medical content, but would also like to look over the others too. Cheers.

Anthony Cole http://en.wikipedia.org/wiki/User_talk:Anthonyhcole

I've often thought about this myself, and I'm fairly certain the WMF has never done any serious assessment of article quality. Different projects have done so on their own, through content auditing processes and the development of Wikipedia 1.0, but that affects a minority of articles. There are some real challenges in coming up with workable metrics.

For example - Is a stub article inaccurate, incomplete, or really contains all the information it's likely ever going to get?

How does one assess the accuracy of articles where there are multiple sources that we'd consider reliable, but who provide contradictory information on a topic? That would include, for example, all the ongoing boundary issues involving multiple countries, the assessment of historical impact of certain events or persons, and certain scientific topics where new claims and reports happen fairly frequently and may or may not have been reproduced. There may also be geographic or cultural factors that affect the quality of an article, or the perceived notability of a subject, and challenges dealing with cross-language reference sources.

Many of the metrics used for determining "quality" in audited articles on English Wikipedia have very little to do with the actual quality of the article. From the perspective of providing good information, a lot of Manual of Style practices are nice but not required. Certain accessibility standards (alt text for images, media positioning so as not to adversely affect screen-readers) are not quality metrics, strictly speaking; they're *accessibility* standards. There remains a huge running debate about whether or not infoboxes should be required, what information should be in them, how to deal with controversial or complex information in infoboxes, etc.

So I suppose the first step would be in determining what metrics should be included in a quality assessment of a project.

Risker/Anne

Andreas Kolbe

6:14 p.m.

Anne, there are really well-established systems of scholarly peer review. There is no need to reinvent the wheel, or add distractions such as infoboxes and other bells and whistles.

I find it extraordinary that, after 13 years, a project designed to make the sum of human knowledge available to humanity, with an annual budget of $50 million, has no clue how to measure the quality of the content it is providing, no apparent interest in doing so, and no apparent will to spend money on it.

For what it's worth, there was a recent external study of Wikipedia's medical content that came to unflattering results:

http://www.jaoa.org/content/114/5/368.full

---o0o---

Most Wikipedia articles for the 10 costliest conditions in the United States contain errors compared with standard peer-reviewed sources. Health care professionals, trainees, and patients should use caution when using Wikipedia to answer questions regarding patient care.

Our findings reinforce the idea that physicians and medical students who currently use Wikipedia as a medical reference should be discouraged from doing so because of the potential for errors.

---o0o---

On Wed, May 7, 2014 at 10:59 PM, Risker risker.wp@gmail.com wrote:

...

On 7 May 2014 16:17, Anthony Cole ahcoleecu@gmail.com wrote:

...
Could someone please point me to all the studies the WMF have conducted into the reliability of Wikipedia's content? I'm particularly interested

in

...
the medical content, but would also like to look over the others too. Cheers.

Anthony Cole http://en.wikipedia.org/wiki/User_talk:Anthonyhcole

I've often thought about this myself, and I'm fairly certain the WMF has never done any serious assessment of article quality. Different projects have done so on their own, through content auditing processes and the development of Wikipedia 1.0, but that affects a minority of articles. There are some real challenges in coming up with workable metrics.

For example - Is a stub article inaccurate, incomplete, or really contains all the information it's likely ever going to get?

How does one assess the accuracy of articles where there are multiple sources that we'd consider reliable, but who provide contradictory information on a topic? That would include, for example, all the ongoing boundary issues involving multiple countries, the assessment of historical impact of certain events or persons, and certain scientific topics where new claims and reports happen fairly frequently and may or may not have been reproduced. There may also be geographic or cultural factors that affect the quality of an article, or the perceived notability of a subject, and challenges dealing with cross-language reference sources.

Many of the metrics used for determining "quality" in audited articles on English Wikipedia have very little to do with the actual quality of the article. From the perspective of providing good information, a lot of Manual of Style practices are nice but not required. Certain accessibility standards (alt text for images, media positioning so as not to adversely affect screen-readers) are not quality metrics, strictly speaking; they're *accessibility* standards. There remains a huge running debate about whether or not infoboxes should be required, what information should be in them, how to deal with controversial or complex information in infoboxes, etc.

So I suppose the first step would be in determining what metrics should be included in a quality assessment of a project.

Risker/Anne _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

David Gerard

6:19 p.m.

On 7 May 2014 23:14, Andreas Kolbe jayen466@gmail.com wrote:

...

For what it's worth, there was a recent external study of Wikipedia's medical content that came to unflattering results: http://www.jaoa.org/content/114/5/368.full

Osteopaths.

Perhaps we could ask the chiropractors and homeopaths what they think too.

- d.

Nathan

6:29 p.m.

On Wed, May 7, 2014 at 6:19 PM, David Gerard dgerard@gmail.com wrote:

...

On 7 May 2014 23:14, Andreas Kolbe jayen466@gmail.com wrote:

...
For what it's worth, there was a recent external study of Wikipedia's medical content that came to unflattering results: http://www.jaoa.org/content/114/5/368.full

Osteopaths.

Perhaps we could ask the chiropractors and homeopaths what they think too.

d.

You misunderstand - these are doctors of osteopathic medicine in the U.S. They are effectively the equivalent of typical medical doctors. The term osteopath as you use it in the UK and elsewhere has a very different meaning here.

Andreas Kolbe

6:30 p.m.

"In a blinded process, we randomly selected 10 reviewers to examine 2 of the selected Wikipedia articles. Each reviewer was an internal medicine resident or rotating intern at the time of the assignment. This arrangement created redundancy, giving the study 2 independent reviewers for each article. Also, by using physicians as reviewers, we ensured a baseline competency in medical literature interpretation and research."

The articles reviewed were coronary artery disease, lung cancer, major depressive disorder, osteoarthritis, chronic obstructive pulmonary disease, hypertension, diabetes mellitus, back pain, hyperlipidemia and concussion.

Carry on.

On Wed, May 7, 2014 at 11:19 PM, David Gerard dgerard@gmail.com wrote:

...

On 7 May 2014 23:14, Andreas Kolbe jayen466@gmail.com wrote:

...
For what it's worth, there was a recent external study of Wikipedia's medical content that came to unflattering results: http://www.jaoa.org/content/114/5/368.full

Osteopaths.

Perhaps we could ask the chiropractors and homeopaths what they think too.

d.

Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Risker

6:40 p.m.

On 7 May 2014 18:30, Andreas Kolbe jayen466@gmail.com wrote:

...

"In a blinded process, we randomly selected 10 reviewers to examine 2 of the selected Wikipedia articles. Each reviewer was an internal medicine resident or rotating intern at the time of the assignment. This arrangement created redundancy, giving the study 2 independent reviewers for each article. Also, by using physicians as reviewers, we ensured a baseline competency in medical literature interpretation and research."

The articles reviewed were coronary artery disease, lung cancer, major depressive disorder, osteoarthritis, chronic obstructive pulmonary disease, hypertension, diabetes mellitus, back pain, hyperlipidemia and concussion.

Carry on.

Ah, but the costliest conditions aren't actually comparable to the relevant Wikipedia articles. For example, the "costly condition" of cancer is compared to the article on lung cancer, despite the fact that we have an article on cancer. The costly condition of "trauma-related disorders" - a very broad topic that would include traumatic amputations, fractures, burns, and a multitude of other issues is compared to the article on concussion; the costly condition of "mental disorders" is compared to the article on major depressive disorder despite, again, haing an article on mental disorders.

And each article is reviewed by only two people; when one looks at the results, we see that in most cases the two reviewers provided very different results.

Risker/Anne

Andreas Kolbe

7:05 p.m.

On Wed, May 7, 2014 at 11:40 PM, Risker risker.wp@gmail.com wrote:

...

Ah, but the costliest conditions aren't actually comparable to the relevant Wikipedia articles. For example, the "costly condition" of cancer is compared to the article on lung cancer, despite the fact that we have an article on cancer. The costly condition of "trauma-related disorders" - a very broad topic that would include traumatic amputations, fractures, burns, and a multitude of other issues is compared to the article on concussion; the costly condition of "mental disorders" is compared to the article on major depressive disorder despite, again, haing an article on mental disorders.

Yes, and hypertension was compared to the WP article on hypertension, hyperlipidemia was compared to the WP article on hyperlipidemia, etc.

And each article is reviewed by only two people; when one looks at the

...

results, we see that in most cases the two reviewers provided very different results.

Frankly, the merits of this study are neither here not there. I just thought it might be of interest.

What troubles me more is that you seem to be saying that meaningful expert review of Wikipedia content is not possible; at least you have not given any indication so far that you think otherwise. Yet beyond the confines of Wikipedia, expert review happens routinely, and daily, and is widely relied upon.

Is it really your belief that academics have no better access to knowledge than does an undisciplined crowd of random people, and that there is no way one could design a study that would give meaningful insight into the current quality level of Wikipedia content? I would find that truly bizarre.

Risker

6:27 p.m.

On 7 May 2014 18:14, Andreas Kolbe jayen466@gmail.com wrote:

...

Anne, there are really well-established systems of scholarly peer review. There is no need to reinvent the wheel, or add distractions such as infoboxes and other bells and whistles.

I find it extraordinary that, after 13 years, a project designed to make the sum of human knowledge available to humanity, with an annual budget of $50 million, has no clue how to measure the quality of the content it is providing, no apparent interest in doing so, and no apparent will to spend money on it.

For what it's worth, there was a recent external study of Wikipedia's medical content that came to unflattering results:

http://www.jaoa.org/content/114/5/368.full

---o0o---

Most Wikipedia articles for the 10 costliest conditions in the United States contain errors compared with standard peer-reviewed sources. Health care professionals, trainees, and patients should use caution when using Wikipedia to answer questions regarding patient care.

Our findings reinforce the idea that physicians and medical students who currently use Wikipedia as a medical reference should be discouraged from doing so because of the potential for errors.

...

Doesn't help very much in assessing the quality of the article on [[Liancourt Rocks]] - when depending on where in the world one is, the article can be reasonably accurate or completely inaccurate. This is one of the geographic issues of which I speak.

There are also issues with the study you reference - it's quite biased toward American information and the articles only have two reviewers. It perhaps points out how easy it is to get junk science published in peer-reviewed journals if the topic is "sexy" enough - their own study wouldn't meet our standards for inclusion.

Risker/Anne

Andreas Kolbe

6:36 p.m.

Junk science? I suppose the Article Feedback Tool was more scientific, then, because that's the best the Foundation has come up with so far.

On Wed, May 7, 2014 at 11:27 PM, Risker risker.wp@gmail.com wrote:

...

On 7 May 2014 18:14, Andreas Kolbe jayen466@gmail.com wrote:

...
Anne, there are really well-established systems of scholarly peer review. There is no need to reinvent the wheel, or add distractions such as infoboxes and other bells and whistles.

I find it extraordinary that, after 13 years, a project designed to make the sum of human knowledge available to humanity, with an annual budget

of

...
$50 million, has no clue how to measure the quality of the content it is providing, no apparent interest in doing so, and no apparent will to

spend

...
money on it.

For what it's worth, there was a recent external study of Wikipedia's medical content that came to unflattering results:

http://www.jaoa.org/content/114/5/368.full

---o0o---

Most Wikipedia articles for the 10 costliest conditions in the United States contain errors compared with standard peer-reviewed sources.

Health

...
care professionals, trainees, and patients should use caution when using Wikipedia to answer questions regarding patient care.

Our findings reinforce the idea that physicians and medical students who currently use Wikipedia as a medical reference should be discouraged from doing so because of the potential for errors.

...
Doesn't help very much in assessing the quality of the article on [[Liancourt Rocks]] - when depending on where in the world one is, the article can be reasonably accurate or completely inaccurate. This is one of the geographic issues of which I speak.

There are also issues with the study you reference - it's quite biased toward American information and the articles only have two reviewers. It perhaps points out how easy it is to get junk science published in peer-reviewed journals if the topic is "sexy" enough - their own study wouldn't meet our standards for inclusion.

Risker/Anne _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Michael Maggs

6:32 p.m.

Measuring the quality of Wikipedia articles in general is an issue that Wikimedia UK is interested in looking at, though by means of automation rather than the gold-standard but much less scalable method of scholarly peer review.

Our early-stage plans for a large-scale IT project to provide automated quality-measuring tools for the community can be found at [1].

On the corresponding talk page my fellow trustee Simon Knight has recently posted the results of a literature survey that may be of some interest, although again he was focusing on automation rather than individual manual review. Almost all of the research has been done by academics, and very little seems to have found its way back to the Wikimedia space where it could be applied in practice.

Michael

[1] https://wikimedia.org.uk/wiki/Technology_Committee/Project_requests/WikiRate...

On 7 May 2014, at 23:14, Andreas Kolbe jayen466@gmail.com wrote:

...

Anne, there are really well-established systems of scholarly peer review. There is no need to reinvent the wheel, or add distractions such as infoboxes and other bells and whistles.

I find it extraordinary that, after 13 years, a project designed to make the sum of human knowledge available to humanity, with an annual budget of $50 million, has no clue how to measure the quality of the content it is providing, no apparent interest in doing so, and no apparent will to spend money on it.

For what it's worth, there was a recent external study of Wikipedia's medical content that came to unflattering results:

http://www.jaoa.org/content/114/5/368.full

---o0o---

Most Wikipedia articles for the 10 costliest conditions in the United States contain errors compared with standard peer-reviewed sources. Health care professionals, trainees, and patients should use caution when using Wikipedia to answer questions regarding patient care.

Our findings reinforce the idea that physicians and medical students who currently use Wikipedia as a medical reference should be discouraged from doing so because of the potential for errors.

---o0o---

On Wed, May 7, 2014 at 10:59 PM, Risker risker.wp@gmail.com wrote:

...
On 7 May 2014 16:17, Anthony Cole ahcoleecu@gmail.com wrote:

...
Could someone please point me to all the studies the WMF have conducted into the reliability of Wikipedia's content? I'm particularly interested

in

...
the medical content, but would also like to look over the others too. Cheers.

Anthony Cole http://en.wikipedia.org/wiki/User_talk:Anthonyhcole

Andreas Kolbe

6:38 p.m.

On Wed, May 7, 2014 at 11:32 PM, Michael Maggs Michael@maggs.name wrote:

...

Measuring the quality of Wikipedia articles in general is an issue that Wikimedia UK is interested in looking at, though by means of automation rather than the gold-standard but much less scalable method of scholarly peer review.

It doesn't *have* to be scalable. That's what sampling was invented for.

Automation. As they say, if all you have is a hammer, every problem looks like a nail.

Michael Maggs

6:45 p.m.

It does have to be scalable if you want to be able to measure any article, at least approximately. Let's say I am interested in 5000 articles on the subjects X, Y and Z, none of which have been manually rated, and never will be due to the scaling problem. An automated tool would be extremely useful to me, and is the best measure I am going to get in the absence of huge resources to measure their qualities in the manual way.

Michael

On 7 May 2014, at 23:38, Andreas Kolbe jayen466@gmail.com wrote:

...

On Wed, May 7, 2014 at 11:32 PM, Michael Maggs Michael@maggs.name wrote:

...
Measuring the quality of Wikipedia articles in general is an issue that Wikimedia UK is interested in looking at, though by means of automation rather than the gold-standard but much less scalable method of scholarly peer review.

It doesn't *have* to be scalable. That's what sampling was invented for.

Automation. As they say, if all you have is a hammer, every problem looks like a nail. _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Wil Sinclair

10:10 p.m.

Would it be possible for WMF or another organization to initiate and potentially fund a project modeled on the Human Genome Project? That is, WMF or some other institution could host a large database of data that researchers can contribute to and that makes all the data available for researchers to analyze and build visualization tools against? Kinda like a wiki based on data from analyzing another wiki. ;) Such data might add a set of metadata on an article that could be used as a field in a hypercube, for example.

Beyond just hosting the database, it would be possible to write tools that check aspects of these data before commit to make sure that they are consistent with the other data that has already been committed. Large data set leave very unique signatures in aggregate. For example, we use checksums all the time to verify that data has been corrupted during transmissions.

I imagine this isn't the first time someone has thrown something like this in to the Wikipedosphere. If so, what did people think? If not, what do you guys think? :)

,Wil

Nathan

10:17 p.m.

On Wed, May 7, 2014 at 10:10 PM, Wil Sinclair wllm@wllm.com wrote:

...

I imagine this isn't the first time someone has thrown something like this in to the Wikipedosphere. If so, what did people think? If not, what do you guys think? :)

,Wil

I think it sounds a little bit like wikidata.org, with some innovation of potential future applications.

Wil Sinclair

8 May 8 May

3:14 a.m.

I *am* a fan of structured, semantic databases like wikidata, but that's not what I'm talking about.

In highly structured databases, adding properties that may be useful for your research and the work of others would require altering the structure itself, like adding a field, for example. That isn't easy, because the powers that be have to first agree that it is appropriate, worthy, and fits correctly in to the ontology. If there is a type hierarchy, then the sample set would probably have to conform to a sub-tree in the type hierarchy which may not correspond well to the sample set that the researcher is actually interested in.

I'm talking about the exact opposite, actually. Unstructured databases can be easily altered and indexed in much more flexible ways. The indices for these databases wouldn't normally be stored with the data itself; the researchers would get a data dump and create the indices needed for their own studies. Conventions would be enforced if the researchers wish to contribute anything back.

Most importantly, if I understand correctly, wikidata is a secondary database that doesn't correspond one-to-one with Wikipedia articles yet, and it's not clear to me whether it ever will. While it might be interesting to someone using the data collected in Wikipedia and imported in wikidata for semantic-oriented research like basic AI that would help computers win on Jeopardy, it wouldn't be interesting to someone studying Wikipedia itself.

,Wil

On Wed, May 7, 2014 at 7:17 PM, Nathan nawrich@gmail.com wrote:

...

On Wed, May 7, 2014 at 10:10 PM, Wil Sinclair wllm@wllm.com wrote:

...
I imagine this isn't the first time someone has thrown something like this in to the Wikipedosphere. If so, what did people think? If not, what do you guys think? :)

,Wil

I think it sounds a little bit like wikidata.org, with some innovation of potential future applications. _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

David Cuenca

5:49 a.m.

On Thu, May 8, 2014 at 9:14 AM, Wil Sinclair wllm@wllm.com wrote:

...

In highly structured databases, adding properties that may be useful for your research and the work of others would require altering the structure itself, like adding a field, for example. That isn't easy, because the powers that be have to first agree that it is appropriate, worthy, and fits correctly in to the ontology. If there is a type hierarchy, then the sample set would probably have to conform to a sub-tree in the type hierarchy which may not correspond well to the sample set that the researcher is actually interested in.

You can create your own instance of Wikibase and decide on the structure, fields, ontology, etc Then you can find the points of intersection with Wikidata concepts and link back to them if you feel like (not necessary, though). More info: https://www.mediawiki.org/wiki/Wikibase

...

I'm talking about the exact opposite, actually. Unstructured databases can be easily altered and indexed in much more flexible ways. The indices for these databases wouldn't normally be stored with the data itself; the researchers would get a data dump and create the indices needed for their own studies. Conventions would be enforced if the researchers wish to contribute anything back.

So basically like running an instance of CKAN? http://ckan.org/

...

Most importantly, if I understand correctly, wikidata is a secondary database that doesn't correspond one-to-one with Wikipedia articles yet, and it's not clear to me whether it ever will. While it might be interesting to someone using the data collected in Wikipedia and imported in wikidata for semantic-oriented research like basic AI that would help computers win on Jeopardy, it wouldn't be interesting to someone studying Wikipedia itself.

It might help to improve the data accuracy since it will be possible to update all uses of any parameter in any article at once. Some wikipedias use that data to generate text, I guess in those cases you could say that with quality data you will have quality text.

Cheers, Micru

phoebe ayers

7 May 7 May

7:22 p.m.

On Wed, May 7, 2014 at 3:14 PM, Andreas Kolbe jayen466@gmail.com wrote:

...

Anne, there are really well-established systems of scholarly peer review. There is no need to reinvent the wheel, or add distractions such as infoboxes and other bells and whistles.

And those peer review systems have lots and lots of problems as well as upsides. Lots of people *are* trying to reinvent peer review, including some very respected scientists.* As an academic science librarian, I can attest to there being widespread and currently ongoing debates about how to review scientific knowledge, whether traditional peer review is sufficient, and how to improve it. The current system for scientific research is often opaque, messy, prone to failure and doesn't always support innovation, and lots of smart people are thinking about it.

Erik: aha! I'd forgotten about those case studies, thanks!

-- phoebe

* http://blogs.berkeley.edu/2013/10/04/open-access-is-not-the-problem/

Andreas Kolbe

7:38 p.m.

On Thu, May 8, 2014 at 12:22 AM, phoebe ayers phoebe.wiki@gmail.com wrote:

...

On Wed, May 7, 2014 at 3:14 PM, Andreas Kolbe jayen466@gmail.com wrote:

...
Anne, there are really well-established systems of scholarly peer review. There is no need to reinvent the wheel, or add distractions such as infoboxes and other bells and whistles.

And those peer review systems have lots and lots of problems as well as upsides. Lots of people *are* trying to reinvent peer review, including some very respected scientists.* As an academic science librarian, I can attest to there being widespread and currently ongoing debates about how to review scientific knowledge, whether traditional peer review is sufficient, and how to improve it. The current system for scientific research is often opaque, messy, prone to failure and doesn't always support innovation, and lots of smart people are thinking about it.

Erik: aha! I'd forgotten about those case studies, thanks!

Given that the post that started this thread referenced medical content, are you telling me that you think it would be useless to have qualified medical experts reviewing Wikipedia's medical content, because the process would be "opaque, messy, prone to failure and doesn't always support innovation"?

Thyge

7:48 p.m.

mr Andreas Kolbe, I would like to tell you, that your mailings here strike me as being negative and unhelpful. If you have any suggestions for improvement, please put them forward, since this is an interesting topic. The "undisciplined crowd of random people" is what the world comprises, and a subset of those are trying their best to bring knowledge to the world and appreciate any help you may provide to improve and measure quality.

regards, Thyge

2014-05-08 1:38 GMT+02:00 Andreas Kolbe wrote

...

Given that the post that started this thread referenced medical content,

...

are you telling me that you think it would be useless to have qualified medical experts reviewing Wikipedia's medical content, because the process would be "opaque, messy, prone to failure and doesn't always support innovation"? _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Andreas Kolbe

8 p.m.

On Thu, May 8, 2014 at 12:48 AM, Thyge ltl.privat@gmail.com wrote:

...

mr Andreas Kolbe, I would like to tell you, that your mailings here strike me as being negative and unhelpful.

...

If you have any suggestions for improvement, please put them forward, since this is an interesting topic. The "undisciplined crowd of random people" is what the world comprises,

That's not all the world comprises. There are universities.

...

and a subset of those are trying their best to bring knowledge to the world and appreciate any help you may provide to improve and measure quality.

As for study design, I'd suggest you begin with a *random* sample of frequently-viewed Wikipedia articles in a given topic area (e.g. those within the purview of WikiProject Medicine), have them assessed by an independent panel of academic experts, and let them publish their results.

All of that is quite doable. You begin with a list of articles from the database, agree a method of random selection, and let experts do their job.

If the results are good, it redounds to Wikipedia's credit. If the results are bad, it provides valuable feedback to the community, an indication of Wikipedia's reliability to the public, an opportunity for further analysis both within and without the Wikipedia community, and an indication of where quality improvement efforts should be focused.

These are all outcomes that are fully in line with the Foundation's mission.

Thyge

8:06 p.m.

Maybe you should suggest that to the universities and not just to this mailing list. Nothing prevents to set up " an independent panel of academic experts" and to start doing that job today. regards, Thyge

2014-05-08 2:00 GMT+02:00 Andreas Kolbe jayen466@gmail.com:

...

On Thu, May 8, 2014 at 12:48 AM, Thyge ltl.privat@gmail.com wrote:

...
mr Andreas Kolbe, I would like to tell you, that your mailings here strike me as being negative and unhelpful.

:)

...
If you have any suggestions for improvement, please put them forward,

since

...
this is an interesting topic. The "undisciplined crowd of random people" is what the world comprises,

That's not all the world comprises. There are universities.

...
and a subset of those are trying their best to bring knowledge to the world

and

...
appreciate any help you may provide to improve and measure quality.

As for study design, I'd suggest you begin with a *random* sample of frequently-viewed Wikipedia articles in a given topic area (e.g. those within the purview of WikiProject Medicine), have them assessed by an independent panel of academic experts, and let them publish their results.

All of that is quite doable. You begin with a list of articles from the database, agree a method of random selection, and let experts do their job.

If the results are good, it redounds to Wikipedia's credit. If the results are bad, it provides valuable feedback to the community, an indication of Wikipedia's reliability to the public, an opportunity for further analysis both within and without the Wikipedia community, and an indication of where quality improvement efforts should be focused.

These are all outcomes that are fully in line with the Foundation's mission. _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Andreas Kolbe

8:12 p.m.

On Thu, May 8, 2014 at 1:06 AM, Thyge ltl.privat@gmail.com wrote:

...

Maybe you should suggest that to the universities and not just to this mailing list. Nothing prevents to set up " an independent panel of academic experts" and to start doing that job today. regards, Thyge

Well, I'd like the Foundation to invest in such research, which is why I brought it up here.

I cant think of several instances of donors' money being spent on things that to me seemed less supportive of the Foundation's core mission.

Nathan

8:17 p.m.

On Wed, May 7, 2014 at 8:12 PM, Andreas Kolbe jayen466@gmail.com wrote:

...

Well, I'd like the Foundation to invest in such research, which is why I brought it up here.

I cant think of several instances of donors' money being spent on things that to me seemed less supportive of the Foundation's core mission. _______________________________________________

Perhaps while the UK chapter pursues automated methods of assessment, another chapter can apply for a WMF grant to pursue a more traditional review effort. Maybe Wikimedia DC? I don't think this kind of research is really the WMF's purview; for reasons everyone is familiar with, it's important they remain distant from reviewing and managing content.

Wil Sinclair

8:24 p.m.

I'm a total newb here, and I know the grant system between WMF and the different chapters has been debated in the past. But I have a simple question: if WMF is funding these efforts through grants and the grant money is used to review and/or manage content, wouldn't it be indirectly getting involved with reviewing and managing content?

,Wil

On Wed, May 7, 2014 at 5:17 PM, Nathan nawrich@gmail.com wrote:

...

On Wed, May 7, 2014 at 8:12 PM, Andreas Kolbe jayen466@gmail.com wrote:

...
Well, I'd like the Foundation to invest in such research, which is why I brought it up here.

I cant think of several instances of donors' money being spent on things that to me seemed less supportive of the Foundation's core mission. _______________________________________________

Perhaps while the UK chapter pursues automated methods of assessment, another chapter can apply for a WMF grant to pursue a more traditional review effort. Maybe Wikimedia DC? I don't think this kind of research is really the WMF's purview; for reasons everyone is familiar with, it's important they remain distant from reviewing and managing content. _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Nathan

8:30 p.m.

On Wed, May 7, 2014 at 8:24 PM, Wil Sinclair wllm@wllm.com wrote:

...

I'm a total newb here, and I know the grant system between WMF and the different chapters has been debated in the past. But I have a simple question: if WMF is funding these efforts through grants and the grant money is used to review and/or manage content, wouldn't it be indirectly getting involved with reviewing and managing content?

,Wil

Depends on the nature of the grant. In any case I think affiliates are better placed to perform this kind of work anyway, since we'd want it to be done in more than one language and using diverse panels with members from more than just the U.S. But I do think it would be really cool research and the results would certainly be very interesting. It also makes sense as complementary to automated efforts, and then the results of the different methods could be compared to assess effectiveness of the review processes.

Andreas Kolbe

8:56 p.m.

On Thu, May 8, 2014 at 1:30 AM, Nathan nawrich@gmail.com wrote:

...

On Wed, May 7, 2014 at 8:24 PM, Wil Sinclair wllm@wllm.com wrote:

...
I'm a total newb here, and I know the grant system between WMF and the different chapters has been debated in the past. But I have a simple question: if WMF is funding these efforts through grants and the grant money is used to review and/or manage content, wouldn't it be indirectly getting involved with reviewing and managing content?

,Wil

Depends on the nature of the grant. In any case I think affiliates are better placed to perform this kind of work anyway, since we'd want it to be done in more than one language and using diverse panels with members from more than just the U.S. But I do think it would be really cool research and the results would certainly be very interesting. It also makes sense as complementary to automated efforts, and then the results of the different methods could be compared to assess effectiveness of the review processes.

I don't think this is an issue; as Erik has kindly pointed out in this thread, the Foundation has funded at least one such study in the past. (However, this study does not seem to have been based on a random sample – at least I cannot find any mention of the sample selection method in the study's write-up. The selection of a random sample is key to any such effort, and the method used to select the sample should be described in detail in any resulting report.)

To me, funding work that results in content quality feedback to the community does not mean that the Foundation is getting involved in content management. The expert panel would obviously have to have complete academic freedom to publish whatever their findings are, without pre-publication review by the Foundation. I would not expect the experts involved to end up editing Wikipedia; if any of them did, this would be their private initiative as individuals, and not covered by any grant.

I would consider such a research programme an important service to the community, just as the Board provides software, guidance through board resolutions, and so forth.

It would be an equally vital service to the reading public that the Foundation's projects serve.

In my view, any such programme of studies should begin with the English Wikipedia, as it is the most comprehensive and most widely accessed project, including by many non-native speakers looking for more detailed information than their own language version of Wikipedia provides. Medical content would be an excellent area to start with.

Risker

9:41 p.m.

On 7 May 2014 20:56, Andreas Kolbe jayen466@gmail.com wrote:

...

On Thu, May 8, 2014 at 1:30 AM, Nathan nawrich@gmail.com wrote:

...
On Wed, May 7, 2014 at 8:24 PM, Wil Sinclair wllm@wllm.com wrote:

...
I'm a total newb here, and I know the grant system between WMF and the different chapters has been debated in the past. But I have a simple question: if WMF is funding these efforts through grants and the grant money is used to review and/or manage content, wouldn't it be indirectly getting involved with reviewing and managing content?

,Wil

Depends on the nature of the grant. In any case I think affiliates are better placed to perform this kind of work anyway, since we'd want it to

be

...
done in more than one language and using diverse panels with members from more than just the U.S. But I do think it would be really cool research

and

...
the results would certainly be very interesting. It also makes sense as complementary to automated efforts, and then the results of the different methods could be compared to assess effectiveness of the review

processes.

...
I don't think this is an issue; as Erik has kindly pointed out in this thread, the Foundation has funded at least one such study in the past. (However, this study does not seem to have been based on a random sample – at least I cannot find any mention of the sample selection method in the study's write-up. The selection of a random sample is key to any such effort, and the method used to select the sample should be described in detail in any resulting report.)

To me, funding work that results in content quality feedback to the community does not mean that the Foundation is getting involved in content management. The expert panel would obviously have to have complete academic freedom to publish whatever their findings are, without pre-publication review by the Foundation. I would not expect the experts involved to end up editing Wikipedia; if any of them did, this would be their private initiative as individuals, and not covered by any grant.

I would consider such a research programme an important service to the community, just as the Board provides software, guidance through board resolutions, and so forth.

It would be an equally vital service to the reading public that the Foundation's projects serve.

In my view, any such programme of studies should begin with the English Wikipedia, as it is the most comprehensive and most widely accessed project, including by many non-native speakers looking for more detailed information than their own language version of Wikipedia provides. Medical content would be an excellent area to start with.

I think perhaps there is a lack of research into the extent of research already being done by independent, qualified third parties. Several examples are provided in the references of the study you posted, Andreas. For example, this one in the Journal of Oncology Practice[1] compares specific Wikipedia articles for patient-oriented cancer information against the professionally edited PDQ database. It appears that the two were comparable in most areas, except for readability, where the PDQ database was considered significantly more readable. Now, again, this is a small study and it has not been reproduced; however, it took me minutes to find more information on the very subject you're interested in, created by completely independent bodies who have "no pony in the race". There did seem to be a fair number of studies related to medical topics. Now if only we could learn from them - especially on the readability point, which I think really is a very serious issue. Wikipedia isn't really intended to educate physicians about medical topics, it's intended to be a general reference for non-specialists.

Very few people are going to make life-and-death decisions based on our math or physics topic areas, but I'll lay odds that any study would find a significant readability issue with both of them, as well.

Risker/Anne

[1] http://jop.ascopubs.org/content/7/5/319.full

Andreas Kolbe

10:24 p.m.

On Thu, May 8, 2014 at 2:41 AM, Risker risker.wp@gmail.com wrote:

...

I think perhaps there is a lack of research into the extent of research already being done by independent, qualified third parties. Several examples are provided in the references of the study you posted, Andreas. For example, this one in the Journal of Oncology Practice[1] compares specific Wikipedia articles for patient-oriented cancer information against the professionally edited PDQ database. It appears that the two were comparable in most areas, except for readability, where the PDQ database was considered significantly more readable. Now, again, this is a small study and it has not been reproduced; however, it took me minutes to find more information on the very subject you're interested in, created by completely independent bodies who have "no pony in the race". There did seem to be a fair number of studies related to medical topics. Now if only we could learn from them - especially on the readability point, which I think really is a very serious issue. Wikipedia isn't really intended to educate physicians about medical topics, it's intended to be a general reference for non-specialists.

Very few people are going to make life-and-death decisions based on our math or physics topic areas, but I'll lay odds that any study would find a significant readability issue with both of them, as well.

Risker/Anne

[1] http://jop.ascopubs.org/content/7/5/319.full

In the study you reference, Anne, reviewers spent all of 18 minutes on each article. The readability analysis was done by automation.

I did review the link Phoebe posted earlier;

https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Medicine/Wikipedia_and_m...

I found the number of recent studies assessing actual Wikipedia content in this field there very scanty. The best seemed to be this February 2014 study in the European Journal of Gastroenterology & Hepatology:

http://www.ncbi.nlm.nih.gov/pubmed/24276492

The conclusion of that study, based on a review of 39 articles by three assessors, was that

—o0o—

"Wikipedia is not a reliable source of information for medical students searching for gastroenterology and hepatology articles. Several limitations, deficiencies, and scientific errors have been identified in the articles examined."

—o0o—

There was also this study, concluding that Wikipedia was "fairly reliable":

http://onlinelibrary.wiley.com/doi/10.1111/sdi.12059/full

But they say their reliability assessment was based on a simple count of references, without reviewing the actual article content for accuracy!

—o0o—

Assessment of Reliability

The reliability of nephrology articles in Wikipedia was determined in two ways: (i) mean number of references per article, and (ii) mean percentage of “substantiated” references—which we defined as references corresponding to works published in peer-reviewed journals or from texts with an associated International Standard Book Number (ISBN). —o0o—

As far as I am aware from discussions with members of Wiki Project Med Foundation/WikiProject Medicine, all the studies that have been done to date suffer from small sample size or other methodological limitations. There is a real gap for a large-scale, well-designed study of a random subset of Wikipedia's most-consulted medical articles.*

If Wikipedia wants to be serious about its mission, measuring article quality is a must. Given how widely Wikipedia is used today, it is also a question of social responsibility.

I imagine the reluctance I sense in this discussion is in part due to people's fear that the results might be less than stellar. If so, that fear is misplaced. There is no improvement without performance feedback. If the results are indeed disappointing, the related publicity should lead to an increased focus on improvement efforts, and indeed may encourage a greater influx of better-qualified editors if such are needed.

The project is well into its second decade. It is mature and well-established enough for such a test. It's a question of taking a long-term view. Long-term improvement will be accelerated by honest, knowledgeable feedback.

Studies could be repeated at annual intervals, to track progress and measure improvement. I don't believe there is any other way of arriving at a reliable reference source, which after all is what this entire effort should be about.

* Some ongoing related discussions at WikiProject Medicine here:

https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Medicine#This_conve...

Risker

10:41 p.m.

On 7 May 2014 22:24, Andreas Kolbe jayen466@gmail.com wrote:

...

On Thu, May 8, 2014 at 2:41 AM, Risker risker.wp@gmail.com wrote:

...
I think perhaps there is a lack of research into the extent of research already being done by independent, qualified third parties. Several examples are provided in the references of the study you posted, Andreas. For example, this one in the Journal of Oncology Practice[1] compares specific Wikipedia articles for patient-oriented cancer information

against

...
the professionally edited PDQ database. It appears that the two were comparable in most areas, except for readability, where the PDQ database was considered significantly more readable. Now, again, this is a small study and it has not been reproduced; however, it took me minutes to find more information on the very subject you're interested in, created by completely independent bodies who have "no pony in the race". There did seem to be a fair number of studies related to medical topics. Now if

only

...
we could learn from them - especially on the readability point, which I think really is a very serious issue. Wikipedia isn't really intended to educate physicians about medical topics, it's intended to be a general reference for non-specialists.

Very few people are going to make life-and-death decisions based on our math or physics topic areas, but I'll lay odds that any study would find

a

...
significant readability issue with both of them, as well.

Risker/Anne

[1] http://jop.ascopubs.org/content/7/5/319.full

In the study you reference, Anne, reviewers spent all of 18 minutes on each article. The readability analysis was done by automation.

Yes, of course readability analysis is done by automation. I've yet to find a consistent readability assessment that doesn't use automation. It's not an area where subjectivity is particularly useful.

And that was an average of 18 minutes per article, i.e., 36 minutes: 18 minutes for the WP article and 18 minutes for the PDQ article. How long do you really think it should take? I read several of the articles in under 5 minutes on each site. Of course, the reviewers wouldn't need to look up the definitions of a lot of the terms that lay people would need to look at, because they were already professionally educated in the topic area, so that would significantly reduce the amount of time required to assess the article.

Andreas, you seem to have pre-determined that Wikipedia's medical articles are all terrible and riddled with errors. Realistically, they're amongst the most likely to receive professional editing and review - Wikiproject Medicine does a much better job than people are willing to credit them. The biggest weakness to the articles - and I've heard this from many people who read them - is that they're written at too high a level to be really accessible to lay people. I thought the point that the study made about the benefit of linking to an "English" dictionary definition of complex terms rather than to another highly technical Wikipedia article was a very good one, for example. We could learn from these studies.

Indeed, many science articles are mainly written by professionals in the field (I noted math and physics earlier, but chemistry and of course a large number of computer articles are also written by professionals). The biggest challenge for these subjects is to write them in an accessible way. Note, I said "science" - alternative medicine, history, geopolitical and "soft science" articles are much more problematic.

Risker/Anne

Andreas Kolbe

11:05 p.m.

On Thu, May 8, 2014 at 3:41 AM, Risker risker.wp@gmail.com wrote:

...

Yes, of course readability analysis is done by automation. I've yet to find a consistent readability assessment that doesn't use automation. It's not an area where subjectivity is particularly useful.

And that was an average of 18 minutes per article, i.e., 36 minutes: 18 minutes for the WP article and 18 minutes for the PDQ article. How long do you really think it should take? I read several of the articles in under 5 minutes on each site. Of course, the reviewers wouldn't need to look up the definitions of a lot of the terms that lay people would need to look at, because they were already professionally educated in the topic area, so that would significantly reduce the amount of time required to assess the article.

It took me more than 18 minutes to write the last e-mail in this thread. :)

The lung cancer article, for example, which was among those reviewed, has well over 4,500 words of prose, and cites 141 references. That's a reviewing speed of 250 words per minute. I don't know if you have ever done an FA review ...

...

Andreas, you seem to have pre-determined that Wikipedia's medical articles are all terrible and riddled with errors.

And I think you are being needlessly defensive. I have an open mind as to what the results might be. What I am sure of is that neither you nor I nor the Foundation really know how reliable they are. Why not make an effort to find out?

...

Realistically, they're amongst the most likely to receive professional editing and review - Wikiproject Medicine does a much better job than people are willing to credit them.

Yes, and many editors there are sorely concerned about the quality of medical information Wikipedia provides to the public.

Incidentally, there was a discussion of the JAOA study in The Atlantic today:

http://www.theatlantic.com/health/archive/2014/05/can-wikipedia-ever-be-a-de...

A member of WikiProject Medicine is quoted in it, as is the study's author.

—o0o—

So both sides acknowledge: There are errors in Wikipedia’s health articles. And that’s a problem, because people use them.

—o0o—

...

The biggest weakness to the articles - and I've heard this from many people who read them - is that they're written at too high a level to be really accessible to lay people. I thought the point that the study made about the benefit of linking to an "English" dictionary definition of complex terms rather than to another highly technical Wikipedia article was a very good one, for example. We could learn from these studies.

Indeed, many science articles are mainly written by professionals in the field (I noted math and physics earlier, but chemistry and of course a large number of computer articles are also written by professionals). The biggest challenge for these subjects is to write them in an accessible way. Note, I said "science" - alternative medicine, history, geopolitical and "soft science" articles are much more problematic.

Risker/Anne _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Anthony Cole

8 May 8 May

1:13 a.m.

Wow.

Wil - you're going to love WikiData.

Phoebe: I have seen that list of peer-reviewed articles related to Wikipedia medical content. I've extracted those related to quality and added more from a couple of database searches I did in January and the list of 42 (some are letters and there's a conference abstract, though) are collapsed on the WikiProject Medicine talk page now under the heading, "This thread is notable."

I've read most but not all of those and, as Andreas mentioned, most of those suffered from small sample size and poor or opaque sample selection criteria.

Erik, thank you for pointing to the "reviewer" trial. I had read it before and I'm glad to have this opportunity to tell you how much I love it. There is a big hole in Wikipedia where expert reviewing belongs.

I'm presently on the board of WikiProject Med Foundation, but will be stepping down after Wikimania. I mostly edit medical content. Anne is right, it is heavily curated. But stuff slips through the net of patrollers from time to time, and barely a day goes by without some howler of a long-term problem coming to light.

I would like to know - know, rather than rely on my gut feeling - how accurate our medical content is. To know that, I think the first step would be to get an expert on scientific study design to review the 30-40 existing studies that address the quality of our medical content, and tell us what, if anything, we can take from that prior work - essentially what Anne recommends above, but rather than making my own incompetent and heavily biased assessment, get an expert to do it.

My own, inexpert, belief is that those studies are (mostly) so hopelessly flawed that nothing can seriously be generalised from them. If I'm right, I'd then like us all to consider seriously doing a survey whose design is sufficiently rigorous to give us an answer.

Thanks for your thoughts and attention everyone.

Anthony Cole http://en.wikipedia.org/wiki/User_talk:Anthonyhcole

On Thu, May 8, 2014 at 11:05 AM, Andreas Kolbe jayen466@gmail.com wrote:

...

On Thu, May 8, 2014 at 3:41 AM, Risker risker.wp@gmail.com wrote:

...
Yes, of course readability analysis is done by automation. I've yet to find a consistent readability assessment that doesn't use automation.

It's

...
not an area where subjectivity is particularly useful.

And that was an average of 18 minutes per article, i.e., 36 minutes: 18 minutes for the WP article and 18 minutes for the PDQ article. How long

do

...
you really think it should take? I read several of the articles in

under 5

...
minutes on each site. Of course, the reviewers wouldn't need to look up the definitions of a lot of the terms that lay people would need to look at, because they were already professionally educated in the topic area,

so

...
that would significantly reduce the amount of time required to assess the article.

It took me more than 18 minutes to write the last e-mail in this thread. :)

The lung cancer article, for example, which was among those reviewed, has well over 4,500 words of prose, and cites 141 references. That's a reviewing speed of 250 words per minute. I don't know if you have ever done an FA review ...

...
Andreas, you seem to have pre-determined that Wikipedia's medical

articles

...
are all terrible and riddled with errors.

And I think you are being needlessly defensive. I have an open mind as to what the results might be. What I am sure of is that neither you nor I nor the Foundation really know how reliable they are. Why not make an effort to find out?

...
Realistically, they're amongst the most likely to receive professional editing and review - Wikiproject Medicine does a much better job than people are willing to credit them.

Yes, and many editors there are sorely concerned about the quality of medical information Wikipedia provides to the public.

Incidentally, there was a discussion of the JAOA study in The Atlantic today:

http://www.theatlantic.com/health/archive/2014/05/can-wikipedia-ever-be-a-de...

A member of WikiProject Medicine is quoted in it, as is the study's author.

—o0o—

So both sides acknowledge: There are errors in Wikipedia’s health articles. And that’s a problem, because people use them.

—o0o—

...
The biggest weakness to the articles - and I've heard this from many

people

...
who read them - is that they're written at too high a level to be really accessible to lay people. I thought the point that the study made about the benefit of linking to an "English" dictionary definition of complex terms rather than to another highly technical Wikipedia article was a

very

...
good one, for example. We could learn from these studies.

Indeed, many science articles are mainly written by professionals in the field (I noted math and physics earlier, but chemistry and of course a large number of computer articles are also written by professionals).

The

...
biggest challenge for these subjects is to write them in an accessible

way.

...
Note, I said "science" - alternative medicine, history, geopolitical and "soft science" articles are much more problematic.

Risker/Anne _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Risker

1:44 a.m.

In answer to the question of the WMF funding research: https://meta.wikimedia.org/wiki/Research:FAQ

Risker/Anne

On 8 May 2014 01:13, Anthony Cole ahcoleecu@gmail.com wrote:

...

Wow.

Wil - you're going to love WikiData.

Phoebe: I have seen that list of peer-reviewed articles related to Wikipedia medical content. I've extracted those related to quality and added more from a couple of database searches I did in January and the list of 42 (some are letters and there's a conference abstract, though) are collapsed on the WikiProject Medicine talk page now under the heading, "This thread is notable."

I've read most but not all of those and, as Andreas mentioned, most of those suffered from small sample size and poor or opaque sample selection criteria.

Erik, thank you for pointing to the "reviewer" trial. I had read it before and I'm glad to have this opportunity to tell you how much I love it. There is a big hole in Wikipedia where expert reviewing belongs.

I'm presently on the board of WikiProject Med Foundation, but will be stepping down after Wikimania. I mostly edit medical content. Anne is right, it is heavily curated. But stuff slips through the net of patrollers from time to time, and barely a day goes by without some howler of a long-term problem coming to light.

I would like to know - know, rather than rely on my gut feeling - how accurate our medical content is. To know that, I think the first step would be to get an expert on scientific study design to review the 30-40 existing studies that address the quality of our medical content, and tell us what, if anything, we can take from that prior work - essentially what Anne recommends above, but rather than making my own incompetent and heavily biased assessment, get an expert to do it.

My own, inexpert, belief is that those studies are (mostly) so hopelessly flawed that nothing can seriously be generalised from them. If I'm right, I'd then like us all to consider seriously doing a survey whose design is sufficiently rigorous to give us an answer.

Thanks for your thoughts and attention everyone.

Anthony Cole http://en.wikipedia.org/wiki/User_talk:Anthonyhcole

On Thu, May 8, 2014 at 11:05 AM, Andreas Kolbe jayen466@gmail.com wrote:

...
On Thu, May 8, 2014 at 3:41 AM, Risker risker.wp@gmail.com wrote:

...
Yes, of course readability analysis is done by automation. I've yet to find a consistent readability assessment that doesn't use automation.

It's

...
not an area where subjectivity is particularly useful.

And that was an average of 18 minutes per article, i.e., 36 minutes: 18 minutes for the WP article and 18 minutes for the PDQ article. How

long

...
do

...
you really think it should take? I read several of the articles in

under 5

...
minutes on each site. Of course, the reviewers wouldn't need to look

up

...
...
the definitions of a lot of the terms that lay people would need to

look

...
...
at, because they were already professionally educated in the topic

area,

...
so

...
that would significantly reduce the amount of time required to assess

the

...
...
article.

It took me more than 18 minutes to write the last e-mail in this thread.

:)

...
The lung cancer article, for example, which was among those reviewed, has well over 4,500 words of prose, and cites 141 references. That's a reviewing speed of 250 words per minute. I don't know if you have ever

done

...
an FA review ...

...
Andreas, you seem to have pre-determined that Wikipedia's medical

articles

...
are all terrible and riddled with errors.

And I think you are being needlessly defensive. I have an open mind as to what the results might be. What I am sure of is that neither you nor I

nor

...
the Foundation really know how reliable they are. Why not make an effort

to

...
find out?

...
Realistically, they're amongst the most likely to receive professional editing and review -

Wikiproject

...
...
Medicine does a much better job than people are willing to credit them.

Yes, and many editors there are sorely concerned about the quality of medical information Wikipedia provides to the public.

Incidentally, there was a discussion of the JAOA study in The Atlantic today:

http://www.theatlantic.com/health/archive/2014/05/can-wikipedia-ever-be-a-de...

...
A member of WikiProject Medicine is quoted in it, as is the study's

author.

...
—o0o—

So both sides acknowledge: There are errors in Wikipedia’s health

articles.

...
And that’s a problem, because people use them.

—o0o—

...
The biggest weakness to the articles - and I've heard this from many

people

...
who read them - is that they're written at too high a level to be

really

...
...
accessible to lay people. I thought the point that the study made

about

...
...
the benefit of linking to an "English" dictionary definition of complex terms rather than to another highly technical Wikipedia article was a

very

...
good one, for example. We could learn from these studies.

Indeed, many science articles are mainly written by professionals in

the

...
...
field (I noted math and physics earlier, but chemistry and of course a large number of computer articles are also written by professionals).

The

...
biggest challenge for these subjects is to write them in an accessible

way.

...
Note, I said "science" - alternative medicine, history, geopolitical

and

...
...
"soft science" articles are much more problematic.

Risker/Anne _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Anthony Cole

2:41 a.m.

Thanks Anne.

Anthony Cole http://en.wikipedia.org/wiki/User_talk:Anthonyhcole

On Thu, May 8, 2014 at 1:44 PM, Risker risker.wp@gmail.com wrote:

...

In answer to the question of the WMF funding research: https://meta.wikimedia.org/wiki/Research:FAQ

Risker/Anne

On 8 May 2014 01:13, Anthony Cole ahcoleecu@gmail.com wrote:

...
Wow.

Wil - you're going to love WikiData.

Phoebe: I have seen that list of peer-reviewed articles related to Wikipedia medical content. I've extracted those related to quality and added more from a couple of database searches I did in January and the

list

...
of 42 (some are letters and there's a conference abstract, though) are collapsed on the WikiProject Medicine talk page now under the heading, "This thread is notable."

I've read most but not all of those and, as Andreas mentioned, most of those suffered from small sample size and poor or opaque sample selection criteria.

Erik, thank you for pointing to the "reviewer" trial. I had read it

before

...
and I'm glad to have this opportunity to tell you how much I love it.

There

...
is a big hole in Wikipedia where expert reviewing belongs.

I'm presently on the board of WikiProject Med Foundation, but will be stepping down after Wikimania. I mostly edit medical content. Anne is right, it is heavily curated. But stuff slips through the net of

patrollers

...
from time to time, and barely a day goes by without some howler of a long-term problem coming to light.

I would like to know - know, rather than rely on my gut feeling - how accurate our medical content is. To know that, I think the first step

would

...
be to get an expert on scientific study design to review the 30-40

existing

...
studies that address the quality of our medical content, and tell us

what,

...
if anything, we can take from that prior work - essentially what Anne recommends above, but rather than making my own incompetent and heavily biased assessment, get an expert to do it.

My own, inexpert, belief is that those studies are (mostly) so hopelessly flawed that nothing can seriously be generalised from them. If I'm right, I'd then like us all to consider seriously doing a survey whose design is sufficiently rigorous to give us an answer.

Thanks for your thoughts and attention everyone.

Anthony Cole http://en.wikipedia.org/wiki/User_talk:Anthonyhcole

On Thu, May 8, 2014 at 11:05 AM, Andreas Kolbe jayen466@gmail.com

wrote:

...
...
On Thu, May 8, 2014 at 3:41 AM, Risker risker.wp@gmail.com wrote:

...
Yes, of course readability analysis is done by automation. I've yet

to

...
...
...
find a consistent readability assessment that doesn't use automation.

It's

...
not an area where subjectivity is particularly useful.

And that was an average of 18 minutes per article, i.e., 36 minutes:

18

...
...
...
minutes for the WP article and 18 minutes for the PDQ article. How

long

...
do

...
you really think it should take? I read several of the articles in

under 5

...
minutes on each site. Of course, the reviewers wouldn't need to look

up

...
...
the definitions of a lot of the terms that lay people would need to

look

...
...
at, because they were already professionally educated in the topic

area,

...
so

...
that would significantly reduce the amount of time required to assess

the

...
...
article.

It took me more than 18 minutes to write the last e-mail in this

thread.

...
:)

...
The lung cancer article, for example, which was among those reviewed,

has

...
...
well over 4,500 words of prose, and cites 141 references. That's a reviewing speed of 250 words per minute. I don't know if you have ever

done

...
an FA review ...

...
Andreas, you seem to have pre-determined that Wikipedia's medical

articles

...
are all terrible and riddled with errors.

And I think you are being needlessly defensive. I have an open mind as

to

...
...
what the results might be. What I am sure of is that neither you nor I

nor

...
the Foundation really know how reliable they are. Why not make an

effort

...
to

...
find out?

...
Realistically, they're amongst the most likely to receive professional editing and review -

Wikiproject

...
...
Medicine does a much better job than people are willing to credit

them.

...
...
...
Yes, and many editors there are sorely concerned about the quality of medical information Wikipedia provides to the public.

Incidentally, there was a discussion of the JAOA study in The Atlantic today:

http://www.theatlantic.com/health/archive/2014/05/can-wikipedia-ever-be-a-de...

...
...
A member of WikiProject Medicine is quoted in it, as is the study's

author.

...
—o0o—

So both sides acknowledge: There are errors in Wikipedia’s health

articles.

...
And that’s a problem, because people use them.

—o0o—

...
The biggest weakness to the articles - and I've heard this from many

people

...
who read them - is that they're written at too high a level to be

really

...
...
accessible to lay people. I thought the point that the study made

about

...
...
the benefit of linking to an "English" dictionary definition of

complex

...
...
...
terms rather than to another highly technical Wikipedia article was a

very

...
good one, for example. We could learn from these studies.

Indeed, many science articles are mainly written by professionals in

the

...
...
field (I noted math and physics earlier, but chemistry and of course

a

...
...
...
large number of computer articles are also written by professionals).

The

...
biggest challenge for these subjects is to write them in an

accessible

...
...
way.

...
Note, I said "science" - alternative medicine, history, geopolitical

and

...
...
"soft science" articles are much more problematic.

Risker/Anne _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe:

https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,

...
...
...
mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

rupert THURNER

3:02 a.m.

...

...
Andreas, you seem to have pre-determined that Wikipedia's medical

articles

...

...
are all terrible and riddled with errors.

And I think you are being needlessly defensive. I have an open mind as to what the results might be. What I am sure of is that neither you nor I nor the Foundation really know how reliable they are. Why not make an effort

...

find out?

Anybody interested can do it. Now. Anybody interested can improve it. Now. Why it does not happen? It happened for other domains as well.

In my experience there is only one single measure to improve quality: point out the single error which cam be corrected. If you can propose a system, either human or automatic, to do this, feel free.

What imo is the bigger problem: many medical articles are written in a language a mortal cannot understand any more.

...

...
Realistically, they're amongst the most likely to receive professional editing and review - Wikiproject Medicine does a much better job than people are willing to credit them.

Yes, and many editors there are sorely concerned about the quality of medical information Wikipedia provides to the public.

This is the core value of wikipedia since its beginnings: provide a big enough gap to fill.

...

Incidentally, there was a discussion of the JAOA study in The Atlantic today:

http://www.theatlantic.com/health/archive/2014/05/can-wikipedia-ever-be-a-de...

...

A member of WikiProject Medicine is quoted in it, as is the study's

author.

...

—o0o—

So both sides acknowledge: There are errors in Wikipedia’s health

articles.

...

And that’s a problem, because people use them.

Internet literacy includes learning beeing sceptical on what you read i guess .... Wikipedia is not Jesus and never will be, in no domain :)

Rupert

Anthony Cole

3:17 a.m.

Regarding expert review, Doc James has just announced that a version of Wikipedia's article "Dengue fever" has passed peer review and been accepted for publication by the journal Open Medicine. I think this is a special moment.

https://en.wikipedia.org/wiki/Wikipedia_talk:MED#This_conversation_is_notabl...

Anthony Cole http://en.wikipedia.org/wiki/User_talk:Anthonyhcole

On Thu, May 8, 2014 at 3:02 PM, rupert THURNER rupert.thurner@gmail.comwrote:

...

...
...
Andreas, you seem to have pre-determined that Wikipedia's medical

articles

...
...
are all terrible and riddled with errors.

And I think you are being needlessly defensive. I have an open mind as to what the results might be. What I am sure of is that neither you nor I

nor

...
the Foundation really know how reliable they are. Why not make an effort

to

...
find out?

Anybody interested can do it. Now. Anybody interested can improve it. Now. Why it does not happen? It happened for other domains as well.

In my experience there is only one single measure to improve quality: point out the single error which cam be corrected. If you can propose a system, either human or automatic, to do this, feel free.

What imo is the bigger problem: many medical articles are written in a language a mortal cannot understand any more.

...
...
Realistically, they're amongst the most likely to receive professional editing and review -

Wikiproject

...
...
Medicine does a much better job than people are willing to credit them.

Yes, and many editors there are sorely concerned about the quality of medical information Wikipedia provides to the public.

This is the core value of wikipedia since its beginnings: provide a big enough gap to fill.

...
Incidentally, there was a discussion of the JAOA study in The Atlantic today:

http://www.theatlantic.com/health/archive/2014/05/can-wikipedia-ever-be-a-de...

...
A member of WikiProject Medicine is quoted in it, as is the study's

author.

...
—o0o—

So both sides acknowledge: There are errors in Wikipedia’s health

articles.

...
And that’s a problem, because people use them.

Internet literacy includes learning beeing sceptical on what you read i guess .... Wikipedia is not Jesus and never will be, in no domain :)

Rupert _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Andreas Kolbe

6 a.m.

On Thu, May 8, 2014 at 8:17 AM, Anthony Cole ahcoleecu@gmail.com wrote:

...

Regarding expert review, Doc James has just announced that a version of Wikipedia's article "Dengue fever" has passed peer review and been accepted for publication by the journal Open Medicine. I think this is a special moment.

https://en.wikipedia.org/wiki/Wikipedia_talk:MED#This_conversation_is_notabl...

Anthony Cole http://en.wikipedia.org/wiki/User_talk:Anthonyhcole

The article will apparently be listed on PubMed. That's indeed an achievement to be proud of. Well done!

There was a discussion earlier in this thread about the likely quality of Wikipedia's medical articles, and the curatorial work of WikiProject Medicine.

I note that in the same post in which Doc James announced this on en:WP, he also said:

---o0o---

How good is our content? Having looked at much of it I have an okay idea. We have about 100-200 high or excellent quality medical articles. We have about 20,000 that are short and just starting out. We have a couple thousand that are okay ish. We have another few hundred to maybe few thousand or so that are a complete disaster. So in summary article quality is variable with a randomly selected article likely to be of moderate to low quality.

---o0o---

Given his qualifications and his longstanding work in WikiProject Medicine, James' guess is probably better than most. But it's not something you could cite.

...

On Thu, May 8, 2014 at 3:02 PM, rupert THURNER <rupert.thurner@gmail.com

...
wrote:

...
...
...
Andreas, you seem to have pre-determined that Wikipedia's medical

articles

...
...
are all terrible and riddled with errors.

And I think you are being needlessly defensive. I have an open mind as

to

...
...
what the results might be. What I am sure of is that neither you nor I

nor

...
the Foundation really know how reliable they are. Why not make an

effort

...
to

...
find out?

Anybody interested can do it. Now. Anybody interested can improve it.

Now.

...
Why it does not happen? It happened for other domains as well.

In my experience there is only one single measure to improve quality:

point

...
out the single error which cam be corrected. If you can propose a system, either human or automatic, to do this, feel free.

What imo is the bigger problem: many medical articles are written in a language a mortal cannot understand any more.

...
...
Realistically, they're amongst the most likely to receive professional editing and review -

Wikiproject

...
...
Medicine does a much better job than people are willing to credit

them.

...
...
...
Yes, and many editors there are sorely concerned about the quality of medical information Wikipedia provides to the public.

This is the core value of wikipedia since its beginnings: provide a big enough gap to fill.

...
Incidentally, there was a discussion of the JAOA study in The Atlantic today:

http://www.theatlantic.com/health/archive/2014/05/can-wikipedia-ever-be-a-de...

...
...
A member of WikiProject Medicine is quoted in it, as is the study's

author.

...
—o0o—

So both sides acknowledge: There are errors in Wikipedia’s health

articles.

...
And that’s a problem, because people use them.

Internet literacy includes learning beeing sceptical on what you read i guess .... Wikipedia is not Jesus and never will be, in no domain :)

Rupert _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Andrew Gray

5:29 p.m.

On 8 May 2014 01:56, Andreas Kolbe jayen466@gmail.com wrote:

...

(However, this study does not seem to have been based on a random sample – at least I cannot find any mention of the sample selection method in the study's write-up. The selection of a random sample is key to any such effort, and the method used to select the sample should be described in detail in any resulting report.)

https://meta.wikimedia.org/wiki/File:EPIC_Oxford_report.pdf

Section 3.3 of the report covers article selection. They went about it backwards (at least, backwards to the way you might expect) - recruiting reviewers and then manually identifying relevant articles, as the original goal was to use relevant topics for individual specialists.

Even this selective method didn't work as well as might be hoped, because the mechanism of the study required a minimum level of content - the articles had to be substantial enough to be useful for a comparison, and of sufficient length and comparable scope in both sets of sources - which ruled out many of the initial selections.

(This is a key point to remember: the study effectively assesses the quality of a subset of "developed" articles in Wikipedia, rather than the presumably less-good fragmentary ones. It's a valid question to ask, but not always the one people think it's answering...)

"Thus the selection of articles was constrained by two important factors: one, the need to find topics appropriate for the academics whom we were able to recruit to the project; secondly, that articles from different online encyclopaedias were of comparable substance and focus. (Such factors would need to be taken carefully into account when embarking on a future large-scale study, where the demands of finding large numbers of comparable articles are likely to be considerable.)"

You'd need to adopt a fairly different methodology if you wanted a random sampling; I suppose you could prefilter a sample by "likely to be suitable" metrics (eg minimum size, article title matching a title list from the other reference works) and randomly select from within *those*, but of course you would still have the fundamental issue that you're essentially reviewing a selected portion of the project.

-- - Andrew Gray andrew.gray@dunelm.org.uk

Asaf Bartov

5:40 p.m.

I will just add that I agree [content] Quality is a strategic goal we have made little systematic progress on (much progress was made in sheer coverage, of course, e.g. via funding and support for content-centered initiatives such as writing and photo competitions).

In the Grantmaking department, we are definitely interested in ideas and proposals to improve (and measure) quality of content on the projects, and not just of outside assessors, but also internal approaches to assessing quality (serving different purposes, to be sure).

I encourage those with concrete ideas to propose something, perhaps in the IdeaLab[1], for further development and refinement.

Asaf

[1] https://meta.wikimedia.org/wiki/Grants:IdeaLab

-- Asaf Bartov Wikimedia Foundation http://www.wikimediafoundation.org Imagine a world in which every single human being can freely share in the sum of all knowledge. Help us make it a reality! https://donate.wikimedia.org

edward

5:43 p.m.

On 08/05/2014 22:29, Andrew Gray wrote:

...

...
Section 3.3 of the report covers article selection. They went about it

backwards (at least, backwards to the way you might expect) - recruiting reviewers and then manually identifying relevant articles, as the original goal was to use relevant topics for individual specialists.

...

...
Even this selective method didn't work as well as might be hoped,

because the mechanism of the study required a minimum level of content - the articles had to be substantial enough to be useful for a comparison, and of sufficient length and comparable scope in both sets of sources - which ruled out many of the initial selections.

After it was published I emailed both the epic and the Oxford team to understand why they chose the articles they did. I was unable to get a satisfactory answer.

The method of selecting the most notable philosopher-theologians from a certain period is a good one. There is no reason it has to be random, so long as there is a clearly defined selection method. However, they were unable to explain why of the most notable subjects, they chose Aquinas and Anselm. I suspect there was a selection bias, as those were the articles which 'looked' the best. (The ones on Ockham and Scotus were so obviously vandalised that even a novice would have spotted the problem).

Even then, as I have already pointed out above, they missed the fact that the Anselm article was plagiarised from Britannica 1911, so that instead of comparing Britannica to Wikiepedia, they were comparing Britannica 2011 with Britannica 1911. And they missed some bad errors that had been introduced by Wikipedia editors when they attempted to modernise the old Britannica prose.

To give a simple example that even Geni will have to concede is not 'subjectively wrong', the Wikipedia article on Anselm said

"Anselm wrote many proofs within Monologion and Proslogion. In the first proof, Anselm relies on the ordinary grounds of realism, which coincide to some extent with the theory of Augustine."

This is a mangled version of the B1911 which reads

"This demonstration is the substance of the Monologion and Proslogion. In the first of these the proof rests on the ordinary grounds of realism"

You see what went wrong? 'first of these' should refer to the first book, namely Monologion. But one editor removed ""This demonstration is the substance of the Monologion and Proslogion" as being too difficult for ordinary readers, leaving 'first of these'. Another editor came along and thought it referred to the first proof. This is quite incorrect.

I am still amazed the Oxford team didn't spot this. Even if you don't know the article was lifted from B1911, the oddity of the assertion should have rung alarm bells. There are about 9 other mistakes of differing severity.

On 08/05/2014 22:29, Andrew Gray wrote:

...

On 8 May 2014 01:56, Andreas Kolbe jayen466@gmail.com wrote:

...
(However, this study does not seem to have been based on a random sample – at least I cannot find any mention of the sample selection method in the study's write-up. The selection of a random sample is key to any such effort, and the method used to select the sample should be described in detail in any resulting report.)

https://meta.wikimedia.org/wiki/File:EPIC_Oxford_report.pdf

Section 3.3 of the report covers article selection. They went about it backwards (at least, backwards to the way you might expect) - recruiting reviewers and then manually identifying relevant articles, as the original goal was to use relevant topics for individual specialists.

Even this selective method didn't work as well as might be hoped, because the mechanism of the study required a minimum level of content

the articles had to be substantial enough to be useful for a

comparison, and of sufficient length and comparable scope in both sets of sources - which ruled out many of the initial selections.

(This is a key point to remember: the study effectively assesses the quality of a subset of "developed" articles in Wikipedia, rather than the presumably less-good fragmentary ones. It's a valid question to ask, but not always the one people think it's answering...)

"Thus the selection of articles was constrained by two important factors: one, the need to find topics appropriate for the academics whom we were able to recruit to the project; secondly, that articles from different online encyclopaedias were of comparable substance and focus. (Such factors would need to be taken carefully into account when embarking on a future large-scale study, where the demands of finding large numbers of comparable articles are likely to be considerable.)"

You'd need to adopt a fairly different methodology if you wanted a random sampling; I suppose you could prefilter a sample by "likely to be suitable" metrics (eg minimum size, article title matching a title list from the other reference works) and randomly select from within *those*, but of course you would still have the fundamental issue that you're essentially reviewing a selected portion of the project.

Stevie Benton

9 May 9 May

6:17 a.m.

Hello everyone,

I think Wikimedia UK has an example project, related to medical articles, that may be of interest. John Byrne is the Wikimedian in Residence at Cancer Research UK, one of the UK's largest charities. He's put together the below message but isn't subscribed to this list so can't post. I am posting on his behalf. I'm happy to answer any questions about this and those I can't, I shall pass on to John.

Thanks and regards,

Stevie

John's message:

Cancer Research UK (CRUK), the world’s largest cancer research charity, have just taken me on as Wikipedian in Residence until mid-December 2014 (4/5 part time).

Parts of the plan for the role are very relevant to this thread. We are aiming to improve WP articles on cancer to ensure they are accurate, up-to-date and accessible to the full range of WP’s readership, working closely with the existing English WP medical editing community, many of whom have already been supportive of the project. With the medical translation project also underway, this is great timing for us to improve important content across large numbers of language versions.

We will be able to draw upon the expertise of both the medical research staff funded by CRUK (over 4000 in the UK) and the various kinds of staff they have with professional expertise in writing for a range of audiences, from patients to scientists ( their editorial policyhttp://www.cancerresearchuk.org/cancer-help/utilities/about-cancerhelp-uk/cancerhelp-uk-policies/editorial-policy/ ).

We are also planning to do research with the public into what they think of specific WP articles, perhaps before and after improvement, and into how they use WP and other sites at the top of search pages when looking for medical information on the internet. There has been little research into this area, and the results should be very useful in focusing the ways medical content generally can be improved.

The CRUK position is funded by the Wellcome Trust and supported by Wikimedia UK, and the budget includes an element for this research. I will be https://en.wikipedia.org/wiki/User:Wiki_CRUK_John in this role (Usually https://en.wikipedia.org/wiki/User:Johnbod. Until early July I will also continue my role (1/5) as Wikimedian in Residence at the Royal Society, the UK’s National Academy for the Sciences)

John Byrne

On 8 May 2014 22:43, edward edward@logicmuseum.com wrote:

...

On 08/05/2014 22:29, Andrew Gray wrote:

...
Section 3.3 of the report covers article selection. They went about it

...
backwards (at least, backwards to the way you might expect) -

recruiting reviewers and then manually identifying relevant articles, as the original goal was to use relevant topics for individual specialists.

Even this selective method didn't work as well as might be hoped,

...
...
because the mechanism of the study required a minimum level of content

the articles had to be substantial enough to be useful for a

comparison, and of sufficient length and comparable scope in both sets of sources - which ruled out many of the initial selections.

After it was published I emailed both the epic and the Oxford team to understand why they chose the articles they did. I was unable to get a satisfactory answer.

The method of selecting the most notable philosopher-theologians from a certain period is a good one. There is no reason it has to be random, so long as there is a clearly defined selection method. However, they were unable to explain why of the most notable subjects, they chose Aquinas and Anselm. I suspect there was a selection bias, as those were the articles which 'looked' the best. (The ones on Ockham and Scotus were so obviously vandalised that even a novice would have spotted the problem).

Even then, as I have already pointed out above, they missed the fact that the Anselm article was plagiarised from Britannica 1911, so that instead of comparing Britannica to Wikiepedia, they were comparing Britannica 2011 with Britannica 1911. And they missed some bad errors that had been introduced by Wikipedia editors when they attempted to modernise the old Britannica prose.

To give a simple example that even Geni will have to concede is not 'subjectively wrong', the Wikipedia article on Anselm said

"Anselm wrote many proofs within Monologion and Proslogion. In the first proof, Anselm relies on the ordinary grounds of realism, which coincide to some extent with the theory of Augustine."

This is a mangled version of the B1911 which reads

"This demonstration is the substance of the Monologion and Proslogion. In the first of these the proof rests on the ordinary grounds of realism"

You see what went wrong? 'first of these' should refer to the first book, namely Monologion. But one editor removed ""This demonstration is the substance of the Monologion and Proslogion" as being too difficult for ordinary readers, leaving 'first of these'. Another editor came along and thought it referred to the first proof. This is quite incorrect.

I am still amazed the Oxford team didn't spot this. Even if you don't know the article was lifted from B1911, the oddity of the assertion should have rung alarm bells. There are about 9 other mistakes of differing severity.

On 08/05/2014 22:29, Andrew Gray wrote:

...
On 8 May 2014 01:56, Andreas Kolbe jayen466@gmail.com wrote:

(However, this study does not seem to have been based on a random sample

...
– at least I cannot find any mention of the sample selection method in the study's write-up. The selection of a random sample is key to any such effort, and the method used to select the sample should be described in detail in any resulting report.)

https://meta.wikimedia.org/wiki/File:EPIC_Oxford_report.pdf

Section 3.3 of the report covers article selection. They went about it backwards (at least, backwards to the way you might expect) - recruiting reviewers and then manually identifying relevant articles, as the original goal was to use relevant topics for individual specialists.

Even this selective method didn't work as well as might be hoped, because the mechanism of the study required a minimum level of content

the articles had to be substantial enough to be useful for a

comparison, and of sufficient length and comparable scope in both sets of sources - which ruled out many of the initial selections.

(This is a key point to remember: the study effectively assesses the quality of a subset of "developed" articles in Wikipedia, rather than the presumably less-good fragmentary ones. It's a valid question to ask, but not always the one people think it's answering...)

"Thus the selection of articles was constrained by two important factors: one, the need to find topics appropriate for the academics whom we were able to recruit to the project; secondly, that articles from different online encyclopaedias were of comparable substance and focus. (Such factors would need to be taken carefully into account when embarking on a future large-scale study, where the demands of finding large numbers of comparable articles are likely to be considerable.)"

You'd need to adopt a fairly different methodology if you wanted a random sampling; I suppose you could prefilter a sample by "likely to be suitable" metrics (eg minimum size, article title matching a title list from the other reference works) and randomly select from within *those*, but of course you would still have the fundamental issue that you're essentially reviewing a selected portion of the project.

Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/ wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

-- Stevie Benton Head of External Relations Wikimedia UK +44 (0) 20 7065 0993 / +44 (0) 7803 505 173 @StevieBenton Wikimedia UK is a Company Limited by Guarantee registered in England and Wales, Registered No. 6741827. Registered Charity No.1144513. Registered Office 4th Floor, Development House, 56-64 Leonard Street, London EC2A 4LT. United Kingdom. Wikimedia UK is the UK chapter of a global Wikimedia movement. The Wikimedia projects are run by the Wikimedia Foundation (who operate Wikipedia, amongst other projects). *Wikimedia UK is an independent non-profit charity with no legal control over Wikipedia nor responsibility for its contents.*

Andreas Kolbe

9:48 a.m.

On Fri, May 9, 2014 at 11:17 AM, Stevie Benton < stevie.benton@wikimedia.org.uk> wrote:

...

Hello everyone,

I think Wikimedia UK has an example project, related to medical articles, that may be of interest. John Byrne is the Wikimedian in Residence at Cancer Research UK, one of the UK's largest charities. He's put together the below message but isn't subscribed to this list so can't post. I am posting on his behalf. I'm happy to answer any questions about this and those I can't, I shall pass on to John.

Thanks and regards,

Stevie

John's message:

Cancer Research UK (CRUK), the world’s largest cancer research charity, have just taken me on as Wikipedian in Residence until mid-December 2014 (4/5 part time).

Parts of the plan for the role are very relevant to this thread. We are aiming to improve WP articles on cancer to ensure they are accurate, up-to-date and accessible to the full range of WP’s readership, working closely with the existing English WP medical editing community, many of whom have already been supportive of the project. With the medical translation project also underway, this is great timing for us to improve important content across large numbers of language versions.

We will be able to draw upon the expertise of both the medical research staff funded by CRUK (over 4000 in the UK) and the various kinds of staff they have with professional expertise in writing for a range of audiences, from patients to scientists ( their editorial policy< http://www.cancerresearchuk.org/cancer-help/utilities/about-cancerhelp-uk/ca...

...
).

We are also planning to do research with the public into what they think of specific WP articles, perhaps before and after improvement, and into how they use WP and other sites at the top of search pages when looking for medical information on the internet. There has been little research into this area, and the results should be very useful in focusing the ways medical content generally can be improved.

The CRUK position is funded by the Wellcome Trust and supported by Wikimedia UK, and the budget includes an element for this research. I will be https://en.wikipedia.org/wiki/User:Wiki_CRUK_John in this role (Usually https://en.wikipedia.org/wiki/User:Johnbod. Until early July I will also continue my role (1/5) as Wikimedian in Residence at the Royal Society, the UK’s National Academy for the Sciences)

John Byrne

That's one project I was really glad to see Wikimedia UK supporting. More in that vein, please. :)

Stevie Benton

9:53 a.m.

Thank you! And of course, we're always keen to listen to suggestions and to support volunteers in anything they would like to take forward.

On 9 May 2014 14:48, Andreas Kolbe jayen466@gmail.com wrote:

...

On Fri, May 9, 2014 at 11:17 AM, Stevie Benton < stevie.benton@wikimedia.org.uk> wrote:

...
Hello everyone,

I think Wikimedia UK has an example project, related to medical articles, that may be of interest. John Byrne is the Wikimedian in Residence at Cancer Research UK, one of the UK's largest charities. He's put together the below message but isn't subscribed to this list so can't post. I am posting on his behalf. I'm happy to answer any questions about this and those I can't, I shall pass on to John.

Thanks and regards,

Stevie

John's message:

Cancer Research UK (CRUK), the world’s largest cancer research charity, have just taken me on as Wikipedian in Residence until mid-December 2014 (4/5 part time).

Parts of the plan for the role are very relevant to this thread. We are aiming to improve WP articles on cancer to ensure they are accurate, up-to-date and accessible to the full range of WP’s readership, working closely with the existing English WP medical editing community, many of whom have already been supportive of the project. With the medical translation project also underway, this is great timing for us to improve important content across large numbers of language versions.

We will be able to draw upon the expertise of both the medical research staff funded by CRUK (over 4000 in the UK) and the various kinds of staff they have with professional expertise in writing for a range of

audiences,

...
from patients to scientists ( their editorial policy<

http://www.cancerresearchuk.org/cancer-help/utilities/about-cancerhelp-uk/ca...

...
...
).

We are also planning to do research with the public into what they think

of

...
specific WP articles, perhaps before and after improvement, and into how they use WP and other sites at the top of search pages when looking for medical information on the internet. There has been little research

into

...
this area, and the results should be very useful in focusing the ways medical content generally can be improved.

The CRUK position is funded by the Wellcome Trust and supported by Wikimedia UK, and the budget includes an element for this research. I

will

...
be https://en.wikipedia.org/wiki/User:Wiki_CRUK_John in this role

(Usually

...
https://en.wikipedia.org/wiki/User:Johnbod. Until early July I will

also

...
continue my role (1/5) as Wikimedian in Residence at the Royal Society,

the

...
UK’s National Academy for the Sciences)

John Byrne

That's one project I was really glad to see Wikimedia UK supporting. More in that vein, please. :) _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Wil Sinclair

7 May 7 May

8:45 p.m.

I looked at WMF's grant page here: http://meta.wikimedia.org/wiki/Grants. I don't see any mention of grants for academic research. Does the WMF give such grants? If not, why not?

,Wil

On Wed, May 7, 2014 at 5:12 PM, Andreas Kolbe jayen466@gmail.com wrote:

...

On Thu, May 8, 2014 at 1:06 AM, Thyge ltl.privat@gmail.com wrote:

...
Maybe you should suggest that to the universities and not just to this mailing list. Nothing prevents to set up " an independent panel of academic experts" and to start doing that job today. regards, Thyge

Well, I'd like the Foundation to invest in such research, which is why I brought it up here.

I cant think of several instances of donors' money being spent on things that to me seemed less supportive of the Foundation's core mission. _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Nathan

8:54 p.m.

On Wed, May 7, 2014 at 8:45 PM, Wil Sinclair wllm@wllm.com wrote:

...

I looked at WMF's grant page here: http://meta.wikimedia.org/wiki/Grants. I don't see any mention of grants for academic research. Does the WMF give such grants? If not, why not?

,Wil

https://meta.wikimedia.org/wiki/Grants:IEG/The_use_of_Wikipedia_by_doctors_f...

The WMF focuses on submitted grant proposals. The IEG grants cover a pretty wide range of projects, but some of them are research oriented and involve academics and academic institutions. While WMF executives can make a much better judgment of the boundaries of content involvement, there seems to be a substantial difference between agreeing to fund grants and commissioning specific research into a core topic like quality. However, affiliates should have no such concerns. I really hope someone picks up on this kind of project, perhaps I'll suggest it on the Wikimedia DC list or to Wikimedia New England.

geni

8 May 8 May

3:12 a.m.

On 8 May 2014 01:00, Andreas Kolbe jayen466@gmail.com wrote:

...

As for study design, I'd suggest you begin with a *random* sample of frequently-viewed Wikipedia articles in a given topic area (e.g. those within the purview of WikiProject Medicine), have them assessed by an independent panel of academic experts, and let them publish their results.

No control, no calibration. Without those you can't really be sure what you've measured. While academic attitudes to Wikipedia may be of some interest they are not a proxy for quality.

You can try and rigidly and arbitrarily define quality. This is what Hasty et al were trying to do with their rather non standard definition of error. However as the variation in their numbers show this is really rather hard to do. You've also got the problems that your definition of quality may up with little relation to anything anyone cares about and of course you can pretty much get whatever result you want by selecting definitions.

If you allow more subjective judgement (or accept that you can't entirely kill it off) you are going to need something to compare your results to. Unfortunately wikipedia has largely killed off encyclopedia like works which makes it harder to find things to calibrate it against. Still I guess you could construct something ranging from tabloid journalism through broadsheet journalism through mass market books onto text books and conference papers through to journals with various impact factors (technically you could calculate an impact factor for wikipedia. This does not strike me as a good idea).

Since wikipedia's style is fairly obvious you can't do any blinding so you are going to need a bunch of factors to correct for that. Writing a novel article in an encyclopedic but non wikipedian style and getting a series of experts to rate it (half of whom are told it is a wikipedia article and half of whom are told it comes from somewhere else) might get you some data but doesn't leave you any way to correct for any encyclopedia vs paper in journal bias. Sure you can attack that by writing out the same information in various different styles but that requires yet more preliminary testing.

If you are interested in accuracy rather than quality it gets even worse. Have to adjust for effects of say diagrams vs photos, fonts (sure you can standardise those but how to people perceive Wikipedia when its in a different font from the one they are used to?), layout, british vs american english. Sure you can adjust for all this but you need yet more preliminary data.

Probably a load of factors that you miss the first time so you are going to need pilot studies.

Which is why you don't start with medicine. The time of medical experts is expensive. Postdocs in other areas can be have almost depressingly cheaply.

...

All of that is quite doable.

Doable but far more difficult than you seem to think.

...

If the results are good, it redounds to Wikipedia's credit.

No if the results were good we risk having more people rely on wikipedia rather than going to see a medical professional. This is not a good outcome.

-- geni

Andreas Kolbe

4:53 a.m.

On Thu, May 8, 2014 at 8:12 AM, geni geniice@gmail.com wrote:

...

On 8 May 2014 01:00, Andreas Kolbe jayen466@gmail.com wrote:

...
As for study design, I'd suggest you begin with a *random* sample of frequently-viewed Wikipedia articles in a given topic area (e.g. those within the purview of WikiProject Medicine), have them assessed by an independent panel of academic experts, and let them publish their

results.

...
No control, no calibration. Without those you can't really be sure what you've measured. While academic attitudes to Wikipedia may be of some interest they are not a proxy for quality.

Yes Geni, absolutely. If I give Wikipedia's article on diabetes to three acknowledged experts on diabetes for a detailed review, and they tell me at the end of it that it is a wonderful, up-to-date and accurate article – or they tell me that it contains numerous errors of fact – I won't have learned anything. :)

Incidentally, speaking of diabetes, one of the more striking hoaxes in

https://en.wikipedia.org/wiki/Wikipedia:List_of_hoaxes_on_Wikipedia

is "glucojasinogen". It lasted 4.5 years and entered several academic sources that copied a section of the Wikipedia article, before someone discovered that there was no such thing.

One thing I would say is that if Wikipedia articles were to be compared against articles from another source, they should have roughly the same length. It's not fair to compare a 4,000-word article from Wikipedia against a 500-word article from Britannica. Other than that, I think we could leave the study design to those who do this sort of stuff for a living. It's really not something you and I have to work out here on a mailing list.

geni

12:26 p.m.

On 8 May 2014 09:53, Andreas Kolbe jayen466@gmail.com wrote:

...

Yes Geni, absolutely. If I give Wikipedia's article on diabetes to three acknowledged experts on diabetes for a detailed review, and they tell me at the end of it that it is a wonderful, up-to-date and accurate article – or they tell me that it contains numerous errors of fact – I won't have learned anything. :)

So you concede your approach has extremely limited resolution even under ideal conditions (that is where your testers actually completely agree with each other)?

Gets really fun when you discoverer that the numerous errors of fact are that it uses liters rather than decimeters cubed (in certian contexts liters is slightly ambiguous).

...

Incidentally, speaking of diabetes, one of the more striking hoaxes in

https://en.wikipedia.org/wiki/Wikipedia:List_of_hoaxes_on_Wikipedia

is "glucojasinogen". It lasted 4.5 years and entered several academic sources that copied a section of the Wikipedia article, before someone discovered that there was no such thing.

So what you are saying is that even experts aren't that great on raw fact checking. In fairness we know this. There is a reasons that when last june Organic Letters decided to do some serious fraud detection work on their spectra they went with a data analyst rather than a chemist,

...

One thing I would say is that if Wikipedia articles were to be compared against articles from another source, they should have roughly the same length. It's not fair to compare a 4,000-word article from Wikipedia against a 500-word article from Britannica.

That means you either end up artificially trimming articles (again with a significant risk of creating data anomalies) or have an even harder time getting your calibration curve in place.

...

Other than that, I think we could leave the study design to those who do this sort of stuff for a living. It's really not something you and I have to work out here on a mailing list.

You seem to think its straightforward. If you think that you should be able to propose a study design.

edward

12:42 p.m.

Geni:

...

...
So what you are saying is that even experts aren't that great on raw fact

checking.

It is common to cite articles on the assumption that they would not have been published without review and checking. It is unlikely that a published journal article would be a complete hoax (as opposed to containing errors). It was a mistake for the authors to cite a Wikipedia article, of course.

...

...
You seem to think its straightforward. If you think that you should

be able to propose a study design.

It is straightforward in my field. I have already studied most of the Wikipedia articles in that area, and they all contain glaring errors. Occasionally I clean some of it up, but then the errors quickly appear again.

geni

12:58 p.m.

On 8 May 2014 17:42, edward edward@logicmuseum.com wrote:

...

It is common to cite articles on the assumption that they would not have been published without review and checking. It is unlikely that a published journal article would be a complete hoax (as opposed to containing errors). It was a mistake for the authors to cite a Wikipedia article, of course.

Zee problem is that we know that standard peer review is pretty useless at detecting fraud. Which is understandable. If I claim to have made a chemical and provide a plausible mechanism what are you going to do? Spectra are approach but its easy enough to calculate a spectra and add some noise and a couple of solvent peaks. Sure there are ways to counter that but they are a bit outside the skill set of your standard peer reviewers.

So while it is unlikely that a published journal article would be a complete hoax (outside of the yield section anyway) there is little reason to think that has anything to do with peer review.

...

...
...
You seem to think its straightforward. If you think that you should be

able to propose a study design.

It is straightforward in my field. I have already studied most of the Wikipedia articles in that area, and they all contain glaring errors. Occasionally I clean some of it up, but then the errors quickly appear again.

Please robustly define "glaring". Please also understand if I don't accept you as an impartial source on the matter rendering your subjective judgements of limited value.

-- geni

edward

1:13 p.m.

On 08/05/2014 17:58, geni wrote:

...

...
So while it is unlikely that a published journal article would be a

complete hoax

This is because they have a robust review process, which Wikipedia doesn't. Enough said.

...

...
Please robustly define "glaring".

Glaring means obvious, in plain view, manifest etc. I gave some examples here http://wikipediocracy.com/2014/02/23/islands-of-sanity/

One example:"It can be speculated that one of the first people in Europe who consulted the map was William Vorilong, noted philosopher from England, who was shown the map while travelling with japanese visitor Yoshimitsu Kage." William was French, not English. And he never visited Japan.

...

...
Please also understand if I don't accept you as an impartial source

on the matter rendering your subjective judgements of limited value.

They are not subjective judgments, see above. 'Glaring' /= 'subjective'. Why don't you accept me as an impartial source? Because I have written articles critical of Wikipedia? Oh right.

Some of these problems can be fixed. But fixing problems means recognising there is a problem, no?

Edward

George Herbert

1:33 p.m.

I would like to make a couple of contradictory points...

One, WMF and the editing communities should seek more, better *external* reviews with some preference ... What we ourselves find and decide about our content is less valuable than unbiased external reviews. That doesn't mean external reviews will automatically be better quality, but external viewpoints are inherently valuable.

WMF sponsored but not influenced external studies may be an acceptable balance point, but that should be carefully thought about.

Two, internal studies are also valuable, but should be done carefully. I have not yet had a chance to follow up the internal study links upthread. The advantage here is that if we can establish criteria that are reasonably robust and externally-reviewed-and-supported, then having internal reviewers rank versus those criteria is likely to get a lot more quantity of review results.

On Thu, May 8, 2014 at 10:13 AM, edward edward@logicmuseum.com wrote:

...

On 08/05/2014 17:58, geni wrote:

...
...
So while it is unlikely that a published journal article would be a

complete hoax

This is because they have a robust review process, which Wikipedia doesn't. Enough said.

...
...
Please robustly define "glaring".

Glaring means obvious, in plain view, manifest etc. I gave some examples here http://wikipediocracy.com/2014/02/23/islands-of-sanity/

One example:"It can be speculated that one of the first people in Europe who consulted the map was William Vorilong, noted philosopher from England, who was shown the map while travelling with japanese visitor Yoshimitsu Kage." William was French, not English. And he never visited Japan.

...
...
Please also understand if I don't accept you as an impartial source on

the matter rendering your subjective judgements of limited value.

They are not subjective judgments, see above. 'Glaring' /= 'subjective'. Why don't you accept me as an impartial source? Because I have written articles critical of Wikipedia? Oh right.

Some of these problems can be fixed. But fixing problems means recognising there is a problem, no?

Edward

Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/ wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

-- -george william herbert george.herbert@gmail.com

phoebe ayers

1:56 p.m.

On Thu, May 8, 2014 at 10:13 AM, edward edward@logicmuseum.com wrote:

...

On 08/05/2014 17:58, geni wrote:

...
...
So while it is unlikely that a published journal article would be a

complete hoax

This is because they have a robust review process, which Wikipedia doesn't. Enough said.

Geni did say "unlikely", not "it never happens": http://www.nature.com/news/publishers-withdraw-more-than-120-gibberish-paper...

By which I which I don't mean to say most literature is useless or a fraud: it's not! But it's also not a 100% black or white picture.

-- phoebe

-- * I use this address for lists; send personal messages to phoebe.ayers <at> gmail.com *

edward

2:27 p.m.

...

...
By which I which I don't mean to say most literature is useless or a

fraud: it's not! But it's also not a 100% black or white picture. -- phoebe

The 'not perfect ' fallacy.

Peer reviewed literature is not perfect Wikipedia is not perfect Ergo, Wikipedia is equally good as peer reviewed literature

You can see that's not valid, right?

All I can say is that from my experience of my limited area of expertise, Wikipedia articles on the subject fall far short of what you would find elsewhere (e.g. Stanford Encyclopedia of Philosophy).

Anthony Cole

2:27 p.m.

As Phoebe and (I think) Anne point out, there are many relevant aspects of quality. Readability, pertinence, neutrality, concision and comprehensiveness are all important factors but, when it comes to safety and efficacy claims in our medical articles, for me they pale into insignificance beside that other element of quality: veracity.

I agree with those above who highlight the flaws in the current scholarly peer-review process. If enWikipedia is to embrace scholarly review (and we should) we need to confront and address the well-known problems with peer review in today's scholarship.

Whether we use scholars to assess the veracity, pertinence, comprehensiveness and neutrality of our articles as part of a self-assessment process, or as a service to our readers, I believe the quality of our scholarly review must be beyond reproach.

Above I mention the journal Open Medicine has peer-reviewed a version of Wikipedia's "Dengue fever" thanks to the tireless efforts of Doc James and others (not me). I see this as a significant threshold. Once that article is published in the journal, James will be adding a clickable icon to the top of the current Wikipedia version, linking the reader to the PubMed abstract (or PubMed Central full version - I'm not sure which).

I know nothing about Open Medicine's editorial or review processes, though. As a start - to break this new ground - I am delighted to have this go forward as it is. But can we bend our minds now - or soon - to the question of whose reviewed versions should we be linking to. If the Journal of the New Zealand Acupuncture Society reviews and publishes a version of our "Acupuncture" article, do we link to it at the top of the article? If the Lancet - publishers of Andrew Wakefield's fraudulent MMR vaccine-autism-collitis paper overcomes its issues with CC-BY-SA and reviews and publishes a version of "Cancer", do we link to it?

I have a lot more to say on this issue but would like to hear some civil, thoughtful responses to the above before ploughing ahead. Let me say again, to be very clear, I support linking to the reviewed version of "Dengue fever". It is after all virtually identical to Wikipedia's current version, and any differences in the current version have not had the added filter of expert eyse from the scholarly-review process. But it is time for us to start thinking carefully and talking amongst ourselves about the question of scholarly review.

Anthony Cole http://en.wikipedia.org/wiki/User_talk:Anthonyhcole

On Fri, May 9, 2014 at 1:56 AM, phoebe ayers phoebe.wiki@gmail.com wrote:

...

On Thu, May 8, 2014 at 10:13 AM, edward edward@logicmuseum.com wrote:

...
On 08/05/2014 17:58, geni wrote:

...
...
So while it is unlikely that a published journal article would be a

complete hoax

This is because they have a robust review process, which Wikipedia doesn't. Enough said.

Geni did say "unlikely", not "it never happens":

http://www.nature.com/news/publishers-withdraw-more-than-120-gibberish-paper...

By which I which I don't mean to say most literature is useless or a fraud: it's not! But it's also not a 100% black or white picture.

-- phoebe

--

I use this address for lists; send personal messages to phoebe.ayers <at>

gmail.com * _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

David Gerard

3:15 p.m.

On 8 May 2014 19:27, Anthony Cole ahcoleecu@gmail.com wrote:

...

I agree with those above who highlight the flaws in the current scholarly peer-review process. If enWikipedia is to embrace scholarly review (and we should) we need to confront and address the well-known problems with peer review in today's scholarship.

While acknowledging the likely truth of the flaws in scientific knowledge production as it stands (single studies in medicine being literally useless, as 80% are actually wrong) ... I think you'll have a bit of an uphill battle attempting to enforce stronger standards in Wikipedia than exist in the field itself. We could go to requiring all medical sourced to be Cochrane-level studies of studies of studies, but then you need to convince everyone else to delete the entirety of the long tail of articles that won't yet have those.

- d.

phoebe ayers

3:26 p.m.

---------- Forwarded message ---------- From: David Gerard dgerard@gmail.com Date: Thu, May 8, 2014 at 12:15 PM Subject: Re: [Wikimedia-l] Metrics - accuracy of Wikipedia articles To: Wikimedia Mailing List wikimedia-l@lists.wikimedia.org

On 8 May 2014 19:27, Anthony Cole ahcoleecu@gmail.com wrote:

...

I agree with those above who highlight the flaws in the current scholarly peer-review process. If enWikipedia is to embrace scholarly review (and we should) we need to confront and address the well-known problems with peer review in today's scholarship.

That actually is the current best practice for medical articles in English, I believe, and I think it's a good one: https://en.wikipedia.org/wiki/Wikipedia:MEDRS

Sourcing to reviews when possible is particularly relevant for a field (like medicine) that has a well-established tradition of conducting and publishing systematic reviews -- but I find it a useful practice in lots of areas, on the theory that reviews are generally more helpful for someone trying to find out more about a topic.

Anthony: I hear you about veracity being particularly important in medical articles; and I don't mean to get us too far in the weeds about what quality means -- there's lots to do on lots of articles that I think would be pretty obvious quality improvement, including straight-up fact-checking.

-- phoebe

Andreas Kolbe

9 May 9 May

8:38 a.m.

On Thu, May 8, 2014 at 8:26 PM, phoebe ayers phoebe.wiki@gmail.com wrote:

...

---------- Forwarded message ---------- From: David Gerard dgerard@gmail.com

...
While acknowledging the likely truth of the flaws in scientific knowledge production as it stands (single studies in medicine being literally useless, as 80% are actually wrong) ... I think you'll have a bit of an uphill battle attempting to enforce stronger standards in Wikipedia than exist in the field itself. We could go to requiring all medical sourced to be Cochrane-level studies of studies of studies,

That actually is the current best practice for medical articles in English, I believe, and I think it's a good one: https://en.wikipedia.org/wiki/Wikipedia:MEDRS

Indeed so, and I agree it is a good idea.

...

Sourcing to reviews when possible is particularly relevant for a field (like medicine) that has a well-established tradition of conducting and publishing systematic reviews -- but I find it a useful practice in lots of areas, on the theory that reviews are generally more helpful for someone trying to find out more about a topic.

This is of course part of the same scholarly system that I was referring to earlier in this discussion.

Within Wikipedia, peer-reviewed publications and/or systematic reviews of such studies are considered among the most valuable and high-quality sources. They're a vital building block of the knowledge that Wikipedia seeks to disseminate. We know that all human methods are imperfect; but we're also agreed that the scholarly method is, by and large, superior to other methods of knowledge production.

Now, when I suggested that the Foundation bring these established methods to bear on Wikipedia itself, you (and one or two others) chimed in with concerns about real and potential flaws of scholarly studies and the peer review system. It seemed to me as though underlying these comments there were some sense that, while scholarly methods were good to illuminate any topic under the sun that Wikipedia writes about, they wouldn't be welcome as a method to illuminate Wikipedia itself.

I am well aware of the various documented problems with peer review, and its occasional failures. They haven't led Wikipedia to abandon its view that, by and large, peer-reviewed studies are among the best sources available. So I didn't think your raising problems with aspects of the scholarly method was particularly germane to this discussion of content quality studies. If we didn't believe in the scholarly method, we wouldn't privilege its output in Wikipedia.

...

Anthony: I hear you about veracity being particularly important in medical articles; and I don't mean to get us too far in the weeds about what quality means -- there's lots to do on lots of articles that I think would be pretty obvious quality improvement, including straight-up fact-checking.

I think any research programme evaluating the quality of Wikipedia content should first and foremost focus on such basics: veracity and fact checking.

...

...
Given that the post that started this thread referenced medical content, are you telling me that you think it would be useless to have qualified medical experts reviewing Wikipedia's medical content, because the

process

...

...
would be "opaque, messy, prone to failure and doesn't always support innovation"?

...

No, that is not what I am saying; and leaping to that conclusion seems

...

a rather pointy and bad-faith approach, which makes it just that much more of an effort to participate in this conversation -- if you want to have a dialog with other people, please try to be more generous in your assumptions.

I hope I have explained why I reacted the way I did. Your comments led me to believe that you were simply not very keen on Wikipedia being subjected to a test, using the most objective method available.

...

What I was trying to say is that I don't think your implication that there is already a well-designed solution that will fix all our problems is correct -- both because it's difficult to apply peer review in this context, and because peer review has plenty of problems itself. I think blind-review quality studies can be useful, but I don't think they're a panacea, anymore than scholarly peer review is itself a panacea for making sure good scholarly work gets published.

There are well-established methods for assessing the quality of written work. I should think that a team composed of both academics well-versed in study design and statistics and Wikimedians familiar with Wikipedia content would over time be able to come up with a methodology that produces good results in assessing project content in various topic areas against the Wikimedia vision.

Once the basic framework has been established, the academics concerned should be given full intellectual freedom to assess the content as they see fit.

I think such efforts would demonstrate leadership, and reflect well on the Foundation.

...

Anyway, reviewer studies are one tool for assessing quality, but imho they are mostly good for raising awareness of Wikipedia within a particular field (thus possibly gaining new editors), and occasionally for correcting the few articles that do get reviewed.

...

Article quality has lots of dimensions, including those that reviewers might look for, and others that might not be apparent:

...

factual accuracy -- that seems pretty straightforward, though of course

it's not always -- cf historical debates, new evidence coming to light, etc.

important facets of the topic being highlighted and appropriate coverage

-- also pretty straightforward, except when it's not: what if a new and emerging theory isn't noted, or a historical one given short shrift? More to the point for reviewers, what if *my* theory isn't highlighted?

A good bibliography and references -- I think experts can particularly

weigh in on this, though standards vary widely across fields and articles for what gets cited, and what's good/seminal/classic is of course never easy to determine and is always under debate.

For some of these aspects, the Wikimedia movement has standards that could be communicated to reviewers. For example, the requirement that content be neutral, reflect prevalent opinions in proportion to their prevalence in the best sources, and so on – a reviewer should not complain that a theory she or he doesn't like, but which is part of scholarly discourse, is given due visibility in Wikipedia. Failures might occur, but we know no system is perfect. All you can do is impress upon reviewers what ideal you are pursuing, and trust in their intellectual honesty to assess articles in terms of their being an effort to meet that goal.

...

clear writing -- sometimes we get accused of being too dry or pedantic,

when that's our house style. What to do with this?

Accessibility -- depends entirely on who is reading it, doesn't it? are

our physics articles accessible to grad students? Usually. Accessible to laypeople or 10th graders? Rarely.

In other areas, like the one you mention here, standards are lacking. I cannot recall the Wikimedia Foundation board ever having provided guidance on whether maths content, say, should be written so that it is helpful to kids doing their schoolwork, to maths students doing their coursework, or maths professors looking to brush up on an area. This is a point Anne touched upon, and there have been many complaints over the years that some Wikipedia content is not written in a way that would be helpful to the average reader.*

I think this reflects a lack of vision on the part of the Foundation as to what kind of reference work Wikipedia should be. And I believe the reason is that opinions on the matter vary, and that people in the Foundation feel that whatever guidance they might provide on such an issue would be disliked by some section of the Wikipedia community.

(Personally, I think that every maths article should at least have an introduction that a 9th-grader would be able to understand.)

...

Answers readers' questions -- hard to know without something like

article

...

feedback or another measuring mechanism. The questions of a new student

are

...

rarely those of an expert. Using medicine as an example: does the article on cancer answer the questions of doctors, or of newly-diagnosed patients (who are likely to be reading it)? Or the patients' relatives and caregivers? (Or none of the above?)

...

So yes, we should do reviewer studies to review for "objective" quality. Also, if we're serious about seeing how our articles meet reader needs [certainly one dimension of quality], we should also do reviewer studies with lots of groups of reviewers (medical experts, high school students, cancer patients!) And we should look at automated quality metrics, since reviewing 31 million articles by hand does not necessarily scale. And, we should look into ways to follow up on quality studies with things like to-do lists generated from reviewers, getting people in societies and universities engaged in editing based on the outcome of reviewing, etc. -- so that all of this work has the outcome of measurably improved quality.

...

Personally (not speaking for the WMF or other trustees here) I think the best thing the WMF can do is provide a platform for this kind of work:

yes,

...

we can (and do) fund research studies, but in line with our general

mission

...

to provide the infrastructure for the projects to grow on, we can also

help

...

build tools to make this work easier, so that groups like Wiki Project Med etc can get studies done easily as well. And we (the community) should develop a list of tools that those interested in doing this work need and want -- and those tools could be developed anywhere, under the aegis of

the

...

WMF or not.

I disagree. I do think the Foundation, and you as a board member, have a responsibility here. The provision of "high quality content" is part of the Foundation's core aims and values.

You have money – about ten times as much as five or six years ago. I would urge you to invest some of it in seeing how good the present system of content production is at delivering content that meets the aspirations expressed in the Foundation's core values.

Gaining objective data on this would be instructive to both the movement and the public, and provide an important stimulus for quality improvement and measurement efforts, and the recruitment of qualified editors. I understand that quantitative metrics (number of edits, editors, articles and page views) are easier to collect, but still find it disappointing that you haven't made more vigorous efforts to evaluate quality, using the input of subject matter experts.

Identifying problems is useful. Neither the Foundation nor the community should be afraid of some being found, and becoming public. They should be glad, because whenever a problem is brought into focus, it presents the Wikimedia movement with a chance to overcome it, and become a better and more effective movement as a result.

Likewise, when Wikipedians get an article through scholarly peer review, as WikiProject Medicine/Wiki Project Med Foundation have just now managed to do, this motivates further such efforts and fosters learning within the community, based on outside expert input. That is a really good thing, and I would like you to support such grassroots efforts to the best of your ability.

...

(Off the top of my head, these could include: tools to pull a random/blind sample from a category, perhaps across already-rated articles, that could be replicated across topics to do multiple comparable reviewer studies. Tools to consolidate editor-rating metrics from across languages; maybe representing those ratings in Wikidata. A strong to-do list functionality, and a strong category/quality rating intersection functionality, so that, say, an oncologist interested in working on poor-quality cancer articles could easily get to editing. Displaying all this data easily in the projects, by article. etc. etc.)

These are good ideas that would complement any research initiative undertaken by the Foundation itself. I for one would be happy to see resources invested in pursuing them.

But please consider funding your own content quality research programme, and supporting and encouraging such research being done. The aim in this should not be to be given a clean bill of health that validates the status quo, but the identification of things gone wrong, and improvement opportunities.

Content quality is not the only area worth studying. Wikipedian anthropology and sociology – investigating behavioural patterns, interaction patterns and the effectiveness of administrative structures in the community – would be another worthwhile topic of study. There are plenty of anecdotes of social dysfunction in the community (evidence can be found any day at AN/I, or in documented failures such as that of the Croatian Wikipedia highlighted in the press last autumn).** There have been a handful of studies focused on this area, but I would love the Foundation see more to advance and publicise such research.

Basically, I believe there are unexploited potentials here. Academic research programmes would create synergies, lead to an influx of new people and ideas, and vitalise the movement. The media coverage that Wiki Project Med Foundation/WikiProject Medicine has generated is a good example of that. Such public debates have multiple benefits: they make Wikipedia less opaque, they explain to the public how Wikipedia content is generated and why relying on Wikipedia content may sometimes be a bad idea, they provide visibility for the quality improvement efforts that are underway, and demonstrate social responsibility.

* See the discussion of Wikipedia's maths articles by Alan Riskin from the Maths Department at Whittier College, see http://wikipediocracy.com/2013/10/20/elementary-mathematics-on-wikipedia-2/

** http://www.dailydot.com/politics/croatian-wikipedia-fascist-takeover-controv...

David Gerard

8 May 8 May

3:11 p.m.

On 8 May 2014 17:42, edward edward@logicmuseum.com wrote:

...

Geni:

...
...
You seem to think its straightforward. If you think that you should be able to propose a study design.

...

It is straightforward in my field. I have already studied most of the Wikipedia articles in that area, and they all contain glaring errors.

Your area is philosophy, and an obscure area at that. The thread is talking about medicine.

- d.

Wil Sinclair

3:21 p.m.

Maybe the name of the thread should be changed, then.

,Wil

On Thu, May 8, 2014 at 12:11 PM, David Gerard dgerard@gmail.com wrote:

...

On 8 May 2014 17:42, edward edward@logicmuseum.com wrote:

...
Geni:

...
...
You seem to think its straightforward. If you think that you should be able to propose a study design.

...
It is straightforward in my field. I have already studied most of the Wikipedia articles in that area, and they all contain glaring errors.

Your area is philosophy, and an obscure area at that. The thread is talking about medicine.

d.

Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

edward

3:44 p.m.

On 08/05/2014 20:11, David Gerard wrote:

...

...
Your area is philosophy, and an obscure area at that.

My specialism covers the intellectual history of Western Europe from 400 CE to 1400 CE roughly. In the history of logic, right up to the late nineteenth century. If you remember, I wrote the first version (mostly unchanged today) of https://en.wikipedia.org/wiki/Zermelo_set_theory .

It's probably obscure relative to Pokemon studies and TV shows, sorry about that.

...

...
The thread is talking about medicine.

The thread is called "Metrics - accuracy of Wikipedia articles", and it opens "Could someone please point me to all the studies the WMF have conducted into the reliability of Wikipedia's content? I'm particularly interested in the medical content, **but would also like to look over the others too**".

Erik mentioned the flawed Oxford study, which was in my area.

With every kind wish.

Edward

Risker

7 May 7 May

8:04 p.m.

On 7 May 2014 19:38, Andreas Kolbe jayen466@gmail.com wrote:

...

On Thu, May 8, 2014 at 12:22 AM, phoebe ayers phoebe.wiki@gmail.com wrote:

...
On Wed, May 7, 2014 at 3:14 PM, Andreas Kolbe jayen466@gmail.com

wrote:

...
...
Anne, there are really well-established systems of scholarly peer

review.

...
...
There is no need to reinvent the wheel, or add distractions such as infoboxes and other bells and whistles.

And those peer review systems have lots and lots of problems as well as upsides. Lots of people *are* trying to reinvent peer review, including some very respected scientists.* As an academic science librarian, I can attest to there being widespread and currently ongoing debates about how

to

...
review scientific knowledge, whether traditional peer review is

sufficient,

...
and how to improve it. The current system for scientific research is

often

...
opaque, messy, prone to failure and doesn't always support innovation,

and

...
lots of smart people are thinking about it.

Erik: aha! I'd forgotten about those case studies, thanks!

Given that the post that started this thread referenced medical content, are you telling me that you think it would be useless to have qualified medical experts reviewing Wikipedia's medical content, because the process would be "opaque, messy, prone to failure and doesn't always support innovation"?

Andreas, I don't think that's necessarily what is being said here. However, the review needs to be scientifically valid, and the review in the JAOA isn't. For example, it does not require that the assessor look at the references used in the article to determine whether or not the reference meets the arbitratory standard set (i.e. peer-reviewed source updated or published within the last 5 years), and whether or not the article says what the reference says. Instead, the assessors looked at sources that may or may not have been used in the article, thus eroding any control for disagreement amongst scientific peers - something that most editors who work in this area know is surprisingly common.

The study itself identifies very significant, possibly fatal, limitations, including the use of essentially random reference sources that just happen to be available, the level of understanding of the subjects by the reviewers, the limited number of reviewers, and the fact that subject matter experts themselves are often in disagreement. It has not demonstrated repeatability.

It's possible to create a study that's worthwhile. This one wasn't it.

Risker/Anne

phoebe ayers

8 May 8 May

1:52 p.m.

On Wed, May 7, 2014 at 4:38 PM, Andreas Kolbe jayen466@gmail.com wrote:

...

On Thu, May 8, 2014 at 12:22 AM, phoebe ayers phoebe.wiki@gmail.com wrote:

...
On Wed, May 7, 2014 at 3:14 PM, Andreas Kolbe jayen466@gmail.com

wrote:

...
...
Anne, there are really well-established systems of scholarly peer

review.

...
...
There is no need to reinvent the wheel, or add distractions such as infoboxes and other bells and whistles.

And those peer review systems have lots and lots of problems as well as upsides. Lots of people *are* trying to reinvent peer review, including some very respected scientists.* As an academic science librarian, I can attest to there being widespread and currently ongoing debates about how

to

...
review scientific knowledge, whether traditional peer review is

sufficient,

...
and how to improve it. The current system for scientific research is

often

...
opaque, messy, prone to failure and doesn't always support innovation,

and

...
lots of smart people are thinking about it.

Erik: aha! I'd forgotten about those case studies, thanks!

Given that the post that started this thread referenced medical content, are you telling me that you think it would be useless to have qualified medical experts reviewing Wikipedia's medical content, because the process would be "opaque, messy, prone to failure and doesn't always support innovation"?

No, that is not what I am saying; and leaping to that conclusion seems like a rather pointy and bad-faith approach, which makes it just that much more of an effort to participate in this conversation -- if you want to have a dialog with other people, please try to be more generous in your assumptions.

What I was trying to say is that I don't think your implication that there is already a well-designed solution that will fix all our problems is correct -- both because it's difficult to apply peer review in this context, and because peer review has plenty of problems itself. I think blind-review quality studies can be useful, but I don't think they're a panacea, anymore than scholarly peer review is itself a panacea for making sure good scholarly work gets published.

Anyway, reviewer studies are one tool for assessing quality, but imho they are mostly good for raising awareness of Wikipedia within a particular field (thus possibly gaining new editors), and occasionally for correcting the few articles that do get reviewed.

Article quality has lots of dimensions, including those that reviewers might look for, and others that might not be apparent:

* factual accuracy -- that seems pretty straightforward, though of course it's not always -- cf historical debates, new evidence coming to light, etc. * important facets of the topic being highlighted and appropriate coverage -- also pretty straightforward, except when it's not: what if a new and emerging theory isn't noted, or a historical one given short shrift? More to the point for reviewers, what if *my* theory isn't highlighted? * A good bibliography and references -- I think experts can particularly weigh in on this, though standards vary widely across fields and articles for what gets cited, and what's good/seminal/classic is of course never easy to determine and is always under debate. * clear writing -- sometimes we get accused of being too dry or pedantic, when that's our house style. What to do with this? * Accessibility -- depends entirely on who is reading it, doesn't it? are our physics articles accessible to grad students? Usually. Accessible to laypeople or 10th graders? Rarely. * Answers readers' questions -- hard to know without something like article feedback or another measuring mechanism. The questions of a new student are rarely those of an expert. Using medicine as an example: does the article on cancer answer the questions of doctors, or of newly-diagnosed patients (who are likely to be reading it)? Or the patients' relatives and caregivers? (Or none of the above?)

So yes, we should do reviewer studies to review for "objective" quality. Also, if we're serious about seeing how our articles meet reader needs [certainly one dimension of quality], we should also do reviewer studies with lots of groups of reviewers (medical experts, high school students, cancer patients!) And we should look at automated quality metrics, since reviewing 31 million articles by hand does not necessarily scale. And, we should look into ways to follow up on quality studies with things like to-do lists generated from reviewers, getting people in societies and universities engaged in editing based on the outcome of reviewing, etc. -- so that all of this work has the outcome of measurably improved quality.

Personally (not speaking for the WMF or other trustees here) I think the best thing the WMF can do is provide a platform for this kind of work: yes, we can (and do) fund research studies, but in line with our general mission to provide the infrastructure for the projects to grow on, we can also help build tools to make this work easier, so that groups like Wiki Project Med etc can get studies done easily as well. And we (the community) should develop a list of tools that those interested in doing this work need and want -- and those tools could be developed anywhere, under the aegis of the WMF or not.

(Off the top of my head, these could include: tools to pull a random/blind sample from a category, perhaps across already-rated articles, that could be replicated across topics to do multiple comparable reviewer studies. Tools to consolidate editor-rating metrics from across languages; maybe representing those ratings in Wikidata. A strong to-do list functionality, and a strong category/quality rating intersection functionality, so that, say, an oncologist interested in working on poor-quality cancer articles could easily get to editing. Displaying all this data easily in the projects, by article. etc. etc.)

-- phoebe

phoebe ayers

7 May 7 May

7:12 p.m.

On Wed, May 7, 2014 at 1:17 PM, Anthony Cole ahcoleecu@gmail.com wrote:

...

Could someone please point me to all the studies the WMF have conducted into the reliability of Wikipedia's content? I'm particularly interested in the medical content, but would also like to look over the others too. Cheers.

Anthony Cole http://en.wikipedia.org/wiki/User_talk:Anthonyhcole _______________________________________________

Hi Anthony,

There have been a number of studies by researchers looking at various subsets of Wikipedia's medical articles; do you know about the list here? https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Medicine/Wikipedia_and_m...

The research group at the WMF to the best of my knowledge hasn't run any reliability-of-articles studies itself, but there have been lots done by domain experts in various fields.

best, Phoebe

-- * I use this address for lists; send personal messages to phoebe.ayers <at> gmail.com *

Erik Moeller

7:17 p.m.

On Wed, May 7, 2014 at 4:12 PM, phoebe ayers phoebe.wiki@gmail.com wrote:

...

The research group at the WMF to the best of my knowledge hasn't run any reliability-of-articles studies itself, but there have been lots done by domain experts in various fields.

We commissioned one: https://meta.wikimedia.org/wiki/Research:Accuracy_and_quality_of_Wikipedia_e...

We also did this experiment a while ago, which is IMO the best model for a generalizable approach (i.e. work with groups of credentialed experts, provide a simple API):

http://blog.wikimedia.org/2010/12/09/encyclopedia-of-life-curates-wikipedias...

I don't think it should be our highest priority, but I do think more research and development in this area is warranted.

Erik

-- Erik Möller VP of Engineering and Product Development, Wikimedia Foundation

Anders Wennersten

8 May 8 May

2:21 a.m.

For sv:wp we only look at quality and reliability (and coverage) only for specific subjects areas. I am very skeptical of the value of a general study as we already know we are awfully weak in many areas, like geographic entities in African countries. Some examples of our findings *Swedish adm unts: 100% or close to for coverage, reliability and quality in articles. Also are we working to establish a close link to Wikidata, to make sure all other version can get the same results in corresponding articles *Birds that are present in Sweden: also here 100% for coverage, reliability and quality in articles. This because of a dedicated, competent and enthusiastic workgroup *Medicin. One of our problem areas. Also it is important to know the Swedish government has since long uphold a very qualified webbbased information base of health related subjects, so we do not want the svwp in any way to be in contradiction of this "official" info. We also found problems with articles developed in the medicin project as they had recommendation of thing like penicilin use that different from what Swedish medicine practise said. We solved this partly buy removing part of this info, but to generalize on this get absurd, make it shorter then quality and reliabilty goes up

So I wonder if you base assumption of studies of "reliability of Wikipedia's content" actually are relevant

Anders

Anthony Cole skrev 2014-05-07 22:17:

...

Could someone please point me to all the studies the WMF have conducted into the reliability of Wikipedia's content? I'm particularly interested in the medical content, but would also like to look over the others too. Cheers.

Anthony Cole http://en.wikipedia.org/wiki/User_talk:Anthonyhcole _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Gerard Meijssen

2:41 a.m.

Hoi, When you consider that many if not most US mayors of a city with over 25.000 inhabitants of the 19th century have a Wikipedia article, it is relevant to notice that most South African members of the National Assemblee do not have an article [1].

When you consider that English is a major language in South Africa it says a lot about the bias of Wikipedia. It also does not take much effort to bring this information to you. It took several hours of adding them to Wikidata. It is also far from complete. It does however make the point.

Yes, we can spend money on researching the quality of the English Wikipedia in a narrow band and, yes health information is important but there is a concerted effort under way to maintain a high quality of information. The most it will do is provide more information about things we more or less already know.

The question that I would like to raise is: how are we going to make available the information that is dormant in Wikipedia? The aim of the WMF is after all "sharing the sum of all knowledge"... Thanks, GerardM

[1] http://tools.wmflabs.org/wikidata-todo/autolist.html?q=CLAIM%5B39%3A16744266]

On 8 May 2014 08:21, Anders Wennersten mail@anderswennersten.se wrote:

...

For sv:wp we only look at quality and reliability (and coverage) only for specific subjects areas. I am very skeptical of the value of a general study as we already know we are awfully weak in many areas, like geographic entities in African countries. Some examples of our findings *Swedish adm unts: 100% or close to for coverage, reliability and quality in articles. Also are we working to establish a close link to Wikidata, to make sure all other version can get the same results in corresponding articles *Birds that are present in Sweden: also here 100% for coverage, reliability and quality in articles. This because of a dedicated, competent and enthusiastic workgroup *Medicin. One of our problem areas. Also it is important to know the Swedish government has since long uphold a very qualified webbbased information base of health related subjects, so we do not want the svwp in any way to be in contradiction of this "official" info. We also found problems with articles developed in the medicin project as they had recommendation of thing like penicilin use that different from what Swedish medicine practise said. We solved this partly buy removing part of this info, but to generalize on this get absurd, make it shorter then quality and reliabilty goes up

So I wonder if you base assumption of studies of "reliability of Wikipedia's content" actually are relevant

Anders

Anthony Cole skrev 2014-05-07 22:17:

Could someone please point me to all the studies the WMF have conducted

...
into the reliability of Wikipedia's content? I'm particularly interested in the medical content, but would also like to look over the others too. Cheers.

Anthony Cole http://en.wikipedia.org/wiki/User_talk:Anthonyhcole _______________________________________________ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/ wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/ wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe

3883

Age (days ago)

3885

Last active (days ago)

wikimedia-l@lists.wikimedia.org

66 comments

20 participants

tags (0)

participants (20)

Anders Wennersten
Andreas Kolbe
Andrew Gray
Anthony Cole
Asaf Bartov
David Cuenca
David Gerard
edward
Erik Moeller
geni
George Herbert
Gerard Meijssen
Michael Maggs
Nathan
phoebe ayers
Risker
rupert THURNER
Stevie Benton
Thyge
Wil Sinclair