Is anyone doing any work on automated quality assurance of articles? I don't mean the ORES work, which is about creating hints from measured features. I'm thinking about verifying the existence and completeness of citations, and the structure of logical arguments.
John
Hoi, Would checking whether a date of death exists in articles be of interest to you? The idea is that Wikidata knows about dates of death, and for "living people" the fact of a death should be the same in all projects. When the date of death is missing, there is an issue either at Wikidata (a difference in precision is one example) or at a project.
When a difference is found, the idea is that it is each project's responsibility to do what is needed. No further automation. Thanks, GerardM
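As a rough illustration of the check described above, the following Python sketch compares a date of death recorded at Wikidata with the one found in a project's article, down to the coarser of the two precisions. The record layout, the result labels and the function name `compare_death_dates` are assumptions made for this example; they are not an existing Wikidata or MediaWiki API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical precision levels, ordered from coarse to fine.
PRECISIONS = ("year", "month", "day")


@dataclass(frozen=True)
class DeathDate:
    year: int
    month: Optional[int] = None
    day: Optional[int] = None

    @property
    def precision(self) -> str:
        if self.month is None:
            return "year"
        if self.day is None:
            return "month"
        return "day"

    def truncated(self, precision: str) -> tuple:
        """Return only the date parts available at the given precision."""
        parts = (self.year, self.month, self.day)
        return parts[: PRECISIONS.index(precision) + 1]


def compare_death_dates(wikidata: Optional[DeathDate], article: Optional[DeathDate]) -> str:
    """Classify how a Wikidata date of death relates to an article's date."""
    if wikidata is None and article is None:
        return "no death recorded"
    if wikidata is None:
        return "missing at Wikidata"
    if article is None:
        return "missing at project"
    # Compare only down to the coarser of the two precisions.
    coarser = min(wikidata.precision, article.precision, key=PRECISIONS.index)
    if wikidata.truncated(coarser) != article.truncated(coarser):
        return "conflict"
    if wikidata.precision != article.precision:
        return "precision differs"
    return "consistent"
```

In keeping with the "no further automation" idea, the function only reports a classification; deciding what to do about it is left to each project.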
(In reply to John Erling Blad <jeblad@gmail.com>, 15 April 2017 at 23:50)
_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe
This is more about checking consistency between projects. It is interesting, but not quite what I was asking about. It would be very interesting if it were possible to say something about the half-life of an error. I'm pretty sure this follows the number of page views if ordinary logged-in editing is removed.
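To make the half-life idea concrete: if error lifetimes are assumed to follow an exponential decay (an assumption of this sketch, not something established in this thread), the half-life is ln 2 times the mean lifetime. The function name and the sample durations below are hypothetical.

```python
import math
from typing import Sequence


def error_half_life(lifetimes_hours: Sequence[float]) -> float:
    """Estimate the half-life of errors from observed lifetimes.

    Assumes lifetimes are exponentially distributed, so the maximum
    likelihood estimate of the mean is the sample mean and the
    half-life is ln(2) * mean.
    """
    if not lifetimes_hours:
        raise ValueError("need at least one observed lifetime")
    mean = sum(lifetimes_hours) / len(lifetimes_hours)
    return math.log(2) * mean


# Hypothetical lifetimes: hours from introduction of an error to its correction.
sample = [2.0, 4.0, 6.0, 8.0]
half_life = error_half_life(sample)
```

Computing this separately for groups of pages with similar page-view counts would be one way to test whether the half-life really follows readership.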
(In reply to Gerard Meijssen <gerard.meijssen@gmail.com>, Sun, Apr 16, 2017 at 12:08 AM)
I agree it is an interesting question. One would need to clearly define what is meant by an "error", though.
Simple vandalism is a relatively easy category to look at, but otherwise it is complicated.
One has:
1) Unreferenced material for which one can find a supporting source
2) Text that is only partly supported by the source provided
3) Material well supported by a poor-quality source
4) Material that is out of date but supported by an older source
5) Material that is controversial, with different high-quality sources coming to different conclusions
The better question might be:
1) What percentage of the sources used by Wikipedia are of "high quality"? This might be somewhat less difficult to define if done within a specific area of expertise.
2) If one looks at X number of statements on WP, what percentage are well supported by the reference associated with them? Maybe this could be graded as not supported at all, partly supported, mostly supported, or completely supported.
This has been on my list of things to study for some time, but I would be happy to see someone run with it. James
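The four-level grading suggested above could be tallied along these lines. This is only a sketch: the grade labels follow the wording of the message, while the function name and the input format (one grade string per reviewed statement) are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable

# The four grades suggested in the message above.
GRADES = ("not supported", "partly supported", "mostly supported", "completely supported")


def support_percentages(grades: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of statements that received each grade."""
    counts = Counter(grades)
    unknown = set(counts) - set(GRADES)
    if unknown:
        raise ValueError(f"unknown grades: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no graded statements")
    return {grade: 100.0 * counts[grade] / total for grade in GRADES}
```

The grading itself would still have to be done by human reviewers; the code only summarises their judgements.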
(In reply to John Erling Blad <jeblad@gmail.com>, Sat, Apr 15, 2017 at 6:08 PM)
Hoi, How can you check for consistency when you are not able to assess whether certain facts (like a date of death) exist and are the same? What can you say about sources when some Wikipedias insist on sources in their own language, and you cannot read sources in other languages? How do you check for consistency when we have over 280 Wikipedias with possible content?
Do know that only Wikidata approaches a state where it knows about all our projects, and we have not, to the best of my knowledge, assessed the quality of Wikidata's interwiki links. Case in point: today I fixed an error about a person who was said to be dead because a Commons category was not correctly linked.
When you study the consistency of English Wikipedia only, you only add to the current bias in research.
When you want to know about the half-life of an error, you can find in the history when, for instance, a date was first mentioned, and find the same date in another language. This is not trivial, as date formats differ between languages; think of Thai, for instance. Thanks, GerardM
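A minimal sketch of the history lookup described above, assuming the revision history is already available as (ISO 8601 timestamp, wikitext) pairs. The function names are hypothetical. Matching on the year alone is a deliberate simplification; it only illustrates why formats matter, since Thai text may give the year in the Buddhist Era (Gregorian year plus 543) and in Thai digits.

```python
from typing import Iterable, List, Optional, Tuple

BUDDHIST_ERA_OFFSET = 543  # Thai solar calendar year = Gregorian year + 543
# Map ASCII digits to Thai digits (U+0E50 to U+0E59) for str.translate.
THAI_DIGITS = {ord(str(i)): chr(0x0E50 + i) for i in range(10)}


def year_variants(gregorian_year: int) -> List[str]:
    """Spellings of a year as it might appear in English or Thai text."""
    be_year = str(gregorian_year + BUDDHIST_ERA_OFFSET)
    return [str(gregorian_year), be_year, be_year.translate(THAI_DIGITS)]


def first_mention(revisions: Iterable[Tuple[str, str]], variants: List[str]) -> Optional[str]:
    """Return the timestamp of the earliest revision containing any variant.

    Timestamps must share one ISO 8601 format so that they sort
    chronologically as plain strings.
    """
    for timestamp, text in sorted(revisions):
        if any(variant in text for variant in variants):
            return timestamp
    return None
```

Running `first_mention` over the histories of the same article in two languages gives two timestamps whose difference is one rough measure of how long a fact took to propagate.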
(In reply to John Erling Blad <jeblad@gmail.com>, 16 April 2017 at 02:08)
Hello John,
Article quality is an interesting subject. I guess it depends very much on which scientific discipline you come from and which questions you want answered. A linguist will have a very different approach from a computer scientist, for example. If you ask me, only a human being can judge an article when it comes to content quality and textual quality, by the way. Maybe you could elaborate on what your questions are?
Kind regards Ziko
(In reply to Gerard Meijssen <gerard.meijssen@gmail.com>, 2017-04-16 9:44 GMT+02:00)
Hoi, Humans are overrated. I saw this answer on Facebook: compare [1] and [2], and tell me why we accept the bias in our editors. Why are we satisfied with what we write about when there is more to inform about? Remember what we aim to achieve. It does not say text; it says to share the sum of all knowledge. Thanks, GerardM
[1] https://upload.wikimedia.org/wikipedia/commons/0/07/Geotagged_articles_in_en...
[2] https://upload.wikimedia.org/wikipedia/commons/2/2b/WorldmapGeonamesallCount...
(In reply to Ziko van Dijk <zvandijk@gmail.com>, 16 April 2017 at 18:59)
I wrote a proposal a few years ago on how we could identify some types of bias. The idea was to compare the ranking of pageviews and notify other projects about missing articles. I don't think anyone has followed up on that.
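The core of that idea can be sketched as follows. The sketch assumes articles are keyed by a language-independent item ID (for example a Wikidata Q-ID) and that page-view counts have already been collected; the function name, the `top_n` cut-off and the data layout are hypothetical and are not taken from the original proposal.

```python
from typing import Dict, List, Set


def missing_popular_articles(
    source_views: Dict[str, int],
    target_items: Set[str],
    top_n: int = 100,
) -> List[str]:
    """List the most-viewed items in one project that another project lacks.

    `source_views` maps an item ID to page views in the source project;
    `target_items` is the set of item IDs that have an article in the
    target project. Results are ordered from most to least viewed.
    """
    # Rank by views (descending); break ties by item ID for stable output.
    ranked = sorted(source_views, key=lambda item: (-source_views[item], item))
    return [item for item in ranked[:top_n] if item not in target_items]
```

The resulting list is what would be sent to the target project as a notification of possibly missing articles.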
(In reply to Gerard Meijssen <gerard.meijssen@gmail.com>, Sun, 16 Apr 2017, 19:12)
Sorry for the spelling; I am writing this on a mobile with a Norwegian spellchecker.
Gerard's last question is about coverage and bias, which is part of the overall quality of the project as such.
(In reply to John Erling Blad <jeblad@gmail.com>, Sun, 16 Apr 2017, 19:22)
John, the AROWF project GSoC student implemented your proposal last year:
https://github.com/priyankamandikal/arowf/blob/master/backlog.py
She also used WikiWho to suggest review of out-of-date passages, and both categories and readability metrics to suggest review of unclear passages:
https://github.com/priyankamandikal/arowf/blob/master/recent_script.py
https://github.com/priyankamandikal/arowf/blob/master/copy_edit.py
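For readers unfamiliar with readability metrics, here is a small self-contained sketch of one well-known measure, the Flesch reading ease score. It is not taken from the linked scripts, and which metric they use is not stated here; the syllable counter is a crude English-only heuristic chosen for brevity.

```python
import re


def count_syllables(word: str) -> int:
    """Rough English syllable count: runs of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch reading ease; lower scores indicate harder text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```

Passages scoring well below the rest of an article could then be queued for human review.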
This year she has agreed to co-mentor a voice-interactive tutorial system for instructing on the use of her project, with which we plan to simultaneously coach speech pronunciation.
On Sun, Apr 16, 2017 at 11:23 AM John Erling Blad jeblad@gmail.com wrote:
I wrote a proposal a few years ago on how we could identfy some types of bias. The idea was to compare ranking of pageviews, and notify other projects about missing articles. I don't think anyone has done any followup om that
It is a manual rating system, which can be used for quality improvements. Cost-free rating systems have an inherent problem with gaming. That can be counteracted by rating the raters, often called meta rating. You derive a reputation for each rater by observing disagreement, and then use that reputation to calculate how much trust to place in their ratings of articles, or of statements about the articles.
If there is a cost to rating, the quality of the ratings will usually go up. If there is no cost, the quality goes down, as it becomes easier to game the system.
Note that manual rating only works for subjective quality assessment.
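To make the meta-rating idea concrete, here is a minimal Python sketch under stated assumptions. The trust formula (one over one plus the rater's mean absolute deviation from the per-article mean) is only one simple choice made for illustration, not an established scheme, and the rater names and scores are invented.

```python
from collections import defaultdict
from statistics import mean


def rater_trust(ratings):
    """ratings: list of (rater, article, score). Returns a trust value per rater in (0, 1]."""
    by_article = defaultdict(list)
    for _rater, article, score in ratings:
        by_article[article].append(score)
    # Use the plain mean per article as a stand-in for "consensus".
    consensus = {article: mean(scores) for article, scores in by_article.items()}

    deviations = defaultdict(list)
    for rater, article, score in ratings:
        deviations[rater].append(abs(score - consensus[article]))
    # Raters who disagree more with the consensus get less trust (assumed formula).
    return {rater: 1.0 / (1.0 + mean(devs)) for rater, devs in deviations.items()}


def weighted_score(ratings, trust, article):
    """Trust-weighted mean score for one article."""
    pairs = [(trust[rater], score) for rater, art, score in ratings if art == article]
    return sum(w * s for w, s in pairs) / sum(w for w, _ in pairs)


# Invented sample: "eve" consistently disagrees with the other two raters.
ratings = [
    ("ann", "A", 4), ("bob", "A", 4), ("eve", "A", 1),
    ("ann", "B", 2), ("bob", "B", 2), ("eve", "B", 5),
]
trust = rater_trust(ratings)
score_a = weighted_score(ratings, trust, "A")
print(trust, score_a)
```

Note the weakness of this sketch: a coordinated majority defines the consensus, so it only dampens isolated outliers. That is one reason a cost for rating still matters.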
Gerard, I looked at the two images, but have no idea of what point you are trying to make about them. Could you be a bit more descriptive? Cheers, Peter
-----Original Message----- From: Wikimedia-l [mailto:wikimedia-l-bounces@lists.wikimedia.org] On Behalf Of Gerard Meijssen Sent: Sunday, April 16, 2017 7:11 PM To: Wikimedia Mailing List Subject: Re: [Wikimedia-l] Quality assurance of articles
Hoi, Humans are overrated. I saw this answer on Facebook [1] and [2]; compare the two and tell me why we accept the bias in our editors. Why are we satisfied with what we write about when there is more to inform about? Remember what we aim to achieve. It does not say text, it says share the sum of all knowledge. Thanks, GerardM
[1] https://upload.wikimedia.org/wikipedia/commons/0/07/Geotagged_articles_in_en...
[2] https://upload.wikimedia.org/wikipedia/commons/2/2b/WorldmapGeonamesallCount...
Textual and factual quality are different. Often we spellcheck an article and claim it to be of good quality, but I believe that is the lesser problem although it is part of the overall quality.
On Sun, 16 Apr 2017 at 18:59, Ziko van Dijk zvandijk@gmail.com wrote:
Hello John,
Article quality is an interesting subject. I guess that it depends heavily on which scientific discipline you come from and what questions you want answered. A linguist will have a very different approach than a computer scientist, for example. If you ask me, only a human being can judge an article when it comes to content quality and textual quality, by the way. Maybe you want to elaborate on what your questions are?
Kind regards Ziko
Yes I think using WD to look at stuff like dates of death between different languages would be interesting.
J
On Sun, Apr 16, 2017 at 1:44 AM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, How can you check for consistency when you are not able to determine whether certain facts (like date of death) exist and are the same? What can you say about sources when some Wikipedias insist on sources in their own language and you cannot read sources in other languages? How do you check for consistency when we have over 280 Wikipedias with possible content?
Do know that only Wikidata approaches a state where it knows about all our projects, and we have not, to the best of my knowledge, assessed the quality of Wikidata's interwiki links. Case in point: I fixed an error today about a person who was said to be dead because a Commons category was not correctly linked.
When you study the consistency of English Wikipedia only, you only add to the current bias in research.
When you want to know about the half-life of an error, you can find in the history when, for instance, a date was first mentioned, and then find the same date in another language. This is not trivial, as formats differ widely between languages; think of Thai, for instance. Thanks, GerardM
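The date-of-death check discussed in this thread has to cope with missing values and with differing precision. The Python sketch below shows one way that comparison might look. It assumes the dates have already been parsed into (year, month, day) tuples, which sidesteps exactly the language-specific format problem mentioned above; the function name and result labels are hypothetical.

```python
from typing import Optional, Tuple

# A date as (year, month, day); month and day are None when the source is less precise.
PartialDate = Tuple[int, Optional[int], Optional[int]]


def compare_dates(wikidata: Optional[PartialDate], article: Optional[PartialDate]) -> str:
    """Classify how a Wikidata date and an article date relate to each other."""
    if wikidata is None and article is None:
        return "both-missing"
    if wikidata is None:
        return "missing-in-wikidata"
    if article is None:
        return "missing-in-article"

    precision_differs = False
    for a, b in zip(wikidata, article):
        if a is None or b is None:
            # Only one side knows this component: a precision gap, not a conflict.
            precision_differs = precision_differs or ((a is None) != (b is None))
            continue
        if a != b:
            return "conflict"
    return "precision-differs" if precision_differs else "match"


print(compare_dates((2017, None, None), (2017, 4, 16)))
```

Anything other than "match" would only be reported; as stated earlier in the thread, it is each project's responsibility to decide what to do with the difference.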
Hi John,
This may be of interest to you:
We are working on building recommendation systems that can help editors identify how to expand already existing articles in Wikipedia. This includes, but is not limited to, identifying which sections are missing from an article, which citations, which images, infobox information, etc. This research is in its early days; if you'd like to follow it, please visit https://meta.wikimedia.org/wiki/Research:Expanding_Wikipedia_stubs_across_languages
Best, Leila
Leila Zia Senior Research Scientist Wikimedia Foundation
Definitely part of the overall quality. I wonder, do you have any stats on how much positive change the previous attempts have triggered?
On Sun, Apr 16, 2017 at 11:58 PM, John Erling Blad jeblad@gmail.com wrote:
Definitely part of the overall quality. I wonder, do you have any stats on how much positive change the previous attempts have triggered?
[John and I went off-list for me to understand which specific previous attempts he had in mind when asking the above. I have a better sense now and I'm responding to that.]
I'm providing some pointers to indications or controlled experiment results that show how well recommendations in the space of article creation work (Note that I don't have results for the article expansion work to share for now.):
We built an end-to-end system that identifies missing articles in a given language, ranks them according to their importance in that given language, and recommends them to editors who are interested to create them (interest is inferred based on the topic of the articles the editor has edited in the recent past). We ran a controlled experiment and showed that you can increase article creation rate in Wikipedia by a factor of 3.2 if you do personalized recommendations in the setting of the experiment (which was editors receiving recommendations over email) while maintaining the same level of quality as organically created articles on Wikipedia. We also showed that personalized recommendations increase article creation rate by a factor of almost 2 when compared to non-personalized recommendations. If you are interested about the details of this study, you can read the paper that describes it fully at https://arxiv.org/abs/1604.03235. If you prefer a verbal presentation on this topic, I've recently presented this work, why it's important, and some of the work we've started in the article expansion research in CITRIS Exchange seminar series http://citris-uc.org/spring-2017-citris-research-exchange-seminar-series/ in University of California, Berkeley. You can check out the presentation slides https://commons.wikimedia.org/wiki/File:Growing_Wikipedia_Across_Languages_via_Recommendations_CITRIS_20170315.pdf and the video of the presentation https://www.youtube.com/watch?v=lHbnvRwFC_A&index=7&list=PLYTiwx6hV33vqwW7HWyYHMca4H0Ru6KQT .
Outside of the experimental setting, Content Translation uses the recommendation API (https://meta.wikimedia.org/wiki/Recommendation_API) behind the research I explained above as part of its "Suggestions" feature. Every week, around 180 articles get published on Wikipedia using the Suggestions feature alone. This is around 6-9% of all articles created via Content Translation every week. So we have some evidence that these recommendations work in practice, too.
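As a rough sanity check on those two figures, the arithmetic can be sketched as follows. This is only a back-of-the-envelope calculation from the approximate numbers quoted above, not measured data; the function name `implied_total` is made up for illustration.

```python
# If ~180 Suggestions articles per week are 6-9% of all Content Translation
# articles, what weekly total does that imply? Inputs are the approximate
# figures from the message, so the result is only an order of magnitude.

def implied_total(part, share):
    """Return the total for which `part` makes up the fraction `share`."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return part / share

suggestions_per_week = 180
low = implied_total(suggestions_per_week, 0.09)   # larger share -> smaller total
high = implied_total(suggestions_per_week, 0.06)  # smaller share -> larger total
print(f"Implied weekly total: about {low:.0f} to {high:.0f} articles")
```

So the quoted share corresponds to roughly two to three thousand Content Translation articles per week.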
I hope this helps.
Best, Leila
On Mon, 17 Apr 2017 at 02:04, Leila Zia leila@wikimedia.org wrote:
Hi John,
This may be of interest to you:
We are working on building recommendation systems that can help editors identify how to expand already existing articles in Wikipedia. This includes but is not limited to identifying what sections are missing from an article, what citations, what images, infobox information, etc. This is research in its early days; if you'd like to follow up with it, please visit https://meta.wikimedia.org/wiki/Research:Expanding_Wikipedia_stubs_across_languages
Best, Leila
Leila Zia Senior Research Scientist Wikimedia Foundation
Hoi, This is an interesting avenue. May I suggest one practical side of this?
When you analyse articles and find that some things are missing, it will help a lot when you can target these articles to the people who are likely interested. When people interested in soccer learn that a soccer player died, they are more likely to edit or even write an article.
The approach for finding a subject that could do with more attention is one I applaud. When you want to do this across languages, think of Wikidata to define the area of interest for users. It will always include all the articles in all the languages. As you have seen with the Listeria lists, showing red links and Wikidata items is trivial. Thanks, Gerard
Hoi Gerard,
On Mon, Apr 17, 2017 at 7:54 AM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
When you analyse articles and find that some things are missing, it will help a lot when you can target these articles to the people who are likely interested. When people interested in soccer learn that a soccer player died, they are more likely to edit or even write an article.
You are absolutely right. This is in fact what we tested in the article creation recommendation experiment, and you could see that providing personalized recommendations (where personalization was based on matching editors' interests, inferred from their history of contributions) does better than random important recommendations. A few pointers for you:
* Check out section 2.3 of the paper at https://arxiv.org/pdf/1604.03235.pdf to see how this was done.
* I talk briefly about how we do the editor interest modeling at https://youtu.be/lHbnvRwFC_A?t=20m44s
In general, we have at least two ways of recommending to people what they would like to edit. One is using the information in their past edit history and building topical models that can help us learn what topics an editor is interested in. The other is asking the editor to provide some seeds of interest to us. For example, we ask you to tell us what kind of article you would edit, and we give you recommendations similar to the seed you provide. Each has its own advantages, and you sometimes have to mix the two approaches (and more) to give the editor enough breadth and depth of topics to choose from.
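To make the two approaches concrete, here is a minimal sketch in Python. It is an illustration under loud assumptions, not the method from the paper: a plain bag-of-words vector stands in for a learned topic model, and the article titles, texts, and function names are all invented for the example.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector; a crude stand-in for a learned topic model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def interest_from_history(edited_texts):
    """Approach 1: build an interest profile from past edits."""
    profile = Counter()
    for text in edited_texts:
        profile.update(vectorize(text))
    return profile


def recommend(interest, candidates, k=2):
    """Return up to k candidate titles most similar to the interest vector."""
    scored = [(cosine(interest, vectorize(text)), title)
              for title, text in candidates.items()]
    scored = [pair for pair in scored if pair[0] > 0]  # drop unrelated items
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]


# Hypothetical missing articles with a few descriptive words each.
CANDIDATES = {
    "Footballer A": "football player club league goal",
    "Composer B": "composer symphony orchestra music",
    "Stadium C": "football stadium club city",
}

history_interest = interest_from_history(
    ["football club league season", "football goal cup"])
seed_interest = vectorize("orchestra music concert")  # Approach 2: a seed

print(recommend(history_interest, CANDIDATES))
print(recommend(seed_interest, CANDIDATES, k=1))
# Mixing both signals widens the range of topics offered.
print(recommend(history_interest + seed_interest, CANDIDATES, k=3))
```

The history profile only surfaces football topics and the seed only surfaces music, while the mixed vector returns both, which is the breadth-versus-depth trade-off described above.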
The approach for finding a subject that could do with more attention is one I applaud. When you want to do this across languages, think of Wikidata to define the area of interest for users. It will always include all the articles in all the languages. As you have seen with the Listeria lists, showing red links and Wikidata items is trivial.
Yes, finding what is missing in a Wikipedia language by comparing language editions is relatively easy, thanks to Wikidata. :) What is hard is ranking these millions of missing articles in any language based on some notion of importance. We developed a ranking system for the research I mentioned above. You can read about it in Section 2.2 of the paper at https://arxiv.org/pdf/1604.03235.pdf. I talk about it in less detail at https://youtu.be/lHbnvRwFC_A?t=16m58s. In a nutshell: we built a prediction model that aims to predict the number of pageviews the article would receive had it existed in the destination language where it's missing today. The higher this predicted number for a missing article in a language, the more important it is to create it.
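The two steps, finding what is missing and then ranking it, can be sketched roughly as below. Everything here is hypothetical: the `Item` records, the item ids, and especially `naive_predictor`, which simply averages existing pageviews and merely stands in for the learned prediction model described in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Item:
    qid: str                                    # illustrative item id
    sitelinks: Set[str] = field(default_factory=set)        # languages with an article
    pageviews: Dict[str, int] = field(default_factory=dict)  # views per language


def missing_in(items: List[Item], source: str, target: str) -> List[Item]:
    """Items that have an article in `source` but none in `target`."""
    return [it for it in items
            if source in it.sitelinks and target not in it.sitelinks]


def naive_predictor(item: Item, target: str) -> float:
    """Placeholder: mean pageviews where the article exists.

    A real model would also use features of the `target` language;
    this stand-in ignores it.
    """
    views = list(item.pageviews.values())
    return sum(views) / len(views) if views else 0.0


def rank_missing(items: List[Item], source: str, target: str,
                 predictor: Callable[[Item, str], float] = naive_predictor
                 ) -> List[Item]:
    """Missing articles sorted by predicted pageviews, highest first."""
    candidates = missing_in(items, source, target)
    return sorted(candidates, key=lambda it: predictor(it, target), reverse=True)


ITEMS = [
    Item("Q1", {"en", "fr"}, {"en": 1000, "fr": 200}),  # already exists in fr
    Item("Q2", {"en"}, {"en": 500}),
    Item("Q3", {"en", "de"}, {"en": 900, "de": 300}),
    Item("Q4", {"de"}, {"de": 50}),                     # not in the source language
]

ranked = rank_missing(ITEMS, source="en", target="fr")
print([it.qid for it in ranked])
```

Passing the predictor in as a parameter keeps the easy part (set comparison over sitelinks) separate from the hard part (the importance model), so a better model can be swapped in without touching the rest.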
Best, Leila
Hoi, When you consider Wikidata's data as a predictor of relevance and interest, the biggest problem is that Wikidata does not hold enough data at this time. The one thing I find missing in the approach you discuss in your presentation is local and timely information. Of relevance here are awards but also local events like elections. The problem is that for many countries we do not even know about such awards. They indicate what is of local relevance. There are many ways we can open up our community to these awards. I will come up with ideas in the future.
So yes, your approach is good but like the translation tool it relies on English content. It will be much better when we promote translation from French, Russian, German and Chinese as well.
Another aspect I am totally missing is bot-generated articles. We can and should have stubs generated from data, cached and not saved as an article. Basically they serve as a stepping stone towards an article. They are there to inform in any language about what we do know.
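As a rough illustration of what "generated from data, cached and not saved" could look like, here is a small Python sketch. It is purely hypothetical: the record, the per-language templates, and the function `render_stub` are invented for the example, and nothing here queries Wikidata or describes an existing tool.

```python
from functools import lru_cache

# Hypothetical structured records; in practice these would come from
# a data source such as Wikidata.
RECORDS = {
    "Q_example": {
        "label": {"en": "Jane Example", "nl": "Jane Example"},
        "occupation": {"en": "author", "nl": "schrijver"},
        "born": 1950,
    },
}

# One simple sentence template per language (also made up).
TEMPLATES = {
    "en": "{label} (born {born}). Occupation: {occupation}.",
    "nl": "{label} (geboren {born}). Beroep: {occupation}.",
}


@lru_cache(maxsize=1024)
def render_stub(item_id, lang):
    """Render a stub on demand; the result is cached, never stored as a page."""
    record = RECORDS.get(item_id)
    template = TEMPLATES.get(lang)
    if record is None or template is None:
        return None  # no data or no template for this language
    try:
        return template.format(
            label=record["label"][lang],
            occupation=record["occupation"][lang],
            born=record["born"],
        )
    except KeyError:
        return None  # a field is missing in this language


print(render_stub("Q_example", "en"))
print(render_stub("Q_example", "nl"))
```

Because the text is rebuilt from the data whenever the cache is cold, a correction to the underlying record shows up in every language without anyone editing a stored article.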
I am missing it because the Wikimedia Foundation is not the "Wikipedia Foundation", our aim is to share in the sum of all available knowledge and that is what we could do when we use well presented cached data when we do not have an article. When people dismiss the Cebuano Wikipedia effort they are typically trolling but what I do resent most is that we do not even study the effect of bot generated articles and their value to readers.
Another approach is to consider the use of our content to external parties. This is where Wikidata can benefit from sources that care to share what they have. I have written about quality assurance, but the bottom line is that most of the external sources may have flaws but are no worse than what we have. A perspective you may be able to confirm. Yet another reason to consider external parties is that sharing our data with them can be of benefit to our readers. When we are able to link into local library systems, as we already are in the Netherlands, it becomes valuable to our readers to include data on authors so that they can find them in their local library. The point is that once you are adding data, one more statement is quickly added.
So yes I do like your presentation, I like it very much. It does not cover everything and that is imho a consequence of the ingrained Wikipedia and editor bias. We have largely forgotten that what we do is not about either but about sharing information. If there is something that I wish for 2030 it is that we care about providing and sharing information, providing and sharing the sum of all knowledge. Yes, well written text is to be preferred and we should indeed do everything to get as much of this as we can. Thanks, GerardM
Hi Gerard,
We're diverging from the initial thread. I'll respond to one point; we should take the rest of the discussion somewhere else. :)
On Mon, Apr 17, 2017 at 11:36 AM, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:
So yes, your approach is good but like the translation tool it relies on English content.
No. The approach and implementation work from any language to any language. You can play with a very simplified version of the recommendations at https://recommend.wmflabs.org/. You can choose any language as source or destination.
Best, Leila