References to newspaper articles behind paywalls like newspapers.com


PWN

11 Nov 2019

4:44 p.m.

Hi all, I’m constantly encountering newspaper articles that have disappeared from Google and are no longer viewable or even discoverable via Google. They are often the sole reference url for statements, yet are behind a paywall - notably newspapers.com. How is the community handling the paywalling of historical newspaper resources? Thank you.


Andy Mabbett

11 Nov

5:26 p.m.

On Mon, 11 Nov 2019 at 16:44, PWN <pariswritersnews@gmail.com> wrote:

> I’m constantly encountering newspaper articles that have disappeared from Google and are no longer viewable or even discoverable via Google. They are often the sole reference url for statements, yet are behind a paywall - notably newspapers.com. How is the community handling the paywalling of historical newspaper resources?

Whenever I cite something on Wikidata, or Wikipedia, I submit a copy to the Internet Archive's Wayback Machine using the add-on for Firefox:

https://github.com/jonathanmccann/archive-url-firefox-addon

-- Andy Mabbett @pigsonthewing http://pigsonthewing.org.uk
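For readers who want to do by script what the add-on does by hand, here is a minimal sketch. It assumes the Wayback Machine's public "save" URL prefix and its availability lookup endpoint; neither is mentioned in this thread, and the function names are made up for illustration. The sketch only builds the request URLs and reads a lookup response, so it makes no network calls by itself.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoints (not taken from this thread).
SAVE_ENDPOINT = "https://web.archive.org/save/"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def save_request_url(page_url: str) -> str:
    """Build the URL that, when fetched, asks the Wayback Machine to capture page_url."""
    return SAVE_ENDPOINT + page_url


def availability_query_url(page_url: str) -> str:
    """Build the lookup URL that reports the closest existing snapshot of page_url."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": page_url})


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the snapshot URL from a decoded availability response, or None if there is none."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None
```

Fetching the result of `save_request_url(...)` with any HTTP client would request a capture; whether a paywalled page is captured in readable form depends on what the site serves to the crawler.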

Wynand van der Walt

12 Nov

5:36 a.m.

I like what Andy is doing. As a librarian, however, I would suggest an alternative: cite only the newspaper article itself, without the URL, since the analog citation is valid on its own.

Regards,

Wynand van der Walt Head Librarian: Technical Services Rhodes University Library


Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata

Andra Waagmeester

7:02 a.m.

I don't know if I agree with "just citing" the newspaper article. Why not push for resolvable citations if we have the technology? There is not much value in a citation if you can't access the source to verify it, don't you think?


Gerard Meijssen

7:17 a.m.


Hoi, It depends. When you approach it from a scientific point of view, you write your paper and are personally responsible for what you write. The bias of your paper may include the effort you take to verify sources. In Wikipedia it is NOT your paper and it is NOT only your responsibility. At the opposite end are the papers in the language you do not understand and consequently not accepted as a source, there are the papers, books that have not been

So a Wikipedia is biased because its sources are limited, and since it is NOT a personal responsibility, there is also the question of whether we should accept the bias that this limiting brings. Thanks, GerardM


fn@imm.dtu.dk

8:22 a.m.

I see a gradual shift from open news articles to articles behind paywalls, which I guess is a natural adjustment of the income model as newspapers transition from paper to the Internet. Good articles, suitable as sources in Wikipedia and Wikidata, will probably more and more often be behind paywalls. We need to get used to that, and to some degree it has been the case for books for a long time.

The Internet Archive, tools like the one Andy Mabbett refers to, and the 'archive-url' parameter of (the English) Wikipedia's cite news template, as well as Wikidata's P1065, may be a good solution. We should also keep in mind, though, that this might not be sustainable copyright-wise in the long term, cf. the discussions around controlled lending and the Society of Authors' concerns: https://publishingperspectives.com/2019/01/copyright-battle-internet-archive... (not currently behind a paywall :)

/Finn


Gerard Meijssen

9:16 a.m.


Hoi, As more and more conditions are piled up, bias is not reduced but induced. Thanks, GerardM


Liam Wyatt

13 Nov

12:03 a.m.


To guard against link-rot of news websites when they're used as Reference for statements in Wikidata, is there any bot that is systematically going around and collecting new URLs that are added with the Reference URL property (P854), adding them to Internet Archive, copying the result, and then adding that as Archive URL property (P1065) + Archive Date (P2960) back into the original statement?

If not, would that be feasible, and a good idea?

-Liam
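The workflow Liam describes could be sketched roughly as follows. This is an illustration only: it models a reference as a plain dictionary from property ID to a list of values, which is a deliberate simplification and not the actual Wikibase JSON; the function names are hypothetical; and it assumes the snapshot timestamp arrives in the 14-digit form the Wayback Machine uses. Obtaining the snapshot and writing the edit back to Wikidata are left out.

```python
from datetime import datetime
from typing import Dict, List

# Property IDs as named in the message above.
REFERENCE_URL = "P854"
ARCHIVE_URL = "P1065"
ARCHIVE_DATE = "P2960"

# Simplified stand-in for a reference: property ID -> list of values.
Reference = Dict[str, List[str]]


def needs_archiving(reference: Reference) -> bool:
    """True if the reference has a reference URL but no archive URL yet."""
    return bool(reference.get(REFERENCE_URL)) and not reference.get(ARCHIVE_URL)


def wayback_timestamp_to_date(timestamp: str) -> str:
    """Convert a 14-digit snapshot timestamp (YYYYMMDDhhmmss) to an ISO date."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S").date().isoformat()


def with_archive(reference: Reference, snapshot_url: str, timestamp: str) -> Reference:
    """Return a copy of the reference with archive URL and archive date added."""
    updated = {prop: list(values) for prop, values in reference.items()}
    updated[ARCHIVE_URL] = [snapshot_url]
    updated[ARCHIVE_DATE] = [wayback_timestamp_to_date(timestamp)]
    return updated
```

Returning a copy rather than mutating the input keeps the original reference available for comparison before anything is saved.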

Lucas Werkmeister

1:43 a.m.

I believe that’s what InternetArchiveBot <https://www.wikidata.org/wiki/User:InternetArchiveBot> is for, but apparently at the moment it only updates references when they break; at least, that’s how Ymblanter summarized the request for permissions <https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/InternetArchiveBot>. Also, its contributions <https://www.wikidata.org/wiki/Special:Contributions/InternetArchiveBot> show no edits since October 29; I’m not sure if that’s a problem.

Cheers, Lucas


Federico Leva (Nemo)

8:49 a.m.

Let's not spread publisher FUD on Wikimedia lists, please.

Liam Wyatt, 13/11/19 02:03:

> is there any bot that is systematically going around and collecting new URLs that are added with the Reference URL property (P854), adding them to Internet Archive,

As long as MediaWiki is not broken, and the URLs are announced on the expected venues (I believe https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams#API), yes, the Internet Archive immediately crawls them. This is documented at https://www.mediawiki.org/wiki/Archived_Pages (sort of).

Federico
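To give a feel for what consuming such an announcement feed involves, here is a small sketch that parses server-sent events and keeps Wikidata edits. It assumes the public recent-changes stream URL and the `wiki` and `type` fields of its events; the thread does not say which stream the Internet Archive actually reads, so treat all of these as assumptions. The parser works on any iterable of text lines, so it can be tried without a network connection.

```python
import json
from typing import Iterable, Iterator

# Assumed stream URL (not confirmed by this thread).
STREAM_URL = "https://stream.wikimedia.org/v2/stream/recentchange"


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each complete server-sent event.

    Only "data:" fields are collected; "event:", "id:" and comment lines
    are ignored. A blank line ends an event.
    """
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            data_parts.append(line[len("data:"):].lstrip(" "))
        elif line == "" and data_parts:
            yield json.loads("\n".join(data_parts))
            data_parts = []


def wikidata_edits(events: Iterable[dict]) -> Iterator[dict]:
    """Keep only edit events that come from Wikidata."""
    for event in events:
        if event.get("wiki") == "wikidatawiki" and event.get("type") == "edit":
            yield event
```

An edit event alone does not say which URLs were added; a real consumer would still have to inspect the revision to find new reference URLs.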

Samuel Klein

1:21 p.m.

On Wed, Nov 13, 2019 at 4:07 AM Federico Leva (Nemo) <nemowiki@gmail.com> wrote:

> Let's not spread publisher FUD on Wikimedia lists, please.

I was about to say the same thing... else the lists will be taken over by FUD/BCH wars: "we may also keep in mind that 'limiting the availability, sourcing, and reusability of a claim' may not be sustainable reliability-wise, as sources that impose those limits are increasingly being challenged as biased, unaccountable, and unreliable."

> Liam Wyatt, 13/11/19 02:03:
>
>> is there any bot that is systematically going around and collecting new URLs that are added with the Reference URL property (P854), adding them to Internet Archive,
>
> As long as MediaWiki is not broken, and the URLs are announced on the expected venues (I believe https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams#API), yes, the Internet Archive immediately crawls them. This is documented at https://www.mediawiki.org/wiki/Archived_Pages (sort of).

How complete is the coverage now? /S

Houcemeddine A. Turki

1:53 p.m.

Dear all,

Thank you for your answers. I highly recommend looking at https://meta.wikimedia.org/wiki/Talk:Wikidata/Strategy/2019#Feedback_and_que... which deals in part with this important topic. The problem is that, when using a newspaper article, we cannot be sure whether the article is neutral. This has caused significant problems in Wikipedia, and we do not want to reproduce them in Wikidata.

Yours sincerely,
Houcemeddine Turki
