*What would it take to produce such a feed?*
A real-time feed may or may not be the best idea, for several reasons:
- One issue is that every edit would have to be examined not only for
external links, but for external links that were not present previously
(see the sketch after this list). Doing this in real time could cause
slowdowns or additional load on the servers - keep in mind that we would
have to scan external links on all edits across all Wikipedias; counted
together, this would result in a very, very busy feed towards IA.
- Sometimes added links are spam or otherwise unacceptable, which means
they may be removed soon after being added. In such a case one would
prefer not to have them archived, since that would be a waste of time
and work for IA.
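
For illustration, a minimal Python sketch of that per-edit check,
assuming plain wikitext input and a deliberately simple URL pattern (the
revision texts below are made up):

    # Hypothetical sketch of the per-edit check described above: pull the
    # URLs out of the old and new revision text, keep only the added ones.
    import re

    URL_RE = re.compile(r"https?://[^\s\]<>\"']+")  # simple URL pattern

    def new_external_links(old_text, new_text):
        """Return links present in the new revision but not in the old one."""
        return set(URL_RE.findall(new_text)) - set(URL_RE.findall(old_text))

    old = "Sources: [http://example.org/a Example A]"
    new = "Sources: [http://example.org/a Example A] [http://example.net/b B]"
    print(new_external_links(old, new))  # {'http://example.net/b'}

Even this cheap comparison would have to run on every single edit, which
is why doing it in real time across all projects is the concern here.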
An alternate solution could be forwarding a list of new links every day.
The Wikimedia database schema seems to suggest that all external links
are stored in a separate table in the database (and I presume this
includes links in reference tags). I wonder if it would be possible to
dump this entire table for IA, and afterwards send them incremental
changes (once a day perhaps?). That way they would always have a
complete list of the external links used by Wikipedia, and it would
reduce the problems with performance hits, spam and links no longer
used. If we only forwarded a feed with NEW links, IA might end up with a
long list of links which are removed over time. And above everything -
the external links table is simply a database table, which should be
incredibly easy for IA to read and process, without custom coding
required to read and store a feed.
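
As an illustration of how small that incremental step could be, here is
a minimal Python sketch that diffs two daily dumps of such a table; the
file names and the one-URL-per-line format are assumptions for the
example, not anything Wikimedia actually produces:

    # Hypothetical sketch of the daily incremental step: diff two dumps
    # of the external links table. File names and the one-URL-per-line
    # format are assumptions, not an existing Wikimedia process.

    def load_links(path):
        """Read one dump into a set of URLs."""
        with open(path, encoding="utf-8") as f:
            return {line.strip() for line in f if line.strip()}

    yesterday = load_links("externallinks-2010-08-27.txt")
    today = load_links("externallinks-2010-08-28.txt")

    added = today - yesterday    # links IA has not crawled yet
    removed = yesterday - today  # spam or dead links IA can deprioritize

    with open("externallinks-delta.txt", "w", encoding="utf-8") as out:
        out.writelines("+" + url + "\n" for url in sorted(added))
        out.writelines("-" + url + "\n" for url in sorted(removed))

Shipping both additions and removals is what keeps IA's copy from
accumulating the spam and dead links mentioned above.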
But perhaps the people at the tech mailing list have another / better
idea on how this should work :)
~Excirial
On Sat, Aug 28, 2010 at 9:48 AM, Samuel Klein <meta.sj(a)gmail.com> wrote:
Gordon @ IA was most friendly and helpful. archive-it is a subscription
service for focused collections of sites; he had a different idea better
suited to our work.
Gordon writes:
Now, given the importance of Wikipedia and the editorial significance of
the things it outlinks to, perhaps we could set up something specially
focused on its content (and the de facto stream of newly-occurring
outlinks), that would require no conscious effort by editors but greatly
increase the odds that anything linked from Wikipedia would (a few
months down the line) also be in our Archive. Is there (or could there
be) a feed of all outlinks that IA could crawl almost nonstop?
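
As an illustration only, a minimal Python sketch of how such a feed
could be approximated by polling the public MediaWiki API
(list=recentchanges plus prop=extlinks); note that it reports every link
on an edited page, not just the newly added ones, which is exactly the
filtering problem discussed above:

    # Hypothetical sketch of an outlink feed, built only from the public
    # MediaWiki API. One wiki is used here purely for illustration.
    import json
    import urllib.parse
    import urllib.request

    API = "https://en.wikipedia.org/w/api.php"

    def api_get(params):
        url = API + "?" + urllib.parse.urlencode(dict(params, format="json"))
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    def recent_outlinks():
        """Yield the external links of recently edited pages."""
        rc = api_get({"action": "query", "list": "recentchanges",
                      "rctype": "edit|new", "rclimit": "50"})
        for change in rc["query"]["recentchanges"]:
            pages = api_get({"action": "query", "prop": "extlinks",
                             "titles": change["title"], "ellimit": "max"})
            for page in pages["query"]["pages"].values():
                for link in page.get("extlinks", []):
                    yield link["*"]

    for url in recent_outlinks():
        print(url)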
That sounds excellent to me, if possible (and I think close to what
emijrp had in mind!) What would it take to produce such a feed?
SJ
PS - An aside: IA's policies include taking down any links on request,
so this would not be a foolproof archive, but a 99% one.
On Tue, Aug 24, 2010 at 9:13 PM, Samuel Klein <meta.sj(a)gmail.com> wrote:
I've asked Gordon Mohr @ IA about how to work with archive-it. I will
cc: this thread on any response.
cc: this thread on any response.
SJ
On Tue, Aug 24, 2010 at 8:56 PM, George Herbert
<george.herbert(a)gmail.com> wrote:
On Tue, Aug 24, 2010 at 5:48 PM, Samuel Klein <meta.sj(a)gmail.com> wrote:
>> Here's the Archive's on-demand service:
>>
>> http://archive-it.org
>>
>> That would be the most reliable way to set up the partnership emijrp
>> proposes. And it's certainly a good idea. Figuring out how to make
>> it work for almost all editors and make it spam-proof may be
>> interesting.
>>
>> SJ
>>
>>
>>
>> On Tue, Aug 24, 2010 at 8:45 PM, Ray Saintonge <saintonge(a)telus.net> wrote:
>>> David Gerard wrote:
>>>> On 24 August 2010 14:57, emijrp <emijrp(a)gmail.com> wrote:
>>>>
>>>>> I want to make a proposal about external links preservation. Many
>>>>> times, when you check an external link or a link reference, the
>>>>> website is dead or offline. These websites are important, because
>>>>> they are the sources for the facts shown in the articles. Internet
>>>>> Archive searches for interesting websites to save on their hard
>>>>> disks, so we can send them our external links sql tables (all
>>>>> projects and languages of course). They improve their database and
>>>>> we always have a copy of the sources' text to check when needed.
>>>>> I think that this can be a cool partnership.
>>>>>
>>>> +1
>>>>
>>>>
>>> Are people who clean up dead links taking the time to check Internet
>>> Archive to see if the page in question is there?
>>>
>>>
>>> Ec
>>>
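
Such a check can be scripted against the Wayback Machine's public
availability API; a minimal Python sketch, with a made-up dead URL:

    # Hypothetical sketch: ask the Wayback Machine's availability API
    # whether an archived copy of a (dead) link exists.
    import json
    import urllib.parse
    import urllib.request

    def wayback_snapshot(url):
        """Return the closest archived copy of a URL, or None."""
        query = ("https://archive.org/wayback/available?url="
                 + urllib.parse.quote(url, safe=""))
        with urllib.request.urlopen(query) as resp:
            data = json.load(resp)
        closest = data.get("archived_snapshots", {}).get("closest")
        return closest["url"] if closest else None

    print(wayback_snapshot("http://example.org/dead-page"))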
--
Samuel Klein identi.ca:sj w:user:sj
I actually proposed some form of Wikimedia / IArchive link
collaboration some years ago to a friend who worked there at the time;
however, they left shortly afterwards.
I like SJ's particular idea. Who has current contacts with Brewster
Kahle or someone else over there?
--
-george william herbert
george.herbert(a)gmail.com
--
Samuel Klein identi.ca:sj w:user:sj