[Foundation-l] A proposal of partnership between Wikimedia Foundation and Internet Archive

Excirial wp.excirial at gmail.com
Tue Aug 24 20:13:38 UTC 2010


*It gives the impression that they are so ineffective at archiving recent
content as to be effectively irrelevant.*

You're not the only person asking that question; have a look at this FAQ
entry <http://www.archive.org/about/faqs.php#103> and this forum post
<http://www.archive.org/post/320741/large-site-with-no-entries-at-all-for-2008-2009-2010>.
To quote the FAQ specifically:

*It generally takes 6 months or more (up to 24 months) for pages to appear
in the Wayback Machine after they are collected, because of delays in
transferring material to long-term storage and indexing, or the requirements
of our collection partners.*

*In some cases, crawled content from certain projects can appear in a much
shorter timeframe — as little as a few weeks from when it was crawled. Older
material for the same pages and sites may still appear separately, months
later.*

*There is no access to files before they appear in the Wayback Machine.*

*Even at their peak they rarely archived more than a few hundred pages per
major domain per year, which still amounts to a tiny fraction of the
internet.*
Keep in mind that sub-pages are indexed separately. For example, the
Administrators noticeboard
<http://web.archive.org/web/*/http://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard>
and the blocking policy
<http://web.archive.org/web/*/http://en.wikipedia.org/wiki/Wikipedia:Blocking_policy>
are indexed at least several times a year. Also keep in mind that the
reliable sources we use rarely change their content at a later date. A news
article published in a newspaper is static, and most news articles posted
online are equally static (with one or two updates before being moved off the
main page). As such, we don't need a high update frequency; a single archived
link is often more than sufficient for referencing purposes, since we aren't
keeping a revision history for sources.

~Excirial




On Tue, Aug 24, 2010 at 9:50 PM, Robert Rohde <rarohde at gmail.com> wrote:

> Does anyone know what the status of the Internet Archive is with
> respect to being a practical ongoing concern?
>
> In the last couple years IA has added relatively little web-based content.
>
> For example, their Wayback Machine currently offers:
>
> www.nytimes.com: 11 pages since 2006
> en.wikipedia.org: 5 pages since 2008
> www.nasa.gov: 12 pages since 2008
> scienceblogs.com: 0 pages since 2008
>
> It gives the impression that they are so ineffective at archiving
> recent content as to be effectively irrelevant.  They do have a
> warning that it can take 6 or more months for newly accessed content
> to be incorporated into their database, but at this point the delay
> has been significantly more than that.  Even at their peak they rarely
> archived more than a few hundred pages per major domain per year,
> which still amounts to a tiny fraction of the internet.
>
> The idea of seeking collaborations with people that archive web
> content is a good one, but I don't know that IA is really in a
> position to be all that useful.
>
> -Robert Rohde
>
> On Tue, Aug 24, 2010 at 6:57 AM, emijrp <emijrp at gmail.com> wrote:
> > Hi all;
> >
> > I want to make a proposal about external links preservation. Many times,
> > when you check an external link or a link reference, the website is dead or
> > offline. These websites are important, because they are the sources for the
> > facts shown in the articles. The Internet Archive searches for interesting
> > websites to save to their hard disks, so we can send them our external
> > links SQL tables (all projects and languages, of course). They improve their
> > database and we always have a copy of the source text to check when needed.
> >
> > I think that this can be a cool partnership.
> >
> > Regards,
> > emijrp
> > _______________________________________________
> > foundation-l mailing list
> > foundation-l at lists.wikimedia.org
> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
> >
>

