* It gives the impression that they are so ineffective at archiving recent content as to be effectively irrelevant.*
You're not the only person asking that question; have a look at this FAQ entry (http://www.archive.org/about/faqs.php#103) and this forum post (http://www.archive.org/post/320741/large-site-with-no-entries-at-all-for-2008-2009-2010). To quote the FAQ specifically:
* It generally takes 6 months or more (up to 24 months) for pages to appear in the Wayback Machine after they are collected, because of delays in transferring material to long-term storage and indexing, or the requirements of our collection partners. *
*In some cases, crawled content from certain projects can appear in a much shorter timeframe — as little as a few weeks from when it was crawled. Older material for the same pages and sites may still appear separately, months later. *
*There is no access to files before they appear in the Wayback Machine. *
* Even at their peak they rarely archived more than a few hundred pages per major domain per year, which still amounts to a tiny fraction of the internet*
Keep in mind that sub-pages are indexed separately. For example, the Administrators' noticeboard (http://web.archive.org/web/*/http://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard) and the blocking policy (http://web.archive.org/web/*/http://en.wikipedia.org/wiki/Wikipedia:Blocking_policy) are indexed at least several times a year. Also keep in mind that the reliable sources we use rarely change their content at a later date. A news article published in a newspaper is static, and most news articles posted online are equally static (with one or two updates before being moved off the main page). As such, we don't need frequent re-archiving: a single back link is often more than sufficient for referencing purposes, since we aren't keeping a revision history for sources.
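To check how often a given sub-page has been captured, the listing links above can be built by prefixing the page URL with the Wayback Machine wildcard path. A minimal sketch in Python, assuming only the URL pattern visible in the two links above; the names `WAYBACK_PREFIX` and `wayback_listing_url` are made up for illustration and are not part of any Internet Archive API:

```python
# Prefix taken from the two example links above; "*" asks for all captures.
WAYBACK_PREFIX = "http://web.archive.org/web/*/"


def wayback_listing_url(page_url: str) -> str:
    """Return the capture-listing URL for one page (hypothetical helper)."""
    # Each sub-page is indexed separately, so the full page URL is appended as-is.
    return WAYBACK_PREFIX + page_url.strip()


if __name__ == "__main__":
    pages = [
        "http://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard",
        "http://en.wikipedia.org/wiki/Wikipedia:Blocking_policy",
    ]
    for page in pages:
        print(wayback_listing_url(page))
```

The same helper could be run over a list of external links to see which ones already have a copy, though this only builds the links and does not query the archive.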
~Excirial
On Tue, Aug 24, 2010 at 9:50 PM, Robert Rohde rarohde@gmail.com wrote:
Does anyone know what the status of the Internet Archive is with respect to being a practical ongoing concern?
In the last couple years IA has added relatively little web-based content.
For example, their Wayback Machine currently offers:
www.nytimes.com: 11 pages since 2006
en.wikipedia.org: 5 pages since 2008
www.nasa.gov: 12 pages since 2008
scienceblogs.com: 0 pages since 2008
It gives the impression that they are so ineffective at archiving recent content as to be effectively irrelevant. They do have a warning that it can take 6 or more months for newly accessed content to be incorporated into their database, but at this point the delay has been significantly more than that. Even at their peak they rarely archived more than a few hundred pages per major domain per year, which still amounts to a tiny fraction of the internet.
The idea of seeking collaborations with people that archive web content is a good one, but I don't know that IA is really in a position to be all that useful.
-Robert Rohde
On Tue, Aug 24, 2010 at 6:57 AM, emijrp emijrp@gmail.com wrote:
Hi all;
I want to make a proposal about external links preservation. Many times, when you check an external link or a link reference, the website is dead or offline. These websites are important, because they are the sources for the facts shown in the articles. Internet Archive searches for interesting websites to save on their hard disks, so we can send them our external links SQL tables (all projects and languages, of course). They improve their database and we always have a copy of the sources' text to check when needed.
I think that this can be a cool partnership.
Regards, emijrp
_______________________________________________
foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l