* It gives the impression that they are so ineffective at archiving recent
content as to be effectively irrelevant.*
You're not the only person asking that question; have a look at this FAQ
entry <http://www.archive.org/about/faqs.php#103> and this forum post
<http://www.archive.org/post/320741/large-site-with-no-entries-at-al…10>.
To specifically quote the FAQ:
* It generally takes 6 months or more (up to 24 months) for pages to appear
in the Wayback Machine after they are collected, because of delays in
transferring material to long-term storage and indexing, or the requirements
of our collection partners. *
*In some cases, crawled content from certain projects can appear in a much
shorter timeframe — as little as a few weeks from when it was crawled. Older
material for the same pages and sites may still appear separately, months
later. *
*There is no access to files before they appear in the Wayback Machine.*
* Even at their peak they rarely archived more than a few hundred pages per
major domain per year, which still amounts to a tiny fraction of the
internet*
Keep in mind that sub-pages are indexed separately. For example, the
Administrators' noticeboard
<http://web.archive.org/web/*/http://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard>
and the blocking policy
<http://web.archive.org/web/*/http://en.wikipedia.org/wiki/Wikiped…>
are indexed at least several times a year. Equally, keep in mind that the
reliable sources we use rarely change their content at a later date. A news
article published in a newspaper is static, and most posted news articles
are equally static (with one or two updates before being moved from the main
page). As such, we don't need frequent updates: a single backup link is
often more than sufficient for referencing purposes, since we aren't
keeping a revision history for sources.
~Excirial
On Tue, Aug 24, 2010 at 9:50 PM, Robert Rohde <rarohde(a)gmail.com> wrote:
Does anyone know what the status of the Internet Archive is with
respect to being a practical ongoing concern?
In the last couple years IA has added relatively little web-based content.
For example, their Wayback Machine currently offers:
www.nytimes.com: 11 pages since 2006
en.wikipedia.org: 5 pages since 2008
www.nasa.gov: 12 pages since 2008
scienceblogs.com: 0 pages since 2008
It gives the impression that they are so ineffective at archiving
recent content as to be effectively irrelevant. They do have a
warning that it can take 6 or more months for newly accessed content
to be incorporated into their database, but at this point the delay
has been significantly more than that. Even at their peak they rarely
archived more than a few hundred pages per major domain per year,
which still amounts to a tiny fraction of the internet.
The idea of seeking collaborations with people that archive web
content is a good one, but I don't know that IA is really in a
position to be all that useful.
-Robert Rohde
On Tue, Aug 24, 2010 at 6:57 AM, emijrp <emijrp(a)gmail.com> wrote:
Hi all;
I want to make a proposal about external link preservation. Many times,
when you check an external link or a link reference, the website is dead
or offline. These websites are important, because they are the sources for
the facts shown in the articles. The Internet Archive searches for
interesting websites to save on their hard disks, so we can send them our
external links SQL tables (all projects and languages, of course). They
improve their database and we always have a copy of the source text to
check when needed.
I think that this can be a cool partnership.
Regards,
emijrp
_______________________________________________
foundation-l mailing list
foundation-l(a)lists.wikimedia.org
Unsubscribe:
https://lists.wikimedia.org/mailman/listinfo/foundation-l