As I'm uploading and proofreading texts, I'm surprised how slow Google
is to pick up the new content. As far as I can see, there's nothing
that blocks search engines from indexing Page: or Index: pages, so
it should not be entirely necessary to transclude content into pages
in the main namespace, right?
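One way to verify that nothing blocks crawling is to test candidate
URLs against the site's robots.txt rules. Here is a minimal sketch
using Python's standard urllib.robotparser; the robots.txt content
below is a hypothetical example, not the actual sv.wikisource.org
file:

```python
# Sketch: would a given robots.txt block a Page: (Sida:) URL?
# The rules below are a hypothetical example for illustration.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Rendered pages under /wiki/ would be allowed; only the
# script path /w/ would be disallowed under these rules.
print(rp.can_fetch("*", "http://sv.wikisource.org/wiki/Sida:Example"))
print(rp.can_fetch("*", "http://sv.wikisource.org/w/index.php?title=Sida:Example"))
```

If the first call prints True, the rendered page itself is
crawlable, and any indexing delay must come from something else.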
For example, I'm googling the exact phrase "exportable effekter"
(antiquated Swedish), which has been on this page since April 3 (25 days ago):
http://sv.wikisource.org/wiki/Sida:Post-_och_Inrikes_Tidningar_1835-12-31_3…
In my experience, Google is very quick to pick up new content in
Wikipedia. I assume it tracks the recent changes page.
Is it a problem that the URL ends in ".jpg"? Would search engines
avoid or delay indexing it, assuming it was an image?
Part of my problem is that I'm proofreading entire newspapers, and
so far I have only transcluded a few articles. Maybe I should make a
main namespace page for each day's full issue. Would that help me?
When you proofread The New York Times (English) or Die Gartenlaube
(German Wikisource), do you proofread entire pages and issues, and
do you transclude everything that you proofread?
From the Index: page, the <pagelist/> generates normal HTML links
to each page, which is fine. But from the individual pages, the
links to the previous (<), next (>) and index (^) pages are created
only by JavaScript. Is this a problem for search engines? Is it
a problem for blind readers? Is there any good reason not to
generate standard HTML links for these navigation tabs?
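A crawler that does not execute JavaScript sees only the <a href>
links present in the static HTML. The sketch below illustrates this
with Python's standard html.parser; the sample markup is a made-up
stand-in, not actual Wikisource output:

```python
# Sketch: a non-JS crawler only sees links in the static markup.
# The sample HTML below is hypothetical, for illustration only.
from html.parser import HTMLParser

STATIC_HTML = """
<div id="pagelist">
  <a href="/wiki/Sida:Example_1">1</a>
  <a href="/wiki/Sida:Example_2">2</a>
</div>
<div id="nav"><!-- prev/next/index links inserted by JavaScript --></div>
<script>
  // document.getElementById('nav').innerHTML =
  //   '<a href="/wiki/Sida:Example_0">&lt;</a> ...';
</script>
"""

class LinkCollector(HTMLParser):
    """Collects every href found in the static markup."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)

parser = LinkCollector()
parser.feed(STATIC_HTML)
# Only the two <pagelist/>-style links are found; the JS-inserted
# navigation links never appear in the static HTML.
print(parser.hrefs)
```

Screen readers generally do handle JavaScript these days, but
links injected client-side are still invisible to simple crawlers,
which is an argument for emitting them as standard HTML.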
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik -
http://aronsson.se