Hello all,
Apologies for cross-posting.
For those who have not noticed until now: Google has not been indexing any Wikisource language edition for the last couple of years. In practice, this means that Wikisource content in any language created during those years is not searchable on Google and hence remains largely invisible on the web.
This is an extremely demotivating and frustrating situation for existing Wikisource volunteers to witness, as it drains away all our past and current efforts to attract and retain viewers, readers, GLAM partners and potential new editors. Awareness and visibility of Wikisource among general internet users are already very low because of the lack of organized support in recent years, and invisibility on Google's search engine could become the last nail in our coffin unless it is fixed soon.
There is a Phabricator ticket raised by Darwinius back in December 2022 - https://phabricator.wikimedia.org/T325607.
Can't the sysadmins and WMF prioritize this issue and work on it? Wikisource is still a Wikimedia sister project and it needs some very basic care, after all.
Regards, Bodhisattwa (Bengali Wikisource volunteer)
Based on the comments on the task (and from personally looking at the associated data), this is not a one-click fix and will require community involvement.
My current assumption is that many pages are not getting indexed because we do not have enough wikilinks between pages across Wikisource (leading to orphans and walled gardens). Even the pages that do have incoming wikilinks are often linked only from list-style pages (contents pages, categories and such), which Google does not seem to traverse (I assume due to some internal criteria that rate the quality of a page based on its content).
One potential way to mitigate this (which I suggested in the task) would be to implement a mechanism similar to navboxes and "See also" sections on Wikipedia.
Another potential way to go about this could be to make Author: pages less list-like by adding some kind of summary of each linked work (this is based on looking at the content on en and bn Wikisource).
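The orphan and walled-garden assumption above can be checked mechanically on a link graph. The following is only a minimal Python sketch, not a description of how Google or MediaWiki works: the page titles, the `links` mapping and both function names are hypothetical, and a real check would need the link table of an actual Wikisource.

```python
from collections import deque


def find_orphans(links, roots=()):
    """Return pages that no other page links to (entry points in `roots` are ignored)."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    # A self-link does not count as an incoming link.
    linked = {t for src, targets in links.items() for t in targets if t != src}
    return sorted(pages - linked - set(roots))


def reachable_from(links, start):
    """Breadth-first traversal: every page a crawler could reach from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical toy graph: page title -> titles it links to.
links = {
    "Main Page": ["Author:A"],
    "Author:A": ["Work 1", "Work 2"],
    "Work 1": [],
    "Work 2": [],
    "Work 3": ["Work 4"],  # walled garden: Work 3 and Work 4 only link to each other
    "Work 4": ["Work 3"],
    "Work 5": [],          # orphan: nothing links here
}

orphans = find_orphans(links, roots=("Main Page",))
unreachable = sorted(set(links) - reachable_from(links, "Main Page"))
print(orphans)
print(unreachable)
```

In this toy graph the walled garden is not reported as orphaned, because its two pages link to each other, yet a crawler starting from the main page still never reaches it. That is why the sketch computes reachability separately from the orphan list.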
Regards, Sohom Datta --- Open-source contributor @Wikimedia
Wikisource-l mailing list -- wikisource-l@lists.wikimedia.org To unsubscribe send an email to wikisource-l-leave@lists.wikimedia.org
To be indexed better, you should follow a structured page layout as much as possible.
The more gadgets you use, the less readable the pages are to machines.
Try having a blind person read the Wikisource pages with their own tools. Will it be difficult for them? Then you have found the problem.
You want to create pages that are more user-friendly? But you do it for people who have eyes. Machines don't have eyes.
Kind regards, Ilario
Thanks Ilario. That's very good general advice, but would you have more concrete and specific suggestions? For instance, what would you change on https://en.wikisource.org/wiki/Baltimore_American/Volume_192/Issue_34,925/Tr... or https://en.wikisource.org/wiki/The_Poetical_Works_of_William_Motherwell/The_... ?
For everyone: also, the problem is not *better* indexing (that would be great, of course); the problem here is that a lot of (if not most) Wikisource pages are *not* indexed! If you type "when men went up in balloons" or "The banners rustle in the breeze" into Google (both taken from the previous examples), Wikisource doesn't appear at all, even though the first example was done last week and the second was created in February! (For other examples, when Wikisource does appear, the results are incomplete and show up with a delay of at least several months.) That's why some of us suspect that something is wrong somewhere (maybe because of the proofread extension?) and that Google doesn't "see" our pages.
Cheers, Nicolas
Nicolas VIGNERON wrote:
> That's why some of us suspect something is wrong somewhere (maybe because of the proofread extension?) and that Google doesn't "see" our pages.
I've tested a bunch of non-indexed pages (some of which date from 2011) across multiple Wikisources, and each one shows up as "URL is available to Google" (indicating that Googlebot can see the page). I am pretty sure that there is no issue with this specific thing on the ProofreadPage/Wikisource extension side (could there be one on Google's end?).
Ilario wrote:
> The more gadgets you use, the less readable the pages are to machines.
> Try having a blind person read the Wikisource pages with their own tools. Will it be difficult for them? Then you have found the problem.
> You want to create pages that are more user-friendly? But you do it for people who have eyes. Machines don't have eyes.
I agree with this in principle, but Wikisource pages are readable by default without any JavaScript, and testing with Googlebot (in the Search Console) shows that it is able to read the associated content.
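A rough local approximation of this "readable without JavaScript" check is to look only at the static HTML of a page. The sketch below is an assumption-laden illustration in Python, not a replica of what the Search Console does: the sample markup, the class name and `inspect_html` are hypothetical, and it only looks at the standard robots meta tag (it does not cover HTTP headers such as X-Robots-Tag).

```python
from html.parser import HTMLParser


class IndexabilityParser(HTMLParser):
    """Collects visible text and notes a robots meta tag that contains 'noindex'."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            if "noindex" in (attrs.get("content") or "").lower():
                self.noindex = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text inside <script>/<style> is not page content, so skip it.
        if not self._skip_depth:
            self.text_parts.append(data)


def inspect_html(html, phrase):
    """Report whether `phrase` is present in the static HTML and whether noindex is set."""
    parser = IndexabilityParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so line breaks in the markup do not matter.
    text = " ".join(" ".join(parser.text_parts).split())
    return {
        "noindex": parser.noindex,
        "phrase_in_static_html": phrase.lower() in text.lower(),
    }


# Hypothetical sample markup; a real check would use the HTML returned by the server.
sample = """<html><head><meta name="robots" content="max-image-preview:standard">
<script>var x = "hidden";</script></head>
<body><p>The banners rustle in the   breeze</p></body></html>"""

result = inspect_html(sample, "The banners rustle in the breeze")
print(result)
```

If the phrase is found and no noindex directive is present, the static page at least gives a crawler nothing to stumble over, which matches the observation that the pages are visible to Googlebot but still not indexed.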
> I think this is a good opportunity to discuss with Google's Search Team here in Singapore in 2 weeks' time.
This would definitely be great :)
Regards, Sohom Datta --- Open-source contributor @Wikimedia
Hi all, from what I've read, it suffices to generate a sitemap file with MediaWiki and submit it to Google. There is a script for that: generateSitemap.php. Once that is done, the sitemap has to be updated regularly so that it includes new pages.
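For readers unfamiliar with the format, a sitemap is just an XML list of page URLs, optionally with last-modification dates, following the sitemaps.org protocol. The Python sketch below only illustrates that file format; it is not how generateSitemap.php works internally, and the function name, URLs and dates are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (url, lastmod) pairs; lastmod is 'YYYY-MM-DD' or None."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url, lastmod in entries:
        node = ET.SubElement(urlset, "url")
        ET.SubElement(node, "loc").text = url
        if lastmod:
            ET.SubElement(node, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages on a placeholder host, not real Wikisource titles.
entries = [
    ("https://wiki.example.org/wiki/Work_1", "2023-07-25"),
    ("https://wiki.example.org/wiki/Work_2", None),
]

sitemap_xml = build_sitemap(entries)
print(sitemap_xml)
```

Regenerating the file on a schedule is what keeps newly created pages in the list, which is the "updated regularly" step mentioned above.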
If it is more complicated, I hope that speaking directly to people in Singapore can solve the matter. Cheers, A. *Ruthven* on Wikipedia
On Tue, 1 Aug 2023 at 12:03, Sohom Datta dattasohom1@gmail.com wrote:
> I've tested a bunch of non-indexed pages (some of which date from 2011) across multiple Wikisources, and each one shows up as "URL is available to Google" (indicating that Googlebot can see the page). I am pretty sure that there is no issue with this specific thing on the ProofreadPage/Wikisource extension side (could there be one on Google's end?).
Strange. What does Googlebot index if *nothing* appears on the Google search engine?
I'll be at Wikimania in Singapore but not at the Google-Mind The Gap event.
Cheers,
Nicolas
On 2023-08-01 11:47, Nicolas VIGNERON wrote:
> Plus, the problem is not *better* indexing (that would be great of course), the problem here is that a lot (if not most) Wikisource pages are *not* indexed!
My unscientific impression is that Google was always reluctant to index anything in Wikisource's Page: namespace. But in the last 5-6 years, after they stopped publishing any news about Google Books, they have also very rarely indexed any of the new books that I scan and make available in Project Runeberg. Perhaps they view us as competitors, or as link farms that try to spam the web with duplicate material?