Google can certainly index our beloved, well-behaved, text- and context-rich, low-bandwidth sites.*
The fact that this happens with Google's index and not with other indexes implies it's within their control.
If you're getting boilerplate responses about SEO, you may not be talking to the people who care or can resolve this.

I wonder if we can make this easier for indexers to understand and address by 
a) maintaining an index of essential free knowledge
  -- a star catalog of sites in the constellation: including our core sites, MDwiki, &c,
  -- pointers for each to a sitemap or equivalent, and a change-feed or equivalent
b) maintaining visualizations of indexing speed and coverage, via spot checks (rough sketch below)
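
To make (b) concrete, here is a minimal sketch of one spot check, assuming a per-site sitemap with <lastmod> entries: pull the URLs changed in a recent window and hand each to a coverage check. The sitemap URL and is_indexed() below are hypothetical stand-ins (the real check might be Search Console data or manual sampling), not an existing tool.

# Rough sketch of an index-coverage spot check.
# Assumptions: the sitemap URL is a placeholder, and is_indexed() is a
# hypothetical hook for whatever coverage check we settle on.
import datetime
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def recently_changed_urls(sitemap_url, window_days=7):
    """Return sitemap URLs whose <lastmod> falls within the last window_days."""
    with urllib.request.urlopen(sitemap_url) as resp:
        tree = ET.parse(resp)
    cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=window_days)
    urls = []
    for entry in tree.iter(SITEMAP_NS + "url"):
        loc = entry.findtext(SITEMAP_NS + "loc")
        lastmod = entry.findtext(SITEMAP_NS + "lastmod")
        if not (loc and lastmod):
            continue
        changed = datetime.datetime.fromisoformat(lastmod.replace("Z", "+00:00"))
        if changed.tzinfo is None:  # date-only lastmod values carry no timezone
            changed = changed.replace(tzinfo=datetime.timezone.utc)
        if changed >= cutoff:
            urls.append(loc)
    return urls

def is_indexed(url):
    """Placeholder: swap in a real coverage check (Search Console export, manual sampling, &c.)."""
    raise NotImplementedError

if __name__ == "__main__":
    sample = recently_changed_urls("https://example.org/sitemap.xml")  # hypothetical sitemap
    print(f"{len(sample)} pages changed in the last week; spot-check a sample for coverage.")

Running something like this periodically for each site in the catalog, and plotting the indexed fraction over time, would give the coverage visualization with very little machinery.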

SJ

* Jorge wrote: "we don’t have any influence or can decide what Google indexes..." -- we seem to have a good deal of soft influence.
"...or where Wikimedia content ranks in their search" -- as I understand it, this isn't about search rank at all.  It's about being able to find newly added knowledge, that doesn't exist anywhere else online, in a range of languages.  (asking about search rank may rightly trigger a boilerplate immune response)
** Scholar and Patents have their own feeds they prioritize; this could be a similar carve-out of attention. The sitemaps don't need to be accessible to "any spider on the web" (if this is why we turned them off). Something that only shows pages created or changed in the last window would also suffice.
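
On the "last window" point: the standard MediaWiki API already exposes roughly this via list=recentchanges, so a windowed feed needn't be a new sitemap at all. A minimal sketch, where the endpoint, namespace filter, and window size are only examples:

# Minimal sketch: list pages created or edited in the last window on one wiki,
# using the standard MediaWiki API (list=recentchanges). Endpoint and window
# size are examples, not a recommendation.
import datetime
import json
import urllib.parse
import urllib.request

def pages_changed_recently(api_url="https://en.wikipedia.org/w/api.php",
                           hours=24, limit=500):
    """Return titles of main-namespace pages created or edited in the last `hours`."""
    cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(hours=hours)
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctype": "new|edit",
        "rcnamespace": "0",                                # article namespace only
        "rcprop": "title|timestamp",
        "rclimit": str(limit),
        "rcend": cutoff.strftime("%Y-%m-%dT%H:%M:%SZ"),    # stop listing at the cutoff
        "format": "json",
    }
    req = urllib.request.Request(
        api_url + "?" + urllib.parse.urlencode(params),
        headers={"User-Agent": "index-coverage-spot-check/0.1 (example)"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return sorted({rc["title"] for rc in data["query"]["recentchanges"]})

if __name__ == "__main__":
    titles = pages_changed_recently()
    print(f"{len(titles)} distinct pages changed in the last 24 hours")

A carve-out of attention could consume exactly this kind of windowed feed, without exposing full sitemaps to every spider on the web.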



On Fri, Jan 19, 2024 at 10:46 AM Michael Snow <wikipedia@frontier.com> wrote:
I realize SEO has its own jargon, but to those not immersed in the field
it is completely tautological to say a page is not indexed because "the
indexing process determines that the page is unlikely to be requested in
search." In an open-ended search, you aren't necessarily requesting a
specific page, you're only asking the search engine to point you to
pages that will hopefully be relevant to your query. It would be more
honest and straightforward for Google to say that "based on our
knowledge of what people search for, your page would appear so rarely
among the highest-ranked results that we're not going to bother
including it in our index."

--Michael Snow

On 1/19/2024 4:55 AM, nperry@wikimedia.org wrote:
> Hi everyone,
>
> I am Nicholas Perry, Senior Manager of Strategic Partnerships at WMF. Following up on Jorge's previous email to add a summary of Google's recent response to this issue, which was originally shared by Suman on this Phabricator ticket: https://phabricator.wikimedia.org/T325607.
>
> ----
> The web is really large and the search index can simply not include every single page. A page that otherwise has no problems may not be indexed for a myriad of complex reasons, for instance if the indexing process determines that the page is unlikely to be requested in search. This is in line with the Search Central documentation that states: "Google doesn't guarantee that it will crawl, index, or serve your page, even if your page follows the Google Search Essentials."
> ----
>
> Google also shared a document containing resource links, which can be found in the Phabricator ticket. They also encouraged people to submit any questions and attend their SEO Office Hours (https://developers.google.com/search/help/office-hours), with the caveat that Google might not be able to answer all questions in a given instance.
>
> Best,
>
> Nicholas
_______________________________________________
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/N2DTL2NU377YCEGAQAVRF7EPCGB76OAB/
To unsubscribe send an email to wikimedia-l-leave@lists.wikimedia.org


--
Samuel Klein          @metasj           w:user:sj          +1 617 529 4266