[Foundation-l] Google Webmaster Tools and Foundation projects
Brian
Brian.Mingus at colorado.edu
Fri Nov 2 22:39:05 UTC 2007
Even if it were agreed that releasing this data does not violate anyone's
rights, one portion of Google's Terms of Service might still require
permission from Google:
> 5.5 Unless you have been specifically permitted to do so in a separate
> agreement with Google, you agree that you will not reproduce, duplicate,
> copy, sell, trade or resell the Services for any purpose.
On 11/2/07, Brian <Brian.Mingus at colorado.edu> wrote:
>
> Given that Google is the single largest contributor of traffic to all
> Wikimedia projects, it seems that the simple act of signing up to Google
> Webmaster Tools would provide, for free, a vast amount of data that Google
> is already collecting on the projects. This data is extremely interesting,
> and includes:
>
> - The exact phrases used in external links to your site
> - The top search queries used to access your site, in the following
> CSV format:
>
> > Site Information,Location,Search Type,Top search queries,Top search query clicks
> > http://en.wikipedia.org/wiki/Main_Page, (India) google.co.in,Web Search,"[wikipedia:1][universal access to knowledge:6]["the world's best encyclopedia":6]"
>
> - This includes the position of your site in the results for that
> query
> - The PageRank of all of your pages; the distribution of the
> PageRank of all of your pages; your page with the highest PageRank
> - The number of people who have subscribed to the RSS feeds on your
> site using Google products that allow this.
> - Something akin to the inverse document frequency of the words in
> your site and the words used in external links to your site, as computed by
> GoogleBot
>
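To make the query column in the sample above concrete, here is a rough Python sketch that splits the bracketed field into (query, number) pairs. I am assuming every entry has the `[query:number]` shape shown in the sample row; `parse_queries` is just a name I made up, and I am not asserting what the number after the colon means.

```python
import re

# One "[query:number]" entry. The lazy group lets a query contain colons,
# because the match only ends at a colon followed by digits and "]".
ENTRY = re.compile(r"\[(.+?):(\d+)\]")


def parse_queries(field):
    """Return (query, number) pairs from a bracketed top-queries field."""
    pairs = []
    for query, number in ENTRY.findall(field):
        # Phrase queries are wrapped in double quotes in the sample; drop them.
        pairs.append((query.strip('"'), int(number)))
    return pairs


# The query field copied from the sample row in the original message.
sample = '[wikipedia:1][universal access to knowledge:6]["the world\'s best encyclopedia":6]'

for query, number in parse_queries(sample):
    print(number, query)
```

I parse the field on its own rather than the whole row, because the sample row has unescaped double quotes inside a quoted field and a standard CSV reader may not split it the way the export intends.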
> How this would work:
>
> 1. A Wikimedia representative creates a Google account for the
> Foundation
> 2. Each language version of each project is added. This may be a
> one-time, labor-intensive process, or it might be more straightforward. I
> have no way of testing this right now.
> 3. Click the "Download data for all sites" button.
> 4. Profit ;)
>
> It seems that releasing this data does not violate the Foundation privacy
> policy. The search query data is collected under an agreement between the
> user and Google, before they ever enter the domain of a Foundation project.
> That point aside, there are no unique identifiers that link one user to a
> set of queries, only a relationship between a set of queries and a country.
> There is no specific information on when a query was performed.
>
> The community will no doubt come up with interesting visualizations and
> applications of this data. Articles that are receiving a relatively high
> amount of traffic but are of a relatively low level of quality can be
> targeted for improvement, for example.
>
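As a rough illustration of the prioritization idea above, here is a small Python sketch. It assumes we already have a click count per article (derived somehow from the query data) and a community quality rating per article. Both fields, the sample records, and the names `improvement_candidates`, `clicks` and `quality` are invented for the example; nothing here comes from Google's export.

```python
def improvement_candidates(articles, max_quality=2, top_n=2):
    """Return the most-visited articles whose quality is at or below max_quality."""
    low_quality = [a for a in articles if a["quality"] <= max_quality]
    # Highest traffic first, so the most visible weak articles come out on top.
    ranked = sorted(low_quality, key=lambda a: a["clicks"], reverse=True)
    return ranked[:top_n]


# Made-up records purely for illustration (quality on an invented 1-5 scale).
articles = [
    {"title": "Article A", "clicks": 900, "quality": 1},
    {"title": "Article B", "clicks": 1500, "quality": 4},
    {"title": "Article C", "clicks": 300, "quality": 2},
    {"title": "Article D", "clicks": 1200, "quality": 2},
]

for article in improvement_candidates(articles):
    print(article["title"], article["clicks"])
```

With these invented numbers the sketch picks Article D and then Article A; Article B gets more traffic but is filtered out because its rating is already high.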
> I would volunteer to automate as much of this as possible, including
> downloading the data at certain intervals.
>
> Please discuss! :)
> /Brian
>