Dario Taraborelli <dtaraborelli@...> writes:
What Greg said: Common Crawl is an excellent data source to answer these
questions. See:

links-with-open-data/

for aggregate stats about referrals to individual articles, by traffic and
aggregated at domain level. You may also be interested in this dataset:

http://figshare.com/articles/Wikipedia_Clickstream/1305770
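As a rough illustration of how that clickstream dataset can answer "how many external pages send readers to article X", here is a small aggregation sketch. The real dump is a TSV file; the (prev, curr, n) row layout and the `other-*` bucket names used below are assumptions drawn from the 2015 release, so check the column order of the file you actually download.

```python
# Sketch: sum external referrals per article from clickstream-style rows.
# Assumed layout: (prev, curr, n), where `prev` is either another article
# title or an aggregate external bucket like 'other-google'.
from collections import defaultdict

def external_referrals(rows):
    """Sum counts per article for referrers outside Wikipedia."""
    totals = defaultdict(int)
    for prev, curr, n in rows:
        if prev.startswith("other-"):  # external/aggregate buckets only
            totals[curr] += n
    return dict(totals)

sample = [
    ("other-google", "London", 1000),
    ("other-empty", "London", 250),
    ("Paris", "London", 40),          # internal wikilink, ignored
    ("other-google", "Paris", 700),
]
print(external_referrals(sample))  # {'London': 1250, 'Paris': 700}
```

Note this counts referral *traffic*, not distinct referring pages, which is a different (and arguably more useful) signal than a raw inlink count.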
> On Dec 2, 2015, at 8:06 AM, Greg Lindahl <lindahl <at> pbm.com> wrote:
>
> On Tue, Dec 01, 2015 at 07:50:23PM +0100, Federico Leva (Nemo) wrote:
>> Edison Nica, 29/11/2015 16:56:
>>> how many non-wikipedia pages point to a certain wikipedia page
>>
>> I guess the only way we have to know this (other than grepping
>> request logs for referrers, which would be quite a nightmare) is to
>> access the Google Webmaster account for wikipedia.org (to which a
>> couple of employees had access, IIRC).
> There are a couple of other ways to figure out inlinks:
>
> * Common Crawl
> * Commercial SEO services like Moz or Ahrefs
>
> In the medium term the Internet Archive is going to be generating this
> kind of link data as part of the Wayback Machine search engine effort.
> And finally, Edison, counting the number of inlinks without
> considering their rank or popularity will probably leave you
> vulnerable to people orchestrating googlebombs. And you might want to
> also know the anchor text; that's extremely valuable for search
> indexing.
>
> -- greg
_______________________________________________
Analytics mailing list
Analytics <at>
lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
Dario Taraborelli  Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • <at> readermeter
Thank you all for your replies, and I apologize for any improper use of the
English language (no offence intended).
I built my first Wikipedia search app a while ago. It is a test bed for my
offline search engine, and for now it contains only medical-related
information.
https://play.google.com/store/apps/details?id=com.zeropii.publish.txt.medical
(BTW, this app requires no permissions and does no tracking of what the user
is searching; and watch out, the APK is 78MB if you plan on installing it.)
I am now building the second version, which will extend to the full
Wikipedia. If everything works right, I have another 3-6 months before I
will need the analytics to improve the search.
BTW, if this is public information, what Search Engine do you use?
We use an elasticsearch cluster to power search. The full index, not
including replicas, is just under 3TB.
Do you use a custom one?
The search engine is not custom. We do use a custom MediaWiki extension to
turn user queries into elasticsearch queries.
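To make that translation step concrete, here is a minimal sketch of turning a raw user query into an elasticsearch query body. The field names (`title`, `text`) and boost values are illustrative assumptions, not the actual mapping used by Wikimedia's extension.

```python
# Sketch: translate a user query string into an elasticsearch query body.
# Field names and boosts below are hypothetical, for illustration only.
import json

def build_es_query(user_query, size=10):
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": user_query,
                # Weight title matches above body matches (assumed boosts).
                "fields": ["title^3", "text"],
            }
        },
    }

body = build_es_query("heart attack symptoms")
print(json.dumps(body, indent=2))
```

The resulting dict would be sent as the JSON body of a `_search` request; the point is that the application layer, not elasticsearch itself, decides how a free-text query maps onto fields and weights.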
Currently, no. We are working up some things now to use analytics to
generate a popularity score based on page view data to improve search. We
also have a stretch goal to calculate PageRank within wikis to replace our
current use of incoming wikilink count as part of the scoring algorithm.
> My goal is to understand if Analytics could substantially improve my
> (Wikipedia) search engine or not.
> Edisonn at 0pii dot com