Backlinks TO Wikipedia


Edison Nica

29 Nov 2015

3:56 p.m.

Hi,

I am interested to know if Wikipedia makes public how many backlinks each page gets.

I am working on a search engine for Wikipedia and, as you would expect, it sucks.

So I tested the same searches directly on Wikipedia and, no offence, they suck even more.

So I went to Google and performed the same searches with site:wikipedia.org added, and Google was a little bit better (although not by much compared with my 1-day-development search engine).

I want to make my Wikipedia search better, and having a table that tells me how many non-Wikipedia pages point to a given Wikipedia page might improve my algorithm.

Does anyone know if Wikipedia publishes such data?

Thank you! Edison Nica Http://www.0pii.com Edisonn@0pii.com



Oliver Keyes

1 Dec 2015

6:34 p.m.

We don't, to my knowledge. For search-related ideas and queries I'd recommend checking out the Discovery team's mailing list; that's what they tend to work on.

FWIW, prefacing a statement with "no offence" normally does little beyond indicating that the speaker knew it would be offensive and wanted to disclaim it while still being rude :)


Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics

-- Oliver Keyes Count Logula Wikimedia Foundation

Federico Leva (Nemo)

6:50 p.m.

Edison Nica, 29/11/2015 16:56:


how many non-wikipedia pages point to a certain wikipedia page

I guess the only way we have to know this (other than grepping request logs for referrers, which would be quite a nightmare) is to access the Google Webmaster account for wikipedia.org (to which a couple employees had access, IIRC).
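For illustration only, the referrer-grepping approach mentioned above might look roughly like the sketch below. The log layout (requested URL and referrer separated by a tab) is an assumption made for this example and is not the actual Wikimedia request log format; `count_external_referrals` and `is_external` are hypothetical helper names.

```python
from collections import Counter
from urllib.parse import urlsplit


def is_external(referrer: str) -> bool:
    """True if the referrer is a parseable URL outside wikipedia.org."""
    host = urlsplit(referrer).hostname or ""
    return bool(host) and host != "wikipedia.org" and not host.endswith(".wikipedia.org")


def count_external_referrals(log_lines):
    """Count requests per page path that arrived from a non-Wikipedia referrer.

    Each line is assumed to be: requested URL, a tab, then the referrer
    (a hypothetical layout chosen for this sketch).
    """
    counts = Counter()
    for line in log_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # skip malformed lines
        url, referrer = parts
        if is_external(referrer):
            counts[urlsplit(url).path] += 1
    return counts


# Made-up sample lines; "-" stands for a missing referrer.
sample_log = [
    "https://en.wikipedia.org/wiki/Aspirin\thttps://example.com/health",
    "https://en.wikipedia.org/wiki/Aspirin\thttps://en.wikipedia.org/wiki/Pain",
    "https://en.wikipedia.org/wiki/Aspirin\thttps://blog.example.org/post",
    "https://en.wikipedia.org/wiki/Ibuprofen\t-",
]
referrals = count_external_referrals(sample_log)
print(referrals.most_common())
```

Note that this counts incoming clicks, not distinct linking pages: a page that links to an article but is never clicked would not show up at all, which is part of why this route is unattractive.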

Nemo

Greg Lindahl

2 Dec 2015

4:06 p.m.


There are a couple of other ways to figure out inlinks:

* Common Crawl
* Commercial SEO services like Moz or Ahrefs

In the medium term the Internet Archive is going to be generating this kind of link data as part of the Wayback Machine search engine effort.

And finally, Edison, counting the number of inlinks without considering their rank or popularity will probably leave you vulnerable to people orchestrating googlebombs. You might also want to know the anchor text, which is extremely valuable for search indexing.
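As a rough sketch of that caution: one simple, partial mitigation is to count distinct linking domains instead of raw links, and to count each anchor text at most once per domain. This is a swapped-in simplification, not a ranking of sources by popularity, and the `(source_url, target_title, anchor_text)` record layout and the `summarize_inlinks` helper are assumptions made for this example.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def summarize_inlinks(links):
    """Aggregate (source_url, target_title, anchor_text) triples per target.

    Returns {target: {"domains": int, "anchors": Counter}}, where each
    anchor text is counted at most once per linking domain.
    """
    domains = defaultdict(set)
    anchors = defaultdict(Counter)
    seen = set()
    for source_url, target, anchor in links:
        host = urlsplit(source_url).hostname
        if host is None:
            continue  # not a usable URL
        host = host.lower()
        text = anchor.strip().lower()
        domains[target].add(host)
        if (target, host, text) not in seen:
            seen.add((target, host, text))
            anchors[target][text] += 1
    return {
        target: {"domains": len(domains[target]), "anchors": anchors[target]}
        for target in domains
    }


# Hypothetical link records, e.g. extracted from a web crawl.
links = [
    ("http://spam.example/a", "Aspirin", "miracle cure"),
    ("http://spam.example/b", "Aspirin", "miracle cure"),
    ("http://spam.example/c", "Aspirin", "miracle cure"),
    ("http://health.example/drugs", "Aspirin", "Aspirin"),
    ("http://news.example/story", "Aspirin", "aspirin"),
]
summary = summarize_inlinks(links)
print(summary["Aspirin"]["domains"])
print(summary["Aspirin"]["anchors"].most_common())
```

Here the three links from a single domain count as one linking domain and one vote for its anchor text, so a single site repeating a link cannot dominate. An attacker with many domains would still get through, which is why weighting by source rank matters.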

-- greg

Dario Taraborelli

3 Dec 2015

12:21 a.m.

What Greg said: Common Crawl is an excellent data source to answer these questions, see:

http://blog.commoncrawl.org/2015/04/announcing-the-common-crawl-index/
http://blog.commoncrawl.org/2015/02/wikireverse-visualizing-reverse-links-with-open-data/

For aggregate stats about referrals to individual articles by traffic, also aggregated at the domain level, you may also be interested in this dataset:

http://figshare.com/articles/Wikipedia_Clickstream/1305770
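A minimal sketch of how per-article external referral totals could be pulled out of a clickstream-style file follows. The tab-separated column order (`prev`, `curr`, `type`, `n`) and the `external` type label are assumptions modeled on later releases of the dataset and may not match the release linked above, so check the dataset description before relying on them. The sample rows are made up.

```python
import csv
import io
from collections import Counter


def external_referrals(tsv_file):
    """Sum referral counts per article for rows whose type is 'external'.

    Assumes tab-separated rows of (prev, curr, type, n) with no header line.
    """
    totals = Counter()
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for prev, curr, link_type, n in reader:
        if link_type == "external":
            totals[curr] += int(n)
    return totals


# Made-up rows standing in for a real clickstream file.
sample = io.StringIO(
    "other-search\tAspirin\texternal\t120\n"
    "other-external\tAspirin\texternal\t30\n"
    "Pain\tAspirin\tlink\t45\n"
    "other-search\tIbuprofen\texternal\t80\n"
)
totals = external_referrals(sample)
print(totals.most_common())
```

For a real file, the same function could be called with `open(path, newline="", encoding="utf-8")` in place of the `StringIO` object. These numbers measure traffic arriving from outside, which is related to, but not the same as, the number of external pages that link to an article.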


Dario Taraborelli Head of Research, Wikimedia Foundation wikimediafoundation.org • nitens.org • @readermeter

Edison Nica

10 Jan 2016

4:05 p.m.


Thank you all for your replies, and I apologize for my improper use of the English language (see 'no offence').

I built my first Wikipedia search app a while ago. It is a test bed for my offline search engine, and it contains only medical information for now.

https://play.google.com/store/apps/details?id=com.zeropii.publish.txt.medical (BTW, this app requires no permissions and does no tracking of what the user searches for. Watch out: the APK is 78MB, if you plan on installing it.)

I am now building the second version, which will extend to full Wikipedia.

If everything works right, I have another 3-6 months until I will need the Analytics to improve the search.

BTW, if this is public information, what search engine do you use? Do you use a custom one? Do you use the Analytics to refine search?

My goal is to understand if Analytics could substantially improve my (Wikipedia) search engine or not.

Thank you again for your answers and pointers!

Edison Nica www.0Pii.com Edisonn at 0pii dot com

Erik Bernhardson

11 Jan 2016

5:10 p.m.

On Sun, Jan 10, 2016 at 8:05 AM, Edison Nica edisonn@0pii.com wrote:


BTW, if this is public information, what Search Engine do you use?

We use an Elasticsearch cluster to power search. The full index, not including replicas, is just under 3TB.

Do you use a custom one?

The search engine is not custom. We do use a custom MediaWiki extension to turn user queries into Elasticsearch queries: https://www.mediawiki.org/wiki/Extension:CirrusSearch

Do you use the Analytics to refine search?

Currently no. We are working on using analytics to generate a popularity score based on page view data to improve search. We also have a stretch goal to calculate PageRank within wikis to replace our current use of incoming wikilink count as part of the scoring algorithm.
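To make the difference between an incoming wikilink count and PageRank concrete, here is a minimal sketch of textbook power-iteration PageRank over a tiny made-up link graph. It is not CirrusSearch's implementation; the `pagerank` function, the damping factor of 0.85, the iteration count and the sample graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a {page: [linked pages]} mapping."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Made-up internal link graph: page -> pages it links to.
wiki = {
    "Aspirin": ["Pain"],
    "Ibuprofen": ["Pain", "Aspirin"],
    "Pain": ["Aspirin"],
    "Stub": ["Aspirin"],
}
ranks = pagerank(wiki)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

Unlike a raw incoming-link count, each link here is weighted by the rank of the page it comes from and divided among that page's outgoing links, so a link from a well-linked page is worth more than a link from an orphaned stub.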

