Hello,
I am doing research on Wikipedia visitor patterns and could really use
your team's help. Is it possible to get a list of the 2,000,000 most
visited Wikipedia pages since January 2015? I just need:
Article title (as used on the Wikipedia website); number of visitors
If it is not easy to get the numbers since January 2015, I am open to a
different date range if one makes the query easier for you.
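Whatever the data source, producing such a list comes down to summing per-article counts over the chosen date range and keeping the N largest totals. The sketch below assumes a hypothetical input of `(day, title, views)` records, for example rows parsed from daily page view dumps. The record shape and the function name `top_articles` are illustrative assumptions. Note that it sums page views, which are not the same thing as unique visitors.

```python
from collections import Counter
from datetime import date
from typing import Iterable, List, Tuple

# Hypothetical record shape: (day, article_title, views_on_that_day).
Record = Tuple[date, str, int]


def top_articles(records: Iterable[Record], start: date, end: date,
                 n: int) -> List[Tuple[str, int]]:
    """Sum views per title for start <= day <= end; return the n largest."""
    totals: Counter = Counter()
    for day, title, views in records:
        if start <= day <= end:  # ignore rows outside the requested range
            totals[title] += views
    # most_common(n) returns (title, total) pairs sorted by total, descending.
    return totals.most_common(n)


if __name__ == "__main__":
    # Made-up sample rows, not real Wikipedia figures.
    sample = [
        (date(2015, 1, 1), "Main_Page", 500),
        (date(2015, 1, 1), "Python_(programming_language)", 40),
        (date(2015, 1, 2), "Main_Page", 450),
        (date(2015, 1, 2), "Wikipedia", 60),
        (date(2014, 12, 31), "Wikipedia", 1000),  # before the range, skipped
    ]
    print(top_articles(sample, date(2015, 1, 1), date(2015, 12, 31), 2))
    # [('Main_Page', 950), ('Wikipedia', 60)]
```

For 2,000,000 titles, an in-memory `Counter` can still work, but at full Wikipedia scale the same group-by-and-sum would more likely run as a query on a cluster.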
I would really appreciate your team's help, and I'll gladly donate something
to the Foundation in return. If you have an alternative way, service, or
suggestion for getting this data, please do let me know.
Mark Haans
Hello,
At CDC/NIOSH we are interested in Wikipedia page view data for the 22 European Union countries. Do you have page view information (possibly including unique views) for those countries?
Thanks,
James Hare
Hey folks,
I know there was some work in the past on systems to support keeping
database reports up to date. I'm looking into this type of work with Jeph
Paul now, and I realized I don't have any good pointers to that past work.
Right now, we're looking at running database reports from cron jobs and
checking the recentchanges table first to make sure that replication isn't
too lagged. Is there a better way?
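A minimal sketch of the pre-flight check described above, assuming the job first runs something like `SELECT MAX(rc_timestamp) FROM recentchanges;` against the replica and passes the result in. MediaWiki stores `rc_timestamp` as a 14-digit UTC string (`YYYYMMDDHHMMSS`), which some drivers return as bytes. The 30-minute threshold and the function names are hypothetical. The check also has a caveat: on a quiet wiki the newest change can be old even when replication is current, so it measures staleness rather than lag proper.

```python
from datetime import datetime, timedelta, timezone
from typing import Union

# Hypothetical threshold; tune it to how fresh each report needs to be.
MAX_LAG = timedelta(minutes=30)


def parse_mw_timestamp(ts: Union[str, bytes]) -> datetime:
    """Parse a MediaWiki 14-digit timestamp (YYYYMMDDHHMMSS, UTC)."""
    if isinstance(ts, bytes):  # binary(14) columns may come back as bytes
        ts = ts.decode("ascii")
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def replica_lag(latest_rc_timestamp: Union[str, bytes],
                now: datetime) -> timedelta:
    """Approximate lag as the age of the newest recentchanges row."""
    return now - parse_mw_timestamp(latest_rc_timestamp)


def should_run_report(latest_rc_timestamp: Union[str, bytes], now: datetime,
                      max_lag: timedelta = MAX_LAG) -> bool:
    """True if the replica looks fresh enough to run the report now."""
    return replica_lag(latest_rc_timestamp, now) <= max_lag


if __name__ == "__main__":
    now = datetime(2015, 10, 1, 12, 0, tzinfo=timezone.utc)
    print(replica_lag("20151001114500", now))         # 0:15:00
    print(should_run_report(b"20151001100000", now))  # False (2 hours old)
```

A cron wrapper could call `should_run_report` with `datetime.now(timezone.utc)` and exit early (or retry later) when it returns False. Given that the reports themselves may take up to an hour, the threshold only guards the data's starting point, not its freshness at completion.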
FWIW, I expect these queries to run daily and have a runtime of up to an
hour.
-Aaron
I was just wondering whether you were referring to bots such as web crawlers that merely hit the site, or to bots like Yobot and ClueBot that make actual edits.
Also, do you count edits made using tools like AWB and Twinkle as bot edits?
------ Original message ------
From: Oliver Keyes
Date: Thu, Oct 1, 2015 4:19 PM
To: A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics.
Subject: Re: [Analytics] Meeting notes - Identifying Bots
Does Analytics have any interest in involvement in a rate-limiting project as a partial solution to this?

On 1 October 2015 at 11:21, Nuria Ruiz wrote:
> We met briefly to weight the pros and cons of taking up a project to tag bot
> traffic.
> [...]

--
Oliver Keyes
Count Logula
Wikimedia Foundation
We met briefly to weigh the pros and cons of taking up a project to tag
bot traffic.
TL;DR
While we see several projects that would benefit from more precise bot
identification, we think that at this time there are workarounds we can
use to filter bot traffic in most areas, and that we should not expend the
resources and computation effort that a thorough bot detection system
would require.
We think it is worth spending time quantifying our TRUE bot traffic, so that
when management asks a question like "How much of our traffic is crawling?"
we can give an estimate by, say, researching bot traffic monthly, weekly and
daily within one given month.
At this time, 15% of our pageview traffic (not requests) is detected as
bots; we estimate that the real bot traffic might be quite a bit higher.
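To make the "detected bot share" figure concrete, here is a small sketch that computes it over pageview counts (not raw requests), both per day and pooled over a period, as the monthly/weekly/daily study would need. It assumes pageviews have already been counted per agent label, with detected bots labelled `"spider"`. The label, the function names, and all numbers are hypothetical illustrations, not real Wikimedia figures. The result covers only bots the detector recognizes, which is why the real share may well be higher.

```python
from collections import Counter
from typing import Iterable, Mapping

# Hypothetical label for pageviews the detector flags as bots.
BOT_LABELS = frozenset({"spider"})


def bot_share(counts: Mapping[str, int],
              bot_labels: frozenset = BOT_LABELS) -> float:
    """Fraction of pageviews whose agent label is in bot_labels."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    bots = sum(v for label, v in counts.items() if label in bot_labels)
    return bots / total


def pooled_share(daily: Iterable[Mapping[str, int]],
                 bot_labels: frozenset = BOT_LABELS) -> float:
    """Share over a whole period: sum the counts first, then divide."""
    totals: Counter = Counter()
    for day_counts in daily:
        totals.update(day_counts)  # adds counts label by label
    return bot_share(totals, bot_labels)


if __name__ == "__main__":
    # Made-up pageview counts for two days.
    days = [{"user": 850, "spider": 150}, {"user": 900, "spider": 100}]
    for i, d in enumerate(days, start=1):
        print(f"day {i}: {bot_share(d):.1%}")
    print(f"pooled: {pooled_share(days):.1%}")
    # day 1: 15.0%
    # day 2: 10.0%
    # pooled: 12.5%
```

Pooling the counts before dividing weights each day by its traffic. Averaging the daily percentages instead would give busy and quiet days equal weight.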
More detailed notes here:
https://wikitech.wikimedia.org/wiki/Analytics/Bots
Attendees, please modify or correct the notes as needed.