Hi Leila,
I was hoping to try to predict what categories of articles viewers would read:
•
But I realized that Wikipedia categories don't have a well-defined structure. For
example, I think categories can chain recursively: a subcategory can have many
parent categories, and the chain of parents may continue indefinitely. So it seems
impossible to derive a single "main category" for an article.
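
One way to make the concern above concrete is to treat categories as a graph and walk upward from an article's category with a visited set and a depth cap, so that multiple parents and cycles cannot cause an endless walk. The sketch below is only an illustration: the function name `ancestor_depths`, the `max_depth` cutoff, and the toy `PARENTS` mapping are all hypothetical and do not come from any Wikimedia tool or dataset.

```python
from collections import deque


def ancestor_depths(category, parents, max_depth=3):
    """Breadth-first walk up a category graph.

    `parents` maps a category name to a list of its parent categories.
    Returns {ancestor: distance from `category`}, visiting each ancestor
    once, so multiple parents and cycles are handled safely.
    """
    depths = {category: 0}
    queue = deque([category])
    while queue:
        current = queue.popleft()
        if depths[current] >= max_depth:
            continue  # do not climb past the cutoff
        for parent in parents.get(current, ()):
            if parent not in depths:  # skip anything already seen
                depths[parent] = depths[current] + 1
                queue.append(parent)
    del depths[category]  # report ancestors only
    return depths


# Hypothetical toy graph, including a deliberate cycle.
PARENTS = {
    "Category:Optics": ["Category:Physics", "Category:Electromagnetic radiation"],
    "Category:Electromagnetic radiation": ["Category:Physics"],
    "Category:Physics": ["Category:Physical sciences"],
    "Category:Physical sciences": ["Category:Physics"],  # cycle back to Physics
}

if __name__ == "__main__":
    print(ancestor_depths("Category:Optics", PARENTS))
```

A walk like this does not produce a "main category" by itself, but the distances it returns could be one input to a heuristic, for example preferring the nearest ancestor that appears in a hand-picked list of top-level topics.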
I was previously hoping that, if a "main category" could be derived, I could extend
the findings by relating them to current socio-political events. To meet my
course requirements, I may have to adjust our project idea. However, if you have any
(possibly related) insights or strategies, they would be much appreciated. Also, thank
you very much for taking the time to respond to me!
Thank you,
Jeff Levesque
-----Original Message-----
From: Leila Zia <leila(a)wikimedia.org>
Sent: Wednesday, May 23, 2018 7:34 PM
To: Jeffrey Levesque <jlevesqu(a)syr.edu>
Cc: Wikimedia Answers <answers(a)wikimedia.org>; A mailing list for the Analytics Team
at WMF and everybody who has an interest in Wikipedia and analytics.
<analytics(a)lists.wikimedia.org>
Subject: Re: Jeff Levesque: List of Articles By Categories (College Project)
+ Analytics, our public analytics related mailing list [1]
Hi Jeff,
Let me give it a try:
* Re pageviews: a lot has changed since the Kaggle contest days you refer to. :) I highly
recommend you check out
where our hourly
pageviews per article live. In case you need it, abbreviations used in the file names are
documented. [2]
* Can you expand more on what you are trying to do? The short answer to your
category-related question is that you have to parse XML dumps, but we may have some good
pointers to save you from that. If you tell us more, we're more likely to be able to
help.
* And, if you decide to continue research on Wiki(m|p)edia data (which I hope you do:),
consider signing up in our public research list at
--
Leila Zia
Senior Research Scientist, Lead
Wikimedia Foundation
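
For readers following up on the hourly pageview files mentioned in the reply above, here is a minimal parsing sketch. It assumes each line of a file has four space-separated fields (domain code, page title, view count, response size) and that `en` is the domain code for English Wikipedia desktop traffic; both points should be checked against the dump documentation before relying on them. The function names are hypothetical.

```python
from collections import Counter


def parse_pageview_line(line):
    """Parse one line of an hourly pageviews file.

    Assumed layout: "<domain_code> <page_title> <view_count> <response_size>".
    Returns (domain_code, page_title, view_count), or None if the line
    does not match that layout.
    """
    parts = line.rstrip("\n").split(" ")
    if len(parts) != 4:
        return None
    domain, title, views, _size = parts
    try:
        return domain, title, int(views)
    except ValueError:
        return None


def total_views(lines, domain="en"):
    """Sum view counts per page title for a single domain code."""
    totals = Counter()
    for line in lines:
        record = parse_pageview_line(line)
        if record is not None and record[0] == domain:
            totals[record[1]] += record[2]
    return totals
```

The files are gzip-compressed, so `total_views` can be fed directly from `gzip.open(path, "rt", encoding="utf-8")`, which yields one line at a time without loading the whole file into memory.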
On Wed, May 23, 2018 at 3:22 PM, Wikimedia Answers <answers(a)wikimedia.org> wrote:
Forwarding for your evaluation :) Feel free to include the wider Research team.
best,
Joe
---------- Forwarded message ----------
From: Jeffrey Levesque <jlevesqu(a)syr.edu>
Date: Tue, May 22, 2018 at 7:48 AM
Subject: Re: Jeff Levesque: List of Articles By Categories (College
Project)
To: "info-en(a)wikimedia.org" <info-en(a)wikimedia.org>
Cc: "answers(a)wikimedia.org" <answers(a)wikimedia.org>
Hi,
Is there a known API where I can supply the article name and obtain
the corresponding "category" the article belongs to? I'm thinking I
could write a Python script that iterates over the Kaggle dataset and
sends a POST request to some existing API to determine each article's
"category".
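
A lookup of this kind can be sketched against the MediaWiki Action API, which accepts `action=query` with `prop=categories` for a given page title; a plain GET request is enough. The snippet below is a minimal sketch, not a complete client: the function names and the User-Agent string are made up for illustration, it assumes the default JSON response layout (pages keyed by page ID), and it does not follow continuation when an article has more categories than fit in one reply.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"


def build_category_query(title):
    """Build a GET URL asking for the visible categories of one article."""
    return API_URL + "?" + urlencode({
        "action": "query",
        "prop": "categories",
        "titles": title,
        "cllimit": "max",       # as many categories per reply as allowed
        "clshow": "!hidden",    # skip hidden maintenance categories
        "format": "json",
    })


def extract_categories(response):
    """Map each returned page title to its list of category titles."""
    pages = response.get("query", {}).get("pages", {})
    return {
        page["title"]: [c["title"] for c in page.get("categories", [])]
        for page in pages.values()
    }


def fetch_categories(title):
    """Perform the request (needs network access) and parse the reply."""
    request = Request(
        build_category_query(title),
        # Hypothetical User-Agent; replace with your own contact details.
        headers={"User-Agent": "category-lookup-sketch/0.1 (course project)"},
    )
    with urlopen(request) as reply:
        return extract_categories(json.load(reply))
```

Calling `fetch_categories("Physics")` would return a dictionary keyed by article title. Note that an article usually belongs to several categories, so the result is a list rather than a single value.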
Thank you,
Jeff Levesque
https://github.com/jeff1evesque
On May 22, 2018, at 10:37 AM, Jeffrey Levesque <jlevesqu(a)syr.edu> wrote:
Hi,
Do you guys have a more recent time series of Wikipedia article
traffic? I'm noticing that the Kaggle dataset is missing a lot of
articles that are on Wikipedia. Do you guys have a good idea of how I
can categorize the dataset I have?
Thank you,
Jeff Levesque
https://github.com/jeff1evesque
On May 22, 2018, at 8:40 AM, Jeffrey Levesque <jlevesqu(a)syr.edu> wrote:
Hi,
I am a master's student at Syracuse University. For my data science
class, I am doing a project trying to analyze traffic patterns for
Wikipedia. I’ve obtained the Kaggle dataset for 2015-2016 data:
https://www.kaggle.com/headsortails/wiki-traffic-forecast-exploration-wtf-eda/data
However, the dataset only provides the number of visits to
particular pages on a given day. Could I request a list of
articles grouped by “Categories”? I’ve tried to use the API (i.e.
https://en.wikipedia.org/wiki/Special:Export), but that doesn’t seem
to generate a full output. Additionally, the list it supplies includes
subcategories. So, I tried using the URL API (i.e.
https://en.wikipedia.org/w/api.php?action=query&list=categorymembers&am…)format=json).
But, that also seems to return an even shorter result set:
{
  "batchcomplete": "",
  "continue": {
    "cmcontinue": "page|2d2941313f2b292d3d0447454f31434f39293f011701dc16|55503653",
    "continue": "-||"
  },
  "query": {
    "categorymembers": [
      {"pageid": 22939, "ns": 0, "title": "Physics"},
      {"pageid": 24489, "ns": 0, "title": "Outline of physics"},
      {"pageid": 3445246, "ns": 0, "title": "Glossary of classical physics"},
      {"pageid": 1653925, "ns": 100, "title": "Portal:Physics"},
      {"pageid": 50926902, "ns": 0, "title": "Action angle coordinates"},
      {"pageid": 9079863, "ns": 0, "title": "Aerometer"},
      {"pageid": 52657328, "ns": 0, "title": "Bayesian model of computational anatomy"},
      {"pageid": 49342572, "ns": 0, "title": "Group actions in computational anatomy"},
      {"pageid": 50724262, "ns": 0, "title": "Blasius\u2013Chaplygin formula"},
      {"pageid": 33327002, "ns": 0, "title": "Cabbeling"}
    ]
  }
}
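
The short result above is consistent with how `list=categorymembers` pages its output: as far as I know, it returns 10 members per request by default, and the `continue` object in the reply carries the values needed to request the next batch. The following is a minimal sketch of following that continuation. The `fetch` argument is a hypothetical callable that takes a dict of query parameters and returns the decoded JSON reply; it is injected so the paging logic can be tried without network access.

```python
def iter_category_members(category, fetch, limit=500):
    """Yield every member of `category`, following API continuation.

    `fetch(params)` must send the parameters to the MediaWiki Action API
    and return the decoded JSON reply as a dict.
    """
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": str(limit),  # raise the per-request batch size
        "format": "json",
    }
    while True:
        reply = fetch(params)
        yield from reply["query"]["categorymembers"]
        if "continue" not in reply:
            break  # no more batches
        # Merge the continuation values into the next request.
        params = {**params, **reply["continue"]}


def fake_fetch(params):
    """Stand-in for a real HTTP call: serves two canned batches."""
    if "cmcontinue" not in params:
        return {
            "continue": {"cmcontinue": "page|abc|1", "continue": "-||"},
            "query": {"categorymembers": [
                {"pageid": 22939, "ns": 0, "title": "Physics"},
            ]},
        }
    return {
        "batchcomplete": "",
        "query": {"categorymembers": [
            {"pageid": 33327002, "ns": 0, "title": "Cabbeling"},
        ]},
    }


if __name__ == "__main__":
    for member in iter_category_members("Category:Physics", fake_fetch):
        print(member["title"])
```

Members with `"ns": 14` are subcategories, so a caller that wants only articles can filter on `ns == 0`, and one that wants to descend into subcategories can call the generator again on each `ns == 14` title.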
Thank you,
Jeff Levesque
(603) 969-5363