+discovery list.
On Tue, Dec 6, 2016 at 12:53 PM, Sumit Asthana asthana.sumit23@gmail.com wrote:
Hi,
I was extracting the Wikipedia Cirrus dump of articles using ?action=cirrusDump for feature extraction and noticed two keys, "score" and "popularity_score". Can anyone tell me what exactly these keys denote and how they are calculated?
I'm also curious about possible use cases for these scores in machine learning, since I'm currently processing articles.
--
Thanks,
Sumit
http://mediawiki.org/wiki/User:Sumit.iitp
AI mailing list AI@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/ai
The popularity score (popularity_score) is calculated once a week from the previous week's data. It is basically (article page views) / (all article page views). See https://github.com/wikimedia/wikimedia-discovery-analytics/blob/master/hive/...
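As a rough illustration of that ratio, here is a minimal Python sketch. It assumes the previous week's page view counts are already available as a dict mapping article title to view count; the function name popularity_scores and the input format are hypothetical, and the real job is the Hive query linked above, which may filter or aggregate page views differently.

```python
from typing import Dict


def popularity_scores(weekly_views: Dict[str, int]) -> Dict[str, float]:
    """Return each article's share of all article page views for one week.

    weekly_views maps an article title to its page view count over the
    previous week (hypothetical input format, for illustration only).
    """
    total = sum(weekly_views.values())
    if total == 0:
        # No views at all: avoid dividing by zero and score everything 0.
        return {title: 0.0 for title in weekly_views}
    # (article page views) / (all article page views)
    return {title: views / total for title, views in weekly_views.items()}


if __name__ == "__main__":
    views = {"Article_A": 600, "Article_B": 300, "Article_C": 100}
    print(popularity_scores(views))
    # {'Article_A': 0.6, 'Article_B': 0.3, 'Article_C': 0.1}
```

Because each value is a share of all article page views, the scores sum to 1 (up to floating-point rounding), and most individual articles end up with very small values.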
The score key is an old version of the popularity score. We renamed it to make the name more distinct, but it lingers in some places because we update documents rather than replace them completely. Feel free to ignore it.
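For feature extraction, the advice above amounts to reading only popularity_score and skipping the legacy score key. A minimal sketch, assuming each document from the dump has already been parsed into a Python dict with these keys at the top level; popularity_feature is a hypothetical helper name, not part of any Wikimedia API.

```python
import json
from typing import Any, Dict, Optional


def popularity_feature(doc: Dict[str, Any]) -> Optional[float]:
    """Read the popularity feature from one parsed Cirrus document.

    Only 'popularity_score' is used. The legacy 'score' key is deliberately
    ignored, since it is an older version of the same value that can linger
    in documents that were updated rather than replaced.
    """
    value = doc.get("popularity_score")
    # Return None when the document has no popularity_score at all.
    return float(value) if value is not None else None


if __name__ == "__main__":
    # Hypothetical document with both keys present, as described in the thread.
    doc = json.loads('{"title": "Example", "score": 0.5, "popularity_score": 1.2e-07}')
    print(popularity_feature(doc))
```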
On Tue, Dec 6, 2016 at 11:16 AM, Adam Baso abaso@wikimedia.org wrote:
discovery mailing list discovery@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/discovery
Doh, wrong file. I meant https://github.com/wikimedia/wikimedia-discovery-analytics/blob/master/oozie...
On Tue, Dec 6, 2016 at 11:52 AM, Erik Bernhardson <ebernhardson@wikimedia.org> wrote:
The popularity score (popularity_score) is calculated once a week from the previous week's data. It is basically (article page views) / (all article page views). See https://github.com/wikimedia/wikimedia-discovery-analytics/blob/master/hive/popularity_score/create_popularity_score_table.hql