Hi all.
I just discovered something that may be interesting for the Wikipedia research community.
In short, now for any Wikipedia page, not only articles, e.g.
http://en.wikipedia.org/wiki/History_of_free_and_open_source_software
You can access, from the corresponding "View history" page:
* Nice stats (via soxred93's tool on the Toolserver): http://toolserver.org/~soxred93/articleinfo/index.php?article=History_of_Fre...
* Ranked contributors (Daniel's tool in Toolserver): http://toolserver.org/~daniel/WikiSense/Contributors.php?wikilang=en&wik...
* Revision history search (WikiBlame): http://wikipedia.ramselehof.de/wikiblame.php?lang=en&article=History_of_...
* Page view statistics: http://stats.grok.se/en/201101/History_of_Free_Software
And... incredible:
* Number of watchers (!!!) (mzmcbride's tool on the Toolserver): http://toolserver.org/~mzmcbride/cgi-bin/watcher.py?db=enwiki_p&titles=H...
I don't know when (exactly) these services were activated.
I've also found some (still inactive) "API" links. Does anybody have any further info about this?
Cheers, Felipe.
Hi, Felipe,
these tools are really useful!
I really like the "Wikipedia Page History Statistics" too: http://vs.aka-online.de/cgi-bin/wppagehiststat.pl
Here in Brazil I've developed (with a computer science student) a tool that extracts other interesting data from page histories, like the number of protections and the duration of each, the number of reverts and undone edits, the number and percentage of edits made by administrators, bots and IPs, etc.
Unfortunately it works only on the Portuguese Wikipedia, but we are very interested in opening the code and making it better.
BTW, as this is my first message here, let me introduce myself: I'm a journalist, a teacher at the Federal University of Viçosa and a PhD student in Applied Linguistics at the Federal University of Minas Gerais. In summary, I'm studying the editorial process of "Biographies of living persons" on the Portuguese Wikipedia.
Best,
From: Carlos d'Andréa carlosdand@gmail.com
To: Research into Wikimedia content and communities wiki-research-l@lists.wikimedia.org Sent: Tue, 25 January 2011 22:22 Subject: Re: [Wiki-research-l] New toolbox Wikipedia pages
Nice to meet you, Carlos.
You might also like: http://meta.wikimedia.org/wiki/Statistics
There are some tools producing stats for any language, including:
http://meta.wikimedia.org/wiki/StatMediaWiki http://meta.wikimedia.org/wiki/WikiXRay
Best, Felipe
Hi, Felipe,
I'd read about WikiXRay in your thesis, and it sounded really good.
Thanks, best, Carlos
On Tue, Jan 25, 2011 at 8:32 PM, Felipe Ortega glimmer_phoenix@yahoo.es wrote:
-- Carlos d'Andréa carlosdand.com novasm.blogspot.com
On Tue, Jan 25, 2011 at 6:30 PM, Felipe Ortega glimmer_phoenix@yahoo.es wrote:
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Hello folks, thanks for sharing your projects! I have a question: is there a way to see from which source users land on Wikipedia's pages? I mean: are users entering their keywords in the wiki search field, or are they landing from Google? That is: does the visibility of the articles depend on the keywords users look for, or on the structure of the internet (which makes wiki pages very visible)?
2011/1/27 Carlos d'Andréa carlosdand@gmail.com
Here another question, different topic:
we would like to examine the network properties of the wiki. There are already some results here and there, though we would like to take a closer look, to eventually improve the knowledge base.
To do that, we need to access the pages of the wiki (only articles for now), with article name, abstract, meta keywords, the internal hyperlinks connecting them, and the external hyperlink base.
We found the db dumps in gz, but they are very large files, and here is my question: how can we manipulate them with phpMyAdmin? Is there any other open-source tool to handle data files of such a size?
An easy way to get first results would be to have the db of articles with the above parameters as an XML sheet. A portion of it would also be interesting for a demo project to work on.
Any idea/reference? Many thanks, Luigi Assom
I'll introduce myself too: my background is in visual communication + international development. I am working with a friend who has a PhD in theoretical physics; we are both interested in learning platforms and emerging self-organized information patterns.
On Thu, Jan 27, 2011 at 2:28 PM, Luigi Assom luigi.assom@gmail.com wrote:
-- Luigi Assom
Skype contact: oggigigi
On 27/01/2011 14:35, Luigi Assom wrote:
Hi Luigi, there are various tools for reading XML dump files and importing them into MySQL, which is probably the best option if you want to handle very large files like the dumps for the English wikipedia. See here: http://meta.wikimedia.org/wiki/Data_dumps#Tools
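If importing into MySQL is overkill for a first pass, one lightweight alternative to phpMyAdmin is simply streaming the .gz dump line by line. A minimal sketch of this approach in Python (the file name and the miniature "dump" here are invented for illustration):

```python
import gzip

def stream_lines(path, encoding="utf-8"):
    """Yield decoded lines from a gzip file without decompressing it to disk."""
    with gzip.open(path, "rt", encoding=encoding, errors="replace") as fh:
        for line in fh:
            yield line.rstrip("\n")

# Tiny demonstration: write a fake two-line "dump" and stream it back.
with gzip.open("mini-dump.sql.gz", "wt", encoding="utf-8") as fh:
    fh.write("-- MySQL dump\nINSERT INTO `pagelinks` VALUES (1,0,'Foo');\n")

lines = list(stream_lines("mini-dump.sql.gz"))
print(lines)
```

Because the generator never holds more than one line in memory, the same loop works unchanged on a multi-gigabyte dump.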
If you're only interested in a subset of the articles, and just in the current revisions, another possibility is crawling the website via the MediaWiki API: http://www.mediawiki.org/wiki/API There are several client libraries; a Google query for your favourite language should return some pointers.
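A hedged sketch of such a crawl using only the standard library; the endpoint and parameters follow the usual api.php conventions, and the page title is just an example:

```python
import json
import urllib.parse
import urllib.request

API = "http://en.wikipedia.org/w/api.php"

def build_revision_query(title):
    """Build an api.php URL asking for the latest revision text of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)

def fetch_revision(title):
    """Fetch and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(build_revision_query(title)) as resp:
        return json.load(resp)

url = build_revision_query("History_of_free_and_open_source_software")
print(url)
```

For a real crawl you would add rate limiting and a descriptive User-Agent, as the API etiquette guidelines ask.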
I'll add another note to this "article view" discussion:
I have parsed the hourly, per-page statistics at [http://dammit.lt/wikistats/]. If one assumes uniform intra-hour distributions, this makes it possible to arrive at highly accurate view estimates for arbitrary pages, for arbitrary time intervals.
I have found this useful to measure how many people saw a particular revision and used this heavily in my anti-vandalism research.
I believe this is the same data source all these other services are using -- but I don't do any aggregation. I've got data for all of 2010 for en.wiki (some 400+GB). I'd imagine this volume of parsing and storage isn't something all Wiki researchers are capable of.
So, while I'm yet to develop this into a formal public-facing API -- I'd be willing to run queries for interested researchers -- and they should feel free to contact me.
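The interval estimate described above can be sketched in a few lines: sum the whole hours and pro-rate the partial hours at each end under the uniform intra-hour assumption. The counts below are invented:

```python
def estimate_views(hourly_counts, start_h, end_h):
    """Estimate views in [start_h, end_h] (fractional hours from the first
    bucket) by weighting each hourly count by its overlap with the interval."""
    total = 0.0
    for i, count in enumerate(hourly_counts):
        # Fraction of hour i that falls inside the requested interval.
        overlap = max(0.0, min(i + 1, end_h) - max(i, start_h))
        total += count * overlap
    return total

# Invented counts: 60 views in hour 0, 120 in hour 1, 30 in hour 2.
print(estimate_views([60, 120, 30], 0.5, 2.5))
```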
Thanks, -Andrew G. West
On 01/25/2011 04:22 PM, Carlos d'Andréa wrote:
Andrew,
So, while I'm yet to develop this into a formal public-facing API -- I'd be willing to run queries for interested researchers -- and they should feel free to contact me.
are you aware of this tool based on your data: http://stats.grok.se ?
It also has a JSON interface, which is really handy (I used it with a simple python script to download view stats for a sample of pages in a given timeframe)
Dario
apologies – that's obviously just an interface to Domas Mituzas' raw data!
Dario
On 25 Jan 2011, at 23:02, Dario Taraborelli wrote:
Dario,
Yes, it is certainly the same data source.
First, I wasn't aware there was a JSON API for [http://stats.grok.se] -- can you provide everyone a link to it?
Second, at least in visual form, that site presents only daily totals. The actual data uses hourly dumps -- and I was thinking my contribution could be finer granularity for those who need it (assuming I am not mistaken).
Thanks, -AW
On 01/25/2011 06:06 PM, Dario Taraborelli wrote:
Andrew,
these are examples of the JSON response:
daily totals: http://stats.grok.se/json/en/201002/Britney_Spears (note that each month is represented by an array of 32 values starting with a 0)
monthly totals: http://stats.grok.se/json/en/2010/Britney_Spears
That's correct – daily stats are the best resolution you can get with this tool.
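Assuming the 32-value monthly array described above (slot 0 unused, so index i is day i of the month), a couple of helper functions might look like this; the numbers are invented:

```python
def month_total(day_views):
    """Sum a 32-slot array of daily views (slot 0 is a placeholder)."""
    assert len(day_views) == 32 and day_views[0] == 0
    return sum(day_views[1:])

def busiest_day(day_views):
    """Return (day_of_month, views) for the busiest day."""
    day = max(range(1, 32), key=lambda d: day_views[d])
    return day, day_views[day]

# A made-up 28-day month: day d gets 100 + d views, trailing slots stay 0.
sample = [0] + [100 + d for d in range(1, 29)] + [0, 0, 0]
print(month_total(sample), busiest_day(sample))
```

The trailing zeros for days 29 to 31 make the fixed-width array work for every month length.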
Dario
On 26 Jan 2011, at 00:57, Andrew G. West wrote:
-- Andrew G. West, Doctoral Student Dept. of Computer and Information Science University of Pennsylvania, Philadelphia PA Phone: (304)-415-5824 Email: westand@cis.upenn.edu Website: http://www.cis.upenn.edu/~westand
Hi,
On Tue, Jan 25, 2011 at 9:30 PM, Felipe Ortega glimmer_phoenix@yahoo.es wrote:
You can access, from the corresponding "View history" page:
...
I don't know when (exactly) these services were activated.
Most of them were added to the "View history" page in 2008 and 2009:
http://en.wikipedia.org/w/index.php?title=MediaWiki:Histlegend&action=hi...
We featured an overview of such page history related tools in the Signpost a while ago, also mentioning a few others:
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2010-09-20/Dispatc...
Regards, HaeB
----- Original message ---- From: Wikipedia Signpost wikipediasignpost@gmail.com To: Research into Wikimedia content and communities wiki-research-l@lists.wikimedia.org Sent: Wed, 26 January 2011 07:20 Subject: Re: [Wiki-research-l] New toolbox Wikipedia pages
Hi,
On Tue, Jan 25, 2011 at 9:30 PM, Felipe Ortega glimmer_phoenix@yahoo.es wrote:
You can access, from the corresponding "View history" page:
...
I don't know when (exactly) these services were activated.
Most of them were added to the "View history" page in 2008 and 2009:
http://en.wikipedia.org/w/index.php?title=MediaWiki:Histlegend&action=hi...
Well, in particular it looks like the tool for "better statistics" was introduced in addition to the ranked list of contributors last month:
06:57, 27 December 2010 Seattle Skier (talk | contribs) (1,492 bytes) (Add better "Revision history statistics" tool)
We featured an overview of such page history related tools in the Signpost a while ago, also mentioning a few others:
http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2010-09-20/Dispatc...
Thanks, I had missed those ones.
In fact, we had a discussion about watchlists on this list last summer, and it wasn't mentioned that this has been connected to the "View history" pages since Oct. 2009, AFAIK.
I thought this could be useful, since I was asking around and many people hadn't noticed this toolbox before (neither in the English Wikipedia, nor in other languages where it looks like it is being activated).
Best, Felipe.
Regards, HaeB
Hi -
I am looking for the latest dump of the interwiki (language) link table. Can anyone point me to the right file? I cannot seem to find it on the downloads page or on the mirror sites. Someone suggested to me that the latest may be from Jan 2010? Thanks! Andreea
----- Original message ----
From: "Gorbatai, Andreea" agorbatai@hbs.edu To: Research into Wikimedia content and communities wiki-research-l@lists.wikimedia.org Sent: Wed, 26 January 2011 23:52 Subject: [Wiki-research-l] Interwiki links dump
Hi, Andreea.
I think you're looking for this one: http://download.wikimedia.org/enwiki/20100116/enwiki-20100116-langlinks.sql....
The set of predefined prefixes and links to parse the file can be also found here: http://download.wikimedia.org/enwiki/20100116/enwiki-20100116-interwiki.sql....
Indeed, the latest version is from January 2010.
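For anyone who would rather not import the langlinks dump into MySQL at all, here is a rough sketch of pulling (ll_from, ll_lang, ll_title) triples straight out of the INSERT statements; the sample line is a made-up miniature of the dump format, and the pattern is simplified (titles with unusual escaping may need a real SQL parser):

```python
import re

# Matches one (id,'lang','title') tuple; allows backslash-escaped characters
# inside the title field.
TUPLE_RE = re.compile(r"\((\d+),'([^']*)','((?:[^'\\]|\\.)*)'\)")

def parse_langlinks(line):
    """Extract (ll_from, ll_lang, ll_title) triples from one INSERT statement."""
    return [(int(pid), lang, title) for pid, lang, title in TUPLE_RE.findall(line)]

line = "INSERT INTO `langlinks` VALUES (12,'de','Anarchismus'),(12,'fr','Anarchisme');"
print(parse_langlinks(line))
```

Combined with line-by-line streaming of the .gz file, this avoids ever materializing the full table.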
HTH.
Felipe.
2011/1/25 Felipe Ortega glimmer_phoenix@yahoo.es
Hi all.
I just discovered this, it may be potentially interesting for the Wikipedia research community.
In short, now for any Wikipedia page, not only articles, e.g.
More precisely, for any English Wikipedia page. These tools would be useful for all languages, but they have been implemented only on some Wikipedias.
----- Original message ----
From: Amir E. Aharoni amir.aharoni@mail.huji.ac.il To: Research into Wikimedia content and communities wiki-research-l@lists.wikimedia.org Sent: Wed, 26 January 2011 09:08 Subject: Re: [Wiki-research-l] New toolbox Wikipedia pages
More precisely, for any English Wikipedia page. These tools would be useful for all languages, but they have been implemented only on some Wikipedias.
In fact, I discovered the links in the Spanish Wikipedia, thanks to the blue highlighted background.
Best, F.