Jimbo:
It would be nice to track that number over time... are we becoming
"younger" as a community, "older" as a community?
Staying about the same? Are old-timers sticking around longer than they
used to, or jumping ship faster?
There is also a whole set of related questions.
You know, I was just thinking of doing something similar for mailing lists. Are people active on a certain list for a long time, etc.?
I will put the above questions on my wikistats todo list. First I need to focus on the Unicode update for EasyTimeline, which has been in the queue for way too long.
Erik Zachte
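To make Erik's question a bit more concrete, one possible measurement is the tenure of the editors active in each month, taken from revision timestamps. Below is a minimal Python sketch, assuming revision metadata has already been extracted from the dumps as (username, timestamp) pairs; the function name and input format are only illustrative, not part of wikistats.

from collections import defaultdict
from statistics import median

def median_tenure_by_month(revisions):
    """revisions: iterable of (username, datetime) pairs, in any order."""
    first_edit = {}                      # user -> datetime of their first edit
    tenures = defaultdict(list)          # "YYYY-MM" -> tenures (in days) of the edits made that month
    for user, ts in sorted(revisions, key=lambda r: r[1]):
        first_edit.setdefault(user, ts)
        tenures[ts.strftime("%Y-%m")].append((ts - first_edit[user]).days)
    return {month: median(days) for month, days in sorted(tenures.items())}

A rising median would suggest the community is "aging" (old-timers dominate the activity); a falling one would suggest an influx of newcomers or old-timers leaving.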
Hello Erik.
My name is Felipe Ortega, a Ph.D. student at the Rey Juan Carlos University (Madrid, Spain). I'm currently working on my thesis, trying to build a quantitative analysis model for Wikipedia. I've carefully studied all your efforts as Wikipedia admins and application contributors in this field.
In the past few weeks I've read a bunch of mail messages talking about what is precisely my first goal: extracting behavioral conclusions from a quantitative analysis of Wikipedia database dumps in all languages.
But despite all my efforts, and some mails offering myself as a contributor, I have received no answer from the Wikipedia community. I only wanted to contribute to a very interesting area (I think), and I hope it could lead me to build an interesting thesis on this topic. I'm currently developing some scripts in Python that analyze database dumps.
I wrote a paper with some preliminary results, in case you would like to take a glance at it. I only ask for some collaboration from anyone involved with the project, because otherwise maybe I should simply conclude that all my efforts don't interest anyone.
Thanks.
Felipe Ortega.
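As an illustration of the kind of dump analysis described above, here is a minimal Python sketch that streams a pages-meta-history XML dump and counts revisions per contributor. The file name is hypothetical, and the export namespace must match the schema version of the dump being read.

import xml.etree.ElementTree as ET
from collections import Counter

NS = "{http://www.mediawiki.org/xml/export-0.3/}"    # adjust to the dump's schema version

def edits_per_contributor(dump_path):
    counts = Counter()
    for _, elem in ET.iterparse(dump_path):          # streaming parse; the dumps are huge
        if elem.tag == NS + "revision":
            user = elem.find(NS + "contributor/" + NS + "username")
            counts[user.text if user is not None else "(anonymous)"] += 1
            elem.clear()                             # free memory as we go
    return counts

# Example: edits_per_contributor("eswiki-pages-meta-history.xml").most_common(20)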
On 6/26/06, Felipe Ortega glimmer_phoenix@yahoo.es wrote:
Felipe, I'm a bachelor's student at the University of Texas at Dallas, also working on what I'm calling fine-grained statistics for Wikipedia, using Python to interpret text from database dumps. :)
I've been working on it off and on for almost a year. The big problems for me have been disk space and wikitext parsing. After fiddling for quite a while trying to write my own parser, I have finally broken down and am using the HTML as rendered by MediaWiki, then using that as the basis for the rest.
My basic goal is to provide statistics for things such as "how many revisions has this piece of text survived" and similar, then render that information onto a reader's Wikipedia browser page. As a second goal, it'd be nice to find some combination of stats that suggests which bits of a page are more likely trustworthy.
How far have you gotten, and does it sound like we're on the same track?
Cheers, Jeremy
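A rough sketch of the "how many revisions has this piece of text survived" idea, assuming a page's revisions are already available as plain text strings, oldest first. It works at word level and ignores moves and reverts, so it is only an approximation, not Jeremy's actual approach.

from difflib import SequenceMatcher

def token_ages(revisions):
    """For the newest revision, return (token, age) pairs, where age is the
    number of consecutive revisions that token has survived."""
    tokens, ages = [], []
    for text in revisions:
        new_tokens = text.split()
        new_ages = [1] * len(new_tokens)             # freshly introduced text
        matcher = SequenceMatcher(None, tokens, new_tokens)
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op == "equal":                        # text carried over from the previous revision
                for k in range(i2 - i1):
                    new_ages[j1 + k] = ages[i1 + k] + 1
        tokens, ages = new_tokens, new_ages
    return list(zip(tokens, ages))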
Hi, Jeremy,
I'm supervising (just from an academic point of view) Felipe's work. We have (I guess) enough disk space, and could probably get more if needed. We also have some experience in analyzing libre (free, open source) software projects (have a look at http://libresoft.urjc.es), and several of the techniques we're using for that could probably be applied to analyzing Wikipedia.
So, if you're interested, we could explore how to collaborate. For now, we're using the database dumps for the analysis. From your message it seems that you're spidering the content from the website, am I right?
Regards,
Jesus.