As I assume most people here know, each revision in the full history
dumps for MediaWiki reports the complete page text. So even though an
edit may have changed only a few characters, the entire page is
repeated for each revision. This is one of the reasons that full
history dumps are very large.
Recently I've written some Python code to re-express the revision
history in an "edit syntax", using an XML-compatible notation for
changes, with expressions like:
<replace>, <delete>, <insert>, etc.
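To give a flavor of the idea, here is a greatly simplified sketch of
how such a delta can be computed with Python's difflib (the tag names
match the ones above, but the at/len attributes are placeholders for
illustration, not the actual format my code emits):

import difflib
from xml.sax.saxutils import escape

def revision_delta(old, new):
    # Express the new revision as edits against the old one.
    # Offsets refer to positions in the old text; unchanged
    # ('equal') spans are implied and not emitted.
    ops = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(
            None, old, new).get_opcodes():
        if tag == 'replace':
            ops.append('<replace at="%d" len="%d">%s</replace>'
                       % (i1, i2 - i1, escape(new[j1:j2])))
        elif tag == 'delete':
            ops.append('<delete at="%d" len="%d"/>' % (i1, i2 - i1))
        elif tag == 'insert':
            ops.append('<insert at="%d">%s</insert>'
                       % (i1, escape(new[j1:j2])))
    return '\n'.join(ops)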
Since many revisions really consist only of small changes to the text,
using the notation I've been developing can greatly reduce the size of
the dump while still maintaining a human-readable syntax. For example,
I recently ran it against the full history dump of ruwiki (179 GB
uncompressed, 1.2 M pages, 11.2 M revisions) and got a 94% reduction
in size (11.1 GB). Because the result is still a text-based format, it
stacks well with traditional file compressors (bz2: 89% reduction,
1.24 GB; 7z: 91% reduction, 1.07 GB).
It could also serve as a precursor to analysis designed to work out
"primary" authors, and to other tasks where one wants to know who is
making large edits and who is making small, housekeeping edits.
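As a toy illustration of that kind of analysis (again leaning on
difflib rather than my actual code), one could credit each editor
with the number of characters they insert or replace:

import difflib
from collections import defaultdict

def author_weights(history):
    # history: iterable of (author, old_text, new_text) triples,
    # one per revision, in chronological order.
    chars_changed = defaultdict(int)
    for author, old, new in history:
        for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(
                None, old, new).get_opcodes():
            if tag in ('replace', 'insert'):
                chars_changed[author] += j2 - j1
    # Largest contributors first.
    return sorted(chars_changed.items(),
                  key=lambda kv: kv[1], reverse=True)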
Obviously, as a compressor it is most successful with large pages that
have a large number of relatively minor revisions. For example, the
enwiki history of [[Saturn]] (current size 57 kB, 4741 revisions) sees
a 99.1% size reduction. I suspect that the size reduction on large
wikis, like en or de, would be even greater than the 94% for ruwiki,
since larger wikis tend to have larger pages and more revisions per
page.
The current version of my compressor averaged a little better than 250
revisions per second on ruwiki (about 12 hours total) on an
18-month-old desktop. However, as the CPU utilization was only 50-70%
of a full processing core most of the time, I suspect that my choice
to read from and write to an external hard drive may have been the
limiting factor. On a good machine, 400+ rev/s might be a plausible
number for the current code. In short, the overhead of computing my
edit syntax is relatively small compared to the generation time for
the current dumps (which I'm guessing is limited by communication with
the text data store).
My code has some quirks and known bugs, and I'd describe it as a
late-stage alpha version at the moment. It still needs considerable
work (not to mention documentation) before I would consider it ready
for general use.
However, I wanted to know whether this is a project of interest to
MediaWiki developers or other people. Placed in the dump chain, it
could substantially reduce the size of the human-readable dumps, at
the expense that one would need to play back a series of edits to see
the full text of any specific revision.
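That playback is conceptually simple. Assuming each revision's
operations have been parsed into (kind, position, length, text)
tuples sorted by position, applying them back-to-front keeps the
earlier offsets valid:

def apply_delta(old_text, ops):
    # ops: list of (kind, pos, length, payload) tuples whose
    # offsets refer to old_text, sorted by position.  Applying
    # them in reverse means earlier offsets are not disturbed.
    text = old_text
    for kind, pos, length, payload in reversed(ops):
        if kind == 'replace':
            text = text[:pos] + payload + text[pos + length:]
        elif kind == 'delete':
            text = text[:pos] + text[pos + length:]
        elif kind == 'insert':
            text = text[:pos] + payload + text[pos:]
    return text

def reconstruct(base_text, deltas):
    # Recover a specific revision by replaying every delta
    # from the base text up to the revision of interest.
    text = base_text
    for ops in deltas:
        text = apply_delta(text, ops)
    return text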
Used for other purposes, it could help distinguish major from minor
editors, etc. If this project is mostly just a curiosity for my own
use, then I will probably keep the code pretty crude. However, if
other people are interested in using something like this, then I am
willing to put more effort into developing something that is cleaner
and more generally usable.
So, I'd like to know whether there are people (besides myself) who
are interested in seeing the full history dumps expressed in an edit syntax
rather than the full-text syntax currently used.
-Robert Rohde
So you want to run a top-10 web site? Now's your chance...
We're now hiring for a full-time system administrator to help monitor,
maintain, and document the 400+ Linux/Unix servers that operate
Wikipedia and its sister projects. This position will be based at our
San Francisco headquarters, but will work closely with our remote staff
and volunteers.
Currently, system administration tasks are spread over our other tech
staff and volunteers, who have to split their time with software
development, data center management, and network planning. A full-time
system administrator will let us be more responsive to site issues when
they happen, and more importantly be more proactive about planning for
and averting problems before they affect the folks back home.
We've got operating systems to upgrade, configurations to document,
software installations to automate, and a lot of service data that needs
to be monitored and digested... if you think you've got the chops for
it, send us your CV by the end of January!
http://wikimediafoundation.org/wiki/Job_openings/System_Administrator
-- brion vibber (brion @ wikimedia.org)
CTO, Wikimedia Foundation
Hello,
I have done a Special:Export of the latest revision of
http://en.wikipedia.org/w/index.php?title=Diabetes_mellitus
(including templates), and copied:
{{Infobox Disease
| Name = TestSMW
| Image =
| Caption =
| DiseasesDB =
| ICD10 = {{ICD10|Group|Major|minor|LinkGroup|LinkMajor}}
| ICD9 = 000000
| ICDO =
| OMIM =
| MedlinePlus =
| eMedicineSubj =
| eMedicineTopic =
| MeshID =
}}
into my test page http://wiki.medicalstudentblog.co.uk/index.php/TestSMW
-- however, as you can see, it comes out all garbled. Can anyone
advise? I should now have all the templates from the export/import;
perhaps I'm missing some other extension(s)?
Thanks, Dawson
At 16.15 15/01/2009 -0500, you wrote:
>On Thu, Jan 15, 2009 at 11:54 AM, Eugenio Tacchini
><eugenio(a)favoriti.it> wrote:
>> Thanks for your reply.
>>
>> I don't need a general measure but I need the status of each single
>> page; as far as I have seen, probably the only solution is to look at
>> the corresponding templates, maybe via the table marco suggested to me.
>
>Yes, the way you want to do this is checking templatelinks. This is
>how disambiguations are checked in the software, and it could be used
>for stubs and so on too.
Ok, thanks again.
Eugenio
On Thu, Jan 15, 2009 at 11:54 AM, Eugenio Tacchini <eugenio(a)favoriti.it> wrote:
> Thanks for your reply.
>
> I don't need a general measure but I need the status of each single
> page; as far as I have seen, probably the only solution is to look at
> the corresponding templates, maybe via the table marco suggested to me.
Yes, the way you want to do this is checking templatelinks. This is
how disambiguations are checked in the software, and it could be used
for stubs and so on too.
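For example (untested, and assuming direct access to a database with
the standard schema), listing the articles that transclude
Template:Stub would look roughly like:

import MySQLdb  # connection details below are placeholders

conn = MySQLdb.connect(host='localhost', user='wiki',
                       passwd='secret', db='wikidb')
cur = conn.cursor()
cur.execute("""
    SELECT page_title
      FROM page
      JOIN templatelinks ON tl_from = page_id
     WHERE page_namespace = 0   -- articles only
       AND tl_namespace = 10    -- Template: namespace
       AND tl_title = 'Stub'
""")
for (title,) in cur.fetchall():
    print(title)

On English Wikipedia you would probably want to match the whole family
of stub templates (most of their names end in "-stub") rather than a
single title.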
On Thu, Jan 15, 2009 at 11:00 AM, Eugenio Tacchini <eugenio(a)favoriti.it> wrote:
> Hello everybody,
> I'm looking, for academic research purposes, for the "status" of
> Wikipedia pages. By "status" I mean:
> - stub
> - normal
> - good article
> - featured
>
> Is there any column in the MediaWiki database schema that can give
> me this information?
>
> Thanks in advance.
>
> Cheers,
>
> Eugenio
>
None of this information is really stored in the database. A count of
real articles vs. stubs is sort of stored in site_stats: the content
pages that aren't stubs are counted in ss_good_articles. However,
ss_total_pages counts all pages, content or not, making this a bad metric
for your purposes. An individual wiki's concept of stub/normal/good/
featured is a completely arbitrary system not based in the actual
software in any way. The only idea I'd have (for English Wikipedia)
would be cross-referencing to see which articles contain the {{stub}}
(or similar) template, as that should give you a good idea of stubs.
Perhaps similar things could be done with {{featured}}?
The only other idea would be to check the FlaggedRevs tables to see
how they describe individual articles. However, the English Wikipedia
doesn't use that extension yet, and IIRC this information isn't
included in the dumps anyway.
-Chad
On Thu, Jan 15, 2009 at 5:00 PM, Eugenio Tacchini <eugenio(a)favoriti.it> wrote:
> Hello everybody,
> I'm looking, for academic research purposes, for the "status" of
> Wikipedia pages. By "status" I mean:
> - stub
revision.rev_len
> - normal
pretty obvious - everything above stub level
> - good article
> - featured
templatelinks (maybe, not sure!!)
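e.g. for the stub cut-off, a rough, untested sketch built on
revision.rev_len (the 2000-byte threshold is a completely arbitrary
guess, and the connection details are placeholders):

import MySQLdb

STUB_THRESHOLD = 2000  # bytes; arbitrary, tune per wiki

conn = MySQLdb.connect(host='localhost', user='wiki',
                       passwd='secret', db='wikidb')
cur = conn.cursor()
cur.execute("""
    SELECT page_title, rev_len
      FROM page
      JOIN revision ON rev_id = page_latest
     WHERE page_namespace = 0
""")
for title, length in cur.fetchall():
    # rev_len can be NULL for very old revisions.
    status = 'stub' if (length or 0) < STUB_THRESHOLD else 'normal'
    print('%s\t%s' % (title, status))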
marco
ps: you might want to apply for a toolserver account or ask someone
with ts access to execute queries for you
Hello everybody,
I'm looking, for academic research purposes, for the "status" of
Wikipedia pages. By "status" I mean:
- stub
- normal
- good article
- featured
Is there any column in the MediaWiki database schema that can give
me this information?
Thanks in advance.
Cheers,
Eugenio