I want to crawl around 800,000 flagged revisions from the German
Wikipedia, in order to make a dump containing only flagged revisions.
For this, I obviously need to spider Wikipedia.
What are the limits (especially rate limits) here, what User-Agent
should I use, and what other caveats do I need to watch out for?
PS: I already have a list of revisions, created on the Toolserver. I
used the following query: "select fp_stable,fp_page_id from
flaggedpages where fp_reviewed=1;". Is it correct that this gives me a
list of all articles with flagged revisions, with fp_stable being the
revid of the most recent flagged revision of each article?
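For reference, here is a minimal sketch of the kind of fetch loop I have
in mind (the User-Agent string is a placeholder, and I'm assuming a file
"revids.txt" with one fp_stable value per line):

    import json
    import time
    import urllib.parse
    import urllib.request

    API = "https://de.wikipedia.org/w/api.php"
    # Placeholder User-Agent; use something descriptive with contact details.
    UA = "FlaggedRevsDump/0.1 (contact: you@example.org)"

    def fetch(revids):
        """Fetch the content of a batch of revisions in one API request."""
        params = urllib.parse.urlencode({
            "action": "query",
            "prop": "revisions",
            "rvprop": "ids|content",
            "format": "json",
            "maxlag": 5,  # back off when the servers are lagged
            "revids": "|".join(str(r) for r in revids),
        })
        req = urllib.request.Request(API + "?" + params,
                                     headers={"User-Agent": UA})
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))

    revids = [int(line) for line in open("revids.txt") if line.strip()]
    for i in range(0, len(revids), 50):   # the API takes revids in batches
        data = fetch(revids[i:i + 50])
        # ... append data["query"]["pages"] to the dump file here ...
        time.sleep(1)                     # be polite: roughly one request/second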
On enwiki, the secure server (i.e. secure.wikimedia.org) is currently
listed as using: 22.214.171.124–126.96.36.199
It seems unlikely that the server really uses or needs such a large range.
In addition, we received a report that 188.8.131.52 is operating as
a TOR exit node. Since Wikipedia policy is to prohibit anonymous editing
and account creation from TOR nodes, it would be nice to clarify this.
In the getInitialPageText function in SpecialUpload.php, hardcoded headings
are added to the license and file description provided by the user. It
would be great if those hardcoded headings could be changed to a
MediaWiki message that can be altered on-wiki. On Commons, which is
multilingual, for example, the headings are more information clutter
than useful structuring tools. The file description heading will always
be in the user's language and the license heading always in the content
language, English (I have no idea why they differ, but that's what the
source code does). That adds more confusion than benefit, especially
since both the description and the license are wrapped in templates by
default and thus don't need headings to be distinguishable.
It would be nice if that could be fixed.
I also want to bring my last message about "localize transcluded image
description pages" back to mind. My proposal is easy to implement,
uncontroversial (so I think), and it would provide a big gain in
usability. At least it's relatively easy to implement for somebody with
commit access, which I don't have. I would appreciate it if somebody
could change the code. Thank you.
Quick note -- we're now testing Werdna's AbuseFilter extension on
test.wikipedia.org. AbuseFilter will make it easier for on-wiki admins
to set up automatic detection and tagging in response to common suspect
edit patterns.
A lot of this kind of filtering is being done in client-side bot tools
today, but it can be hard to coordinate what's going on, and responses
are usually limited to heavy-handed reversions... By building the
filters into the wiki, actions can range from simply tagging an edit for
human attention to emergency desysopping, depending on what's appropriate.
* Docs: http://www.mediawiki.org/wiki/Extension:AbuseFilter
* Take a peek: http://test.mediawiki.org/wiki/Special:AbuseFilter
Currently, sysops can define filter rules on Test Wikipedia, but with
some limitations on what the system can do in response:
* Tagging with visibility in RecentChanges/History/Contribs/etc. isn't
implemented yet (it needs some support in MediaWiki core that hasn't been
written yet).
* Filter-triggered blocks, range blocks, and removal from groups are
disabled, since we don't want people going crazy just yet. ;)
Werdna will be polishing up the capabilities and interface of
AbuseFilter over the next couple of weeks ... in response
to your help testing and providing feedback. Go check it out! :D
With all the discussion on foundation-l about contributors and
attribution, I have noted that while there are two different
implementations for blaming MediaWiki articles, neither of them seems to
be available. There are some example results, but not the tools themselves.
The implementations I am aware of are:
* Roman Nosov's (svn user roman) blamemap extension (2006-2007), which was
* Greg Hewgill's wikiblame (2008)
Is the code available and I have just missed it? Do we have any other
implementations?
I don't think it would be a _bad_ idea to support server-side
transcoding; it of course gives more flexibility to keep the original
file and then lets us target different output formats in the future. It
would also let us support camera video uploads, etc.
But there are logistical issues. It adds a bit of complexity and cost to
the server-side setup. Additionally, we are interested in working with
archive.org, which already offers free transcoding to Ogg from arbitrary
uploaded formats for freely licensed content. They have 2100+
transcode/storage CPU units and petabytes of storage. Commons has on the
order of 40 TB of storage, and all of Wikimedia's (already busy) servers
together are around 400 units ... It makes sense to encourage long-form
video contributions to be supported via a partnership with archive.org,
especially once we have them integrated as an archive provider.
Firefogg ideally is not "complex" for end users. It's a one-click
extension install; the user does not have to know anything about the
encoding settings, so the settings are identical to what we would
request server-side.
Using an extension also lets us control the upload system, so we can
have it upload in 1 MB chunks, for example. That way we can improve
usability around multi-hundred-megabyte POST uploads by giving progress
indicators, supporting resumed uploads, etc.
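To make the idea concrete, here is a rough sketch of a chunked upload
loop (this is not Firefogg's actual protocol; the endpoint and the
offset parameter are made up for illustration):

    import os
    import urllib.request

    CHUNK = 1024 * 1024  # 1 MB per POST
    UPLOAD_URL = "https://example.org/chunked-upload"  # hypothetical endpoint

    def upload(path):
        size = os.path.getsize(path)
        offset = 0
        with open(path, "rb") as f:
            while offset < size:
                chunk = f.read(CHUNK)
                req = urllib.request.Request(
                    UPLOAD_URL + "?offset=%d" % offset,
                    data=chunk,
                    headers={"Content-Type": "application/octet-stream"},
                )
                urllib.request.urlopen(req)  # server acknowledges the chunk
                offset += len(chunk)
                print("uploaded %d of %d bytes" % (offset, size))

Each chunk is a small, independently retried POST, so the client can show
real progress and resume from the last acknowledged offset instead of
restarting a multi-hundred-megabyte request.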
> Would it be worth providing a simple http-upload to a server-side transcoder
> for these relatively small files that are low-quality to begin with?
Yes, I would support that effort; I'm just focused on the Firefogg stuff
right now. If you have time to push forward on this, we can try to get
something set up.
> wouldn't it be more efficient to let
> an infrastructure like the one I created encode _all_ versions used for
> streaming, whether for desktops or mobile devices, from a single
> archival-quality upload?
Yes, it may well be better to just upload the HQ version and have the
server do the transcode. Your transcode infrastructure could be very
useful for that, but we will have to see how the logistical issues
mentioned above play out.
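As a rough sketch of what the server-side step could look like (this
assumes the ffmpeg2theora command-line tool is installed; the paths are
placeholders):

    # Sketch of a server-side transcode step run from a job queue after
    # the original upload has been stored.
    import subprocess

    def transcode_to_ogg(src, dst):
        """Derive an Ogg Theora file from a high-quality original."""
        subprocess.check_call(["ffmpeg2theora", src, "-o", dst])

    transcode_to_ogg("/srv/uploads/original.mov", "/srv/transcoded/original.ogv")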
Through a message on another list, I found that when one tries to
reach Wikipedia (or at least wikipedia-en) specifying the User-Agent
as "Python-urllib/1.17", the server gives a "403 Forbidden" response,
together with the content of the page.
1. Why is this User-Agent getting this response? If I remember
correctly, this block was installed in the early days of the
pywikipediabot, when Brion wanted to block it because it had a
programming error causing it to fetch each page twice (sometimes even
more?). If that is the actual reason, I see no reason why it should
still be active years later.
2. If this User-Agent is really to be blocked, why do we still provide
the content of the page that is forbidden?
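For anyone who wants to reproduce this, a quick sketch comparing the
blocked User-Agent with a descriptive one (the page chosen is arbitrary):

    import urllib.error
    import urllib.request

    URL = "https://en.wikipedia.org/wiki/Main_Page"

    for ua in ("Python-urllib/1.17", "MyTool/0.1 (contact: you@example.org)"):
        req = urllib.request.Request(URL, headers={"User-Agent": ua})
        try:
            resp = urllib.request.urlopen(req)
            print(ua, "->", resp.getcode(), len(resp.read()), "bytes")
        except urllib.error.HTTPError as e:
            # the 403 response still carries a body, which is the point at issue
            print(ua, "->", e.code, len(e.read()), "bytes")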
André Engels, andreengels(a)gmail.com
A few months ago I successfully downloaded the November 2006 HTML version of Wikipedia (about 6 GB, expanding to 90 GB) and the October 2008 xml.bz2 file (4.1 GB, converted to a 7.1 GB Wikitaxi file).
I have just downloaded the June 2008 HTML version in .tar.7z format and extracted it into .tar format (14.3 GB to 230 GB). I now have no idea what to do next. I ran WinRAR on it, and it gave up after more than 6 million files.
1. How do I actually access all this information? I use the Wikitaxi version, but only the HTML version allows access to, for instance, categories, so the latest version would be useful.
2. Is there any way to recompress it to a reasonable size such that I can still access it without it occupying nearly all my disk?
3. Or, failing that, is there any way to access the original .tar.7z file, as BzReader can access .xml.bz2 files?
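One way to at least read individual pages out of the 230 GB .tar without
unpacking all six million files is Python's tarfile module in streaming
mode. This is only a sketch; the file names are made up and would need
to match the dump's actual layout:

    # Read a single page out of the big .tar without extracting everything.
    # Streaming mode ("r|") avoids building an index of all the members.
    import tarfile

    DUMP = "wikipedia-en-html.tar"            # placeholder file name
    WANTED = "articles/e/x/a/Example.html"    # placeholder member path

    with tarfile.open(DUMP, mode="r|") as tar:
        for member in tar:
            if member.name == WANTED:
                page = tar.extractfile(member).read().decode("utf-8", "replace")
                print(page[:500])
                break

As far as I know, the .tar.7z itself cannot be read with random access
the way BzReader reads .xml.bz2 files, so it would still need to be
decompressed to the plain .tar first.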