Once we ship the Firefogg extension support for uploading videos,
Commons should request that users select the highest quality source
video footage available, i.e. the HD video their camera captured or the
original edited DV footage from their local computer, and then Commons
will supply the transcode settings.
I think it would be good if we wrote up some documentation to explain
this to uploaders... any volunteers to help on that front?
Presently, for the Firefogg upload support, I have arbitrarily chosen 400
pixels wide (keeping the aspect ratio) and a 500 kbps bitrate.
Firefogg could let us request multiple encodes or profiles from the user.
Should we plan on supporting multiple "profiles", i.e. multiple quality
settings? E.g. one version at around 320 pixels wide and 300 kbps for
low-bandwidth / low-resolution environments, cell phones, etc. (300 kbps
should be "acceptable quality" once the new Thusnelda Theora encoder lands).
We could additionally read the resolution of the source file that users
provide and choose a "maximum quality preservation" version. We could
probably even ship the Dirac codec with Firefogg (Dirac is a wavelet codec
aimed at high quality at high resolution; for more on Dirac see your
favorite info source ;)
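To make the profile idea a bit more concrete, here is a rough sketch (in
Python, with made-up profile names; the numbers are just the ones mentioned
above, and the real Firefogg/Commons interface would likely look different):

# Hypothetical encode profiles Commons could hand to Firefogg.
# Names, keys and the selection rule are illustrative only.
TRANSCODE_PROFILES = {
    "default_400": {"maxwidth": 400, "keep_aspect": True, "bitrate_kbps": 500},
    "low_320":     {"maxwidth": 320, "keep_aspect": True, "bitrate_kbps": 300},
}

def profiles_for_source(source_width_px):
    """Only request encodes no wider than the source (always keep the low one)."""
    return {name: p for name, p in TRANSCODE_PROFILES.items()
            if p["maxwidth"] <= max(source_width_px, 320)}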
If we want to support multiple quality settings for a single "stream",
this will require a bit more infrastructure. Specifically, I propose we
add another namespace for temporal media called Stream: and have it
directly map to ROE XML, something like: http://tinyurl.com/72x57r (more
info on ROE: http://wiki.xiph.org/index.php/ROE )
File:my_movie_low_quality.ogg and File:my_movie_high_quality.ogg would
soft-redirect to Stream:my_movie, and all the meta info would be stored
there. The Stream namespace also allows us to group other media tracks
that share a temporal meaning, such as multiple-language audio dubbing
and multilingual transcripts/subtitles. The JavaScript player can then
dynamically select audio language and/or subtitles based on the user's
language.
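Just to illustrate the grouping (this is not the ROE schema, and the
audio/subtitle file names below are invented), a Stream: page would
essentially tie together something like:

# Schematic view of what Stream:my_movie would group; only the two video
# file names come from the example above, the rest are hypothetical.
STREAM_MY_MOVIE = {
    "video": {
        "low":  "File:my_movie_low_quality.ogg",
        "high": "File:my_movie_high_quality.ogg",
    },
    "audio":     {"en": "File:my_movie_audio_en.ogg", "de": "File:my_movie_audio_de.ogg"},
    "subtitles": {"en": "File:my_movie_subs_en.srt",  "de": "File:my_movie_subs_de.srt"},
}

def select_tracks(stream, user_language, low_bandwidth=False):
    """Roughly what the JavaScript player would do: pick tracks for the user."""
    return {
        "video":     stream["video"]["low" if low_bandwidth else "high"],
        "audio":     stream["audio"].get(user_language, stream["audio"]["en"]),
        "subtitles": stream["subtitles"].get(user_language),
    }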
The Stream namespace could also store "mirrors" or point to "torrents",
improving syndication and bandwidth cost distribution for high-traffic HD
content (e.g. Miro could read the ROE file and grab the torrent rather
than hit our servers), and/or a Firefox torrent extension could be
detected by our JavaScript player, which would choose the torrent over
hitting our servers for the HD content.
Not to say all these things will happen at once... just pointing out
the need for a new namespace to group files with identical temporal meaning.
--michael
This has probably been raised before (is there a bug for it?), but it
appears to me that it would be significantly more user-friendly to
append a <references/> section at the bottom of the text that is being
parsed when it is missing.
This would
* make it easier for new users to discover referencing functionality
(oldbies could still properly reformat the tag);
* solve the "references missing in section preview" problem.
Is there any obvious reason not to do this?
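To be explicit about the behavior I mean, here is a minimal sketch (Python,
operating on the raw wikitext; the real change would of course live in the
Cite/parser code, not a helper like this):

import re

def ensure_references_section(wikitext):
    # If the text uses <ref> tags but never renders them, append a section.
    has_refs = re.search(r'<ref[\s>]', wikitext, re.IGNORECASE)
    has_list = re.search(r'<references\s*/?>|\{\{\s*reflist', wikitext, re.IGNORECASE)
    if has_refs and not has_list:
        wikitext += "\n\n== References ==\n<references/>\n"
    return wikitext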
--
Erik Möller
Deputy Director, Wikimedia Foundation
Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate
I just recently started to play with interwiki.py (Pywikipedia bot
framework) for propagating interwiki links. My interest comes
from organizing the category tree, so I'm focusing on interwiki
links between categories. Interwiki bots normally run in
autonomous mode, but this means they give up on complicated cases.
If I run this script under manual supervision, without the
"-autonomous" option, it stops and asks me how to resolve each
conflict. This happens every so often. I have now (manually)
sorted out the interwiki links between all languages of
Category:Knowledge, which was intertwined with Category:Science,
and Category:Austrian writers which was mixed up with
Category:Austrian literature. Such mistakes easily happen, of
course. Who can spot errors in all these languages?
Many languages had interwiki links from their category for
Austrian writers to the Japanese category for Austrian literature.
I'm not sure exactly when or where this error originated. But on
June 19, 2007, the English and Spanish Wikipedias' interwiki link
to Japanese changed from Austrian novelists to Austrian
literature, i.e. from one error to another. Ten days later, this
link was copied to the Dutch Wikipedia. The error was corrected on
en.wikipedia on October 1, 2007, but remained on other languages.
Yes, that's 15 months ago.
The circular interwiki link structure from en:Category:Austrian
writers to es:Categoría:Escritores de Austria to ja:... and back
to en:Category:Austrian literature is such a conflict that makes
interwiki.py give up when it runs in autonomous mode.
Thus, corrections (as on October 1) do not propagate. Instead a
report about the conflict is given in a logfile, but apparently
nobody had fixed this problem in the last 15 months. This
conflict also blocked new interwiki links from propagating.
After I cleared up the mess, 21 new interwiki links were added to
the category on the Russian Wikipedia (one where I have a bot
flag). That means 21 languages of Wikipedia had created
categories (or announced them to the interwiki system) for
Austrian writers in the last 15 months, and they all added their
interwiki link to the English Wikipedia. But these additions did
not propagate because of the conflict.
So, my question:
Has anybody mapped exactly how many such interwiki conflicts we
have? Or how many interwiki sets do we have without conflicts?
Could/should someone make a list of current conflicts and try to
rank them by importance, so we can get started in fixing them?
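If someone wants to script such a list, the core check is simple; here is a
rough sketch (Python; the input format is made up, and interwiki.py of
course has its own internal graph handling):

from collections import defaultdict

def find_conflicts(links):
    """links maps (lang, title) -> iterable of (lang, title) pages it links to.
    A group of pages is in conflict if, following the links, we reach two
    different titles on the same language (e.g. both Austrian writers and
    Austrian literature on ja:)."""
    conflicts, seen = [], set()
    for start in links:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:                      # collect everything reachable
            page = stack.pop()
            if page in group:
                continue
            group.add(page)
            stack.extend(links.get(page, ()))
        seen |= group
        per_lang = defaultdict(set)
        for lang, title in group:
            per_lang[lang].add(title)
        if any(len(titles) > 1 for titles in per_lang.values()):
            conflicts.append(group)
    return conflicts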
In the longer term, we need to redesign the interwiki links into a
centralized system that can be maintained. I think the way to do
this is to use Wikimedia Commons. Instead of copying all the
interwiki links to every language of Wikipedia, it should be
enough to add {{commons|Category:Writers from Austria}}, and the
rest should happen automatically.
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
For a lot of (all?) text input fields we have, for whatever reason, set a
maxlength of 200, but some are set to 255.
Is there a special reason for 200? Or should we increase all to 255? Or
reduce all to 200? I like consistency :)
Raymond.
We're currently working on a grant proposal that is related to the
usability for uploading and embedding media files to Wikimedia
Commons. (This is an area that we will likely not be able to address
in detail as part of the Stanton project, so we're trying to parcel it
into a separate project.) As part of this proposal, I would like to
make a compelling case that pictures and other media uploaded to
Commons benefit strongly from the increased visibility,
especially through Wikipedia articles. I'd also like to demonstrate
that images get used in multiple languages and multiple projects.
The simplest research approach that any volunteer could take is to
take a sample (say 50 featured media files and 50 random ones) and to
catalog, in a spreadsheet, their usage across Wikimedia projects, using the
CheckUsage tool. But I'm sure there are other approaches - both
quantitative and qualitative - that might work as well, e.g. based on
Wikipedia article traffic statistics.
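For what it's worth, here is a rough sketch of the spreadsheet approach
(Python; get_usage() is a placeholder for whatever data source ends up being
used, e.g. output from the CheckUsage tool):

import csv, random

def get_usage(file_title):
    """Placeholder: return a list of (project, page) pairs where the file is used.
    In practice this would come from CheckUsage or a global-usage query."""
    raise NotImplementedError

def catalog_usage(featured_files, all_files, sample_size=50, out="commons_usage.csv"):
    sample = featured_files[:sample_size] + random.sample(all_files, sample_size)
    with open(out, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "projects_using_it", "total_uses"])
        for title in sample:
            uses = get_usage(title)
            writer.writerow([title, len({proj for proj, _ in uses}), len(uses)])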
I'd love to see some volunteer input into this question, which
essentially boils down to: Why is Wikimedia Commons awesome, and why is
it worth investing in to make it even better? I've started a page on
Meta here if you want to contribute ideas on-wiki:
http://commons.wikimedia.org/wiki/Commons:Case_for_Commons
But feel free to e-mail me off-list as well. :-)
Thanks for any and all help,
Erik
--
Erik Möller
Deputy Director, Wikimedia Foundation
Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate
As part of the sequencer I have been working on an add media wizard to
enable the searching and inserting of media into a given sequence. This
add media wizard could also serve as an entry point to adding media to
pages.
== demo ==
First, a disclaimer: this is still pretty early in development, more of
an early semi-working prototype than a beta or anything usable. (For
example, I have not done much cross-browser testing yet, so use Firefox.)
But if you want to check it out go ahead and add:
importScriptURI('http://mvbox2.cse.ucsc.edu/w/extensions/MetavidWiki/skins/external_media_wi…');
to your User:{username}/monobook.js page
or you can load a slightly older version at:
http://en.wikipedia.org/w/extensions/MetavidWiki/skins/external_media_wizar…
Once it is installed on your user page, go to edit some page like "sandbox",
highlight a word or place the cursor where you want to insert media, and
click the add media wizard at the top right of the edit box. You should
get a few images from Commons; you can click on an image to insert it, add an
in-line description, crop it if you like, and then preview the insert into
the page. Once you are happy, "do the insert" and it will paste in the wiki
code to insert that image into the page; you can modify it and then
re-preview the page if you like.
You can also do a political search like "Iraq" or "Obama" and pull up
metavid clips to see how setting in-and-out points of video has been
prototyped so far.
(Note: metavid falls back to Flash video while the HTML5 video tag for
Firefox is still maturing... if you are using a Firefox nightly
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/ you
may have to click the options button in the lower right to select the Ogg
video stream.)
Once you are done playing with the prototype, I recommend removing the
script from your user page. I will update the list once it's more _ready_
for wider usage.
It should work on remote wikis with the Commons import system, i.e. you
can try including this script on your local MediaWiki installation; if
you have $wgAllowCopyUploads enabled, your wiki should be able to
download and import Commons images directly into the wiki.
== Issues ==
So this brings up a whole host of issues... here are some that I thought
of. I thought I would ping this list to get some more ;)
=== security ===
* Right now the wizard pulls directly from remote repositories (i.e.
Commons and metavid.org serve up the search results in JSON with a
callback). This means the compromise of any server that we support as a
remote repository will result in an XSS issue. This is true for any remote
script that users include, but it would be a bigger problem if the add media
wizard moves into more common usage.
** We should probably proxy the results so we can just process them as
RSS and run normal script filters on the data -- this is slower and
adds more strain to our servers, but provides more security. (A rough
sketch of such a proxy is below.)
** Or we limit the "included by default" repositories and put in a kill
switch of sorts that we can flip to stop injecting from any compromised
remote repository, making it difficult to cause big XSS issues "by
default". We could have users jump through some hoops to enable less common
remote repositories, similar to how user scripts work.
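Here is the rough sketch of the proxy option mentioned above (Python just to
illustrate the flow; in practice this would be PHP inside MediaWiki, and the
remote API parameters and result fields below are invented):

import json, html
from urllib.request import urlopen
from urllib.parse import urlencode

# Whitelist of repositories we proxy for; removing an entry is the "kill switch".
ALLOWED_REPOS = {
    "metavid": "http://metavid.org/w/api.php",   # endpoint shown only as an example
}

def proxy_search(repo, query):
    if repo not in ALLOWED_REPOS:
        raise ValueError("repository not enabled")
    url = ALLOWED_REPOS[repo] + "?" + urlencode(
        {"action": "query", "search": query, "format": "json"})  # params are assumptions
    raw = json.load(urlopen(url))
    results = []
    for item in raw.get("results", []):           # field names are assumptions too
        results.append({
            "title": html.escape(str(item.get("title", ""))),
            "thumb": str(item.get("thumb_url", "")),
            "url":   str(item.get("url", "")),
        })
    # We hand the client plain, filtered JSON -- never remote script via a callback.
    return json.dumps(results)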
=== performance / maintainable/modular code / internationalization ===
* JS library loading... we want to start moving towards more modular
scripts, i.e. we don't need to include all the remote repository objects on
first load (e.g. if we have a search object for Flickr, we want to
dynamically add in that remote repository search object as necessary
when the user clicks on the Flickr repository tab).
For portability outside of MediaWiki, each JS object/file should
define any user-language messages that it includes; that way our
script server system can send out the right language messages with the
JS library that uses them.
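To sketch what I mean by pairing messages with each file (Python standing in
for the script-server side; the module names, message keys, and the
client-side mw.addMessages() hook are all just placeholders):

import json

MESSAGES = {   # per-language message tables, normally loaded from i18n files
    "en": {"mwe-search": "Search", "mwe-insert": "Insert into page"},
    "de": {"mwe-search": "Suchen", "mwe-insert": "In die Seite einfügen"},
}

MODULE_MESSAGE_KEYS = {   # keys each JS file declares that it uses
    "remoteSearchDriver.js": ["mwe-search", "mwe-insert"],
}

def serve_module(module, lang, js_source):
    """Prepend the localized messages a module declared to its JS source."""
    table = {k: MESSAGES.get(lang, MESSAGES["en"]).get(k, MESSAGES["en"][k])
             for k in MODULE_MESSAGE_KEYS.get(module, [])}
    return "mw.addMessages(" + json.dumps(table, ensure_ascii=False) + ");\n" + js_source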
Some discussion about a JavaScript loading system took place on this
list not long ago... I said I would revisit the issue, so I will try to
do that soon :)
http://lists.wikimedia.org/pipermail/wikitech-l/2008-December/040625.html
=== Licenses / archive restrictions ===
So far I have just included the metavid "remote" repository (importing
is not working yet until we enable uploading by URL and do some fixes
for cross-site issues, i.e. you're inserting content on the Wikipedia domain
but want the resource to be uploaded to Commons).
In terms of external archive license issues I am thinking we essentially
require that the external archive provide license info and we represent
that with a little icon below each image and then pull the appropriate
template into the import resource description.
The other obvious external archive restriction, for video, is that they
provide the video in Ogg Theora format. Preferably they run oggz_chop so
that a segment of the video can be dynamically selected. We have already
been working with archive.org on this front; see:
http://metavid.org/blog/2008/12/08/archiveorg-ogg-support/
== road map / up-and-coming efforts ==
* get scaling working ... (right now just defaults to thumb or cropped
size).
* license support (add in license thumbnails and wikimedia commons
template mappings for import descriptions)
* layout control (real-time layout control will let you adjust size and
float layout properties... maybe even let you move the image around on
the page).
* "add by URL" option for parsing resource pages of common repositories.
ie maybe you find a picture using flickr's search engine you want to
copy and paste that url not search for it again.
* uploading... integrate http://firefogg.org/ for uploading video from
arbitrary source content to Ogg Theora, with server-side provided
encoding settings.
* fix importing of videos (from metavid initially, but done in a general
way to support archive.org video inserts)
* javascript loader (integrate a solution to the large set of many
javascript files / localization problem)
* add annodex oggz_chop to wikipedia server side architecture so that we
can support setting in-and-out points for ogg video (like we can do on
metavid and archive.org video)
* improve generalization of search classes and add support for more remote
repositories (archive.org, Flickr, etc.)
* improve the MediaWiki API so we can query for "only videos" or only SVGs,
and/or search both the Title and Description text at the same time.
* make a more general protocol for establishing queriable properties.
This will let us do discovery of "advanced search" parameters in a
general way. A more complicated use case is a full semantic wiki. For
some examples of finding video clips with semantic searches see:
http://metavid.org/wiki/Sample_Semantic_Queries_page
* improve the image "editor": integrate (and/or improve for multi-user
collaboration) some library for simple canvas/image manipulations, e.g.:
http://editor.pixastic.com/ (with server-side support for rendering out
these transformations, for performance and for older and/or otherwise
crippled web browsers, i.e. IE)
** along those lines, add in server-side support for cropping, with the
larger transformation framework in mind.
peace,
michael
The current enwiki database dump
(http://download.wikimedia.org/enwiki/20081008/) has been crawling
along since 10/15/2008.
I realize that dumps can appear stalled in their normal processing
(http://meta.wikimedia.org/wiki/Data_dumps#Schedule), but in the
recent past (as far as I know) they have not been stalled this long
without there being something actually wrong. The completion date for
"All pages with complete page edit history" (where it is currently)
fluctuates within the latter half of 2009.
Is this purposeful? And is there anything I (or other community
members) can do about it? I personally just need the pages-articles
part. Would it be possible to dump up to that part on a different
thread?
Thank you for your time.
Gabriel Weinberg
[Breaking this thread off...]
On 12/28/08 1:32 AM, Niklas Laxström wrote:
> The anchors of non-latin headers are already (latin) gibberish:
> #.D0.A4.D0.B8.D0.BB.D1.8C.D0.BC.D0.BE.D0.B3.D1.80.D0.B0.D1.84.D0.B8.D1.8F
>
> It doesn't seem reasonable to think that people could create anchors
> in their head from text, except in special cases.
If we're going to stick with strict ASCII-limited anchors, it might be
worth considering making them more legible, say with transliteration to
ASCII Latin chars. :P
On the other hand, XHTML *doesn't* actually limit us this way!
The XHTML 1.0 recommendation of restriction to [A-Za-z][A-Za-z0-9:_.-]*
is for compatibility with HTML 4.0, which defines:
ID and NAME tokens must begin with a letter ([A-Za-z]) and may be
followed by any number of letters, digits ([0-9]), hyphens ("-"),
underscores ("_"), colons (":"), and periods (".").
XHTML specifies ID and NMTOKEN types here, which are *not* restricted
to ASCII, but rather allow a large number of scripts:
http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-NameChar
http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Letter
http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Digit
http://www.w3.org/TR/2000/WD-xml-2e-20000814#NT-Extender
If there are no major browser compatibility problems, I would probably
recommend we roll back the nasty old .XX encoding for HTML 4
compatibility, in which case we could quite legally produce something
direct, such as:
http://ru.wikipedia.org/wiki/Уплисцихе#Уплисцихе_в_средневековье
which URL-encodes out to:
http://ru.wikipedia.org/wiki/%D0%A3%D0%BF%D0%BB%D0%B8%D1%81%D1%86%D0%B8%D1%…
(which can be nicely displayed as pretty Unicode in the URL bar of
modern browsers)
as opposed to the current:
http://ru.wikipedia.org/wiki/%D0%A3%D0%BF%D0%BB%D0%B8%D1%81%D1%86%D0%B8%D1%…
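For reference, a small sketch of how the current .XX anchors relate to plain
percent-encoding (it reproduces the Фильмография example above; this is
roughly what the escaping does, not the exact Sanitizer code):

from urllib.parse import quote

def legacy_anchor(heading):
    """Current scheme (roughly): UTF-8 percent-encode, then swap '%' for '.'
    so the id stays within HTML 4's [A-Za-z0-9:_.-] alphabet."""
    return quote(heading.replace(" ", "_")).replace("%", ".")

def direct_anchor(heading):
    """Proposed: keep the text as-is in the id; the URL bar / links handle
    percent-encoding where needed."""
    return heading.replace(" ", "_")

print(legacy_anchor("Фильмография"))
# .D0.A4.D0.B8.D0.BB.D1.8C.D0.BC.D0.BE.D0.B3.D1.80.D0.B0.D1.84.D0.B8.D1.8F
print(direct_anchor("Уплисцихе в средневековье"))
# Уплисцихе_в_средневековье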
-- brion
This is a proposal to change how the MediaWiki software displays
subpages. This isn't really an issue on Wikipedia, because
subpages in the main namespace are disabled, but using subpages at
Wikisource is a standard way of dividing up works, leaving only a
table of contents at the root article.
The problem is that this results in page titles like this:
United States Code/Title 35/Chapter 14/Section 151
or even
Nicene and Post-Nicene Fathers: Series II/Volume I/Constantine/The
Life of Constantine/Book II/Chapter 23
which IMHO looks more like a file system than a user-friendly
website. I would suggest
United States Code » Title 35 » Chapter 14 » Section 151
or my own favourite
United States Code » Title 35 » Chapter 14 »
Section 151
(With "Section 151" in bigger font.)
This would effectively involve moving the subpages div above the
title and changing the title from the entire path to just the subpage
name.
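At the string level the change is tiny; a small sketch (Python, just to show
the transformation; the real patch would be in the skin / OutputPage code):

def display_title(full_title, separator=" » "):
    """Split 'A/B/C/D' into a breadcrumb line and the subpage name to use
    as the big page heading."""
    parts = full_title.split("/")
    return separator.join(parts[:-1]), parts[-1]

crumbs, heading = display_title("United States Code/Title 35/Chapter 14/Section 151")
print(crumbs)    # United States Code » Title 35 » Chapter 14
print(heading)   # Section 151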
See:
http://en.wikisource.org/wiki/Wikisource:Scriptorium#Subpage_formatting
for the discussion I started on Wikisource and further down the same
page
http://en.wikisource.org/wiki/Wikisource:Scriptorium#Subpage_formatting:_some_more_examples
for some formatted versions of the above examples
and
http://en.wikisource.org/wiki/Nicene_and_Post-Nicene_Fathers:_Series_II/Volume_I/Constantine/The_Life_of_Constantine/Book_II/Chapter_23
and
http://en.wikisource.org/wiki/Treaty_on_European_Union/Protocol_on_the_convergence_criteria_referred_to_in_Article_109j_of_the_Treaty_establishing_the_European_Community
for examples of how ugly the current setup can be.
Michael
Hi guys,
Could someone with access please change the redirect on
http://wikimania.wikimedia.org/ to the wikimania2009.wikimedia.org site?
As it is now 2009, I think it would be best to change this to the
latest site.
Thanks,
James
--
[[User:JamesR]] (formerly [[User:E]])
English Wikipedia Administrator
Wikimedia Australia Member