We've had a long-term problem with amane returning a success code and
Content-Length: 0 for perfectly valid files which are certainly longer
than zero bytes. Once one of these zero-byte files has been dished out,
the squids will happily hold on to it for a long time. Purging clears
the problem for the affected file.
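For anyone scripting the workaround in the meantime: a purge is just an
HTTP PURGE request sent to each squid (assuming the squid ACLs allow
PURGE from your host; the host name and file path below are made up):

  // Rough sketch: purge one stuck file from a single squid.
  // Host name and file path are placeholders, not real ones.
  $squid = 'sq1.example.org';
  $path  = '/wikipedia/commons/x/xx/Example.ogg';
  $fp = fsockopen( $squid, 80, $errno, $errstr, 5 );
  if ( $fp ) {
      fwrite( $fp, "PURGE $path HTTP/1.0\r\n" .
                   "Host: upload.wikimedia.org\r\n\r\n" );
      echo fgets( $fp ); // "HTTP/1.0 200 OK" if the object was cached
      fclose( $fp );
  }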
The problem seems to rise and fall in frequency depending on how
overloaded amane is. It seems to be more frequent with larger files,
but that might just be the bias of our own experience talking. People
have poked at the problem on and off, but it's still not fixed.
Tim took the OggHandler extension live today. The inline players
have brought a lot more attention to media support.
But I've now gotten more bug reports about stuck files than about any
other issue with the player. It's pretty miserable that our new
feature is going to give a lot of people a poor first impression
because of this unresolved backend issue.
An automated run of parserTests.php showed the following failures:
This is MediaWiki version 1.12alpha (r25657).
Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...
Reading tests from "extensions/LabeledSectionTransclusion/lstParserTests.txt"...
17 still FAILING test(s) :(
* URL-encoding in URL functions (single parameter) [Has never passed]
* URL-encoding in URL functions (multiple parameters) [Has never passed]
* Table security: embedded pipes (http://lists.wikimedia.org/mailman/htdig/wikitech-l/2006-April/022293.html) [Has never passed]
* Link containing double-single-quotes '' (bug 4598) [Has never passed]
* message transform: <noinclude> in transcluded template (bug 4926) [Has never passed]
* message transform: <onlyinclude> in transcluded template (bug 4926) [Has never passed]
* BUG 1887, part 2: A <math> with a thumbnail- math enabled [Has never passed]
* HTML bullet list, unclosed tags (bug 5497) [Has never passed]
* HTML ordered list, unclosed tags (bug 5497) [Has never passed]
* HTML nested bullet list, open tags (bug 5497) [Has never passed]
* HTML nested ordered list, open tags (bug 5497) [Has never passed]
* Inline HTML vs wiki block nesting [Has never passed]
* Mixing markup for italics and bold [Has never passed]
* dt/dd/dl test [Has never passed]
* Images with the "|" character in the comment [Has never passed]
* Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
* Parents of subpages, two levels up, with lots of extra trailing slashes. [Has never passed]
Passed 527 of 544 tests (96.88%)... 17 tests failed!
tstarling@svn.wikimedia.org wrote:
> Revision: 25605
> Author: tstarling
> Date: 2007-09-07 04:02:32 +0000 (Fri, 07 Sep 2007)
>
> Log Message:
> -----------
> You might find email to be more efficient for sending private
> messages
-- that way only the recipient downloads it, instead of the recipient
plus 100 million Wikipedia readers. Or indeed, a commit notice. With a
large file in the data parameter in Mozilla, the browser downloads the
entire file before it starts playing. The referenced document may
describe streaming, but the same technique is also necessary for
cross-browser progressive download.
Comments are meant to aid all future maintainers of the software in
understanding the purpose of nonobvious techniques, so a private e-mail
would be entirely mistargeted.
Since you'd prefer to keep the JavaScript comments small, though, I've
gone ahead and replaced the comment with a URL to this commit message.
-- brion vibber (brion @ wikimedia.org)
On 9/7/07, brion@svn.wikimedia.org <brion@svn.wikimedia.org> wrote:
> They force operational work before we can update the software again
Do they? I assumed that Wikimedia updates didn't actually run
update.php, with schema changes done manually instead. I deliberately
committed nothing that depended on them, so if you updated the rest of
the software nothing would break from not doing the schema change
until you're ready. I was going to commit code actually using the new
stuff only after the schemas were updated, presumably in another batch
some time from now. What's problematic in that scenario?
On 9/6/07, Tim Starling <tstarling@wikimedia.org> wrote:
[snip]
> Note that when you choose a player, it saves a cookie with your selection,
> and then uses that choice from then on. The expiry time might need some
> tweaking though, I think at the moment it expires at the end of the session.
So... I've been avoiding using cookies to save state for any JS used by
anonymous users, because setting a persistent cookie will break squid
caching.
For something like a video player preference it might not be too bad...
but it would be good if we had a safe way to save a little client-side
state from scripts without risk of hurting caching.
An automated run of parserTests.php showed the following failures:
This is MediaWiki version 1.12alpha (r25611).
Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...
Reading tests from "extensions/LabeledSectionTransclusion/lstParserTests.txt"...
17 still FAILING test(s) :(
* URL-encoding in URL functions (single parameter) [Has never passed]
* URL-encoding in URL functions (multiple parameters) [Has never passed]
* Table security: embedded pipes (http://lists.wikimedia.org/mailman/htdig/wikitech-l/2006-April/022293.html) [Has never passed]
* Link containing double-single-quotes '' (bug 4598) [Has never passed]
* message transform: <noinclude> in transcluded template (bug 4926) [Has never passed]
* message transform: <onlyinclude> in transcluded template (bug 4926) [Has never passed]
* BUG 1887, part 2: A <math> with a thumbnail- math enabled [Has never passed]
* HTML bullet list, unclosed tags (bug 5497) [Has never passed]
* HTML ordered list, unclosed tags (bug 5497) [Has never passed]
* HTML nested bullet list, open tags (bug 5497) [Has never passed]
* HTML nested ordered list, open tags (bug 5497) [Has never passed]
* Inline HTML vs wiki block nesting [Has never passed]
* Mixing markup for italics and bold [Has never passed]
* dt/dd/dl test [Has never passed]
* Images with the "|" character in the comment [Has never passed]
* Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
* Parents of subpages, two levels up, with lots of extra trailing slashes. [Has never passed]
Passed 527 of 544 tests (96.88%)... 17 tests failed!
LiquidThreads is a threaded discussion system for MediaWiki, developed
by David McCabe with project management by Stichting Open Progress. It
replaces standard talk pages in the wiki. I hope that one day it will
be used on Wikimedia Foundation projects.
LiquidThreads development has reached a stage where we'd like to
invite the general public to try it out. There is an online demo:
http://wikixp.org/lqt/index.php/Main_Page
Please report bugs & issues _directly on the wiki_ and not on this
mailing list.
Sincerely,
Erik Möller
CTO, Stichting Open Progress
Hi,
I am creating a tag extension that uses the categories of the current page.
How can I get a list of all categories assigned directly to the page?
I don't want the parent categories. I have found
$wgPageTitle = $parser->getTitle();
$wgParentCats = $wgPageTitle->getParentCategories();
but it lists all the parent categories as well.
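One direction I've been considering is reading the categorylinks table
for the page directly instead of walking the tree; a rough, untested
sketch:

  $title = $parser->getTitle();
  $dbr = wfGetDB( DB_SLAVE );
  $res = $dbr->select( 'categorylinks', 'cl_to',
      array( 'cl_from' => $title->getArticleID() ), __METHOD__ );
  $cats = array();
  while ( $row = $dbr->fetchObject( $res ) ) {
      $cats[] = $row->cl_to; // category name without namespace prefix
  }
  $dbr->freeResult( $res );

Is that a sane approach, or is there a proper accessor I'm missing?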
Thanks,
Andi
A year ago, I wrote a little script for extracting template calls
from the XML database dump. The idea is that many templates are
infoboxes that provide structured information, such as the
population density of a country or bibliographic information in
book citations. The script is now updated to also extract ISBNs
and <ref> tags, as if these had been templates.
http://meta.wikimedia.org/wiki/User:LA2/Extraktor
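The script is what does the real work; purely to illustrate the idea,
here is a toy PHP fragment that pulls template names, <ref> contents
and ISBNs out of one page's wikitext. It ignores nesting, comments and
<nowiki>, all of which a real extractor has to handle:

  // Toy illustration only; not the actual Extraktor script.
  preg_match_all( '/\{\{\s*([^|{}]+?)\s*[|}]/', $text, $m );
  $templates = $m[1];                       // top-level template names
  preg_match_all( '!<ref[^>/]*>(.*?)</ref>!s', $text, $m );
  $refs = $m[1];                            // contents of <ref> tags
  preg_match_all( '/ISBN\s*([0-9Xx][0-9Xx -]{8,16})/', $text, $m );
  $isbns = $m[1];                           // raw ISBN strings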
I downloaded the reasonably small Wikipedia dumps for the
Scandinavian and Baltic languages and compiled some statistics,
such as the 50 most used templates, the 20 most cited ISBNs and
the 15 most common things to find inside <ref> tags.
http://meta.wikimedia.org/wiki/User:LA2/Extraktor_stats_200709
Of these languages, Swedish is the biggest (the uncompressed
database dump is 600 MB) followed by Finnish (481 MB) and
Norwegian (415 MB). But Finnish is far ahead in the use of
references and templates. One way to describe this degree of
structure is the size of my script's output compared to its input:
Language Dump size Extraktor output
----------------- --------- ----------------
lt = Lithuanian 152 MB 18.4 % or 28 MB
no = Norwegian 415 MB 16.9 %
nn = Nynorsk 85 MB 15.3 %
fi = Finnish 481 MB 14.1 %
is = Icelandic 66 MB 12.7 %
se = Sami 5.1 MB 10.8 %
da = Danish 209 MB 10.5 %
sv = Swedish 600 MB 10.2 %
fo = Faroese 7.8 MB 8.9 %
et = Estonian 116 MB 8.3 %
lv = Latvian 45 MB 8.2 %
fiu-vro = Võro 3.5 MB 6.4 %
I can't fully explain why the Lithuanian WP ranks so high.
Perhaps there is an opening <ref> that doesn't close, causing many
bytes to be included? If so, my script could help to find and
hunt down such errors. (I also tried the Yiddish Wikipedia and
got an even higher ranking, but I can't understand anything of
that language, so I'm totally clueless.)
And the ranking doesn't quite capture the fact that the Finnish
Wikipedia contains 59365 <ref> tags and 15108 ISBNs, while Swedish
has 28956 and 10742, respectively, and Norwegian 19078 and 9060.
The main divide seems to be between the "good" examples above 12%
and the laggards below. Swedish and Danish should learn from
Norwegian and Finnish.
My conclusions are not final. The message is that the script
exists, and you are all free to help in digging out interesting
information.
--
Lars Aronsson (lars@aronsson.se)
Aronsson Datateknik - http://aronsson.se
Hi all,
after quite some work into improving the DBpedia information
extraction framework, we have released a new version of the DBpedia
dataset today.
DBpedia is a community effort to extract structured information from
Wikipedia and to make this information available on the Web. DBpedia
allows you to ask sophisticated queries against Wikipedia and to link
other datasets on the Web to Wikipedia data.
The DBpedia dataset describes 1,950,000 "things", including at least
80,000 persons, 70,000 places, 35,000 music albums and 12,000 films. It
contains 657,000 links to images, 1,600,000 links to relevant external
web pages and 440,000 external links into other RDF datasets.
Altogether, the DBpedia dataset consists of around 103 million RDF
triples.
The Dataset has been extracted from the July 2007 Wikipedia dumps of
English, German, French, Spanish, Italian, Portuguese, Polish,
Swedish, Dutch, Japanese, Chinese, Russian, Finnish and Norwegian
versions of Wikipedia. It contains descriptions in all these
languages.
Compared to the last version, we did the following:
1. Improved the Data Quality
We increased the quality of the data by improving the DBpedia
information extraction algorithms. So if you had decided that the old
version of the dataset was too dirty for your application, please look
again; you will be surprised :-)
2. Third Classification Schema Added
We have added a third classification schema to the dataset. Besides
the Wikipedia categorization and the YAGO classification, concepts are
now also classified by associating them with WordNet synsets.
3. Geo-Coordinates
The dataset contains geo-coordinates for geographic locations.
Geo-coordinates are expressed using the W3C Basic Geo Vocabulary. This
enables location-based SPARQL queries; a sketch follows after this list.
4. RDF Links to other Open Datasets
We interlinked DBpedia with further open datasets and ontologies. The
dataset now contains 440,000 external RDF links into the Geonames,
Musicbrainz, WordNet, World Factbook, EuroStat, Book Mashup, DBLP
Bibliography and Project Gutenberg datasets. Altogether, the network
of interlinked data sources around DBpedia currently amounts to around
2 billion RDF triples, which are accessible as Linked Data on the Web.
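As a taste of the location-based queries mentioned under point 3, here
is a sketch of calling the SPARQL endpoint from PHP. The endpoint path
and the format parameter are assumptions (typical for a Virtuoso
setup); geo: is the W3C Basic Geo Vocabulary namespace:

  // Find a few things located roughly around Berlin.
  $query = '
      PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
      SELECT ?place ?lat ?long WHERE {
          ?place geo:lat ?lat ; geo:long ?long .
          FILTER ( ?lat > 52.3 && ?lat < 52.6 &&
                   ?long > 13.2 && ?long < 13.6 )
      } LIMIT 10';
  $url = 'http://dbpedia.org/sparql?format=json&query='
       . urlencode( $query );
  echo file_get_contents( $url );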
The DBpedia dataset is licensed under the terms of the GNU Free
Documentation License. The dataset can be accessed online via a SPARQL
endpoint and
as Linked Data. It can also be downloaded in the form of RDF dumps.
Please refer to the DBpedia webpage for more information about the
dataset and its use cases:
http://dbpedia.org/
Many thanks for their excellent work to:
1. Georgi Kobilarov (Freie Universität Berlin) who redesigned and
improved the extraction framework and implemented many of the
interlinking algorithms.
2. Piet Hensel (Freie Universität Berlin) who improved the infobox
extraction code and wrote the unit test suite.
3. Richard Cyganiak (Freie Universität Berlin) for his advice on
redesigning the architecture of the extraction framework and for
helping to solve many annoying Unicode and URI problems.
4. Zdravko Tashev (OpenLink Software) for his patience to try several
times to import buggy versions of the dataset into Virtuoso.
5. OpenLink Software altogether for providing the server that hosts
the DBpedia SPARQL endpoint.
6. Sören Auer, Jens Lehmann and Jörg Schüppel (Universität Leipzig)
for the original version of the infobox extraction code.
7. Tom Heath and Peter Coetzee (Open University) for the RDFS version
of the YAGO class hierarchy.
8. Fabian M. Suchanek and Gjergji Kasneci (Max-Planck-Institut
Saarbrücken) for allowing us to integrate the YAGO classification.
9. Christian Becker (Freie Universität Berlin) for writing the
geo-coordinates and the homepage extractor.
10. Ivan Herman, Tim Berners-Lee, Rich Knopman and many others for
their bug reports.
Have fun exploring the new dataset :-)
Cheers
Chris
--
Chris Bizer
Freie Universität Berlin
Phone: +49 30 838 54057
Mail: chris@bizer.de
Web: www.bizer.de