Since everybody's so frustrated about this, I'm going to go ahead and
force the issue with the upload server. I'll be disabling uploads and
turning off the upload.wikimedia.org web server for a few hours so we
can get everything moved over and totally copied once and for all.
Alas this'll mean not seeing images for a few hours, but it should
finally be nicer after this. :D
http://meta.wikimedia.org/wiki/November_2005_image_server
-- brion vibber (brion @ pobox.com)
Hi all,
On the Dutch Wikipedia we have a recurring discussion about our new main
page. It loads very slowly at times. Some claim it's because there are too
many images and templates in it, some claim it's because the servers are
slow, and some claim it's due to both factors.
What do experts think? ;-)
http://nl.wikipedia.org/wiki/Hoofdpagina
http://nl.wikipedia.org/wiki/Sjabloon:Inhoud (Template, versions 5 Nov 2005
13:52 and up)
Btw: User:Waerth has even gone on strike until the old main page is put
back into place... (http://nl.wikipedia.org/wiki/Gebruiker:Waerth)
Thanks,
Galwaygirl
Hi, folks. Just a quick note to let you know that there's an extension
for MediaWiki available that allows customized RDF output and in-page
user input of Turtle RDF. Code is here:
http://wikitravel.org/~evan/mw-rdf-0.3.tar.gz
This is in production on Wikitravel and only works with MediaWiki 1.4.x
(at least for the history model; probably some other stuff is broken with
the new database schema, too). More info here:
http://wikitravel.org/en/Wikitravel:RDF
http://meta.wikimedia.org/wiki/RDF
The README file is attached below for people who don't follow URLs much.
I'll add it to the extensions section of MediaWiki CVS RSN, but I've been
using darcs for version control so far and I CBA to merge to CVS yet.
~Evan
________________________________________________________________________
MediaWiki RDF extension
version 0.3
16 November 2005
This is the README file for the RDF extension for MediaWiki
software. The extension is only useful if you've got a MediaWiki
installation; it can only be installed by the administrator of the site.
The extension adds RDF (Resource Description Framework) support to
MediaWiki. It will show RDF data about a page with a new special page,
Special:Rdf. It allows users to add custom RDF statements to a page
between <rdf> ... </rdf> tags. Administrators and programmers can add
new automated RDF models, too.
This is the first version of the extension and it's almost sure to
have bugs. See the BUGS section below for info on how to report
problems.
== License ==
Copyright 2005 Evan Prodromou <evan(a)wikitravel.org>
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
USA
== Installation ==
You have to have MediaWiki 1.4.x installed for this software to work.
Sorry, but that's the version I've got installed, so it's the one this
software works with.
You also have to install RAP, the RDF API for PHP
(www.wiwiss.fu-berlin.de/suhl/bizer/rdfapi/) . I used version 0.92,
plus some custom hacks to make the N3 parser less fragile. You have to
apply a patch to the RAP distribution if you want RDF to work; the patch
is included in this distribution. (Future versions of RAP will have
these enhancements.)
You can copy the file MwRdf.php to the extensions directory of your
MediaWiki installation. Then add these lines to your LocalSettings.php:
define("RDFAPI_INCLUDE_DIR", "/full/path/to/rdfapi-php/api/");
require_once("extensions/MwRdf.php");
== 60-second intro to RDF ==
RDF is a framework for making statements about resources. Statements
are in the form:
subject predicate object
Here, "subject" is a "resource" such as a person, place, idea, Web
page, picture, concept, or whatever. "Predicates" are names of
properties of a resource, like its color, shape, texture, size,
history, or relationships to other "resources". The object is the
value of the property. So "car color red" would be a statement about a
car; "Evan hasBrother Nate" would be a statement about a person.
Of course, it's important to be definite about which resources and
which properties we're discussing. In the Web world, each "resource"
is identified with a URI (usually a URL).
For electronic resources, this is usually pretty easy; the main page
of English-language Wikipedia, for example, has the URI
"http://en.wikipedia.org/wiki/Main_Page". However, for analog subjects
like people or ideas or physical objects, this can be a little
trickier.
There's no general solution, but the typical workaround is to use real
or made-up URIs to "stand in" for offline entities. For example, you
could use the URI for my Wikitravel user page,
"http://wikitravel.org/en/User:Evan", as the URI for me. Or you could
use my email address in URI form, like "mailto:evan@wikitravel.org".
People who need to agree on statements often create 'vocabularies' or
'schemas' that map concepts, objects, and relationships to URIs. By
popularizing such a mapping, we can all agree about what a particular
URI "means".
For example, the Dublin Core Metadata Initiative (DCMI)
(http://www.dublincore.org/) has a schema for very simple metadata,
such as you'd find on a library card. They've defined (among other
things), that the idea of authoring or creating something is
represented by the URL http://purl.org/dc/elements/1.1/creator. So
you could say:
  http://www.fsf.org
  http://purl.org/dc/elements/1.1/creator
  mailto:rms@gnu.org
... means that the creator of the Free Software Foundation is Richard
Stallman.
There are a lot of RDF models out there; you can also create your own
if you want.
RDF statements can be encoded in a number of different ways. By far
the most popular is as XML, sometimes called "RDF/XML". "Turtle" is
another format, which uses plain text rather than XML; and "Ntriples"
is still another.
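To make the encoding differences concrete, here is a small Python sketch (not part of the extension) that renders a statement in the Ntriples style: URIs go in angle brackets, literal values in quotes, and each statement ends with a period.

```python
def ntriples_line(subject, predicate, obj, literal=False):
    """Render one statement as a single N-Triples line."""
    o = '"%s"' % obj if literal else "<%s>" % obj
    return "<%s> <%s> %s ." % (subject, predicate, o)

# The FSF example from above, one statement per line.
print(ntriples_line("http://www.fsf.org",
                    "http://purl.org/dc/elements/1.1/creator",
                    "mailto:rms@gnu.org"))
```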
== Models ==
Any given resource can be described from many different
perspectives. For example, you can describe a man in terms of his
academic career, his job experience, his family members, his body
parts' size and weight, his location in space, his membership in
organizations, his hobbies and interests, etc.
In this extension, we use the term "model" to describe a perspective
on a resource. For example, listing the links to and from a page is
one model; its edit history is another model. You can choose which
models you want to know about when querying the system for RDF
statements about a subject, and only statements in that model are
returned.
This is mostly a concession to performance; it doesn't make sense to
calculate information about the history of a page if the calling
program isn't going to use it.
There are a number of models built into this extension; you can also
add your own, if you know how to code PHP. The models have short
little codenames for easy access, listed below.
Models built in:
* dcmes: Dublin Core Metadata Element Set (DCMES) data. Mostly
information about who edited a page, when, and other simple stuff.
Titles, format, etc. This is a common vocabulary that's very
useful for general-purpose bots.
* cc: Creative Commons metadata. Gives license information; there
are a few tools and search engines that use this data.
* linksto, linksfrom, links: Internal wiki links to and from a page.
"links" is a shortcut for both.
* image: DCMES information about images in a page.
* history: version history of a page; who edited the page and when.
* interwiki: links to different language versions of a page.
* categories: which categories a page is in.
* inpage: a special model for blocks of RDF embedded into the source
code of MediaWiki pages; see "In-page RDF" below for info.
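The real model registry lives in PHP (see $wgRdfModelFunctions under Customization below); purely as an illustration of the name-to-generator dispatch idea, here is a Python sketch with made-up model functions and data:

```python
# Hypothetical model functions; each returns a list of statements.
def categories_model(page):
    return [(page, "inCategory", c) for c in ("Travel", "Wine")]

def history_model(page):
    # Pretend edit history; a real model would hit the database.
    return [(page, "editedBy", "Evan")]

MODEL_FUNCTIONS = {
    "categories": categories_model,
    "history": history_model,
}

def get_statements(page, modelnames):
    """Run only the requested models -- the performance point made above."""
    out = []
    for name in modelnames:
        out.extend(MODEL_FUNCTIONS[name](page))
    return out

print(get_statements("Chile", ["categories"]))
```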
== Special:RDF ==
You can view RDF for a page using the [[Special:Rdf]] feature. It
should be listed on the list of special pages as "Rdf". Enter the
title of the page you want RDF for in the title box, and choose one or
more of the RDF models from the multiselect box. You can also select
which output format you want; XML is probably most useful and can be
viewed in a browser.
The Special:Rdf page can also be called directly, with the following
parameters:
* target: title of the article to get RDF info about. If no target
is provided, the special page shows the input form.
* modelnames: comma-separated list of model names, like
"links,cc,history". Default is a list of standard models,
configurable per-site (see below).
* format: output format; one of 'xml', 'turtle' and 'ntriples'.
Default is XML.
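Assuming the usual MediaWiki index.php?title=... calling convention (an assumption; check your own wiki's URL layout), a direct Special:Rdf request could be composed like this:

```python
from urllib.parse import urlencode

def special_rdf_url(base, target, modelnames, fmt="xml"):
    """Build a direct Special:Rdf request URL (hypothetical base URL)."""
    params = {
        "title": "Special:Rdf",
        "target": target,
        "modelnames": ",".join(modelnames),
        "format": fmt,
    }
    return base + "?" + urlencode(params)

url = special_rdf_url("http://wikitravel.org/wiki/index.php",
                      "Chile", ["links", "cc", "history"], "turtle")
print(url)
```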
== In-page RDF ==
Any user can make additional RDF statements about any resource by
adding an in-page RDF block to the page. The RDF needs to be in Turtle
format (http://www.dajobe.org/2004/01/turtle/), which is extremely
simple. It's a subset of Notation3
(http://www.w3.org/DesignIssues/Notation3.html), for which there is a
good introduction. (http://www.w3.org/2000/10/swap/Primer.html)
RDF blocks are delimited by the tag "<rdf>". They're invisible in
normal output, but they can provide information for RDF-aware tools.
Here's an example:
Mathematics is ''very'' hard.
<rdf>
<> dc:subject "Mathematics"@en .
</rdf>
Here, the rdf block says that the subject of the article is
"Mathematics". Note that <> in Turtle means "this document". Another
example:
Chilean wines are quite delicious.
<rdf>
<> dc:source <http://example.org/chileanwines.html> .
<http://example.org/chileanwines.html>
dc:creator "Bob Smith" .
</rdf>
Here, we've said that the article's source is another Web page on
another server; we can also say that that other Web page's author is
Bob Smith.
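To give a feel for what the extension has to do with such a block, here is a deliberately naive Python sketch that splits simple one-line Turtle statements into triples; the real extension parses blocks with RAP's N3 parser, which also handles multi-line statements like the second example above.

```python
import re

def parse_simple_turtle(block):
    """Very naive: handles only one 'subj pred obj .' statement per line."""
    triples = []
    for line in block.strip().splitlines():
        line = line.strip().rstrip(".").strip()
        if not line:
            continue
        m = re.match(r'(\S+)\s+(\S+)\s+(.+)$', line)
        if m:
            triples.append(m.groups())
    return triples

block = '<> dc:subject "Mathematics"@en .'
print(parse_simple_turtle(block))
```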
In-page RDF is displayed whenever the "inpage" model is requested for
Special:RDF; it's one of the defaults. It's also useful for people
making MediaWiki extensions; you can have users add information in
in-page RDF, and then extract it and read it using the function
MwRdfGetModel(). This lets users add data that isn't for presentation
but perhaps for automated tools to use.
Note also that MediaWiki templates are expanded when in-page RDF is
queried. So if the syntax of Turtle is daunting, you can add templates
that make it easier. For example, we could create a template
Template:Source for showing source documents:
<rdf>
<> dc:source <{{{1}}}> .
<{{{1}}}> dc:creator "{{{2|anonymous}}}" .
</rdf>
We could then make the same statement as above with a template
transclusion:
{{source|http://example.org/chileanwines.html|Bob Smith}}
Note that a number of namespaces are pre-defined for your RDF blocks.
Some basic namespaces are provided by RAP; you can define custom
namespaces with the global variable $wgRdfNamespaces . In addition,
each of the article namespaces is mapped to a namespace prefix in
Turtle, so you can say something like this:
<rdf>
Wikitravel_talk:Spelling dc:subject Wikitravel:Spelling .
:Montreal dc:spatial "Montreal" .
</rdf>
Note that the default prefix (":") is the article namespace.
== Customization ==
There are a few customization variables available, mostly for
programmers.
$wgRdfDefaultModels -- an array of names of the default models to use
when no model name is specified.
$wgRdfNamespaces -- You can add custom namespaces to this associative
array, of the form 'prefix' => 'uri' .
$wgRdfModelFunctions -- an associative array mapping model names to
functions that generate the model. See below for
how to add a new model.
$wgRdfOutputFunctions -- A map of output format to functions that
generate that output. You can add new output
formats by adding to this array.
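In the same illustrative Python spirit (the real tables are the PHP globals above), registering a new output format amounts to one more entry in a dispatch table; "csv" here is a made-up example format:

```python
def output_ntriples(triples):
    """Built-in style output: one 'subj pred obj .' line per statement."""
    return "\n".join("%s %s %s ." % t for t in triples)

def output_csv(triples):
    # A hypothetical extra format, registered alongside the built-ins.
    return "\n".join(",".join(t) for t in triples)

OUTPUT_FUNCTIONS = {"ntriples": output_ntriples}
OUTPUT_FUNCTIONS["csv"] = output_csv  # analogous to extending $wgRdfOutputFunctions

print(OUTPUT_FUNCTIONS["csv"]([("a", "b", "c")]))  # a,b,c
```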
== Extending ==
You can add new RDF models to the framework by creating a model
function and adding it to the $wgRdfModelFunctions array. The function
will get a single MediaWiki Article object as a parameter; it should
return a single RAP Model object (a collection of statements) as a
result. For example,
function CharacterCount($article) {
    # create a new model
    $model = ModelFactory::getDefaultModel();
    # get the article source
    $text = $article->getContent(true);
    # ... and its size
    $size = mb_strlen($text);
    # get the resource for this article
    $ar = MwRdfArticleResource($article);
    # add a statement to the model
    $model->add(new Statement($ar,
                              new Resource("http://example.org/charcount"),
                              new Literal($size)));
    # return the model
    return $model;
}
You can then give the model a name like so:
$wgRdfModelFunctions['charcount'] = 'CharacterCount';
You can add a message to the site describing your model like so:
$wgMessageCache->addMessages(array(
    'rdf-charcount' => 'Count of characters'));
You can also create model-outputting functions if you so desire; they
should accept a RAP model as input and write the corresponding output
to the Web. This is probably only useful if you want a specific RDF
encoding mechanism that's not RDF/XML, Turtle, or Ntriples; for
example, TriG or TriX.
== Future ==
These are some future directions I'd like to see things go:
* Store statements in DB: statements could be stored in the database
when the page is saved and retrieved when needed. This would make it
possible to do extended queries based on information about *all* pages.
* Performance: there wasn't much performance tuning and there are
probably way too many DB hits and reads and such.
* Semantic tuning: I'd like to make sure that the statements in the
standard models are accurate and useful.
== Bugs ==
Send bug reports, patches, and feature requests to Evan Prodromou
<evan(a)wikitravel.org> .
--
Evan Prodromou <evan(a)wikitravel.org>
Wikitravel (http://wikitravel.org/) -- the free, complete, up-to-date
and reliable world-wide travel guide
Hi,
We requested a new 'portal' namespace about one year ago, but this came
to nothing. Some days ago, we discovered that the English and German
Wikipedias are now using this namespace (sic!), so we are requesting to
be able to do the same [1]. The French word for portal is 'portail'.
Regards,
Aoineko
[1]
http://fr.wikipedia.org/wiki/Wikip%C3%A9dia:Le_Bistro/30_ao%C3%BBt_2005#Un_…
Rob Lanphier wrote:
> Subject: Re: [Wikitech-l] Re: <link> elements for interlanguage link
> information
> To: Wikimedia developers <wikitech-l(a)wikimedia.org>
> Message-ID: <1132538575.6605.35.camel(a)localhost.localdomain>
> Content-Type: text/plain
>
> I'm not aware of any <link> syntax, but one way to do it would be for
> MediaWiki to issue an HTTP 301 status (permanent redirect) to the new
> page, rather than returning 200 and giving the content. That probably
> introduces an unacceptably large performance penalty, though (extra
> round trip per request).
>
> The "Content-Location" HTTP header is a potential longshot. I don't
> think Google documents their use/non-use of this header, but it's one of
> those "can't hurt" kind of things.
>
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.14
>
> It appears it can be tacked on using a meta http-equiv tag in the HTML
> head.
>
Reading the specs, it seems to be more for differentiating among
different content retrieved from the same URI/URL, rather than (as in
our case) for stating that different URLs/URIs correspond to the same
content.
Despite this, using this header seems a brilliant idea. Does anybody
have any contact at Google or Yahoo (or another web search engine) to
ask their opinion about this and other possible solutions?
AnyFile
Timwi wrote:
<snip>
> Speaking of which - this reminds me of an idea I had a while ago and I
> was wondering if anyone would be interested to hear this. Currently many
> Wikipedia pages in Google search results are redirects (for example,
> Google for "nonogram" and look at the seventh search result). I was
> wondering if there is a <link> element one could use to say that another
> URL is the "real" page? Then the page returned for a redirect's URL
> would tell search engines the URL of the page it's redirecting to.
Is a list of the names of the pages that redirect to a page inserted
among the keywords of the target page?
A more radical way would be to respond to a request for a page that is
a redirect with an HTTP redirect, when a web crawler is detected
requesting the page.
AnyFile
Timwi <timwi@...> writes:
>
>
> > I do think these are two seperate points:
> >
> > * how to improve the discussion pages on a wiki
> > * whether each author own his/her comment or not.
>
> But the point is that the answer to the second influences whether the
> solution proposed for the first is seen as an "improvement". I feel that
> if the ability to edit other people's comments is taken away from me, I
> can't label it an "improvement".
>
You may not label it an improvement, but there are others who definitely
would.
> > Discussions, OTOH, also involve personal opinions. Danger lies ahead
> > when the opinion can be changed, but is still labeled (or signed, if
> > you wish) with the original author's name.
>
> We already have this "danger", and we've had it since the beginning of
> Phase II, and it has not turned out to be a great problem, so this is
> not an argument.
>
I've had people complain to me about moving their comments around on my
LDAP patch's page on meta. I erased one person's edit because it was a
non-working solution, and had a complaint about that.
Just because you don't think this is a problem doesn't mean it isn't a
problem. I can definitely see lawsuits based upon this. This is
definitely a valid argument.
> > Just imagine that this discussion we have is on a wiki, this is the
> > latest edition (you would need to check the history, aka mailing list
> > archives, to see the full revisions) and it contained:
> >
> > On Tuesday 01 November 2005 17:36, Timwi wrote:
> >>>Any model, if over applied, is harmful.
> >>Agree.
> >>I am strongly in favour of LiquidThreads.
> >
> > See the danger?
>
> A fallacious argument by false dilemma, or by lack of imagination, or
> whatever you wanna call it. You almost provided the answer to this one
> yourself:
>
> > (for the record, the above quote of three lines was
> > written/shortened by me, not Timwi).
>
> And that is what it should say.
>
> COMMENT #328645 by [[User:Timwi]]
>
> Agree. I am strongly in favour of LiquidThreads.
> (This comment was last edited by [[User:Tels]] <date/time>.)
>
> If <date/time> is a minute ago, I better check the diff. If it was an
> hour ago, I can probably assume that your edit was harmless.
>
> Therefore, again, your "danger" is not an argument against the ability
> to edit comments.
>
Why can you assume that the edit was harmless? During Katrina, I had no
internet access for weeks. If someone maliciously edited some of my
comments during that time, would you assume that what was there is
actually what I wrote?
Ignoring catastrophes like a large blackout or a hurricane: say someone
goes on vacation, or simply hasn't checked his discussions recently, or
an article's discussion page hasn't been updated in a long while and
someone stops checking it as often; in these cases, vandalism may go
unnoticed for QUITE a while, and readers may be seeing the vandalised
version the entire time.
In this respect, there is "danger" in others editing comments.
> > If we can improve the discussion page itself, *and* prevent
> > misrepresentation at the same time, well, that would be great :)
>
> It's really easy.
>
> Timwi
>
I think the original idea of LiquidThreads is a good solution for the
problem. I don't believe the implementation would be easy, though ;).
Ryan Lane
Hello,
I spent the last few hours trying to dig into the infrastructural
organization of the Wikimedia servers. My starting points were
[[meta:Wikimedia_servers]] and Ganglia, and my motivation was
Wikipedia's recent slowness.
In contrast to my expectations, the database servers are far from being
under high load. It even seems the pressure is so low that you could
easily live without holbach and webster for days (resp. over a month).
The bottlenecks are the Apaches and Squids (yes, I know that's nothing
new for you).
But as in all the other clusters, the load is very unequally
distributed over the machines. For example, the Yahoo! squids showed
yf1003 9.39, yf1000 7.60, yf1004 1.60, yf1002 1.44, yf1001 0.73
at noon (UTC) today and similar load values (albeit with a different
distribution) at other times.
Or the Apaches in Florida:
16 Apaches with load around 15, 9 between 1.5 and 2, 8 between 1 and
1.5 and 10 less than 1.
Where does this come from, or is this wanted? Wouldn't a more balanced
load be better?
Another point: the Yahoo! Squids do virtually nothing between 18:00 and
0:00 (and machines other than yf1000-yf1004 do virtually nothing around
the clock). How nice it would be to make them help out the other
overloaded machines in Florida and the Netherlands, at least in those
six hours.
And no, I'm not criticizing anyone, nor claiming to know how to do it
better. But the available information looks strange to me - it would be
great to get some explanations.
Speaking of explanations, I have three more simple questions:
1. The Squids at lopar have been idle ever since DNS was moved off
them. What were the problems with them, and will they be back soon?
2. Commons has been very slow since the move from the prior
"overloaded" server to the new one. Any explanation to satisfy a
simple user? And which server is the new one?
3. I read about new machines srv51-70. Where do they come from? I
can't see a recent order for them, nor are they mentioned on
[[meta:Wikimedia_servers]].
Thank you in advance,
Juergen
Hello,
I would like to import data from DokuWiki
(http://wiki.splitbrain.org/wiki:dokuwiki) into MediaWiki. The data is
stored as UTF-8 text files. Is there already a converter for this?
My idea is to import directly into the MySQL database. What I am mainly
looking for is a routine that writes into MediaWiki, ideally via a
function like insertNewArticle. As far as I can see, the PHP script
would then only have to load LocalSettings.php, connect to the
database, log in an admin, and then pass in the text, summary, and
author and write them. I would of course add the routine for reading
out the data myself.
Can anyone help me with this?
Regards,
andreas
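The markup-conversion half of such an import could be sketched like this (a Python sketch, assuming the common DokuWiki conventions for bold, italics, and headings; verify against your own pages). The database/insertNewArticle half would still have to be PHP against the MediaWiki installation.

```python
import re

def dokuwiki_to_mediawiki(text):
    """Convert a few common DokuWiki constructs to MediaWiki syntax.

    Handles only bold, italics, and headings; a real importer needs much
    more (links, lists, code blocks, ...).
    """
    # DokuWiki **bold** -> MediaWiki '''bold'''
    text = re.sub(r"\*\*(.+?)\*\*", r"'''\1'''", text)
    # DokuWiki //italic// -> MediaWiki ''italic''
    # (naive: this would also mangle http:// URLs; protect them first)
    text = re.sub(r"//(.+?)//", r"''\1''", text)
    # DokuWiki ====== H1 ====== ... == H5 == -> MediaWiki = H1 = ... ===== H5 =====
    def heading(m):
        level = 7 - len(m.group(1))
        return "%s %s %s" % ("=" * level, m.group(2).strip(), "=" * level)
    return re.sub(r"^(={2,6})\s*(.+?)\s*\1\s*$", heading, text, flags=re.M)

print(dokuwiki_to_mediawiki("====== Titel ======\n**fett** und //kursiv//"))
```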