Wikitech-l December 2008

wikitech-l@lists.wikimedia.org

89 participants
65 discussions

Re: [Wikitech-l] The never-dying topic: category intersection
by Aerik Sylvan 08 Dec '08

Okay, it's not quite done, and it's still really crude, but it's starting to take shape: I've got some basic intersection functionality running on Wikidweb. I hacked skin.php and added links to the special intersections page. The intersections use a MyISAM fulltext index. I'm not using 'boolean mode' queries, as leaving it off seems to give more interesting results. (Look at any article page at http://wikidweb.com.)

The UI on the special page itself is really ugly and needs lots of work, and once this is all done it will have to be ported to an up-to-date version of MediaWiki (I'm way down rev), but *it's a start*.

Comments, suggestions, criticisms, and offers to help all welcome.

Best Regards,
Aerik

--
http://eventfeed.org - An Initiative Promoting Syndication of Events
http://www.wikidweb.com - the Wiki Directory of the Web
http://tagthis.info - Hosted Tagging for your website!

The never-dying topic: category intersection
by Magnus Manske 08 Dec '08

(Feel free to bash me if we had this variant already; I couldn't find it in the list archives.)

Task: On German Wikipedia (yay atomic categories!), find women who were born in 1901 and died in 1986.

Runtime: Toolserver, <2 sec

Query:

    SELECT * FROM (
        SELECT page_title, count(cl_to) AS cnt
        FROM page, categorylinks
        WHERE page_id = cl_from
          AND cl_to IN ( "Frau" , "Geboren_1901" , "Gestorben_1986" )
        GROUP BY cl_from
    ) AS tbl1
    WHERE tbl1.cnt = 3 ;

Trying to "poison" the query by also looking in all GFDL images ("GFDL-Bild", ~60K entries in category) increases runtime to 3 sec, so not that bad.

I've implemented this as a tool now:
http://toolserver.org/~magnus/category_intersection.php

Queries seem to take a little longer there (2-4 sec) compared to the command line. Articles on en.wikipedia with "1905 births" and "1967 deaths" took <0.4 sec. OTOH, looking for images on Commons in "GFDL" and "Buildings in Berlin" took ~2 min. Might be the giant GFDL category, or the toolserver, or both.

I'll try to fiddle with it some more utilising cat_pages/cat_files.

Magnus
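For readers who want to try the counting approach from the query above without a Toolserver account, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Magnus's tool: it uses an in-memory SQLite database instead of MySQL, a cut-down `page`/`categorylinks` schema with only the columns the query touches, and invented page titles. The function name `intersect_categories` is hypothetical. It uses `HAVING COUNT(DISTINCT ...)` in place of the outer `SELECT`, which expresses the same "matched all N categories" test.

```python
import sqlite3


def intersect_categories(conn, categories):
    """Return titles of pages that belong to every category in `categories`."""
    wanted = sorted(set(categories))
    if not wanted:
        return []
    placeholders = ",".join("?" for _ in wanted)
    sql = f"""
        SELECT page_title
        FROM page JOIN categorylinks ON page_id = cl_from
        WHERE cl_to IN ({placeholders})
        GROUP BY page_id, page_title
        HAVING COUNT(DISTINCT cl_to) = ?   -- page matched all wanted categories
        ORDER BY page_title
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


def demo_connection():
    """Build a tiny in-memory database with invented sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO page VALUES (?, ?)",
        [(1, "Anna_Beispiel"), (2, "Berta_Muster"), (3, "Carl_Probe")],
    )
    conn.executemany(
        "INSERT INTO categorylinks VALUES (?, ?)",
        [
            (1, "Frau"), (1, "Geboren_1901"), (1, "Gestorben_1986"),
            (2, "Frau"), (2, "Geboren_1901"),
            (3, "Geboren_1901"), (3, "Gestorben_1986"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(intersect_categories(conn, ["Frau", "Geboren_1901", "Gestorben_1986"]))
```

The count comparison only works because each (page, category) pair appears at most once; `COUNT(DISTINCT cl_to)` makes that assumption explicit rather than relying on it.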

Does info@wikipedia.org work?
by David Gerard 08 Dec '08

Just a quick check - I did the BBC Radio 4 Today show about the [[:en:Virgin Killer]] blocking. Both presenters asked afterwards how to get their crappy articles fixed - I said "email info(a)wikimedia.org". Bet you they email info(a)wikipedia.org - does that address redirect in the obvious and sensible fashion? Same for info-en@, etc? I realise these addresses are deprecated, but you know people are going to go there first. - d.

Bugzilla Weekly Report
by reporter＠isidore.wikimedia.org 07 Dec '08

MediaWiki Bugzilla Report for December 01, 2008 - December 08, 2008

Status changes this week

Bugs NEW               : 107
Bugs ASSIGNED          : 10
Bugs REOPENED          : 12
Bugs RESOLVED          : 52

Total bugs still open: 3197

Resolutions for the week:

Bugs marked FIXED      : 21
Bugs marked REMIND     : 0
Bugs marked INVALID    : 4
Bugs marked DUPLICATE  : 11
Bugs marked WONTFIX    : 7
Bugs marked WORKSFORME : 9
Bugs marked LATER      : 0
Bugs marked MOVED      : 0

Specific Product/Component Resolutions & User Metrics

New Bugs Per Component

Uniwiki                : 10
Site requests          : 6
Special pages          : 4
API                    : 4
General/Unknown        : 3

New Bugs Per Product

MediaWiki              : 25
Wikimedia              : 10
MediaWiki extensions   : 15

Top 5 Bug Resolvers

roan.kattouw [AT] home.nl          : 7
markus [AT] semantic-mediawiki.org : 7
innocentkiller [AT] gmail.com      : 6
siebrand [AT] wikipedia.be         : 5
JSchulz_4587 [AT] msn.com          : 5

Re: [Wikitech-l] The never-dying topic: category intersection
by Aerik Sylvan 06 Dec '08

Magnus - I checked out your tool, but it looks like you're using a query against the categorylinks table? Have you played with setting up a new table for categories and fulltext indexing it? Use group_concat to get all of a page's categories into one field, then create a fulltext index on that field. You get much better performance than using the categorylinks table (kind of weird, eh?).

Are you pinging a live database, or a copy made from a dump? (Please excuse my ignorance if this is common knowledge.)

I'm working on dummying up a UI using the same approach (fulltext index of categories) on wikidweb and will write back when I've got something worth looking at...

Best Regards,
Aerik

--
http://eventfeed.org - An Initiative Promoting Syndication of Events
http://www.wikidweb.com - the Wiki Directory of the Web
http://tagthis.info - Hosted Tagging for your website!
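The "one field per page" idea in this message can be sketched as follows. This is only an approximation under loud assumptions: it uses SQLite's `group_concat` rather than MySQL's, and a plain Python inverted index stands in for the MyISAM fulltext index, so it says nothing about real-world performance. The function names are hypothetical, and the sketch assumes category names contain no spaces (MediaWiki stores them with underscores).

```python
import sqlite3
from collections import defaultdict


def build_category_docs(conn):
    """Collapse categorylinks into one space-separated string per page."""
    rows = conn.execute(
        "SELECT cl_from, group_concat(cl_to, ' ') "
        "FROM categorylinks GROUP BY cl_from"
    )
    return {page_id: cats for page_id, cats in rows}


def build_inverted_index(docs):
    """Map each category token to the set of page ids whose field contains it.

    This is a stand-in for a fulltext index, not a model of MySQL's behaviour.
    """
    index = defaultdict(set)
    for page_id, cats in docs.items():
        for token in cats.split():
            index[token].add(page_id)
    return index


def search_all(index, categories):
    """Return sorted page ids whose category field contains every given name."""
    sets = [index.get(name, set()) for name in categories]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
    conn.executemany(
        "INSERT INTO categorylinks VALUES (?, ?)",
        [(1, "Frau"), (1, "Geboren_1901"), (2, "Frau"), (3, "Geboren_1901")],
    )
    index = build_inverted_index(build_category_docs(conn))
    print(search_all(index, ["Frau", "Geboren_1901"]))
```

The trade-off is the usual one for denormalised search tables: lookups touch one row per page instead of one row per category link, but the concatenated field has to be rebuilt whenever a page's categories change.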

Donating/offering domain wikipedia.ro for use
by Gutza 06 Dec '08

Dear all,

I am an administrator, bureaucrat and checkuser on the Romanian Wikipedia. I have contacted the current owner of domain wikipedia.ro asking him to consider donating the domain to Wikimedia Foundation, Inc. (there is no local chapter in Romania). He's considering the option, but in the meantime he has offered to allow us use of the domain -- in other words, he asked for the appropriate nameserver IP addresses he should associate with that domain (obviously, that should lead to the content currently served at ro.wikipedia.org). Could that be arranged? If so, please provide the respective IP addresses so I can pass them on to him.

On a different note, should he agree to donate the domain altogether to WMF, can WMF take ownership, or is that against any policy?

Best regards,
Gutza

GitTorrent (pie-in-the-sky post)
by David Gerard 05 Dec '08

http://advogato.org/article/994.html Peer-to-peer git repositories. Imagine a MediaWiki with the data stored in git, and updates distributed peer-to-peer. "Imagine if Wikipedia could be mirrored locally, run on a local mirror, where content was pushed and pulled, GPG-Digitally-signed; content shared via peer-to-peer instead of overloading the Wikipedia servers." This would certainly go some way to solving the "a good dump is all but impossible" problem ... (so, anyone hacked up a git backend for MediaWiki revisions rather than MySQL? :-) ) - d.

DISPLAYTITLE patch waiting for review
by Remember the dot 05 Dec '08

Greetings developers,

I finally got around to finishing my patch for bug 12998:

https://bugzilla.wikimedia.org/show_bug.cgi?id=12998

This patch should completely eliminate the need for JavaScript hacks that alter the page title. If a developer could take a look, that would be great!

--
Remember the dot
http://en.wikipedia.org/wiki/User:Remember_the_dot

Re: [Wikitech-l] The never-dying topic: category intersection
by Aerik Sylvan 05 Dec '08

On Wed, 03 Dec 2008 17:05:39 +0100, Roan Kattouw <roan.kattouw(a)home.nl> wrote:

> Daniel Schwen schreef:
> > So how does this take care of deep indexing non-atomic categories?
>
> Err.. what? Please explain what you mean by that.

I think he means finding stuff that's already buried in sub-sub categories, when you query on a parent category. Like querying for an intersection of [[Category:Deceased people]] and [[Category:Presidents of the United States]] won't find the guys listed in [[Category:Deceased Presidents of the United States]] without recategorizing those entries.

> > =>How will this extension be even remotely useful for let's say commons?
>
> Without addressing Commons in particular, having an efficient way to get
> pages in the intersection of multiple categories would allow wikis to
> delete a category such as [[Category:Deceased Presidents of the United
> States]] and replace it by, say, [[Intersection:Deceased Presidents of
> the United States]], which would list all articles in
> [[Category:Deceased people]] and [[Category:Presidents of the United
> States]]. My extension alone doesn't make that possible, but it makes
> implementing such a feature considerably easier.
>
> > This discussion is far from over. The basic problems are _not_ solved.
>
> Would you care to elaborate on what those unsolved problems are?

I thought we were 90% of the way there when you wrote this extension, having reasonably solved the efficiency (speed) issues with the fulltext and lucene based approaches, and the view of the atomic categories problem was that it would be solved by people, not tech. In other words, I thought we all assumed that once people were empowered with category intersections, they'd make categories that make use of them. If not, then that's a problem to solve, but not an obstacle to implementing category intersection. My input would be to implement intersections, see what happens, and look at other functionality for intersections v.2.

> > I'm sure this thread will die out soon.
> > Half of the participants will again be soothed by the promise of some easy
> > solution just barely beyond the horizon, while the half that realizes that
> > said solution _cannot possibly work_ without a radical reform of the category
> > system will again be too annoyed (I'm getting there already) to continue
> > discussing.
>
> It would be nice if you didn't judge people as naive right away.

Seconded. But it sounds like maybe those of us who'd like to see this happen should discuss a UI (or several) for it.

I was thinking the most intuitive interface was a sort of "browse" type function, where for any given group of categories (could just be one category), you have two result sets: related categories (other categories of pages in the starting category), and articles at the intersection of the group. The articles are what we generally think of, but the related categories give us an intuitive way to navigate through category intersections.

The articles in the group of categories are the problem we've already solved (mostly): they are the result from the fulltext or lucene search. The related categories problem is harder, I think, as the most obvious way to get to that is to get all the categories belonging to those articles, and then collapse them and rank them. For large result sets, this can get time consuming again, and we would not want to (I think) build the related categories only with the first page of results.

OTOH... if we took the first 100 results of a given category intersection, then queried the categorylinks table for all the categories belonging to those articles, and collapsed that... that would be a pretty good estimate of related categories. It wouldn't give all of them, but it would be a nice set of sample data. What do you think?

Onto a soap box for a minute: the fact that this topic won't die, in 4 years, to me means that it's a really needed feature. Once implemented it will give people a great tool to more efficiently find information. Looking at things that are happening around the web with tags, Google adopting ideas from Wikia search, semantic web stuff, I'm thinking that we are really at the beginning of a movement to add structured metadata to information on the net. In concert with all the wonderful algorithms that try to guess what a given web page is about, we are doing things to explicitly state what a web page is about, providing users a much better chance at being able to find it. Developing category intersections for Wikipedia would be a milestone in that movement.

Aerik

--
http://eventfeed.org - An Initiative Promoting Syndication of Events
http://www.wikidweb.com - the Wiki Directory of the Web
http://tagthis.info - Hosted Tagging for your website!
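The "first 100 results" estimate of related categories proposed in this message could look roughly like the sketch below. It is a toy version under assumptions: page-to-category data is held in a plain dict instead of the categorylinks table, the intersection itself is computed by brute force rather than by a fulltext or Lucene search, and the function name `related_categories` and the ranking rule (count, then name) are hypothetical choices, not something the thread settled on.

```python
from collections import Counter


def related_categories(page_categories, selected, sample_size=100):
    """Estimate related categories from a sample of the intersection.

    page_categories: dict mapping page title -> iterable of category names.
    selected: the categories being intersected.
    Returns (matching_pages, ranked), where ranked is a list of
    (category, count) pairs taken from the first `sample_size` matches only.
    """
    selected = set(selected)
    # Pages that carry every selected category (the intersection itself).
    matches = sorted(
        title for title, cats in page_categories.items()
        if selected <= set(cats)
    )
    # Only look at a bounded sample so large result sets stay cheap.
    sample = matches[:sample_size]
    counts = Counter(
        cat
        for title in sample
        for cat in set(page_categories[title]) - selected
    )
    # Rank by frequency, breaking ties alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return matches, ranked


if __name__ == "__main__":
    pages = {
        "A": ["X", "Y", "Z"],
        "B": ["X", "Y", "Z", "W"],
        "C": ["X", "W"],
        "D": ["X", "Y"],
    }
    print(related_categories(pages, ["X", "Y"]))
```

Because only the sample is inspected, categories that occur solely on later pages of results are missed, which is the limitation the message already acknowledges.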

Re: [Wikitech-l] [WikiEN-l] Suggestion on how referencing system could be improved
by Gregory Maxwell 05 Dec '08

On Wed, Dec 3, 2008 at 8:46 PM, Thomas Larsen <larsen.thomas.h(a)gmail.com> wrote:

> Hi all,
>
> The current <ref>...</ref>...<references/> system produces nice
> references, but it is flawed--all the text contained in a given
> reference appears in the text that the reference is linked from. For
> example:
[snip]
> One way I could conceive of correcting the problem is to have a
> reference tag that provides only a _link_ to the note via a label and
> another type of reference tag that actually _defines_ and _displays_
> the note. For example:
[snip]

That's a lot like what we used to do; the problem is that references were *constantly* orphaned, scrambled, etc. The references were often nonsense.

My view is that the current behavior is bad mostly because it makes it very hard to read the text in edit mode: you get this wall of meaningless markup.

Instead I propose: Have javascript mediate the edit box so that inline references are converted to little red [R] text. Moving your cursor into the [R] area by clicking or arrow-keying causes it to expand to display the full reference.

You can add references by simply typing them like normal and then they'll collapse when you navigate away, or you can press some "insert reference" button that pops up a dialog that asks for the relevant information, which then types the completed reference for you.

This type of hiding could also be applied to other common inline markup and dramatically improve usability.

This type of edit box mediation has been done by other edit-helper userscripts, so it's certainly possible.

Thoughts?
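The collapse-and-expand step at the heart of this proposal can be shown without any browser code. The sketch below is a Python illustration of the text transformation only, not the JavaScript edit-box mediation the message describes; the function names, the regular expression, and the sample citations are all assumptions made for the example. It deliberately leaves self-closing `<ref name="..." />` tags and `<references/>` untouched, does not handle ref attributes that contain a slash, and refuses to expand text that already contained a literal `[R]`.

```python
import re

# Matches a paired <ref ...>...</ref>; self-closing refs and <references/>
# do not match because of the \b and the [^>/] attribute class.
REF_PATTERN = re.compile(r"<ref\b[^>/]*>.*?</ref>", re.DOTALL | re.IGNORECASE)


def collapse_refs(wikitext, marker="[R]"):
    """Replace each inline reference with `marker`; return (text, refs)."""
    refs = []

    def stash(match):
        refs.append(match.group(0))
        return marker

    return REF_PATTERN.sub(stash, wikitext), refs


def expand_refs(collapsed, refs, marker="[R]"):
    """Put the stashed references back in their original positions."""
    parts = collapsed.split(marker)
    if len(parts) != len(refs) + 1:
        # The text gained or lost markers (or contained a literal marker).
        raise ValueError("marker count does not match stored references")
    out = [parts[0]]
    for ref, tail in zip(refs, parts[1:]):
        out.append(ref)
        out.append(tail)
    return "".join(out)


if __name__ == "__main__":
    text = (
        "Water is wet.<ref>Example source A, p. 4.</ref> "
        'Fire is hot.<ref name="b">Example source B.</ref><references/>'
    )
    collapsed, refs = collapse_refs(text)
    print(collapsed)
    print(expand_refs(collapsed, refs) == text)
```

A real edit-box helper would also need to track markers that the editor deletes or reorders, which is exactly the orphaning problem the message warns about; this sketch simply raises an error in that case.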
