Wikitech-l February 2008

wikitech-l@lists.wikimedia.org

100 participants
104 discussions

Re: [Wikitech-l] Wikitech-l Digest, Vol 55, Issue 40
by Aerik Sylvan 25 Feb '08

On Sat, Feb 23, 2008, Tim Starling <tstarling(a)wikimedia.org> wrote:
>
> But it seems to me, if you look at data storage software already in use,
> Lucene is much better suited for computing intersections than MySQL.
>

Tim, aren't you kind of the point guy for the Lucene search? Would you be up for setting up a categories index? I don't know how the update works (I think, from what I've read, that it does a big index regeneration on some kind of schedule, but I really don't know). I think it could be implemented either as a separate index or as a new field on the current index.

I'd be happy to help, but I'm totally unfamiliar with the code, and don't really want to set up Java on my server for testing... I've created Lucene indexes on the categories table before, but not in any way that even approaches a production-type environment. Maybe that still leaves some opportunity to help, though.

Best Regards,
Aerik

4 3

Re: [Wikitech-l] Hidden Categories and Category Intersection
by Simetrical 24 Feb '08

On Sun, Feb 24, 2008 at 3:48 AM, Samuel Wantman <sam(a)wantman.net> wrote:
> Fabulous! Can we do it?

Why not? You don't have to ask here to use bots. Use whatever bots you like, and if one becomes a problem (which is rare) someone will tell you.

> If it literally can happen "right
> now", I'd actually prefer it be left turned off until us category wonks can
> generate a plan.

I get the feeling we're talking about a different feature here. The feature to hide categories is already enabled and is not going to be turned off (although it needs to be improved somewhat). Everything else is user-level stuff that isn't really relevant to this list, as far as I can tell.

1 0

Wikimania 2008 - Call for participation
by Mark (Markie) 24 Feb '08

Hi,

Sorry for the spam-like email, but this is just to let you all know about the Wikimania 2008 Call for Participation. I have included it below, and the page can also be found on the Wikimania wiki at http://wikimania2008.wikimedia.org/wiki/Call_for_Participation. Please do forward this to any local project mailing lists or anyone else who may be interested. Translations are also available on the wiki.

Many thanks,
Mark (User:Markie)

== Call for Participation ==

Wikimania is an annual global event devoted to Wikimedia projects around the globe (including Wikipedia, Wikibooks, Wikisource, Wikinews, Wiktionary, Wikiversity, Wikiquote, Wikispecies, and Wikimedia Commons) and for its editors and users to gather, meet each other, exchange ideas, and report on research and projects. It is a community event, which is also open to the public and to researchers.

This year's conference will be held from '''July 17-19, 2008''' in Alexandria, Egypt at the new Library of Alexandria (Bibliotheca Alexandrina). For more information, please visit the Wikimania 2008 Home page at http://wikimania2008.wikimedia.org

We are accepting submissions for presentations, workshops, panels, posters, open spaces, and artistic artifacts. Please carefully follow the submission guidelines below. Submissions can be sent via the following link:
:https://wikimedia.pentabarf.org/submission/wikimania2008

=== Important dates ===

* 1 February – 16 March : Submission
* 17 March – 30 April : Review, feedback and notification of acceptance
* 17 – 19 July 2008 : '''Wikimania'''

=== Conference Tracks ===

Submissions should address one or more of the following themes:

; Wikimedia Communities : Interesting projects and particularities within the communities; policy creation within individual projects; conflict resolution and community dynamics; reputation and identity; multi-lingualism, languages and cultures; social studies. We explicitly invite you to discuss your local Wikimedia project's community.
; Free Knowledge : Open access to information; ways to gather and distribute free knowledge, usage of the Wikimedia projects in education, journalism, research; ways to improve content quality and usability; copyright laws and other legal areas that interfere with Wikimedia projects. Free Content in the Middle-East/Africa.
; Technical infrastructure : Issues related to MediaWiki development and extensions; Wikimedia's technical infrastructure; new ideas for development (including case studies from other wikis or similar projects).
; Scientific track : Academic papers about massively collaborative work, open and free content creation, community dynamics, the social or economic aspects of the Wikimedia projects, and other topics related to Wikimedia projects.

Papers submitted to the scientific track will be peer reviewed by a reviewing committee regarding their novelty, rigour, and estimated impact, and accepted or rejected based on these reviews. The papers will be published in proceedings afterwards, and depending on the number and the quality of the submissions, a journal special issue may be pursued. Scientific track papers must be in English, and must not exceed 7,500 words (or 15 pages LNCS).

Your topic must be related either to the Wikimedia projects and their communities, or to the creation of free content in general.

=== Types of Submissions ===

We are seeking submissions for

* presentations (10–30 minute talks with discussion afterwards)
* workshops/open discussions (60–120 minute session with a discussion leader and more involvement of the audience)
* panels (group of 2-5 speakers to discuss a specific subject)
* posters (printed presentations or visual displays that can stand on their own)
* artistic artifacts (plays, competitions, comedy, visualizations, or other representations of some aspect of the projects)

In addition there will be the possibility to give [[lightning talks]] (5 minute short presentations). These will be organized on the Wikimania 2008 wiki without need to submit via the submission system.

=== Submission Guidelines ===

Wikimania is organized by volunteers, so please help us minimize wasted effort by submitting via the submission system and following these guidelines. All submissions MUST explicitly include the following:

# an English "Event title"
# a short English "Abstract" of your event in 50 to 100 words. The abstract will be used for the public schedule.
# the "Track" your submission fits in best (Wikimedia Communities, Free Knowledge, Technical infrastructure, or Scientific)
# the "Event type" (presentation, workshop, panel, poster, artistic...)
# information about the speaker (full name, email, a short description of at least 2 sentences...)
# for submissions to the scientific track: set "Submission of paper for proceedings" to "yes" and upload a paper instead of the "Description" below as "Attachment". Papers must be in English, and must not exceed 7,500 words.

In addition you can add some more information, like a subtitle of the event, an image (will be resized to 128x128px) and private "Submission notes" for reviewers and conference organisation. In particular you should give:

* a more detailed "Description" of your event in English or Arabic. The description is essential for review: please give an overview of the areas to be covered or taught. The better you describe your submission, the more likely it will get accepted. State clearly the relevance to the Wikimedia projects and whether the submission concerns a specific wiki project. You can also include links. The description will later be used for the public schedule but you can edit it before.
* special requirements (such as equipment for a workshop or panel) if needed
* the language used for presentation
* whether you want to submit a paper for proceedings
* whether you want to submit presentation slides
* whether the presentation is intended to be a specific length
* the target audience you are going to reach and what previous knowledge is needed
* images or sketches of the poster or artistic artifact if available
* for panel submissions a suggested moderator and short biographies of each suggested panelist

In the "Submission notes" you should tell us whether you will attend Wikimania (a) surely, (b) probably, (c) only if your submission is accepted, or (d) only if we provide travel and/or accommodation. You can also add yourself to the public list of attendees at the Wikimania 2008 wiki: http://wikimania2008.wikimedia.org/wiki/Attendees

Please note that all submissions must be dual licensed under the GNU Free Documentation License version 1.2 or later ''and'' the Creative Commons Attribution License! By submitting for Wikimania 2008 you agree to this condition.

For more information see the submission guidelines at http://wikimania2008.wikimedia.org/wiki/Submission

=== Submissions ===

Once you are sure you have included all of the required information, please send your submission before the respective deadline through our '''submission system''':
:https://wikimedia.pentabarf.org/submission

== See also ==

* About the venue: http://wikimania2008.wikimedia.org/wiki/Venue
* Brainstorming page for program ideas: http://wikimania2008.wikimedia.org/wiki/Program_ideas
* Editable list of attendees: http://wikimania2008.wikimedia.org/wiki/Attendees

1 0

Hidden Categories and Category Intersection
by Samuel Wantman 24 Feb '08

Hidden categories should appear in the category namespace, but not elsewhere. As for being "Hidden" or "Admin", I can envision uses for both. Admin categories could have a separate collapsible listing, while hidden categories might have some other uses.

Since we've also been discussing the problems of implementing "Category Intersection", an interim solution could be repopulating parent categories and "hiding" intersection categories. Fully populated parent categories are the norm in some projects like German Wikipedia and they also appear sometimes in English Wikipedia (e.g. Category:Operas). I have a proposal posted currently about fully populating "Index" categories at en:Wikipedia talk:Categorization, and it would be much improved if the intersection categories could be hidden. The primary reason we have been deleting intersection categories is that they clutter articles. If they didn't clutter articles, they wouldn't be a problem.

Perhaps the non-hidden categories could be expanded with a [+] the same way subcategories are expanded. For example, if someone is listed under "Methodist", clicking on the plus might add the hidden categories "American methodist" or "Methodist presidents". This would require searching to see if any of the hidden categories are descendants of the clicked-on category. This pseudo category intersection system would also be an incentive to get ready for a real implementation. For Category Intersection to work, hundreds of categories will need to be repopulated.

Along those lines, I'm wondering about yet another interim step toward full category intersection. A while back, several of us editors on English Wikipedia worked on a design for an interface for implementing Category Intersection (it is at en:WP:CI). We envision check boxes next to each category listing in an article, and then a button that queries the intersection. If making the query were to create a hidden category and automatically categorize all the articles that result from the query, the next time the request is made it could just display the results, just like any other category. There might be a timer that resets (every week?) that would force another query to update the category. This way each intersection query would happen fairly infrequently -- as infrequently as need be to keep from overloading the servers.

There would need to be a naming convention for the automatically generated categories, perhaps using a double colon -- so the intersection of Category:Mozart and Category:Operas would generate Category:Mozart::Operas.

I don't think we'd want these auto-generated categories to be orphan categories. The category could be automatically put in a maintenance category, or better yet, a child category of each parent could be created to hold all automatically created categories. If the category is called "Operas" this holding category could be called "Intersections with Operas" or "Operas and..." If the query is worth keeping it could be recategorized by an editor (e.g. Category:Operas by composer). It would probably be useful to be able to see how often the query was requested. If intersection categories get renamed, a category redirect should be able to get the user (and future queries) to the correct place.

If any of these intersection queries cause problems, an administrator could protect the category page. The next time the query is requested, the blocked page would keep the query from being run. The user would see the reason for the blocked query posted on the category page. This would prevent two or more huge categories from being intersected (e.g. Category:Living People intersected with Category:Films). If the CPU time was analyzed for each query automatically, the blocks might be able to happen automatically.

-- Samuel Wantman
en:User:Sam
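
The caching scheme proposed in this message can be sketched roughly as follows. This is only an illustration in Python, not an existing MediaWiki feature: the class `IntersectionCache`, the function `intersection_name`, and the in-memory dictionaries are all hypothetical stand-ins for category pages and database queries. Sorting the two names so that both query orders share one generated category is an added assumption; the message only gives Category:Mozart::Operas as an example.

```python
import time

PREFIX = "Category:"
WEEK_SECONDS = 7 * 24 * 3600  # the "every week?" timer from the proposal


def _bare(category):
    """Strip the namespace prefix, if present."""
    return category[len(PREFIX):] if category.startswith(PREFIX) else category


def intersection_name(a, b):
    """Build the double-colon name, e.g. Category:Mozart::Operas.

    Assumption: names are sorted so A/B and B/A map to the same category.
    """
    return PREFIX + "::".join(sorted([_bare(a), _bare(b)]))


class IntersectionCache:
    def __init__(self, members, ttl=WEEK_SECONDS, clock=time.time):
        self.members = members    # category name -> set of article titles
        self.ttl = ttl            # seconds before a stored result is refreshed
        self.clock = clock        # injectable so the timer can be tested
        self.cache = {}           # generated name -> (computed_at, titles)
        self.protected = {}       # generated name -> reason shown to the user
        self.queries_run = 0      # how many real intersections were computed

    def protect(self, a, b, reason):
        """An administrator blocks an expensive intersection."""
        self.protected[intersection_name(a, b)] = reason

    def get(self, a, b):
        name = intersection_name(a, b)
        if name in self.protected:
            # The blocked page keeps the query from being run.
            raise PermissionError(self.protected[name])
        now = self.clock()
        hit = self.cache.get(name)
        if hit is not None and now - hit[0] < self.ttl:
            return name, hit[1]   # served like any other category
        titles = self.members.get(a, set()) & self.members.get(b, set())
        self.queries_run += 1
        self.cache[name] = (now, titles)
        return name, titles
```

With this shape, repeated requests inside the timer window never touch the underlying category data, which is the load-limiting property the proposal is after.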

3 2

Extracting only the text from a Wikipedia page
by Ragib Hasan 24 Feb '08

Hi,

I need to extract only the text from a Wikipedia page. I.e., I need to remove all wiki markup, section headings etc., to extract only the text a reader will read.

For example, for the text:

'''Paris''' ([[Help:IPA|pronounced]] /paʁi/ in French; /ˈpaɹɪs/ in English) is the [[communes of France|capital city]] of [[France]]. It is situated on the [[Seine|River Seine]], in northern France, at the heart of the [[Île-de-France (region)|Île-de-France]] [[Regions of France|region]] (aka "Paris Region"; in French: ''Région Parisienne'' or ''RP''). The City of Paris has an estimated population of 2,167,994 within its administrative limits (January 2006)."

I need to get the following after extraction:

Paris (pronounced /paʁi/ in French; /ˈpaɹɪs/ in English) is the capital city of France. It is situated on the River Seine, in northern France, at the heart of the Île-de-France region (aka "Paris Region"; in French: ''Région Parisienne'' or ''RP''). The City of Paris has an estimated population of 2,167,994 within its administrative limits (January 2006)."

Using the Pywikipediabot framework, I can get the raw text, but not the text without markup. Since I need to do some textual analysis on the article contents, I need to get rid of all the extra markup, citation tags and other templates.

So, what is the best/easiest way to do this?

Thanks in advance.

Ragib

--
Ragib Hasan
PhD Student
Dept of Computer Science
University of Illinois at Urbana-Champaign
201 N Goodwin Avenue
Urbana IL 61801
Website: http://www.ragibhasan.com
http://netfiles.uiuc.edu/rhasan/www
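
One rough way to approach this is a handful of regular expressions applied to the raw wikitext. The sketch below is an assumption-laden illustration in Python, not part of Pywikipediabot: the function name `strip_wiki_markup` is made up, and the patterns only cover the simple cases named in the question (bold/italic quotes, `[[target|label]]` links, section headings, `{{templates}}` and `<ref>` tags). Image links with several pipes, tables and parser functions are not handled, and unlike the sample output above it also drops the italic quotes.

```python
import re


def strip_wiki_markup(text):
    """Reduce simple wikitext to plain reader-visible text (best effort)."""
    # Citation tags: self-closing <ref ... /> first, then <ref>...</ref> pairs.
    text = re.sub(r"<ref[^>/]*/>", "", text)
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)

    # Templates: remove innermost {{...}} repeatedly so nesting unwinds.
    previous = None
    while previous != text:
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)

    # Internal links: keep the label of [[target|label]], or the target itself.
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)

    # Section headings such as "== History ==" are removed entirely.
    text = re.sub(r"^=+[ \t]*.*?[ \t]*=+[ \t]*$", "", text, flags=re.MULTILINE)

    # Bold and italic quote runs ('' and ''').
    text = re.sub(r"'{2,}", "", text)

    return text.strip()
```

For the opening of the Paris example, `strip_wiki_markup("'''Paris''' is the [[communes of France|capital city]] of [[France]].")` returns `Paris is the capital city of France.` A real parser is the more robust route when templates contribute visible text.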

3 2

Re: [Wikitech-l] SVN: [31215] trunk/phase3
by Ashar Voultoiz 23 Feb '08

brion(a)svn.wikimedia.org wrote:
> Revision: 31215
> Author: brion
> Date: 2008-02-23 00:29:36 +0000 (Sat, 23 Feb 2008)
>
> Log Message:
> -----------
> bump trunk to 1.13

Why not 1.14, just like some buildings skip the 13th floor? :)

http://en.wikipedia.org/wiki/Thirteenth_floor

--
Ashar Voultoiz

2 1

Re: [Wikitech-l] Category tables and Category Intersection
by Aerik Sylvan 23 Feb '08

Simetrical wrote:
>
> We don't have to move off MySQL, we just have to use a different
> system for this one feature. That's perfectly plausible; we use
> Lucene for search.
>

Ah, something I actually know something about. This is the third or fourth time, to my knowledge, that we've discussed category intersection in depth. Last year (I think it was last year) I did a bunch of pretty extensive testing, including running MySQL queries against the categories table using various methods (joins, subselects, you name it), and the consensus was that this was way too slow (queries against large categories were awful; Living People was a test case).

So, I also loaded the categories into the cur table (I'm using an old schema) and created a field holding all of a page's categories, with underscores for spaces in the category names (as they appear in the URL). This made MySQL's fulltext index see each whole category as one word. This performed *much* faster, and you could use boolean queries to get fancy.

I also created a Lucene index which I queried with zend_search_lucene. This actually performed pretty comparably to the MySQL fulltext index.

It's all in the archives somewhere. I think either of those solutions would probably be okay, but if it's wildly popular the load might be a bit much. I didn't get (that I recall) any really conclusive opinions from the group or the core developers.

But, based on all that, here's my suggestion: create a new Lucene index of categories using all the existing tools, and do boolean queries against that. I think it's the path of least resistance, and the performance should be quite acceptable (pretty much by definition).

On a related topic, has anybody on the list messed around with CLucene? I'm still playing with it off and on... (I'm a novice at C/C++.) It seems like a good choice for a high-performance web-based search (it doesn't have the overhead of being Java)...

Best Regards,
Aerik
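
The one-word-per-category trick described above can be illustrated with a toy inverted index. This is a minimal Python sketch, not the MySQL fulltext or Lucene setup that was actually tested: `category_field`, `CategoryIndex` and the sample data are hypothetical, and a plain whitespace split stands in for the fulltext tokenizer.

```python
from collections import defaultdict


def category_field(categories):
    """Join a page's categories into one text field, one token per category.

    Spaces inside a category name become underscores, so a whitespace
    tokenizer sees "Living people" as the single word "Living_people".
    """
    return " ".join(name.replace(" ", "_") for name in categories)


class CategoryIndex:
    """Toy inverted index: category token -> set of page titles."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, title, categories):
        for token in category_field(categories).split():
            self.postings[token].add(title)

    def query(self, required, excluded=()):
        """Boolean query: pages in ALL required and NONE of the excluded."""
        tokens = [name.replace(" ", "_") for name in required]
        if not tokens:
            return set()
        # Start from the smallest posting list to keep intersections cheap.
        lists = sorted((self.postings.get(t, set()) for t in tokens), key=len)
        result = set(lists[0])
        for postings in lists[1:]:
            result &= postings
        for name in excluded:
            result -= self.postings.get(name.replace(" ", "_"), set())
        return result
```

An intersection such as "Living people" AND "Film directors" then reduces to intersecting two posting lists rather than joining rows of the categories table.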

1 0

Re: [Wikitech-l] [MediaWiki-CVS] SVN: [31196] trunk/phase3
by Simetrical 23 Feb '08

On Fri, Feb 22, 2008 at 1:49 PM, <vasilievvv(a)svn.wikimedia.org> wrote:
> Remove "(not written yet)" text from title (introduced in r31140). It broke many bots which used title attribute of link for getting its target.

Any bot that uses title for that purpose seems to me to be seriously broken. Isn't that exactly what href is for? Normalizing href is not hard.
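
As a rough illustration of what normalizing href could look like on the bot side, here is a small Python sketch. The function name `title_from_href` is hypothetical, and the default paths `/wiki/` and `/w/index.php` are assumptions matching the common Wikimedia URL layout; other wikis may configure these differently.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def title_from_href(href, article_path="/wiki/", script_path="/w/index.php"):
    """Recover a page title from a wiki link's href, or None if unrecognized.

    Handles the two usual shapes: pretty URLs like /wiki/Foo_bar and
    script URLs like /w/index.php?title=Foo_bar&action=edit.
    """
    parts = urlsplit(href)
    if parts.path == script_path:
        # parse_qs already percent-decodes the query values.
        titles = parse_qs(parts.query).get("title")
        if not titles:
            return None
        raw = titles[0]
    elif parts.path.startswith(article_path):
        raw = unquote(parts.path[len(article_path):])
    else:
        return None
    # Underscores in URLs stand for spaces in titles.
    return raw.replace("_", " ") or None
```

For example, `title_from_href("/wiki/Thirteenth_floor")` gives `Thirteenth floor`, and a link to a missing page such as `/w/index.php?title=Foo_bar&action=edit` gives `Foo bar`, regardless of what the title attribute says.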

2 2

square brackets within href and Parser behaviour
by Christoph Hanke 22 Feb '08

Hi,

I've got a parser extension (using $wgExtensionFunctions) which needs to insert real square brackets within an anchor's href attribute in the HTML output page. The problem is that all my square brackets within the href attributes get converted to %5B and %5D when I return my HTML to the Parser.

How can I suppress this behaviour of MediaWiki?

Thanks a lot,
Christoph

If you are interested in why I would like to do such a thing, here's an example where it's necessary to use square brackets in href attributes:
http://www.w3.org/2006/07/SWD/RDFa/syntax/#id104592

1 0

Re: [Wikitech-l] [MediaWiki-CVS] SVN: [31180] trunk/extensions (import of Collection extension)
by Siebrand Mazeland 22 Feb '08

Hi jojo,

A few requests regarding implementation of i18n for the extension Collection:

* please use wfLoadExtensionMessages, which takes care of proper fallback and is used by most of the extensions in the repo
* please use a unique message ID prefix to avoid conflicts with message IDs of other extensions (e.g. "coll-")

If you want, I can make those changes for you. Please let me know.

Also, please create an entry for your account 'jojo' in http://svn.wikimedia.org/viewvc/mediawiki/USERINFO/ so more personal contact is possible.

Cheers!

Siebrand

-----Original Message-----
From: mediawiki-cvs-bounces(a)lists.wikimedia.org [mailto:mediawiki-cvs-bounces@lists.wikimedia.org] On Behalf Of jojo(a)mayflower.knams.wikimedia.org
Sent: Friday, 22 February 2008 11:24
To: mediawiki-cvs(a)lists.wikimedia.org
Subject: [MediaWiki-CVS] SVN: [31180] trunk/extensions

Revision: 31180
Author: jojo
Date: 2008-02-22 10:24:20 +0000 (Fri, 22 Feb 2008)

Log Message:
-----------
initial import

Added Paths:
-----------
trunk/extensions/Collection/Collection.i18n.php

Added: trunk/extensions/Collection/Collection.i18n.php
===================================================================
--- trunk/extensions/Collection/Collection.i18n.php (rev 0)
+++ trunk/extensions/Collection/Collection.i18n.php 2008-02-22 10:24:20 UTC (rev 31180)
@@ -0,0 +1,105 @@
+ 'collection' => 'Collection',
+ 'collections' => 'Collections',
+

2 2
