Hello everybody,
in order to further develop the Selenium framework [1], I need to make a few design decisions, especially on coding conventions, which I'd like to discuss on this list, since they affect the way extension and core developers write their tests.
1) Where are the tests located? For core, I suggest putting them into maintenance/tests/selenium, which is where they are now. For extensions I propose a similar structure, that is <extensiondir>/tests/selenium.
2) How are the tests organized? Tests are organized in test suites. Each suite represents a cohesive set of tests, so it is possible to have more than one test suite per extension / core area. Test suites are technically classes. The files should follow this naming convention: <NameOfExtension><[Subset]>TestSuite.php, where the subset is optional. For example, in the PagedTiffHandler extension, it would be PagedTiffHandlerTestSuite.php and PagedTiffHandlerUploadsTestSuite.php. This should also be the name of the class. Alternatively, we could use the word "Selenium" somewhere in there in order to be able to distinguish between unit and Selenium tests. In that case I suggest using PagedTiffHandlerSeleniumTestSuite.php and PagedTiffHandlerUploadsSeleniumTestSuite.php. Hmmm... this gives pretty long names. Any suggestions?
3) How does the framework know there are tests? The tests should be registered with the autoloader in the extension entry file. In core, they should be registered directly with the autoloader.
4) Which tests should be executed? Since Selenium tests are slow, not every test should be executed in each test run. At the moment, there is a variable $wgSeleniumTestSuites which can be set in LocalSettings.php and which contains the tests that should be run. If things become more dynamic (e.g. when tests should be run on svn commit), there could be a function to add to this variable.
5) Aesthetics... There is an awful lot of "Selenium" in the class names, method names, file names and variable names. It might be a good idea to use "Sn" everywhere except for path names.
Two things need to be kept in mind:
* The idea is to use a similar structure for unit and Selenium tests (Selenium tests are based on unit tests anyway). I assume that at some point the tests should also be compatible with a continuous integration server.
* The wiki that executes the Selenium tests is not necessarily the one that is being tested, if the tests run against a Selenium grid.
If anybody would like to share their opinion on my suggestions, I'd be very glad!
Regards,
Markus
[1] http://www.mediawiki.org/wiki/SeleniumFramework (documentation will be updated soon..)
> Tim Starling wrote:
>
> So the time has probably come for us to come up with a "C" type
> password hashing scheme, to replace the B-type hashes that we use at
> the moment.
What about using public-key cryptography? Generate a key pair and use the public key to produce your password hashes. Store the private key offline in an underground vault, in case you someday need to recover the original passwords in order to rehash them. Needless to say, the key pair must be entirely for internal use and not already part of some PKI system (e.g. not the basis for one of Wikimedia's signed SSL certificates).
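To make the idea concrete, here is a toy sketch of such a scheme: a deterministic public-key "hash" that only the private-key holder can invert. This is textbook RSA with tiny primes, purely for illustration (all numbers and names here are made up); a real deployment would need a full-size modulus, salting, and careful review.

```python
# Toy illustration of the idea above: a deterministic public-key "hash"
# that only the private-key holder can invert. Textbook RSA with tiny
# primes -- NOT secure, for illustration only.

import math

# Hypothetical toy key pair (real use: a full-size RSA modulus,
# generated offline).
p, q = 61, 53
n = p * q                                # public modulus
e = 17                                   # public exponent
d = pow(e, -1, math.lcm(p - 1, q - 1))   # private exponent: the "vault" key

def pk_hash(password: bytes) -> int:
    """Deterministic raw-RSA 'hash' of a password.

    Real passwords exceed this toy modulus, so we reduce mod n; a real
    scheme would use a full-size modulus (and a salt)."""
    m = int.from_bytes(password, 'big') % n
    return pow(m, e, n)

def recover(h: int) -> int:
    """Offline recovery with the private key, e.g. for future rehashing."""
    return pow(h, d, n)

h = pk_hash(b'pw')
assert h == pk_hash(b'pw')                             # deterministic, so verifiable
assert recover(h) == int.from_bytes(b'pw', 'big') % n  # invertible offline
```

Because the "hash" is deterministic, login verification works exactly like today (recompute and compare), while the vaulted private key preserves the option of migrating to a stronger scheme later.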
Hi,
I have been working on getting asynchronous upload from url to work
properly[1]. A problem that I encountered was that I need to store
data across requests. Normally I would use $_SESSION, but this data
should also be available to job runners, and $_SESSION isn't.
As I see there are basically two ways to get a data store. The first
is to store the objects in the DB using wfGetCache( CACHE_DB ); I'm
not sure though whether it is meant to be used this way.
Alternatively I could revive my staged-upload work. In that branch,
all so-called stashed uploads (uploads that require user intervention
before they can be completed) have their metadata stored in the
database instead of in the session. That would still be quite a lot of
work, though.
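For illustration, the second option amounts to something like the following minimal sketch (Python with SQLite, with a purely hypothetical schema, not MediaWiki's actual code): metadata is keyed by a stash key rather than the PHP session id, so a job runner holding only the key can read it.

```python
# Minimal sketch of a DB-backed stash shared between web requests and
# job runners. The schema and class are hypothetical; the point is
# that the lookup key travels with the job instead of living in
# $_SESSION.

import json
import sqlite3
import uuid

class UploadStash:
    def __init__(self, conn):
        self.conn = conn
        conn.execute("""CREATE TABLE IF NOT EXISTS upload_stash (
                            us_key   TEXT PRIMARY KEY,
                            us_props TEXT NOT NULL)""")

    def put(self, props: dict) -> str:
        """Store metadata and return the stash key to pass to the job."""
        key = uuid.uuid4().hex
        self.conn.execute("INSERT INTO upload_stash VALUES (?, ?)",
                          (key, json.dumps(props)))
        return key

    def get(self, key: str):
        """Retrieve metadata by stash key, or None if unknown."""
        row = self.conn.execute(
            "SELECT us_props FROM upload_stash WHERE us_key = ?",
            (key,)).fetchone()
        return json.loads(row[0]) if row else None

# A web request stashes the upload metadata...
conn = sqlite3.connect(":memory:")
stash = UploadStash(conn)
key = stash.put({"url": "http://example.com/img.png", "user": 42})
# ...and a job runner, given only the key, retrieves it later.
props = stash.get(key)
```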
Or is there any other mechanism to be able to share data between the
jobqueue and requests?
Regards,
Bryan
[1] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/author/btongminh?offse…
It looks like some change in the software for Wikimedia recently caused four namespaces for LiquidThreads to become active in many (all?) Wikimedia wikis, even if LiquidThreads is not reported as installed on Special:Version for the wiki.
An issue was created by MZMcBride noting a breaking API change because "LiquidThreads API namespaces don't include canonical key"[1].
Werdna commented on IRC (as also included in the issue report) that this is a "hugely annoying" "feature of the localisation cache".
Siebrand
[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=24837
I'm going to begin working on the following bugs:
* "Support collation by a certain locale (sorting order of
characters)", https://bugzilla.wikimedia.org/show_bug.cgi?id=164 (only
parts related to category sorting)
* "Subcategory paging is not separate from article or image paging",
https://bugzilla.wikimedia.org/show_bug.cgi?id=1211
* "CategoryTree is inefficient",
https://bugzilla.wikimedia.org/show_bug.cgi?id=23682
As well as possibly:
* "Categories need to be structured by namespace",
https://bugzilla.wikimedia.org/show_bug.cgi?id=450
* "Natural number sorting in category listings",
https://bugzilla.wikimedia.org/show_bug.cgi?id=6948
There are essentially two problems here:
1) We currently sort articles on category pages by the Unicode code
point of their sort key. This is terrible for anything other than
English, and dodgy sometimes even for English. (This is bugs 164 and
6948.)
2) We have no way to efficiently get all items that are in a category
and also in a particular namespace. Particularly, we can't retrieve
all subcategories without scanning all items in the category, which is
inefficient when we have a few (or no) subcategories and tons of
items. (This is bugs 1211, 23682, and 450.)
One part of (2) needs to be clarified. The primary use-case is
obviously that we want to be able to count subcategories efficiently,
or display all of them when we only display some of the items in the
category: this is bugs 1211 and 23682. Secondarily, we have a request
at bug 450 to organize category pages by namespace, so main, Talk:,
User:, etc. are all paginated separately.
I think the goal for (2) should be to allow efficient separate
retrieval of subcategories, files, and other pages, but not to
distinguish between namespaces otherwise. The major motivation is
that to do this efficiently, we'll need to add namespace info to the
categorylinks table, and we want this to stay consistent with the info
in the page table. Categories, files, and other types of pages cannot
be moved to one another, as far as I know (it would hardly make
sense), so it automatically stays consistent this way. This is a big
plus, because there are inevitably bugs that cause denormalized data
to fall out of sync (look at cat_pages).
Furthermore, I don't think it's obvious that we want separate
namespaces to display separately at all on category pages. What's a
case where that would be desired? It would break up the display a
lot, with a bunch of separate headers for different namespaces, when
each namespace might only have a few items. Most categories whose
sort appearance you'd care about (i.e., excepting maintenance
categories) will have nearly everything in one namespace anyway. You
could always split the category into separate ones per namespace if
you want them separate.
So I propose that we keep the current category/normal page/file split,
and paginate those three parts of the page separately. So you'd have
up to 200 subcategories, then below that up to 200 normal pages, then
below that up to 200 files. (The numbers could be adjusted.
Currently they're hardcoded, which is stupid.) Paginating
subcategories separately is obviously needed. Paginating files
separately is not really needed, but it would be much more consistent.
The overall solution, then, would be:
1) Change the way category sortkeys are generated. Start them with a
letter depending on namespace, like 'C' for category, 'P' for regular
page, 'F' for file. After that first letter, append a sortkey
generated by ICU or whatever. I think Tim has opinions on what would
be a good choice to convert the article title into sort key -- if not,
I'll have to research it and hopefully not come up with a completely
incorrect answer.
2) On category pages, maintain three offsets and do three queries (or
maybe UNION them together, doesn't matter), one for each of
categories/regular pages/files. Because of (1), this will be
efficient and will also sort less unreasonably for non-English
languages.
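The sortkey scheme in (1) can be sketched as follows. Here locale.strxfrm stands in for a real ICU collator, and the namespace constants and function names are illustrative, not actual MediaWiki code:

```python
# Sketch of the proposed sortkeys: a namespace prefix letter followed
# by a collation key. locale.strxfrm is a stand-in for ICU; the
# prefix letters 'C'/'P'/'F' follow the proposal above.

import locale

NS_FILE = 6        # illustrative MediaWiki-style namespace numbers
NS_CATEGORY = 14

def ns_prefix(namespace: int) -> str:
    if namespace == NS_CATEGORY:
        return 'C'
    if namespace == NS_FILE:
        return 'F'
    return 'P'     # everything else sorts as a regular page

def make_sortkey(title: str, namespace: int) -> str:
    # The first byte groups rows by type, so each of the three
    # sections can be fetched with its own efficient range scan.
    return ns_prefix(namespace) + locale.strxfrm(title)

rows = [("Zebra", 0), ("Apple", 0), ("Maps", NS_CATEGORY),
        ("Photo.jpg", NS_FILE)]
rows.sort(key=lambda r: make_sortkey(*r))
# Groups come out as: categories ('C'), then files ('F'), then pages ('P').
```

Since the prefix byte clusters each type contiguously in the index, the three per-section queries (or a UNION of them) each reduce to a simple range scan with its own offset.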
One problem that was pointed out somewhere in the massive useless
discussion on bug 164 is that we'd have to do something to display the
first letter for each section. Currently it's just the first letter
of the sortkey, but if that's some binary string, that becomes a
problem. I'm not seeing an obvious solution, since the
sortkey-generation algorithm will be opaque to us. If it sorts Á the
same as A, then how do we figure out that the "canonical" first letter
for the section should be "A" and not "Á"? How do we even figure out
where the sections begin or end? Would that even make sense in all
cases? At a first pass, I'd say we should just skip the first letter
and display all the items straight from beginning to end without
section divisions. I don't think that's a big problem.
These are just my initial thoughts. Feedback appreciated. If people
agree with the general approach, I can start coding this up tomorrow.
On 08/07/2010 02:23 AM, Andreas Kolbe wrote:
> Word-processing the Google output to arrive at a readable, written text creates more work than it saves.
This is where our experience differs. I'm working faster with the Google
Translator Toolkit than without.
> If Google want to build up their translation memory, I suggest they pay publishers for permission to analyse existing, published translations, and read those into their memory. This will give them a database of translations that the market judged good enough to publish, written by people who (presumably) understood the subject matter they were working in.
If we forget Google for a while, this is actually something that we could do
on our own. There are enough texts in Wikisource (out of copyright books)
that are available in more than one language. In some cases, we will run
into old spelling and use of language, but it will be better than nothing.
The result could be good input to Wiktionary.
Here is the Norwegian original of Nansen's Eskimoliv,
http://no.wikisource.org/wiki/Indeks:Nansen-Eskimoliv.djvu
And here is the Swedish translation, both from 1891,
http://sv.wikisource.org/wiki/Index:Eskimålif.djvu
Norwegian: Grønland er paa en eiendommelig vis knyttet til vort land og
folk.
Swedish: Grönland är på ett egendomligt sätt knutet till vårt land och
vårt folk.
As you can see, there is one difference already in this first
sentence: The original ends "to our country and people",
while the translation ends "to our country and our people".
Is there any good free software for aligning parallel texts and
extracting translations? Looking around, I found NAtools,
TagAligner, and Bitextor, but they require texts to be marked
up already. Are these the best and most modern tools available?
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
I would like to extend the syntax of the <ref> tag (Cite extension), in
order to deal with footnotes that are spread over several transcluded
pages. Since the Cite extension is widely used, I guess I'd better ask
here first.
Here is an illustration of the problem:
http://en.wikisource.org/wiki/Page:Robert_the_Bruce_and_the_struggle_for_Sc…
On the bottom of the scan you can see the second half of a footnote.
That footnote begins on the previous page:
http://en.wikisource.org/wiki/Page:Robert_the_Bruce_and_the_struggle_for_Sc…
Wikisourcers currently have no way to deal with these cases cleanly. I
have written a patch for this (the code is here:
http://dpaste.org/QOMH/ ). This patch extends the <ref> syntax by adding
a "follow" parameter, like this:
<ref follow="foo">bar</ref>
After two pages are transcluded, the wikitext passed to the parser will
look like this :
blah blah blah
blah blah blah<ref name="note1">beginning of note 1</ref>
blah blah blah
blah blah blah
blah blah blah<ref follow="note1">end of note</ref>
blah blah blah
This wikitext is rendered as a single footnote, located in the text at
the position of the parent <ref>. If the parent <ref> is not found (as
is the case when you render only the second page), then the text inside
the tag is rendered at the beginning of the list of references, with no
number and no link.
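The merge behaviour described above can be sketched roughly like this (a simplified model of what the patch does, not the actual Cite extension code; the data shapes are made up):

```python
# Sketch of the described merge logic: a ref with follow="X" is
# appended to the named ref "X" if that parent exists; otherwise it is
# listed first, with no number.

def build_references(refs):
    """refs: list of dicts like {'name': ..., 'follow': ..., 'text': ...}"""
    by_name, ordered, orphans = {}, [], []
    for ref in refs:
        follow = ref.get('follow')
        if follow is None:
            entry = {'name': ref.get('name'), 'text': ref['text']}
            ordered.append(entry)
            if entry['name']:
                by_name[entry['name']] = entry
        elif follow in by_name:
            by_name[follow]['text'] += ' ' + ref['text']   # join the halves
        else:
            # Parent not transcluded (e.g. rendering only the second
            # page): show the fragment first, unnumbered.
            orphans.append({'name': None, 'text': ref['text']})
    return orphans + ordered

refs = [{'name': 'note1', 'text': 'beginning of note 1'},
        {'follow': 'note1', 'text': 'end of note'}]
print(build_references(refs))
# -> [{'name': 'note1', 'text': 'beginning of note 1 end of note'}]
```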
Does this make sense?
Thomas
Hey,
As the first components of the deployment stuff I'm working on are getting
finished, I find myself unsure where to put them in core. I think the best
approach would be to rename includes/installer to includes/deployment, which
can then hold all deployment related code, and maybe has some subdirectories
for stuff like the new web installer.
Any reasons not to do this?
Cheers
--
Jeroen De Dauw
* http://blog.bn2vs.com
* http://wiki.bn2vs.com
Don't panic. Don't be evil. 50 72 6F 67 72 61 6D 6D 69 6E 67 20 34 20 6C 69
66 65!
--
Hi, folks,
I am working on a project to provide Wikipedia recent changes for
different categories.
You can navigate the recent changes via a category tree, subscribe to
them by RSS, or fetch them as JSON.
A prototype has already been built; now I am working on a real service.
Since I need to access recentchanges via the API very frequently
(varying by language, one call every 10 to 100 seconds),
are there any policies on API usage apart from the rule that the user
agent should be set?
I could only find the bot policy, but no API policy:
http://en.wikipedia.org/wiki/Wikipedia:Bot_policy
Can anyone provide information on this?
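For what it's worth, a polite high-frequency poller would at least set a descriptive User-Agent and pass the maxlag parameter, which asks the servers to refuse the query when replication lag is high. A minimal sketch (the contact address and tool name are placeholders; this only builds the request, it does not call the API):

```python
# Sketch of a considerate recentchanges poller: identifying User-Agent
# plus the maxlag parameter. Builds the request URL only.

from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "CategoryRC/0.1 (contact: you@example.org)"}

def recentchanges_url(limit=100, maxlag=5):
    params = {
        "action": "query",
        "list": "recentchanges",
        "rclimit": limit,
        "maxlag": maxlag,    # back off politely when the DBs are lagged
        "format": "json",
    }
    return API + "?" + urlencode(params)

print(recentchanges_url())
```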
Regards,
Mingli