Hello everybody,
in order to further develop the Selenium framework [1], I need to make a few design decisions, especially on coding conventions, which I'd like to discuss on this list, since they affect the way extension and core developers write their tests.
1) Where are the tests located? For core, I suggest putting them into maintenance/tests/selenium, which is where they are now. For extensions I propose a similar structure, that is <extensiondir>/tests/selenium.
2) How are the tests organized? Tests are organized in test suites. Each suite represents a cohesive set of tests, so it is possible to have more than one test suite per extension / core area. Test suites are technically classes. The files should follow this naming convention: <NameOfExtension><[Subset]>TestSuite.php, where the subset is optional. For example, in the PagedTiffHandler extension, it would be PagedTiffHandlerTestSuite.php and PagedTiffHandlerUploadsTestSuite.php. This should also be the name of the class. Alternatively, we could use the word "Selenium" somewhere in there in order to be able to distinguish between unit and Selenium tests. In that case I suggest using PagedTiffHandlerSeleniumTestSuite.php and PagedTiffHandlerUploadsSeleniumTestSuite.php. Hmmm... this gives pretty long names. Any suggestions?
3) How does the framework know there are tests? The tests should be registered with the autoloader in the extension entry file. In core, they should be registered directly with the autoloader.
4) Which tests should be executed? Since Selenium tests are slow, not every test should be executed in each test run. At the moment, there is a variable $wgSeleniumTestSuites which can be set in LocalSettings.php and which contains the tests that should be run. If things become more dynamic (e.g. when tests should be run on svn commit), there could be a function to add to this variable.
5) Aesthetics... There is an awful lot of "Selenium" in the class names, method names, file names and variable names. It might be a good idea to use "Sn" everywhere except for path names.
Two things need to be kept in mind:
* The idea is to use a similar structure for unit and Selenium tests (Selenium tests are based on unit tests anyway). I assume that at some point the tests should also be compatible with a continuous integration server.
* The wiki that executes the Selenium tests is not necessarily the one that is being tested, if the tests run against a Selenium grid.
If anybody would like to share their opinion on my suggestions, I'd be very glad!
Regards,
Markus
[1] http://www.mediawiki.org/wiki/SeleniumFramework (documentation will be updated soon..)
> Tim Starling wrote:
>
> So the time has probably come for us to come up with a "C" type
> password hashing scheme, to replace the B-type hashes that we use at
> the moment.
What about using public-key cryptography? Generate a key pair and use the public key to produce your password hashes. Store the private key offline in an underground vault, in case you someday need to recover the original passwords in order to rehash them. Needless to say, the key pair must be entirely for internal use and not already part of some PKI system (e.g. not the basis for one of Wikimedia's signed SSL certificates).
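To make the idea concrete, here is a toy sketch of such a scheme: a deterministic public-key "hash" that only the private-key holder can invert. This is textbook RSA with tiny primes, purely for illustration (all numbers and names here are made up); a real deployment would need a full-size modulus, salting, and careful review.

```python
# Toy illustration of the idea above: a deterministic public-key "hash"
# that only the private-key holder can invert. Textbook RSA with tiny
# primes -- NOT secure, for illustration only.

import math

# Hypothetical toy key pair (real use: a full-size RSA modulus,
# generated offline).
p, q = 61, 53
n = p * q                                # public modulus
e = 17                                   # public exponent
d = pow(e, -1, math.lcm(p - 1, q - 1))   # private exponent: the "vault" key

def pk_hash(password: bytes) -> int:
    """Deterministic raw-RSA 'hash' of a password.

    Real passwords exceed this toy modulus, so we reduce mod n; a real
    scheme would use a full-size modulus (and a salt)."""
    m = int.from_bytes(password, 'big') % n
    return pow(m, e, n)

def recover(h: int) -> int:
    """Offline recovery with the private key, e.g. for future rehashing."""
    return pow(h, d, n)

h = pk_hash(b'pw')
assert h == pk_hash(b'pw')                             # deterministic, so verifiable
assert recover(h) == int.from_bytes(b'pw', 'big') % n  # invertible offline
```

Because the "hash" is deterministic, login verification works exactly like today (recompute and compare), while the vaulted private key preserves the option of migrating to a stronger scheme later.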
Hi,
I have been working on getting asynchronous upload from url to work
properly[1]. A problem that I encountered was that I need to store
data across requests. Normally I would use $_SESSION, but this data
should also be available to job runners, and $_SESSION isn't.
As I see there are basically two ways to get a data store. The first
is to store the objects in the DB using wfGetCache( CACHE_DB ); I'm
not sure though whether it is meant to be used this way.
Alternatively I could revive my staged-upload work. In that branch,
all so-called stashed uploads (uploads that require user intervention
before they can be completed) have their metadata stored in the
database instead of in the session. That would still be quite a lot of
work, though.
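For illustration, the second option amounts to something like the following minimal sketch (Python with SQLite, with a purely hypothetical schema, not MediaWiki's actual code): metadata is keyed by a stash key rather than the PHP session id, so a job runner holding only the key can read it.

```python
# Minimal sketch of a DB-backed stash shared between web requests and
# job runners. The schema and class are hypothetical; the point is
# that the lookup key travels with the job instead of living in
# $_SESSION.

import json
import sqlite3
import uuid

class UploadStash:
    def __init__(self, conn):
        self.conn = conn
        conn.execute("""CREATE TABLE IF NOT EXISTS upload_stash (
                            us_key   TEXT PRIMARY KEY,
                            us_props TEXT NOT NULL)""")

    def put(self, props: dict) -> str:
        """Store metadata and return the stash key to pass to the job."""
        key = uuid.uuid4().hex
        self.conn.execute("INSERT INTO upload_stash VALUES (?, ?)",
                          (key, json.dumps(props)))
        return key

    def get(self, key: str):
        """Retrieve metadata by stash key, or None if unknown."""
        row = self.conn.execute(
            "SELECT us_props FROM upload_stash WHERE us_key = ?",
            (key,)).fetchone()
        return json.loads(row[0]) if row else None

# A web request stashes the upload metadata...
conn = sqlite3.connect(":memory:")
stash = UploadStash(conn)
key = stash.put({"url": "http://example.com/img.png", "user": 42})
# ...and a job runner, given only the key, retrieves it later.
props = stash.get(key)
```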
Or is there any other mechanism to be able to share data between the
jobqueue and requests?
Regards,
Bryan
[1] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/author/btongminh?offse…
It looks like some change in the software for Wikimedia recently caused four namespaces for LiquidThreads to become active in many (all?) Wikimedia wikis, even if LiquidThreads is not reported as installed on Special:Version for the wiki.
An issue was created by MZMcBride noting a breaking API change because "LiquidThreads API namespaces don't include canonical key"[1].
Werdna commented on IRC (as also included in the issue report) that this is a "hugely annoying" "feature of the localisation cache".
Siebrand
[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=24837
I'm going to begin working on the following bugs:
* "Support collation by a certain locale (sorting order of
characters)", https://bugzilla.wikimedia.org/show_bug.cgi?id=164 (only
parts related to category sorting)
* "Subcategory paging is not separate from article or image paging",
https://bugzilla.wikimedia.org/show_bug.cgi?id=1211
* "CategoryTree is inefficient",
https://bugzilla.wikimedia.org/show_bug.cgi?id=23682
As well as possibly:
* "Categories need to be structured by namespace",
https://bugzilla.wikimedia.org/show_bug.cgi?id=450
* "Natural number sorting in category listings",
https://bugzilla.wikimedia.org/show_bug.cgi?id=6948
There are essentially two problems here:
1) We currently sort articles on category pages by the Unicode code
point of their sort key. This is terrible for anything other than
English, and dodgy sometimes even for English. (This is bugs 164 and
6948.)
2) We have no way to efficiently get all items that are in a category
and also in a particular namespace. Particularly, we can't retrieve
all subcategories without scanning all items in the category, which is
inefficient when we have a few (or no) subcategories and tons of
items. (This is bugs 1211, 23682, and 450.)
One part of (2) needs to be clarified. The primary use-case is
obviously that we want to be able to count subcategories efficiently,
or display all of them when we only display some of the items in the
category: this is bugs 1211 and 23682. Secondarily, we have a request
at bug 450 to organize category pages by namespace, so main, Talk:,
User:, etc. are all paginated separately.
I think the goal for (2) should be to allow efficient separate
retrieval of subcategories, files, and other pages, but not to
distinguish between namespaces otherwise. The major motivation is
that to do this efficiently, we'll need to add namespace info to the
categorylinks table, and we want this to stay consistent with the info
in the page table. Categories, files, and other types of pages cannot
be moved to one another, as far as I know (it would hardly make
sense), so it automatically stays consistent this way. This is a big
plus, because there are inevitably bugs that cause denormalized data
to fall out of sync (look at cat_pages).
Furthermore, I don't think it's obvious that we want separate
namespaces to display separately at all on category pages. What's a
case where that would be desired? It would break up the display a
lot, with a bunch of separate headers for different namespaces, when
each namespace might only have a few items. Most categories whose
sort appearance you'd care about (i.e., excepting maintenance
categories) will have nearly everything in one namespace anyway. You
could always split the category into separate ones per namespace if
you want them separate.
So I propose that we keep the current category/normal page/file split,
and paginate those three parts of the page separately. So you'd have
up to 200 subcategories, then below that up to 200 normal pages, then
below that up to 200 files. (The numbers could be adjusted.
Currently they're hardcoded, which is stupid.) Paginating
subcategories separately is obviously needed. Paginating files
separately is not really needed, but it would be much more consistent.
The overall solution, then, would be:
1) Change the way category sortkeys are generated. Start them with a
letter depending on namespace, like 'C' for category, 'P' for regular
page, 'F' for file. After that first letter, append a sortkey
generated by ICU or whatever. I think Tim has opinions on what would
be a good choice to convert the article title into sort key -- if not,
I'll have to research it and hopefully not come up with a completely
incorrect answer.
2) On category pages, maintain three offsets and do three queries (or
maybe UNION them together, doesn't matter), one for each of
categories/regular pages/files. Because of (1), this will be
efficient and will also sort less unreasonably for non-English
languages.
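The sortkey scheme in (1) can be sketched as follows. Here locale.strxfrm stands in for a real ICU collator, and the namespace constants and function names are illustrative, not actual MediaWiki code:

```python
# Sketch of the proposed sortkeys: a namespace prefix letter followed
# by a collation key. locale.strxfrm is a stand-in for ICU; the
# prefix letters 'C'/'P'/'F' follow the proposal above.

import locale

NS_FILE = 6        # illustrative MediaWiki-style namespace numbers
NS_CATEGORY = 14

def ns_prefix(namespace: int) -> str:
    if namespace == NS_CATEGORY:
        return 'C'
    if namespace == NS_FILE:
        return 'F'
    return 'P'     # everything else sorts as a regular page

def make_sortkey(title: str, namespace: int) -> str:
    # The first byte groups rows by type, so each of the three
    # sections can be fetched with its own efficient range scan.
    return ns_prefix(namespace) + locale.strxfrm(title)

rows = [("Zebra", 0), ("Apple", 0), ("Maps", NS_CATEGORY),
        ("Photo.jpg", NS_FILE)]
rows.sort(key=lambda r: make_sortkey(*r))
# Groups come out as: categories ('C'), then files ('F'), then pages ('P').
```

Since the prefix byte clusters each type contiguously in the index, the three per-section queries (or a UNION of them) each reduce to a simple range scan with its own offset.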
One problem that was pointed out somewhere in the massive useless
discussion on bug 164 is that we'd have to do something to display the
first letter for each section. Currently it's just the first letter
of the sortkey, but if that's some binary string, that becomes a
problem. I'm not seeing an obvious solution, since the
sortkey-generation algorithm will be opaque to us. If it sorts Á the
same as A, then how do we figure out that the "canonical" first letter
for the section should be "A" and not "Á"? How do we even figure out
where the sections begin or end? Would that even make sense in all
cases? At a first pass, I'd say we should just skip the first letter
and display all the items straight from beginning to end without
section divisions. I don't think that's a big problem.
These are just my initial thoughts. Feedback appreciated. If people
agree with the general approach, I can start coding this up tomorrow.
On 08/07/2010 02:23 AM, Andreas Kolbe wrote:
> Word-processing the Google output to arrive at a readable, written text creates more work than it saves.
This is where our experience differs. I'm working faster with the Google
Translator Toolkit than without.
> If Google want to build up their translation memory, I suggest they pay publishers for permission to analyse existing, published translations, and read those into their memory. This will give them a database of translations that the market judged good enough to publish, written by people who (presumably) understood the subject matter they were working in.
If we forget Google for a while, this is actually something that we could do
on our own. There are enough texts in Wikisource (out of copyright books)
that are available in more than one language. In some cases, we will run
into old spelling and use of language, but it will be better than nothing.
The result could be good input to Wiktionary.
Here is the Norwegian original of Nansen's Eskimoliv,
http://no.wikisource.org/wiki/Indeks:Nansen-Eskimoliv.djvu
And here is the Swedish translation, both from 1891,
http://sv.wikisource.org/wiki/Index:Eskimålif.djvu
Norwegian: Grønland er paa en eiendommelig vis knyttet til vort land og
folk.
Swedish: Grönland är på ett egendomligt sätt knutet till vårt land och
vårt folk.
As you can see, there is one difference already in this first
sentence: The original ends "to our country and people",
while the translation ends "to our country and our people".
Is there any good free software for aligning parallel texts and
extracting translations? Looking around, I found NAtools,
TagAligner, and Bitextor, but they require texts to be marked
up already. Are these the best and most modern tools available?
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
I would like to extend the syntax of the <ref> tag (Cite extension), in
order to deal with footnotes that are spread over several transcluded
pages. Since the Cite extension is widely used, I guess I'd better ask
here first.
Here is an illustration of the problem:
http://en.wikisource.org/wiki/Page:Robert_the_Bruce_and_the_struggle_for_Sc…
On the bottom of the scan you can see the second half of a footnote.
That footnote begins on the previous page:
http://en.wikisource.org/wiki/Page:Robert_the_Bruce_and_the_struggle_for_Sc…
Wikisourcers currently have no way to deal with these cases cleanly. I
have written a patch for this (the code is here:
http://dpaste.org/QOMH/ ). This patch extends the <ref> syntax by adding
a "follow" parameter, like this:
<ref follow="foo">bar</ref>
After two pages are transcluded, the wikitext passed to the parser will
look like this :
blah blah blah
blah blah blah<ref name="note1">beginning of note 1</ref>
blah blah blah
blah blah blah
blah blah blah<ref follow="note1">end of note</ref>
blah blah blah
This wikitext is rendered as a single footnote, located in the text at
the position of the parent <ref>. If the parent <ref> is not found (as
is the case when you render only the second page), then the text inside
the tag is rendered at the beginning of the list of references, with no
number and no link.
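The merge behaviour described above can be sketched roughly like this (a simplified model of what the patch does, not the actual Cite extension code; the data shapes are made up):

```python
# Sketch of the described merge logic: a ref with follow="X" is
# appended to the named ref "X" if that parent exists; otherwise it is
# listed first, with no number.

def build_references(refs):
    """refs: list of dicts like {'name': ..., 'follow': ..., 'text': ...}"""
    by_name, ordered, orphans = {}, [], []
    for ref in refs:
        follow = ref.get('follow')
        if follow is None:
            entry = {'name': ref.get('name'), 'text': ref['text']}
            ordered.append(entry)
            if entry['name']:
                by_name[entry['name']] = entry
        elif follow in by_name:
            by_name[follow]['text'] += ' ' + ref['text']   # join the halves
        else:
            # Parent not transcluded (e.g. rendering only the second
            # page): show the fragment first, unnumbered.
            orphans.append({'name': None, 'text': ref['text']})
    return orphans + ordered

refs = [{'name': 'note1', 'text': 'beginning of note 1'},
        {'follow': 'note1', 'text': 'end of note'}]
print(build_references(refs))
# -> [{'name': 'note1', 'text': 'beginning of note 1 end of note'}]
```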
Does this make sense?
Thomas
Hey,
As the first components of the deployment stuff I'm working on are getting
finished, I find myself unsure where to put them in core. I think the best
approach would be to rename includes/installer to includes/deployment, which
can then hold all deployment related code, and maybe has some subdirectories
for stuff like the new web installer.
Any reasons not to do this?
Cheers
--
Jeroen De Dauw
* http://blog.bn2vs.com
* http://wiki.bn2vs.com
Don't panic. Don't be evil. 50 72 6F 67 72 61 6D 6D 69 6E 67 20 34 20 6C 69
66 65!
--
Hi, folks,
I am working on a project to provide Wikipedia recent changes for
different categories.
You can navigate the recent changes via a category tree, subscribe to
them by RSS, or fetch them as JSON.
A prototype has already been built; now I am working on a real service.
Since I need to access recentchanges via the API very frequently
(varying by language, one call every 10 to 100 seconds),
are there any policies on API usage apart from the rule that the user
agent should be set?
I could only find the bot policy, but no API policy:
http://en.wikipedia.org/wiki/Wikipedia:Bot_policy
Can anyone provide information on this?
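For what it's worth, a polite high-frequency poller would at least set a descriptive User-Agent and pass the maxlag parameter, which asks the servers to refuse the query when replication lag is high. A minimal sketch (the contact address and tool name are placeholders; this only builds the request, it does not call the API):

```python
# Sketch of a considerate recentchanges poller: identifying User-Agent
# plus the maxlag parameter. Builds the request URL only.

from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "CategoryRC/0.1 (contact: you@example.org)"}

def recentchanges_url(limit=100, maxlag=5):
    params = {
        "action": "query",
        "list": "recentchanges",
        "rclimit": limit,
        "maxlag": maxlag,    # back off politely when the DBs are lagged
        "format": "json",
    }
    return API + "?" + urlencode(params)

print(recentchanges_url())
```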
Regards,
Mingli