Since r61917 (3 February), running the API tests has been creating a
user called 'Useruser', which initially had a random password. r72475
(6 September) screwed up by using a hardcoded password.
As if that wasn't bad enough for wikis allowing account creation, r74118
(1 October) made that user a sysop. Finally, r74213 (3 October) split it
into two users.
In r75588 I added a feature to block weak passwords, which will disable
them.
In r75589 I reverted to random-password users and renamed them so that
a) they are much less likely to conflict with any human account creation, and
b) it is clear where such a sysop account came from.
Sadly, we can't add them to $wgReservedUsernames without breaking the tests.
Anyone who has run 'make destructive' in the last two months should
update to r75589 immediately. They are also encouraged to desysop and
block the accounts 'Useruser' and 'Useruser1'. They are only at risk if
that install is publicly accessible, though.
Users who are not running MediaWiki from trunk, or who haven't run the
phpunit tests, are unaffected.
After the recent discussions on openness and clarity, several people have
asked what is contained within the RT and have been given answers like
"it's staff stuff".
So what is stored in it that couldn't go either on the staff or internal
wiki, where it must be private, or in Bugzilla for other matters?
In these recently created projects article name completion doesn't
seem to work in the search box:
http://bjn.wikipedia.org
http://koi.wikipedia.org
http://mrj.wikipedia.org
I have "Enable enhanced search suggestions (Vector skin only)" checked
and "Disable AJAX suggestions" unchecked; I believe that it's the
default.
Is it a problem with some index or cache that needs to be filled or a bug?
--
אָמִיר אֱלִישָׁע אַהֲרוֹנִי · Amir Elisha Aharoni
http://aharoni.wordpress.com
"We're living in pieces,
I want to live in peace." - T. Moore
Has anyone seen this?
http://codebutler.com/firesheep
A new Firefox plugin that makes it trivially easy to hijack cookies
from a website that's using HTTP for login over an unencrypted
wireless network. Wikipedia isn't in the standard installation as a
site (lots of other sites, such as Facebook, Twitter, etc. are). We
are using HTTP login by default, so I guess we're vulnerable as well
(please say so if we're using some other kind of defensive mechanism
I'm not aware of). Might it be a good idea to set HTTPS as the standard
login? Gmail has been doing this since April this year.
-- Hay
Good afternoon,
In r75437 and r75438 [0][1] I moved the old installer to old-index.php
and moved the new one to index.php. At this stage in the process,
I don't see us backing this out before we branch 1.17. I really
want people to test it out and report any major breakages [2].
This has been a long development process for almost 2 years
now, and I'd like to thank Max, Mark H., Jure, Jeroen, Roan
and Siebrand for their invaluable help in working on this. And
especially thanks to Tim for starting the project and providing
feedback, as always. There is a *lot* of code in includes/installer,
and I'd like to highlight some of the major changes that you'll
need to know about.
Database updaters: they have been moved out of the gigantic
maintenance/updaters.inc file (patch files still go in the same
place, though). Each supported DB type has a class that subclasses
DatabaseUpdater. The format is very similar, only it operates on
methods of those classes instead of global functions.
The globals $wgExtNewTables, etc. are retained for back compat
and will be for quite some time. However, you can pass more
advanced callbacks since the LoadExtensionSchemaUpdates
hook now passes the DatabaseUpdater subclass as a param.
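As a rough, hypothetical illustration (the extension name, table name and
SQL file below are made up, and you should check DatabaseUpdater itself
for the exact helper methods it offers), a hook handler can now look
roughly like this:

$wgHooks['LoadExtensionSchemaUpdates'][] = 'efMyExtensionSchemaUpdates';

// Sketch only: receives the DatabaseUpdater subclass and registers an
// extension table on it, instead of appending to $wgExtNewTables.
function efMyExtensionSchemaUpdates( $updater ) {
	$updater->addExtensionTable( 'myext_data',
		dirname( __FILE__ ) . '/myext_data.sql' );
	return true;
}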
DB2 and MSSQL have been dropped from the installer. The
implementations are far from complete and I'm not comfortable
advertising their use yet.
Other known issues:
- Some UI quirks still exist, but work is coming here
- Postgres and Oracle are *almost* done
- Stuff listed on mw.org[2]
-Chad
[0] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/75437
[1] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/75438
[2] http://www.mediawiki.org/wiki/New-installer_issues
Aryeh Gregor writes:
>
> To clarify, the subject needs to 1) be reasonably doable in a short
> timeframe, 2) not build on top of something that's already too
> optimized....
Integrating a subset of RTMP (e.g. the
http://code.google.com/p/rtmplite subset) into the chunk-based file
upload API -- http://www.mediawiki.org/wiki/API:Upload#Chunked_upload
-- would be an example of parallel I/O that we really need if we ever
hope to have reasonable microphone uploads for Wiktionary
pronunciation collection. I know Flash sucks, but it sucks way less
for microphone upload than currently nonexistent HTML5 audio upload
support, client-side Java, or any other alternative, and probably will
suck way less than any of those alternatives for years. Soon GNU
Gnash should have microphone Speex upload on all three major
platforms, assuming the Gnash programming team doesn't starve to death
first.
Robert Rohde:
Getting back to Wikimedia, it appears correct that the Wikistats code
is designed to run from the compressed files ....(source linked from [1]).
As you suggest, one could use the properties of .bz2 format to
parallelize that. I would also observe that parsers tend to be
relatively slow, while decompressors tend to be relatively fast.
Some additional notes:
Yes, wikistats processes compressed dumps.
Nowadays these are mostly stub dumps.
Most monthly metrics can be collected from these, with a few exceptions
like word count.
For stub dumps, decompression is the major resource hog;
for full dumps, some heavy regexps contribute considerably.
Wikistats could benefit a lot from parallelization (although these days
dump production for larger wikis is generally the bottleneck).
First thing I would want to look into (some day) is running the count
scripts for several wikis in parallel.
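Just to sketch that idea (the script name and arguments below are invented,
not the actual wikistats invocation), one could fork one count process per
wiki and wait for all of them, merging the per-wiki CSV output afterwards:

// Hypothetical sketch: run a per-wiki count script for several wikis at once.
$wikis = array( 'enwiki', 'dewiki', 'frwiki', 'nlwiki' );
$procs = array();
foreach ( $wikis as $wiki ) {
	$cmd = 'perl WikiCounts.pl --wiki=' . escapeshellarg( $wiki );
	$procs[$wiki] = proc_open( $cmd, array(
		1 => array( 'file', "logs/$wiki.out", 'w' ),
		2 => array( 'file', "logs/$wiki.err", 'w' ),
	), $pipes );
}
// proc_close() waits for each child to exit; the per-wiki CSV files would
// then be aggregated in a separate post-processing step.
foreach ( $procs as $wiki => $proc ) {
	$status = proc_close( $proc );
	echo "$wiki finished with exit code $status\n";
}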
All intermediate data are stored in CSV files, often one file per metric
for all languages.
Decoupling and aggregation as a post-processing step is simple.
Running several count threads on one machine might tax memory.
Some hashes are huge (much has been externalized, but e.g. edits per
user per namespace is still a hash file).
The basic structure dates from the time that a full archive dump for
English Wikipedia was processed in minutes rather than months.
There have been a lot of optimizations, but the general setup is still like
this:
Every month, all counts for the past 10 years are reproduced from scratch.
Wikistats basically has no memory.
This probably sounds crazy; incremental processing has been suggested
more than once.
The main reason to keep it this way is that every so often new functionality
is added to the scripts (and the occasional bug fix).
In order to have new counts for the full history we would need to rerun
from scratch every so often anyway.
People have asked me how the counts can change from month to month.
Same answer: counts are redone for all months, and newer dumps will have
more deletions for earlier months.
This mostly affects the last two months, though: nearly all deletions occur
within a month or two.
In the early years deletions were very rare; most were done to prevent court
orders (privacy).
Nowadays deletionism has taken hold.
Still, wikistats treats deleted content as 'should not have been there in
the first place'.
This makes our editor counts somewhat conservative; it basically skews the
activity patterns in favor of good content contributors.
Erik Zachte
This term I'm taking a course in high-performance computing
<http://cs.nyu.edu/courses/fall10/G22.2945-001/index.html>, and I have
to pick a topic for a final project. According to the assignment
<http://cs.nyu.edu/courses/fall10/G22.2945-001/final-project.pdf>,
"The only real requirement is that it be something in parallel." In
the class, we covered
* Microoptimization of single-threaded code (efficient use of CPU cache, etc.)
* Multithreaded programming using OpenMP
* GPU programming using OpenCL
and will probably briefly cover distributed computing over multiple
machines with MPI. I will have access to a high-performance cluster
at NYU, including lots of CPU nodes and some high-end GPUs. Unlike
most of the other people in the class, I don't have any interesting
science projects I'm working on, so something useful to
MediaWiki/Wikimedia/Wikipedia is my first thought. If anyone has any
suggestions, please share. (If you have non-Wikimedia-related ones,
I'd also be interested in hearing about them offlist.) They shouldn't
be too ambitious, since I have to finish them in about a month, while
doing work for three other courses and a bunch of other stuff.
My first thought was to write a GPU program to crack MediaWiki
password hashes as quickly as possible, then use what we've studied in
class about GPU architecture to design a hash function that would be
as slow as possible to crack on a GPU relative to its PHP execution
speed, as Tim suggested a while back. However, maybe there's
something more interesting I could do.
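For context, here's a minimal sketch of the kind of hash such a cracker
would target, assuming the historical salted ':B' format (the MD5 of the
salt, a dash, and the MD5 of the password); this is an illustration, not
the actual User class code:

// Sketch only: assumes the old ':B' salted-MD5 scheme.
function sketchPasswordHash( $password, $salt ) {
	return ':B:' . $salt . ':' . md5( $salt . '-' . md5( $password ) );
}

echo sketchPasswordHash( 'hunter2', 'a1b2c3d4' ) . "\n";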
Hi all,
As presented last Saturday at the Hack-A-Ton, I've committed a new version of the InlineEditor extension. [1] This is an implementation of the sentence-level editing demo posted a few months ago.
Basically, what this new version does is build a tree structure from all the markings. This has the advantage of being able to render only part of the tree when doing a preview, which gives a significant performance gain. However, there's also the problem of dependencies throughout the page, the most notable of which is the Cite extension. Right now this is resolved by simply rendering the entire page whenever a dependency is encountered. Extensions are responsible for signalling such dependencies via a hook; right now there's only built-in support for references (Cite). A more scalable solution would be to let extensions re-render only the dependency, using some stored data acquired on the initial parse plus data from the subsequent partial parse. This will be one of the goals for the next version.
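To make that dependency mechanism a bit more concrete, here's a purely
hypothetical sketch; the hook name and signature below are invented for
illustration and are not the extension's actual interface:

// Invented hook name and parameters, for illustration only.
$wgHooks['InlineEditorPartialParseDependency'][] = 'efSketchCiteDependency';

// If the edited fragment contains <ref> tags, the partial parse can't stand
// on its own, so request a full page re-render (the current fallback for Cite).
function efSketchCiteDependency( $text, &$needsFullParse ) {
	if ( strpos( $text, '<ref' ) !== false ) {
		$needsFullParse = true;
	}
	return true;
}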
Another advantage of using this tree structure is that nested markings are now possible. There are now basically two (or perhaps more) ways of defining editing options. One is to differentiate based on functionality, like in the initial demo: Text, Media, Templates, etc. Another is to differentiate based on block size (which uses nested markings): Sentences, Paragraphs, Sections, Full Text. Personally I think the second option will be better if the goal of the interface is to educate new users to become gradually more accustomed to wikitext.
Anyway, further research should investigate which interface is best. I'll be doing some usability research myself in the next few months, and the Wikimedia Foundation will be doing further usability research next year.
If you'd like to play around with the editor, these are the lines you can add to LocalSettings.php to get started:
require_once( "$IP/extensions/InlineEditor/InlineEditorFunctional.php" ); // functional approach
*or*
require_once( "$IP/extensions/InlineEditor/InlineEditorBlocks.php" ); // block size approach
Feedback is welcome! Thanks for your time.
Regards,
Jan Paul
[1] http://www.mediawiki.org/wiki/Special:Code/MediaWiki/75344