Hello,
I understand the need for Cite, that's why it is still there :) But...
- We format the Cite references list on every 100th request to the backend,
yet it takes 8.15% of backend response time (thanks to the parser cache;
without it Cite formatting would take 815% of cluster time, though
developers should understand I'm not exactly right with this hyperbole ;-) )
- When parsing articles like one of the most popular today,
[[en:Rod_Blagojevich_corruption_charges]], it takes 20s to produce the
page, and 17s of that is spent on the Cite block, mostly executing {{cite}}.
That makes every editor wait for ages to get a page displayed, and due to
the cache stampede after invalidation it puts considerable stress on the
site (look at the numbers mentioned above).
- This 8% is in real time, which includes waiting for search,
databases, and plain CPU contention, which we end up having today.
CPU-time-wise it is way higher, so it can actually have a 20% CPU time
impact on our application farm. That's at least $100k worth of hardware
(and rising), even if it is new/modern hardware, just for citation formatting.
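For anyone who wants the 815% figure spelled out, here is a quick sketch of the arithmetic. It assumes, as the hyperbole does, that the cost scales linearly with how often the references list gets formatted; the variable names are made up for illustration.

```python
# Figures quoted above; the names are illustrative only.
cite_share_with_cache = 8.15     # % of backend response time spent formatting Cite
format_every_nth_request = 100   # the list is formatted on 1 backend request in 100

# Assumption: without the parser cache every request would format the list,
# and the cost would grow linearly (this is the admitted hyperbole).
cite_share_without_cache = cite_share_with_cache * format_every_nth_request

print(f"{cite_share_without_cache:.0f}% of cluster time")
```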
So, a checklist of what can be done (simple to complex):
[ ] - Simplification of {{cite}}
[ ] - Separate cache for Cite, to avoid reparsing on minor edits
that don't involve citations. I have no idea how much this would win,
but there is a theoretical chance of stripping 1% or so. ;)
[ ] - Offload some templates like {{cite}} to actual PHP extensions
(a can of worms, but, oh well, it can be a standardized process too)
[ ] - Implement a proper scripting engine like Lua for metatemplates
(http://pecl.php.net/package/lua - another can of worms, though yet
again, it can be managed via a trusted set of people, on the top 20 wikis or so).
[ ] - Frustrated operations guy adding something like ( return ""; )
in some random extension, and syncing the live hack. Obviously there
would be some "HAHA YOU THOUGHT I COULDN'T DO THIS" comments in there.
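To make the "separate cache for Cite" item a bit more concrete, here is a minimal sketch of the idea: rendered citations are cached under a hash of their own wikitext, so an edit that leaves the citations alone never re-runs the expensive formatting. This is not how MediaWiki or the Cite extension is implemented; `render_citation`, `cached_citation`, `format_references` and the regex are hypothetical stand-ins, and the regex only handles non-nested templates.

```python
import hashlib
import re

render_calls = 0  # counts how often the "expensive" path actually runs

def render_citation(wikitext: str) -> str:
    """Hypothetical stand-in for the expensive {{cite}} expansion."""
    global render_calls
    render_calls += 1
    return f"<li>{wikitext.strip('{}')}</li>"

_cite_cache: dict[str, str] = {}

def cached_citation(wikitext: str) -> str:
    """Return rendered output, keyed by a hash of the citation's own text."""
    key = hashlib.sha1(wikitext.encode("utf-8")).hexdigest()
    if key not in _cite_cache:
        _cite_cache[key] = render_citation(wikitext)
    return _cite_cache[key]

# Assumption: citations are simple, non-nested {{cite ...}} calls.
CITE_RE = re.compile(r"\{\{cite[^{}]*\}\}")

def format_references(page_text: str) -> list[str]:
    """Format every citation on the page, reusing cached renderings."""
    return [cached_citation(call) for call in CITE_RE.findall(page_text)]

page_v1 = ("Some text.{{cite web|url=http://example.org|title=A}} "
           "More text.{{cite news|title=B}}")
# A minor edit that does not touch any citation.
page_v2 = page_v1.replace("Some text.", "Some corrected text.")

format_references(page_v1)
calls_after_first_parse = render_calls
format_references(page_v2)
calls_after_minor_edit = render_calls

print(calls_after_first_parse, calls_after_minor_edit)
```

Both counters come out the same, since the second parse finds every citation already cached. How much this would save on the real cluster is exactly the open question from the checklist.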
I for one can directly participate in at least two of these options. ;-)
Unfortunately, {{cite}} is the only template I can profile/account for
now; we don't have proper per-template profiling, but I wish to get
it some day. Then we'd have more "war on ..." topics ;-D
Generally, templates are a major part of our parsing, and that's over 50%
of our current cluster CPU load.
As we actually managed to hit 100% last week, something that hasn't
happened for a while, some work has to be done here.
Of course, new hardware will help for a while, but I for one get huge
personal satisfaction from saving donation money. ;-)
CHEERS!
--
Domas Mituzas -- http://dammit.lt/ -- [[user:midom]]
Hello, I have a couple of MediaWiki installations on two different slices at
Slicehost. Both slices also run other websites with no speed
problems, but the MediaWiki installations themselves run like dogs!
http://wiki.medicalstudentblog.co.uk/ Any ideas what to look for or ways to
optimise them? I still can't get over that they need a 100 MB ini_set in the
settings just to load, due to the messages or something.
Thank you, Dawson
Ok, things are finally starting to normalize as far as getting away from
fundraiser craziness, preparing regular releases, and generally getting
on with making things better for users!
I've enabled the Drafts extension for testing on
http://test.wikipedia.org -- this little cutie was new staff dev Trevor
Parscal's first assignment here, but deployment got pushed back when we
went full-steam on fundraiser banner stuff.
I've written up a quickie blog post with some purty screen shots:
http://leuksman.com/log/2009/01/16/drafts-extension-enabled-on-test-wikiped…
Suggestions for improvements to the UI and workflow are always welcome!
-- brion
On Wed, 03 Dec 2008 16:48:39 +0100, Roan Kattouw <roan.kattouw(a)home.nl> wrote:
>
> We had a pretty lengthy discussion about this before the summer, and the
> consensus seemed to be that a fulltext-based approach looked most
> viable. I actually wrote an extension that does that, and promised to
> release it soon; that was quite a few months ago, and I never got around
> to it. I'll release it properly when I have time, which will hopefully
> be before Christmas :D
>
> The code needs some tweaking and refactoring, though. It's pretty
> tightly integrated with the article text search (both functions in one
> form) and has all kinds of weird features, because the guy who paid me
> to write it wanted them. It also doesn't support three-letter word
> searching (which core does these days, using a prefix hack), which is
> pretty bad since categories with short titles (or stopword titles) won't
> be found either.
>
> Roan Kattouw (Catrope)
>
>
Hey Roan, does your code use a new table for the category search (with a
fulltext index) and do you have the hooks for maintaining that table? Do
you display the results on a new search results page, or did you hack
the existing one? Basically, I'm thinking that even if your stuff isn't
ready for prime time, you may have already done a lot of the heavy
lifting... can we get our hands on it?
Thanks!
Aerik
--
http://eventfeed.org - An Initiative Promoting Syndication of Events
http://www.wikidweb.com - the Wiki Directory of the Web
http://tagthis.info - Hosted Tagging for your website!
On 10.10.2008 at 21:22, Erik Moeller wrote:
> 2008/10/10 Derbeth <derbeth(a)wp.pl>:
>> I wonder about the legal aspects. In my opinion, when you create a
>> ready-to-print version,
>> you have to attach the text of GFDL license to it - directly, not
>> as a link. Like it is done in
>> http://en.wikibooks.org/wiki/Image:LaTeX.pdf.
As Erik wrote: This is already implemented (either a title of an
article or a URL to some license text can be set in
LocalSettings.php), but it's currently not configured.
>> Secondly, the current version of the tool commits plagiarism, because
>> it does not mention
>> image authors and does not provide any means (like making images
>> clickable) to check
>> these authors.
>
> Ouch, thanks for pointing that out. Tricky to do this automatically
> since it's all wiki-text with templates, but we'll investigate a
> solution here.
We'd highly appreciate input from the community regarding this topic!
The printed books from PediaPress contain a list of figures where the
license of each image is listed, together with the URL to the image
description page. As some kind of "hotfix" this solution could be
implemented in the PDF export of the Collection extension, too. But
this doesn't really solve the problem.
We think it's more of a technical/software thing, so I cross-posted
(and set Reply-To) to Wikitech-l.
In our opinion, license management/handling must be a core feature of
MediaWiki, because the software is explicitly developed for the
collaborative distribution of free content. Licenses of the containing
articles and images should not be represented via some agreed-upon
convention but via structured (and machine-readable) information,
available for each relevant object in the wiki.
Some information that would be desired:
- Full (official) name of the license(s).
- Whether the full text of the license has to be included or a
reference is sufficient.
- Reference to the full text of the license(s) (in some rigidly
defined format like wikitext).
- Whether attribution is required. If so: The list of required
attributions.
So, basically all the information that's required to check if it's
possible to take some part of the MediaWiki and use it somewhere else
and all the information that has to be included in that other place.
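As a rough illustration of what such structured information could look like, here is a small sketch. It is not an existing MediaWiki data model or API: `LicenseInfo`, its field names, `reuse_requirements` and all example values (page titles, author names, the second license) are invented for this example.

```python
from dataclasses import dataclass, field

@dataclass
class LicenseInfo:
    """Hypothetical per-object license record mirroring the list above."""
    full_name: str                 # full (official) name of the license
    full_text_required: bool       # must the full text be included, or is a reference enough?
    full_text_ref: str             # reference to the full text, e.g. a wiki page title
    attribution_required: bool
    attributions: list[str] = field(default_factory=list)

def reuse_requirements(objects: dict[str, LicenseInfo]) -> dict[str, list[str]]:
    """Collect what has to be included when the given objects are reused elsewhere."""
    texts: list[str] = []
    credits: list[str] = []
    for title, lic in objects.items():
        if lic.full_text_required and lic.full_text_ref not in texts:
            texts.append(lic.full_text_ref)
        if lic.attribution_required:
            credits.extend(f"{title}: {name}" for name in lic.attributions)
    return {"license_texts": texts, "attributions": credits}

# Invented example data; the flags do not describe the terms of any real license.
objects = {
    "Example article": LicenseInfo(
        "GNU Free Documentation License", True, "GFDL full text",
        True, ["Author A", "Author B"]),
    "Image:Example.png": LicenseInfo(
        "Example attribution license", False, "Example license text",
        True, ["Photographer C"]),
}

requirements = reuse_requirements(objects)
print(requirements["license_texts"])
print(requirements["attributions"])
```

With records like these attached to every article and image, a tool such as the PDF export could build its license appendix and list of figures without parsing templates.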
This information could be made accessible via MediaWiki API, but
ideally it's contained in the wikitext and/or XHTML, too.
All this could be handled via microformats, even inside of templates,
but the main point is that any kind of new technique has to be
enforced, ideally via MediaWiki software itself: In the commons wikis
there are some conventions that can be used in software by people/
companies like us (although we have to work with hacks and
workarounds), but oftentimes, in wikis with smaller communities this
information doesn't even exist at all.
-- Johannes Beigel
As a general note -- I've enabled revision and log suppression for
oversighters on all Wikimedia wikis.
This allows for the edit comment, page text, and username to be hidden
individually, and optionally to choose whether to also hide it from sysops.
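For readers who want a mental model of "hidden individually", one common way to represent this is a small set of bit flags per revision or log entry. The sketch below is only an illustration under that assumption; the names `Hide` and `can_view`, the flag values and the role strings are hypothetical and not taken from the MediaWiki source.

```python
from enum import IntFlag

class Hide(IntFlag):
    """Hypothetical suppression bits; each part can be hidden on its own."""
    TEXT = 1         # page text
    COMMENT = 2      # edit comment
    USER = 4         # username
    FROM_SYSOPS = 8  # optionally hide the suppressed parts from sysops too

def can_view(flags: Hide, part: Hide, role: str) -> bool:
    """Decide whether a viewer with the given role may see one part of an entry."""
    if not flags & part:
        return True                      # this part was never hidden
    if role == "oversighter":
        return True                      # assumption: oversighters always see it
    if role == "sysop":
        return not flags & Hide.FROM_SYSOPS
    return False                         # everyone else sees the entry, not the detail

# The entry stays in the history; only the flagged details are hidden.
flags = Hide.COMMENT | Hide.USER
print(can_view(flags, Hide.TEXT, "anon"))    # text was not hidden
print(can_view(flags, Hide.USER, "anon"))    # username is hidden from regular users
print(can_view(flags, Hide.USER, "sysop"))   # sysops still see it unless FROM_SYSOPS is set
```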
The two big differences from the traditional oversight system are:
* The entry remains in place in the history or log view -- it's not
secretly vanished as though it never existed. The offending details only
are suppressed from view.
* Log entries can be done as well as page edits.
This'll allow the existing local oversighter users to handle log-spam
cases which previously required developer intervention to clean up.
This system can be extended with a second tier so 'regular sysops' can
also suppress individual revisions but can undo each other, but I
haven't enabled it yet. (It's been in place for some time on
http://test.wikipedia.org if you want to try it out.)
-- brion
Is there a conversion script since Oversight is pretty
much obsolete? Speaking of, should it be marked as
such in svn?
-Chad
On Jan 29, 2009 6:17 PM, "Gregory Maxwell" <gmaxwell(a)gmail.com> wrote:
On Thu, Jan 29, 2009 at 5:07 PM, Brion Vibber <brion(a)wikimedia.org> wrote: >
As a general note -- I'...
Any chance that after some burn-in period you can take all the old
oversights and convert them into this new form (with all the hide bits
set— since we can't automatically know what needed hiding)?
If for no other reason, the continued existence of the old system makes
explanations hard: "Oh no, we don't secretly vanish revisions and
misattribute edits— for information we had to remove you can at least
see their timestamps in the history." "What about this?" "Oh er..
hum.. you see, before some date, if they used THIS button.. err.."
Hi
I am interested in contributing to Wikimedia. How can I help? I have worked
on Wikipedia for the last 8 months and have a good understanding of how it works.
--
Ankuj Gupta
Computer Engineering
NSIT,India
If you haven't seen it yet, Ubuntu is running an interesting piece of
brainstorming software called IdeaTorrent to think collectively about
common problems and solutions:
http://brainstorm.ubuntu.com/
The software:
http://www.ideatorrent.org/
I wonder - would people consider it useful to set up something like
brainstorm.wikimedia.org using this software, or would it be too
duplicative of Bugzilla and listservs? The benefit of IdeaTorrent is
that it's very straightforward for non-technical users to contribute
ideas and solutions. And, of course, it could be used for
non-technical problems as well.
--
Erik Möller
Deputy Director, Wikimedia Foundation
Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate