We are trying to extract all URLs in wiki articles from our MediaWiki
installation. We have tried grep, Perl and sed on MySQL dumps, but it
is very difficult to get the URLs only, without some
garbage/text/comments before or after them.
Does anyone know of a better way to achieve this?
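If the live database is reachable, a less fragile option than parsing dumps may
be the externallinks table that MediaWiki maintains for every saved page. A
minimal sketch, assuming a database named wikidb, a user wikiuser and no table
prefix (all of these will differ per installation):
$ # el_to holds one row per external URL recorded in the wiki
$ mysql -u wikiuser -p -B -N -e 'SELECT DISTINCT el_to FROM externallinks' wikidb > urls.txt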
The "usermerge" procedure is presently irreversible. Are there plans
for a way to make it reversible in case of error? I'm assuming error
(and a great deal of tedious mucking about to go back) is inevitable,
since the action is taken by humans.
After a few weeks of bug fixes, we've caught up with MediaWiki
development code review and I'm pushing out an update to the live sites.
This fixes a lot of little bugs, and hopefully doesn't introduce
too many new ones. :)
* Change logs: http://ur1.ca/2rah (r47458 to r48811)
As usual, in addition to lots of offline and individual testing among our
staff and volunteer developers, we've done a shakedown on
http://test.wikipedia.org/ -- and as usual we can fully expect a few
more issues to have cropped up that weren't already found.
Don't be alarmed if you do find a problem; just let us know at
http://bugzilla.wikimedia.org/ or on the tech IRC channels
(#wikimedia-tech on Freenode).
We should be resuming our weekly update schedule soon -- I won't be
doing a mega-crosspost like this every week! -- and will continue to
improve our pre-update staging and shakedown testing to keep disruption
to a minimum and awesome improvements to a maximum.
I'd also like to announce that we've started a blog for Wikimedia tech
activity & MediaWiki development, in part because I want to make sure
community members can easily follow what we're working on and give
feedback before we push things out:
I'd very much like to make sure that we've got regular contacts among
the various project communities who can help coordinate with us on
features, bugs, and general thoughts which might affect some projects
distinctly from others.
-- brion vibber (brion @ wikimedia.org)
CTO, Wikimedia Foundation
Nice to have all the world's alphabets at one's fingertips right there
on the (www.mediawiki.org) edit page. Wow, IPA, Pīnyīn, they're all
there. I'm saving it for use even when I'm not editing MediaWiki.
(However, as it bloats the edit page up to 700%:
$ # compare the page's byte count with and without the lines carrying onclick (presumably the character-insertion links)
$ wget -O x.html 'http://www.mediawiki.org/w/index.php?title=WHATEVER&action=edit'
$ perl -nwle 'print unless /onclick/' x.html | wc -c - x.html | sed '$d'
there should also be some way for users to opt out on pages like these,
perhaps via Preferences?)
Aryeh Gregor wrote:
> On Tue, Mar 3, 2009 at 7:42 AM, Hay (Husky) <huskyr at gmail.com> wrote:
> > I don't know if making such an infobox that does not support IE6 and
> > IE7 is a good idea.
I've now added code to make IE6/IE7 degrade gracefully:
> It doesn't even support Firefox 2 . . . inline-block wasn't
> implemented in Gecko until 1.9 (Firefox 3).
Correct. The purpose of the case study was not to make all things work
for all browsers, but rather to see how features supported by all
modern browsers (in particular CSS 2.1) can improve Wikipedia. Whether
FF2 should still be on the list of supported browsers is not for me to
decide. At some point, however, it will be dropped from the list and
it is worth preparing for that moment.
> Also: "It should be fairly easy to do so, as the HTML code is
> generated by templates." Has he *looked* at the templates? :)
Yes, and they are quite scary :)
That's why I chose to write my ideal HTML code by hand instead of
trying to change the templates. What the sentence tries to say,
however, is that not all articles have to be changed, "just" the templates.
> The major reason why inline style is used on Wikipedia is, of
> course, because ordinary editors don't have the ability to use
> stylesheets. And while admins do, they can only effectively add
> markup to *all* pages at once, regardless of whether they contain
> the exact infobox in question. An awful lot of the provided CSS is
> nation-box-specific, and so useless in 99.99% of Wikipedia's
> articles. (Literally: there are about 2.7 million articles, and I'm
> pretty sure there are less than 270 recognized nations.) But all
> that CSS would have to be served with all of them.
It could be done in one style sheet. And that style sheet wouldn't
have to be very complex, if one can agree on a set of class names
(i.e., a micro-format) which is inter-national.
Håkon Wium Lie CTO °þe®ª
I have installed MediaWiki 1.14.0 http://www.mediawiki.org/wiki/Download
and am trying to get the Cite Extension
http://www.mediawiki.org/wiki/Extension:Cite version 1.14.0 to work.
When accessing the Main_Page I get the error:
Fatal error: Call to undefined method
/var/www/wiki2/extensions/Cite/Cite_body.php on line 699
I can however get the 1.13.0 version of the Cite Extension to work.
Is this the correct place to be asking for help on the Cite extension,
or is there some other mailing list or newsgroup for this?
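For what it's worth, errors like this often come from an extension snapshot
that doesn't match the core branch. Purely as a hedged illustration, assuming
the Subversion layout in use at the time (the branch path is an assumption),
the Cite code matching a 1.14 install could be fetched with:
$ # illustrative only: check out Cite from the REL1_14 branch into the extensions directory
$ svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_14/extensions/Cite/ extensions/Cite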
Thanks Joshua. I would prefer that you post to the Mailing List / Newsgroup – so that all can benefit from your ideas.
--- On Sun, 8 Mar 2009, Joshua C. Lerner <jlerner(a)gmail.com> wrote:
> From: Joshua C. Lerner <jlerner(a)gmail.com>
> Subject: Re: [Wikitech-l] Importing Wikipedia XML Dumps into MediaWiki
> Just for kicks I decided to try to do an import of ptwiki - using
> what I learned in bringing up mirrors of 4 Greek and 3 English sites,
> including Greek Wikipedia. Basically I had the best luck with Xml2sql
> (http://meta.wikimedia.org/wiki/Xml2sql). The conversion from XML to
> SQL went smoothly:
> # ./xml2sql /mnt/pt/ptwiki-20090128-pages-articles.xml
> As did the import:
> # mysqlimport -u root -p --local pt
> Enter password:
> pt.page: Records: 1044220  Deleted: 0  Skipped: 0  Warnings: 0
> pt.revision: Records: 1044220  Deleted: 0  Skipped: 0  Warnings: 3
> pt.text: Records: 1044220  Deleted: 0  Skipped: 0  Warnings: 0
> I'm running maintenance/rebuildall.php at the moment:
> # php rebuildall.php
> ** Rebuilding fulltext search index (if you abort this will break
> searching; run this script again to fix):
> Dropping index...
> Rebuilding index fields for 2119470 pages...
> (still running)
> I'll send a note to the list with the results of this experiment.
> Let me know if you need additional information or help. Are you
> trying to set up any mirrors?
Thanks for making this attempt. Let me know if your rebuildall.php has memory issues.
This is really getting confusing for me – there are so many ways that are all guaranteed to work, and do work, while the one that is actually recommended does not seem to.
I would try out your approach too – but it would take time as I only have one computer to spare.
Everybody please join me in welcoming Frédéric Vassard as the latest
addition to the Wikimedia Foundation's tech team.
Fred's been doing Linux and Unix system administration around the SF bay
area for the last few years, and has enough notches on his belt to dare
to step up and tame the beasts that run Wikipedia and her sister projects...
He'll be helping us out with operations, monitoring, and documentation
of our servers, making sure everything's running smoothly and improving
our responses to and anticipation of problems.
Welcome aboard, Fred!
-- brion vibber (brion @ wikimedia.org)
Unfortunately, this comes back to the limitation of the
Http class in returning helpful error messages. There's an
open bug about similarly vague messages in Interwiki
imports. Sadly, the methods in Http only return content
or failure, not a helpful message about why it failed, which
could be a 404, a timeout, etc.
On Mar 22, 2009 9:03 PM, "Brion Vibber" <brion(a)wikimedia.org> wrote:
On 3/21/09 6:18 PM, Soxred93 wrote:
> Received this when trying to upload a non-existent image:
>
> ...
Agreed! :) There's a number of other bits of the UI which I'm sure can
be improved... having people actually use it is a good way to flush the
rest of those out! :D