Hi everyone,
I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
and I need to extract the content from it and convert it into LaTeX
syntax for printed documentation. I have googled for a suitable OSS
solution, but nothing obvious turned up.
I would prefer a script written in Python, but any recommendations
would be very welcome.
Do you know of anything suitable?
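To give an idea of what I'm after, here is a rough sketch of my current
fallback plan in Python (the wikitext-to-LaTeX rules shown are only a
tiny illustrative subset, and the page title is just an example):

import re
import urllib.parse
import urllib.request

# adjust the path if index.php lives elsewhere on your install
WIKI = "http://server.bluewatersys.com/w90n740/index.php"

def fetch_wikitext(title):
    # action=raw returns the raw wikitext of a page
    url = "%s?%s" % (WIKI, urllib.parse.urlencode(
        {"title": title, "action": "raw"}))
    with urllib.request.urlopen(url) as f:
        return f.read().decode("utf-8")

def wikitext_to_latex(text):
    # order matters: ''' (bold) must be handled before '' (italics)
    text = re.sub(r"^== *([^=]+?) *==$", r"\\section{\1}", text, flags=re.M)
    text = re.sub(r"'''(.*?)'''", r"\\textbf{\1}", text)
    text = re.sub(r"''(.*?)''", r"\\emph{\1}", text)
    return text

print(wikitext_to_latex(fetch_wikitext("Main Page")))

Obviously a real converter would need to handle templates, tables,
links and so on, which is why I'm hoping something already exists.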
Kind Regards,
Hugo Vincent,
Bluewater Systems.
I've been putting placeholder images on a lot of articles on en:wp,
e.g. [[Image:Replace this image male.svg]], which goes to
[[Wikipedia:Fromowner]], which asks people to upload an image if they
own one.
I know it's inspired people to add free content images to articles in
several cases. What I'm interested in is numbers. So what I'd need is
a list of edits where one of the SVGs that redirects to
[[Wikipedia:Fromowner]] is replaced with an image. (Checking which of
those are actually free images can come next.)
Is there a tolerably easy way to get this info from a dump? Any
Wikipedia statistics fans who think this'd be easy?
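To make it concrete, I'm imagining something along these lines over a
pages-meta-history dump: flag any revision where a placeholder filename
present in the previous revision has disappeared. (Filename list
abbreviated, there are several of those SVGs, and this doesn't yet
check what replaced them.)

import xml.etree.ElementTree as ET

DUMP = "enwiki-pages-meta-history.xml"  # illustrative filename
NS = "{http://www.mediawiki.org/xml/export-0.3/}"  # varies by dump version
PLACEHOLDERS = ["Image:Replace this image male.svg",
                "Image:Replace this image female.svg"]  # partial list

title, prev_text, when = None, None, None
for event, elem in ET.iterparse(DUMP):
    if elem.tag == NS + "title":
        title, prev_text = elem.text, None  # a new page begins
    elif elem.tag == NS + "timestamp":
        when = elem.text  # revision timestamp, seen before its text
    elif elem.tag == NS + "text":
        text = elem.text or ""
        if prev_text is not None:
            for p in PLACEHOLDERS:
                if p in prev_text and p not in text:
                    print("%s  %s" % (when, title))
        prev_text = text
        elem.clear()  # keep memory bounded on a big dump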
(If the placeholders do work, then it'd also be useful convincing some
wikiprojects to encourage the things. Not that there's ownership of
articles on en:wp, of *course* ...)
- d.
Hi,
I have just seen two independent reports, in the last +/- 12 hours,
from people whose posts to foundation-l never made it to the list. The
emails do not show up in the moderation queue either (nor are these
two subscribers, or the entire list, moderated).
Are there any technical problems with the mail(inglist) server that
you are aware of?
Michael
--
Michael Bimmler
mbimmler(a)gmail.com
Hi all,
We now have English Wikipedia fully migrated to the new servers and new
search backend. We cannot fully migrate other wikis until we resolve some
hardware issues. In the meantime, here is an overview of the new features
now deployed on en.wiki:
1) Did you mean... - we now have search suggestions. Care has been taken
to provide suggestions that are context-sensitive, i.e. they work on
phrases, proper names, etc.
2) fuzzy and wildcard queries - a word can be made fuzzy by adding ~ to its
end, e.g. the query sarah~ thompson~ will match different spellings of, and
names similar to, sarah thompson. Wildcards can now be used as prefix or
suffix, e.g. *stan will match various countries in Central Asia.
3) prefix: - using this magic prefix, queries can be limited to pages
beginning with a certain prefix. E.g.
mwsuggest prefix:Wikipedia:Village Pump
will search all village pumps and archives for mwsuggest. This should be
especially useful for archive searching in concert with inputbox or
searchbox
4) intitle: - using this magic prefix, queries can be limited to titles only
5) generally improved quality of search results, via the use of related
articles (based on co-occurrence of links), anchor text, text abstracts,
proximity within articles, sections, redirects, improved stemming and such
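For anyone who wants to script against this, the new syntax should go
straight through the ordinary search API as well. A quick illustrative
sketch (untested):

import json
import urllib.parse
import urllib.request

API = "http://en.wikipedia.org/w/api.php"

def search(query, limit=10):
    # standard api.php search module; the query string is passed as-is
    params = urllib.parse.urlencode({
        "action": "query", "list": "search",
        "srsearch": query, "srlimit": limit, "format": "json"})
    req = urllib.request.Request("%s?%s" % (API, params),
                                 headers={"User-Agent": "search-demo/0.1"})
    with urllib.request.urlopen(req) as f:
        return [hit["title"] for hit in json.load(f)["query"]["search"]]

print(search("sarah~ thompson~"))                         # fuzzy terms
print(search("*stan"))                                    # suffix wildcard
print(search("mwsuggest prefix:Wikipedia:Village pump"))  # prefix: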
Cheers, Robert
hello,
i have written a new extension to embed music scores in MediaWiki pages:
https://secure.wikimedia.org/wikipedia/mediawiki/wiki/Extension:ABC
unlike the Lilypond extension, this uses a simple input language (ABC) that is
much easier to validate for security. ABC is mostly used to transcribe Irish
trad and other simple tunes, but it recently gained support for more advanced
features, e.g. multiple staves and lyrics. this is supported in the extension
using the 'abcm2ps' tool.
unlike the existing ABC extension (AbcMusic), it doesn't support opening
arbitrary files as ABC input (which is a potential security issue), and has
several additional features:
- The original ABC can be downloaded easily
- The score can be downloaded as PDF, PostScript, MIDI or Ogg Vorbis
- A media player can be embedded in the page to play the media file
i believe the ABC format is suitable for transcribing the majority of scores
currently on Wikimedia projects. although it can't handle all of them, it is
better than the current situation. plus, as ABC is simple, and existing ABC
scores are easily available, it's easier for novice users to contribute.
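for those who haven't seen ABC: the input is plain text, and rendering
is a single call to abcm2ps. roughly what the extension does behind the
scenes, as a simplified sketch (the real code also validates the input,
which is the point of using ABC):

import os
import subprocess
import tempfile

TUNE = """X:1
T:Example Jig
M:6/8
L:1/8
K:D
|: d2e f2g | a2g f2d | e2f g2a | b2a g2e :|
"""

# write the ABC source to a temp file, then render it to PostScript
with tempfile.NamedTemporaryFile("w", suffix=".abc", delete=False) as f:
    f.write(TUNE)
    src = f.name
subprocess.check_call(["abcm2ps", "-O", "score.ps", src])
os.unlink(src)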
i would be interested to hear people's thoughts on enabling this extension on
Wikimedia.
- river.
Can someone explain why Wikimedia Commons accepts uploads of
printable PDF documents (e.g. brochures) but not the editable
source version in Open Document Format (e.g. .ODT)? This seems to
violate the open source principle.
This should be an FAQ, but it isn't obvious from
http://commons.wikimedia.org/wiki/Commons:File_types
--
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
Special:Version displays SVN version numbers for extensions out of
$wgExtensionCredits, which seems to be done with $LastChangedRevision$
keywords in the extension's entry point file.
This produces massively incorrect numbers in many cases, since the entry
point file is relatively rarely changed in non-trivial extensions
consisting of multiple files. Updates to the body, class, i18n, and
other files are not reflected.
If we're running on a SVN checkout of the extension, we could check the
directory for its current revision much as we do for MediaWiki itself;
this would tell us for instance if an extension's subdirectory has been
updated separately from the core MediaWiki.
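Something like this, sketched in Python for brevity (the real check
would be PHP, like the existing core one):

import re
import subprocess

def directory_revision(path):
    # ask the svn client what revision the working copy is at;
    # returns None if this isn't a checkout at all
    try:
        out = subprocess.check_output(["svn", "info", path], text=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    m = re.search(r"^Revision: (\d+)$", out, re.M)
    return int(m.group(1)) if m else None

print(directory_revision("extensions/ParserFunctions"))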
But if we aren't on a SVN checkout, or if individual files have been
updated to different versions, this may or may not tell us anything useful.
Anybody have a suggestion on how to best handle this?
-- brion
On thistle with DB=dewiki:
mysql> explain select * from recentchanges
left join tag_summary on ts_rc_id=rc_id
order by rc_timestamp desc limit 50\G
*************************** 1. row ***************************
table: recentchanges
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1179921
Extra: Using temporary; Using filesort
*************************** 2. row ***************************
table: tag_summary
type: ALL
possible_keys: ts_rc_id
key: NULL
key_len: NULL
ref: NULL
rows: 4
Extra:
2 rows in set (0.00 sec)
Whenever you do a join with a limit, MySQL gets the query plan wrong.
It scans the small table and filesorts the large table. You have to
use FORCE INDEX on the small table to suppress the scan. We've seen
this many times. It's very difficult to detect during code review and
frequently crashes the site.
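For the query above, the workaround looks like this (ts_rc_id being the
index listed under possible_keys):

select * from recentchanges
left join tag_summary force index (ts_rc_id) on ts_rc_id=rc_id
order by rc_timestamp desc limit 50;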
Does anyone know a DBMS where joining with limits actually works?
Because I'm sick of this crap.
-- Tim Starling
> On 1/4/09 6:20 AM, yegg at alum.mit.edu wrote:
>> The current enwiki database dump (http://download.wikimedia.org/enwiki/20081008/
>> ) has been crawling along since 10/15/2008.
> The current dump system is not sustainable on very large wikis and
> is being replaced. You'll hear about it when we have the new one in
> place. :)
> -- brion
Following up on this thread: http://lists.wikimedia.org/pipermail/wikitech-l/2009-January/040841.html
Brion,
Can you offer any general timeline estimates (weeks, months, half a
year)? Are there any alternatives for retrieving the article data
beyond directly crawling the site? I know this is verboten, but we are
in dire need of this data and don't know of any alternatives. The
current estimate of end of year is too long for us to wait.
Unfortunately, Wikipedia is a favored source for students to plagiarize
from, which makes out-of-date content a real issue.
Is there any way to help this process along? We can donate disk
drives, developer time, ...? There is another possibility that we
could offer, but I would need to talk with someone at the Wikimedia
Foundation offline. Is there anyone I could contact?
Thanks for any information and/or direction you can give.
Christian