Hi,
I am trying to write an extension for MediaWiki that allows users to influence the keywords that are added as meta tags to the HTML code:
<meta name="keywords" content="Find out which table relates to a SYS LOB Segment" />
I have already figured out that these tags are written in OutputPage.php and are generated by the addKeywords function. I have found the 'OutputPageParserOutput' hook, which allows me to add more keywords (unfortunately it does not give any access to the existing keywords, so they cannot be removed). What I'm not sure of is how to get the keywords the user entered into that function. What I thought of was a special tag that users can use to specify keywords that override the generated keywords...
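To make this more concrete, here is the rough (completely untested) sketch I have in mind; the <metakeywords> tag name, the function names and the mUserKeywords member I stash on the ParserOutput are just placeholders I made up:

$wgExtensionFunctions[] = 'wfMetaKeywordsSetup';
$wgHooks['OutputPageParserOutput'][] = 'wfMetaKeywordsAddToOutput';

function wfMetaKeywordsSetup() {
    global $wgParser;
    # Register a <metakeywords> tag that editors can place in an article.
    $wgParser->setHook( 'metakeywords', 'wfMetaKeywordsRender' );
}

function wfMetaKeywordsRender( $input, $args, &$parser ) {
    # Stash the comma-separated keywords on the ParserOutput so the
    # OutputPageParserOutput hook can pick them up later.
    $parser->mOutput->mUserKeywords = array_map( 'trim', explode( ',', $input ) );
    return '';  # the tag itself renders nothing
}

function wfMetaKeywordsAddToOutput( &$out, $parserOutput ) {
    # Assuming OutputPage offers an addKeyword()-style method to append
    # a single keyword to the list it writes into the <meta> tag.
    if ( isset( $parserOutput->mUserKeywords ) ) {
        foreach ( $parserOutput->mUserKeywords as $keyword ) {
            $out->addKeyword( $keyword );
        }
    }
    return true;
}

An editor would then put something like <metakeywords>SYS LOB, segment, table</metakeywords> into the page text.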
Any ideas or hints where I should start looking?
Roman
I've fixed up the image-based captcha to read from subdirectories, which
should put less load on the file server.
Also I've gone ahead and enabled it for all wikis, partly in response to
concerns that there's some possibly machine-based mass registration
going on which might be malicious.
If there are problems, let us know.
-- brion vibber (brion @ pobox.com / brion @ wikimedia.org)
Dear all,
I thought I'd introduce myself first. I've been using Wikipedia for
about four years, mostly the English one (user houshuang), but also
occasionally the Norwegian (I am Norwegian), the Chinese, and lately
the Indonesian (I currently live in Jakarta). I am also very
interested in Wikipedia from a social and technological perspective.
Lately I've been working a lot on a way to use Wikipedia HTML dump
files offline. I posted about this a while ago; the current version
can be downloaded from http://houshuang.org/blog . I am working on a
version with a few improvements. It's working quite well right now (I
have about 8 small language wikis on my HD, and all the interlanguage
links work, etc.). The idea is that it works without unzipping the
files first: you just place them in the right directory, and given
that you have 7zip and Ruby, it should just work (serving files to
localhost). I will make a better installer, a tiny graphical UI, etc.
later, so that I can put it on a CD with a given language file and it
will just work with one click on Mac, Windows, and Linux.
The problem is that 7zip was never optimized for quickly extracting
one given file out of hundreds of thousands, or millions. Right now,
the Indonesian Wikipedia (60 MB 7zipped) takes about 15 seconds per
page on my two-year-old iBook, whereas the Chinese one (250 MB
7zipped) takes about 150 seconds per page. I haven't dared try any of
the bigger ones, like the German (1.5 GB) or the English (four files
of 1.5 GB each)... My first thought was whether it would be possible
to modify the open-source 7zip code to generate an index of which
block the different files are in, which would then make the actual
extraction a lot faster. The problem is that I suck at C, and I have
been looking for people to help me, even offering a small bounty to
the developer. (If anyone here would help me, that would be MUCH
appreciated! I personally think it would be quite easy, given the
source code that exists, but I don't know for sure.)
The developer himself suggested packing the Wikipedia dump file with
something like this
7z a -mx -ms=32M -m0d=32M archive *
which would split the archive into 32 MB solid blocks, so extracting a
single file only requires decompressing one block instead of the whole
archive, making access much faster. However, I really don't want to
repack all the dump files (I cannot imagine the time it would take to
rezip the 1.5 GB file) and I don't have the capacity to host them - my
intention has always been for my program to work out of the box with
the Wikipedia dump files... So I am writing here, since I don't know
how else to contact the dump file developers... Is there any way they
would consider using these options when making the dump files, or if
not, what are the reasons? (Maybe the files would get slightly bigger,
but I think the benefit would far outweigh the disadvantage!)
Anyway, any help or guidance or ideas would be much appreciated. And
feel free to play around with my program. It's very unfinished (and I
have a better version which I will publish soon), but it's already
quite functional, and has gotten me through several long, boring
meetings in fancy hotels with overpriced wifi :)
Thank you very much, and please let me know if there are other mailing
lists or Wikipedia discussion pages that would be more appropriate for
this question.
Stian in Jakarta
--
Stian Haklev - University of Toronto
http://houshuang.org/blog - Random Stuff that Matters
Hi all,
I'm new to this list, so please bear with me.
I am running a MediaWiki (1.6.9, because I would rather not get into
updating PHP on a server where I am not an admin).
Now this is a wiki which will be looked at by a lot of people in my
company, but only edited by about 15 programmers, who all have an
account on that server.
Is there a way to allow account creation only for users already present
in /etc/passwd?
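To show the kind of thing I am hoping is possible, this is roughly what I imagine in LocalSettings.php (completely untested; I am assuming the AbortNewAccount hook already exists in my version, and wfCheckSystemAccount is just a name I made up):

$wgHooks['AbortNewAccount'][] = 'wfCheckSystemAccount';

function wfCheckSystemAccount( $user, &$message ) {
    # Wiki usernames start with a capital letter, while Unix accounts
    # are usually all lowercase, so compare the lowercased name.
    $name = strtolower( $user->getName() );
    # posix_getpwnam() returns false if the name is not in /etc/passwd
    # (requires PHP's POSIX extension).
    if ( posix_getpwnam( $name ) === false ) {
        $message = 'Accounts can only be created for existing system users.';
        return false;  # abort the account creation
    }
    return true;
}

That would leave account creation enabled but reject any name that does not map to a system account.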
Thanks a lot if anyone has an idea.
Holger
I have open-sourced a simple C-based utility under the GPLv3 which
converts Wikimedia Foundation XML Wikipedia dumps into a format that
allows easy compliance with the GFDL for off-Wikipedia wikis.
This utility allows the XML dumps to be parsed and [[en:<Article
Title>]] or
[http://en.wikipedia.org/wiki/<ArticleTitle>] tags to be inserted into
the dump at the end of articles. Dumps converted with this
tool point back to the source Wikipedia articles and edit history when
imported into MediaWiki.
The tool allows tags which use interwiki_sql links as well as simple
URL addresses. This code is provided to the Wikipedia community and
other consumers of Wikipedia content as an easy way to insert
GFDL-compliant tags into master XML dumps and import them.
The source code, binaries, and makefile are available at:
ftp://www.wikigadugi.org/wiki/xml/gfdl-wikititle.tar.gz
Enjoy.
Jeff
> How about using another algorithm that does this already?
Thanks to those who answered my mail. The thing is that I really,
really want my program to work natively with the dump files from
Wikipedia, both because it would take me days to recompress 1.5 GB
files and because I have no capacity to host them. I am not just
making this program to generate one offline CD for one language, but
for it to be a tool that works with all dump files (right now I have
the dump files of 9 smaller languages in one directory, all the
interwiki links work, etc. - it's just slow)...
Therefore, if the people running the dumps would consider changing the
format to something that is easier to access randomly, I would be all
for it. Indeed, I don't know the pros and cons that made them choose
7zip in the first place. However, I don't even know where to start a
discussion with them, and I imagine that such a decision would take a
very long time to implement. Thus I figured trying to tweak 7zip would
probably be a much faster way. :)
If you have any pointers on getting in touch with the dump people, or
if some of them are hanging around here, I'd love to have a discussion
with them, both about the dump format itself and about some other
technical details of how they prepare the material. (I also tried
pointing out on a talk page somewhere a week ago that the static
download page says the dumps for December are currently in progress,
and the link points to November; however, according to the log all the
dumps for December are done, and you can download them if you type in
the URL manually. That was a week ago, and apparently that talk page
wasn't a good way of getting in touch with them.)
(I would also love to have several HTML dumps - one with only the
article pages. Currently there is only one, which includes all pages,
even the image detail pages.) Let me say, though, that they've done an
awesome job, and there are some really neat decisions in how the
static dumps are made.
Thanks a lot
Stian
An automated run of parserTests.php showed the following failures:
This is MediaWiki version 1.10alpha (r20068).
Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...
18 still FAILING test(s) :(
* URL-encoding in URL functions (single parameter) [Has never passed]
* URL-encoding in URL functions (multiple parameters) [Has never passed]
* TODO: Table security: embedded pipes (http://mail.wikipedia.org/pipermail/wikitech-l/2006-April/034637.html) [Has never passed]
* TODO: Link containing double-single-quotes '' (bug 4598) [Has never passed]
* TODO: message transform: <noinclude> in transcluded template (bug 4926) [Has never passed]
* TODO: message transform: <onlyinclude> in transcluded template (bug 4926) [Has never passed]
* BUG 1887, part 2: A <math> with a thumbnail- math enabled [Has never passed]
* TODO: HTML bullet list, unclosed tags (bug 5497) [Has never passed]
* TODO: HTML ordered list, unclosed tags (bug 5497) [Has never passed]
* TODO: HTML nested bullet list, open tags (bug 5497) [Has never passed]
* TODO: HTML nested ordered list, open tags (bug 5497) [Has never passed]
* TODO: Inline HTML vs wiki block nesting [Has never passed]
* TODO: Mixing markup for italics and bold [Has never passed]
* TODO: 5 quotes, code coverage +1 line [Has never passed]
* TODO: dt/dd/dl test [Has never passed]
* TODO: Images with the "|" character in the comment [Has never passed]
* TODO: Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
* TODO: Parents of subpages, two levels up, with lots of extra trailing slashes. [Has never passed]
Passed 493 of 511 tests (96.48%)... 18 tests failed!
This pops up every so often at www.mediawiki.org, particularly in relation
to the public domain help pages that we are trying to compile.
What is the correct license for screenshots of MediaWiki? Are we able to
distribute them along with the PD help pages (when we get to that stage)?
Currently they are variously tagged as GFDL, PD, (c) WMF and possibly
others.
Are there any considerations that may cause some screenshots to be under one
license and some under another? For example, if the screenshot includes the
MW logo or certain interface text, does that affect how it can be licensed?
What about the different skins?
It would be good to have this question answered officially, once and for
all.
- Mark Clements (HappyDog)