There are four issues standing between the Polish Wikipedia and the phase 3
software.
The first is automatic conversion of &#codes;, which requires just a small
patch that I have already sent.
The second is support for H1 (don't say that it is reserved for page
titles; CSS already treats H1 and H1.pagetitle differently).
The other two are MySQL issues:
* MySQL doesn't search UTF-8 text correctly.
* Wikipedia should be mirrorable, and MySQL database dumps are not a
really convenient way to do that.
AFAIR MySQL 4.1 is supposed to fix the first, and there is already a
patch that fixes it, so could you look into that?
Mirroring by downloading dumps is very inconvenient; making nightly
patches of the dump file available is the bare minimum. In the longer
term, though, a better solution should be developed.
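A rough sketch of the nightly-patch idea, written in Python with the standard `difflib` module: compare yesterday's text dump with today's and publish only the unified diff, in the same format that tools such as `patch` understand. The file names are hypothetical, and `difflib` holds both inputs in memory, so this only illustrates the format; a dump the size of Wikipedia's would need a more scalable diff.

```python
import difflib
from typing import List


def nightly_patch(old_dump: List[str], new_dump: List[str]) -> List[str]:
    """Return unified-diff lines that turn yesterday's dump into today's."""
    return list(
        difflib.unified_diff(
            old_dump,
            new_dump,
            fromfile="dump-yesterday.sql",  # hypothetical file names
            tofile="dump-today.sql",
            lineterm="",  # the input lines carry no trailing newlines
        )
    )


yesterday = ["row A", "row B", "row C"]
today = ["row A", "row B (edited)", "row C", "row D"]
print("\n".join(nightly_patch(yesterday, today)))
```

Only the changed rows and a little context travel over the wire, which is the point of offering patches instead of full dumps.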
Anyway, I'm for setting up the final installation as soon as the
&#code; and H1 issues are fixed. The MySQL issues may be hard to fix,
and nothing really critical would happen if they were fixed a few
weeks later.
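For the first issue, the conversion of &#codes; can be sketched as follows. This is an illustration in Python, not the PHP patch mentioned above: it replaces decimal (`&#322;`) and hexadecimal (`&#x142;`) numeric character references with the characters they denote, so the text can be stored as plain UTF-8. The function name `decode_char_refs` is hypothetical, and named entities such as `&amp;` are deliberately left alone.

```python
import re

# Decimal (&#322;) or hexadecimal (&#x142; / &#X142;) numeric character references.
_CHAR_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def decode_char_refs(text: str) -> str:
    """Replace numeric character references with the characters they denote.

    References that do not name a valid Unicode scalar value are left as-is.
    """

    def replace(match) -> str:
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if 0 < code <= 0x10FFFF and not 0xD800 <= code <= 0xDFFF:
            return chr(code)
        return match.group(0)

    return _CHAR_REF.sub(replace, text)


# Polish letters entered as references become ordinary characters, which
# can then be stored as UTF-8 with .encode("utf-8").
print(decode_char_refs("Wikipedia &#322;&#x105;czy"))  # Wikipedia łączy
```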
Hi,
attached is a patch for the navigation sidebar. It changes the link
names as described in my previous message, and it fixes two problems:
- The "Protected page" text was shown twice
- The Watchlist link was shown for users who were not logged in,
although only logged-in users can use it.
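The watchlist fix amounts to making that one link depend on the user's login state. A minimal Python sketch of the idea (the actual patch is against the PHP skin code, which is not shown here; `sidebar_links` and its parameter are hypothetical):

```python
from typing import List


def sidebar_links(logged_in: bool) -> List[str]:
    """Return the first sidebar section (hypothetical helper, not the skin API)."""
    links = ["Main Page", "Recent changes"]
    if logged_in:
        # Only logged-in users have a watchlist, so don't offer the link
        # to anonymous visitors.
        links.append("My watchlist")
    links += ["Random page", "Current events"]
    return links


print(sidebar_links(logged_in=False))
print(sidebar_links(logged_in=True))
```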
Regards,
Erik
--
FOKUS - Fraunhofer Institute for Open Communication Systems
Project BerliOS - http://www.berlios.de
(cc'ing wikitech-l)
Jonathan Walther wrote:
> Where are the templates stored that are used in generating the HTML
> pages? For instance, the templates for the header and footer for each
> article; the format of the recent changes page; that sort of thing. Is
> it hard-coded into the source?
Yes. It's kind of ugly that way. ;)
Language-specific text is stored away in arrays in the Language**.php
files, but those contain relatively little markup.
See Skin.php and its fellows for most of the layout; also OutputPage.php
for some bits.
(And feel free to use the wikitech-l list for technical discussion of
the code; wikipedia-l is fairly high traffic as it is.)
Jonathan Walther wrote:
> Hi. Question about Wiki version control. Am I correct in believing
> that every revision of an article is stored in the database, in full?
Yes. Old revisions sit in the 'old' table, every blessed one. It's a big
table. (In theory this could be made more efficient in various ways;
compression, diffs, etc.)
> Also, looking through the php source, I'm seeing what look like a lot
> of MySQLisms that are hard to clean up, but if fixed could mean
> tremendous speedups with Postgres. That's entirely apart from the
> benefit of running the VACUUM program every night so the database
> optimizes itself for the data access patterns that it sees.
Mmmm, please do!
> I would like to compliment the coders on a really clean codebase. The
> code is a pleasure to read and tweak.
Send all compliments on the current codebase to Lee Daniel Crocker. He
da man!
> Not nice to do major changes on, but I doubt if that was ever
> intended for the code anyway. Postgres support isn't a major change,
> btw.
> There is one minor point; it would be a very nice thing to have the SQL
> abstracted out into its own .sql file. I refer to things like
> buildTables.php and the like. Code and SQL don't mix too well; it makes
> it harder to hunt down bugs or make modifications in either one. For
> instance, getting rid of MySQLisms...
Yes, it might not be a bad idea to break out the queries that way, so as
much as possible you can just drop in an alternate file or two and run
with a different database backend.
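One way to break the queries out, sketched in Python with hypothetical names: keep each backend's SQL in its own catalog of named queries (in practice, separate files) and have the code look queries up by name. The example pairs a familiar MySQLism, `LIMIT offset,count`, with the equivalent `LIMIT count OFFSET offset` form that PostgreSQL accepts; the `recentchanges` query itself is illustrative only.

```python
from typing import Dict

# Hypothetical per-backend query catalogs. In practice each backend's
# queries could live in their own file (e.g. mysql.sql, postgres.sql)
# and be loaded at startup; only one query is shown here.
QUERIES: Dict[str, Dict[str, str]] = {
    "mysql": {
        # MySQLism: LIMIT <offset>,<count>
        "recent_changes": (
            "SELECT * FROM recentchanges ORDER BY rc_timestamp DESC LIMIT 100,50"
        ),
    },
    "postgres": {
        # PostgreSQL form: LIMIT <count> OFFSET <offset>
        "recent_changes": (
            "SELECT * FROM recentchanges ORDER BY rc_timestamp DESC LIMIT 50 OFFSET 100"
        ),
    },
}


def get_query(backend: str, name: str) -> str:
    """Look up a named query for the configured backend (KeyError if missing)."""
    return QUERIES[backend][name]


# Switching backends changes only the catalog, not the calling code.
print(get_query("postgres", "recent_changes"))
```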
-- brion vibber (brion @ pobox.com)
(moving to the wikitech-l list; see sign-up and archive page at
http://www.wikipedia.org/mailman/listinfo/wikitech-l )
Jonathan Walther wrote:
> I've done some work at converting the Wikipedia to Postgres, but am not
> there yet. So, let's put that aside for now.
Great! I did get postgresql installed on my machine, but got bogged down
in details of converting the table definitions and various interface
behaviors. Someone with prior experience working with postgres would be
a big help there.
> It seems that the wiki "source" is "interpreted" into html every single
> time someone accesses a link. That seems like a lot of overhead.
> Given that for every change made to a page's wiki source several
> people "view" it, why not just regenerate the html only when
> changes are made, and store it? It would take more storage space, but
> should be MUCH faster. And if storage is an issue, I can donate some
> hard drives...
We used to cache rendered pages back in the phase II days on the old
server. This was removed for two reasons:
1) Wiki->HTML rendering is still pretty darn fast, particularly with our
new dedicated server; database contention seems to be our main problem
during high-load periods.
2) We had problems keeping the cache consistent with the old code.
On number 2, I would certainly welcome an improved cache subsystem
that's designed right from the ground up. The old one was hacked in as a
"crap! the system's unusably slow, let's hack in some improved code" fix.
On number 1, note that LinkCache::addLink() does a brief query on the
cur table for every link when rendering a wikipage. These could probably
be consolidated somehow or other. (Note that this does not apply to
Recentchanges, which loads everything in a big chunk.)
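To make the consolidation idea concrete: instead of one query per link, collect all the link targets on a page and check them with a single `IN (...)` query. The Python sketch below uses an in-memory SQLite table standing in for `cur`, with only an assumed `cur_title` column; `existing_titles` is a hypothetical helper, not part of LinkCache. Some databases (SQLite, for one) cap the number of parameters per statement, so a page with a very large number of links would need to be checked in chunks.

```python
import sqlite3
from typing import Iterable, Set


def existing_titles(conn: sqlite3.Connection, titles: Iterable[str]) -> Set[str]:
    """Return the subset of `titles` present in `cur`, using a single query."""
    unique = sorted(set(titles))
    if not unique:
        return set()
    placeholders = ",".join("?" for _ in unique)
    rows = conn.execute(
        f"SELECT cur_title FROM cur WHERE cur_title IN ({placeholders})", unique
    )
    return {title for (title,) in rows}


# Stand-in for the cur table: only the (assumed) cur_title column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cur (cur_title TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO cur VALUES (?)", [("Warsaw",), ("Krakow",)])

page_links = ["Warsaw", "Gdansk", "Krakow", "Warsaw"]
found = existing_titles(conn, page_links)
print(sorted(found))  # ['Krakow', 'Warsaw']
# Links missing from `found` (here "Gdansk") point to pages not yet written.
```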
> The savings on the Recent Changes page alone should work wonders.
On the English Wikipedia, Recentchanges is loaded at default options
about 3000 times per day; the number of edits per day is a similar
figure, and every edit means the page has to change to reflect it.
Caching the rendered display wouldn't seem to save significantly over
rerendering it on each view.
-- brion vibber (brion @ pobox.com)
Hi,
I'd like to suggest two solutions to the problem that blocking dynamic IP
addresses or proxies may affect innocent users.
Solution #1: IP address blocks should expire after n days unless renewed by
someone. That way, instead of forgetting to unblock people, at worst we
forget to re-block them. In my opinion, it's better to fail to punish someone
effectively than to punish someone who's innocent.
Solution #2: We should give blocked users a way to re-gain access to the
site, namely by creating an account. I don't know if this is currently possible,
but it should be. We can block accounts much more easily than IP addresses. So
we could basically say on the block page: "Because IP addresses cannot be
reliably linked to individuals, it may be that you receive this message in error.
In that case, or if you want to change your behavior, please create an
account and sign in, and you can continue to use Wikipedia."
We might still reserve complete IP-and-account bans for those who abuse the
account "backdoor", but this should be the exception, not the rule.
This would make our security softer, and hopefully more effective.
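A small Python sketch of how the two proposals could fit together: every IP block carries an expiry time and lapses unless renewed, a normal ("soft") IP block stops only anonymous users so that logging in is the way back, and a "hard" block stands in for the complete IP-and-account ban. All names (`IpBlock`, `BlockList`, `can_edit`) and the 7-day default are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Optional, Set


@dataclass
class IpBlock:
    expires: datetime
    hard: bool = False  # a "complete" ban that also stops logged-in accounts


@dataclass
class BlockList:
    # "n days": the 7-day default is an arbitrary illustrative choice.
    duration: timedelta = timedelta(days=7)
    ip_blocks: Dict[str, IpBlock] = field(default_factory=dict)
    banned_accounts: Set[str] = field(default_factory=set)

    def block_ip(self, ip: str, now: datetime, hard: bool = False) -> None:
        """Block an IP, or renew an existing block, for `duration` from now."""
        self.ip_blocks[ip] = IpBlock(expires=now + self.duration, hard=hard)

    def active_block(self, ip: str, now: datetime) -> Optional[IpBlock]:
        """Return the block on `ip` if it has not yet expired, else None."""
        block = self.ip_blocks.get(ip)
        if block is not None and now < block.expires:
            return block
        return None

    def can_edit(self, ip: str, account: Optional[str], now: datetime) -> bool:
        if account is not None and account in self.banned_accounts:
            return False
        block = self.active_block(ip, now)
        if block is None:
            return True  # never blocked, or the block has lapsed
        # A soft block stops only anonymous users; logging in is the way back.
        return account is not None and not block.hard


now = datetime(2002, 10, 1)
blocks = BlockList()
blocks.block_ip("10.0.0.5", now)
print(blocks.can_edit("10.0.0.5", None, now))     # False: anonymous on a blocked IP
print(blocks.can_edit("10.0.0.5", "Alice", now))  # True: logged in
print(blocks.can_edit("10.0.0.5", None, now + timedelta(days=8)))  # True: lapsed
```

Making expiry the default means a forgotten block fails open, which is exactly the trade-off argued for above.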
Regards,
Erik
I'm not 100% happy with the sidebar:
Main page
Recent changes
Watch list
Current events
--------------------
Edit this page
Watch this page
Move this page
Talk page | Subject page
History
What links here
Watch links
--------------------
Upload
Bug reports
Special pages
The links are perfectly OK, but I have problems with the words used and
sometimes with the positioning. I suggest the following sidebar:
Main Page
Recent changes
My watchlist
Random page
Current events
--------------------
Edit this page
Watch this page
Move this page
Discuss this page | View article
Older versions
What links here?
Link history
--------------------
Upload file
Special pages
Bug reports
Explanations:
- "Watch list" should be "My watchlist" to make clear that this is not a
page that is the same for all users, like the other links in this
section of the sidebar.
- "Talk page" should be "Discuss this page" to use the same imperative
style as the other links above it. Since users don't have to create
"/Talk" links anymore, it's not necessary that we use the actual word
"Talk" anywhere but in the URLs.
- "Subject page" should be "View article" or "Back to article". "Subject
page" is really ambiguous and hard to understand; I searched several
times for a way back to the article because I didn't find an obvious
link.
- "History" should be "Older versions" or "Page history" to be more
obvious. Most people are not familiar with the concept of article
histories.
- "What links here" needs a question mark.
- "Watch links" should be "Link history", "Related changes" or something
else. "Watch links" suggests that this will add the links to my
watchlist, which it doesn't do.
- Upload should be "Upload file". True, a bit redundant, but more
familiar this way.
- Bug reports should be at the bottom, since this is the least relevant link.
Your thoughts? I'll be glad to patch this, although it should be easy
enough for someone with CVS access to do it on their own.
Regards,
Erik
Hi,
since the site stats are conveniently stored in the site_stats table, I
suggest subtracting the number of articles created by the Ram-Man bot
(US Census city information) from the total number of articles.
Why? The NOA (number of articles) is primarily interesting as a measure
of our collaborative progress. This is important for ourselves and for
others. Personally, I've had several discussions about Wikipedia where I
was reluctant to cite the NOA because of the high number of
machine-generated articles; others probably feel the same.
I therefore believe we should generally exclude autogenerated articles
(we can change the wording on Main_Page to reflect this). As it would be
a 5-minute task for anyone with access to the db, is there any reason
not to do it?
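A Python sketch of the adjustment, using an in-memory SQLite database. The `ss_good_articles` column, the `article_creator` table recording who created each article, and the bot name `ExampleBot` are all assumptions for illustration; the real schema may record article creators differently.

```python
import sqlite3
from typing import Sequence

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified site_stats; the column names here are assumptions.
    CREATE TABLE site_stats (ss_row_id INTEGER PRIMARY KEY, ss_good_articles INTEGER);
    -- Hypothetical table recording which user created each article.
    CREATE TABLE article_creator (title TEXT PRIMARY KEY, creator TEXT);
    INSERT INTO site_stats VALUES (1, 5);
    INSERT INTO article_creator VALUES
        ('Warsaw', 'Alice'),
        ('Krakow', 'Bob'),
        ('Census town 1', 'ExampleBot'),
        ('Census town 2', 'ExampleBot'),
        ('Census town 3', 'ExampleBot');
    """
)


def human_article_count(conn: sqlite3.Connection, bot_names: Sequence[str]) -> int:
    """Article count from site_stats minus articles created by the given bots."""
    (total,) = conn.execute(
        "SELECT ss_good_articles FROM site_stats WHERE ss_row_id = 1"
    ).fetchone()
    if not bot_names:
        return total
    placeholders = ",".join("?" for _ in bot_names)
    (bot_total,) = conn.execute(
        f"SELECT COUNT(*) FROM article_creator WHERE creator IN ({placeholders})",
        list(bot_names),
    ).fetchone()
    return total - bot_total


print(human_article_count(conn, ["ExampleBot"]))  # 2
```

This sketch computes the adjusted figure at display time instead of decrementing the stored count once, so the number stays correct if the bot creates more articles later; a one-off update as proposed would work too.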
Regards,
Erik
For about the past hour I've been getting lag, which seems to be
especially pronounced when saving articles.
Otherwise, it's slower than usual but still tolerable.
Saving, however, takes half a dozen minutes.
-- Toby