Magnus Manske wrote:
> Brion, please refrain from quoting half-sentences of mine, which gives
> the impression I said something I didn't.
>
> There is no legal necessity for the "list authors" feature. I never said
> that. I said that it is currently very inconvenient to fulfil the
> demands of the GFDL, because you have to download the entire Wikipedia,
> including history, to correctly publish a single article. What good is
> it to have an open license if taking advantage of that is a PITA? (I'm
> talking non-geeks here, and geeks who have a slow machine/internet
> connection/etc.)
Okay, I quoted the entire paragraph to show that you said exactly what I thought
you said.
Now, I ask again the same thing for the same reasons: please ask Brad Patrick,
the foundation's legal counsel, about this to see if it's actually true.
> If you want to contribute something that is actually helpful, please run
> the one little SQL query I asked you to run, and tell us how many
> seconds it takes. Maybe run it again ignoring anons. If it takes 2
> seconds, the feature could be activated (for the time being, of course);
> if it takes 30 seconds, activating it on Wikipedia is certainly out of
> the question.
For database performance issues I leave this in Domas's hands, since he'll just
turn it off if he thinks it's a problem anyway.
A quick test in isolation on lomaria showed 'George W. Bush' taking about a
second and 'Wikipedia:Sandbox' about 4 seconds. As frequently hit pages they may
be better in cache already; Sandbox results were much faster after that first
hit. Random access could lead to cache churn; check with Domas.
EXPLAIN shows use of a temporary table, not necessarily a good sign but not too
bad if it fits in memory:
mysql> EXPLAIN SELECT DISTINCT rev_user,rev_user_text FROM revision WHERE rev_page=3414021;
+----------+-------+------------------------+---------+---------+------+-------+------------------------------+
| table    | type  | possible_keys          | key     | key_len | ref  | rows  | Extra                        |
+----------+-------+------------------------+---------+---------+------+-------+------------------------------+
| revision | range | PRIMARY,page_timestamp | PRIMARY |       4 | NULL | 42114 | Using where; Using temporary |
+----------+-------+------------------------+---------+---------+------+-------+------------------------------+
1 row in set (0.02 sec)
More worrying than the time it takes is the amount of data it churns out: 8965
rows for George W. Bush, 21528 rows for Wikipedia:Sandbox. That's only going to
get longer as time goes on, and it's unsustainable in the long term. (That's
possibly why the GFDL explicitly *doesn't* require a list of every contributor.)
It's still a lot with accounts only: 2664 rows for GWB and 6569 for the sandbox.
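For reference, the accounts-only numbers come from adding a filter on rev_user, since in the MediaWiki schema anonymous edits are stored with rev_user = 0 and the IP in rev_user_text. A minimal sketch of the two query variants against a toy in-memory copy of the revision table (the column names match the query above; the page ID and sample rows are invented for illustration):

```python
# Sketch: distinct-contributor queries against a toy "revision" table.
# Columns mirror those used in the EXPLAIN above (rev_page, rev_user,
# rev_user_text); rev_user = 0 marks an anonymous edit. Data is invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE revision (
    rev_page INTEGER,
    rev_user INTEGER,
    rev_user_text TEXT
)""")
conn.executemany(
    "INSERT INTO revision VALUES (?, ?, ?)",
    [
        (3414021, 5, "ExampleUser"),
        (3414021, 5, "ExampleUser"),  # second edit by the same account
        (3414021, 0, "127.0.0.1"),    # anonymous edit
        (3414021, 0, "10.0.0.2"),     # another anon
        (999,     7, "OtherUser"),    # edit to a different page
    ],
)

# All contributors to the page, anons included:
everyone = conn.execute(
    "SELECT DISTINCT rev_user, rev_user_text FROM revision WHERE rev_page = ?",
    (3414021,),
).fetchall()

# Accounts only -- the "ignoring anons" variant:
accounts = conn.execute(
    "SELECT DISTINCT rev_user, rev_user_text FROM revision "
    "WHERE rev_page = ? AND rev_user != 0",
    (3414021,),
).fetchall()

print(len(everyone))  # 3 distinct contributors
print(len(accounts))  # 1 registered contributor
```

Note that each distinct anonymous IP counts as a separate "contributor", which is part of why the anon-inclusive lists above balloon the way they do.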
So even if it's fast enough for the moment, I'd much prefer that we had something
that fits clear requirements. If the idea is for every random person grabbing
pages off our site to be able to satisfy the minimal GFDL requirements, I'm not
so sure this fits the bill.
-- brion vibber (brion @ pobox.com)