Dear All,
I just noticed that we have multiple page_ids for one page_title in
the page table. For example:
mysql> select page_id, page_title from page where upper(page_title) =
upper('Australia');
+---------+------------+
| page_id | page_title |
+---------+------------+
|  693538 | Australia  |
| 1805288 | Australia  |
| 4689264 | Australia  |
| 8759165 | Australia  |
+---------+------------+
4 rows in set (6.59 sec)
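The same shape can be reproduced on a toy table (a sketch in Python with SQLite; it assumes only the unique (page_namespace, page_title) index from the standard MediaWiki schema, and the namespace numbers below are made up). Selecting page_namespace as well makes the rows easier to interpret:

```python
import sqlite3

# Toy model of the MediaWiki page table. Assumption: only these columns and
# the unique (page_namespace, page_title) index matter here; the ids and
# namespace numbers are made up for illustration.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE page (
        page_id        INTEGER PRIMARY KEY,
        page_namespace INTEGER NOT NULL,
        page_title     TEXT    NOT NULL,
        UNIQUE (page_namespace, page_title)
    )
""")
db.executemany(
    "INSERT INTO page VALUES (?, ?, ?)",
    [
        (693538,  0, "Australia"),   # main (article) namespace
        (1805288, 1, "Australia"),   # a talk namespace
        (4689264, 4, "Australia"),   # a project namespace
    ],
)

# Matching on the title alone returns several page_ids...
ids = [r[0] for r in db.execute(
    "SELECT page_id FROM page "
    "WHERE upper(page_title) = upper('Australia') ORDER BY page_id"
)]
print(ids)  # [693538, 1805288, 4689264]

# ...but selecting page_namespace as well shows each row is a distinct page:
# the title is only unique per namespace, and upper() would also match
# case variants of the title.
with_ns = db.execute(
    "SELECT page_namespace, page_title FROM page "
    "WHERE upper(page_title) = upper('Australia') ORDER BY page_namespace"
).fetchall()
print(with_ns)  # [(0, 'Australia'), (1, 'Australia'), (4, 'Australia')]
```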
Can anyone please tell me how I can interpret this information?
Thanks a lot.
Best Regards,
Fawad.
Hi All,
I downloaded enwiki-latest-pages-articles.xml from
http://download.wikimedia.org/enwiki/latest/ and imported page.txt &
revision.txt in my mysql database.
Now I am trying to extract groups of people who are editing the same
pages. On the revision table I ran the following query, to see the
page_ids of pages that have been edited by more than one person.
mysql> select rev_page, count(rev_page) from revision group by
rev_page having count(rev_page)>1;
Empty set (1 min 19.24 sec)
I got an empty set back. I think this is not possible, as many pages
are edited by multiple people in Wikipedia.
Can you please let me know if I am doing something wrong here?
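To double-check the query shape itself, I tried the same GROUP BY / HAVING pattern on a toy table (a sketch in Python with SQLite, all data made up), and it does return pages with more than one revision, so maybe the import is the problem (could the dump only contain one revision per page?). Note also that count(rev_page) counts revisions, not people; count(DISTINCT rev_user) is closer to "edited by more than one person":

```python
import sqlite3

# Toy model of the revision table (just the columns the query uses).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE revision "
    "(rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_user INTEGER)"
)
db.executemany(
    "INSERT INTO revision VALUES (?, ?, ?)",
    [
        (1, 100, 7),  # page 100, edited twice by two users
        (2, 100, 8),
        (3, 200, 7),  # page 200, edited once
    ],
)

# Pages with more than one revision -- the query from the email.
multi_rev = db.execute(
    "SELECT rev_page, count(rev_page) FROM revision "
    "GROUP BY rev_page HAVING count(rev_page) > 1"
).fetchall()
print(multi_rev)  # [(100, 2)]

# Pages edited by more than one distinct user -- probably closer to
# "groups of people editing the same pages".
multi_user = db.execute(
    "SELECT rev_page, count(DISTINCT rev_user) FROM revision "
    "GROUP BY rev_page HAVING count(DISTINCT rev_user) > 1"
).fetchall()
print(multi_user)  # [(100, 2)]
```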
I would be thankful for any help.
Best Regards,
--
Fawad Nazir
An automated run of parserTests.php showed the following failures:
This is MediaWiki version 1.12alpha (r26849).
Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...
Reading tests from "extensions/LabeledSectionTransclusion/lstParserTests.txt"...
17 still FAILING test(s) :(
* URL-encoding in URL functions (single parameter) [Has never passed]
* URL-encoding in URL functions (multiple parameters) [Has never passed]
* Table security: embedded pipes (http://lists.wikimedia.org/mailman/htdig/wikitech-l/2006-April/022293.html) [Has never passed]
* Link containing double-single-quotes '' (bug 4598) [Has never passed]
* message transform: <noinclude> in transcluded template (bug 4926) [Has never passed]
* message transform: <onlyinclude> in transcluded template (bug 4926) [Has never passed]
* BUG 1887, part 2: A <math> with a thumbnail- math enabled [Has never passed]
* HTML bullet list, unclosed tags (bug 5497) [Has never passed]
* HTML ordered list, unclosed tags (bug 5497) [Has never passed]
* HTML nested bullet list, open tags (bug 5497) [Has never passed]
* HTML nested ordered list, open tags (bug 5497) [Has never passed]
* Inline HTML vs wiki block nesting [Has never passed]
* Mixing markup for italics and bold [Has never passed]
* dt/dd/dl test [Has never passed]
* Images with the "|" character in the comment [Has never passed]
* Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
* Parents of subpages, two levels up, with lots of extra trailing slashes. [Has never passed]
Passed 527 of 544 tests (96.88%)... 17 tests failed!
On 10/18/07, tango(a)svn.wikimedia.org <tango(a)svn.wikimedia.org> wrote:
> Revision: 26830
> Author: tango
> Date: 2007-10-18 19:19:37 +0000 (Thu, 18 Oct 2007)
>
> Log Message:
> -----------
> Show changes to pages linked to via a redirect on Special:Recentchangeslinked
>
> . . .
> rc_new_len,
> rc_deleted
> " . ($uid ? ",wl_user" : "") . "
> - FROM $pagelinks, $recentchanges
> + FROM $pagelinks, $recentchanges, $redirect, $page
> " . ($uid ? " LEFT OUTER JOIN $watchlist ON wl_user={$uid} AND wl_title=rc_title AND wl_namespace=rc_namespace " : "") . "
> WHERE rc_timestamp > '{$cutoff}'
> {$cmq}
> - AND pl_namespace=rc_namespace
> + AND (pl_namespace=rc_namespace
> AND pl_title=rc_title
> - AND pl_from=$id
> + AND pl_from=$id)
> + OR (rd_namespace=rc_namespace
> + AND rd_title=rc_title
> + AND rd_from=page_id
> + AND page_namespace=pl_namespace
> + AND page_title=pl_title
> + AND pl_from=$id)
> $GROUPBY
> ";
> }
At first inspection, this appears to cause the query to be executed as
a Cartesian join (query run on a fairly old dump of the Simple English
Wikipedia on localhost with some random parameters).
mysql> EXPLAIN SELECT * FROM pagelinks, recentchanges, redirect, page
WHERE rc_timestamp > '20071011094400' AND (pl_namespace=rc_namespace
AND pl_title=rc_title) OR (rd_namespace=rc_namespace AND
rd_title=rc_title AND rd_from=page_id AND page_namespace=pl_namespace
AND page_title=pl_title) AND pl_from=1234;
+----+-------------+---------------+-------+------------------------------------------------+-------------+---------+------+---------+--------------------------+
| id | select_type | table         | type  | possible_keys                                  | key         | key_len | ref  | rows    | Extra                    |
+----+-------------+---------------+-------+------------------------------------------------+-------------+---------+------+---------+--------------------------+
|  1 | SIMPLE      | pagelinks     | index | pl_from,pl_namespace                           | pl_from     | 265     | NULL | 1078997 | Using index              |
|  1 | SIMPLE      | recentchanges | ALL   | rc_timestamp,rc_namespace_title,rc_ns_usertext | NULL        | NULL    | NULL |     176 | Using where              |
|  1 | SIMPLE      | page          | ALL   | PRIMARY,name_title                             | NULL        | NULL    | NULL |   43475 | Using where              |
|  1 | SIMPLE      | redirect      | index | PRIMARY,rd_ns_title                            | rd_ns_title | 265     | NULL |    8185 | Using where; Using index |
+----+-------------+---------------+-------+------------------------------------------------+-------------+---------+------+---------+--------------------------+
4 rows in set (0.00 sec)
Note the join types and the row counts. The old code was a simple
ref plus range scan:
mysql> EXPLAIN SELECT * FROM pagelinks, recentchanges WHERE
rc_timestamp > '20071011094400' AND (pl_namespace=rc_namespace AND
pl_title=rc_title) AND pl_from=1234;
+----+-------------+---------------+-------+------------------------------------------------+--------------+---------+-------+------+-------------+
| id | select_type | table         | type  | possible_keys                                  | key          | key_len | ref   | rows | Extra       |
+----+-------------+---------------+-------+------------------------------------------------+--------------+---------+-------+------+-------------+
|  1 | SIMPLE      | pagelinks     | ref   | pl_from,pl_namespace                           | pl_from      | 4       | const |    1 | Using index |
|  1 | SIMPLE      | recentchanges | range | rc_timestamp,rc_namespace_title,rc_ns_usertext | rc_timestamp | 16      | NULL  |    1 | Using where |
+----+-------------+---------------+-------+------------------------------------------------+--------------+---------+-------+------+-------------+
2 rows in set (0.00 sec)
Generally speaking, OR does not make MySQL happy. UNION is a good deal better:
mysql> EXPLAIN SELECT rc_namespace, rc_title FROM pagelinks,
recentchanges WHERE rc_timestamp > '20071011094400' AND
(pl_namespace=rc_namespace AND pl_title=rc_title) AND pl_from=1234
UNION SELECT rc_namespace, rc_title FROM pagelinks, recentchanges,
redirect, page WHERE rc_timestamp > '20071011094400' AND
rd_namespace=rc_namespace AND rd_title=rc_title AND rd_from=page_id
AND page_namespace=pl_namespace AND page_title=pl_title AND
pl_from=1234;
+------+--------------+---------------+--------+------------------------------------------------+--------------+---------+-----------------------------------------------------------------+------+-------------+
| id   | select_type  | table         | type   | possible_keys                                  | key          | key_len | ref                                                             | rows | Extra       |
+------+--------------+---------------+--------+------------------------------------------------+--------------+---------+-----------------------------------------------------------------+------+-------------+
|    1 | PRIMARY      | pagelinks     | ref    | pl_from,pl_namespace                           | pl_from      | 4       | const                                                           |    1 | Using index |
|    1 | PRIMARY      | recentchanges | range  | rc_timestamp,rc_namespace_title,rc_ns_usertext | rc_timestamp | 16      | NULL                                                            |    1 | Using where |
|    2 | UNION        | pagelinks     | ref    | pl_from,pl_namespace                           | pl_from      | 4       | const                                                           |    1 | Using index |
|    2 | UNION        | recentchanges | range  | rc_timestamp,rc_namespace_title,rc_ns_usertext | rc_timestamp | 16      | NULL                                                            |    1 | Using where |
|    2 | UNION        | redirect      | ref    | PRIMARY,rd_ns_title                            | rd_ns_title  | 261     | wikidb.recentchanges.rc_namespace,wikidb.recentchanges.rc_title |    1 | Using index |
|    2 | UNION        | page          | eq_ref | PRIMARY,name_title                             | PRIMARY      | 4       | wikidb.redirect.rd_from                                         |    1 | Using where |
| NULL | UNION RESULT | <union1,2>    | ALL    | NULL                                           | NULL         | NULL    | NULL                                                            | NULL |             |
+------+--------------+---------------+--------+------------------------------------------------+--------------+---------+-----------------------------------------------------------------+------+-------------+
7 rows in set (0.00 sec)
Although it's still not really good, given the row counts. And given
the randomness of my parameters and the oddities of my database (e.g.
it's months out of date, I picked a page that probably has practically
no incoming links, and my MySQL version is different from Wikimedia's),
the UNIONized query may still be a serious problem and needs to be
examined more closely before being deployed.
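For what it's worth, the OR form (with the timestamp cutoff applied to both branches) and the UNION form can be checked for equivalence on a toy schema. This is a sketch in Python with SQLite rather than MySQL, so the plans are not comparable, but the result sets should match; the table and column names follow the MediaWiki schema, and all the data is made up:

```python
import sqlite3

# Minimal stand-ins for the four tables the patched query touches,
# with only the columns the WHERE clause uses.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE pagelinks     (pl_from INT, pl_namespace INT, pl_title TEXT);
    CREATE TABLE recentchanges (rc_timestamp TEXT, rc_namespace INT, rc_title TEXT);
    CREATE TABLE redirect      (rd_from INT, rd_namespace INT, rd_title TEXT);
    CREATE TABLE page          (page_id INT, page_namespace INT, page_title TEXT);

    -- Page 1234 links to Target directly, and to Redir, which redirects to Final.
    INSERT INTO pagelinks     VALUES (1234, 0, 'Target'), (1234, 0, 'Redir');
    INSERT INTO page          VALUES (55, 0, 'Redir');
    INSERT INTO redirect      VALUES (55, 0, 'Final');
    INSERT INTO recentchanges VALUES ('20071012000000', 0, 'Target'),
                                     ('20071012000000', 0, 'Final'),
                                     ('20071001000000', 0, 'Final');  -- before cutoff
""")

cutoff = "20071011094400"

# OR form, written with explicit parentheses so the cutoff covers both branches.
or_rows = db.execute("""
    SELECT DISTINCT rc_namespace, rc_title
    FROM pagelinks, recentchanges, redirect, page
    WHERE rc_timestamp > ?
      AND ((pl_namespace = rc_namespace AND pl_title = rc_title AND pl_from = 1234)
        OR (rd_namespace = rc_namespace AND rd_title = rc_title AND rd_from = page_id
            AND page_namespace = pl_namespace AND page_title = pl_title
            AND pl_from = 1234))
    ORDER BY rc_title
""", (cutoff,)).fetchall()

# UNION form: each branch joins only the tables it actually needs.
union_rows = db.execute("""
    SELECT rc_namespace, rc_title FROM pagelinks, recentchanges
    WHERE rc_timestamp > ? AND pl_namespace = rc_namespace
      AND pl_title = rc_title AND pl_from = 1234
    UNION
    SELECT rc_namespace, rc_title FROM pagelinks, recentchanges, redirect, page
    WHERE rc_timestamp > ? AND rd_namespace = rc_namespace AND rd_title = rc_title
      AND rd_from = page_id AND page_namespace = pl_namespace
      AND page_title = pl_title AND pl_from = 1234
    ORDER BY rc_title
""", (cutoff, cutoff)).fetchall()

print(or_rows)     # [(0, 'Final'), (0, 'Target')]
print(union_rows)  # same rows
```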
Hi.
A new version of the Python parser in WikiXRay, along with improved documentation, can be found here:
http://meta.wikimedia.org/wiki/WikiXRay_Python_parser
Basically, I've developed two flavors: a standard one for people who
want an alternative to other tools for processing Wikipedia's dumps
(including the text table), and a research-oriented one that ignores
the text itself and instead extracts useful information on the fly.
Both flavors use extended inserts (you can tune the size and number
of rows), and the --monitor mode calls a DB access module to avoid
timeout errors.
Further improvements (--skipnamespaces and --inject; this one should
be very easy) are on the way.
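In case anyone is curious, the extended-insert batching boils down to something like this (a simplified, hypothetical sketch, not the actual WikiXRay code; the table name, batch size, and repr-based quoting are illustration only, and real code must escape values safely):

```python
def extended_inserts(table, rows, max_rows=1000):
    """Group rows into multi-row INSERT statements (extended inserts).

    A simplified sketch of the batching idea: emitting one INSERT per
    max_rows rows cuts per-statement overhead when loading a dump.
    NOTE: repr() is not SQL-safe quoting; it is used here only to keep
    the sketch short.
    """
    for start in range(0, len(rows), max_rows):
        batch = rows[start:start + max_rows]
        values = ",".join(
            "(" + ",".join(repr(v) for v in row) + ")" for row in batch
        )
        yield f"INSERT INTO {table} VALUES {values};"

# Example: 5 rows with a batch size of 2 produce 3 statements.
rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
stmts = list(extended_inserts("page", rows, max_rows=2))
print(len(stmts))  # 3
print(stmts[0])    # INSERT INTO page VALUES (1,'a'),(2,'b');
```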
Best,
Felipe.
Hi Travis,
We just went through a big evolution and found a few things that
helped us, so how about we take a look? Also, Emil made a patch that
prevents Google Analytics from busting the cache as much; that's in
the Wikimedia SVN. Can you ask Jack to come up to San Mateo? We'll
sit down with him, and we'll also get Artur to connect with you
online.
Thanks,
John Q.
-------- Original Message --------
Subject: [Wikitech-l] Looking for a system administrator familiar with
the Squid setup
Date: Thu, 18 Oct 2007 08:43:56 -0400
From: Travis Derouin <travis(a)wikihow.com>
Reply-To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Hey,
We've been running into some performance problems lately, and I'm stumped.
I'm not sure if we need more hardware or not.
I'd like to find a system administrator familiar with the Squid,
Apache/MediaWiki, MySQL setup to take a look at our system and
identify any potential problems that we might have. We'd be
comfortable with either a one-time fee or an hourly rate. We have a
6-server setup right now, with 1 Squid, 1 DB, 3 Apaches, and 1 spare.
If you or someone you know is interested, send an e-mail directly to
me at travis(a)wikihow.com with your details and experience.
Sorry for the job-posting-like message, but I'm out of ideas and need
some help.
Thanks!
Travis
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikitech-l
Hi All,
I am Vinay, from Bangalore, India. I've just joined this mailing
list. My apologies if the mail/query is not in the right format, or
if I'm in the wrong forum.
We want to use MediaWiki in a rural project we are working on. I find
that MediaWiki does not at the moment support content creation in
Indian languages like Kannada, Hindi, Tamil, etc.
I wanted to know what we will need to do, and who we can work with,
to get MediaWiki to support Indian languages.
Generally, for Indian language support, we use a piece of software
called Baraha; once we install it, I can choose that font from, say,
Word or Excel, and whatever I type appears in the local language
script. I was wondering if we could integrate that into MediaWiki.
Looking forward to your support!
Thanks,
Vinay.
An automated run of parserTests.php showed the following failures:
This is MediaWiki version 1.12alpha (r26835).
Reading tests from "maintenance/parserTests.txt"...
Reading tests from "extensions/Cite/citeParserTests.txt"...
Reading tests from "extensions/Poem/poemParserTests.txt"...
Reading tests from "extensions/LabeledSectionTransclusion/lstParserTests.txt"...
17 still FAILING test(s) :(
* URL-encoding in URL functions (single parameter) [Has never passed]
* URL-encoding in URL functions (multiple parameters) [Has never passed]
* Table security: embedded pipes (http://lists.wikimedia.org/mailman/htdig/wikitech-l/2006-April/022293.html) [Has never passed]
* Link containing double-single-quotes '' (bug 4598) [Has never passed]
* message transform: <noinclude> in transcluded template (bug 4926) [Has never passed]
* message transform: <onlyinclude> in transcluded template (bug 4926) [Has never passed]
* BUG 1887, part 2: A <math> with a thumbnail- math enabled [Has never passed]
* HTML bullet list, unclosed tags (bug 5497) [Has never passed]
* HTML ordered list, unclosed tags (bug 5497) [Has never passed]
* HTML nested bullet list, open tags (bug 5497) [Has never passed]
* HTML nested ordered list, open tags (bug 5497) [Has never passed]
* Inline HTML vs wiki block nesting [Has never passed]
* Mixing markup for italics and bold [Has never passed]
* dt/dd/dl test [Has never passed]
* Images with the "|" character in the comment [Has never passed]
* Parents of subpages, two levels up, without trailing slash or name. [Has never passed]
* Parents of subpages, two levels up, with lots of extra trailing slashes. [Has never passed]
Passed 527 of 544 tests (96.88%)... 17 tests failed!
Hello!
You are receiving this email because your project has been selected
to take part in a new effort by the PHP QA Team to make sure that
your project still works with to-be-released PHP versions. With this
we hope to make sure that you are aware of things that might break,
and that we don't introduce any strange regressions. With this effort
we hope to build a better relationship between the PHP Team and the
major projects.
If you do not want to receive these heads-up emails, please reply to
me personally and I will remove you from the list; but we hope that
you want to actively help us make PHP a better and more stable tool.
The first release candidate of PHP 5.2.5 was just released and can be
downloaded from http://downloads.php.net/ilia/. Please try this
release candidate against your code and let us know should you find
any regressions. The goal is to have 5.2.5 out within three weeks'
time, so timely testing would be extremely helpful.
In case you think that other projects should also receive these kinds
of emails, please let me know privately, and I will add them to the
list of projects to contact.
Best Regards,
Ilia Alshanetsky
5.2 Release Master
Hey,
We've been running into some performance problems lately, and I'm stumped.
I'm not sure if we need more hardware or not.
I'd like to find a system administrator familiar with the Squid,
Apache/MediaWiki, MySQL setup to take a look at our system and
identify any potential problems that we might have. We'd be
comfortable with either a one-time fee or an hourly rate. We have a
6-server setup right now, with 1 Squid, 1 DB, 3 Apaches, and 1 spare.
If you or someone you know is interested, send an e-mail directly to
me at travis(a)wikihow.com with your details and experience.
Sorry for the job-posting-like message, but I'm out of ideas and need
some help.
Thanks!
Travis