Hi,
I used xml2sql to convert an XML dump to SQL, then imported the three tables
(text, page, revision) with mysqlimport (I had to make some changes^1). Now,
when I browse the wiki, all categories are empty. What should I do to fix
that?
importDump.php is very slow, but it doesn't leave the categories empty. Is it
possible to use it just to fill in the categories?
mwdumper.jar doesn't work (at least for me).
^1 =
ALTER TABLE `revision`
  DROP `rev_len`,
  DROP `rev_parent_id`;
because:
mysqlimport: Error: Row 1 doesn't contain data for all columns, when using table: revision
ALTER TABLE `revision` CHANGE `rev_comment` `rev_comment` LONGBLOB NOT NULL;
because:
mysqlimport: Error: Data too long for column 'rev_comment' at row 4799, when using table: revision
ALTER TABLE `revision`
  ADD `rev_len` INT UNSIGNED NOT NULL,
  ADD `rev_parent_id` INT UNSIGNED NULL;
--
--alnokta
Forwarding to local lists as a reminder.
Mark
---------- Forwarded message ----------
From: Cary Bass <cbass(a)wikimedia.org>
Date: Thu, Apr 17, 2008 at 1:04 AM
Subject: [Wikimania-l] [[WM2008]] Scholarship submission deadline
To: Wikimedia Foundation Mailing List <foundation-l(a)lists.wikimedia.org>,
"Wikimania general list (open subscription)" <
wikimania-l(a)lists.wikimedia.org>
Just as a quick note, the Wikimania 2008 scholarship deadline will be
April 21, at 23:59 (UTC). If you are applying for a scholarship, please
have it submitted by then.
Scholarship information may be found at
<http://wikimania2008.wikimedia.org/wiki/Scholarships>.
Cary Bass
Volunteer Coordinator
Your continued donations keep Wikipedia running! Support the Wikimedia
Foundation today: http://donate.wikimedia.org
Wikimedia Foundation, Inc.
Phone: 415.839.6885
Fax: 415.882.0495
E-Mail: cary(a)wikimedia.org
_______________________________________________
Wikimania-l mailing list
Wikimania-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimania-l
Hello.
Yesterday, I was moving around mysqldump files of our processed databases from parsed Wikipedia dumps, and this simple question came to my mind.
Is there any special reason to use an "ad-hoc" XML schema for Wikipedia dumps?
Could a mysqldump on every language edition slow down the Wikipedia MySQL server?
I guess some problem could arise, and that's why we don't use it. Otherwise, perhaps we could consider creating such a mysqldump to speed up the import process back to our local servers, instead of having to parse a huge XML file.
That's especially true for the very large meta-history.xml versions. And you can still filter out sensitive tables (user, etc.).
Regards,
Felipe.
Hello,
I am maintaining a wiki where I want real names to be shown instead of login nicks.
It is understood that nicks are handy and people are used to them, but they also somewhat "anonymize" users. This is not what our wiki is for.
Is there a way to change (hopefully in one place, or maybe a few) how user names are shown throughout the site (recent changes, page history, titles of user pages)? Nicks would still be usable by whoever wants them.
Thanks!!!
Evgeny.
CC'ing to Wikitech-l. Some more review:
* The way NUMBEROFBOOKS is calculated (scanning the page namespace) is
not acceptable, performance-wise. It needs to work like
NUMBEROFPAGES, etc., with a site_stats row or similar.
* The allbooks-regex message might not be a great way of doing things:
** Since results will have to be stored in various ways, like the
count being stored in site_stats, it's probably not feasible to change
the regex used without running a maintenance script. A config option
might therefore be better.
** Directly inputting the regex into a query is VERY VERY BAD! It's
an immediate SQL injection attack. Also, it will cause the query to
break in the presence of things like single quotes. Use
Database::addQuotes() here.
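(The injection risk here is the usual interpolation-vs-placeholder distinction; in MediaWiki's PHP the fix is Database::addQuotes() or the query builder's parameter handling. As a generic, runnable illustration only -- Python with sqlite3, not the extension's actual PHP code -- a bound parameter makes a quote inside the pattern plain data rather than SQL syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (page_title TEXT)")
conn.execute("INSERT INTO page VALUES ('O''Brien''s_book')")

# User-supplied pattern containing a single quote. Interpolating it
# directly into the SQL string would produce broken SQL at best and
# an injection vector at worst.
pattern = "O'Brien%"

# A bound parameter lets the driver handle quoting, so the quote in
# the pattern is just data.
rows = conn.execute(
    "SELECT page_title FROM page WHERE page_title LIKE ?", (pattern,)
).fetchall()
print(rows)  # [("O'Brien's_book",)]
```

The same pattern built with string concatenation would terminate the string literal at the embedded quote, which is exactly the breakage described above.)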
* Generating AllBooks using a REGEXP query is . . . maybe not ideal,
altogether. Usually we have a flag in the table, for instance,
page_is_redirect. It might be okay given that you'd "just" be
scanning a few times as many rows, in the average case, and only for
views of a certain special page. But a flag would be nice.
* If you're using the Xml functions, you shouldn't call
htmlspecialchars explicitly; doing so will double-escape the variable.
* Variable names that are in Italian are kind of funny, but not really
in accordance with our coding standards. :) $conta -> $count,
$numero -> $number
This is only a quick glance, mind you. I haven't actually tested it
and probably won't find the time to do so.
Overall, I'm not sure this is the best way to tackle the problem. Has
Wikibooks considered, for instance, having the books' "main pages" be
in the main namespace, and the various pages be in a namespace like
"Page"? Then the problem is partly solved right away: you can use
Special:AllPages set to the main namespace to get a list of books.
For NUMBEROFBOOKS, you just need PAGESINNAMESPACE to be enabled, which
is fairly feasible at some point if someone's willing to do a little
optimization work, since it's a generally-requested feature.
On Thu, Apr 17, 2008 at 5:33 PM, mike.lifeguard
<mike.lifeguard(a)gmail.com> wrote:
> If you remember back in November we were asked what would be on a technical
> wishlist for Wikibooks. One of the things we said[1] we wanted was a way to
> list all books & enumerate them (all *books* as opposed to all pages). Ramac
> and Pietrodn have been working on an extension[2] which does this.
>
>
>
> Darklama and I have taken an initial look at things, and Simetrical found
> one error that has been corrected. Anyone who is technically-minded is
> invited to review & test the code. Everyone else is invited to discuss
> improvements.
>
>
>
> The extension is documented at mediawiki.org, but here's a quick overview
> for the sake of convenience:
>
> *Adds the {{NUMBEROFBOOKS}} variable
>
> *Adds [[Special:AllBooks]], which lists all books in specified namespaces
>
> *List books beginning at some prefix using ?offset=whatever
>
> *The regex used to determine what is a book and what isn't is a system
> message, so administrators may edit it, though the default should work for
> most (all?) Wikibooks languages
>
>
>
> After this much time with apparently no interest in developing anything for
> Wikibooks it's exciting to finally see such active development on something
> needed specifically for this project. Many thanks to Ramac and Pietrodn for
> their work, and also those who have helped them along the way. Hopefully
> this is the beginning of a trend, and more tools will be developed to
> fulfill our wishlist!
>
>
>
> -Mike.lifeguard
>
>
>
>
>
> [1]
> http://lists.wikimedia.org/pipermail/textbook-l/2007-November/001164.html
>
> [2] http://www.mediawiki.org/wiki/Extension:AllBooks
>
>
>
> _______________________________________________
> Textbook-l mailing list
> Textbook-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/textbook-l
>
brion(a)svn.wikimedia.org wrote:
> Revert r33473, which seems to assume that pt_title is supposed to be case-insensitive.
> (Even if it was, it wouldn't work correctly as written.)
>
Why not? It worked for me...
Anyway, glad to see I wasn't the one overlooking stuff here.
Roan Kattouw (Catrope)
brion(a)svn.wikimedia.org wrote:
> Revision: 33504
> Author: brion
> Date: 2008-04-17 18:52:20 +0000 (Thu, 17 Apr 2008)
>
> Log Message:
> -----------
> Revert r33478 for now; I don't much like the field name user_timestamp, it's very unclear. Timestamp may not be ideal to build a diff link with, either.
I filed bug 12701 [1] a long time ago, which (off the top of my head) is
about a similar issue.
Roan Kattouw (Catrope)
[1] https://bugzilla.wikimedia.org/show_bug.cgi?id=12701
Notice how the front page of https://secure.wikimedia.org/ links to all
the wikis via their insecure versions?
Could the front page link to the versions accessible via secure
instead? Is there any reason this would be infeasible?
- d.
Yesterday, Werdna and Tim committed some initial code for adding shared
login session state to CentralAuth. The promise of this is that not only
do you have the same login state on multiple wikis, but you only have to
go through the login form once -- your login will be active on the other
sites as well.
There are two parts to this:
1) Central session data is maintained alongside the local sessions.
A cookie with the session key (or long-term login token) is shared
across an entire domain (say, wikipedia.org), letting all wikis on that
domain initialize their local sessions when you navigate to them.
2) On login, the central session cookies are set at multiple mid-level
domains.
This is done by loading a special login URL at each domain as an inline
image; that then sets a cookie for its domain as it's loaded. This
allows us to set, for instance, a cookie for wiktionary.org when you
logged in from wikipedia.org.
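(A rough sketch of that second step -- the cookie name and attributes here are my own assumptions for illustration, not CentralAuth's actual code: the point is only that the response to the image request sets a cookie scoped to the domain the image was loaded from.

```python
def login_pixel_headers(token: str, domain: str) -> list[tuple[str, str]]:
    """HTTP headers for the 1x1 login image served on `domain`.

    Hypothetical names: the cookie name and attribute set are
    assumptions. Loading this image from wiktionary.org sets a
    cookie scoped to .wiktionary.org, even though the login form
    itself was submitted on wikipedia.org.
    """
    cookie = f"centralSession={token}; Domain=.{domain}; Path=/; HttpOnly"
    return [
        ("Content-Type", "image/png"),
        ("Cache-Control", "no-cache"),  # the pixel must be fetched fresh on every login
        ("Set-Cookie", cookie),
    ]

headers = login_pixel_headers("abc123", "wiktionary.org")
print(dict(headers)["Set-Cookie"])
# centralSession=abc123; Domain=.wiktionary.org; Path=/; HttpOnly
```

One response per mid-level domain, and every wiki under that domain can then pick up the central session.)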
I've been doing some code review, local testing, and tweaking. The
general theory is reasonably sound though I have some concerns and notes...
Security:
The sessions are set on other domains by passing an internal token value
on a URL -- an unencrypted HTTP GET request. It's bad enough we're still
passing all kinds of stuff around in unencrypted cookies, but those GET
URLs go into all sorts of logs, which seems pretty creepy to me.
I'd be more comfortable with one-time-use tokens, which won't be of any
use to anyone once they've seen them. Resetting them on logout only
helps insofar as anyone actually logs out... I know I never do. :)
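(A one-time token needn't be complicated; here is a minimal sketch of the idea, with illustrative names and an in-memory store rather than anything resembling the real schema:

```python
import secrets

class OneTimeTokens:
    """Issue tokens that become useless the moment they are redeemed."""

    def __init__(self):
        self._pending = {}  # token -> user name; a real store would also expire entries

    def issue(self, user: str) -> str:
        token = secrets.token_hex(16)  # unguessable, single-purpose value
        self._pending[token] = user
        return token

    def redeem(self, token: str):
        # pop() both checks and invalidates in one step: a replay from
        # a server log or a sniffed GET request finds the token gone.
        return self._pending.pop(token, None)

store = OneTimeTokens()
t = store.issue("ExampleUser")
assert store.redeem(t) == "ExampleUser"  # first use succeeds
assert store.redeem(t) is None           # replay fails
```

With that property, a token leaking into an access log is harmless once the login sequence has completed.)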
Compatibility:
Third-party cookies can be disabled by various browser options and
privacy proxies. The 1x1 invisible PNG may itself be blocked by privacy
or ad proxies. It may or may not be more compatible to use little
iframes or something.... or that might just suck. :)
Anyway, should be considered.
Logging out:
Currently, logout only clears your global session cookies; it doesn't
clear local session state. You log in once, but you may have to log out
many times.
Incomplete migrations:
I haven't thoroughly tested, but my impression is that the global
session state will only get set up properly if the remote wiki that
happens to get hit for that domain has the global account.
If there's a non-matching local account there, it looks like it won't
set the session for the whole domain.
- -- brion vibber (brion @ wikimedia.org)