OK, thank you guys. Now the reasons are clear :-). In any case, this forced the parser improvement, so it's welcome anyway ;).
Best,
F.
--- On Mon, 15/6/09, Platonides <Platonides(a)gmail.com> wrote:
> From: Platonides <Platonides(a)gmail.com>
> Subject: Re: [Wikitech-l] Fixing problem with complete dumps in WikiXRay
> To: wikitech-l(a)lists.wikimedia.org
> Date: Monday, 15 June 2009, 10:44
> Felipe Ortega wrote:
> > Hello, all.
> >
> > For (yet) unknown reasons, last complete dump files
> (pages-meta-history.xml) in some languages are flawed.
> Certain revision items are missing info about rev_user. Even
> though there are only 3 or 4 of that kind, this is enough to
> mess up either the parsing process or the later SQL load
> into the DB.
> >
> > So far, the last 3 dumps of DE Wikipedia and 20090603
> from FR Wikipedia have presented this error.
> >
> > I have updated both WikiXRay parsers:
> > http://meta.wikimedia.org/wiki/WikiXRay_parser
> > http://meta.wikimedia.org/wiki/WikiXRay_parser_research
> >
> > They now probe whether the parsed revision item is
> complete or not, before creating the SQL. If it's flawed,
> it's omitted and logged into an error file for later
> inspection.
> >
> > Regards,
> >
> > Felipe.
>
> They're an effect of revdelete.
> You can see how they have a parameter deleted.
> An example is available in the bug for pywikipediabot:
> http://sourceforge.net/tracker/index.php?func=detail&aid=2790339&group_id=9…
>
>
Hello, all.
For (yet) unknown reasons, the latest complete dump files (pages-meta-history.xml) for some languages are flawed. Certain revision items are missing the rev_user info. Even though there are only 3 or 4 of that kind, this is enough to mess up either the parsing process or the later SQL load into the DB.
So far, the last 3 dumps of DE Wikipedia and the 20090603 dump of FR Wikipedia have presented this error.
I have updated both WikiXRay parsers:
http://meta.wikimedia.org/wiki/WikiXRay_parser
http://meta.wikimedia.org/wiki/WikiXRay_parser_research
They now probe whether each parsed revision item is complete before creating the SQL. If it's flawed, it's omitted and logged to an error file for later inspection.
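For the curious, the check is essentially of this shape (a simplified sketch, not the actual WikiXRay code; the namespace URI depends on the dump's schema version, and the log file name is made up):

    import sys
    import xml.etree.cElementTree as ET

    # Sketch: stream a pages-meta-history.xml dump and skip <revision>
    # elements whose <contributor> carries no user info (e.g. revdeleted
    # users), logging them instead of emitting SQL for them.
    NS = '{http://www.mediawiki.org/xml/export-0.3/}'  # adjust to the dump's schema version

    def complete_revisions(dump_path, error_log='flawed_revisions.log'):
        log = open(error_log, 'a')
        for event, elem in ET.iterparse(dump_path):
            if elem.tag != NS + 'revision':
                continue
            contributor = elem.find(NS + 'contributor')
            rev_id = elem.findtext(NS + 'id')
            if contributor is None or (
                    contributor.find(NS + 'username') is None and
                    contributor.find(NS + 'ip') is None):
                log.write('incomplete revision %s\n' % rev_id)
            else:
                yield elem          # safe to turn into an SQL INSERT
            elem.clear()            # keep memory bounded on multi-GB dumps

    if __name__ == '__main__':
        for rev in complete_revisions(sys.argv[1]):
            pass                    # build the INSERT statements here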
Regards,
Felipe.
Ehlo,
I see that this topic has popped up from time to time since 2004,
and that most of the misc servers have already been IPv6-enabled. I
checked whether Google has any info on it, and found only a very few
mails: the original 2004 test, and a comment from 2008 that Squid and
MediaWiki are the problem, apart from some smaller issues. (As a side
note, Google doesn't seem to find anything on site:lists.wikimedia.org
about "ipv6", which is interesting.)
Now, Squid fully supports IPv6 as of 3.1, so I guess that's checked
off. (I didn't try it myself, but others seem to have.) MediaWiki:
well, http://www.mediawiki.org/wiki/IPv6_support doesn't mention any
outstanding problem and the linked bug is closed, so as far as I can
see (without actually testing it) it looks okay.
The database structure may require some tuning, as far as I can see. Right?
Apache has handled it for ages, and I guess PHP does too.
Are there any further components involved in running a Wikipedia that
are not v6-compatible? If not, is there any outstanding problem that
would make it impossible to fire up a test interface on IPv6?
I'd say use a separate host, like en.ipv6.wikipedia.org, and not
worry about cache efficiency, because I doubt the IPv6 traffic would
really measure up to the IPv4 traffic. At least it could then be
properly measured, and the decision on how to proceed could be based
on facts.
Maybe there's a test host up already, but I wasn't able to find it,
so I guess nobody else can either. ;-)
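If such a host does go up, checking that it actually answers over v6
is trivial; a rough sketch (the host name is just the hypothetical one
proposed above):

    # Look up AAAA records for a hypothetical v6 test host and try a
    # TCP connection to port 80 over IPv6.
    import socket

    host = 'en.ipv6.wikipedia.org'   # hypothetical, as proposed above
    try:
        infos = socket.getaddrinfo(host, 80, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        print('no AAAA record for %s' % host)
    else:
        for family, socktype, proto, canonname, sockaddr in infos:
            s = socket.socket(family, socktype, proto)
            s.settimeout(5)
            try:
                s.connect(sockaddr)
                print('%s reachable over IPv6 at %s' % (host, sockaddr[0]))
            except (socket.error, socket.timeout):
                print('AAAA record exists but %s is not reachable' % sockaddr[0])
            finally:
                s.close()
            break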
Is there any further problem in this area that still needs solving,
or has it simply not occurred to anyone lately?
--
byte-byte,
grin
Hi,
a couple of days ago the IT manager of an Italian company phoned me
because they cannot reach the it.wiki website.
They ran a lot of checks themselves and with their provider (COLT).
In the end it seems there's a block on 62.152.101.82.
They made a check with the London base, and London replied:
"We believe that Wikipedia are blocking from their site the reasoning
behind this is our firewalls are not blocking you and you COLT have
confirm that they can reach there from their equipment and the trace
route from you server show’s that it leaves your site passes through
the firewall goes out onto the COLT network and beyond. If you compare
the trace route that I have done on my computer to the trace route
from your server you will notice that your last hop is the hop before
the web server and that is where your blocked."
Any idea or help?
Thanks,
Frieda
___________________________________________
http://it.wikipedia.org/wiki/Utente:Frieda
Dear brothers,
I'm new here and I need a little help.
On my wiki website, poonkavanam.com, I can't create a new page from an
existing red link: when I click the red link it goes to a non-wiki page
and displays "page not found".
I recently transferred my wiki from one domain to another; the old
domain is ponkavanam.com, and the old one still works fine.
Hi,
I'm trying to get hold of the Wikipedia dump, in particular
enwiki-latest-pages-meta-history.xml.bz2.
It seems that on the page where it's supposed to be
(http://download.wikipedia.org/enwiki/latest/) it weighs only 0.6 KB,
whereas I was used to it being 147 GB.
What happened to the data and where did it go?
Also, on the Wikipedia page
(http://en.wikipedia.org/wiki/Wikipedia_database) I read:
"As of January 17, 2009, it seems that all snapshots of
pages-meta-history.xml.7z hosted at http://download.wikipedia.org/enwiki/
are missing. The developers at Wikimedia Foundation are working to
address this issue
(http://lists.wikimedia.org/pipermail/wikitech-l/2009-January/040841.html).
There are other ways to obtain this file."
I checked the other ways of obtaining the file that they describe; none
worked.
Why did the dumps vanish, and how can I download a copy of them?
Thank you
The 1.15 release of MediaWiki introduced some hardcoded bitwise
operators to the core SQL. They were added to operate on the
log_deleted column in the logging table by, I think, aaron. This is
because the log_deleted column now has multiple states.
Unfortunately, bitwise operators have different syntax in different databases.
MySQL, PostgreSQL:
log_deleted & 1
DB2, Oracle:
BITAND(log_deleted, 1)
I think there are three options to make it compatible:
1. Refactor the database to not use an integer as a bit field. Just
use four different boolean columns, which works well cross-database.
2. Add a function to the Database API for each bit operator.
$sql = $database->bitand('log_deleted', 1);
3. Add a function to the Database API to handle all the operators.
$sql = $database->op('&', 'log_deleted', 1);
or
$sql = $database->op(Database::BITAND, 'log_deleted', 1);
My preference is for option 1 or 3. Thoughts?
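Just to make option 2 concrete, something like the following dispatch
is what I have in mind (an illustrative sketch in Python with invented
names, only to show the shape; the real Database classes are PHP, and
only the BITAND spelling quoted above is assumed):

    # Illustration only, not MediaWiki code: each backend subclass knows
    # how to spell its own bitwise AND, so callers never branch on DB type.
    class Database(object):
        def bit_and(self, field, value):
            # MySQL and PostgreSQL accept the infix operator as-is.
            return '%s & %d' % (field, value)

    class DatabaseOracle(Database):
        def bit_and(self, field, value):
            return 'BITAND(%s, %d)' % (field, value)

    class DatabaseDB2(Database):
        def bit_and(self, field, value):
            return 'BITAND(%s, %d)' % (field, value)

    # db.bit_and('log_deleted', 1) yields "log_deleted & 1" on MySQL and
    # PostgreSQL, and "BITAND(log_deleted, 1)" on Oracle and DB2.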
Regards,
Leons Petrazickis
http://lpetr.org/blog/
All,
after some internal discussion with the licensing update committee,
I'm proposing the following final site terms to be implemented on all
Wikimedia projects that currently use GFDL as their primary content
license, as well as the relevant multimedia templates:
http://meta.wikimedia.org/wiki/Licensing_update/Implementation
Please note that these aren't quite ready for translation yet
(hence labeled draft). Please provide feedback here or on the talk
page, ideally by Thursday night UTC so we can move the process forward
on Friday.
In terms of implementing these changes, I suggest the following:
1) That the relevant site configuration variables are updated on June 15;
2) That, additionally, a central "Terms of use" page is created on
wikimediafoundation.org to house the "terms of use" above, which can
be replaced with a localized version whenever one is created;
3) That the relevant MediaWiki-messages are force-updated on all
projects to the English version above, or any translations already
created by June 15;
4) That the revised MediaWiki-messages are also translated through
translatewiki.net and hence additional translations will be rolled out
through normal i18n upgrades.
Regarding 3) and 4), this may best be achieved by creating new
MediaWiki messages. I would appreciate the advice of our translation
and tech team on this, and of course on the entire proposed process.
(I realize that there's not nearly enough time for any number of
translations, but we have a fixed deadline of beginning the roll-out
of this change by June 15.)
For multimedia, the licensing committee and the Wikimedia Commons
community are still discussing the best update strategy, but it will
probably involve a bot updating the existing templates. We're also
hoping to run a CentralNotice to explain the process to the
communities so that people can help to fix up pages and policies.
Thanks for any help in moving this forward,
Erik
--
Erik Möller
Deputy Director, Wikimedia Foundation
Support Free Knowledge: http://wikimediafoundation.org/wiki/Donate