Hi Robert,
Maybe you could use a regular expression that detects a long, unbroken run of hexadecimal characters (the digits 0-9 and the letters a-f) with no spaces between them.
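A rough sketch of that idea in Python follows. The function names and the 16-character minimum are my own assumptions for illustration; nothing here comes from the dump documentation. It flags titles made up entirely of lowercase hex byte pairs and, as a sanity check, tries to decode them as UTF-8.

```python
import re
from typing import Optional

# Assumption: a "weird" title consists only of lowercase hex byte pairs.
# MIN_HEX_CHARS is a hypothetical threshold; tune it for your data.
MIN_HEX_CHARS = 16
HEX_TITLE = re.compile(r"(?:[0-9a-f]{2}){%d,}" % (MIN_HEX_CHARS // 2))


def looks_hex_encoded(title: str) -> bool:
    """True if the whole title is a long run of hex byte pairs."""
    return HEX_TITLE.fullmatch(title) is not None


def decode_hex_title(title: str) -> Optional[str]:
    """Return the UTF-8 text behind a hex-looking title, or None."""
    if not looks_hex_encoded(title):
        return None
    try:
        return bytes.fromhex(title).decode("utf-8")
    except UnicodeDecodeError:
        # Hex-looking, but the bytes are not valid UTF-8.
        return None


sample = "4567797074e280934d6f726f63636f5f72656c6174696f6e73"
print(looks_hex_encoded(sample))   # True
print(decode_hex_title(sample))    # Egypt–Morocco_relations
```

Note that a legitimate title consisting only of a long, even-length string of digits would also match, so this is a heuristic to review by hand, not a definitive filter.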
Regards
Imene
On Monday, February 11, 2013, wrote:
Send Xmldatadumps-l mailing list submissions to xmldatadumps-l@lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l or, via email, send a message with subject or body 'help' to xmldatadumps-l-request@lists.wikimedia.org
You can reach the person managing the list at xmldatadumps-l-owner@lists.wikimedia.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of Xmldatadumps-l digest..."
Today's Topics:
- Weird page titles in page table (Robert Crowe)
- Re: Weird page titles in page table (Ariel T. Glenn)
Message: 1
Date: Sun, 10 Feb 2013 14:08:56 -0800
From: "Robert Crowe" <robert@ourwebhome.com>
To: <xmldatadumps-l@lists.wikimedia.org>
Subject: [Xmldatadumps-l] Weird page titles in page table
Message-ID: 010301ce07db$39477660$abd66320$@com
Content-Type: text/plain; charset="us-ascii"
I'm seeing rows in the page table that have weird titles, and I'd like to be able to identify and filter them out, but I don't see properties that seem to identify them. For example:
page.page_id = 21441554
page.page_title = 4567797074e280934d6f726f63636f5f72656c6174696f6e73
What should I look for to identify pages like that?
Thanks,
Robert
Message: 2
Date: Mon, 11 Feb 2013 07:51:03 +0200
From: "Ariel T. Glenn" <ariel@wikimedia.org>
To: Robert Crowe <robert@ourwebhome.com>
Cc: xmldatadumps-l@lists.wikimedia.org
Subject: Re: [Xmldatadumps-l] Weird page titles in page table
Message-ID: 1360561863.18140.5.camel@trouble.localdomain
Content-Type: text/plain; charset="UTF-8"
On 10-02-2013, Sun, at 14:08 -0800, Robert Crowe wrote:
I'm seeing rows in the page table that have weird titles, and I'd like to be able to identify and filter them out, but I don't see properties that seem to identify them. For example:
page.page_id = 21441554
page.page_title = 4567797074e280934d6f726f63636f5f72656c6174696f6e73
What should I look for to identify pages like that?
Which dump is this from?
Ariel
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
End of Xmldatadumps-l Digest, Vol 36, Issue 1