Hi Robert,
Maybe you could use a regular expression that detects a long run of
digits with no spaces between them.
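A minimal sketch of that idea in Python, in case it helps. It assumes the
odd titles consist only of hexadecimal digits (0-9 and a-f), as the example
in the quoted message does. The names MIN_LEN, looks_hex_encoded and
decode_hex_title are made up for illustration, and the length threshold is
a guess to tune against your data.

```python
import re

# Assumption: a "weird" title is one unbroken run of hex digits.
# MIN_LEN is an arbitrary threshold, not something taken from the dumps.
MIN_LEN = 16
HEX_TITLE = re.compile(r"[0-9a-fA-F]{%d,}" % MIN_LEN)


def looks_hex_encoded(title):
    """Return True if the whole title is a long, even-length run of hex digits."""
    return len(title) % 2 == 0 and HEX_TITLE.fullmatch(title) is not None


def decode_hex_title(title):
    """Try to read the title as hex-encoded UTF-8; return None if that fails."""
    try:
        return bytes.fromhex(title).decode("utf-8")
    except ValueError:  # covers bad hex and UnicodeDecodeError
        return None
```

Applied to the title in the quoted message, looks_hex_encoded() returns True
and decode_hex_title() gives "Egypt–Morocco_relations". A real title that
happens to be a long string of digits would also match, so it may be worth
checking that the decoded text looks like a plausible title before filtering
the row out.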
Regards
Imene
On Monday, February 11, 2013, wrote:
Send Xmldatadumps-l mailing list submissions to
xmldatadumps-l(a)lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
or, via email, send a message with subject or body 'help' to
xmldatadumps-l-request(a)lists.wikimedia.org
You can reach the person managing the list at
xmldatadumps-l-owner(a)lists.wikimedia.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Xmldatadumps-l digest..."
Today's Topics:
1. Weird page titles in page table (Robert Crowe)
2. Re: Weird page titles in page table (Ariel T. Glenn)
----------------------------------------------------------------------
Message: 1
Date: Sun, 10 Feb 2013 14:08:56 -0800
From: "Robert Crowe" <robert(a)ourwebhome.com>
To: <xmldatadumps-l(a)lists.wikimedia.org>
Subject: [Xmldatadumps-l] Weird page titles in page table
Message-ID: <010301ce07db$39477660$abd66320$@com>
Content-Type: text/plain; charset="us-ascii"
I'm seeing rows in the page table that have weird titles, and I'd like to
be able to identify and filter them out, but I don't see properties that
seem to identify them. For example:
page.page_id = 21441554
page.page_title = 4567797074e280934d6f726f63636f5f72656c6174696f6e73
What should I look for to identify pages like that?
Thanks,
Robert
------------------------------
Message: 2
Date: Mon, 11 Feb 2013 07:51:03 +0200
From: "Ariel T. Glenn" <ariel(a)wikimedia.org>
To: Robert Crowe <robert(a)ourwebhome.com>
Cc: xmldatadumps-l(a)lists.wikimedia.org
Subject: Re: [Xmldatadumps-l] Weird page titles in page table
Message-ID: <1360561863.18140.5.camel(a)trouble.localdomain>
Content-Type: text/plain; charset="UTF-8"
On Sun, 10-02-2013 at 14:08 -0800, Robert Crowe wrote:
> I'm seeing rows in the page table that have weird titles, and I'd like
> to be able to identify and filter them out, but I don't see properties
> that seem to identify them. For example:
> page.page_id = 21441554
> page.page_title = 4567797074e280934d6f726f63636f5f72656c6174696f6e73
> What should I look for to identify pages like that?
Which dump is this from?
Ariel
------------------------------
End of Xmldatadumps-l Digest, Vol 36, Issue 1
*********************************************