Yup, they do show up in the stubs. I checked the four Swedish dumps. I
left a comment there at
Let me know if there's anything else. Thanks!
On Mon, Jul 6, 2015 at 1:22 PM, Ariel T. Glenn <aglenn(a)wikimedia.org> wrote:
On Sat, 04-07-2015 at 23:26 -0400, gnosygnu wrote:
> Hi. I've noticed that some June XML data dumps have duplicate <page>
> records, usually at the end of the dump.
> Anyone know if this is intentional? One or two duplicate records is
> benign, but I'm slightly concerned that it may be a symptom of a
> larger problem. I've been working with the XML data dumps for over 3
> years, and haven't seen this before.[1]
This was also reported by another user; see Phabricator task T103670 for
the report. Did you notice whether the stub dumps contain those same
duplicate entries?
In any case this is an error, and I need to make sure it is fixed for
next month's run.
Ariel
> I list some examples below. They're only from the Swedish wikis and
> Spanish Wikipedia (which is what I started looking at this week). Let
> me know if you need any other info, and I'll be happy to provide.
>
> Finally, for questions like these, is it best to email the mailing
> list, create a task in Phabricator or do both?
>
> Thanks.
>
> [1]: It may have started as recently as 2015 April. I stopped looking
> at dumps shortly before the May problems with the dump server.
>
> ----
>
> Example 1:
> URL: http://dumps.wikimedia.org/svwikiversity/20150602/svwikiversity-20150602-pages-articles.xml.bz2
> Title: Audi m8
> ID: 18942
> SHA1: gd16v3qkmjr2w2j35zhqitjfg97igjt
> Note: Last article in dump. Repeated twice
>
> Example 2:
> URL: http://dumps.wikimedia.org/svwikiquote/20150602/svwikiquote-20150602-pages-articles.xml.bz2
> Title: Sommarens tolv månader
> ID: 6209
> SHA1: 9yibnev7pn3atxicayjoay0ave7pcu6
> Note: Last article in dump. Repeated twice
>
> Example 3:
> URL: http://dumps.wikimedia.org/svwikibooks/20150602/svwikibooks-20150602-pages-articles.xml.bz2
> Title: Topologi/Metriska rum
> ID: 10001
> SHA1: 5zdkpxflzdxhy7gxclludnlasvl6tw3
> Note: Last article in dump. Repeated twice
>
> Example 4:
> URL: http://dumps.wikimedia.org/svwikisource/20150602/svwikisource-20150602-pages-articles.xml.bz2
> Title: Afhandling om svenska stafsättet/4
> ID: 88768
> SHA1: 7zyj208ur4vit0t41z7xlftlyl69bo7
> Note: Last article in dump. Repeated twice
>
> Example 5:
> URL: http://dumps.wikimedia.org/eswiki/20150602/eswiki-20150602-pages-articles.xml.bz2
> Title (1): Veguer
> Title (2): Promo
> Note: duplicates are earlier in the dump (Veguer at the 9% mark and
> Promo at the 23% mark). There doesn't seem to be a dupe at the end of
> the dump.
>
> Unaffected:
> * http://dumps.wikimedia.org/svwiki/20150602/svwiki-20150602-pages-articles.xml.bz2
> * http://dumps.wikimedia.org/svwiktionary/20150603/svwiktionary-20150603-pages-articles.xml.bz2
> * http://dumps.wikimedia.org/svwikinews/20150602/svwikinews-20150602-pages-articles.xml.bz2
>
> _______________________________________________
> Xmldatadumps-l mailing list
> Xmldatadumps-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
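
For anyone who wants to check a local copy of a dump for the duplicate
<page> records described in the examples above, here is a minimal sketch
in Python. It is not the method used in this thread; the function names
(local_name, find_duplicate_pages, find_duplicates_in_dump) are made up
for illustration. It assumes only that each <page> element has its own
<id> and <title> as direct children, and it streams the file so the whole
dump never has to fit in memory.

```python
import bz2
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def find_duplicate_pages(xml_stream):
    """Return {page_id: (title, count)} for every page id seen more than once."""
    counts = Counter()
    titles = {}
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        page_id = title = None
        # Only look at direct children of <page>: <revision> carries its
        # own <id> one level deeper, and that one must not be counted.
        for child in elem:
            name = local_name(child.tag)
            if name == "id":
                page_id = child.text
            elif name == "title":
                title = child.text
        if page_id is not None:
            counts[page_id] += 1
            titles[page_id] = title
        elem.clear()  # drop the finished page's subtree to keep memory low
    return {pid: (titles[pid], n) for pid, n in counts.items() if n > 1}


def find_duplicates_in_dump(path):
    """Scan a locally downloaded .xml.bz2 dump file (hypothetical helper)."""
    with bz2.open(path, "rb") as stream:
        return find_duplicate_pages(stream)
```

Calling find_duplicates_in_dump() on a downloaded file such as
svwikiversity-20150602-pages-articles.xml.bz2 should list any page id
that occurs more than once, along with its title and the number of
occurrences. The same function can be pointed at the matching stub dump
to see whether the duplicates appear there too.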