help

I would like to unsubscribe.

On Tue, Jul 28, 2020 at 1:01 PM <xmldatadumps-l-request@lists.wikimedia.org> wrote:
Send Xmldatadumps-l mailing list submissions to
        xmldatadumps-l@lists.wikimedia.org

To subscribe or unsubscribe via the World Wide Web, visit
        https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
or, via email, send a message with subject or body 'help' to
        xmldatadumps-l-request@lists.wikimedia.org

You can reach the person managing the list at
        xmldatadumps-l-owner@lists.wikimedia.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Xmldatadumps-l digest..."


Today's Topics:

   1. Has anyone had success with data deduplication? (griffin tucker)
   2. Re: Has anyone had success with data deduplication? (Count Count)


----------------------------------------------------------------------

Message: 1
Date: Tue, 28 Jul 2020 01:50:09 +0000
From: griffin tucker <gtucker4.une@hotmail.com>
To: "xmldatadumps-l@lists.wikimedia.org"
        <xmldatadumps-l@lists.wikimedia.org>
Subject: [Xmldatadumps-l] Has anyone had success with data
        deduplication?
Message-ID:
        <TY2PR03MB3997DB2177073F2871ABE5E2D2730@TY2PR03MB3997.apcprd03.prod.outlook.com>

Content-Type: text/plain; charset="utf-8"

I've tried using FreeNAS/TrueNAS with a data-deduplication volume to store multiple sequential dumps, but it doesn't seem to save much space at all. I was hoping someone could point me in the right direction so that I can download multiple dumps without them taking up so much room (uncompressed).

Has anyone tried anything similar and had success with data deduplication?

Is there a guide?
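
As a rough way to check what deduplication is actually doing on such a setup, here is a sketch (the pool name "tank" and dataset "tank/dumps" are made up, and it assumes the stock zfs/zpool command-line tools are installed):

# Hypothetical names: pool "tank", dataset "tank/dumps".
# Uses only the standard ZFS command-line tools via subprocess.
import subprocess

def run(*args):
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout

# Is dedup enabled on the dataset, and what block (record) size is in use?
print(run("zfs", "get", "dedup,recordsize,compression", "tank/dumps"))
# What deduplication ratio is the pool actually achieving?
print(run("zpool", "get", "dedupratio", "tank"))

If the reported dedupratio stays near 1.00x, the stored blocks simply are not repeating at the filesystem's block size.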

------------------------------

Message: 2
Date: Tue, 28 Jul 2020 07:47:50 +0200
From: Count Count <countvoncount123456@gmail.com>
To: griffin tucker <gtucker4.une@hotmail.com>
Cc: "xmldatadumps-l@lists.wikimedia.org"
        <xmldatadumps-l@lists.wikimedia.org>
Subject: Re: [Xmldatadumps-l] Has anyone had success with data
        deduplication?
Message-ID:
        <CAOHwkzAC1JjHHz3W8gwpqvzDRddhD5i73sowsSv4x7oGyGVp9Q@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi!

The underlying filesystem (ZFS) uses block-level deduplication, so
identical 128 KiB blocks (the default record size) are stored only once.
The 128 KiB blocks making up successive dumps are mostly unique, because
the same content does not fall on the same block boundaries across dumps,
so deduplication will not help as far as I can see.
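
To make the alignment point concrete, here is a rough sketch (the dump file names are hypothetical) that hashes fixed-size 128 KiB blocks of two consecutive dumps the way block-level dedup would, and counts how many blocks the two files actually have in common:

import hashlib

BLOCK = 128 * 1024  # matches the default ZFS recordsize

def block_hashes(path):
    # Hash each fixed-size block of the file, as block-level dedup sees it.
    hashes = set()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(BLOCK)
            if not chunk:
                break
            hashes.add(hashlib.sha256(chunk).digest())
    return hashes

# Hypothetical file names: substitute two dumps you actually have on disk.
a = block_hashes("enwiki-20200601-pages-articles.xml")
b = block_hashes("enwiki-20200701-pages-articles.xml")
print("blocks shared between the two dumps:", len(a & b), "of", len(a))

A shared count near zero is exactly the alignment effect described above: the same text sits at different byte offsets in each dump, so almost no 128 KiB block repeats verbatim.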

Best regards,

Count Count


------------------------------

Subject: Digest Footer

_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l


------------------------------

End of Xmldatadumps-l Digest, Vol 119, Issue 7
**********************************************