[Fix]:
A link to the source code has been added.
@Dan Andreescu: The format is correct. A year is a typical basic
statistical interval, and merging into an annual summary saves time. The
file size problem disappears if the file is split by local wiki. The
skwiki file, for example, is only 49 MB for the year 2021, which does not
place high demands on the end user who processes the data for their own
purposes.
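
For readers following the thread, here is a minimal Python sketch of how one line of the pageviews complete format quoted below could be decoded. The field names, the helper names `parse_line` and `decode_hourly`, and the reading of the letters A to X as hours 0 to 23 are illustrative assumptions, not something confirmed in this thread; only the sample line itself is taken from it.

```python
import re
from typing import Dict, NamedTuple


class PageviewLine(NamedTuple):
    # Field names are assumptions chosen for illustration.
    wiki: str
    title: str
    page_id: str
    access: str
    total: int
    hourly: Dict[int, int]


def decode_hourly(encoded: str) -> Dict[int, int]:
    """Decode a string such as 'B2D2' into {hour: views}.

    Assumes each letter A..X marks an hour 0..23 and is followed by
    the view count for that hour; hours with no views are omitted.
    """
    return {
        ord(letter) - ord("A"): int(count)
        for letter, count in re.findall(r"([A-X])(\d+)", encoded)
    }


def parse_line(line: str) -> PageviewLine:
    # Six whitespace-separated fields, as in the sample line from the thread.
    wiki, title, page_id, access, total, encoded = line.split()
    return PageviewLine(wiki, title, page_id, access, int(total),
                        decode_hourly(encoded))


example = "sk.wikipedia Kreheľ null desktop 13 B2D2G2J2O2T1V1X1"
record = parse_line(example)
print(record.total, sum(record.hourly.values()))
```

For the sample line, the decoded hourly counts add up to the stated total of 13, and hours without views simply do not appear in the result.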
2022-11-08 21:30 GMT+01:00, Dušan Kreheľ <dusankrehel(a)gmail.com>:
> A link to the source code has been added.
>
> @Dan Andreescu: The format is correct now. The annual summary is a
> typical basic statistical interval, and we save time by merging. The
> file size problem disappears if the file is split by wiki. And the
> skwiki has only 49MB for the year 2021, which does not demand much of
> the end user who processes them for their purpose.
>
> 2022-10-06 19:31 GMT+02:00, Dan Andreescu <dandreescu(a)wikimedia.org>:
>> @Dušan Kreheľ: I think there's a misunderstanding. I read your re-written
>> article. In it, you say that the current format is:
>>
>> domain_code page_title count_views total_response_size
>>
>> For an example, you give this:
>>
>> sk Kreheľ 2 0
>>
>> But, actually, that format is deprecated and the new format is pageviews
>> complete, which looks like this:
>>
>> sk.wikipedia Kreheľ null desktop 13 B2D2G2J2O2T1V1X1
>>
>> The B2D2G2J2O2T1V1X1 is exactly the kind of encoding you're talking about,
>> and no 0-values are present.
>>
>> You made the point that we are missing a yearly rollup in this new format.
>> This would be quite a large file, but if there's a good use case for such a
>> dump, a request in phabricator is a good way to proceed.
>>
>> On Sat, Oct 1, 2022 at 9:58 AM Dušan Kreheľ <dusankrehel(a)gmail.com>
>> wrote:
>>
>>> The big update of the article is done. Please take a look.
>>>
>>> Gergő Tisza: The current fresh hourly format can remain. It can be
>>> converted to another format later and thus be made more suitable for others.
>>>
>>> 2022-09-18 22:35 GMT+02:00, Dušan Kreheľ <dusankrehel(a)gmail.com>:
>>> > I have updated the document. I added the export of human pageviews for
>>> > year 2021. The statistics are in the article. A download link has been
>>> > added.
>>> >
>>> > Dan Andreescu: There was no problem understanding you.
>>> >
>>> > 2022-09-05 21:48 GMT+02:00, Dan Andreescu <dandreescu(a)wikimedia.org>:
>>> >> Hi Dušan,
>>> >>
>>> >> I added the details on pageviews_complete to the talk page on your
>>> >> proposal
>>> >> <https://en.wikipedia.org/w/index.php?title=User_talk:Du%C5%A1an_Krehe%C4%BE…>.
>>> >> Please let me know if it's still confusing.
>>> >>
>>> >
>>> _______________________________________________
>>> Wikitech-l mailing list -- wikitech-l(a)lists.wikimedia.org
>>> To unsubscribe send an email to wikitech-l-leave(a)lists.wikimedia.org
>>> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
>>
>