<html><body><div style="color:#000; background-color:#fff; font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif;font-size:12pt">Hello all.<br><br>@Tim: By "feature" I mean having the column user.user_registration filled in on the DB replicas accessible from Tool-Labs, if possible. As Oliver suggested, I see no reason for this information to be unavailable there, since it is already public via Special:ListUsers.<br><br>@Aaron: Thanks a lot. I believe that is a fairly decent approximation. In fact, I suspect that daily or weekly aggregates would be enough for time-series characterization. My actual goal is to compare trends across different languages and, eventually, to correlate them with other known activity metrics.<br><br>Best regards,<br>Felipe.<br><div><span><br></span></div><div style="display: block;" class="yahoo_quoted"> <br> <br> <div style="font-family: HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande,
sans-serif; font-size: 12pt;"> <div style="font-family: HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif; font-size: 12pt;"> <div dir="ltr"> <font face="Arial" size="2"> On Friday, 14 February 2014 16:00, Aaron Halfaker &lt;aaron.halfaker@gmail.com&gt; wrote:<br> </font> </div> <blockquote style="border-left: 2px solid rgb(16, 16, 255); margin-left: 5px; margin-top: 5px; padding-left: 5px;"> <div class="y_msg_container"><div id="yiv9596778462"><div><div dir="ltr">I have a dataset containing estimated registration dates for editors who registered before Dec. 2005. My method assumes that user_id is monotonically increasing and assigns each editor the lowest upper bound available. <div>
<br clear="none"></div><div>For example, let's assume the following rows:</div><div><br clear="none"></div><pre style="font-family:'courier new', monospace;">user_id  first_edit
12345    20040102030405
12344    NULL
12343    20040102050102</pre>
<div><font face="courier new, monospace"><br clear="none"></font></div><div><font face="arial, helvetica, sans-serif">Since an editor couldn't have saved a revision before registering, we can assume that user 12345 registered their account on or before </font><span style="font-family:'courier new', monospace;">20040102030405</span><font face="arial, helvetica, sans-serif">. If user_id is monotonically increasing, we also know that user 12344 must have registered on or before </font><font face="courier new, monospace">20040102030405</font><font face="arial, helvetica, sans-serif">, which lets us fill in the NULL. Similarly, user 12343 does have a first_edit timestamp, but that edit (</font><font face="courier new, monospace">20040102050102</font><font face="arial, helvetica, sans-serif">) happened after user 12345's first edit. Because user 12343 registered before user 12345, we can simply propagate the earlier </font><span style="font-family:'courier new', monospace;">20040102030405</span><font face="arial, helvetica, sans-serif"> timestamp to this user too.</font></div>
<div><span style="font-family:'courier new', monospace;"><br clear="none"></span></div><div><font face="arial, helvetica, sans-serif">After performing this approximation, we'd have the following rows:</font></div><div><font face="arial, helvetica, sans-serif"><br clear="none">
</font></div><div><pre style="font-family:'courier new', monospace;">user_id  first_edit      user_registration_approx
12345    20040102030405  20040102030405
12344    NULL            20040102030405
12343    20040102050102  20040102030405</pre>
</div><div><span style="font-family:'courier new', monospace;"><br clear="none"></span></div><div><font face="arial, helvetica, sans-serif">In effect, this is similar to the approximation discussed in </font><a rel="nofollow" shape="rect" target="_blank" href="https://bugzilla.wikimedia.org/show_bug.cgi?id=18638" style="font-size:13px;font-family:arial, sans-serif;">https://bugzilla.wikimedia.org/show_bug.cgi?id=18638</a>, except that I'm not trying to interpolate probable registration times for users. In practice, the difference would be a matter of seconds, so I haven't bothered with the extra work. </div>
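The upper-bound propagation described above amounts to a running minimum of first_edit taken from the highest user_id downward: each user's estimate is the earliest first_edit among that user and every user with a higher user_id. Below is a minimal sketch, assuming the first_edit values have already been loaded into a Python dict keyed by user_id (None standing in for NULL) and that user_id increases with registration time. The function name approx_registration is hypothetical and is not Aaron's actual code. Because MediaWiki timestamps are fixed-width 14-digit YYYYMMDDHHMMSS strings, plain string comparison orders them chronologically.

```python
from typing import Dict, Optional


def approx_registration(first_edit: Dict[int, Optional[str]]) -> Dict[int, Optional[str]]:
    """Hypothetical helper: upper-bound registration estimate per user_id.

    Assumes user_id is monotonically increasing with registration time, so a
    user must have registered on or before the first edit of any user with a
    higher user_id. Timestamps are 14-digit MediaWiki strings (YYYYMMDDHHMMSS),
    which sort chronologically as plain strings. None means NULL.
    """
    approx: Dict[int, Optional[str]] = {}
    running_min: Optional[str] = None
    # Walk from the highest user_id down, carrying the earliest timestamp seen so far.
    for user_id in sorted(first_edit, reverse=True):
        ts = first_edit[user_id]
        if ts is not None and (running_min is None or ts < running_min):
            running_min = ts
        # None here means no user at or above this id has any edit: no bound available.
        approx[user_id] = running_min
    return approx


if __name__ == "__main__":
    # The three example rows from the message above.
    rows = {12345: "20040102030405", 12344: None, 12343: "20040102050102"}
    print(approx_registration(rows))
    # {12345: '20040102030405', 12344: '20040102030405', 12343: '20040102030405'}
```

After sorting, this is a single pass over the users. Users whose id is above that of every user with an edit get None, since the method offers no upper bound for them.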
<div><br clear="none"></div><div>I'm now generating a data file for English that I should be able to share by the end of the day. It will contain:</div><div><ul><li>user_id</li><li>registration_type (see <a rel="nofollow" shape="rect" target="_blank" href="https://meta.wikimedia.org/wiki/Research:Attached_user">https://meta.wikimedia.org/wiki/Research:Attached_user</a> and <a rel="nofollow" shape="rect" target="_blank" href="https://meta.wikimedia.org/wiki/Research:Newly_registered_user">https://meta.wikimedia.org/wiki/Research:Newly_registered_user</a>)</li><li>user_registration (from the user table)</li><li>first_edit (earliest timestamp in the "revision" and "archive" tables for the user_id)</li><li>registration_approx (my approximation, based on the method described above)</li></ul><div>-Aaron</div></div></div><div class="yiv9596778462gmail_extra"><br clear="none"><br clear="none"><div class="yiv9596778462yqt3666556094" id="yiv9596778462yqtfd28966"><div
class="yiv9596778462gmail_quote">On Fri, Feb 14, 2014 at 6:06 AM, Federico Leva (Nemo) <span dir="ltr">&lt;<a rel="nofollow" shape="rect" ymailto="mailto:nemowiki@gmail.com" target="_blank" href="mailto:nemowiki@gmail.com">nemowiki@gmail.com</a>&gt;</span> wrote:<br clear="none">
<blockquote class="yiv9596778462gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Felipe Ortega, 14/02/2014 12:05:<div class="yiv9596778462"><br clear="none">
<blockquote class="yiv9596778462gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Thanks a lot. I look forward to the confirmation and<br clear="none">
implementation of this feature. If it would help for me to open a new issue<br clear="none">
on Bugzilla or take any other action (e.g., lending a hand with reviewing<br clear="none">
and testing the values), just let me know.<br clear="none">
</blockquote>
<br clear="none"></div>
You could help assess the correctness of, and/or write code for, the guesstimate method proposed in <a rel="nofollow" shape="rect" target="_blank" href="https://bugzilla.wikimedia.org/show_bug.cgi?id=18638">https://bugzilla.wikimedia.org/show_bug.cgi?id=18638</a>, so that the script can fill in further blanks.<div class="yiv9596778462HOEnZb">
<div class="yiv9596778462h5"><br clear="none">
<br clear="none">
Nemo<br clear="none">
<br clear="none">
_______________________________________________<br clear="none">
Labs-l mailing list<br clear="none">
<a rel="nofollow" shape="rect" ymailto="mailto:Labs-l@lists.wikimedia.org" target="_blank" href="mailto:Labs-l@lists.wikimedia.org">Labs-l@lists.wikimedia.org</a><br clear="none">
<a rel="nofollow" shape="rect" target="_blank" href="https://lists.wikimedia.org/mailman/listinfo/labs-l">https://lists.wikimedia.org/mailman/listinfo/labs-l</a><br clear="none">
</div></div></blockquote></div><br clear="none"></div></div></div></div><br><br></div> </blockquote> </div> </div> </div> </div></body></html>