<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">I doubt it was that then, if you only scanned 22. According to IT, this user was attempting to fetch all 140,000 pieces of data about minor planets and was making 160 requests a minute to that site, which was severely bogging down their servers on top of the load they already have. I think the ban was put into effect on Nov. 2.<div class=""><br class=""></div><div class="">Maybe it would be wise to have Labs simply throttle consecutive outgoing connections from Tool Labs, if possible. That is, connections made from scripts to external sites, while maintaining the status quo for the webservices. This has to have some kind of impact on network I/O and bandwidth usage for both host and client servers.</div><div class=""><br class=""><div class="">
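<div class="">On the script side, the same idea (spacing out requests to any one external host) could be sketched roughly as below. This is only an illustration under my own assumptions: HostThrottle and the 6-second interval are made-up names and values, not something Pywikibot or Labs actually provides.</div>
<pre class="">
```python
import time
from urllib.parse import urlsplit


class HostThrottle:
    """Hypothetical helper: keep a minimum delay between requests to one host."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between hits on the same host
        self.clock = clock                # injectable so the logic can be tested
        self.sleep = sleep
        self.last_hit = {}                # maps host name to time of last request

    def wait(self, url):
        """Sleep if the host of url was contacted too recently; return seconds slept."""
        host = urlsplit(url).netloc.lower()
        now = self.clock()
        previous = self.last_hit.get(host)
        delay = 0.0
        if previous is not None:
            # Remaining part of the interval, never negative.
            delay = max(0.0, self.min_interval - (now - previous))
        if delay:
            self.sleep(delay)
        self.last_hit[host] = now + delay
        return delay
```
</pre>
<div class="">A script would create one throttle, for example HostThrottle(6.0) for at most about 10 requests a minute per host, and call wait(url) before each GET. Different hosts are tracked separately, so a slow limit on one site does not hold up checks against others.</div>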
<div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Cyberpower678<br class="">English Wikipedia Account Creation Team<br class="">ACC Mailing List Moderator</div><div style="color: rgb(0, 0, 0); letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Global User Renamer</div></div></div>
</div>
<br class=""><div><blockquote type="cite" class=""><div class="">On Dec 4, 2016, at 12:29, Martin Urbanec <<a href="mailto:martin.urbanec@wikimedia.cz" class="">martin.urbanec@wikimedia.cz</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">Hi all, <div class="">I was running weblinkchecker.py for the whole of cswiki (the job was submitted to the grid at Sun, 20 Nov 2016 16:54:24 GMT) because I wanted a list of dead links. This may correspond to the UA (because I used the script named weblinkchecker.py). I trusted that this script wouldn't do anything wrong because it was and still is in the standard core package. I also use the 3.0-dev version of Pywikibot and Python 2.7.6. </div><div class=""><br class=""></div><div class="">But this job has already completed, so if those GET requests haven't stopped, I'm not the cause. Or I lost access to the job: qstat for all my tools (urbanecmbot, missingpages) and my personal account (urbanecm) is empty or shows only the webserver. </div><div class=""><br class=""></div><div class="">If I was the cause, I'm very sorry for it. As I said, I didn't know the script does not throttle GET requests enough. </div><div class=""><br class=""></div><div class="">Also, <a href="http://minorplanetcenter.net/" class="">minorplanetcenter.net</a> is linked in only 22 articles (as <a href="https://cs.wikipedia.org/w/index.php?search=insource%3Aminorplanetcenter.net&title=Speci%C3%A1ln%C3%AD:Hled%C3%A1n%C3%AD&go=J%C3%ADt+na&searchToken=507gqzzqk3eyplk5s6gsii2bv" class="">https://cs.wikipedia.org/w/index.php?search=insource%3Aminorplanetcenter.net&title=Speci%C3%A1ln%C3%AD:Hled%C3%A1n%C3%AD&go=J%C3%ADt+na&searchToken=507gqzzqk3eyplk5s6gsii2bv</a> shows), so the traffic shouldn't be as massive as described. </div><div class=""><br class=""></div><div class="">My .bash_history says the following. I guess 1479660864 is a Unix timestamp; in human-readable time it is Sun, 20 Nov 2016 16:54:24 GMT. 
</div><div class=""><br class=""></div><div class=""><div class="">#1479660864</div><div class="">jsub -l release=trusty python ~/pwb/scripts/weblinkchecker.py -start:!</div></div><div class=""><br class=""></div><div class="">My user-config.py is at <a href="http://pastebin.com/cUAwQuWt" class="">http://pastebin.com/cUAwQuWt</a>, without OAUTH. The complete user-config is at /home/urbanecm/.pywikibot/user-config.py and only root users can see it. </div><div class=""><br class=""></div><div class="">Again, if I was the cause, I'm sorry for it. I only used standard scripts and trusted that they work correctly. </div><div class=""><br class=""></div><div class="">Martin Urbanec alias Urbanecm</div><div class=""><a href="https://cs.wikipedia.org/wiki/Wikipedista:Martin_Urbanec" class="">https://cs.wikipedia.org/wiki/Wikipedista:Martin_Urbanec</a></div><div class=""><a href="https://meta.wikimedia.org/wiki/User:Martin_Urbanec" class="">https://meta.wikimedia.org/wiki/User:Martin_Urbanec</a></div><div class=""><a href="https://wikitech.wikimedia.org/wiki/User:Urbanecm" class="">https://wikitech.wikimedia.org/wiki/User:Urbanecm</a></div><div class=""><br class=""><div class="gmail_quote"><div dir="ltr" class="">On Sun, 4 Dec 2016 at 18:03, Maximilian Doerr <<a href="mailto:maximilian.doerr@gmail.com" class="">maximilian.doerr@gmail.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class="gmail_msg"><a href="https://phabricator.wikimedia.org/F4978348" class="gmail_msg" target="_blank">https://phabricator.wikimedia.org/F4978348</a> Done.<div class="gmail_msg"><br class="gmail_msg"><div class="gmail_msg">
<div style="letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; word-wrap: break-word;" class="gmail_msg"><div style="letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; word-wrap: break-word;" class="gmail_msg"><div style="letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; word-wrap: break-word;" class="gmail_msg"></div></div></div></div></div></div><div style="word-wrap:break-word" class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg"><div style="letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; word-wrap: break-word;" class="gmail_msg"><div style="letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; word-wrap: break-word;" class="gmail_msg"><div style="letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; word-wrap: break-word;" class="gmail_msg">Cyberpower678<br class="gmail_msg">English Wikipedia Account Creation Team<br class="gmail_msg"></div></div></div></div></div></div><div style="word-wrap:break-word" class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg"><div style="letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; word-wrap: break-word;" class="gmail_msg"><div style="letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; word-wrap: break-word;" class="gmail_msg"><div style="letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; word-wrap: break-word;" class="gmail_msg">ACC Mailing List Moderator</div><div 
style="letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; word-wrap: break-word;" class="gmail_msg">Global User Renamer</div></div></div>
</div>
<br class="gmail_msg"><div class="gmail_msg"><blockquote type="cite" class="gmail_msg"><div class="gmail_msg">On Dec 4, 2016, at 11:49, Merlijn van Deen (valhallasw) <<a href="mailto:valhallasw@arctus.nl" class="gmail_msg" target="_blank">valhallasw@arctus.nl</a>> wrote:</div><br class="gmail_msg m_8668639065164812549Apple-interchange-newline"><div class="gmail_msg"><div dir="ltr" class="gmail_msg">Hi Maximilian,<div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg"><a href="https://phabricator.wikimedia.org/file/upload/" class="gmail_msg" target="_blank">https://phabricator.wikimedia.org/file/upload/</a> allows you to specify 'Visible to'. You can select 'Custom policy' and select the relevant users, i.e.<br class="gmail_msg"></div><div class="gmail_msg"><span id="m_8668639065164812549cid:ii_158cabce8cde097a" class="gmail_msg"><image.png></span><br class="gmail_msg"></div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">In the meantime, I'll try to figure out if I can get some information from netstat.</div><div class="gmail_msg"><br class="gmail_msg"></div><div class="gmail_msg">Cheers,</div><div class="gmail_msg">Merlijn</div></div><div class="gmail_extra gmail_msg"><br class="gmail_msg"><div class="gmail_quote gmail_msg">On 4 December 2016 at 17:36, Maximilian Doerr <span dir="ltr" class="gmail_msg"><<a href="mailto:maximilian.doerr@gmail.com" class="gmail_msg" target="_blank">maximilian.doerr@gmail.com</a>></span> wrote:<br class="gmail_msg"><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="EN-US" link="blue" vlink="purple" class="gmail_msg"><div class="m_8668639065164812549m_-2916909184822639306WordSection1 gmail_msg"><p class="MsoNormal gmail_msg"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif" class="gmail_msg">Sure, how would I be able to restrict its visibility? 
Harvard is kind enough to unblock us if the culprit is stopped.<u class="gmail_msg"></u><u class="gmail_msg"></u></span></p><p class="MsoNormal gmail_msg"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif" class="gmail_msg"><u class="gmail_msg"></u> <u class="gmail_msg"></u></span></p><p class="MsoNormal gmail_msg"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif" class="gmail_msg">As for exact URLs, it’s the entire set of domains owned by Harvard, but the access log can provide specifics. According to IT, the Python script is attempting to get all 140,000 pieces of data about minor planets from <a href="http://www.minorplanetcenter.net/" class="gmail_msg" target="_blank">www.minorplanetcenter.net</a>; they also say that doing this the way it is being done now would severely tie up their servers for quite a while, which they cannot afford.<u class="gmail_msg"></u><u class="gmail_msg"></u></span></p></div></div></blockquote></div></div></div></blockquote></div></div></div><div style="word-wrap:break-word" class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg"><blockquote type="cite" class="gmail_msg"><div class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="EN-US" link="blue" vlink="purple" class="gmail_msg"><div class="m_8668639065164812549m_-2916909184822639306WordSection1 gmail_msg"><span class="gmail_msg"><p class="MsoNormal gmail_msg"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif" class="gmail_msg"><u class="gmail_msg"></u> <u class="gmail_msg"></u></span></p><p class="MsoNormal gmail_msg"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif" class="gmail_msg">Cyberpower678<u class="gmail_msg"></u><u class="gmail_msg"></u></span></p><p class="MsoNormal gmail_msg"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif" class="gmail_msg">English 
Wikipedia Account Creation Team<u class="gmail_msg"></u><u class="gmail_msg"></u></span></p><p class="MsoNormal gmail_msg"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif" class="gmail_msg">Mailing List Moderator<u class="gmail_msg"></u><u class="gmail_msg"></u></span></p><p class="MsoNormal gmail_msg"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif" class="gmail_msg">Global User Renamer<u class="gmail_msg"></u><u class="gmail_msg"></u></span></p><p class="MsoNormal gmail_msg"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif" class="gmail_msg"><u class="gmail_msg"></u> <u class="gmail_msg"></u></span></p></span></div></div></blockquote></div></div></div></blockquote></div></div></div><div style="word-wrap:break-word" class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg"><blockquote type="cite" class="gmail_msg"><div class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="EN-US" link="blue" vlink="purple" class="gmail_msg"><div class="m_8668639065164812549m_-2916909184822639306WordSection1 gmail_msg"><p class="MsoNormal gmail_msg"><b class="gmail_msg"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif" class="gmail_msg">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif" class="gmail_msg"> Merlijn van Deen (valhallasw) [mailto:<a href="mailto:valhallasw@arctus.nl" class="gmail_msg" target="_blank">valhallasw@arctus.nl</a>] <br class="gmail_msg"><b class="gmail_msg">Sent:</b> Sunday, December 4, 2016 10:59<br class="gmail_msg"><b class="gmail_msg">To:</b> <a href="mailto:maximilian.doerr@gmail.com" class="gmail_msg" target="_blank">maximilian.doerr@gmail.com</a><br class="gmail_msg"><b class="gmail_msg">Subject:</b> Re: [Labs-l] Some using a Python framework is relentlessly hammering Harvard sites, resulting an IP range 
ban.<u class="gmail_msg"></u><u class="gmail_msg"></u></span></p><div class="gmail_msg"><div class="m_8668639065164812549h5 gmail_msg"><p class="MsoNormal gmail_msg"><u class="gmail_msg"></u> <u class="gmail_msg"></u></p><div class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg"><p class="MsoNormal gmail_msg">Hi Maximilian,<u class="gmail_msg"></u><u class="gmail_msg"></u></p></div><div class="gmail_msg"><p class="MsoNormal gmail_msg"><u class="gmail_msg"></u> <u class="gmail_msg"></u></p></div><div class="gmail_msg"></div></div></div></div></div></div></div></blockquote></div></div></div></blockquote></div></div></div><div style="word-wrap:break-word" class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg"><blockquote type="cite" class="gmail_msg"><div class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="EN-US" link="blue" vlink="purple" class="gmail_msg"><div class="m_8668639065164812549m_-2916909184822639306WordSection1 gmail_msg"><div class="gmail_msg"><div class="m_8668639065164812549h5 gmail_msg"><div class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg"><p class="MsoNormal gmail_msg">On 4 December 2016 at 05:51, Maximilian Doerr <<a href="mailto:maximilian.doerr@gmail.com" class="gmail_msg" target="_blank">maximilian.doerr@gmail.com</a>> wrote:<u class="gmail_msg"></u><u class="gmail_msg"></u></p><blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in" class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg"><p class="MsoNormal gmail_msg">Would the user who is querying the Harvard sites for planet data, that is carrying the UA “weblinkchecker Pywikibot/3.0-dev (g7171) requests/2.2.1 Python/2.7.6.final.0”, please stop, or severely throttle the GET requests. 
It’s making 168 requests to that site a minute, and consequently they banned labs from accessing it, according to the IT department there, who kindly shared with me the access log.<u class="gmail_msg"></u><u class="gmail_msg"></u></p><p class="MsoNormal gmail_msg"><u class="gmail_msg"></u> <u class="gmail_msg"></u></p></div></div></blockquote><div class="gmail_msg"><p class="MsoNormal gmail_msg"><u class="gmail_msg"></u> <u class="gmail_msg"></u></p></div></div></div></div></div></div></div></div></blockquote></div></div></div></blockquote></div></div></div><div style="word-wrap:break-word" class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg"><blockquote type="cite" class="gmail_msg"><div class="gmail_msg"><div class="gmail_extra gmail_msg"><div class="gmail_quote gmail_msg"><blockquote class="gmail_quote gmail_msg" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="EN-US" link="blue" vlink="purple" class="gmail_msg"><div class="m_8668639065164812549m_-2916909184822639306WordSection1 gmail_msg"><div class="gmail_msg"><div class="m_8668639065164812549h5 gmail_msg"><div class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg"><div class="gmail_msg"><p class="MsoNormal gmail_msg">Would you be able to share the access log with the Tools admins (say, via Phabricator, only shared to Yuvi, Bryan Davis, Andrew Bogott, Chase, scfc and me)? 
From the combination of external IP and timestamp we may be able to pinpoint which tool was causing this.<u class="gmail_msg"></u><u class="gmail_msg"></u></p></div><div class="gmail_msg"><p class="MsoNormal gmail_msg"><u class="gmail_msg"></u> <u class="gmail_msg"></u></p></div><div class="gmail_msg"><p class="MsoNormal gmail_msg">Can you also clarify which exact URLs we are talking about?<u class="gmail_msg"></u><u class="gmail_msg"></u></p></div><div class="gmail_msg"><p class="MsoNormal gmail_msg"><u class="gmail_msg"></u> <u class="gmail_msg"></u></p></div><div class="gmail_msg"><p class="MsoNormal gmail_msg">Cheers,<u class="gmail_msg"></u><u class="gmail_msg"></u></p></div><div class="gmail_msg"><p class="MsoNormal gmail_msg">Merlijn<u class="gmail_msg"></u><u class="gmail_msg"></u></p></div></div></div></div></div></div></div></div></blockquote></div><br class="gmail_msg"></div>
</div></blockquote></div><br class="gmail_msg"></div></div>_______________________________________________<br class="gmail_msg">
Labs-l mailing list<br class="gmail_msg">
<a href="mailto:Labs-l@lists.wikimedia.org" class="gmail_msg" target="_blank">Labs-l@lists.wikimedia.org</a><br class="gmail_msg">
<a href="https://lists.wikimedia.org/mailman/listinfo/labs-l" rel="noreferrer" class="gmail_msg" target="_blank">https://lists.wikimedia.org/mailman/listinfo/labs-l</a><br class="gmail_msg">
</blockquote></div></div></div>
_______________________________________________<br class="">Labs-l mailing list<br class=""><a href="mailto:Labs-l@lists.wikimedia.org" class="">Labs-l@lists.wikimedia.org</a><br class="">https://lists.wikimedia.org/mailman/listinfo/labs-l<br class=""></div></blockquote></div><br class=""></div></body></html>