Hi Giles,
I regret I will probably not be available for the IRC office hours as scheduled.
In the discussion of shared hosting, I worry that en:User:Dispenser's reflinks project, which requires a 20 TB cache, is being forgotten again. He tried to host it himself, but it's offline again. This data is essential in maintaining an audit trail of references as long as the Internet Archive respects robots.txt retroactively, allowing those who inherit domains to censor them, even if they have already been used as a reference in Wikipedia. Keeping the cache is absolutely a fair use right in the US, in both statutory and case law, and it is essential to be able to track down patterns of attempts at deceptive editing to address quality concerns around deliberately biased editing such as paid editing. Because of the sensitivity of this goal, the Foundation should certainly bear the risk of hosting the reflinks cache. However, in the past, 20 TB was considered excessive, even though the cost was shown to be less than $5000 without whatever Dell NSA-enabled hardware you usually buy.
Would you please reach out to en:User:Dispenser and offer them the 20TB hosting solution they need for the Foundation to bear the risk of the reflinks cache? Thank you for your kind consideration.
Best regards, Jim
Were there any objections to my request below?
Can we also please hire additional database, system, and if necessary network administration support to make sure that the third party spam prevention bot infrastructure is supported more robustly in the future?
On Monday, December 14, 2015, James Salsman jsalsman@gmail.com wrote:
> Hi Giles,
> I regret I will probably not be available for the IRC office hours as scheduled.
> In the discussion of shared hosting, I worry that en:User:Dispenser's reflinks project, which requires a 20 TB cache, is being forgotten again. He tried to host it himself, but it's offline again. This data is essential in maintaining an audit trail of references as long as the Internet Archive respects robots.txt retroactively, allowing those who inherit domains to censor them, even if they have already been used as a reference in Wikipedia. Keeping the cache is absolutely a fair use right in the US, in both statutory and case law, and it is essential to be able to track down patterns of attempts at deceptive editing to address quality concerns around deliberately biased editing such as paid editing. Because of the sensitivity of this goal, the Foundation should certainly bear the risk of hosting the reflinks cache. However, in the past, 20 TB was considered excessive, even though the cost was shown to be less than $5000 without whatever Dell NSA-enabled hardware you usually buy.
> Would you please reach out to en:User:Dispenser and offer them the 20TB hosting solution they need for the Foundation to bear the risk of the reflinks cache? Thank you for your kind consideration.
> Best regards, Jim
On 12/20/15, James Salsman jsalsman@gmail.com wrote:
> Were there any objections to my request below?
Yes. As MaxSem said earlier[1], it's basically being ignored as totally irrelevant to the topic at hand. (To be clear: "third party" does not mean people other than the WMF who are doing work on Wikimedia sites. Third party = wikis that have nothing to do with Wikimedia wikis, e.g. Wikia, wikiHow, Uncyclopedia, etc.)
If you want to get Dispenser his hard disk space, you should take it up with the Labs people, or at the very least in some thread where it would be on-topic.
> Can we also please hire additional database, system, and if necessary network administration support to make sure that the third party spam prevention bot infrastructure is supported more robustly in the future?
Then by definition it wouldn't be a third-party spam framework if the WMF were running it.
-- -bawolff
[1] https://lists.wikimedia.org/pipermail/wikitech-l/2015-December/084326.html [Linking because this thread is super-cross posted, and some people are going to be confused as to what I'm referring to]
wikimedia-l@lists.wikimedia.org