Hi everybody,
In T189051 the Analytics team introduced a new feature in the Hadoop cluster, namely the HDFS Trash directory. This means that if you use the hdfs dfs -rm CLI command you will no longer directly delete a file or a directory; instead it will be moved under /user/$yourusername/.Trash. The Trash directory is "partitioned" into daily directories (called checkpoints), and files are kept there for a month before being permanently deleted. Here's a quick FAQ about how to recover data if needed:
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster#recover_files_...
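For example, this is roughly how a deletion and a recovery would look (a minimal sketch; the file path is hypothetical, and recently deleted files sit under the Current checkpoint before being rolled into a timestamped one):

    # Deleting a file now moves it into your Trash directory
    hdfs dfs -rm /user/$USER/mydata.txt

    # List the Trash to find your file among the checkpoints
    hdfs dfs -ls /user/$USER/.Trash

    # Restore it by moving it back out of the Trash
    hdfs dfs -mv /user/$USER/.Trash/Current/user/$USER/mydata.txt /user/$USER/mydata.txt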
If you want to skip the Trash directory you can pass the -skipTrash option to hdfs dfs -rm, but of course this should be done only when you are really sure about what you are doing :)
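For reference, a permanent delete would look like this (same hypothetical path as above):

    # Bypass the Trash: the file is deleted immediately and cannot be recovered
    hdfs dfs -rm -skipTrash /user/$USER/mydata.txt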
We hope that this extra safety net will help all Hadoop users preserve data that might otherwise get deleted by mistake.
If you have comments, suggestions, etc., feel free to reach out to the Analytics team via mailing list or IRC.
Thanks!
Luca (on behalf of the Analytics team)