A massive audit reveals that machines grading machines can be perfectly reliable while being consistently unreliable