AI Proof Grading
initial results
Analyzed 780+ graded mathematical proofs to compare LLM vs. human grader consistency and accuracy. Identified feedback errors and applied statistical testing to evaluate grading reliability.
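The summary does not say which statistical test was used. As one illustration of how LLM and human grader consistency can be measured, the sketch below computes Cohen's kappa, a chance-corrected agreement score. The choice of kappa, the function name `cohen_kappa`, and the grade lists are all assumptions made for illustration; none of them are data or methods from the project.

```python
from collections import Counter


def cohen_kappa(grades_a, grades_b):
    """Cohen's kappa for two graders' labels on the same items."""
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("need two equal-length, non-empty grade lists")
    n = len(grades_a)
    # Observed agreement: fraction of proofs where both graders gave the same grade.
    observed = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    # Chance agreement: computed from each grader's marginal grade frequencies.
    freq_a = Counter(grades_a)
    freq_b = Counter(grades_b)
    expected = sum(freq_a[g] * freq_b[g] for g in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both graders used one identical grade throughout
    return (observed - expected) / (1 - expected)


# Hypothetical grades on a 0-2 scale (illustrative only, not project data).
human = [2, 1, 0, 2, 1, 2, 0, 1]
llm = [2, 1, 1, 2, 0, 2, 0, 1]
print(round(cohen_kappa(human, llm), 3))  # prints 0.619
```

A kappa near 1 indicates agreement well above chance, while a value near 0 indicates agreement no better than chance. For ordered grade scales a weighted variant may be a better fit, since it penalizes near-misses less than large disagreements.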