Research Engineer focused on AI agents, SWE-bench evaluation, LLM verification, and scalable ML systems.