A method for grading domains like medicine and science using instance-specific criteria.

I. Introduction

In a standard RL loop, an agent takes an action within an environment and receives a reward.