There are several formulas that can be used to calculate limits of agreement. The simple formula given in the previous paragraph works well for sample sizes greater than 60.[14]

There are several operational definitions of "inter-rater reliability" that reflect different views on what constitutes reliable agreement between raters.[1] There are three operational definitions of agreement. A number of statistics can be used to determine inter-rater reliability, and different statistics are suited to different types of measurement. Some options are the joint probability of agreement, Cohen's kappa, Scott's pi and the related Fleiss' kappa, inter-rater correlation, the concordance correlation coefficient, intra-class correlation and Krippendorff's alpha.

If the number of categories used is small (e.g. 2 or 3), the probability of two raters agreeing by pure chance increases considerably. This is because both raters must confine themselves to the limited number of options available, which affects the overall agreement rate but not necessarily their propensity for "intrinsic" agreement (agreement is considered "intrinsic" if it is not due to chance). Therefore, the joint probability of agreement will remain high even in the absence of any "intrinsic" agreement between the raters. A useful inter-rater reliability coefficient is expected (a) to be close to 0 when there is no "intrinsic" agreement and (b) to increase as the "intrinsic" agreement rate improves. Most chance-corrected agreement coefficients achieve the first objective. However, the second objective is not achieved by many well-known chance-corrected measures.[4]

If the raters tend to agree, the differences between their observations will be close to zero.
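To make the chance-agreement problem described above concrete, the following minimal Python sketch compares the joint probability of agreement with Cohen's kappa for two raters whose ratings are statistically independent of each other. The data and the function names `joint_agreement` and `cohens_kappa` are hypothetical and chosen only for illustration; kappa is computed in its standard form, (p_o - p_e) / (1 - p_e).

```python
from collections import Counter

def joint_agreement(r1, r2):
    """Fraction of items on which the two raters gave the same label."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Cohen's kappa, (p_o - p_e) / (1 - p_e), for two raters.

    Undefined (division by zero) when chance agreement p_e equals 1.
    """
    n = len(r1)
    p_o = joint_agreement(r1, r2)
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement computed from each rater's marginal label frequencies.
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical data: 25 items, two categories. Each rater says "yes" 80% of
# the time, and the cell counts (16, 4, 4, 1) match what independence predicts.
rater_a = ["yes"] * 16 + ["yes"] * 4 + ["no"] * 4 + ["no"] * 1
rater_b = ["yes"] * 16 + ["no"] * 4 + ["yes"] * 4 + ["no"] * 1

print("joint agreement:", joint_agreement(rater_a, rater_b))
print("Cohen's kappa:  ", cohens_kappa(rater_a, rater_b))
```

With these made-up ratings the raters agree on 17 of 25 items, yet that is exactly the agreement expected by chance from their marginal frequencies, so kappa comes out at zero.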

If one rater is generally higher or lower than the other by a consistent amount, the bias differs from zero. If the raters tend to disagree, but without a consistent pattern of one rating being higher than the other, the mean difference will be close to zero. Confidence limits (generally 95%) can be calculated for the bias and for each of the limits of agreement.

Krippendorff's alpha[16][17] is a versatile statistic that assesses the agreement among observers who categorize, evaluate or measure a given set of objects in terms of the values of a variable. It generalizes several specialized agreement coefficients by accepting any number of observers, being applicable to nominal, ordinal, interval and ratio levels of measurement, being able to handle missing data, and being corrected for small sample sizes.

Kappa is similar to a correlation coefficient in that it cannot go above +1.0 or below -1.0. Because it is used as a measure of agreement, only positive values are expected in most situations; negative values would indicate systematic disagreement. Kappa can only reach very high values when agreement is good and the rate of the target condition is close to 50% (because it incorporates the base rate in the calculation of joint probabilities).
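The dependence of kappa on the base rate can be illustrated with a small toy model. The sketch below assumes, purely for illustration, two raters who each label the true condition correctly with the same probability and independently of each other; the function name `kappa_for_prevalence` and the accuracy value of 0.9 are hypothetical and not taken from the text.

```python
def kappa_for_prevalence(prevalence, accuracy=0.9):
    """Return (observed agreement, Cohen's kappa) under a toy model.

    Assumption: each rater labels the true state correctly with probability
    `accuracy`, independently of the other rater.
    """
    p, acc = prevalence, accuracy
    both_yes = p * acc**2 + (1 - p) * (1 - acc)**2
    both_no = p * (1 - acc)**2 + (1 - p) * acc**2
    p_o = both_yes + both_no                    # observed agreement
    yes_rate = p * acc + (1 - p) * (1 - acc)    # each rater's marginal "yes" rate
    p_e = yes_rate**2 + (1 - yes_rate)**2       # agreement expected by chance
    return p_o, (p_o - p_e) / (1 - p_e)

for prev in (0.5, 0.3, 0.1, 0.05):
    p_o, kappa = kappa_for_prevalence(prev)
    print(f"prevalence={prev:.2f}  observed agreement={p_o:.2f}  kappa={kappa:.2f}")
```

Under this model the observed agreement is the same at every prevalence, while kappa is largest at a prevalence of 50% and shrinks as the target condition becomes rarer.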

Several authorities have proposed "rules of thumb" for interpreting the level of agreement, many of which agree in the gist even though the wording is not identical.[8][9][10][11]

When comparing two measurement methods, it is of interest not only to estimate the bias and limits of agreement between the two methods (inter-rater agreement), but also to assess these characteristics for each method within itself. It is quite possible that the agreement between two methods is poor simply because one method has wide limits of agreement while the other has narrow ones. In this case, the method with the narrow limits of agreement would be superior from a statistical point of view, although practical or other considerations could alter that assessment.
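The comparison between and within methods can be sketched in a few lines of Python. The example assumes the commonly used form of the limits of agreement, mean difference ± 1.96 standard deviations of the differences; the text above does not reproduce its "simple formula", so this choice is an assumption. The readings and the function name `limits_of_agreement` are hypothetical, and five subjects are far fewer than the sample size of 60 mentioned earlier, so the numbers are illustrative only.

```python
import statistics

def limits_of_agreement(x, y, z=1.96):
    """Return (bias, lower, upper) for paired readings x and y."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - z * sd, bias + z * sd

# Hypothetical data: each method measures the same five subjects twice.
method_a_run1 = [5.0, 6.1, 7.2, 8.0, 9.1]
method_a_run2 = [5.1, 6.0, 7.1, 8.1, 9.0]
method_b_run1 = [5.5, 5.2, 8.0, 7.1, 10.0]
method_b_run2 = [4.6, 6.8, 6.5, 8.9, 8.4]

for label, x, y in [
    ("A vs B (between methods)", method_a_run1, method_b_run1),
    ("A repeated (within method)", method_a_run1, method_a_run2),
    ("B repeated (within method)", method_b_run1, method_b_run2),
]:
    bias, lower, upper = limits_of_agreement(x, y)
    print(f"{label}: bias={bias:+.2f}, limits=({lower:+.2f}, {upper:+.2f})")
```

In this made-up data set, method A repeats itself closely while method B does not, so the wide limits between A and B would be driven largely by B's own variability.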