Fleiss’ kappa is a statistical measure of inter-rater reliability that assesses the agreement between three or more raters when they classify items into categories, accounting for agreement that would occur by chance. Like Cohen’s kappa, it ranges from -1 to 1, where 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values suggest worse-than-chance agreement. Under the widely cited Landis and Koch benchmarks, values from 0.41 to 0.60 are considered moderate agreement, values from 0.61 to 0.80 substantial agreement, and values above 0.80 almost perfect agreement, though interpretation standards can vary depending on the field.
Fleiss’ kappa is designed for categorical (nominal) data and works with ratings made by multiple independent raters. It assumes that each item is rated by the same number of raters and that categories are mutually exclusive.
For two-rater situations, Cohen’s kappa is the usual choice. Because Fleiss’ kappa treats categories as unordered, it ignores how far apart two ratings are; for ordinal data, missing ratings, or other more complex designs, consider Krippendorff’s alpha as a more appropriate reliability measure.
For example, if five pathologists independently classify biopsy samples as benign or malignant, Fleiss’ kappa can evaluate how consistently they agree, adjusting for chance agreement. If the kappa value is 0.68, this would typically be interpreted as substantial agreement, suggesting that the pathologists’ ratings are reasonably reliable.
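The calculation behind such a value can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `fleiss_kappa`, the labels "benign" and "malignant", and the four sample ratings are all hypothetical and chosen only to mirror the pathologist scenario above. It assumes every item was rated by the same number of raters.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Compute Fleiss' kappa.

    ratings: list of items, each a list of category labels (one per rater).
    categories: list of all possible category labels.
    Assumes every item has the same number of ratings (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number of ratings (>= 2)")

    # Count how many raters assigned each category to each item.
    counts = [Counter(item) for item in ratings]

    # Observed agreement per item: share of rater pairs that agree.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall proportion of each category.
    proportions = [
        sum(c[k] for c in counts) / (n_items * n_raters) for k in categories
    ]
    p_expected = sum(p ** 2 for p in proportions)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical data: four biopsy samples, five pathologists each.
samples = [
    ["benign"] * 5,
    ["malignant"] * 5,
    ["benign", "benign", "benign", "malignant", "malignant"],
    ["benign", "benign", "benign", "benign", "malignant"],
]
kappa = fleiss_kappa(samples, ["benign", "malignant"])
print(round(kappa, 3))
```

The function raises an error when all ratings fall into a single category, since chance agreement is then 1 and the statistic is undefined.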