Linguistic data

Science strives for secure knowledge, i.e. insights we can trust. To the extent that a piece of knowledge depends on a method, we therefore require that the method be precise, i.e. that it reliably lead to a definite result. A scientific method is reliable to the extent that it produces the same results on repeated application. Reliability is thus a measure of the precision of a method and of the stability of its results.

In an empirical investigation, a researcher R applies a method M – possibly involving a (measuring) instrument I – to a set of data D. All of these components must be reliable:

M is reliable with respect to R if its results do not depend on the identity of R. This aspect of reliability is called “inter-rater reliability” in psychology, but is often separated out as objectivity. For instance, if the diagnosis provided by a diagnostic method depends on which physician applies it, the method is unreliable.
M is reliable with respect to I if I produces, ceteris paribus, the same measurements for the same data. For instance, if a scale produces different weight values on repeated measurements of the same object, it is unreliable.
M is reliable with respect to D if it produces like results for like data, i.e. if it does not work only for one particular set of data. For instance, the method of getting grammaticality judgments from a group of informants is reliable to the extent that the same average grammaticality judgment is produced by the same group of informants on different occasions (and even by like groups of informants).
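
Reliability with respect to R can be made concrete with a small calculation. The following is a minimal sketch, assuming two raters who each classify the same sentences as "ok" or "bad"; it uses Cohen's kappa, one common chance-corrected agreement coefficient, which is a choice made here for illustration and not a measure prescribed by the text. The function name, the rater lists and the category labels are all hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on the same items.

    Hypothetical helper for illustration; 1.0 means perfect agreement,
    0.0 means agreement no better than chance.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty list of items")
    n = len(ratings_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same category throughout
    return (observed - expected) / (1 - expected)


# Invented judgments of eight sentences by two raters (illustrative data only).
rater_1 = ["ok", "ok", "bad", "ok", "bad", "bad", "ok", "ok"]
rater_2 = ["ok", "ok", "bad", "bad", "bad", "bad", "ok", "ok"]

kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

The closer the coefficient is to 1, the less the result depends on who applies the method. Note that kappa can fall below 0 when raters agree less often than chance would predict.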

The reliability of a method may be expressed as a coefficient between 0 and 1. The minimum value required depends on the goals of the investigation.
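
One way such a coefficient might be obtained is test-retest correlation: the same method is applied to the same data on two occasions and the two series of results are correlated. The sketch below assumes mean grammaticality ratings for five sentences collected in two sessions; the data, the variable names and the choice of the Pearson correlation are assumptions made for illustration. Strictly speaking, a correlation can be negative, but for a method that is at all usable it falls between 0 and 1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long series of measurements."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented mean ratings (scale 1 to 5) of five sentences by the same group
# of informants on two occasions (illustrative data only).
session_1 = [4.6, 1.8, 3.9, 2.2, 4.1]
session_2 = [4.4, 2.0, 3.7, 2.5, 4.3]

retest_reliability = pearson_r(session_1, session_2)
print(f"test-retest reliability = {retest_reliability:.3f}")
```

Whether a given coefficient is high enough is not decided by the calculation itself; it has to be judged against the goals of the investigation.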

A method can be valid only if it is reliable; but it may be reliable without being valid. In other words, reliability is necessary, but not sufficient for validity.