In principle, the concept of ‘validity’ is straightforward: a valid instrument measures what it purports to measure. A correctly calibrated ruler, for example, gives a valid measurement of distance.

In practice, validation is a complex subject. It is discussed in most textbooks concerned with instrument construction. For a simple but comprehensive explanation, see the excellent web-based notes by William Trochim.


The present notes focus primarily upon the validity of multi-attribute utility (MAU) instruments. This context adds two layers of complexity. The first is attributable to the breadth of the concept: health is multi-dimensional, and overall validity requires validity in all dimensions. The second is that, to qualify as ‘utility’ in the sense used by health economists, an instrument must have properties beyond those usually discussed in the psychometrics literature.

Box 1: A Common Misunderstanding
The concept of validity is widely misunderstood, particularly the phrase ‘an instrument has been validated’. Many are misled by the compelling connotations of the word ‘validated’, which implies a generality and finality that are not warranted. The property we seek is more aptly described as (degrees of) confidence in an instrument in a particular context, rather than a universal ‘true-false’ stamp.


With psychological constructs such as intelligence (IQ) or quality of life (QoL), establishing validity is problematic because, unlike physical measurement, there is no ‘gold standard’ (see ‘What is a “gold standard”’). A concept such as intelligence is commonly the result of a number of elements: verbal, numerical and spatial skills, problem solving, memory, etc. In turn, each of these may not be clearly identified by the answer to a simple question, but may require a series of questions and answers. Further, the precise meaning of terms and questions can vary between individuals and cultures in ways related to personal circumstances. As an example, ‘communication’ may mean speaking to some and signing to others, face-to-face contact for some and texting for others.

To overcome this problem, psychometric theory uses some variant of factor analysis to create measurement instruments. Answers to questions are analysed for their relationships, and answers which cluster around a concept (that is, answers that correlate) are accepted as a measure of that concept. This is illustrated in Figure 1, in which two constructs or concepts are represented in ‘content space’ by the heavy bold circles. A series of questions and answers is represented by the various rectangles. As shown, three of these heavily overlap Concept 1 and three overlap Concept 2. A seventh item, Item 7, crosses both concepts. In the terminology of factor analysis, this last item ‘cross loads’ on the two concepts and would normally be eliminated from the items used in an instrument.
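The screening step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the loadings are invented to mirror the seven-item, two-concept picture, the function name classify_items is hypothetical, and the 0.4 cut-off is a commonly quoted rule of thumb rather than a value taken from these notes. In practice the loadings would come from a factor analysis of real response data.

```python
# Hypothetical loadings of seven items on two concepts. The numbers are
# invented for illustration and do not come from any real instrument.
LOADINGS = {
    "Item 1": (0.78, 0.05),
    "Item 2": (0.71, 0.12),
    "Item 3": (0.66, 0.09),
    "Item 4": (0.10, 0.74),
    "Item 5": (0.03, 0.69),
    "Item 6": (0.15, 0.81),
    "Item 7": (0.52, 0.48),
}


def classify_items(loadings, threshold=0.4):
    """Sort items by how many concepts they load on at or above the threshold.

    The threshold of 0.4 is an assumed rule of thumb, not a fixed standard.
    Returns (assigned, cross_loading, unassigned).
    """
    assigned = {}        # concept number -> items loading on that concept only
    cross_loading = []   # items loading on more than one concept
    unassigned = []      # items loading on no concept
    for item, values in loadings.items():
        hits = [i + 1 for i, v in enumerate(values) if abs(v) >= threshold]
        if len(hits) == 1:
            assigned.setdefault(hits[0], []).append(item)
        elif len(hits) > 1:
            cross_loading.append(item)
        else:
            unassigned.append(item)
    return assigned, cross_loading, unassigned


assigned, cross_loading, unassigned = classify_items(LOADINGS)
print("Retained items by concept:", assigned)
print("Cross-loading items (candidates for removal):", cross_loading)
```

With these invented numbers, Items 1 to 3 are retained for Concept 1, Items 4 to 6 for Concept 2, and Item 7 is flagged as cross-loading. The choice of cut-off is a judgment call, and a different value would change which items are flagged.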

Figure 1


‘Item’ = question with a series of possible response levels (You can run 100m in (i) 0-20 seconds; (ii) 20-60 seconds, etc.)
Concept/Construct = an abstract idea concerning some hypothesised attribute or characteristic (physical fitness, happiness)


Figure 1 illustrates a number of points. Concepts overlap. Statements overlap and do not exactly correspond with concepts. Importantly, single statements may cover only a small part of the content of a concept; that is, language and concepts are imperfectly related. Finally, as shown, neither concept may be perfectly defined by the items: some content may be omitted by the item descriptions.