A new preprint by philosopher of science Eran Tal and clinical psychologist Femke Truijens argues that many of the tools used to measure mental health outcomes rest on assumptions that do not hold up in clinical practice. Their paper, posted on PsyArXiv, proposes a new framework for evaluating patient-reported measurement that treats measurement not as a neutral way to collect data, but as a social intervention that shapes how people understand themselves and how institutions make decisions.
The authors contend that traditional approaches to measurement, which dominate clinical psychology and much of mental health services research, draw heavily on metrology, the science of measurement in fields such as physics and engineering. In those fields, a quantity such as temperature is understood to be stable and intrinsic. If a thermometer shows 30 degrees Fahrenheit in one room, it should show the same value elsewhere, regardless of the tool used or the context in which it is measured.
Tal and Truijens argue that psychological constructs do not work this way. Depression, anxiety, distress, and well-being are not stable quantities waiting to be uncovered by a questionnaire. They are shaped by meaning, context, values, and relationships. Treating psychological experiences as if they are analogous to temperature readings can obscure important differences between people, flatten lived experience into abstract numbers, and ultimately distort clinical care. They write:
“Patient-reported measurement is a social technology, and as such should be evaluated primarily by its ability to attain desirable individual and social goods.”
Their central claim is that patient-reported measurement should be judged not only by whether a tool shows statistical reliability or validity, but also by whether the practice of measurement contributes to meaningful and ethical clinical interactions. They describe patient-reported measurement as a “social technology,” something that does not merely collect information but actively produces it, influencing how people think about themselves and how clinicians respond.

Patient-reported measurement, such as symptom questionnaires (e.g., for depression), is ubiquitous in clinical psychology. It presumes that such measurement accurately reflects a stable, intrinsic trait or quantity (e.g., symptom severity) that can be transported easily across contexts and used for a variety of purposes after collection.
The authors of this preprint seek to challenge the idea of measurement as a collection of inert numerical data, and reflect instead on measurement practices as active and interactive goal-oriented social processes that produce, rather than collect, data about individuals.
Their aim is, first, to critique the limitations and implicit assumptions of the dominant “knowledge-first” model of outcome measurement in clinical psychology, and second, to demonstrate (using routine outcome monitoring or ROM as an example) how measurement actively intervenes on patients, clinicians, and the social and institutional contexts where they interact.
Finally, the authors outline a proposed alternative, what they call “responsible” and “meaningful” measurement practice. They propose six guiding principles for such measurement, summarized with the acronym M-SPACE: meaningful, systemic, pragmatic, aligned, contextual, and evolving.
Tal and Truijens argue that the knowledge-first approach common in clinical psychology is a descendant of the metrological tradition of the natural sciences (metrology is the science of measurement): the numerical value of a property (such as temperature) should be the same regardless of the context in which it was measured or the tools that were used to measure it – an ambient temperature of thirty degrees Fahrenheit should register as thirty degrees Fahrenheit on any thermometer and in any location.
Already, we can see that, as the authors will contend later, measuring is an intervention rather than a passive process of data collection. Metrological assumptions shape how clinical psychology conceptualizes mental distress or pathology – that depression, for example, is a quantity like temperature that resides within people and that can be revealed or uncovered by measurement (when a person completes a depression questionnaire). But the authors contend that mental traits and psychological constructs are meaningfully different from a quantity like temperature, and the metrological framework thus introduces two assumptions (“myths,” in the authors’ parlance) that seriously complicate the usefulness and validity of the knowledge-first approach in clinical psychology.
These are what the authors call the myths of “repurposing” (“the interpretation of patient scores is expected to remain the same when the same measure is used for different purposes”) and of “standard interpretation” (“score use involves no further score interpretation. By the time a patient’s score is used for decision making… the score has already been assigned a standard interpretation”).
The knowledge-first approach and its assumptions or "myths" are, for Tal and Truijens, untenable for clinical psychology; they risk restricting patients' autonomy to define and interpret their own experiences, and they deprioritize the ethical dimensions of measurement, "as patient-reported data reflect a patient's attempt to communicate values and not merely facts."
The knowledge-first approach also relegates potentially meaningful variability in different patients’ responses to measurement instruments to statistical noise or random error, smoothing away critical information for psychological practice. The very act of administering a questionnaire intervenes in the respondent’s interpretation of their own experiences. This can be beneficial, detrimental, or neutral. The authors select two examples from the Ghent Psychotherapy Study to illustrate the divergent experiences of two patients with routine outcome monitoring – the tracking of individual-level patient-reported measures before, during, and after treatment. For one patient in the study, the ROM helped monitor “symptoms” and adjust therapeutic engagement accordingly, with a beneficial outcome; for another, the ROM induced distress over the appearance or reappearance of “symptoms,” and additional therapeutic intervention was needed to attend to this distress.
The authors also consider how ROM intervenes in institutions. Responses to ROM can be and often are “repurposed” and aggregated to the institutional level, where they can be used as performance metrics or for evaluation. Depending on the performance evaluation’s goal, undesirable outcomes can result.
Aggregate performance metrics – average therapeutic outcomes, for example – may become goals in their own right, which can interfere with the provision of appropriate treatment if successful treatment comes to be defined as rapid completion or as convergence of outcome measures toward the group average. In extreme cases, this can encourage selective treatment:
“For example, suppose that clinical institutions are compared on the number of people that leave with a post-treatment below a clinical threshold (cut-off score)… such institutions may then prefer to treat people with a better prospect of showing a clear pre-to-post difference.”
In contrast to the knowledge-first framework, the authors contend that the primary goal of measurement is to enable meaningful and beneficial social interactions, and that ethical and moral values are central in determining which social interactions are desirable. Their focus is not on the measurement tool but on the measurement practice, which produces score data that are meaningful and interpretable only in their original context. In other words, the critical thing for Tal and Truijens is how measurement works socially (to guide therapeutic practice, screening, resource allocation, and so on) rather than how neatly or reliably it captures a presumably static construct.
In their framework, the goal of measurement is to do something – to intervene – not to collect data for its own sake or for an indeterminate future purpose. Values are again central to the measurement practice as a total social process, informing first whether the intended social goal of the measurement is appropriate, beneficial, desirable, or meaningful to the relevant stakeholders, and second, how the measurement intervention helps achieve that social goal.
In place of the knowledge-first model, Tal and Truijens propose a framework they call M-SPACE: meaningful, systemic, pragmatic, aligned, contextual, and evolving. These principles describe what responsible measurement practice should look like.
- Meaningful: Measurement should focus on what matters to the person being measured, rather than what is most convenient for institutions.
- Systemic: Because measurement is embedded in social systems, its effects on individuals, clinicians, and institutions must be considered.
- Pragmatic: Measures should be used for clearly articulated purposes.
- Aligned: Measurement practices should reflect the intervention's values and aims.
- Contextual: Scores must be interpreted within the conditions under which they were produced, rather than treated as intrinsic traits.
- Evolving: Measurement practices should develop through ongoing dialogue among all stakeholders, including people with lived experience.
The authors are clear that these principles are not a technical solution. They are a guide for rethinking measurement practices at their foundations. Because measurement is a social and ethical process, improving it requires reflection, deliberation, and institutional change, not just better questionnaires.
The M-SPACE principles respond to a number of concerns and well-documented limitations of existing measurement practices. However, these principles are abstract and high-level – the authors acknowledge that they are leaving methodological questions about implementing them to future work. As the authors themselves painstakingly establish, measurement is a social technology embedded in complex social systems of power and culture – and current measurement practices are no exception. Implementing these principles is thus not simply a matter of acknowledging the limitations of knowledge-first practices or the superiority of the proposed M-SPACE principles; it will have to grapple with actual measurement practices in their institutional, financial, cultural, and epistemic contexts. Some of the barriers to implementation are concepts and values, deeply embedded in many disciplines, about objectivity and about how mathematization and metrology relate to it.
This article intervenes in a long-standing debate about measurement in psychology as a social practice. Previous research has documented the risks inherent in standardizing mental health measures, many of which concern “repurposing” data collected in one context for use in other contexts while applying the same “standard interpretation” across contexts. The variable and context-dependent nature of psychological symptoms and their measurement is apparent at the level of measurement tools themselves; concerns have been raised about whether different depression scales measure the same thing (as metrological theory would presume), especially since these scales can be used interchangeably in clinical practice.
At the population and institutional levels, routine outcome monitoring has not been shown to improve treatment experience or outcomes, and service users and practitioners alike have raised doubts about its efficacy, purpose, and alignment with values. Finally, improvements to measurement practice have been suggested before along similar lines, such as incorporating a humanistic perspective into measurement. This preprint provides a template for future discussion and work on improving measurement practice in clinical psychology.
****
Tal, E., & Truijens, F. Responsible and Meaningful Measurement in Clinical Psychology. [Preprint on PsyArXiv] (Link)
