Missing outcome data in randomised clinical trials of psychological interventions: a review of published trial reports in major psychiatry journals


Search strategy and selection criteria

Two investigators (SJ, PF) independently searched for randomised clinical trials of psychological interventions (for example psychodynamic therapy, cognitive behavioural therapy, or mentalisation-based therapy) for any type of mental health disorder. The search covered trials published from 2017 to 2022 (all years inclusive) on the websites of six psychiatry journals with high journal impact factors:

World Psychiatry.
Lancet Psychiatry.
JAMA Psychiatry.
Psychotherapy and Psychosomatics.
American Journal of Psychiatry.
British Journal of Psychiatry.

We chose these peer-reviewed journals because of their high impact factors (ranging from 10.5 to 73.3 in Web of Science for 2022). We chose the period to provide an overview of current research practice. Two investigators (SJ, PF) independently screened titles and abstracts. Any discrepancies were resolved through discussion or, if required, through discussion with a third investigator (JCJ). Full texts were retrieved for all trial reports.

Eligibility criteria

We included any randomised clinical trial (as defined by trialists) assessing the effects of a psychological intervention for any mental health disorder, i.e. we included (1) trials comparing psychological interventions with non-psychological interventions (for example drugs, no intervention, or wait-list) and (2) trials comparing two or more psychological interventions with each other. We excluded prevention trials assessing healthy participants.

Data extraction

Six independent investigators (SJ, PF, RKA, CBK, JJP, FS) extracted data and performed risk of bias assessments in pairs. The final data extractions and risk of bias assessments were reached through consensus. Any disagreements following the independent data extractions or risk of bias assessments were resolved through discussion or, if required, through discussion with a third investigator (JCJ). Both the published trial reports and supplementary materials were used for data extraction.

We extracted data from the assessment timepoint defined as primary by the trialists. If no primary timepoint was defined by the trialists, we used the timepoint closest to the end of treatment in both groups. We included feasibility studies but only used the clinical outcome(s).

Assessment of risk of bias due to missing outcome data

For each trial, we assessed domain 3 of the Cochrane Risk of Bias tool, version 2 (RoB2) [12], “Bias due to missing outcome data”. We judged the risk of bias to be low if outcome data were available for all, or nearly all, participants (usually at least 95% complete data). We judged the risk of bias to be high if missing data exceeded 5%, there was no evidence that the result was not biased, and the missingness could, and was likely to, depend on the true value of the outcome [12].
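
As an illustration only, the numerical part of this assessment can be sketched in Python (the review's own analyses were run in R). The function names and the example counts are hypothetical. The sketch covers only the 5% threshold; the RoB2 judgement also requires the qualitative assessments described above.

```python
def missing_proportion(n_randomised: int, n_with_outcome: int) -> float:
    """Proportion of randomised participants without primary outcome data."""
    if n_randomised <= 0 or not 0 <= n_with_outcome <= n_randomised:
        raise ValueError("counts must satisfy 0 <= n_with_outcome <= n_randomised, n_randomised > 0")
    return (n_randomised - n_with_outcome) / n_randomised


def exceeds_missing_threshold(n_randomised: int, n_with_outcome: int,
                              threshold: float = 0.05) -> bool:
    """True if missing data exceed the threshold (5% by default).

    A True result does not by itself mean high risk of bias: it only
    flags trials where the remaining RoB2 questions must be judged.
    """
    return missing_proportion(n_randomised, n_with_outcome) > threshold


# Hypothetical trial: 200 randomised, 188 with primary outcome data.
flagged = exceeds_missing_threshold(200, 188)
```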

Classification of the reporting of the missing data

We classified how the trial reported the extent of the missing data into three categories:

1. ‘Fully reported’: when the trial reported in detail the extent of missing data for the primary outcome.
2. ‘Partially reported’: when the overall proportion of missing data was reported in a flowchart, but the extent of missing data for the primary outcome was unclear.
3. ‘Not reported’: when the extent of missing data was not reported.

Classification of outcomes

We only extracted data for each trial’s primary outcome(s). We classified the different types of primary outcomes as:

1. ‘Hard binary outcomes’: defined as patient-important binary outcomes that are conclusive regarding the disease progression, demonstrating a patient’s feelings, functionality, or survival [13]. Examples of hard binary outcomes are all-cause mortality, suicides, suicide attempts, psychiatric hospitalizations, and self-harm.
2. ‘Symptom severity scales’: defined as any scale (for example interviews or questionnaires) measuring psychiatric symptoms. If a trial assessed a dichotomised version of a continuous outcome (for example response or remission), we also classified this as a symptom severity scale. Examples of symptom severity scales are Hamilton Rating Scale for Depression and Clinical Global Impressions Scale.
3. ‘Count data’: defined as any outcome using countable quantities. Examples of count data are number of binge eating episodes and number of days without drinking alcohol.
4. ‘Other types of outcomes’: any type of outcome not included in the above-mentioned classifications. Examples of other types of outcomes are urine toxicology and other lab results.

Classification of participants

We classified trial participants into the following mental health disorder categories: addiction disorders and comorbidities (for example, substance use disorder and post-traumatic stress disorder; or alcohol use disorder and depression); affective disorders and anxiety disorders; eating disorders and comorbidities; neurodevelopmental disorders; personality disorders; post-traumatic stress disorder; psychosis and schizophrenia spectrum disorders; sleeping disorders; transdiagnostic; or other disorders.

Classification of intervention length

We classified intervention length as under 1 month; 1–3 months; 3–6 months; 6–12 months; over 12 months; or no information regarding intervention length.
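
The banding can be sketched as follows, in Python for illustration only. The categories share their boundary values (3 and 6 months each appear in two bands), so the sketch assumes that a length of exactly 3 or 6 months falls in the shorter band. This tie-breaking rule and the function name are assumptions, not part of the review's methods.

```python
from typing import Optional


def classify_intervention_length(months: Optional[float]) -> str:
    """Map an intervention length in months to one of the six categories.

    Assumption: a value on a shared boundary (3 or 6 months) goes to the
    shorter band. 'Under 1 month' and 'over 12 months' are strict, so
    exactly 1 month is '1-3 months' and exactly 12 is '6-12 months'.
    """
    if months is None:
        return "no information"
    if months < 0:
        raise ValueError("intervention length cannot be negative")
    if months < 1:
        return "under 1 month"
    if months <= 3:
        return "1-3 months"
    if months <= 6:
        return "3-6 months"
    if months <= 12:
        return "6-12 months"
    return "over 12 months"
```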

Classification of psychological intervention types

We classified the psychological interventions (in both the experimental and control groups) into the following categories: cognitive and behavioural therapies; humanistic therapy; psychodynamic therapy; supportive psychotherapy; or systemic/family therapy.

Classification of statistical analyses

To determine the group of participants included in the analyses, we categorised the included population as:

1) ‘Intention-to-treat population’: if all randomised participants with available data were included in the primary analysis;
2) ‘Modified intention-to-treat population’: if participants who did not initiate the interventions were excluded from the analysis;
3) ‘Per protocol population’: if only participants who completed the intervention and follow-up assessments were analysed;
4) ‘Wrong classification of the population’: if the outcome was reported as being analysed according to the intention-to-treat principle, but the trialists erroneously excluded certain participants from the follow-up assessments and analyses (for example due to protocol violations, adverse effects, or no effects of the intervention); this classification was based on consensus ratings by two review authors (SJ, JJP), as the trialists’ reporting of their analyses was often unclear or ambiguous;
5) ‘Unclear population’: if it was unclear or not reported which population was included in the primary analysis.

For trials in which missing data exceeded 5%, we assessed whether the trialists had evaluated the potential impact of missing data in sensitivity analyses testing the robustness of the primary analyses. Additionally, we grouped the primary statistical analyses used to handle missing data into the following categories: (1) complete case analysis, (2) multiple imputation or similar methods, (3) other types of imputation (for example last observation carried forward, single imputation, worst-case imputation), (4) full information maximum likelihood, (5) regression analyses or similar methods (for example mixed-effects modelling, multilevel linear regression, linear mixed-effects models), or (6) unclear/not reported.

We also assessed whether the trialists discussed potential strengths or limitations related to missing outcome data in the discussion section of the published trial report.

Data analysis

We calculated the mean proportion of missing data along with 95% confidence intervals (CI). Confidence intervals of proportions were calculated using a one-sample proportions test with continuity correction and presented as percentages with 95% CI. We assessed differences between missing data proportions with Fisher’s exact test. We used an unadjusted threshold of 0.05 for statistical significance. The analyses were carried out using R version 4.2.1 (R Core Team, Vienna, Austria).
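
A minimal Python sketch of these two calculations follows, using only the standard library. It is not the review's code, which was written in R. The interval function implements the Wilson score interval with continuity correction, which, as far as we are aware, is the interval that R's `prop.test` reports for a single proportion. The function names and any counts passed to them are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def wilson_cc_interval(x: int, n: int, conf_level: float = 0.95):
    """Wilson score interval with continuity correction for x events in n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("require n > 0 and 0 <= x <= n")
    z = NormalDist().inv_cdf((1 + conf_level) / 2)
    p_hat = x / n
    # Continuity correction, capped as in a test against p = 0.5 (assumption).
    yates = min(0.5, abs(x - n * 0.5))
    z22n = z ** 2 / (2 * n)

    def bound(p_c: float, sign: int) -> float:
        half_width = z * sqrt(p_c * (1 - p_c) / n + z22n / (2 * n))
        return (p_c + z22n + sign * half_width) / (1 + 2 * z22n)

    p_up = p_hat + yates / n
    p_lo = p_hat - yates / n
    upper = 1.0 if p_up >= 1 else bound(p_up, +1)
    lower = 0.0 if p_lo <= 0 else bound(p_lo, -1)
    return max(lower, 0.0), min(upper, 1.0)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(k: int) -> float:
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    p_obs = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    p_value = sum(prob(k) for k in range(k_min, k_max + 1)
                  if prob(k) <= p_obs * (1 + 1e-7))
    return min(1.0, p_value)
```

For instance, `fisher_exact_two_sided(3, 1, 1, 3)` returns 34/70, about 0.486. To compare missing data between two groups of trials, the table rows would be the groups and the columns the counts with and without missing data.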
