A guided chatbot-based psychological intervention for psychologically distressed older adolescents and young adults: a randomised clinical trial in Jordan

Trial design

The trial was prospectively registered on ISRCTN on 02/11/2022. It was approved by the WHO Ethics Review Committee (ERC.0003729) and the University of Jordan (PF.22.9), and all participants provided informed consent prior to participation. No changes were made to the trial protocol (1). In this randomised, parallel, controlled trial, psychologically distressed young adults in Jordan were randomly assigned to either STARS or enhanced usual care (EUC) on a 1:1 basis. All assessments were conducted online. The primary outcome was anxiety and depression symptoms, and the primary outcome timepoint was the 3-month assessment.

Participants

Participants were recruited in Jordan via online advertising and publicising the study in universities in Amman. Potential participants were screened via a website (Qualtrics software), and after completing digital informed consent, participants completed the screening measures.

Inclusion criteria were: (a) age 18–21 years, (b) residing in Jordan, (c) moderate/high psychological distress, operationalised as a score of ≥20 on the Kessler Distress Scale (K10¹⁸), and (d) access to a device for intervention delivery. The sole exclusion criterion was imminent suicide risk, as determined by questions assessing serious thoughts or a plan to end one’s life in the past month¹⁹. Eligible participants were contacted by telephone to explain the trial and answer any questions they had. Participants were then provided with a personalised link to complete the online baseline assessment.

Randomisation and masking

Following completion of the baseline assessment, participants were randomised to STARS or EUC in a 1:1 ratio via Qualtrics software, with randomisation stratified by nationality (Jordanian/Palestinian versus other). All assessments were conducted online without the assistance of research personnel, which ensured that assessments were independent of treatment allocation. E-helpers and participants were not masked to treatment allocation because they were aware of the administered treatment.
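The text does not describe the allocation algorithm used inside Qualtrics, so the following is only a minimal sketch of one common way to obtain 1:1 allocation within strata: permuted blocks kept separately per stratum. The class name, block size, and seed are assumptions made for illustration and are not taken from the trial.

```python
import random
from collections import defaultdict

ARMS = ("STARS", "EUC")


class StratifiedBlockRandomiser:
    """Permuted-block 1:1 allocation, tracked separately for each stratum.

    Illustrative only: block_size and seed are assumed values, not trial settings.
    """

    def __init__(self, block_size=4, seed=None):
        if block_size % len(ARMS) != 0:
            raise ValueError("block_size must be a multiple of the number of arms")
        self.block_size = block_size
        self.rng = random.Random(seed)
        # stratum -> allocations still unused in that stratum's current block
        self.pending = defaultdict(list)

    def assign(self, stratum):
        """Return the arm for the next participant in the given stratum."""
        if not self.pending[stratum]:
            # Start a new block with equal numbers of each arm, in random order.
            block = list(ARMS) * (self.block_size // len(ARMS))
            self.rng.shuffle(block)
            self.pending[stratum] = block
        return self.pending[stratum].pop()


if __name__ == "__main__":
    randomiser = StratifiedBlockRandomiser(block_size=4, seed=2022)
    for stratum in ["Jordanian/Palestinian", "Other", "Jordanian/Palestinian"]:
        print(stratum, "->", randomiser.assign(stratum))
```

Because each block contains equal numbers of both arms, the two arms are exactly balanced within a stratum whenever a block is completed.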

Interventions

The STARS intervention, described in more detail elsewhere¹², comprises 10 lessons that are intended to be completed over 5 weeks. The structure and length of the STARS programme were developed through extensive human-centred design with young people in five countries and further refined in consultation with young adults in Jordan; this process contributed to the decision to spread the content over 10 sessions, as well as to the design of the current trial¹²,²⁰. STARS is a pre-programmed chatbot that uses decision-tree logic to deliver content that guides participants through stress coping strategies via messaging. STARS is delivered in a conversational style, with opportunities to hear from fictional characters who are portrayed as having different stress-related problems. Pilot adaptation work for this trial indicated that, for this age group of urban Jordanians, the relevant stressful experiences related to university, unemployment and financial difficulties, and romantic and family relationships²⁰. The chatbot was designed to simulate a conversation between the participant and a person, although participants were informed at the outset that the programme is an automated chatbot. The lessons ranged from 10–25 min and included text, videos, audio clips, and activities to appeal to different learning preferences (e.g. participants could choose to read the stories of different characters). Participants responded to the conversational text generated by the chatbot using a combination of pre-defined choice responses and limited free-text input. Lesson 1 comprised an orientation to the chatbot and the rationale for the intervention. Lesson 2 involved psychoeducation, explained via a character story about common emotional responses to stressful events, as well as the participant setting their goals for the intervention (e.g. stress, relationship, or mood management).
Lesson 3 taught controlled breathing as a stress management strategy. In Lesson 4, participants continued practising this technique and also learnt ‘grounding’ (being aware of all five senses during stress) as an additional strategy. In Lessons 5 and 6, participants learnt further strategies to cope with stress via the character stories, including identifying experiences of mastery, pleasure, and/or social connection. Lesson 7 introduced problem management techniques via the character stories, and Lessons 8 and 9 explained self-talk as an alternative strategy to appraise situations adaptively. Lesson 10 focused on relapse prevention and encouraged the use of the strategies in planning for the management of future stressors. The STARS chatbot also provided a toolbox of video and audio resources, quizzes and an extra ‘helping’ lesson to support learning the strategies (see Fig. 2).
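To make the phrase "decision-tree logic" concrete, the sketch below shows one generic way a pre-programmed chatbot can move between scripted messages according to pre-defined choice responses. It is not the STARS implementation: the node names, messages, and the function run_dialogue are hypothetical and were written only for this illustration.

```python
# Hypothetical content: these nodes are illustrative and are not taken from STARS.
TREE = {
    "start": {
        "message": "Welcome back. What would you like to do today?",
        "choices": {"Continue lesson": "breathing", "Open toolbox": "toolbox"},
    },
    "breathing": {
        "message": "Let's practise slow breathing. Ready?",
        "choices": {"Yes": "end", "Not now": "toolbox"},
    },
    "toolbox": {
        "message": "Here are the audio and video resources.",
        "choices": {"Done": "end"},
    },
    "end": {"message": "Well done. See you next time.", "choices": {}},
}


def run_dialogue(tree, start, selections):
    """Walk the tree using the participant's selections; return the messages shown."""
    node_id = start
    transcript = [tree[node_id]["message"]]
    for selection in selections:
        choices = tree[node_id]["choices"]
        if selection not in choices:
            # Only pre-defined responses are accepted at each node.
            raise ValueError(f"{selection!r} is not offered at node {node_id!r}")
        node_id = choices[selection]
        transcript.append(tree[node_id]["message"])
    return transcript
```

Every path through such a tree is fixed in advance, which is what distinguishes this kind of chatbot from one that generates replies freely.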

To support participants using the chatbot, non-specialist helpers (called ‘e-helpers’) were trained and supervised in a self-help support model devised by WHO and used previously with similar interventions¹⁵. This comprised five weekly 15-min telephone calls with participants to provide support and motivation in using the chatbot. Participants were asked whether they wanted to receive reminders prior to each scheduled call and, if appropriate, received these reminders by text or phone call. If participants did not attend a phone call, they received two reminders to reschedule the call. The e-helpers had at least a Bachelor’s degree but no formal mental health training. Training lasted five days and covered an introduction to common mental health conditions, basic helping skills, structured call protocols to follow when providing support, an introduction to and practice with the STARS intervention, and management of adverse events and referral pathways. E-helpers received weekly group supervision from a qualified clinical psychologist.

To assess the fidelity of the support offered by e-helpers, a random sample of 5% of all planned e-helper sessions was audio-recorded and rated by the project manager using a checklist from the e-helper manual. The checklist included all the steps that e-helpers were to deliver during each call (e.g. introducing e-helper support in the welcome call, reviewing practice of lessons in calls 2 to 4, reviewing the action plan for relapse prevention in the final call). Adverse reactions were monitored and recorded by the e-helpers.

Participants in EUC accessed a website that contained information derived from lesson 2 of STARS, which comprised psychoeducation about anxiety and depression, a story about a fictional character who talks about their emotions, and a link to a list of psychosocial services in Jordan where participants could access mental health care. This list was also contained in the toolbox section of the STARS intervention. Provision of psychoeducation and explicit referral to psychological services is enhanced relative to usual care in Jordan because these services are not routinely offered.

Outcomes

The primary outcome was change in anxiety and depression severity, as measured by the Hopkins Symptom Checklist (HSCL²¹) total score. The HSCL consists of 25 questions, with 10 questions related to anxiety (range, 10–40) and 15 questions related to depression (range, 15–60), with higher scores indicating more severe anxiety and depression, respectively. The HSCL has been validated across many cultures, including in Arabic contexts²². To determine probable caseness of anxiety and depression, an item mean score is calculated for each subscale; for the Arabic version of the HSCL, the cutoffs established against structured clinical interview are 2.0 and 2.1, respectively²². The internal consistency of the HSCL in the current sample was robust for the anxiety (0.81) and depression (0.85) subscales.
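The scoring rules described above can be sketched as follows. This is an illustration under stated assumptions rather than the trial's scoring script: it assumes each item is rated 1 to 4 (implied by the reported subscale ranges), that the 10 anxiety items precede the 15 depression items in the response list, and that caseness is defined as an item mean at or above the cutoff. The function name score_hscl is hypothetical.

```python
ANXIETY_ITEMS = 10
DEPRESSION_ITEMS = 15
ANXIETY_CUTOFF = 2.0      # item-mean cutoff reported for the Arabic HSCL
DEPRESSION_CUTOFF = 2.1   # item-mean cutoff reported for the Arabic HSCL


def score_hscl(responses):
    """Score one participant's 25 HSCL item responses (each rated 1 to 4)."""
    if len(responses) != ANXIETY_ITEMS + DEPRESSION_ITEMS:
        raise ValueError("expected 25 item responses")
    if any(r not in (1, 2, 3, 4) for r in responses):
        raise ValueError("each response must be 1, 2, 3 or 4")
    # Assumption: anxiety items come first, depression items follow.
    anxiety = responses[:ANXIETY_ITEMS]
    depression = responses[ANXIETY_ITEMS:]
    anxiety_mean = sum(anxiety) / ANXIETY_ITEMS
    depression_mean = sum(depression) / DEPRESSION_ITEMS
    return {
        "anxiety_sum": sum(anxiety),          # possible range 10 to 40
        "depression_sum": sum(depression),    # possible range 15 to 60
        "total_sum": sum(responses),
        "anxiety_mean": anxiety_mean,
        "depression_mean": depression_mean,
        "probable_anxiety": anxiety_mean >= ANXIETY_CUTOFF,
        "probable_depression": depression_mean >= DEPRESSION_CUTOFF,
    }
```

Sum scores correspond to the ranges reported for the subscales, whereas the caseness flags are based on item means so that the two subscales, which differ in length, can be compared against their cutoffs.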

In terms of secondary outcomes, anxiety and depressive symptoms were assessed using the subscales of the HSCL. Psychological distress was assessed with the K10¹⁸, which is a 10-item self-report measure of psychological distress (range, 10–50; higher scores indicate more severe distress). Functional impairment was assessed with WHODAS 2.0²³, which is a 12-item self-report measure of disability in the past 30 days (range, 0–48; higher scores indicate more severe impairment). Personally identified problems were assessed with the Psychological Outcome Profiles (PSYCHLOPS²⁴), which address personally identified problems, functioning, and wellbeing, with their impact being scored on a 6-point scale (range, 0–20; higher scores indicate more severe problems). Psychological wellbeing was assessed with the WHO-5, which is a 5-item scale of positive wellbeing (range, 0–25; higher scores indicate better wellbeing)²⁵. A sense of agency was assessed with the agency subscale of the State Hope Scale, which is a 3-item scale (range, 3–24; higher scores indicate a greater sense of agency)²⁶.

Statistical analyses

On the basis of previous trials in low- and middle-income countries (LMICs) with digital interventions²¹, we projected that 172 participants would be required to detect a between-condition effect size of 0.5 at the 3-month follow-up with 90% power and α = 0.05. Based on a meta-analysis of drop-out rates in trials of digital mental health applications²⁷, we estimated that 50% of the sample would not be retained at follow-up, thereby requiring enrolment of 344 participants to achieve the desired sample size.
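The arithmetic behind these figures can be approximated with the standard normal-approximation formula for comparing two means. The sketch below assumes a two-sided test at the conventional α of 0.05, the level at which the calculation comes close to the reported total; the authors' exact software and method are not stated, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, power, alpha):
    """Per-arm sample size for a two-sided comparison of two means (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the two-sided test
    z_power = z.inv_cdf(power)           # quantile corresponding to the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def inflate_for_attrition(n_required, expected_dropout):
    """Number to enrol so that n_required participants remain after dropout."""
    return math.ceil(n_required / (1 - expected_dropout))


if __name__ == "__main__":
    per_arm = n_per_group(effect_size=0.5, power=0.90, alpha=0.05)
    print("per arm:", per_arm, "total:", 2 * per_arm)
    print("enrol:", inflate_for_attrition(172, expected_dropout=0.5))
```

Under these assumptions the approximation gives 85 per arm (170 in total), slightly below the reported 172, which is the size of difference one would expect from a t-based calculation. Inflating 172 for 50% attrition gives the stated enrolment target of 344.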

Descriptive and other basic statistics were calculated using SPSS (Version 29). Analyses followed the intent-to-treat principle. Across outcomes, mixed models for repeated measures (MMRM) were fitted using the R package mmrm²⁸. The model included condition, time, and the condition × time interaction, with an unstructured covariance matrix and Satterthwaite degrees of freedom. The R package emmeans²⁹ was used to calculate contrasts comparing the conditions on change from baseline to post-intervention and from baseline to follow-up, as well as the associated Cohen-like effect sizes (ES); we interpreted effect sizes of 0.2–0.4 as small, 0.5–0.7 as moderate, and >0.8 as large. Sensitivity analyses used the R package rbmi³⁰.

The MMRM fitted to available data provides valid inference if data are missing at random (MAR); however, if some data are missing not at random (MNAR), or if a large proportion of data is missing, this validity may be reduced. Multiple imputation (MI) can increase validity and enable sensitivity analyses. We report MI-based analyses for both the primary and sensitivity analyses. The MI for the primary analyses used the R package mice, with the number of imputations chosen using a previously demonstrated approach³¹. For the sensitivity analyses, package-defined methods of MI were used. The robustness of the MI estimates was tested in a range of sensitivity analyses examining various imputation assumptions, including tipping-point analyses to determine the worst-case bounds at which the MI-based findings are no longer significant (Supplementary Material p. 19, Tables S6, S7, and S8, pp 27–35).
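Estimates from multiply imputed datasets are conventionally combined using Rubin's rules, which is also what the pooling step in mice implements. The text does not spell this step out, so the following is only an illustrative sketch of the pooling arithmetic; the function name pool_rubin is hypothetical, and any inputs passed to it in practice would be the per-imputation estimates and squared standard errors.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their squared standard errors (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    pooled = mean(estimates)
    within = mean(variances)         # average sampling variance within imputations
    between = variance(estimates)    # variance of estimates across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5      # pooled estimate and its standard error
```

The between-imputation term is what carries the extra uncertainty due to missing data: the more the imputed datasets disagree, the wider the pooled standard error.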

We additionally examined the effect of the intervention on those who presented with probable anxiety or depression on the HSCL (defined as a mean item score ≥2.0 on the anxiety subscale or ≥2.1 on the depression subscale). We also conducted non-planned analyses of the minimally important difference for the primary outcome by comparing the proportions of participants in each treatment arm showing improvement of more than 0.5 SDs in total HSCL scores from baseline to 3-month follow-up³², and on this basis calculated the number needed to treat (NNT).
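The responder definition and the resulting NNT can be sketched as below. This is an illustration under assumptions: it takes the reference SD to be the baseline SD of the HSCL total score, which the text does not specify, and rounds the NNT up to a whole person, as is conventional. The function names are hypothetical and no trial data are used.

```python
import math


def is_responder(baseline, follow_up, baseline_sd, threshold_sd=0.5):
    """True if the HSCL total fell by more than threshold_sd baseline SDs."""
    # Assumption: improvement is a decrease, since higher HSCL scores are worse.
    return (baseline - follow_up) > threshold_sd * baseline_sd


def number_needed_to_treat(responders_tx, n_tx, responders_ctl, n_ctl):
    """NNT = 1 / absolute difference in responder proportions, rounded up."""
    risk_difference = responders_tx / n_tx - responders_ctl / n_ctl
    if risk_difference <= 0:
        raise ValueError("NNT is undefined when the intervention arm does not do better")
    return math.ceil(1 / risk_difference)
```

For example, with purely hypothetical counts of 45 responders out of 100 in one arm and 30 out of 100 in the other, the absolute difference is 0.15 and the function returns an NNT of 7.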

An independent data monitoring committee reviewed adverse events occurring during the trial. No interim analyses were conducted.
