Ethical regulations statement
This research project complied with all ethical regulations for research involving human participants laid out by the host organization, Swansea University. Approval was granted by the School of Psychology’s Research Ethics Committee. The participating sites either received ethical approval from their local institutional review boards (IRBs) or stated that they were exempt. Swansea University and Université Grenoble Alpes carried out the administrative organization for the study. Swansea University was also the data controller for this project. Informed consent was obtained from all participants before collecting any data. Participants’ personal data were processed for the purposes outlined in the information sheet. The project was conducted in line with the CO-RE Lab Lab Philosophy v.5 (ref. 37). The current multi-site project (ClinicalTrials.gov: NCT06308744) followed the route of a parallel randomized controlled trial. All materials used in the study, including the preregistered document (https://osf.io/us5ae), the ethics (IRB) approval documents of all the sites involved in the project and the meditation scripts are available on our Open Science Framework (OSF) page (https://osf.io/6w2zm/) and in our ClinicalTrials.gov registration. The data analytic script can be found on the GitHub repository of the project (https://github.com/alessandro992/A-large-multisite-test-of-self-administered-mindfulness) and on the OSF page (https://osf.io/6w2zm/).
Participants
Data were collected between 23 March and 30 June 2022. We limited participation in the study to native English speakers or participants who self-assessed their English language proficiency at the C1/C2 levels of the Common European Framework of Reference for Languages38, to ensure maximum comprehension of the English-language audio files used in all conditions. Participants were excluded if they reported having, or having had, a history of mental illness (assessed via a prescreening question), if they declared having meditated in the previous 6 months or if they did not meet the English language proficiency requirement. Each participant was asked to take part in the survey using a smartphone with headphones or earphones attached, to ensure that participants could perform any of the mindfulness activities they were randomly assigned to (for example, mindful walking). Each site committed to collecting between 70 and 120 participants; however, if a site collected fewer or more participants than this target, we still used their data in the analysis. The number of participants collected varied across sites, from a minimum of one to a maximum of 179. Our RPubs page shows the total number of participants per site (https://rpubs.com/ale-sparacio92/920457). Data collection was performed blind to the experimental conditions but data analysis was not. However, given that all our analyses were preregistered, it is unlikely that the lack of blinding in data analysis introduced bias.
The dataset originally comprised 6,691 responses, including both the ‘test answers’ generated by the site collaborators while developing and previewing the survey and the actual answers submitted by participants. From the initial respondents, we excluded the following: 1,307 who self-identified as meditators or reported having meditated within the 6 months before the experiment, 776 who did not meet the English language proficiency requirement, 981 who disclosed a history of mental illness and 1,660 who started the survey without using a smartphone with headphones attached. Of those who failed to meet the inclusion criteria, 1,491 met more than one exclusion criterion. Respondents who did not meet one or more inclusion criteria (n = 3,233) were immediately directed to the end of the survey and we did not record further data from them. We also removed from the analyses those who started the survey but did not progress to listening to the audio track (n = 976) and the ‘test answers’ provided by the collaborating researchers while developing the survey (n = 19); the sample size thus dropped to n = 2,463. We then removed data from 19 participants who dropped out of the experiment and from 205 participants who, according to our criteria, were considered careless respondents, yielding a final sample of 2,239 valid observations. Of these, 611 participants self-identified as male, 1,576 as female, 7 as transgender male, 2 as transgender female, 27 did not identify with any choice and 16 preferred not to say (mean age (Mage) = 22.4, s.d.age = 10.1; range 17–87; 94.2% students), with an approximately even distribution across the five experimental conditions (nmindful walking = 416, nmindful breathing = 469, nloving kindness = 427, nbody scan = 449, nbook chapter control = 478). We are not aware of how many participants were invited to the survey but declined to participate.
Dealing with careless responders
We applied a set of rules to deal with careless or insufficient-effort responders39, so as to reduce the random variance component in the data. First, we made the questions connected to our exclusion criteria (meditation experience, English language proficiency and mental illness) compulsory. For the questionnaires related to our dependent variables/moderator, we alerted respondents about unanswered questions but allowed them to continue with the survey without providing a response. Second, the programmed survey prevented participants from skipping the 15 min audio file (for both the mindfulness exercises and the control conditions) by locking the screen on the audio of the meditation/control condition for 14 min, so that participants could not proceed to the following survey page until the meditation was finished. Third, we identified and excluded participants who provided identical responses to a long series of items (for example, always selecting ‘strongly agree’) by performing a long-string analysis; using this analysis, we excluded participants with a string of identical consecutive responses equal to or greater than 10 (that is, half of the scale length).
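The long-string exclusion rule described above can be sketched as follows. This is a minimal Python illustration on hypothetical response data (the project’s actual screening was performed as part of its R analytic script); the function names and example matrix are ours, not the authors’:

```python
import numpy as np

def longest_string(responses):
    """Length of the longest run of identical consecutive answers."""
    best = run = 1
    for prev, cur in zip(responses, responses[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

# Hypothetical response matrix: rows = participants, columns = 20 scale items.
data = np.array([
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 4, 1, 5, 2, 3, 4, 2, 1, 3],  # run of 10 -> flag
    [1, 2, 3, 2, 4, 5, 1, 2, 3, 4, 2, 1, 5, 3, 2, 4, 1, 2, 3, 4],  # varied -> keep
])

# Flag anyone whose longest run is at least half the 20-item scale length.
flagged = [longest_string(row) >= 10 for row in data]
print(flagged)  # [True, False]
```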
Distribution of participants across sites
Thirty-seven sites participated in the data collection (see the full list at https://osf.io/uh3pk). Participants could be recruited through the SONA system (the platform used to recruit student participants from universities, https://www.sona-systems.com/) of the respective institution or via crowdsourcing platforms such as mTurk or Prolific. Participants could come from any geographic area if they met our inclusion criteria and could be given either credits or financial compensation in exchange for participating in the study.
Materials
Self-administered mindfulness interventions
To compile a list of self-administered mindfulness exercises to be tested in our multi-site project, we initially conducted a survey among mindfulness practitioners, whom we asked to recommend the most prominent and widely used exercises in their practice. We then retained the most popular exercises suggested by the surveyed practitioners and cross-referenced them with the exercises included by Matko40 in an inventory of currently popular mindfulness exercises. This combined approach led to the selection of four types of mindfulness exercises: body scan, mindful breathing, mindful walking and loving kindness meditation. The full procedure that led us to the selection of the four self-administered mindfulness exercises can be found in the extended preregistration document.
The four audio files of the mindfulness exercises and the three audio files of the stories for the non-mindful active control condition were recorded by the same certified meditation trainer, C. Spiessens, a BAMBA-registered mindfulness teacher in MBSR (https://www.christophspiessens.com/), and each lasted 15 min. The exact text of the four meditations and of the three stories used in the active control condition can be found on our OSF project page (https://osf.io/6w2zm/). The seven recordings can be found on the SoundCloud page of the project (https://soundcloud.com/listening-385769822).
Mindfulness conditions
In the body scan, the meditation trainer invited participants to ‘scan’ the parts of their body. Every time the mind wandered, the trainer invited participants to bring their awareness and attention back to the part of the body they were ‘scanning’. During mindful breathing, the trainer invited participants to ‘stay with their breath’, without changing the way they were breathing. When their mind wandered, the trainer invited participants to bring their attention back to their breath with kindness and patience. During the loving kindness meditation, the trainer encouraged participants to direct loving kindness towards themselves and then to extend these feelings towards somebody else. During mindful walking, the trainer asked participants to walk in a quiet place (preferably indoors or in a place as isolated as possible from distractions) while listening to the instructions. During this practice, the trainer invited participants to bring their awareness to the experience of walking and then to ‘feel’ the physical sensations of their feet making contact with the ground.
Control conditions
Participants in the active control condition listened to an excerpt from ‘Silverview’ by John le Carré21 (word count 1,838), ‘The Old Man and the Sea’ by Ernest Hemingway41 (word count 2,039) or ‘Smith of Wootton Major’ by J. R. R. Tolkien22 (word count 2,309). We used more than one story excerpt to increase the variance of the control conditions and thus push towards greater generalizability across stimuli42. These three excerpts had a similar word count, were written in standard English, did not feature major plot changes and were thus unlikely to elicit strong emotions. Participants had equal chances of listening to any one of the three story excerpts.
Neuroticism
We measured this trait with the neuroticism subscale of the International Personality Item Pool five NEO domains, comprising 20 items43. Examples of items include ‘I often feel blue’ or ‘I am filled with doubts about things’ and answers ranged from 1 (very inaccurate) to 5 (very accurate; coefficient omega ωu = 0.90).
Stress
Participants answered the 20-item STAI Form Y-1 (ref. 19). They indicated how they felt at that exact moment on 20 items (for example, ‘I am tense’; ‘I feel frightened’; ωu = 0.92) on a 4-point scale (1, not at all; 2, somewhat; 3, moderately so; 4, very much so). By using the STAI Form Y-1, we aimed to measure the short-term effects of stress on individuals; this scale has been shown to correlate with biomarkers of stress, such as salivary α-amylase, in previous research44.
Emotion dimensions
Participants filled in the self-assessment manikin scale, a three-item non-verbal pictorial assessment technique which measures emotions on three dimensions, namely pleasure, arousal and dominance45. The self-assessment manikin scale is the picture-oriented version of the widely used semantic differential scale46, which measures the three-dimensional structure of stimuli, objects and situations with 18 bipolar adjective pairs rated along a 9-point scale. This measure was not the primary dependent variable of our study; we added it for the exploratory analyses.
Demographics
Participants provided information regarding their age, gender, country of birth, country of residence and whether they were students or not; students reported which university they were studying at and non-students reported their current occupation.
Simulation of the sequential Bayesian design
Before data collection, we simulated data based on a Bayes factor design analysis to assess the expected efficiency and informativeness of the present design. The aim of the simulation was to establish (1) the expected likelihood of the study providing compelling relative evidence either in favour of H0 (BF10 = 1/10) or H1 (BF10 = 10), (2) the likelihood of obtaining convincing but misleading evidence and (3) the likelihood that the study points in the correct direction even if stopped early due to pragmatic constraints on sample size47.
Given these aims, we modelled a sequential design with a maximum n, where data collection continues until either the threshold for compelling evidence is met or the maximum n is reached. Although 41 laboratories indicated an interest in the project, we took the conservative estimate of 30 data-collecting laboratories. Each laboratory was expected to collect data from at least n = 70 participants, with a maximum n of 120 (translating to a minimum of 420 and a maximum of 720 participants per condition). Our goal was to be able to detect an effect size of d = 0.20; we modelled the true value to vary between laboratories by repeatedly (for each simulation) drawing from a normal distribution, δ ∼ N(0.20, 0.05), with a 95% probability that the effect size falls between d = 0.10 and 0.30.
We tested the effectiveness of four standalone interventions using a between-participants adaptive group design in which, upon hitting a threshold of compelling evidence in one condition, we planned to allocate the remaining participants to the conditions where the threshold had not yet been met. The simulation, however, assumed a conservative scenario with equal n across all conditions, thereby simplifying the computations to a single between-participants t-test scenario.
The results (Fig. 3) show that, given the assumed design, the probability of the test arriving at a boundary of compelling evidence (BF10 = 10 or 1/10) was 0.79 (0.72 at H1 and 0.07 erroneously at H0). The probability of terminating at the maximum n of 720 per condition was 0.21, comprising a probability of 0.05 of showing some evidence for H1 (BF10 > 3), 0.13 of being inconclusive (3 > BF10 > 1/3) and 0.03 of showing evidence for H0 (BF10 < 1/3). For the test of a single condition against controls, the sequential design was expected to be 27% more efficient than collecting a fixed maximum n per laboratory, with an average n at the stopping point (BF boundary or maximum n) of 526 per condition. Even conservatively assuming a balanced-n situation, the informativeness of the design thus appeared adequate, and the use of the adaptive design would probably enhance informativeness and/or resource efficiency.
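The logic of this design analysis can be sketched as follows: draw a true effect from δ ∼ N(0.20, 0.05), accumulate data in batches, and stop when the Bayes factor crosses 10 or 1/10 or the maximum n is reached. This is a simplified Python illustration assuming unit-variance normal outcomes and a two-sample JZS Bayes factor; the batch size and number of simulation runs are illustrative, not the preregistered values, and the actual simulation was run in R:

```python
import numpy as np
from scipy import integrate, stats

def jzs_bf10(t, n1, n2, r=np.sqrt(2) / 2):
    """Two-sample JZS Bayes factor (Rouder et al., 2009), Cauchy(0, r) prior on delta."""
    nu = n1 + n2 - 2
    n_eff = n1 * n2 / (n1 + n2)

    def integrand(g):
        # InvGamma(1/2, r^2/2) prior on g times the marginal likelihood term.
        prior = r / np.sqrt(2 * np.pi) * g ** -1.5 * np.exp(-r ** 2 / (2 * g))
        lik = ((1 + n_eff * g) ** -0.5
               * (1 + t ** 2 / ((1 + n_eff * g) * nu)) ** (-(nu + 1) / 2))
        return prior * lik

    numerator, _ = integrate.quad(integrand, 0, np.inf)
    denominator = (1 + t ** 2 / nu) ** (-(nu + 1) / 2)
    return numerator / denominator

def simulate_once(rng, batch=36, n_max=720, bound=10.0):
    """One sequential run: true delta ~ N(0.20, 0.05); stop at a BF bound or n_max per arm."""
    delta = rng.normal(0.20, 0.05)
    x = y = np.empty(0)
    bf = 1.0
    while len(x) < n_max:
        x = np.append(x, rng.normal(delta, 1.0, batch))  # intervention arm
        y = np.append(y, rng.normal(0.0, 1.0, batch))    # control arm
        t_stat = stats.ttest_ind(x, y).statistic
        bf = jzs_bf10(t_stat, len(x), len(y))
        if bf >= bound or bf <= 1 / bound:
            break
    return bf, len(x)

rng = np.random.default_rng(2022)
runs = [simulate_once(rng) for _ in range(100)]
p_h1 = np.mean([bf >= 10 for bf, _ in runs])
avg_n = np.mean([n for _, n in runs])
print(f"P(hit BF >= 10) ~ {p_h1:.2f}; mean n per arm at stop ~ {avg_n:.0f}")
```

Repeating such runs many times yields the boundary-hitting probabilities and expected stopping-n reported above.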
Procedure
Participants accessed the experiment via a Qualtrics link. We provided participants with detailed information about the study (see ‘Participants information sheet’ included in the IRB package, https://osf.io/6w2zm/) and asked for their consent to participate. We asked them to use a smartphone with headphones or earphones attached instead of a computer or laptop. We asked participants whether they had started the survey from a device other than a smartphone; if so, we asked them to exit the survey and restart it, this time using a smartphone with headphones or earphones attached. We then asked participants to sit in a quiet place, such as a room where they would not be disturbed for 20 min. After providing informed consent, participants completed the neuroticism measure and were then randomly allocated by the Qualtrics algorithm to one of the four intervention conditions or one of the three control conditions, each lasting 15 min. On completion, participants answered the main study outcome, namely the stress measure, followed by the self-assessment manikin scale. Finally, participants provided demographic information, were thanked and debriefed and were awarded credit or payment depending on the site policy.
Analysis plan
To assess, in an efficient manner, the effectiveness of the chosen mindfulness exercises against the control conditions at reducing participants’ stress, we carried out four independent-samples Bayesian t-tests to determine whether there was a difference between each mindfulness exercise and the active control condition. This study was originally conducted as a sequential Bayesian design48. The data were monitored continuously to see when each condition met the compelling evidence threshold of a BF10 of 10 in favour of H1 or a BF10 of 1/10 in favour of H0. During monitoring, three out of four mindfulness exercises reached the BF10 threshold of 1/10 in favour of H0 before reaching a BF10 of 10 in favour of H1 as the sample increased. A detailed explanation of the sequential Bayesian design can be found in the extended preregistration document on the OSF page at https://osf.io/us5ae.
We used a two-tailed test with a non-informative Jeffreys–Zellner–Siow Cauchy prior for the alternative hypothesis, with a default r-scale of √2/2 (ref. 49). To account for the hierarchical nature of the data, we compared the condition means using a Bayesian mixed-effects model with a random intercept for site and for the different stories used in the non-mindful active control condition. We set the threshold of compelling evidence, on the basis of which we would draw inferences about the results, at a Bayes factor (BF10) of 10 in favour of H1 or of 1/10 in favour of H0. We chose a Bayes factor of 10 because, according to the classification of ref. 20, it demarcates the threshold between moderate and strong evidence; using this threshold, we aimed to substantially decrease the probability of misleading evidence48. In the Bayesian analyses, we engaged only in comparative inference using Bayes factors (comparing the likelihood of the data under the two competing hypotheses, H0 and H1) and for this reason we did not estimate posteriors. Finally, we decided not to screen for and exclude outliers, and we did not perform any (nonlinear) transformations contingent on the observed data.
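The preregistered stopping rule amounts to a simple classification of each monitored Bayes factor. A schematic Python sketch follows; the Bayes factor values in the example dictionary are hypothetical placeholders, not the study’s observed results:

```python
THRESHOLD = 10.0  # preregistered bound for compelling evidence

def stopping_decision(bf10):
    """Classify a monitored Bayes factor against the preregistered bounds."""
    if bf10 >= THRESHOLD:
        return "stop: compelling evidence for H1"
    if bf10 <= 1 / THRESHOLD:
        return "stop: compelling evidence for H0"
    return "continue data collection"

# Hypothetical monitored BF10 values for the four exercise-vs-control tests.
monitored = {
    "body scan": 0.08,
    "mindful breathing": 0.09,
    "loving kindness": 0.07,
    "mindful walking": 0.50,
}
for exercise, bf in monitored.items():
    print(f"{exercise}: {stopping_decision(bf)}")
```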
Exploratory analyses
We also carried out analyses exploring the effect of the experimental conditions on pleasure, arousal and dominance and for the moderating effect of neuroticism. We performed separate Bayesian t-tests for each dimension of the self-assessment manikin scale (pleasure, arousal and dominance) comparing our experimental conditions with the control condition. We then looked at the Bayes factor to establish whether the data favoured H1 or H0. We compared the means of the different conditions using a Bayesian mixed-effects model with a random intercept for laboratory and for the different stories used in the non-mindful active control condition to account for the hierarchical nature of the data.
To examine whether neuroticism moderated the effects of the four experimental conditions on stress, we compared the model with the interaction to the model with only the main effects (using the lmBF function) and reported the corresponding BF10. If the model with the interaction was preferred to the model with only the main effects by a BF10 of 10 or more, we regarded this as solid evidence that neuroticism moderated the effect on stress. We performed a similar analysis to investigate the potential moderating effect of English language proficiency on stress levels. The analyses for the current project were performed using RStudio v.2023.09.0+463.
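The model comparison above used the lmBF function from the R BayesFactor package. As a rough, simplified Python analogue (ignoring the random intercepts for site and story), the Bayes factor for the interaction can be approximated from BIC values (Wagenmakers, 2007); the data below are simulated and hypothetical, with no true interaction built in:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated, hypothetical data: no true neuroticism x condition interaction.
rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "neuroticism": rng.normal(3.0, 0.7, n),
    "condition": rng.choice(["body_scan", "control"], n),
})
df["stress"] = 2.0 + 0.3 * df["neuroticism"] + rng.normal(0.0, 0.5, n)

# Fit the main-effects model and the model adding the interaction term.
main = smf.ols("stress ~ neuroticism + condition", data=df).fit()
inter = smf.ols("stress ~ neuroticism * condition", data=df).fit()

# BIC approximation to the Bayes factor favouring the interaction model.
bf10_interaction = float(np.exp((main.bic - inter.bic) / 2))
print(f"Approximate BF10 for the interaction: {bf10_interaction:.3f}")
```

With BF10 ≥ 10 taken as solid evidence for moderation, values near or below 1 (as expected here, since no interaction was simulated) favour the main-effects model.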
Not preregistered analyses
Several analyses reported in the ‘Exploratory analyses’ section were not explicitly outlined in the preregistration. These additional analyses included the computation of heterogeneity and of Cohen’s d for each condition compared to the active control conditions, as well as the moderation analysis for English language proficiency. Additionally, robustness analyses were incorporated at the reviewer’s request.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.