Introduction
‘Executive functions’ is an umbrella term for various cognitive processes including problem solving, planning, organisational skills and inhibitory control. In neuropsychology, information about patients’ planning ability, problem solving etc. can be obtained using multiple methods. These might include neuropsychological tests, observations, interviews, questionnaires, and rating scales. For the purposes of assessing executive functions, and particularly planning ability, Hunter and Sparrow [1] mention not only Tower of London tests, but also maze tasks, several variants of the categorisation tests (e.g. Wisconsin Card Sorting Test), copying designs in the Rey-Osterrieth Complex Figure (ROCF), and verbal fluency tasks.
The Tower of London (ToL) task was first proposed by Shallice [2] as an alternative to the Tower of Hanoi. Nowadays, the ToL is frequently used in clinical psychology settings for an assessment of executive functioning, namely planning ability [3, 4].
Specific clinical populations have been shown to perform poorly on the ToL task, especially patients with frontal lobe impairment [2, 4, 5]. The ToL is also used in assessing patients with Parkinson’s Disease [6–8], Huntington’s Disease [9–11], schizophrenia [8, 12, 13], dementia or mild cognitive impairment [14–17], and autism spectrum disorder [18–20].
Over the years, various versions of the original ToL have been introduced [18, 21–23], using an inconsistent variety of problems, performance measures and methodologies [3].
For example, a standardised version of the ToL test has been developed for use as a part of the Delis–Kaplan Executive Function System [24]. The Tower test in D-KEFS is longer than the original ToL, but covers a wider spectrum of performance, from severe deficits to superior performance. In the D-KEFS Tower Test, there are nine different towers to be completed with various levels of difficulty. Testing begins with simple towers requiring only 1–3 moves, and gradually becomes more difficult with towers requiring up to 26 moves [25].
Another example is the Tower of Toronto, designed specifically for patients with striatal disorders affecting procedural learning [26].
The original Shallice version implements a block base with three rods of different lengths, and three balls of different colours (red, green and blue). The three balls are placed on the rods in a starting position and for each problem the balls have to be moved from the starting position into the target position while respecting a set of rules and a maximum number of moves [23].
A standardised Czech version of ToL was however lacking. In recent years, Michalec et al. [27] published a standardised Czech version of Shallice’s original ToL and provided preliminary normative data for healthy elderly people. Later, Michalec et al. [8] published definitive normative data for the whole Czech population, and compared various scoring systems suitable for the original Shallice version of ToL. The subsequent release of the Czech manual and props for standardised administration of ToL Shallice version [28] made the method more available in the Czech Republic.
Raizner et al. [29] closely examined an alleged ‘ceiling effect’ of ToL when testing adolescents and young adults. According to this study, ToL consistently shows satisfactory sensitivity to the development of planning abilities in children; however, its sensitivity to the development of planning ability in healthy adolescents and young adults is inconsistent. The authors further explored whether a ceiling effect was to blame, and they introduced an extended version of ToL (ToL-E). The ToL-E included an additional eight problems with the same rules and props as the original Shallice ToL, but these required 6, 7 or 8 moves to complete, as opposed to the maximum of 5 moves in the hardest task of the original ToL.
Bearing in mind this extended version of the ToL, we considered it important to introduce a shortened version of the method, especially for an adult clinical population. In differentiating between various levels of executive functioning in young adults, we might find using more tasks useful [29], but in screening for severely cognitively impaired adults in a clinical environment, we felt this might be better accomplished in fewer tasks.
To test this theory, we introduce our current study of the brief screening version of ToL using only three chosen ToL tasks instead of the usual number of 12.
Test shortening is a well-observed trend in clinical psychology, most often in order to decrease testing time and to ensure patient cooperation [30]. However, several challenges arise when shortening an established psychological method, as we go against the ‘classical psychometric assumption’ in which many items are essential to obtain valid and reliable measures [31, 32]. We addressed this issue with a statistics-driven strategy, which we explain below. This research was approved by the ethics committee of the First Faculty of Medicine, Charles University, Prague, Czech Republic and the authors did not have any financial interest in the research. The creation of this article was supported by the PROGRES programme of Charles University in Prague (Progres č. Q06/LF1).
Study 1
The aim of Study 1 was to select, from all 4,095 possible combinations of the 12 original items, the optimal short version of ToL for further psychometric evaluation (Study 2). Primarily, we wanted the discriminative validity of the chosen combinations to be as close as possible to that of all 12 original items combined.
Methods
Procedure
Both groups, i.e., patients with mild cognitive impairment (MCI) in Parkinson’s Disease (PD) and the healthy control group, were administered the original 12-item Shallice version of ToL. Their performance was evaluated based on the Shallice original scoring system: 3 points for successful task completion within 15 seconds, 2 points for completion within 30 seconds, 1 point for completion within 60 seconds, and 0 points for completion in more than 60 seconds. The number of trials was irrelevant, but the time in each trial was added to the total time sum. For a more detailed explanation of the scoring system, see Michalec et al. [8].
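The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the time boundaries are assumed to be inclusive (e.g. exactly 15 seconds still earns 3 points), and the example times are invented rather than taken from the study.

```python
def score_item(total_seconds: float) -> int:
    """Points for one ToL item, given the total time summed over all trials.

    Assumption: boundaries are inclusive (15 s -> 3 points, 60 s -> 1 point).
    """
    if total_seconds <= 15:
        return 3
    if total_seconds <= 30:
        return 2
    if total_seconds <= 60:
        return 1
    return 0


def score_test(trial_times_per_item):
    """Total raw score; each inner list holds the trial times for one item."""
    return sum(score_item(sum(trials)) for trials in trial_times_per_item)


# Hypothetical example: three items, the second solved on its second trial.
example = [[12.0], [20.0, 15.0], [70.0]]
print(score_test(example))  # 3 + 1 + 0 = 4
```

Under this rule a version with k items has a maximum raw score of 3k, i.e. 12, 15 and 18 points for four, five and six items.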
The PD-MCI group was also administered a complex neuropsychological test battery according to level II criteria for PD-MCI as established by Litvan et al. [33]. The control group was administered DRS-2 (Dementia Rating Scale 2) [34, 35] as a screening for possible cognitive impairment.
Participants
As a suitable model of executive functions deficit (including planning ability), we chose to study a sample of patients with mild cognitive impairment (MCI) in Parkinson’s Disease (PD). Participants were recruited at the Department of Neurology at the First Faculty of Medicine and General University Hospital in Prague, Czech Republic. The total sample of n = 46 patients was 80% male, with average years of education of 13.78 (± 2.66) and an average age of 59.52 (± 7.29). The diagnosis of PD-MCI in our sample was established based on a complex neuropsychological test battery in accordance with contemporary diagnostic criteria on level II [33].
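Table 1. Items included in each shortened ToL version

| Version | ToL 01 | ToL 02 | ToL 03 | ToL 04 | ToL 05 | ToL 06 | ToL 07 | ToL 08 | ToL 09 | ToL 10 | ToL 11 | ToL 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ToL-four items | | | | | | x | | x | x | x | | |
| ToL-five items | | | | | | x | | x | x | x | x | |
| ToL-six items | | | | | | x | | x | x | x | x | x |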
The model of intact planning ability was represented by a sample of n = 225 healthy volunteers (control group) without any disease with a possible impact on their cognitive functioning. Also, all the volunteers in our sample had to fulfil test criteria of a DRS-2 (Dementia Rating Scale 2) score > 141 [30, 31]. Fifty per cent of this sample were male, the average years of education was 14.16 (± 2.95), and average age was 54.57 (± 13.28).
Statistical analysis
Receiver operating characteristic (ROC) analysis was carried out, and the area under the curve (AUC) was determined for all 4,095 combinations.
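The exhaustive search can be illustrated with a short Python sketch. It is not the software used in the study: the function names and data layout are hypothetical, and the AUC is computed in its rank-based (Mann-Whitney) form, on the assumption that a lower ToL score indicates impairment. The 4,095 combinations are the non-empty subsets of 12 items (2^12 − 1).

```python
from itertools import combinations


def auc(patient_scores, control_scores):
    """AUC as the probability that a random patient scores lower than a
    random control (ties count one half); equivalent to Mann-Whitney U."""
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p < c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


def all_item_subsets(n_items=12):
    """Yield every non-empty subset of item indices 0..n_items-1."""
    for size in range(1, n_items + 1):
        yield from combinations(range(n_items), size)


def auc_per_subset(patients, controls, n_items=12):
    """patients/controls: one list of item scores (0-3) per participant."""
    results = {}
    for subset in all_item_subsets(n_items):
        p_tot = [sum(person[i] for i in subset) for person in patients]
        c_tot = [sum(person[i] for i in subset) for person in controls]
        results[subset] = auc(p_tot, c_tot)
    return results


print(sum(1 for _ in all_item_subsets(12)))  # 4095
```

Sorting the resulting dictionary by AUC and by subset size would then give the kind of shortlist from which short versions can be chosen.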
Results
Of the 4,095 possible combinations, 206 had an AUC similar to that of all 12 items combined (AUC = 0.753). However, none of the combinations had significantly better clinical discriminative validity; AUC ranged from 0.753 to 0.769 (see Appendix). Only four of these 206 combinations were four-item combinations, and all four consisted exclusively of items requiring four or five moves; none included an item requiring fewer than four moves. Taking this into account, we restricted further evaluation to combinations of items requiring four or more moves, which narrowed our selection considerably. The chosen items were 6-8-9-10 for a four-item version, 6-8-9-10-11 for a five-item version, and 6-8-9-10-11-12 for a six-item version. Because these three versions are nested, they could be administered in parallel for the psychometric evaluation and comparison in Study 2: the six-item version was administered once, and the shorter versions were scored by omitting items, with minimal risk of the results being influenced by a learning effect or by item order.
Study 2
The aim of Study 2 was to obtain a research sample for assessing basic psychometric properties of the shortened version of ToL.
Methods
Procedure
Based on the results of Study 1, three shortened versions of ToL were chosen from the 206 candidate combinations: one with four items, one with five items, and one with six items. The items included in each version are set out in Table 1. Schizophrenia patients and controls were administered the ToL-six-item version, and scores for the shorter versions were derived from it by omitting the items not included in the given version. Because every participant completed the same items in the same order, the comparison between versions could not be affected by differences in learning effects or item order. The scoring system used was the same as in the original version.
Participants
The research sample consisted of n = 30 patients with schizophrenia from the Department of Psychiatry of the First Faculty of Medicine and General University Hospital in Prague. The sample was 77% male, with average years of education of 13.36 (± 3.12) and average age of 31.37 (± 6.27). All patients were in remission at the time of assessment.
The control group consisted of n = 31 socio-demographically paired healthy volunteers without any history of a disease with a possible effect on their cognitive functioning, and who also fulfilled the test criteria of DRS-2 score > 141 [30, 31]. The sample was 74% male, with average years of education of 15.53 (± 3.37) and average age of 33.42 (± 3.81).
Participants were excluded from the study if they were found to have a history of traumatic injury of the central nervous system, a neurological disorder, a premorbid intellectual disability, or any substance dependence (with the exception of nicotine).
Statistical analysis
Receiver operating characteristic (ROC) analysis was carried out for each short ToL version. The area under the curve (AUC) was computed, together with sensitivity and specificity at given cut-off scores. Descriptive statistics for raw ToL scores were also computed.
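As an illustration of how sensitivity and specificity follow from a cut-off score, consider the minimal Python sketch below. It assumes that a cut-off written as ‘c/c+1’ classifies raw scores of c or less as impaired and scores of c + 1 or more as unimpaired; the function name and the example scores are hypothetical and are not study data.

```python
def sensitivity_specificity(patient_scores, control_scores, cutoff):
    """Return (sensitivity, specificity) in per cent for the cut-off
    'cutoff/cutoff+1': a raw score <= cutoff is classified as impaired."""
    true_pos = sum(1 for s in patient_scores if s <= cutoff)
    true_neg = sum(1 for s in control_scores if s > cutoff)
    se = 100.0 * true_pos / len(patient_scores)
    sp = 100.0 * true_neg / len(control_scores)
    return se, sp


# Hypothetical raw scores, invented for illustration only.
patients = [2, 4, 5, 7, 9]
controls = [6, 8, 9, 10, 12]
for c in range(4, 10):
    se, sp = sensitivity_specificity(patients, controls, c)
    print(f"{c}/{c + 1}: SE = {se:.0f}%, SP = {sp:.0f}%")
```

Raising the cut-off can only increase sensitivity and decrease specificity, which is the trade-off a clinician weighs when choosing a screening threshold.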
Results
Table 2 shows descriptive statistics of ToL raw scores of each short ToL version.
| Version | CG: M ± SD | CG: MIN–MAX | SCH: M ± SD | SCH: MIN–MAX |
|---|---|---|---|---|
| ToL-four items | 8.81 ± 1.89 | 5–12 | 5.6 ± 2.67 | 0–10 |
| ToL-five items | 10.74 ± 2.44 | 6–15 | 7.07 ± 3.39 | 0–13 |
| ToL-six items | 12.03 ± 2.74 | 7–17 | 8.1 ± 3.91 | 0–15 |

CG = control group; SCH = schizophrenia group.
From the results listed in Table 3, we can deduce that the discriminative validity (AUC) of the individual shortened versions of ToL was comparable, with overlapping 95% confidence intervals, and clinically valuable. The reliability of the shortened versions was analysed using internal consistency (Cronbach’s alpha), computed separately for the schizophrenia and control groups. Cronbach’s alpha was insufficient in the control group but satisfactory in the schizophrenia group. The ToL-four-item version had slightly lower internal consistency than the ToL-five-item and ToL-six-item versions.
| Version | ToL 06 | ToL 08 | ToL 09 | ToL 10 | ToL 11 | ToL 12 | AUC | 95% CI AUC | Cronbach alpha (NC) | Cronbach alpha (SCH) |
|---|---|---|---|---|---|---|---|---|---|---|
| ToL-four items | x | x | x | x | | | 0.833 | 0.732–0.935 | 0.22 | 0.64 |
| ToL-five items | x | x | x | x | x | | 0.804 | 0.695–0.914 | 0.40 | 0.73 |
| ToL-six items | x | x | x | x | x | x | 0.788 | 0.675–0.900 | 0.45 | 0.76 |

NC = control group; SCH = schizophrenia group.
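For readers who wish to reproduce the internal consistency figures on their own data, Cronbach’s alpha can be computed from item-level scores with the standard formula alpha = k / (k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The sketch below is a minimal Python illustration; the function name and data layout are hypothetical, and it is not the software used in the study.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of participants, each a list of k item scores.

    Uses sample variances; requires at least two participants, at least two
    items, and non-zero variance of the total scores.
    """
    k = len(item_scores[0])
    item_vars = [variance([person[i] for person in item_scores]) for i in range(k)]
    total_var = variance([sum(person) for person in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: two perfectly consistent items give alpha = 1.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
```

Because alpha depends on the variance of total scores, a group in which nearly everyone scores similarly tends to yield low values even when the items measure the same ability.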
Table 4 sets out the values of sensitivity and specificity for given cut-off scores for each of the ToL short versions, showing in more detail their discriminative validity.
| Cut-off | ToL-four items: SE | ToL-four items: SP | ToL-five items: SE | ToL-five items: SP | ToL-six items: SE | ToL-six items: SP |
|---|---|---|---|---|---|---|
| 4/5 | 33 | 100 | 20 | 100 | 17 | 100 |
| 5/6 | 53 | 97 | 30 | 100 | 23 | 100 |
| 6/7 | 63 | 90 | 47 | 94 | 43 | 100 |
| 7/8 | 77 | 71 | 53 | 90 | 47 | 94 |
| 8/9 | 83 | 55 | 57 | 74 | 50 | 90 |
| 9/10 | 90 | 39 | 80 | 71 | 57 | 77 |
| 10/11 | 100 | 19 | 87 | 61 | 67 | 71 |
| 11/12 | 100 | 10 | 90 | 45 | 77 | 58 |
| 12/13 | | | 93 | 23 | 90 | 48 |

SE = sensitivity (%); SP = specificity (%).
Discussion
The main aim of this study was to examine a shorter version of ToL for the clinical purposes of assessing executive functioning (planning ability) of psychiatric patients. As already mentioned, there has been a lack of similar efforts to establish shortened versions of ToL in the scientific community. In our clinical experience, the original Shallice ToL version [2] can prove difficult and time consuming when administered to patients with severe psychiatric or neurological diseases.
Our study shows that shortening ToL is possible. In our sample, we used four-, five- and six-item versions. The shortened versions yielded a satisfactory value of internal consistency when it came to the schizophrenia sample, but interestingly not when it came to the control group. Interpreting these findings, we hypothesise that the use of a shortened version is indeed appropriate only in patient samples with expected problems in the executive area. However, for assessing executive functions in healthy individuals, the use of short versions of ToL is not recommended.
Test shortening has several implications. The original psychometric properties of the method can be altered and need to be established again for the shortened version. As already mentioned, shortening of neuropsychological methods goes against the ‘traditional’ psychometric idea of many items being essential for obtaining valid and reliable measures [31, 32]. That is why all endeavours towards establishing such shortened versions must be statistically driven and thoroughly evaluated.
There are several limitations to our study. The samples used were rather small in size and not socio-demographically well balanced when it came to clinical groups. However, this is often an inevitable issue with patient groups with a specific diagnosis — each diagnostic group has its own specific socio-demographic characteristics, which is then naturally reflected in the patient sample. Furthermore, our study participants were mostly chosen opportunistically and based on certain criteria, rather than via random sampling. Also, the psychometric properties were derived from a sample of patients with schizophrenia. The study does not provide comparisons with other clinical groups, for example neurological patients. Using the ToL short version may be more challenging for neurological patients who have initiation problems, i.e., find it hard to start performing new tasks, and for patients with procedural memory impairment. We suggest that these issues and other related questions concerning the appropriateness of this test for diverse neurological populations should be the subject of further research.
Conclusions
Our main conclusion, as well as recommendation, is that using the ToL short version can possibly prove sufficient in certain clinical samples, and we encourage further research efforts in that area.
Given the comparable discriminative validity of all three ToL short versions, and the fact that the ToL-four-item version had slightly lower internal consistency in the patient group, we recommend the ToL-five-item version for clinical use. We recommend the ToL-five-item rather than the ToL-six-item version simply because it is shorter. Cut-off scores for preliminary clinical use were presented, although these cannot replace standardisation including normative data.
Conflict of interest: None.
Funding: None.