A pilot study: a computer game-based assessment of visual perspective taking of four children with autism with high support needs

High support need and minimally verbal individuals with autism have received less attention in research than so-called higher functioning individuals with autism. As computers motivate individuals with autism, a game with a positive user experience was altered into a level 1 perspective-taking task in which advancement was contingent on eye contact. A case-controls design was used to see whether participants had impaired perspective taking and whether they would benefit from additional cues. Four high support need and minimally verbal children with autism played the game in their school environment. It was found that only one child with autism made more errors than controls using the eye cues. No child with autism benefitted from the additional cues, whereas the control group did. The results suggest that positive contexts may reveal more about individuals and their skills, and that individual level analysis can provide insights about autism and about the individuals with autism.


Introduction
One of the defining features of autism spectrum disorder (ASD) is the deficit in social interaction (American Psychiatric Association 2013; Wing and Gould 1979). To function in the social world, one needs to be able to take other people's perspective into account (Flavell 1977). Perspective taking helps us to infer and predict the actions, desires, and beliefs of other people (e.g. Conson et al. 2015; David et al. 2010; LeBlanc et al. 2003; Pearson, Ropar, and de C Hamilton 2013; Schilbach et al. 2012). Visual perspective taking has two levels. Level 1 refers to knowing whether the other person can see an object; it involves, for example, the line of sight and obstructions on it. Level 1 tasks can be passed by typically developing two-year-olds. Level 2 refers to the ability to understand that the other person sees the object differently, depending on the point of view, and even adults have been shown to have difficulties with Level 2 tasks in naturalistic contexts (Tomasello 2004, 2006; Pearson, Ropar, and de C Hamilton 2013).
Impairments in establishing and maintaining eye contact are part of the diagnostic criteria of autism (APA 2013), and perspective taking is dependent on the ability to follow the other person's gaze and line of sight (Conson et al. 2015; Pearson, Ropar, and de C Hamilton 2013; Warreyn et al. 2005). Although children with ASD have problems executing this skill (e.g. Hamilton, Brindley, and Frith 2009; Warreyn et al. 2005), mixed results have recently been found (Pearson, Ropar, and de C Hamilton 2013). Similarly, discrepancies have also been found in eye-tracking research looking at ... with the arrow cues there would be no differences between the individuals with ASD and the controls in the error rates.

Participants
Children with autism
A convenience sampling method was used. Four pupils from a school for individuals with special needs, which used an adjusted syllabus due to pupils' academic performance, took part in the study. The study took place in their own school in a familiar setting. All children were previously diagnosed with ASD (based on ICD-9 criteria) and were assessed as high support need (e.g. Strnadová, Cumming, and Marquez 2014) and minimally verbal (e.g. Tager-Flusberg and Kasari 2013) by the school services (medical doctor and speech therapist). A teacher-rated Autism Spectrum Screening Questionnaire was used (ASSQ: Mattila et al. 2012): the teacher-rated ASSQ scores were all above the cut-off score of ≥ 22, with sensitivity/specificity 0.73/0.74 for clinical populations (ASSQ scores for the four participants: 23, 36, 41, 30). The participants were all male, and their age levels were equivalent to those in Finnish primary and secondary school (ages in years: 9, 12, 14, and 11). We could not collect standardized test results (language or cognition) as tests were stopped due to the children's systematic task-irrelevant behaviour, for example, inventing their own play action, which was unrelated to the task. More subjectively, the teachers and researchers characterized these children as having very limited use of verbal language: they mainly used single words, produced echolalic speech, and most often communicated non-verbally.
At the time of the study Aaron was a 9-year-old boy, with an ASSQ score of 23, who has developmental delays and hence has extended schooling planned. He was still learning how to dress himself and needed aid using a toilet. Aaron had sensory sensitivities which made, for example, cutting hair or doing physical examinations difficult. He appeared happy in everyday life but had difficulties in concentrating on tasks, and if irritated he may have scratched or headbutted the person next to him. Aaron had good gross motoric skills but needed aid and training in fine motor skills, for example, using a pencil/pen, and in general was often restless in his motor actions. He also needed aid in outdoor activities, in public spaces he needed careful supervision, and inside he needed aid and guidance in eating. Aaron understood clear context-related instructions but had trouble with comprehending more abstract concepts. He was able to name individual everyday items but could not use plurals, and he was not able to produce L, K, R sounds, and J and N sounds could only be used as individual sounds. Aaron was able to produce some sentences by combining two words; however, the intelligibility was often inadequate and he felt irritated when asked for a clarification. As an aid he used a picture communication folder for communication with several pictures to form a sentence to ask for something.
Billy was a 12-year-old boy, with an ASSQ score of 36, who had developmental delays. Billy was a child that could not be left alone without supervision, and he needed aid in using a toilet, washing up, and in brushing his teeth. Although Billy was able to do puzzles up to 25 pieces in size, he was still training to use pens and pencils but was able to use scissors to cut paper into triangular shapes. He had a tendency to get easily frustrated if there was no planned activity. Billy was able to understand clear short instructions but he communicated with pictures and supportive sign language. The school found that Billy's day was best organized by using a pictorial calendar.
Clark was a 14-year-old boy, with an ASSQ score of 41, who had developmental delays and therefore had extended schooling planned. Clark made very little contact with others, was easily distracted, and became absorbed in his own thoughts, but with verbal guidance he was easily brought back to the task. Sometimes Clark might have grabbed the hair of, or pinched, the person next to him for anything from seconds up to minutes without a specific reason, and when he was disappointed the duration was often longer. His motor skills were repetitive and his fine motor abilities needed training; using a pencil or scissors was difficult, but he liked physiotherapy, in which he needed verbal and manual aid and guidance. Clark's activity level was very varied; sometimes he needed constant guidance but often tasks were done without any aid. Clark used words to communicate and did not use signs or pictures; moreover, he had frequent echolalic speech and often recited sentences from cartoons. Although limited in language use, he was able to write his own name, recalled most numbers, and was able to name geometric shapes (square, circle, house, heart).
Derek was an 11-year-old boy with an ASSQ score of 30. Derek was almost always a cheerful child who had made progress in play: he no longer did only certain play activities and was also more willing to be guided by an adult. When he was stuck, giving time and showing pictures helped him to move on. He was eager to play but only for a short while, and he needed adult supervision and guidance to plan and execute activities. In motor play activities Derek was hesitant, and his fine motor skills, like holding a pen, were still developing. For Derek, big social events at the school were a challenge, but those could be addressed by encouragement and pictorial planning of the events. Derek communicated with words, gestures, and pointing. However, he used the same phrases frequently, of one, two, or three words, and was able to name colours, numbers, and play-related items. He was able to ask for help by using words such as "help" or "give" and additionally used a picture communication file to communicate with adults.
All children participated on the basis of their own, parental, and school consent and the research was approved by the Research Ethics Committee of the University of [name of university].

Typically developing children
A convenience sampling method was used. Finnish universities have teacher training schools that are designed to work in collaboration with researchers. We involved all consenting and typically developing second grade primary school children from the university training school in our study, gaining their own consent as well as that of their parents and the schools. The second grade was selected to ensure that the youngest participants were age-matched to the youngest individual with ASD: the mental and language age in the control group was therefore on the same or higher level as for the youngest child with ASD. This was done since children with ASD had too much task-irrelevant behaviour during testing. The school reported that the participating children had no medical, psychological, or neurological diagnoses or other learning disabilities or difficulties. To exclude potential individuals with ASD, a teacher-rated ASSQ was used: for a whole population sample the sensitivity/specificity was 1.00/0.94 and the cut-off ≤ 7 (ASSQ: Mattila et al. 2012). The ASSQ scores in the TD group were all below the cut-off: all scores were < 3. Altogether, 17 typically developing children, between the ages of 8 and 9, participated in the study (9 males and 8 females). The study took place in their own school in a familiar setting. The researchers opted to use statistics appropriate for single-case studies. The sample size used for the control group in case-control studies can be modest: the mean control sample size in the 98 single-case studies that directly compared a single case to controls was 11.69 (SD = 10.66; Crawford, Garthwaite, and Porter 2010).

Material and measures
In order to project the least amount of discomfort for the children, the game playing was designed on the basis of existing activities at the participants' schools. We chose a game with a positive user experience (Mäkelä, Bednarik, and Tukiainen 2013).
The perspective-taking game ran on Visual Studio ® software on a PC using the Microsoft Windows ® operating system with a Kinect sensor, Microsoft Xbox 360 ® (version 1.8). The Kinect sensor has an operating range of 0.8 to 4.0 m, a resolution of 640 × 480, and a rate of 30 frames per second. The game was played on a white screen with a VGA connection to a projector/smartboard (Xbox Kinect ® uses body movement in its games; see Ilg et al. 2012; Munson and Pasqual 2012).
The Kinect sensor was placed in front of the player below the white screen (see Figure 1(a) for an example of game playing on Kinect and Figure 1(b) for the layout of the game playing). No physical contact with the screen was needed. The player saw a silhouette of him-/herself and used his/her hand's silhouette to select and catch items on the screen by placing either hand on top of the item. The software was programmed to only allow hands for selection. The distance to the screen was altered by the player by moving in the room; hence, the visual angle was not constant. The screen measured 2.6 m (width) × 2.01 m (height); the image projected was 2.1 m × 1.54 m. A cartoon character and images were used to maintain the game-like feature and since cartoons have been found to elicit similar gaze behaviour towards real images in individuals with ASD (Riby and Hancock 2009). The character's height was 97 cm with eyes being 20 cm × 13.2 cm. We also kept the sclera of the virtual character white and the pupil colour dark so as not to reverse eye viewing behaviour (Frischen 2007).
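Because the visual angle depends on the viewing distance, the apparent size of the character's eyes changed as the player moved. A minimal sketch of this relationship is given below, assuming the standard formula θ = 2·arctan(s / 2d). The function name is hypothetical, and the distances are illustrative only: they are the limits of the Kinect operating range treated as if they were the player's distance to the screen, not measured values from the study.

```python
import math


def visual_angle_deg(size_m: float, distance_m: float) -> float:
    """Visual angle (degrees) subtended by an object of size_m seen from distance_m."""
    if size_m <= 0 or distance_m <= 0:
        raise ValueError("size and distance must be positive")
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))


EYE_WIDTH_M = 0.20  # width of the character's eyes reported in the text (20 cm)

# Illustrative distances only (assumption): Kinect range limits plus a midpoint.
for distance in (0.8, 2.0, 4.0):
    angle = visual_angle_deg(EYE_WIDTH_M, distance)
    print(f"{distance:.1f} m -> {angle:.2f} degrees")
```

Under these assumptions the eyes subtend roughly five times the angle at the near limit compared with the far limit, which illustrates why stimulus size on the retina could not be held constant in this set-up.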
In the game, the player first chose an object of their preference, for example, a bird, a bee, a plane, etc. by placing either hand on top of the item. Then s/he needed to know the direction in which the virtual character was looking (there were three boxes on the screen: up, down, or middle) and open the box in that location with the help of eye gaze cues or with eye gaze and arrow cues (Figure 2; for the order of the events see Figure 3). If the participants tried to open the incorrect box, it would not open; it would shake for a moment and make a sound to invite players to try again. There were three attempts before the new cue would appear. Once they chose the correct box, the participants needed to catch the flying object, which would emerge from the box.
We were interested in two different kinds of trials to see whether attending to eyes is impaired and whether the additional arrow cues would reduce the number of errors made in the game: (1) there was only the eye gaze cue indicating which box to choose (hereafter eye cue) and (2) there were the eye gaze cue and an additional dotted arrow cue at the same time to make the task easier (hereafter double cue).
The data collection started with practise trials for both typically developing participants and for individuals with ASD. This was done because we did not know whether the target behaviour was part of their repertoire, and task failure could have evoked negative feelings in the children with ASD and result in refusal to play and participate in similar activities in the future. The practise trials had two eye cue trials and five double cue trials. The practise measurements involved only two attempts on the eye cue condition to avoid multiple failures as guided by Morgan and Morgan (2009). Similarly, the trial numbers were kept low (in the practise and real trials) due to keeping the playing time short and due to the pilot nature of the study. At the practise trials, the eye cue trials came before the double cue trials to ascertain whether the children were able to play the game when only eye gaze cues were given: two eye cue trials (length of the dotted arrow: 0-0). There was only one attempt for each eye cue trial. After the two eye cue trials, five prompted trials using the fading procedure (length of the dotted arrow: 5-4-3-2-1) with three attempts were used to aid the player with understanding the game and to impose a feeling of control. The typically developing children had one practise trial after which they were able to understand the game (with reference to their own comments).
After the practise trials, in the two playing sessions that were analysed, there were six double cue trials and three eye cue trials in order to have more positive than negative trials. In the double cue trials the game used a fading procedure in which each successive cue had a shorter arrow and eventually no arrow. The order followed the fading procedure, with arrow lengths 5-4-3-2-1-0-0-0. All trials allowed three attempts before proceeding to the next trial. The game performance data, for the eye cue and the double cue, were collected in log files: the cue type and whether the correct or incorrect location was chosen after the cue.

Figure: order of events in the game. (1) The player chooses an object, which he would like to look for; (2) the player locates the correct box using the eye gaze or arrow cues; (3) the player opens the correct box; (4) the player catches the object that emerges from the box. After the final trial with either a gaze cue or the double cues, the participant can choose a different object to play the game again.
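The trial structure described above can be sketched as follows. This is an illustration only, not the authors' game code: the names (`ARROW_LENGTHS`, `run_session`, `choose_box`, `error_percentage`) and the log format are hypothetical. The arrow lengths are taken from the fading order stated in the text; a trial with arrow length 0 is treated as an eye cue trial and any other length as a double cue trial. The error percentage used here (incorrect attempts over all attempts) is one possible definition, since the text does not specify how the relative error rates were computed.

```python
import random
from typing import Callable, Dict, List

BOXES = ("up", "middle", "down")          # the three boxes on the screen
ARROW_LENGTHS = (5, 4, 3, 2, 1, 0, 0, 0)  # fading order stated in the text
MAX_ATTEMPTS = 3                          # attempts allowed per trial


def run_session(choose_box: Callable[[str, int], str],
                rng: random.Random) -> List[Dict]:
    """Run one session and return one log record per attempt.

    choose_box(gaze_target, arrow_length) stands in for the player: it receives
    the box the character looks at and the arrow length, and returns a box.
    """
    log = []
    for trial, arrow in enumerate(ARROW_LENGTHS, start=1):
        target = rng.choice(BOXES)
        cue_type = "eye" if arrow == 0 else "double"
        for attempt in range(1, MAX_ATTEMPTS + 1):
            correct = choose_box(target, arrow) == target
            log.append({"trial": trial, "cue": cue_type, "arrow": arrow,
                        "attempt": attempt, "correct": correct})
            if correct:
                break  # the box opens; move on to the next trial
    return log


def error_percentage(log: List[Dict], cue_type: str) -> float:
    """Share of incorrect attempts (in %) among all attempts with this cue type."""
    attempts = [record for record in log if record["cue"] == cue_type]
    if not attempts:
        raise ValueError(f"no attempts logged for cue type {cue_type!r}")
    errors = sum(not record["correct"] for record in attempts)
    return 100.0 * errors / len(attempts)


# A simulated player who always follows the gaze makes no errors.
perfect_log = run_session(lambda target, arrow: target, random.Random(0))
print(error_percentage(perfect_log, "eye"), error_percentage(perfect_log, "double"))
```

Logging one record per attempt, rather than one per trial, keeps both the cue type and the correctness of each choice available, which matches the two pieces of information the log files are said to contain.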

Design
A case-controls method was used; that is, a method in neuropsychology in which an individual's results can be statistically compared to those of a control sample. A statistical programme (DISSOCS_ES.EXE) was applied to test whether participants' scores on eye cue and double cue trials were significantly lower than those of a control sample, and whether the scores in the eye cue and double cue trials differed from one another (Crawford, Garthwaite, and Porter 2010). The dependent measure was the errors made in the game, in the eye cue and double cue trials.

Procedure
When the children arrived in the game playing room, at their own schools, they were welcomed and then given instructions that they first needed to choose a preferred item on the screen and they were then told to locate the hiding place of that item. They were then told that the man on the screen would help them find the correct box. Once finished playing the game they were thanked for playing.

Results
The errors made in the game were counted and reported as percentages. Errors were divided into two categories based on the cue type: (1) eye cue; (2) double cue.

Relative error rates: eye cue and double cue
Children with ASD: in the practise eye cue condition the error rate was 43.8% and 27.9% in the double cue condition. In the first trial in the eye cue condition the error rate was 32.5% and in the second 42.9%. In the first double cue condition the error rate was 37.4% and in the second 43%.
Typically developing children: in the practise eye cue condition the error rate was 68.8% and 14.4% in the double cue condition. In the first trial in the eye cue condition the error rate was 19.3% and in the second 13.8%. In the first double cue condition the error rate was 14.4% and in the second 5.7%. A Wilcoxon signed-rank test indicated that typically developing participants made more errors overall in the eye cue trials than in the double cue trials (Z = −2.415, p = .016).

Comparisons of individual participants to controls
For case-controls analyses the means of the first two play sessions' relative error rates were used. We reversed the relative values to avoid zero values in the data set (e.g. an error value of 0 = a score of 100, and an error value of 20 = a score of 80). In the eye cue condition the resulting scores were: Aaron, 60; Billy, 60; Clark, 41.7; Derek, 87.5. In the double cue condition the scores were: Aaron, 58.4; Billy, 56.1; Clark, 58.2; Derek, 66.6. The control group's (N = 17) mean score on the eye cue condition was 85.2 (SD 14.5) and for the double cue condition the mean was 94.7 (SD 4.6).
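As a worked illustration of the comparison step, the sketch below reverses an error percentage into a score, computes a z score against the control mean and SD reported above, and applies the Crawford and Howell (1998) modified t-test for comparing a single case with a small control sample, t = (x − M) / (SD · sqrt((n + 1) / n)) with n − 1 degrees of freedom. It is not the DISSOCS_ES.EXE programme: the function names are hypothetical, and the one-tailed p-value is obtained by simple numerical integration of the t density rather than from a statistics library. Aaron's eye cue value (60) is used as the case.

```python
import math


def score_from_error(error_pct: float) -> float:
    """Reverse an error percentage into a score (error 20 -> score 80)."""
    return 100.0 - error_pct


def z_score(case: float, mean: float, sd: float) -> float:
    """Standardize the case against the control mean and SD."""
    return (case - mean) / sd


def crawford_howell_t(case: float, mean: float, sd: float, n: int) -> float:
    """Modified t statistic for one case versus n controls (df = n - 1)."""
    return (case - mean) / (sd * math.sqrt((n + 1) / n))


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def one_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Tail probability beyond |t|, using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 0.5
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Aaron, eye cue condition: case value 60; controls: mean 85.2, SD 14.5, N = 17.
n_controls = 17
z = z_score(60.0, 85.2, 14.5)
t = crawford_howell_t(60.0, 85.2, 14.5, n_controls)
p = one_tailed_p(t, n_controls - 1)
print(f"z = {z:.3f}, t = {t:.3f}, df = {n_controls - 1}, one-tailed p = {p:.3f}")
```

The z score from this sketch agrees with the value of −1.738 reported for Aaron's eye cue condition, and the modified t falls short of the conventional one-tailed 5% threshold, which is consistent with the finding that Aaron did not differ from controls in that condition.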
The DISSOCS_ES.EXE programme was used, as suggested by Crawford, Garthwaite, and Porter (2010), and the results are presented in Table 1 (see Section 3.3 for the task comparison for each individual). All participants performed significantly more poorly than controls in the double cue trials. Apart from Clark, the participants did not differ from controls in the eye cue condition.
Note: Crawford and Howell's (1998) test for a deficit in task X and Y.

Dissociations between eye cue and double cue
The dissociation between each participant's error rates in the two different trial types was measured (Crawford, Garthwaite, and Porter 2010).
Aaron: Eye cue condition z score = −1.738 and double cue z score = −7.891. The RSDT (Crawford and Garthwaite 2005) was used to test the difference between Aaron's eye cue and double cue scores: the two-tailed p-value for t of 4.664, df 16, is < 0.001, meaning that eye cue and double cue performances are different. An estimated 0.01% of the control population would show a more extreme difference in performance. The effect size for the difference between case and controls = 5.079 (Bayesian Credible Interval: 95% CI 2.715−7.894). The difference between Aaron's standardized scores is statistically significant (RSDT). Overall, Aaron's results fulfil the criteria for a dissociation. Errors in the double cue condition were higher for this participant than in the eye cue condition.
Billy: Eye cue condition z score = −1.738 and double cue z score = −8.391. The RSDT (Crawford and Garthwaite 2005) was used to test the difference between Billy's eye cue and double cue scores: the two-tailed p-value for t of 5.031, df 16, is < 0.001, meaning that eye cue and double cue performances are different. An estimated 0.01% of the control population would show a more extreme difference in performance. The effect size for the difference between case and controls = 5.491 (Bayesian Credible Interval: 95% CI 2.980−8.487). The difference between Billy's standardized scores is statistically significant (RSDT). Overall, Billy's results fulfil the criteria for a dissociation. Errors in the double cue condition were higher for this participant than in the eye cue condition.
Clark: Eye cue condition z score = −3.000 and double cue z score = −7.935. The RSDT (Crawford and Garthwaite 2005) was used to test the difference between Clark's eye cue and double cue scores: the two-tailed p-value for t of 3.761, df 16, is 0.002, meaning that eye cue and double cue performances are different. An estimated 0.1% of the control population would show a more extreme difference in performance. The effect size for the difference between case and controls = 4.073 (Bayesian Credible Interval: 95% CI 1.707−6.853). The difference between Clark's standardized scores is statistically significant (RSDT). Overall, Clark's results fulfil the criteria for a strong dissociation. Errors in the double cue condition were higher for this participant than in the eye cue condition.
Derek: Eye cue condition z score = 0.159 and double cue z score = −6.087. The RSDT (Crawford and Garthwaite 2005) to test the difference between Derek's eye cue and double cue scores: two tailed p-value for t of 4.732, df 16, is 0.0002, meaning that eye cue and double cue performances are different. An estimated 0.01% of control population would show more extreme differences in performance. The effect size for the difference between case and controls = 5.155 (Bayesian Credible Interval: 95% CI 3.175−7.499). The difference between Derek's standardized scores is statistically significant (RSDT). Overall, Derek's results fulfil the criteria for a dissociation. Errors in the double cue condition were higher for this participant than in the eye cue condition.

Discussion
The objective of this study was to pilot a potential method for skill assessment. We wanted to explore whether a computer game with a positive user experience could be used in a school environment with high support need and minimally verbal children with ASD to assess whether these particular children had potential perspective-taking skills, whether they could use eye gaze information to play the game, and whether they benefitted from additional cues. We concentrated on individual-level analyses in order to detect possible variations between individuals' performances, which have been suggested to influence attention research and could account for the discrepancies in previously found results. We hypothesized that the individuals with ASD would be able to play the game, but would make more errors than typically developing children without the additional arrow cues. We also expected that with the arrow cues there would be no differences between the individuals and the controls in the error rates.
Since no participant tried to choose a box before the eye cues or arrow cues appeared, and since the error rates were better than chance level, it can be assumed that the decisions made in the game were based on the cues, which confirms the first hypothesis that children with ASD are able to play the game.
The second hypothesis was not confirmed. Children with ASD did not make more errors without the additional arrow cues in comparison to the controls. The case-control analysis showed that in comparison to the controls only one child made more errors in the eye cue trials (Clark).
The third hypothesis was also not confirmed: all children with ASD made more errors with the additional arrow cues compared to the control group. There was a dissociation between the two conditions for each child meaning that the arrow cues made the performance worse. Hence, it seems that the children did not benefit from the additional arrow cues as we expected and as was found in the Gould et al. (2011) study. Overall, using eye cues for perspective taking seemed to be intact in three out of the four children with ASD and additional arrow cues only made them make more errors.
Comparing our results to those of Gould et al. (2011), it is possible that the children in the Gould et al. (2011) study found the task difficult without feedback. They may have had the skill before the game but did not know how to play the game. Therefore, development in the skill in their study was evident. Possibly for the same reason they found the arrow cues helpful. In our game, the feedback was visually presented as the box opened if correct and shook if incorrect; hence, the purpose of the game was known from the very beginning. A possible explanation for why participants differed from controls in the double cue condition with the additional arrow cues is the complexity of the scene. Stimuli that are too numerous or too complex for children with ASD were suggested by Guillon et al. (2014) to be one reason for seeing differing results in attention research. However, since in our study children with ASD performed better with the eye cues, they may have only been distracted by the arrow cues, which would relate to visual complexity. Similarly, the control group made fewer errors using the additional arrow cues as they may not have been distracted by them. There is also variation between the practise, first, and second trials which may reflect the variation within the individuals over time, and should be further explored, even though the error rates did stay below chance level. For now the numbers are too small to analyse. Overall, our results show that perspective taking was not an impossible task for these children, contrary to what, for example, Falck-Ytter et al. (2012) and Riby et al. (2013) have found. It is, however, in line with what Pearson, Ropar, and de C Hamilton (2013) discovered: some studies found intact perspective taking in individuals with ASD. Therefore, after our current study, it would be interesting to reanalyse the data from previous groups' studies on an individual level using the case-controls method.
We suggest that we could learn more about children with ASD and how to organize their learning by using contexts with which they feel at ease, as suggested also in other studies on social and academic skills (e.g. Baker 2000; Boyd et al. 2007; Charlop, Kurtz, and Casey 1990; Jacobsen 2000; Kryzak and Jones 2014; Mancil and Pearl 2008; McGonigle-Chalmers et al. 2013; Naoi et al. 2008; Vismara and Lyons 2007). In addition, computer games and tasks could be used as a starting point for learning and assessing children's capabilities as they have been found to be a viable source for education and entertainment (Grynszpan et al. 2013; Ilg et al. 2012; Munson and Pasqual 2012; Wass and Porayska-Pomsta 2013). Once the games and tasks have been found to be interesting by the children, the games could be revised for skill assessment. Furthermore, since research also indicates a lowered intellectual performance when presenting tasks to individuals who already believe they will perform poorly and who, as a result, inadvertently behave in a stereotypical manner (e.g. Croize et al. 2004; Spencer, Logel, and Davies 2015), we believe the steps we took for testing seem more than appropriate. In our study the lack of interest in the standardized tests and willingness to play the game can be taken to indicate that interesting games can be seen as an alternative assessment method. In fact, a review of joint attention, an ability closely related to perspective taking, found that no study reported taking children's interests into account in testing joint attention abilities, a step that could have made the children feel more at ease (Korhonen, Räty, and Kärnä 2014). Hence, the review also supports the importance of context in the research of individuals with ASD.
Our research has some limitations. The design did not control the influence of the teachers or assistants, as they did not remain constant for each child. However, this increases the ecological validity of our study as it represents the real-life situation of all school environments. Furthermore, the teachers very rarely advised or guided the children. The time taken to play the game was not analysed as there was too much intra-and inter-individual variation in each session. For example, at one point the player might be standing still or concentrating on their own interests even when the game was on. This makes the interpretation of the temporal duration data impossible. Also, for our purposes, time was not essential, as we recognize that these individuals may have different ways of performing in a testing situation and we do not expect a steady playing rate. The size of the eyes was exaggerated due to the cartoon nature of the character and since we wanted to have the possibility to use the game for mobile eye-tracking purposes, a larger eye size was required. This may have attracted the participants more to the eye area. However, they were able to use the eyes as cues, which is what we wanted to examine, and if the cue size or attractability would enhance performance we should have seen the arrow cues to enhance performance as well.
A further limitation of this study was the skewed gender ratio, as all children with ASD were boys. On the other hand, the gender ratio in ASD is skewed towards boys (Fombonne 2009; Jensen, Steinhausen, and Lauritsen 2014; Rutherford et al. 2016). We also did not dwell on the technology (Kinect) per se, as our intention was not to change it, given that it was found to work in a previous study (Mäkelä, Bednarik, and Tukiainen 2013). This might be one reason why the children fared well in this particular task, but that is what is expected if the platform is enjoyable. Overall, and also in the case of perspective taking, more research is needed on task engagement of the children. Future studies should also systematically explore the subjective experience and feelings involved in having to make eye contact and use information from the eyes, as in our design this particular aspect was not addressed.
On the other hand, the limitations of this study have positive aspects: We were observing individuals who are often not studied, and we designed a task for individuals who are otherwise difficult to assess and educate. In fact, these children would most likely be excluded from most research as their matching becomes problematic due to their lack of interest in testing. Moreover, our research setting has a good ecological validity and the method's applicability to real-life school or rehabilitation centre settings was tested as we collected the data in school environments.

Conclusions
The results indicate that applying small changes to a computer game with a positive user experience allowed us to conduct research to study the skills of four high support need and minimally verbal children with ASD, an avenue not explored before. The game let us infer that perspective taking was impaired in only one out of four children with ASD. It is likely that we are able to see more potential for high support need children to emerge when the context is supportive of the individual. The study also confirmed that the issue of individual variation in autism is still topical and showed that individual-level analysis can reveal insights about autism and about the individual with autism. Overall, we hope that high support need children with ASD will receive more attention in research and in the design of research methodology, in order to include them in research more systematically rather than using methodologies that may exclude them as a group or as individuals.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
The research was supported by the Academy of Finland, grant number 140450.

Notes on contributors
Vesa Korhonen, MSc, is a PhD candidate at the University of Eastern Finland. His research interests are autism spectrum disorders, attention, social communication, and technology in educational contexts.
Hannu Räty, PhD, is a professor of psychology at the University of Eastern Finland. His research deals with the social representations of intelligence and parental perceptions of their children's schooling.
Eija Kärnä, PhD, is a professor of special education at the University of Eastern Finland. Her research interests are inclusive education and continuous learning, inclusive learning environments and technology for individuals with special needs, and communication and interaction of individuals with severe developmental disabilities and autism spectrum disorders.