Disability measurements: impact on research results

This article addresses the issue of whether the choice of operational definition of disability in survey research affects findings. Earlier studies have shown that different definitions cause substantial variation in prevalence rates, as well as limited agreement on the classification of subjects as disabled or not disabled. The article addresses whether this leads to differences in research outcomes. The study compares seven existing measurements of disability. The analysis is carried out using a single dataset, the 2007 survey of disabled people's living conditions in Norway (N = 1,652). Results are reported for subject descriptors (gender, age, marital status, characteristics of impairment), social indicators (education, income and employment), and predictors of income and employment. The impact of the definition of disability on results is found to be modest in general, but with exceptions: the definition of disability clearly affects employment rates and the type and degree of impairment of people classified as disabled. Consequences for disability research are discussed.


Introduction
Disability research employing surveys and censuses has faced methodological challenges for decades. The problem is not only that the concept of disability can be theoretically understood and defined in a number of ways (Barnes, Mercer, and Shakespeare 1999; Altman and Barnartt 2003; Grue 2004; Borg 2008), but also that the operational measurement varies considerably, for instance regarding the number and type of questions included; see the methods section for examples (Fujiura and Rutkowski-Kmitta 2001; Tøssebro and Kittelsaa 2004; Molden and Tøssebro 2010; Houtenville et al. 2009). There is little agreement on how disability should be understood, conceptualized, and measured empirically in a quantitative research setting (Loeb and Eide 2006). The International Classification of Functioning, Disability and Health (ICF) by the World Health Organization (WHO 2001) and the proposal by the UN Washington Group on Disability Statistics (Altman 2006; Mont 2007) are seen as attempts, or recommendations, to standardize the measurement of disability and have clearly inspired and influenced researchers worldwide (Hendershot 2006). However, there is as yet no consensus on an international standardization of disability measures (Ravaud, Letourmy, and Ville 2002; Altman 2006), and, in practical survey research, a variety of measures are employed. There are a number of reasons for this state of affairs, including the traditions of different statistical agencies, disagreement on relevant indicators, and practical issues such as how many questions the survey can include. In addition, some measurements may serve slightly different purposes (Grönvik 2007).
A number of researchers have pointed out that this state of affairs is problematic because operational definitions are likely to affect research outcomes (Hem 2000; Grönvik 2007; Loeb, Eide, and Mont 2008; Molden and Tøssebro 2010), and, if this really is the case, it places severe limitations on the possibility of comparing results across studies and on building a cumulative research basis. There are a number of findings that give reason for concern, particularly regarding prevalence rates. International studies report considerable variation, both within and across countries, in the proportion of the population classified as disabled (Fujiura and Rutkowski-Kmitta 2001; Dupré and Karjalainen 2003; OECD 2003, 2010; Loeb and Eide 2004; Purdam et al. 2008; Loeb et al. 2008). In Norway, a research review found disability rates from surveys to vary from 7% to 30% depending on the methodologies used (Tøssebro and Kittelsaa 2004). Furthermore, a study using several commonly employed disability measurements within a single survey found that disability rates varied from 10% to 28% (Molden and Tøssebro 2010). It also appears that the group identified by one definition only partly overlaps with the group identified by another definition. This was tested in France in a study by Ravaud et al. (2002) and also in Norway by Molden and Tøssebro (2010). Both studies found the level of agreement across definitions to be unexpectedly low, not only with respect to prevalence rates but also with respect to whether a given person was classified as disabled or not. An operational definition that classified fewer people as disabled did not simply identify fewer people but partly a different group. We do not know why this is the case, though it might be because definitions have different affinity to people with different impairments. For instance, minor differences in the phrasing of the disability measurement question in the Swedish and Norwegian labour force surveys lead to differences in the number of people with mental health issues, diabetes, or moderate hearing difficulties who are included among disabled people (Tøssebro 2011).
The findings suggest caution when comparing results across studies. Furthermore, we know little about how research on social indicators, such as level of education, employment, housing conditions or income, is affected. Although such questions are addressed in a few studies (Grönvik 2007; Tepper et al. 1997; Altman 2001a, 2001b; Houtenville et al. 2009), these studies compare results that differ not only in operational definitions of disability but also in other methodological aspects (such as sampling or study design). Grönvik (2007), for instance, found considerable consequences of employing different definitions. However, the main differences were between results obtained from surveys using self-assessment to define disability and register-based studies of people receiving services intended for severely disabled people. Thus, the definitions were hardly meant to serve the same purpose or address the same group. Altman (2001a, 2001b) has suggested that different definitions may affect both the distribution of background variables (such as age, gender and ethnicity) and labour market status. Houtenville et al. (2009) showed differences in employment rates across definitions, whereas Tepper et al. (1997) suggested that estimates of health-care expenditures should be interpreted with caution as they are affected by disability measurement. Research in this area, however, is not conclusive. In part this is due to the few studies conducted, but also because the variation in results may be caused by methodological aspects other than the operational definitions (such as sampling and study design). Nevertheless, the findings do suggest that it is important to carefully consider the consequences of empirical definitions of disability in research.
A Norwegian dataset provided an opportunity to assess the possible consequences of the current state of the art in disability measurement. This is a survey of living conditions of disabled people (LCD) conducted by Statistics Norway in 2007 (Bjørshol 2008). The survey included items from a number of commonly employed disability operationalisations. Thus, within this single dataset, one has the opportunity to compare the distributions on various other variables (such as gender, age, type of impairment, severity of impairment, employment, etc.) using different operational definitions of disability. Using this strategy, alternative explanations (such as sampling or study design) of diverging results can be ruled out. This dataset has previously been used to study differences in prevalence rates and level of agreement on the classification of people across disability measurements (Molden and Tøssebro 2010, see above).
At this point, there is a need to clarify one terminological issue. Part of the theoretical literature makes a clear distinction between disability and impairment, and in reality, most disability measurements in quantitative research employ operationalisations that are more in keeping with the concept of impairment. That is, one is addressing individual characteristics rather than environmental barriers (Grönvik 2007; Molden and Tøssebro 2010). However, in keeping with traditional language in disability research, we will use the term disability throughout the article. The exception is in cases where we clearly refer to individual characteristics, such as type of impairment and onset of impairment.

Aims of the study
The aim of this study is to move beyond the question of disability rates and level of agreement and rather to explore the consequences of different operational definitions of disability for research results. Three types of possible consequence are addressed: (1) Consequences for the composition of the group classified as disabled. We will analyse consequences for the distribution on a selection of variables commonly used to describe groups, such as gender composition, age distribution, and marital status, and we also include impairment-related descriptors, such as type of impairment, degree of impairment, and the age at which the impairment was acquired. The purpose is to analyse to what extent different disability definitions identify groups that differ with regard to important sample descriptors. (2) Consequences for disabled people's outcomes on important social indicators, such as labour market participation, income, and education. These indicators are assumed to substantially impact people's living conditions. For disabled people, education and employment, in particular, can be viewed as measures of inclusion in society (Borg 2008). They are also used as indicators in studies of disability discrimination (Barnes 1991). (3) Consequences for social mechanisms. Operational definitions might affect not only the distribution of other variables but also the relations between variables. For example, it is well known that higher education enhances the employment opportunities for disabled persons (Bliksvaer and Hanssen 2006). This mechanism may, however, also be affected by disability definitions. For the purposes of this article, we have selected employment and income as examples of outcome variables and explore the extent to which operational definitions of disability affect how gender, age, education, type of impairment, etc., influence these outcome variables amongst disabled people.
To address these questions, we have employed seven different definitions of disability that have frequently been used in surveys of disabled people in Norway and/or other countries.

Data, methods and measurements
The sample/survey
The analysis reported in this article is based on data from the national survey of living conditions of disabled people (LCD) by Statistics Norway in 2007 (for details, see Bjørshol 2008), which was carried out in two phases. The first phase consisted of a brief screening of a random sample of persons aged 20–67 from the Norwegian population (telephone interview). The second phase consisted of a full survey of disabled people identified through the screening (telephone or personal interview). Of the gross sample (N = 10,920), 70% responded to the screening questions (N = 7,632), and 26% of the screening sample (N = 1,984) were identified as potentially disabled. These were invited to take part in the full survey and 85% accepted (N = 1,652). The criteria for being invited to participate in the full LCD survey were based on a wide definition of disability. This included the questions: (1) Do you have a longstanding illness or disability (more than six months)?; (2) Do you have problems with (a) pain; (b) breathing; (c) concentration or remembering; (d) anxiety; (e) depression; (f) other mental problems?; (3) Can you without difficulties: (a) walk stairs one floor without resting; (b) walk for five minutes; (c) lift and carry five kilos; (d) hear what is said in a conversation with more than two people; (e) normally hear what is said on a telephone; (f) read normal newspaper print (with or without glasses)? People who responded 'yes' to any of the items in questions 1 and 2, and 'no' to any of the items in question 3, were asked if this limited their everyday life. People who confirmed this (to some extent or strongly) were invited to participate in the full survey. In addition, all persons in the screening sample receiving any of the four most common disability-related benefits were invited to participate (see measurements section).
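The invitation rule described above can be written out as a short sketch. This is an illustration only, not the procedure used by Statistics Norway: the class name, field names and response labels are hypothetical, and the sketch reads the criteria as alternatives (any one reported difficulty triggers the follow-up question on limitations), which is an interpretation of the text.

```python
from dataclasses import dataclass, field


@dataclass
class ScreeningResponse:
    """One screening interview (all field names are hypothetical)."""
    longstanding_illness: bool                       # question 1
    problems: dict = field(default_factory=dict)     # question 2: item -> True if a problem is reported
    can_do: dict = field(default_factory=dict)       # question 3: item -> True if done without difficulty
    limitation: str = "none"                         # follow-up: "none", "some" or "strong"
    benefits: set = field(default_factory=set)       # disability-related benefits received


def invited_to_full_survey(r: ScreeningResponse) -> bool:
    # Assumption: any single reported difficulty is enough to be asked the follow-up.
    reports_difficulty = (
        r.longstanding_illness
        or any(r.problems.values())
        or not all(r.can_do.values())
    )
    # Invited if the difficulty limits everyday life to some extent or strongly ...
    limited = reports_difficulty and r.limitation in {"some", "strong"}
    # ... or if the person receives any of the disability-related benefits.
    return limited or bool(r.benefits)
```

For example, a respondent who reports pain and says it limits everyday life to some extent would be invited, as would a respondent who reports no difficulties but receives a disability pension.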
The logic behind such wide inclusion criteria was to recruit 'potentially disabled people.' The intention was to invite to the full survey all persons (or as many as possible) that would be classified as disabled according to any one of the commonly used definitions. The items from the various definitions were subsequently included as questions in the full survey (if not already included in the screening). This provided the opportunity to classify the sample according to a number of measurements of disability used in Norway and internationally (such as in Sweden, the EU, Australia and the USA). The data reported in this article are from cases participating in the full LCD survey. The disability rates computed may in some cases be underestimated as persons not eligible to participate in the full survey may have responded affirmatively to some of the impairment questions in the full survey if they had been given the opportunity (false negatives). Given the wide inclusion criteria for the full LCD survey, however, there is reason to expect that the underestimation is minor and unlikely to affect the results reported herein. Furthermore, there is an attrition rate of 30% of the gross (screening) sample. This is not uncommon for such surveys, but may lead to biased results. Attrition was analysed by Bjørshol (2008), suggesting minor biases but a small overrepresentation among non-respondents of people with lower education and ethnic minorities. This may have a minor impact on disability rates but is unlikely to affect the research results presented in this article because the issue is differences within the same sample.

Measurements of disability
The measurements of disability used in this study are all intended to be replicas of measurements used in earlier surveys or censuses. Four considerations have guided the choice of measurements: (1) that they represent measurements from several countries and/or regions; (2) that the main types of disability measurement identified by Grönvik (2007) (subjective, administrative and functional definitions) are represented; (3) that the measurements are designed to reflect disabled people in general; and (4) that measurements inspired by the ICF (WHO 2001) and the UN Washington Group on Disability Statistics are included (see below). In some cases modifications of the original measurements were necessary, as when several previous measures used similar questions, but with minor variation in phrasing or response categories. In such cases one question was chosen in order to avoid several items appearing identical to respondents. Modified versions of the original measurements are clarified in the description below. A table with the questions (35 in all) and operational definitions is shown in the Appendix.
Subjective definition: A number of surveys employ a definition based on self-assessment but with different phrasing of the questions (such as the Disability Supplement to the European Labour Force Survey 2002 (ELFS), the European Social Survey (ESS), the European Community Household Panel (ECHP), EU-SILC, the general Norwegian Living Conditions Survey from 2005, and also the annual Disability Supplements to the Norwegian and Swedish Labour Force Surveys (LFS)). This type of measurement is typical for EU surveys. The LCD survey used a version similar to the EU-SILC question: 'Do you have any long-standing illness or disability?' The follow-up question ('does this limit your activities?') was coordinated with other impairment questions in the LCD survey and deviates slightly from the original, mainly because it was placed after the series of questions on 'can you without difficulties . . .' (see above). Thus, the question of limitations in everyday life is related to several questions rather than to a single subjective self-assessment. The subjective operational definition used in this article includes people reporting that they are limited 'to some extent or strongly' in their everyday life.
Administrative definition: Administrative definitions identify people receiving a service or benefit intended for disabled people. Such definitions are rarely used on their own in surveys of disability, but rather as an item in broader functionally based definitions. If used alone, the study is more likely to address a specific group of disabled people (for instance people receiving assistive technology or services for people with intellectual disabilities). There are, however, exceptions where administrative criteria are used alone, for instance in countries where systems exist for official recognition of disability (e.g. Ravaud et al. 2002). In Norway, the so-called basic benefit (compensation for extra costs) was used as a definition of disability in a survey by Statistics Norway in 1995 (Statistics Norway 1996). We decided to include an administrative definition in the analysis in order to examine the extent to which this type of demarcation differs from other disability measures. In this article we use an administrative definition based on the receipt of at least one of the four most common benefits for disabled people in Norway (the first two of which are not linked to work incapacity): (1) the basic benefit; (2) the supplementary benefit; (3) the disability pension; and (4) the time-limited disability benefit.
The UN Washington Group: In this study, we included five functionally based definitions used in different countries. Such measurements pose a series of questions on functional limitations, and the person is considered disabled if responding affirmatively to one or more questions. The first functional definition we included was the measure proposed by the UN Washington Group on Disability Statistics (Mont 2007). This definition (WG) consists of four items: difficulties with (1) seeing (even with spectacles); (2) hearing (even if using hearing aids); (3) walking or climbing stairs; or (4) […] (4) blackouts, fits, or loss of consciousness; (5) slowness at learning or understanding; (6) incomplete use of arms/fingers; (7) incomplete use of feet/legs; (8) difficulty gripping and holding small objects; (9) treatment for nerves or an emotional condition; (10) restrictions in physical activities or in doing physical work; (11) disfigurement or deformity; (12) long-term effects of head injury, stroke or brain damage; (13) a mental illness requiring help or supervision; (14) treatment or medication for a long-term condition or ailment and still restricted; (15) any other long-term condition resulting in a restriction (Madden and Hogan 1997; ABS 2003). The definition (AUS) employed in the LCD survey consists of 14 items. Item number 14 in the original was omitted.
Activity limitations: Statistics Norway has also developed a definition intended to be in keeping with the logic of the ICF (Ramm 2006). This definition distinguishes between activity limitations and participation restrictions (WHO 2001). Activity Limitations (Act.) is based on nine items: difficulty (1) walking up or down one floor of stairs without a rest; (2) walking for five minutes at a rapid pace; (3) reading plain text in a newspaper, with spectacles if necessary; (4) following a conversation between at least two persons, with hearing aids if necessary; and (5) a condition of feeling nervous; (6) a condition of often feeling scared or anxious; (7) feelings of hopelessness for the future; (8) being depressed or sad; or (9) often being distressed or restless. It is also a criterion that the difficulties hamper the respondent's everyday life. In order to use this measure with LCD data, some modifications were needed: Items 5–8 were replaced by 'problems with remembering and concentrating,' 'feelings of anxiety,' and 'other mental difficulties.'
Participation restrictions: The items in the Participation Restrictions measure (Part.) are difficulties with: (1) moving out of the home without assistance from others; (2) participation in organizations or associations; (3) participation in leisure activities; (4) travelling on public transport; or (5) making contact with others or talking to other people. It is a criterion that the difficulties hamper people's everyday life. In order to use this measure with LCD data, item 1 was replaced by 'getting in or out of the building they live in.'
The US-SIPP definition: In the US a number of definitions are employed (Houtenville et al. 2009). The LCD survey includes items from the Survey of Income and Program Participation (SIPP) by the US Census Bureau (Steinmetz 2006). This measures impairments in three domains: the communication, mental health, and physical domains. People were classified as having a disability in the communication domain if they had difficulties seeing, hearing, or speaking, were blind or deaf, or reported one or more related conditions as the cause of an activity limitation. Items in the physical disability domain are related to the use of a wheelchair, cane, crutches or walker, or having difficulty with one or more of the following functional activities: walking a quarter of a mile, climbing a flight of stairs, lifting something as heavy as a 10-pound bag of groceries, grasping objects, getting in or out of bed. This domain also included a question about everyday limitations related to a number of diseases (details in Steinmetz 2006). People were classified as having a disability in the mental health domain if they had one or more of the following: (1) a learning disability; (2) mental retardation; (3) another developmental disability; (4) Alzheimer's disease; (5) any other mental or emotional condition that seriously interfered with everyday activities; (6) difficulty managing money or bills; or (7) one or more related conditions reported as the cause of an activity limitation. Together these items make up the definition of disability in the US-SIPP survey (Steinmetz 2006). Some of the items were not included in the LCD survey. Disability in the communication domain is identical (difficulties with seeing, hearing or speaking). Included in the physical domain were use of any aid for moving indoors or outdoors, difficulties walking for five minutes at a rapid pace or climbing stairs, difficulties carrying an object of five kg, breathing problems, difficulties gripping or holding objects, and difficulties being physically active or doing physical work. Disability in the mental domain covers difficulties learning or understanding, difficulties managing money and bills, a long-standing psychological or emotional difficulty, difficulties with remembering or concentrating, and feeling anxious or depressed. This definition is named US-SIPP in this study.
Table 1 shows an overview of the measurements of disability analysed in this article, including sources, the prevalence rates in the source publications, the estimated prevalence rate in LCD, and the number of cases (N) classified as disabled according to the definition in LCD.
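The classification rule shared by the functional definitions (disabled if at least one item is answered affirmatively, with an additional 'hampers everyday life' criterion for the two ICF-inspired measures) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the item keys are hypothetical labels for the Participation Restrictions items as modified for the LCD data, not variable names from the survey.

```python
# Hypothetical keys for the five Participation Restrictions items (LCD version).
PARTICIPATION_ITEMS = (
    "getting_in_or_out_of_building",
    "organizations_or_associations",
    "leisure_activities",
    "public_transport",
    "contact_with_others",
)


def is_disabled_functional(answers, items, hampers_everyday_life=None):
    """Classify one respondent under a functional definition.

    answers: dict mapping item key -> True if the difficulty is reported.
    items: the item keys that belong to the definition.
    hampers_everyday_life: None if the definition has no such criterion,
        otherwise True/False for the respondent's answer.
    """
    any_difficulty = any(answers.get(item, False) for item in items)
    if hampers_everyday_life is None:
        return any_difficulty
    return any_difficulty and bool(hampers_everyday_life)
```

Under this sketch, a respondent reporting difficulty with public transport counts as disabled on the Participation Restrictions measure only if the difficulty is also said to hamper everyday life; a definition without that criterion would pass `hampers_everyday_life=None`.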

Other measurements/social indicators
In order to explore the first research aim (composition of the group classified as disabled) we included a set of variables referring to characteristics of the respondents. These are gender, age, marital status (three categories: single, married [including live-in partner] and 'other' [widow/widower, judicially separated and divorced]) and descriptors of the impairment. Type of impairment is based on a set of questions where each person could respond affirmatively to more than one type of impairment. In this study, this was reclassified into one variable based on people's statements about what they see as their main impairment. The variable is recoded according to a procedure described in Molden, Wendelborg, and Tøssebro (2009) into eight groups: (1) sensory difficulties; (2) breathing difficulties; (3) chronic pain; (4) mobility difficulties; (5) mental health difficulties; (6) head injuries; (7) learning or cognitive difficulties; and (8) other impairment. The response categories for the onset of impairment question were 'congenital' or the actual age at which the impairment was acquired. This was recoded into two categories: (1) congenital or acquired before the age of 21; and (2) acquired at age 21 or later. Severity of disability measures to what degree the respondent is hampered in their everyday life. The measure is based on respondents' self-assessment in two categories: (1) to some extent, and (2) severely.
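The two recodings that are fully specified above (onset of impairment and marital status) can be illustrated with a short sketch. The function names and the raw response labels are assumptions made for the example; the category boundaries follow the text.

```python
def recode_onset(response):
    """Recode onset of impairment into the two categories used in the article.

    response: the string 'congenital' or the age (in years) at which the
    impairment was acquired.
    Returns 1 (congenital or acquired before age 21) or 2 (acquired at 21 or later).
    """
    if response == "congenital":
        return 1
    return 1 if int(response) < 21 else 2


# Hypothetical raw labels mapped to the three marital status categories.
MARITAL_STATUS = {
    "single": "single",
    "married": "married",
    "live-in partner": "married",
    "widow/widower": "other",
    "judicially separated": "other",
    "divorced": "other",
}


def recode_marital_status(raw):
    return MARITAL_STATUS[raw]
```

A respondent who acquired the impairment at age 20 falls in category 1, and one who acquired it at age 21 falls in category 2.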
With respect to the second research aim (consequences for outcomes on social indicators), three variables were used in this study: (1) labour market participation; (2) annual income; and (3) education level. Labour market participation measures whether a respondent is employed or not, based on the respondents' classification of their main activity chosen from a list of eight possibilities. People employed full- or part-time or self-employed were classified as employed. Annual income (in NOK) is the respondents' income after taxes obtained from tax registers by Statistics Norway. To ensure anonymity, Statistics Norway has grouped income into nine categories, ranging from one (income less than NOK100,000) to nine (income more than NOK1,000,000). To simplify the data presentation when addressing research aim two, the above income categories are recoded into a dummy variable: (1) income less than NOK200,000 (generally considered as low income); and (2) income of NOK200,000 or more. Income is, however, used as a continuous variable (categories 1–9) in the regression analyses presented in Table 5 related to research aim three. Educational level is obtained from records by Statistics Norway and represents the highest educational level completed by the respondent. The variable is recoded into three levels: (1) compulsory school (years 1–10); (2) upper secondary school (years 11–13); and (3) higher education (university or university college degree).
Regarding research aim three, we have analysed two of the social indicators, labour market participation and annual income, as dependent variables. All independent variables entered into the analyses are described above.

Data analysis
Data were analysed using SPSS 17.0. For research aims one and two, descriptive statistics (frequencies and means) were employed. Significance was set at p < .05. Since each row in Tables 2 and 3 consists of 21 pairs to compare (all pairwise combinations of the seven definitions), the presentation of statistical significance was simplified. For each variable, the starting point is the disability definition with the highest proportion of a specific value on the descriptive variable (for instance the highest proportion of women). Table 2 includes information on the number of other disability measurements that have a proportion of this value (for instance women) that is significantly different from that of the definition with the highest proportion. This will also provide some guidance for the 95% confidence interval for all pairs. Significance was computed manually according to the formulas in Skog (1998, 182ff). For research aim three, regression techniques were used: ordinary multiple linear regression in the case of annual income, and logistic regression in the case of labour market participation. In the regression analyses, type of impairment is recoded into a series of dummy variables with sensory difficulties as the reference category. The same applies to marital status in the logistic regression (three dummies). This variable is, however, used as a dichotomy in the linear regression models (married [partner], not married).

Consequences of disability measurement for the composition of the group
Table 2 shows descriptive statistics on gender, age, marital status, type of impairment, onset of impairment, and severity of disability according to the measurements of disability included in this study. Given the differences in prevalence rates and the lack of agreement between definitions (cf. Molden and Tøssebro 2010), the variation in the composition of the group is less than expected. No significant differences were found for age and onset of impairment, and the variation in gender and marital status is limited. According to all disability measurements, approximately 60% of disabled people are women, and the mean age is approximately 50 years.
There are, however, two important exceptions to the consistency across definitions: type of impairment and severity of disability. People classified as disabled according to the Subjective definition more often report chronic pain than those classified according to any other disability definition. Furthermore, the Subjective definition includes few people with mental health difficulties or learning/cognitive difficulties. The Participation Restrictions measure includes more people with mobility difficulties than most other disability measures, whereas the Activity Limitations measure includes significantly more people with mental health difficulties. The Washington Group and the Participation Restrictions measurements include a higher proportion of people with learning/cognitive difficulties. Notably, the Administrative definition includes more people with 'other impairments,' most likely because some people receive disability-related benefits for reasons other than a specific disability. The US-SIPP and the Australian definitions have the highest proportion of people with sensory difficulties.
With regard to severity of disability, the two ICF-inspired measures, Activity Limitations and Participation Restrictions, clearly include a higher proportion of people experiencing severe limitations in their everyday life. The Subjective and Administrative definitions, and also the US-SIPP and the Australian definitions, include fewer people with self-reported severe limitations.

Consequences for outcome measures/social indicators
Table 3 shows the distribution for the three selected outcome variables according to different operationalisations of disability. The variation in outcome on educational level is minor. People with benefits (Administrative definition) tend to have a somewhat lower level of education, whilst education results across the other disability measurements are uniform. There is some variation in annual income (from 53.3% to 59.1% with income of more than NOK200,000), but no dramatic differences.
However, when it comes to labour market participation, considerable differences become apparent. Only 32% of the people defined as disabled according to the Administrative definition are employed. This is in stark contrast to the definition from US-SIPP, where 56% participate in the labour market. The other disability measures result in rates of employment varying between 38% and 47%. One would have expected the differences in employment rates across definitions to be reflected in differences in annual income. This is only partly the case. On the one hand people who are disabled according to the Administrative definition have low employment rates, and a low proportion with an income above NOK200,000. On the other hand, people included by the Subjective definition have the largest proportion of people with income above NOK200,000 yet an employment rate close to the average. The high level of employment according to the US-SIPP definition is not reflected in annual income. The most likely explanation is that the effect of employment on income is moderated by the social security system.

Impact on social mechanisms: employment and income
In order to illustrate research aim three, the impact on social mechanisms or relations between variables, we have explored the impact of a set of variables on labour market participation and annual income amongst persons classified as disabled according to different definitions. We have delimited the analysis to four of the seven disability measurements: the Subjective definition, the Administrative definition, and the functional definitions with the highest and lowest employment rates (US-SIPP and Washington Group). Table 4 shows the results of the logistic regression analyses (odds ratios) on labour market participation. The same set of independent variables is entered in the four analyses shown in Table 4, but disabled people are defined according to four different definitions. The results are strikingly similar when it comes to the effects of age, marital status, educational level and severity of disability. Education and severity of disability stand out as the most important predictors in all four regressions. In all analyses higher education increases the odds ratios for employment more than four times, and a severe disability reduces the odds ratio to about 0.40. Age also has a consistent and significant impact on employment, whereas marital status has a consistent non-significant effect. Furthermore, there are some differences across definitions regarding the effects of gender, type of impairment, and onset of impairment. There is variation across disability definitions as to the extent gender and onset of impairment has a significant effect on employment, but the differences in odds ratios across models are limited. The pattern with respect to type of impairment is more complex. Some types of impairment that bring about an odds ratios for employment that is significantly different from people with sensory difficulties (reference category), but this varies across disability definitions. 
People with mental health difficulties have a lower likelihood of employment in all cases except the Washington Group definition, but the odds ratio in the Washington Group model is similar to the other odds ratios. People with chronic pain have a significantly higher likelihood of employment only according to the Subjective definition, and people with learning/cognitive impairments or other impairments are significantly less likely to be employed only if disability is defined according to the US-SIPP definition. The differences in employment likelihood (odds ratios) across type of impairment are striking in two cases in particular: people with chronic pain according to the Subjective (OR = 1.63) and the Washington Group (OR = 0.98) definitions, and people with other impairments according to the Administrative (OR = 2.08) and US-SIPP (OR = 0.24) definitions. One should notice that these contrasts reflect the maximum variation between disability definitions in the number of people who have the impairment in question (chronic pain high on Subjective and low on Washington Group; other impairment high on Administrative).
Table 5 shows a multivariate linear regression model which predicts outcomes on the annual income variable according to the four definitions of disability. There are few differences between the definitions, and the analyses show that roughly the same factors influence the annual income in all four regression models.
As expected, educational level and labour force participation affect income and have the strongest impact on the annual income. One should notice that, although Table 3 shows that labour market participation and annual income covaried across disability definitions only to a limited extent, labour market participation clearly has an impact on income in all four models in Table 5. Gender also has a significant effect, and, as expected, women earn less than men. Interestingly, severity of disability and onset of impairment do not significantly contribute to differences in annual income. Two variables, age and marital status, show significant results in some models but not all. However, even the significant results are minor. Although there are differences between the definitions in rates of people who annually earn more than NOK200,000 (Table 2), there seem to be few differences between the definitions with respect to the variables influencing the income variation.
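The comparison strategy behind Tables 4 and 5, estimating the same relationship within groups delimited by different disability definitions, can be sketched as follows. The sketch is a deliberate simplification: it uses a single binary predictor, for which the odds ratio can be read off a 2x2 table, rather than the multivariate regressions reported above. All counts, field names and the helper odds_ratio are hypothetical and are not drawn from the survey.

```python
from collections import Counter

# Hypothetical counts, NOT survey data:
# (subjective, administrative, severe, employed) -> number of respondents
COUNTS = {
    (True, True, True, True): 4,
    (True, True, True, False): 16,
    (True, True, False, True): 15,
    (True, True, False, False): 30,
    (True, False, True, True): 6,
    (True, False, True, False): 14,
    (True, False, False, True): 25,
    (True, False, False, False): 10,
    (False, True, True, True): 1,
    (False, True, True, False): 4,
    (False, True, False, True): 5,
    (False, True, False, False): 10,
}
FIELDS = ("subjective", "administrative", "severe", "employed")

# Expand the counts into one record (dict) per hypothetical respondent.
rows = [dict(zip(FIELDS, key)) for key, n in COUNTS.items() for _ in range(n)]


def odds_ratio(records, predictor, outcome):
    """Cross-product odds ratio of `outcome` for predictor=True versus predictor=False."""
    cells = Counter((r[predictor], r[outcome]) for r in records)
    a = cells[(True, True)]    # predictor present, outcome present
    b = cells[(True, False)]   # predictor present, outcome absent
    c = cells[(False, True)]   # predictor absent, outcome present
    d = cells[(False, False)]  # predictor absent, outcome absent
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined: empty cell in the 2x2 table")
    return (a * d) / (b * c)


# Same predictor and outcome, but the analysed group changes with the definition.
results = {}
for definition in ("subjective", "administrative"):
    subsample = [r for r in rows if r[definition]]
    results[definition] = odds_ratio(subsample, "severe", "employed")
    print(f"{definition}: n = {len(subsample)}, "
          f"OR(severe, employed) = {results[definition]:.2f}")
```

Because the same respondent can be classified as disabled under one definition and not under another, the subsamples overlap only partly, which is why the estimated odds ratios can differ even though the underlying records are identical.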

Discussion
The aim of this study was to address the extent to which differences in the operational definitions of disability lead to differences in research results. The point of departure was that earlier research has shown substantial consequences for disability rates (the proportion of a population classified as disabled) as well as a low level of agreement between definitions (limited overlap of people classified as disabled). Thus there was reason to question the extent to which different definitions also lead to differences in research results, for instance regarding employment rates or the composition of the group of people classified as disabled. This study has explored similarities and differences between three types of results: the composition of the group (background variables and impairment characteristics), social indicators, and social mechanisms (predictors of variation on social indicators).
The main result is that definitions of disability do affect research results, but that the impact varies considerably. The impact on group composition variables such as age, gender, marital status, and age of onset of impairment is insignificant or minor. The same applies to social indicators such as educational level, and also to predictors of employment and income. Thus, the general impression is that differences in research results are smaller than expected, given the variation in disability rates and lack of overlap between definitions. The differences are also clearly less substantial than Grönvik (2007) found when using several data sources with different methodologies. However, this study has also shown three exceptions to the rather reassuring general finding: the impact of disability definitions on employment rates, on the distribution of types of impairment, and on degree of disability was substantial. This calls for caution when comparing results across studies and countries and has potentially devastating consequences, for example for the OECD statistics on disability and employment (for instance OECD 2010, 51). OECD compares disability employment rates across 27 countries, and finds a variation from approximately 30% to 60% (excluding Poland). This is approximately the same range as found in this study of one sample when employing different definitions of disability. OECD employs the definitions used by the relevant national statistical agencies, and, even though many countries use a subjective definition, there is considerable variation. We do not suggest that such statistics are useless, but caution is definitely needed when comparing results employing different definitions.
It also seems clear from the results presented here that different disability definitions operate differently according to type of impairment and degree of disability, and in particular the extent to which people with chronic pain and people with mental health issues are classified as disabled.
It has not been the aim of this study to determine whether one operational definition is better than others. The methodology provides an opportunity to map variation in results but not to evaluate one measurement against the others. However, the results of the study provide suggestions as to what to expect when employing the different definitions.
First, the administrative definition appears to stand out from the other definitions. It produces a lower employment rate and a group with less education. On the one hand, the low employment rate might be seen as obvious since many receive work incapacity related benefits. The inclusion of this definition in the analysis of employment might thus be seen as unnecessary. On the other hand, persons receiving benefits in Norway are encouraged to work part-time, and it is nevertheless interesting to see variations across definitions, even if the direction of the difference is unsurprising in this case. The administrative definition also includes few persons who have severe disabilities and mental health difficulties. This is partly unexpected. One would expect severity of disability to be positively related to eligibility for benefits, and people with mental health difficulties comprise an increasing group receiving benefits in a number of countries (OECD 2010). In general, it is also the case that mental health difficulties and a severe disability predict low employment rates (cf. Table 4). The explanation is likely to be that the administrative definition identifies a group of people that is processed by the welfare system. This group is likely to have labour market problems, but it may be for other reasons than a specific or severe disability. Thus, the definition stands out with more people classified without any specific type of impairment ('other impairment' in Table 2). The group appears to include a number of people who do not see themselves as disabled and to exclude people with impairments who do not receive any benefits. According to Molden and Tøssebro (2010), 36% of the people who see themselves as disabled do not receive any of the benefits that the administrative definition is based upon (study based on the LCD data).
Thus, even though one should take care not to evaluate the different measures based on the data presented in this paper, the administrative definition used here appears, in our opinion, to have clear shortcomings with respect to the identification of the group that is generally at issue in quantitative research addressing disabled people in general.
Second, the subjective self-assessment appears to include more people with pain problems and fewer people with mental health difficulties. Furthermore, the proportion with a self-assessed severe disability is relatively low. This stands in contrast to the three most typical functional definitions (WG, Act and Part.) that include more people with severe impairments and also a relatively larger share with 'classical disabilities.' However, these differences do not appear to significantly impact the other research results reported here.
The results reported here have some obvious limitations. There exists a wide range of disability measurements and conceptualizations, and the present study has only included a selection. Results from this selection can hardly be generalised to other disability measurements. Furthermore, the study is conducted within a Norwegian context, and we cannot exclude the possibility that the Norwegian culture, language or welfare system impact how people respond to the items included in definitions and thus subsequently affect the outcomes of the analyses. The 30% non-response in the gross (screening) sample may also pose a limitation.
A last reservation is that we have only addressed a selection of 'outcome' variables. One can hardly extrapolate from these results to other kinds of outcomes. It appears that the variation in results across definitions is modest when it comes to typical background variables, but not when it comes to type of impairment and severity of disability. It also appears that consequences are modest for some social indicators, but not for all, and particularly not for employment rates. This calls for caution when interpreting and comparing results across studies employing different disability definitions, several data sources, and over time. The call for caution includes international comparative research and is particularly pertinent when discussing employment rates and other outcomes likely to be affected by type and severity of impairments. This state of affairs is yet another argument for international standardization of disability measurement, not because this is likely to produce agreement on some sort of best operationalisation, but in order to allow a more reliable basis for comparisons between studies and countries, as well as to build a cumulative research basis.