Defining intellectual ability requires examining the tools used to measure it

Aug 1, 2016

by G. Frank Lawlis, Ph.D.
Supervisory Psychologist
American Mensa

As Supervisory Psychologist for American Mensa, I am most often asked, “How do you test a person for intelligence?” From my perspective, few go away satisfied with my answers. I understand the frustration, because the question implies many further questions, some simple and some layered with definitions. One of them is: What is intelligence?

Many definitions have been applied to this invisible trait. Initially it was considered a personality trait of curious and gifted people who created useful and insightful solutions to human problems. Having demonstrated their ability, these individuals were selected to invent weapons, to compose music and entertainment for the benefit of society, and to offer guidance in complex social conflicts and economic crises. In other words, intelligence was considered a positive attribute for a culture, and those with a marked degree of this aptitude were highly regarded and often rewarded with prestige and money.

It is well known that Alfred Binet was given the task of developing a method of testing for this mental quality in school-age children, perhaps to determine how to invest in youth and so “harvest” their development for the good of the community. The quest to measure this aptitude was highlighted again in World War II, when the militaries needed to select soldiers for specialized duties that required high intelligence, such as piloting airplanes. As awareness grew that the resources available to a population were limited, it became necessary to select the individuals with the greatest potential for success for graduate schools, medical training and executive business openings.

Methodologies for testing intelligence flourished and are still widely discussed, along with issues of specific talents and abilities. High intelligence is not the only concern: low intelligence and specific challenges to learning also need to be identified in order to create meaningful educational experiences, remediate adverse conditions and increase human capital across all sectors of the community. To add to the confusion about the nature of intelligence, we have differing concepts of spheres of intellectual abilities.

American Mensa Adopts New Qualifying Test

This summer, American Mensa is transitioning its admission test from the Mensa Admission Test (MAT), which has been the organization’s test for more than four decades, to the Reynolds Adaptable Intelligence Test (RAIT).

Large increases in average scores, which throw off a test’s normative data, and outdated cultural references made the change necessary. Adopting a new test proved far more cost-effective than revising the existing one.

National Supervisory Psychologist Dr. Frank Lawlis, who consulted on the change, and proctors who piloted the new test overwhelmingly recommended the transition. Proctor training on the new test began in June, and a complete rollout is expected by November.

The RAIT, along with the Wonderlic, allows American Mensa to consider electronic administration in the future and to offer testing in more places at more convenient times for potential members.

— Timothy Brooks, Manager of Membership and Admissions

There is verbal intelligence, to which we usually assign the greatest credibility because it has the highest correlation with the general factor “g,” considered to be the core of intelligence. But quantitative intelligence has high validity as well. One only has to bring up Einstein in a debate, where he is cited as someone who was brilliant in math but a failure in history. Certainly, interest and motivation are underlying determinants of types of intelligence. Spatial intelligence, as a nonverbal measure, has been shown to vary among cultures and is deemed by some to be the true measure of fluid intelligence.

This discussion brings us to one of Mensa’s greatest challenges: On what basis do we select someone as highly intelligent? Obviously, Mensa’s approach to selection has been to define high intelligence as a score at or above the 98th percentile on a qualified test. But on what test? Verbal? Performance? Personality trait? Spatial?
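To make the percentile criterion concrete, here is a minimal sketch of how a 98th-percentile cutoff translates into a score on a given scale. It assumes scores are normally distributed with a mean of 100 and a standard deviation of 15 or 16; those scale parameters and the function name are illustrative assumptions, not the norms of any particular test Mensa accepts.

```python
from statistics import NormalDist


def percentile_cutoff(percentile: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Return the score at the given percentile, ASSUMING normally
    distributed scores with the stated mean and standard deviation."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(percentile)


# Hypothetical scales: the same percentile lands on different raw cutoffs.
cutoff_sd15 = percentile_cutoff(0.98)           # scale with SD 15
cutoff_sd16 = percentile_cutoff(0.98, sd=16.0)  # scale with SD 16
print(round(cutoff_sd15, 1), round(cutoff_sd16, 1))
```

Under these assumptions the cutoff is roughly 131 on the first scale and roughly 133 on the second, which is why a qualifying score is stated per test rather than as one universal number.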

There are four general criteria we have used for selection of people based on particular tests: Classification, representation of sample(s), reliability and validity. Classification is the definition of the test and usually determined by the publisher. For example, the publishers of the Graduate Record Examination (GRE) could define their test as one of intelligence (as has been done in the past), yet it has been determined that it is at its core a test of achievement, and its items reflect this purpose. This criterion also relates to how it is validated. If a test correlates more with achievement scores than concurrent intelligence scores on a similar test, then the classification changes through empirical basis.

By far one of the most problematic and expensive criteria for acceptance as an intelligence test is how well the individuals in the norming sample represent the population. For our purposes, we cannot accept a test that is not nationally representative if we are to claim our members are in the top 2 percent of the nation, yet we often receive tests that are normed on as few as 50 subjects. Biased samples produce biased tests. To state the obvious, individuals in New York are raised in a culture and language different from those in Texas or California, yet it costs millions of dollars to build a sample that reflects such differences.
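A little arithmetic suggests why a sample of 50 is too small to anchor a top-2-percent claim. The sketch below assumes a simple random sample from the national population, which is itself an optimistic assumption; the function names are hypothetical.

```python
def expected_in_top(n: int, top_fraction: float = 0.02) -> float:
    """Expected number of sampled people who truly belong to the top fraction."""
    return n * top_fraction


def prob_none_in_top(n: int, top_fraction: float = 0.02) -> float:
    """Probability that a random sample of size n contains nobody
    from the top fraction of the population."""
    return (1.0 - top_fraction) ** n


# Compare a 50-person sample with larger norming samples.
for n in (50, 500, 5000):
    print(n, expected_in_top(n), round(prob_none_in_top(n), 3))
```

With 50 subjects, only about one person is expected to come from the top 2 percent, and there is roughly a 36 percent chance that none does, so the 98th-percentile point of such a sample is poorly determined even before any regional or cultural bias is considered.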

Having made a major study of statistics, I count reliability as my favorite topic, but I will spare you the book survey I did on it. Suffice it to say, reliability levels indicate that a test is measuring something, and they do so by showing how consistent the scores are, whether from one item to the next or from one testing to the next. In other words, if a group of individuals scores consistently from one testing to the next, one has to conclude that the people are understanding and responding to the questions and that their scores reflect real variance in what is being measured. Of course, there is also the question of whether the individuals themselves are consistent over time, for example in motivation and emotions.

There are multiple tests of reliability, but they come down to external consistency and internal consistency. External consistency is usually referred to as test-retest reliability, which literally means the correlation between test 1 and test 2, two administrations of the same test format. Internal consistency usually refers to the correlations within the test. If the items correlate with each other and with the total score, then there is evidence that the whole measure is consistent.
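Both kinds of consistency can be sketched in a few lines. The example below uses the Pearson correlation for test-retest consistency and Cronbach's alpha as one common summary of internal consistency; alpha is a choice made for this illustration rather than a statistic named above, and all of the scores are invented.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per person."""
    k = len(rows[0])
    item_vars = [pvariance([row[i] for row in rows]) for i in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented data: six people tested twice (external consistency).
first_sitting = [98, 105, 112, 120, 131, 140]
second_sitting = [101, 103, 115, 118, 133, 138]
retest_r = pearson(first_sitting, second_sitting)

# Invented data: five people, four items scored right (1) or wrong (0).
items = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
alpha = cronbach_alpha(items)
print(round(retest_r, 3), round(alpha, 3))
```

A retest correlation near 1 means people keep nearly the same rank order across sittings. For the invented item data, alpha works out to 0.8, reflecting items that rise and fall together with the total score.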

Validity is the subject of the most frequently asked question about a test: What is it measuring? Again, there are a variety of statistical validity measures, but I will discuss two basic ones, concurrent and predictive validity. Concurrent validity defines what the test is measuring by its correlation with some accepted measure. Nearly every publisher correlates its test with the Wechsler series because of its credibility. If test A correlates with the Wechsler Adult Intelligence Scale, then it is taken to measure what the Wechsler measures. Predictive validity is based on how well a test's scores predict some performance. For example, if test A claims to predict students’ grades and its scores do correlate with those grades, then there is credibility in that specific prediction. Of course, it also could predict motivation, curiosity, self-esteem and the wide perspective of what grades mean (or what intelligence is).

The Summer 2016 Mensa Research Journal brings into discussion many of these factors that make up intelligence tests, and perhaps this review of testing will bring some of those factors back into discussion. The psychometric process of test construction is a constant research topic because test items grow old and irrelevant, and new words become ingrained into society. Intelligence is not a static construct and reinvents itself with each new situation and problem.

* * *

This article originally appeared as the Foreword to the Summer 2016 Mensa Research Journal, Volume 47, Issue 2. Special thanks to MRJ editor Steve Slepner.