All Tests are Not Created Equal: Choosing the Right Psychometric-Based Credit Tool

Alternative credit scoring solutions are being embraced by banks and lenders around the world as a means to improve their credit models and better serve the underbanked.

While such solutions present exciting new opportunities for lenders to adopt innovative technologies and ultimately grow their businesses, it can be difficult for them to navigate through the various vendors and products available. This may be especially true for cross-disciplinary solutions, which apply non-financial data to credit scoring, and are less widely understood among credit risk specialists. Such is often the case with psychometric-based credit scoring, for example, which will be briefly discussed here.

Although psychometrics is well established in organisational and educational psychology, psychometric credit scoring solutions are very new to the lending scene and are only starting to gain popularity. Psychometrics is an applied psychological science, backed by over a century of research, that has seen great technological advances in recent decades. Psychometric solutions typically take the form of professionally constructed behavioral questionnaires, with algorithms that yield scores based on patterns of responses to carefully designed questions.

Such tools can ostensibly provide lenders with a complementary layer of analytics that focuses on a borrower’s personal character traits, as they are relevant for good lending behaviors. In this way, psychometrics can help lenders assess creditworthiness more personally, above and beyond their traditional credit scores, which are based primarily on historical financial data. Identifying loan applicants who are responsible, trustworthy and dependable, for example, can help lenders approve more loans among underserved consumers, whose “thin” credit histories might otherwise limit their access to affordable credit.

For those who are less familiar with this space, it can be hard to find the tool that is right for their situation. The following is a brief set of topics, based on best practice guidelines in psychometric testing, which can help identify some key attributes of a good test.

Theory-based

One fundamental requirement of any psychometric tool is that it is based on a solid theoretical model, with clearly understood measurement constructs. Theoretical models are necessary to know what constructs are being measured, so that scores can be both explainable and defensible.

Purely empirical models can sometimes suffer from what is known as “dustbowl empiricism” (fitting patterns in data without a guiding theory) and may tend to be unstable over time. To be sure, questionnaires and their scoring algorithms can (and should) be adjusted and optimised based on empirical data, but it is important that their theoretical underpinnings remain intact. Moreover, theoretical models should be appropriate for the context in which they are being used. Very general theoretical models, such as those based on broad personality traits, can often be ineffective and inappropriate for use in specific settings, such as lending.

Reliability and validity

Among the most basic criteria of a good psychometric test is that it be both reliable and valid. “Reliability” is essentially the degree to which a test consistently measures what it is supposed to be measuring. Reliable tests yield similar scores for people who take the test repeatedly, for example, and are correlated with other similar measures.

“Validity” is primarily the degree to which the test can predict an external behavior (in psychology, this is known as “criterion-based” validity, although other variants of validity exist as well). The prediction of loan repayments would be an example of validity.

Reliability and validity can both be measured in different ways, and it is not within the scope of this article to detail each approach. However, it is worth noting that reliability is a necessary but not sufficient condition for validity – think of a broken scale that yields the same weight each time (i.e. reliable), but is still inaccurate (i.e. not valid). That a new personality test reliably measures certain personality traits, for example, is insufficient evidence for predictive validity of actual behaviors.

Therefore, any test should come with demonstrated evidence of validity for the behavioral criteria it is purported to predict. Finally, it should be possible to quantify the test’s validity in terms of its predicted return on investment (i.e. its economic utility – more on that in a future article).
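
As a rough illustration of how these two properties might be quantified, the sketch below computes a test-retest reliability estimate and a criterion validity estimate as simple Pearson correlations. The scores and repayment outcomes are hypothetical values invented for this example, and a correlation is only one of several possible measures of either property.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data for six applicants (illustrative only, not real results).
first_sitting = [52, 61, 47, 70, 58, 66]   # scores on the first attempt
second_sitting = [54, 59, 49, 72, 57, 68]  # same people, second attempt
repaid = [0, 1, 0, 1, 1, 1]                # 1 = repaid on time, 0 = defaulted

# Reliability: do people score similarly when they take the test again?
test_retest_reliability = pearson(first_sitting, second_sitting)

# Criterion validity: do scores relate to the external behavior of interest?
criterion_validity = pearson(first_sitting, repaid)

print(f"test-retest reliability: {test_retest_reliability:.2f}")
print(f"criterion validity:      {criterion_validity:.2f}")
```

The two figures answer different questions, which is why a high reliability estimate on its own says nothing about whether scores predict repayment.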

Purpose of use

Another important aspect to consider is the purpose for which the psychometric test was designed to be used. Tests that have been designed for use in other settings are unlikely to be appropriate for lending. A test designed for employee talent assessment (i.e. hiring), for example, may not be effective for predicting borrowers’ behaviors. In fact, even within talent assessment, some famous tests (e.g. MBTI) are often misused for hiring, despite not being designed for that specific type of use.

Tests that may not have been designed for the purpose of lending may also include non-contextually relevant questions, which may be seen as inappropriate or even invasive by some applicants, and result in a poor customer experience. Ideally, all questions should be relevant for lending and finance, rather than being related to general lifestyle behaviors.

Logistics

Several logistical issues, which depend on the given setting, should be considered before adopting a new test. Among these are:

Time length – testing time can be especially relevant when it is part of a longer overall process, such as a loan application. Tests should preferably be as brief as possible, although there is almost always a tradeoff between test length and reliability.
Language level – the language of the test should be appropriate for the test taker, and should be clearly understood. Ambiguous questions or abstract pictures can be interpreted differently by different test takers and may be culturally biased, ultimately reducing reliability.
Test accommodations – candidates requiring special accommodations due to physical or mental disabilities, such as added time or larger fonts, should be entitled to such accommodations without negatively influencing their scores.
Software integration – in many cases, the psychometric tool is going to be a small part of the overall applicant experience, and a complementary piece to the decisioning process. It is therefore important that the psychometric software is integrated seamlessly into the lending platform.
Privacy – psychometric solutions do not inherently need personally identifiable information (PII), and should preferably not collect such information. Applicant IDs can be anonymised for the purpose of software integration, thereby keeping all private/sensitive data with the lender.
Data training – unlike some other alternative credit scoring solutions, psychometric solutions do not require large datasets to “train” the scoring model. Data training can otherwise be a complex and time-consuming process, and a hindrance to easy adoption, but may not be necessary for psychometric solutions whose scoring models are theory-based, and not purely empirical. As such, scores can be available from day one. Customisations are of course still possible, as described in the next section.
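
To make the privacy point above more concrete, here is one possible way a lender might pass an opaque identifier to a psychometric vendor instead of a real applicant ID. This is a minimal sketch under stated assumptions: the function name, the example ID and the secret key are all hypothetical, and a keyed hash is strictly a form of pseudonymisation, which the lender can reverse through its own lookup table while the vendor cannot.

```python
import hashlib
import hmac


def pseudonymous_id(applicant_id: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 hash of the lender's internal applicant ID.

    Without the secret key, the token cannot feasibly be linked back to
    the original ID, so the vendor only ever sees an opaque string.
    """
    digest = hmac.new(secret_key, applicant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Hypothetical values: the key is generated and kept by the lender only.
LENDER_SECRET = b"replace-with-a-randomly-generated-secret"

token = pseudonymous_id("APP-000123", LENDER_SECRET)

# The lender stores token -> applicant on its own side, so that scores
# returned by the vendor can be matched back to the loan application.
lender_lookup = {token: "APP-000123"}
print(token)
```

Because the same ID and key always produce the same token, the lender can match returned scores to applications without sharing names, national ID numbers or contact details.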

Cultural norms and customisations

In personality-based psychometric tools, nuanced differences by language, as well as cultural differences by geography, can play important roles.

Specifically, psychometric scores are generally measured against a reference norm base of some kind, which should closely match the target sample. For example, Chinese norms should not be used to assess samples in Brazil. Moreover, the norms for a sub-prime lender may be different from those of a prime lender in the same geography. It is, of course, possible to normalise scores based on a lender’s pilot data, and that is recommended. Pilot data can also be used to recalibrate the scoring algorithm, but should not change the underlying constructs being measured.
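
The effect of the norm base can be sketched with a simple standardisation, in which a raw score is expressed relative to the mean and standard deviation of a reference sample. The pilot samples below are hypothetical, and a z-score is only one of several ways a vendor might normalise scores.

```python
from statistics import mean, stdev


def norm_referenced_score(raw_score, norm_sample):
    """Express a raw score as a z-score relative to a reference (norm) sample."""
    if len(norm_sample) < 2:
        raise ValueError("the norm sample needs at least 2 scores")
    spread = stdev(norm_sample)
    if spread == 0:
        raise ValueError("the norm sample has no variation")
    return (raw_score - mean(norm_sample)) / spread


# Hypothetical pilot data from two lenders in the same geography.
prime_pilot = [62, 65, 70, 68, 71, 66, 69]
subprime_pilot = [48, 55, 51, 60, 53, 57, 50]

raw = 60
z_prime = norm_referenced_score(raw, prime_pilot)        # below that norm
z_subprime = norm_referenced_score(raw, subprime_pilot)  # above that norm

print(f"vs prime norms:     {z_prime:+.2f}")
print(f"vs sub-prime norms: {z_subprime:+.2f}")
```

The same raw score falls below average against one norm base and above average against the other, which is why the choice of norms matters for interpretation.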

Fakability

Since psychometric tools are typically self-reported questionnaires, there is an understandable concern that responses can be faked or gamed. Indeed, this may be a problem for tests that use cognitive-based questions (e.g. intelligence-related questions) with right or wrong answers, as well as tests whose statements are overly obvious.

On the other hand, more sophisticated approaches ask candidates to choose between equally desirable behaviors, with no right or wrong answers. In addition, computerised algorithms can identify unrealistic response patterns and response times, which can disqualify the scores in some cases.
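
As a simplified illustration of the kind of automated check described above, the sketch below flags submissions in which nearly every answer is identical or most items were answered implausibly quickly. The function name and all thresholds are hypothetical choices made for this example; real tools would likely use more refined, empirically calibrated rules.

```python
def flag_suspicious_response(answers, seconds_per_item,
                             min_seconds=1.0, max_same_share=0.9,
                             max_fast_share=0.5):
    """Return a list of reasons why a questionnaire submission looks unrealistic.

    answers          -- the option chosen for each item, e.g. ["A", "B", ...]
    seconds_per_item -- time spent on each item, in seconds
    All thresholds are illustrative assumptions, not established standards.
    """
    if not answers or len(answers) != len(seconds_per_item):
        raise ValueError("answers and timings must be non-empty and equal in length")

    flags = []

    # Pattern check: share of items answered with the most common option.
    most_common = max(answers.count(option) for option in set(answers))
    if most_common / len(answers) >= max_same_share:
        flags.append("near-identical answers")

    # Timing check: share of items answered faster than a plausible reading time.
    too_fast = sum(1 for s in seconds_per_item if s < min_seconds)
    if too_fast / len(answers) > max_fast_share:
        flags.append("implausibly fast")

    return flags


# Hypothetical submissions.
rushed = flag_suspicious_response(["A"] * 10, [0.4] * 10)
careful = flag_suspicious_response(
    ["A", "B", "A", "B", "B", "A", "A", "B", "A", "B"],
    [4.2, 3.8, 5.1, 6.0, 3.3, 4.7, 5.5, 4.0, 3.9, 4.4],
)
print(rushed)
print(careful)
```

A flagged submission need not be rejected outright; it could instead be routed for manual review or a retest.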

Fairness and adverse impact

A final issue that should be considered is test fairness. Test fairness refers to the degree to which a test is unbiased with respect to protected demographic characteristics, such as age, race and gender. The idea is that no demographic group should be disadvantaged (either directly or indirectly) in a way that causes an adverse impact on the decisioning process.

Statistical analyses can be used to compare scores between groups, and demonstrate fairness, and these should be readily available for the relevant geography. In addition, such data can be collected locally, and confirmed for a specific sample. Typically, personality-based measures should have very small and mostly negligible differences between groups, allowing for a fair assessment of the constructs being measured.
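
One common way to compare scores between two groups is the standardised mean difference (Cohen's d), sketched below. The group scores are hypothetical, and treating an absolute value under 0.2 as "small" is a widely used rule of thumb rather than a regulatory standard; a real fairness analysis would involve larger samples and additional statistics.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Cohen's d: difference in group means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least 2 scores")
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("scores have no variation within groups")
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical scores for two demographic groups (illustrative only).
group_a = [58, 62, 55, 67, 60, 63]
group_b = [59, 61, 54, 66, 62, 61]

d = standardized_mean_difference(group_a, group_b)
print(f"standardised mean difference: {d:.2f}")
print("small difference" if abs(d) < 0.2 else "difference worth investigating")
```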

Closing remarks

We hope this brief outline of issues is helpful in evaluating the quality of various psychometric-based credit scoring solutions. Needless to say, professionally designed tools should be authored by experienced psychometricians, who are well versed in these and other topics.

This article was originally published in www.bankingtech.com