In this article, we first discuss some of the negative attitudes to testing that we have encountered, before mentioning some positive reasons to test. We then refer to some characteristics of ‘good’ tests, and indicate how the Richmond Entry/Exit test exhibits them.
Attitudes to testing
We live in an age of data and measurement. It is difficult not to generate data in our daily lives. Everything we do on the internet is recorded, tracked, interpreted. Our preferences, our purchases, our connections – all is constantly noted and put to some use. We do have the opportunity to limit or adjust some of this, but not much of it. It is used by different organisations for different purposes at different times. It is traded between organisations.
It has been observed that people queuing in the post office in big cities are always very impatient and anxious to get on, while people in villages stand patiently. Urbanites know and resent the fact that they are always behind, always time-poor, because of the time spent commuting and so on, leaving them exasperated by additional waiting. Country-dwellers, meanwhile, enjoy the chance to spend a bit of time chatting to the neighbours and relish the slow pace.
It’s not unreasonable to conjecture that, since people know they are constantly being measured in so many, mostly economically driven, ways behind the scenes, their resentment boils over when they encounter any assessment or measuring. We have heard teachers complain bitterly about testing, and the complaints do not always seem wholly reasonable.
Charges are levelled that testing damages learners’ interest in learning, that teachers are obliged to focus on tedious exam preparation rather than more creative activities, and, most seriously in a way, that the test results do not give a fair and accurate picture of the test-taker’s ability.
If tests are poor in certain respects, then the first and third charges may be right. If the tests are poor and the teachers not well trained, then the second charge may be right.
Testing is a complex business and many factors are involved. It affects many people in many ways, and is itself affected by many things. The LT123 Testing Manifesto tries to list the most important things to bear in mind when considering tests, whether designing them or using them. You can see it here: https://lt123.co.uk/testing-manifesto/
Reasons to test
Would you want to get on a plane if you didn’t believe the pilot’s flying and leadership abilities had been very rigorously tested?
Can you imagine an actor not seeing if they could say their lines before the play started?
Would you enter for a marathon without timing yourself on a range of preparation and training runs and workouts?
It is of course true that you cannot fatten a goat by weighing it repeatedly. We weigh the goat to see if our supposed improvements to its diet are having the desired effect. (Just as we test new vaccines, methods of disease control, and so on.)
We must proceed with care and we must observe many things. A test makes observations about the test-taker, but in turn the test-taker generates information about the test. But the main reasons to test before, during and after the learning process are to support, shape and enhance the learning and the learners.
This matters because learners, particularly school-age learners, need to be safeguarded. They are the users of education and assessment, but they are not the customers, and it is customers who usually hold the power, not users. A test provided from outside the classroom – in effect independently of the teacher, the course and the course materials – enables the user to know whether progress is being made and whether goals are being attained. Assessment facilitates the relationships between learners and the societies in which they live. It monitors the success or otherwise of education. It indicates where repairs may be necessary and what has gone well. It provides an accurate picture of what somebody can do, and it can be believed by a wide range of people and organisations.
A test enables us to know if a learner has not just remembered something but actually learnt it, by seeing if the learner can apply the learning in scenarios that are not simply repetitions of the original scenario in which it was encountered.
As more and more human activity moves online, it is natural to consider what testing can be done online. And if much language use in general, and much studying, takes place online, then that is where we should be testing.
Testing therefore helps us to see what is really happening and it helps to shape and define learning. There’s a range of purposes within that overall objective. With online courses one important role for testing is to place learners in the right level of course or materials access, so that they don’t impair their learning by engaging with inappropriate courseware.
Characteristics of ‘good’ tests
It’s crucial that a test starts from a good place ethically. Tests should not be used unfairly as devices to exclude migrants, for example, by setting tasks that are deliberately too hard.
Assuming, then, that we are testing with essentially good intentions, we need to establish some form of partnership between test and test-taker. This means that the test should be transparent. The test-taker should know what the test is intended to do and they should understand what they are requested to do by the test. This means clear documentation and clear instructions, with examples provided if necessary. It should be clear to the test-taker how the scores are awarded, so that they know which questions, perhaps, to spend longer on, and so forth.
Ideally, the test-taker should find the test a learning experience in itself, though that may not be fully possible in every situation. As a minimum, the test should be pleasant to engage with. This applies both to the content and to the user experience. The test should be in line with the test-takers and their experience: young learners should get a test appropriate to what they know, what they have studied and what they will go on to study. Nothing in the test’s content should be offensive, alarming or upsetting – and this can be critical when designing and developing globally used tests which will travel across different cultures.
In the case of many tests, it is inevitable that people will prepare for the test. This makes a lot of sense. You probably would want to prepare for your driving test, so why not your English test? If people are going to spend time preparing, then the test’s design, its content, should encourage pedagogically useful learning activities as preparation.
Balanced against that ambition, however, is the issue of practicality. If we want to spend a relatively short time being tested – and we usually do – then test designs based on efficiency of reporting may not always be full of really interactive, fun and dynamic material.
Tests must be at the right level for the test-takers. This is easier to achieve these days now that the technology exists to enable adaptive testing: the test is driven by the test-taker’s performance, moving up and down in level according to the answers given. It is also useful if the levels reported by one test are in concordance with those from others. If all tests are mapped against a framework such as the CEFR, then we can know that exam result X is equivalent to exam result Y, even if those tests are in different languages and countries.
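To illustrate the basic idea of adaptive testing – a toy sketch only, not the algorithm of the Richmond test or any other real test – a placement routine might step the difficulty level up after each correct answer and down after each incorrect one, then report the level around which the test-taker settles:

```python
# Illustrative sketch of an adaptive placement loop. The level indices,
# item counts and the `answer_item` callback are all hypothetical.

def place_learner(answer_item, n_items=20, n_levels=6):
    """Present n_items questions, moving up a level after a correct
    answer and down after an incorrect one; return the level the
    test-taker settled around.

    `answer_item(level)` is a hypothetical callback that presents an
    item at `level` and returns True if it was answered correctly.
    """
    level = n_levels // 2              # start in the middle of the range
    history = []
    for _ in range(n_items):
        correct = answer_item(level)
        history.append(level)
        # Step up or down, staying within the available levels
        level = min(n_levels - 1, level + 1) if correct else max(0, level - 1)
    # Average over the second half, ignoring the early "homing-in" phase
    tail = history[len(history) // 2:]
    return round(sum(tail) / len(tail))

# Example: a simulated learner who answers correctly up to level 3
print(place_learner(lambda level: level <= 3))
```

Real adaptive tests use far more sophisticated models (for example, item response theory) to estimate ability, but the principle is the same: each answer refines the estimate, so fewer items are needed than in a fixed-form test.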
All this becomes relevant in the context of English teaching and learning, and in the pursuit of improvement in Colombian educational programmes, opening the door for innovation in times of change. Richmond, in alliance with LT123, proposes a digital ecosystem to promote systematic evaluation, so that learning analytics can evidence students’ progress throughout their learning process. Below, we present the entry and exit test that belongs to the Richmond evaluation system.
Richmond Entry/Exit test
Turning now to the Richmond Entry/Exit test: we at LT123, working with colleagues at Richmond, have worked hard to apply these points to this test. This summary is from our initial recommendations report:
‘LT123 will use its expertise to design and validate its Young Learners entrance and exit tests. This includes:
- Extensive knowledge of young learners as test takers (Papp and Rixon 2018)
- Experience with measuring school learners’ English language development
- Test development and validation processes that take the latest knowledge in relevant fields into account, including:
  - Recent developments in the CEFR for young learners
  - Latest research informing the English Vocabulary Profile and English Grammar Profile
  - Standard setting in developing and validating young learners’ tests.’
The test was designed following standard steps of quality assurance. The project had some complexity in that we needed to account for different levels and different ages.
It was important to make the test fully appropriate for the young test-takers. The consultant team at LT123 includes Dr Szilvia Papp, a widely recognised expert in this area, as well as Dr Felicity O’Dell and Frances Treloar, both highly experienced in young learners’ tests within assessment organisations and as authors of preparation and course materials for young learners.
Here are just a few examples of the kinds of considerations made, taken from Szilvia’s own publications, showing in each case the characteristic of this age of learner, together with the problem and the solution.
| Characteristic | Problem | Solution |
| --- | --- | --- |
| Limited reasoning skills | – Difficulty as listeners and readers in tracing back from effects to causes when processing clauses connected by *because*, until age 11 – Difficulty, until age 10, with counterfactuals related to events that do/did not take place in the present or past, i.e. processing a situation that is counter to factual reality; these are difficult because they entail negating a positive statement in the *if* clause: (We’re visiting a very dry country.) *If it rained there, we’d take an umbrella.* (It didn’t rain yesterday.) *If it had rained, we’d have taken an umbrella.* – Difficulty with problem solving and hypothesis testing | – Use events with basic chronology – Use common cause and effect: ‘it happened because we were not careful’ – Do not refer to events that do/did not take place in the present or past – Create clear relationships between characters |
| Problems in constructing a wider discourse representation | – Problems embedding a piece of information in a wider context – Limited ability to get the gist and to see the links between parts – Difficulty collating information and integrating multiple pieces of evidence – Not sensitive to the connectives which hold a text together | – Contextualise the task in the rubric (who, where, why?) – Use simple connectives – Use familiar, sequential discourse types (narration, instruction, description) – Use transparent connections between utterances or turns – Avoid misleading leaps of topic |
| Limited world knowledge | – Can deal with the immediate environment and familiar topics | – Match content/topic to the expected background knowledge of candidates – Keep test material culturally neutral |
| Limited understanding of the conventions of written discourse | – Difficulty interpreting the writer’s goals/intentions | – Children can be asked to empathise and recognise how others feel, but they may not respond as expected |
The test was designed to take account of many factors such as these. It also had to be reliable as a measuring tool, as well as pleasant as an experience for the test-taker. The test is relatively short, clear and accessible to engage with.
We ran a tightly managed team of very experienced writers and editors, as well as an audio studio and illustrators, to create the test material. A level-vetting panel of experts was set up to establish the level of the material in relation to the CEFR. This was used to drive the design of the pre-tests, and the content was trialled in a number of schools in South America. The results were analysed statistically and adjustments were made where necessary. The outcome is a testing tool designed to improve the learning experience by helping to place learners into the right courses and to provide an indication of their measurable progress as a result of those courses.
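The statistical analysis of trial data typically includes classical item statistics such as facility (how many candidates answered an item correctly) and discrimination (how well an item separates stronger from weaker candidates). The sketch below is a simplified illustration of those two statistics, not the actual analysis pipeline used for this test:

```python
# Simplified sketch of classical item analysis on trial responses.
# `responses` is a hypothetical matrix: one list of 0/1 item scores
# per candidate. Illustrative only.

def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0

def item_statistics(responses):
    """Return facility and discrimination for each item.

    Discrimination is computed here as the correlation between the
    item score and the rest-of-test score (total minus the item).
    """
    n_cands = len(responses)
    n_items = len(responses[0])
    stats = []
    for i in range(n_items):
        item = [r[i] for r in responses]
        rest = [sum(r) - r[i] for r in responses]
        stats.append({
            "facility": round(sum(item) / n_cands, 2),
            "discrimination": round(pearson(item, rest), 2),
        })
    return stats
```

Items whose facility is extreme (almost everyone right or wrong) or whose discrimination is low or negative are the ones flagged for revision or removal after trialling.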
Of course, we can only weigh the goat and not the field it lives in. Different learners do various things to improve their English above and beyond taking one particular course so the test tells us about the learner, not the courses as such – but then that’s the right way round, isn’t it? RM