Monitoring the KS2 higher standard

This short post reviews performance at the higher standard in the 2017 KS2 national curriculum assessments, comparing this year’s provisional outcomes with those from 2016.

It uses underlying data from SFR43/2017: ‘National curriculum assessments: KS2 (provisional)’, first published on 31 August 2017, and the equivalent SFR39/2016, published last year. Reference is also made to SFR62/2016, which contains the revised data from 2016.

The higher standard

The latest Statement of Intent confirms that the 2017 primary performance tables will continue to include, as one of four headline measures:

‘The percentage of pupils who achieve a higher standard in reading, writing and mathematics’

This percentage will also be supplied for boys and girls, disadvantaged pupils, those with EAL and non-mobile pupils.

The Statement implies that it will not be available for low, medium and high prior attainers, but those percentages are now included in the 2016 tables, suggesting that this is erroneous.

The technical guide ‘Primary School Accountability in 2017’ defines this combined measure:

‘To be counted towards the measure, a pupil must have a ‘high scaled score’ of 110 or more in reading and mathematics; and have been teacher assessed in writing as ‘working at a greater depth within the expected standard’.

Unlike the expected standard, which was determined by the Standards and Testing Agency’s standard setting teacher panel, the high score was determined by the Department solely with reference to the distribution of pupils’ test results to identify the pupils who achieved the highest marks on the tests.’
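The definition quoted above amounts to a three-part test applied to each pupil. The following is a minimal sketch of that logic only, not the Department's own implementation: the class name, field names and the "GDS" code for ‘working at a greater depth’ are all illustrative assumptions.

```python
from dataclasses import dataclass

HIGH_SCORE_THRESHOLD = 110  # high scaled score quoted in the technical guide
GREATER_DEPTH = "GDS"       # hypothetical code for 'working at a greater depth'


@dataclass
class PupilResult:
    reading_scaled_score: int
    maths_scaled_score: int
    writing_ta: str  # teacher assessment outcome; the codes are illustrative


def achieves_higher_standard(pupil: PupilResult) -> bool:
    """Return True only if all three conditions of the combined measure hold."""
    return (
        pupil.reading_scaled_score >= HIGH_SCORE_THRESHOLD
        and pupil.maths_scaled_score >= HIGH_SCORE_THRESHOLD
        and pupil.writing_ta == GREATER_DEPTH
    )
```

Because all three conditions must hold at once, a pupil with maximum scaled scores in both tests still misses the headline measure if the writing judgement falls short.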

This leaves open the possibility of year-on-year adjustment of the higher standard, but SFR39/2016 indicated the opposite:

‘The high score is not based on a standard of achievement in the same way that the expected standard is. It was set after analysis of the 2016 results. A threshold of 110 was chosen to give approximately one-fifth of pupils achieving the high score in each subject. This threshold also has the presentational advantage that it is the mid-point between the expected standard and the maximum scaled score. The threshold for the high score will be confirmed for future years in updates to the technical guidance, but the intention is that it will remain in the same place (110) for a number of years so that changes over time can be measured.’

The technical guide also mentions that ‘working at a greater depth’ is assigned a notional 113 points ‘within the scaled score range’ though this is only for the purpose of calculating writing progress scores. This was derived by:

‘…considering the percentage of pupils achieving each category of English writing teacher assessment, identifying the corresponding percentages of pupils on the English reading and mathematics tests and finding the mean scaled score for each group, in order to determine the number most likely to be the best reflection of a typical pupil’s performance in English writing.’

The intention to maintain comparability looks set to be frustrated by plans to revert to a ‘best fit’ model of teacher assessment in writing. The goalposts for ‘working at a greater depth’ will move, so changing the standard required to achieve the headline measure.

The Statement of Intent also confirms that, as in 2016, the 2017 tables will include the percentage of pupils achieving the higher standard in each of the reading, maths and GSP (grammar, spelling and punctuation) tests, as well as the percentage ‘working at a greater depth’ in writing.

These percentages will also be supplied for boys and girls, disadvantaged/other pupils and for low/middle/high prior attainers (though one has to download the full dataset for each school to find this information in the 2016 tables).

Performance against the headline measure

SFR 43/2017 says that attainment against the higher standard headline measure:

‘…increased by 3 percentage points, from 5% in 2016 to 9% in 2017’

A footnote adds the explanation that ‘gaps are calculated from unrounded figures’.

This compares favourably with the eight percentage point increase in the proportion of pupils achieving the expected standard, up from 53% to 61%: relative to the size of last year’s successful cohort, the improvement at the higher standard is much greater.

The underlying data from each SFR can be used to calculate these changes more accurately.

The 2016 provisional data showed that 31,391 of 582,016 eligible pupils achieved the headline higher standard (5.39%). This time round 51,940 of 600,066 pupils achieved this outcome (8.66%).

The improvement compared with 2016, calculated to two decimal places, is therefore 3.27 percentage points. In terms of raw pupil numbers, almost two-thirds more (65%) achieved the higher standard in 2017 than in 2016.
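The arithmetic behind these figures can be rechecked from the pupil counts quoted above. This is a minimal sketch: the variable names are illustrative, and taking the difference between the two-decimal-place percentages (rather than the unrounded ones, which can shift the final digit) is an assumption about how the change was derived.

```python
# Pupil counts quoted from the provisional underlying data.
achieved_2016, eligible_2016 = 31_391, 582_016
achieved_2017, eligible_2017 = 51_940, 600_066

# Percentage of eligible pupils achieving the headline higher standard.
pct_2016 = 100 * achieved_2016 / eligible_2016
pct_2017 = 100 * achieved_2017 / eligible_2017

# Percentage point change, taken between the two-decimal-place percentages.
point_change = round(pct_2017, 2) - round(pct_2016, 2)

# Relative growth in the raw number of successful pupils.
relative_growth = 100 * (achieved_2017 - achieved_2016) / achieved_2016

print(f"2016: {pct_2016:.2f}%  2017: {pct_2017:.2f}%")
print(f"Change: {point_change:.2f} percentage points")
print(f"Growth in successful pupil numbers: {relative_growth:.0f}%")
```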

Chart 1 compares the percentages achieving higher and expected standards.

The 2016 provisional data was adjusted very slightly upwards in SFR62/2016, the underlying data for that publication recording 53.47% achieving the expected standard and 5.42% the higher standard. Similar adjustments are likely in 2017.

Chart 1: Percentage achieving the expected standard and higher standard headline measures (comparing provisional data from 2016 and 2017)

The SFR implies that this substantial improvement may be more attributable to familiarisation than to improving standards:

‘This increase may be due to pupils and teachers further becoming more familiar with the increased levels of demand of the new assessments, aligned with the new, more challenging national curriculum, in their second year.’

That interpretation is consistent with Ofqual’s research into the ‘Sawtooth Effect’:

‘…performance on high stakes assessments is often adversely affected when that assessment undergoes reform, followed by improving performance over time as students and teachers gain familiarity with the new test. This pattern reflects changes in test-specific performance over time, whilst not necessarily reflecting changes in a cohort’s overall mastery of the subject…’

It found that:

‘…it seems to take roughly 3 years for students and teachers to become familiar with the nature and requirements of new assessments…’

And:

‘When estimates of outcome change were calculated (using simulated outcome distributions), the size of these changes were relatively small, with estimated average outcomes changing by 2% each year for the first 3 years, and then by 0.5% per year thereafter.’

But Ofqual’s study is confined to GCSE and A level assessments:

‘A further avenue for future research would be to see whether the 3 year adjustment period remains consistent for other qualification types, such as …Key Stage 2 assessments, as there may be differences in institutions’ abilities to adapt to assessment change.’

These KS2 improvements are significantly larger, which might suggest a comparatively bigger sawtooth effect at KS2, though there are other possible explanations.

STA and/or Ofqual should be commissioning research to investigate further, including the potential for differential effects at the expected and higher standards.

One contributory factor is that the proportion of learners achieving the higher standard in 2016 appeared particularly depressed, even allowing for the sawtooth effect. It would be interesting to know whether this statement is borne out by internal modelling undertaken prior to the tests.

It was in no-one’s interest to admit this at the time, since it would have invited further awkward questions about teacher preparedness and priorities, details of assessment design and so on. Except on this blog, it was widely ignored.

Gender differences in performance against the headline measure

There are substantial gender disparities in achievement on this measure.

According to the 2016 provisional data 56.4% of those achieving the headline higher standard were girls. This has risen to 57.5% in 2017, so there is a 15 percentage point gap in favour of girls, up two points on 2016.

Moreover:

In 2016, whereas 6.21% of girls achieved the higher standard, only 4.61% of boys did so – a gap of 1.6 percentage points.

By comparison, in 2017, 10.16% of eligible girls achieved the higher standard compared with 7.21% of eligible boys – a gap of 2.95 percentage points.

The 2016 percentages were revised slightly upwards in SFR62/2016 to 6.25% for girls and 4.64% for boys.

So, while the percentage of successful boys increased substantially, by 2.6 percentage points between 2016 and 2017, the improvement in the percentage of successful girls comfortably exceeded this, at almost four full percentage points.
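The gaps and improvements above follow directly from the four provisional percentages. A short sketch, using illustrative dictionary names, makes the subtraction explicit:

```python
# Percentage of eligible pupils achieving the higher standard (provisional data).
rates = {
    2016: {"girls": 6.21, "boys": 4.61},
    2017: {"girls": 10.16, "boys": 7.21},
}

# Gender gap in each year, in percentage points (positive means girls ahead).
gaps = {year: round(r["girls"] - r["boys"], 2) for year, r in rates.items()}

# Year-on-year improvement for each group, in percentage points.
improvement = {
    group: round(rates[2017][group] - rates[2016][group], 2)
    for group in ("girls", "boys")
}

for year, gap in gaps.items():
    print(f"{year}: gap of {gap} points in favour of girls")
for group, change in improvement.items():
    print(f"{group}: up {change} points")
```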

Chart 2: Percentage of boys and girls achieving the higher standard headline measure (comparing provisional data from 2016 and 2017)

For comparison, improvements at the expected standard were slightly below eight percentage points for boys and girls, the improvement for girls being only marginally stronger.

This widening gap between girls and boys achieving the higher standard will become problematic if it persists into 2018 and beyond.

The gap between the strongest and weakest-performing local authorities is also growing. In 2016 it stood at ten percentage points, but in 2017 it has stretched to 13 points. When gender is factored in, several authorities record only 4% of boys at the higher standard, while Kensington and Chelsea reaches 22% amongst girls.

In 2016 the gap between the authorities with the strongest performing girls and the weakest performing boys was almost 13 percentage points; now it is some 18 points.

Achieving the higher standard in each assessment

The data for separate assessments helps to expose the ‘weaker links’ that are holding back performance against the headline measure.

Comparison between the provisional underlying data for 2016 and 2017 reveals that:

In 2016, 14.7% of pupils were ‘working at greater depth’ in writing TA, 16.6% achieved the higher standard in the maths test and 18.8% achieved the higher standard in the reading test. The comparable percentages in 2017 were 17.7% (writing TA), 22.7% (maths test) and 24.6% (reading test), improvements of 3.0, 6.1 and 5.8 percentage points respectively. Hence the improvement in the success rate in writing TA was about half that in the maths and reading tests. Writing remains the least likely field of high achievement – and the gap is growing. There are significant gender differences, however.

In 2016, boys’ success rates were 16.0% in reading, 18.2% in maths and only 10.8% in writing. In 2017 these improved to 21.5% (reading), 24.3% (maths) and 13.0% (writing), adding 5.5, 6.1 and 2.2 percentage points respectively. So while writing is improving it is doing so at a relatively slower rate.

Chart 3: Percentage of pupils, boys and girls achieving the higher standard/WAGD in each assessment (comparing provisional data from 2016 and 2017)

The pattern for girls is different. In 2016 girls’ success rates were 21.7% in reading, 14.9% in maths and 18.9% in writing, so maths was their ‘weakest link’. In 2017 they improved to 27.9% (reading), 20.9% (maths) and 22.6% (writing), adding 6.2, 6.0 and 3.7 percentage points respectively. The gap between high achievement in maths and writing has narrowed significantly.

Boys improved a little more than girls in maths, but girls out-improved boys in reading and especially in writing. The gender gaps in performance at the higher standard in 2016 were 5.7 percentage points in reading (in favour of girls), 3.3 percentage points in maths (in favour of boys) and 8.1 percentage points in writing (in favour of girls). In 2017 these had changed to 6.4 points in reading, 3.4 points in maths and 9.6 points in writing.

The male advantage in maths reverses the position at the expected standard, where girls are one percentage point ahead. The female advantage in reading is similar to that at the expected standard (a seven point gap), while for writing it is rather smaller than at the expected standard (where there is a 12-point gap).
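The gender gaps for reading, maths and writing can be reproduced from the boys’ and girls’ success rates quoted earlier in this section. In this sketch the data structure and the function name gender_gap are illustrative; a positive result means girls are ahead and a negative one means boys are ahead.

```python
# Percentage achieving the higher standard / WAGD (provisional data).
rates = {
    "reading": {2016: {"boys": 16.0, "girls": 21.7},
                2017: {"boys": 21.5, "girls": 27.9}},
    "maths":   {2016: {"boys": 18.2, "girls": 14.9},
                2017: {"boys": 24.3, "girls": 20.9}},
    "writing": {2016: {"boys": 10.8, "girls": 18.9},
                2017: {"boys": 13.0, "girls": 22.6}},
}


def gender_gap(assessment: str, year: int) -> float:
    """Girls' rate minus boys' rate, in percentage points."""
    r = rates[assessment][year]
    return round(r["girls"] - r["boys"], 1)


for assessment in rates:
    print(assessment, gender_gap(assessment, 2016), gender_gap(assessment, 2017))
```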

In 2016 22.5% of pupils achieved the higher standard in the GSP test and this improved to 30.9% in 2017, up 8.4 percentage points. Boys improved 8.1 percentage points, from 18.5% to 26.6%, while girls improved 8.7 percentage points, from 26.7% to 35.4%. The gender gap is similar to that at the expected standard.

It remains to be seen whether reversion to a ‘best fit’ methodology in the TA framework for writing – and potentially a longer term shift to comparative judgement or an alternative model – will assist rather more boys than girls to exceed the WAGD threshold, so helping to narrow the emerging gap on the headline measure.

Achieving high scaled scores in the reading, maths and GSP tests

Comparison between the provisional data for 2017 and 2016 shows that:

The percentage of pupils achieving a maximum scaled score of 120 increased from 1.1% to 1.6% in reading, from 0.8% to 3.3% in GSP and from 0.3% to 1.5% in maths. The improvement in reading is noticeably less substantial than the fourfold improvement in GSP and the fivefold improvement in maths.

The percentage of pupils achieving a scaled score of 115 or higher increased from 7.0% to 10.0% in reading, from 6.3% to 14.0% in GSP and from 4.0% to 6.8% in maths. The biggest proportional improvement was in GSP, the smallest in reading, with maths rather closer to the profile for reading.
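The proportional improvements described above are ratios of the 2017 share to the 2016 share. A brief sketch, with illustrative names, shows how the ‘fourfold’ and ‘fivefold’ descriptions arise:

```python
# (2016 %, 2017 %) of pupils at each scaled-score threshold, provisional data.
shares = {
    "120":  {"reading": (1.1, 1.6), "GSP": (0.8, 3.3), "maths": (0.3, 1.5)},
    "115+": {"reading": (7.0, 10.0), "GSP": (6.3, 14.0), "maths": (4.0, 6.8)},
}

# Ratio of the 2017 share to the 2016 share for each test and threshold.
ratios = {
    threshold: {test: after / before for test, (before, after) in tests.items()}
    for threshold, tests in shares.items()
}

for threshold, tests in ratios.items():
    for test, ratio in tests.items():
        print(f"{threshold:>4} {test:<8} x{ratio:.1f}")
```

Ratios built on very small starting percentages, such as 0.3%, are sensitive to rounding in the published figures, so they are best read as rough multiples.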

It seems that the highest attainers are rapidly improving their capacity to ‘ceiling out’ in maths and GSP, but less so in reading. The previous pattern of Level 6 performance – with both maths and GSP comfortably ahead of reading – may gradually be reasserting itself.

Further down the distribution – midway between the higher standard and the ceiling – the gap between maths and reading has not yet closed. Nor is GSP quite so far ahead, but it could be by 2018 if present trends continue.

Conclusion

There has been significant improvement in performance against the headline higher standard measure. This was to be expected given the perilously low success rate in 2016. There remains considerable room for further improvement.

Some or all of the improvement since 2016 may be attributable to increasing familiarisation by virtue of the ‘sawtooth effect’. This looks set to continue, but further research into the impact of the effect as it operates at KS2 is necessary.

The growing gender gap in favour of girls on the headline measure will become problematic if it continues. WAGD in writing is boys’ ‘weakest link’ while the higher standard in maths is the ‘weakest link’ for girls.

But girls have improved substantially in maths since 2016 and their success rate is now close to that for writing. It remains to be seen whether a shift back to ‘best fit’ for writing TA will work in favour of boys.

It would be helpful to publish the data that completes the Venn diagram of high achievement, so we can establish what proportion of learners achieve the higher standard/WAGD in two of the three assessments – and which pairings are more prevalent. It would also be helpful to understand the correlation with achieving the higher standard in GSP.

The provisional data does not include material about excellence gaps between disadvantaged high attainers and their more advantaged peers. In 2016 the revised data in SFR62/2016 showed that only 4.5% of those achieving the higher standard headline measure were FSM-eligible and only 11.6% were disadvantaged. Within these populations there were large gender disparities in favour of girls.

It will be essential that FSM and disadvantaged learners improve significantly on this position in 2017, or further questions will arise about the efficacy of pupil premium funding for disadvantaged high attainers in primary schools.

September 2017