To prove whether coaching ‘works’ is much the same challenge a Government economist faces in establishing whether a new policy has worked (and whether the benefits are worth the costs). There are some tricky political issues too. Do the ministers, officials and advisers who commissioned a project really want to know if it isn’t working? Often their self-interest – and self-esteem – is tied to demonstrating success. At its worst, unsuccessful policies linger on until their originators move elsewhere. It reminds me of physicist Max Planck’s observation that science advances ‘one funeral at a time’.
Like most coaches, I am pretty evangelical about the power of coaching and honestly believe it works. But can I prove it?
There are different evaluation techniques we might consider, and some are more credible than others. If I were reviewing coaching as a government policy, how much confidence would I have in coaching’s effectiveness if different forms of evidence were used?
1. Ask coaches. Credibility rating: 1/10. Without questioning the integrity of coaches, there can be no doubt that we have a vested interest in proclaiming the effectiveness of our profession. Indeed advocacy can be an important way in which we get business. As an analyst, I would not accept this as good evidence.
2. Ask clients. Credibility rating: 4/10. This is a useful form of feedback for coaches to find out if there are specific things they are doing that are working well or poorly. However, client reports lack rigour as a form of evaluation. The problem is the ‘focusing illusion’, where we over-weight the importance of something when we are asked about it. As the psychologist Daniel Kahneman puts it, “nothing is as important as you think it is, when you’re thinking about it”.
3. Compare client performance against a ‘control group’. Credibility rating: 6/10. This is a more typical form of evidence in government evaluations. The outcomes achieved by the target group or area are compared against the outcomes achieved by a group or area that has received no help from the policy. The first challenge is to establish what outcomes we care about. Is it career progression, business performance or perhaps individual wellbeing?
A more serious issue is the selection problem. Take for instance the business coaching scheme I recently visited. The clients were very enthusiastic and I have no reason to doubt that they valued the programme. But it is simply not fair to compare the outcomes of those who take up coaching with those who do not. By their nature, those who choose to take part in business coaching are typically more motivated, more driven – and thus more likely to succeed in future. It’s rather like looking at the superior career results achieved by graduates and assuming that it must all be due to the fact that they attended university. At least part of the difference is driven by the factors that got them to university in the first place (brains, hard work, connections etc).
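The selection problem can be made concrete with a toy simulation. All the numbers below are invented purely for illustration (an assumed ‘motivation’ score, an assumed true coaching effect of 5 points); the point is only the mechanism: if the motivated both sign up for coaching and perform better anyway, a naive coached-vs-uncoached comparison will overstate coaching’s effect.

```python
# Toy simulation of the selection problem.
# All numbers are invented assumptions, not drawn from any real study.
import random

random.seed(42)

TRUE_COACHING_EFFECT = 5  # assumed lift in a 0-100 'performance' score


def simulate_person():
    """The motivated are both more likely to take up coaching
    and more likely to succeed regardless of it."""
    motivation = random.gauss(50, 10)
    takes_coaching = motivation > 55                 # self-selection
    performance = motivation + random.gauss(0, 5)    # success tracks motivation
    if takes_coaching:
        performance += TRUE_COACHING_EFFECT
    return takes_coaching, performance


people = [simulate_person() for _ in range(10_000)]
coached = [perf for took, perf in people if took]
uncoached = [perf for took, perf in people if not took]

naive_gap = sum(coached) / len(coached) - sum(uncoached) / len(uncoached)
print(f"Naive coached-vs-uncoached gap: {naive_gap:.1f}")
print(f"True coaching effect:           {TRUE_COACHING_EFFECT}")
```

Under these assumptions the naive gap comes out far larger than the true effect, because most of the difference is motivation, not coaching.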
4. Run an RCT. Credibility rating 8/10. There is a strong trend towards Randomised Controlled Trials (RCTs) in social science, building on decades of experience in medicine. This is typified by the awards and attention lavished on the so-called ‘randomistas’ in the economics profession. To establish a coaching RCT, we would need to take all those who want to take part and randomly allocate them into two groups. Group A would receive coaching; Group B (the ‘control group’) would not. Any differences in performance are then attributed to the effect of coaching.
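The random allocation step is mechanically simple. A minimal sketch, using invented client labels, might look like this:

```python
# Minimal sketch of randomly allocating RCT volunteers into two groups.
# Client names and the group split are illustrative assumptions.
import random

random.seed(7)  # fixed seed so the allocation is reproducible

volunteers = [f"client_{i:02d}" for i in range(1, 21)]

shuffled = volunteers[:]      # copy so the original list is untouched
random.shuffle(shuffled)

half = len(shuffled) // 2
group_a = sorted(shuffled[:half])   # receives coaching
group_b = sorted(shuffled[half:])   # control group: no coaching

print("Group A (coaching):", group_a)
print("Group B (control): ", group_b)
```

Because allocation is random rather than self-selected, motivation and other hidden traits should be spread evenly across the two groups, which is exactly what the comparison in point 3 lacks.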
The weakness here is the placebo effect. Presumably it would be obvious to people whether they were receiving coaching or not, and decades of research in medicine suggest that this alone will influence their future performance.
5. Run an RCT vs a placebo. Credibility rating 10/10. This is the gold standard. Clients would be split into two groups as before. Group A would receive coaching and Group B would receive a ‘placebo’, such that no client knows whether they are receiving real coaching or not. It is difficult to imagine how this might work in practice, but perhaps the placebo group could simply have a series of unstructured conversations similar in length to a coaching session. The logical extension of this approach is to compare different types or elements of coaching, to see which is more effective.
I haven’t yet explored how many of these approaches have been tried ‘in the field’ and, if so, what they have revealed. I will post up anything I find of interest. Writing this has also made me think more carefully about what I might do to more robustly evaluate my own coaching performance.