Designer, Coder, Founder


Changing The Way We Do Research


Plato, the product I am building with BfB Labs, is a ground-up redesign of clinical research for a connected age. It’s a recent project, but I’ve been exploring these ideas for some time now. This blog post from three years ago, written by the inimitable Quynh Pham, captures the experiences that led us both to explore this space.

After 3 months of having our users share their data with us on how they experience anxiety, I am so pleased to say that Flowy has been having a clinically significant positive impact on our users’ anxiety symptoms. 

We’ve been getting great qualitative feedback from our users ever since Flowy was released, but this is the first piece of quantitative data we’ve been able to report on since our pilot trial last year. We were hoping that Flowy would play an important role in our users’ mental health journeys, but this really surpassed our expectations in the best way.


I am writing this post to share the exciting results from our data analyses, but to really understand how much these results mean to us, I’d like to take you back a couple of months, to when we were all preparing to present our work on Flowy to the kind folks at Nominet Trust. At 11 PM on a Saturday night (the prime writing hour, if you must know), I was scrambling to finish my research portion of the presentation. It was during this delirious and fervent writing session that I stumbled upon some offhand notes Simon had made in his section of the presentation:

“It seems clear to me that the way we evaluate is just as due for disruption as the way we treat.”

That sentence changed everything. Not only did I steal it and shamelessly claim it as my own, but it went on to colour my views on evaluation in our field. To give you an idea of how timelines work in industry research that aims to mirror academic research, I started designing Flowy’s pilot trial in November 2013 and just submitted an amended second draft of our manuscript for publication in the Games for Health Journal last Wednesday – that’s 1 year 9 months and counting. If you think that sounds like a long time, consider the fact that the average duration of a randomized controlled trial from the initiation of subject recruitment to publication of the outcome is 5.5 years(1). For research in mHealth, this time lag is especially problematic: the technology being evaluated may well be obsolete before the trial is completed.

Consider the alternative: a small games studio sets out to change the evaluation game by directly embedding psychological self-report questionnaires in their gaming app – these questionnaires are the same ones used by NHS psychological therapy services to measure anxiety(2).  Players are asked to complete an evaluation every 2 weeks. Within 3 months, 660 players have completed an evaluation. 85% of them are moderately to severely anxious.

The 60 players who completed a 2-week follow-up evaluation showed a significant difference in their scores: they felt significantly less anxious(3).

  • 32% showed reliable improvement(4). 20% showed reliable recovery(5).

  • The proportion of users who were moderately to severely anxious went from 65% to 48%.

  • The proportion of users who met the NHS clinical cut-off score for anxiety(6) went from 82% to 68%.

I analysed these results on July 30, 2015 and wrote this post 2 days later, which means this whole process took 3 months and 2 days from recruitment to publication.

It’s true. It happened. And it was amazing.

“It seems clear to me that the way we evaluate is just as due for disruption as the way we treat.” 

Truly, I think we’re off to a good start.


1. Ioannidis JPA. Effect of the statistical significance of results on the time to completion and publication of randomized efficacy trials. JAMA 1998;279(4):281–6.

2. Generalized Anxiety Disorder 7-item questionnaire (GAD-7, abbreviated to GAD in the notes below)

3. A dependent-samples t-test was conducted to compare GAD scores pre and post 2-week play. There was a significant difference in the scores for pre (M=12.23, SD=5.22) and post (M=10.7, SD=4.74) play; t(59)=2.33, p=.023. The effect size for the pre-post difference is small (0.3).

4. Reliable Improvement: a decrease in total GAD score of ≥4

5. Reliable Recovery:  a decrease in total GAD score of ≥4 and a total score ≤7.

6. A total GAD score ≥8.
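
For readers who want to see how notes 3 to 6 translate into something checkable, here is a minimal Python sketch. It is an illustration only, not the analysis code used for Flowy: the names `classify`, `Outcome` and `paired_effect_size` are hypothetical, and the example scores are made up. The effect-size helper assumes the reported 0.3 is Cohen's d for paired samples computed as t divided by the square root of n; the post does not say which formula was used, but that convention reproduces the reported figure.

```python
import math
from dataclasses import dataclass

# Thresholds copied from notes 4 to 6 above.
RELIABLE_CHANGE = 4   # reliable improvement: total GAD score drops by >= 4
RECOVERY_MAX = 7      # reliable recovery: reliable improvement and final total <= 7
CLINICAL_CUTOFF = 8   # clinical cut-off: total GAD score >= 8


@dataclass
class Outcome:
    reliable_improvement: bool
    reliable_recovery: bool
    above_cutoff_pre: bool
    above_cutoff_post: bool


def classify(pre: int, post: int) -> Outcome:
    """Classify one player's pre/post total GAD scores (hypothetical helper)."""
    improved = (pre - post) >= RELIABLE_CHANGE
    return Outcome(
        reliable_improvement=improved,
        reliable_recovery=improved and post <= RECOVERY_MAX,
        above_cutoff_pre=pre >= CLINICAL_CUTOFF,
        above_cutoff_post=post >= CLINICAL_CUTOFF,
    )


def paired_effect_size(t: float, n: int) -> float:
    """Cohen's d for paired samples, assuming the d = t / sqrt(n) convention."""
    return t / math.sqrt(n)


if __name__ == "__main__":
    # Made-up scores for a single player, purely for illustration.
    print(classify(pre=14, post=6))
    # Statistics reported in note 3: t(59) = 2.33, so n = 60 paired scores.
    print(round(paired_effect_size(2.33, 60), 2))
```

Keeping the thresholds as named constants makes it easy to see that reliable recovery is the stricter of the two outcomes: a player must first show reliable improvement and then also finish at or below a total of 7.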

Simon Fox