Real World Machine Learning with Susie Sheldrick #APIDAYSAU

I’m at the API Days conference, and one of the first sessions of note was Deep Learning: Real World Applications with Susie Sheldrick, which explored some of the practical real-world challenges related machine learning, based on experience.  I also caught up with her after the session where we expanded on some of curlier questions.

Quick Context: 30 Second Intro to Machine Learning

Susie kicked off with a simple diagram that sums up what machine learning is in comparison to traditional applications:


Machine learning partially turns this model on its head: the solution is able to “learn” its own rules (through training its internal rules model) at much greater scale than some person/team coding them by hand.  So, rather than feeding data and manually created rules into an solution, simply train the solution to produce its own rules.

The Chaser

This nice intro kicked off a mental train of thought for me: in practice the more complete solution probably looks something like this:


The end goal is still to build an solution that provides the answers users were seeking, we’re simply using machine learning to help out with the rules.

Devil in the Detail

That all sounds wonderful on paper – or in ivory-tower pixels – but, as should be no surprise, the real world is not so straightforward.

Of critical importance:

  • Understanding the problem you’re trying to solve.
  • Gathering the right data to train the model.

This is much easier said than done, it transpires that:

  • It’s all too easy to inadvertently train bias into the rules model.
  • Tracing exactly how the AI made a specific decision actually turns out to be really hard.

Whilst the second point has obvious implications for developers and testers, both points combined have massive implications for your legal teams, anyone who considers themselves ethical (like you, right?), product owners and anyone at the receiving end of a machine determined decision.


Susie gave some examples of unexpected and undesirable bias ending up in rule models, such as one experiment that determined prisoners eligibility for parole.  It turns out that the model significantly favored granting parole to white prisoners and was relatively much less favourable to prisoners of colour.   In contrast, in terms of parolees reoffending – the actual results were the exact opposite of the bias.

It turns out that the information used to train the model was “correct” but only in the sense that it faithfully transposed the bias already inherent in the legal system, against people of colour.

True Representation

A related issue isn’t so much of bias in the data, but of bias stemming from an absence of data.  Once more issues of race come to the fore; this time it was a passport application solution that told an Asian gentleman his submitted photo “did not meet our standards” because he was “asleep”.  As you might be able to guess, the model had obviously not been sufficiently trained with data that faithfully represented the entire user base, and therefore could not correctly handle non-european facial features.

Just to be crystal clear, the technology is more than capable of correctly handling a wide range of cases, nuances and subtlety – including racially based facial features.  The actual issue is the correct training of the model – meaning it’s critical to gather the right data, data that covers the entire spectrum of cases.  Not to mention testing and monitoring the behaviour of the solution.

Building an AI Solution: Custom or OOTB?

If you’re about to embark on a project that involves machine learning, one of the practical questions you’ll come up against is whether or not you can use an Out-Of-The-Box (OOTB) solution, or need to custom build something.  Susie’s discussion here was mostly in reference to the rule models specifically.  If you want a model capable of identifying cats in pictures online for you meme generator – you’re in luck, but if you need to correctly identify something more obscure, or more specific, you may have to build this model yourself.  Which is why the stuff above about bias is so important, because you’re going to have to navigate that minefield yourself.

Further Questions

Our chat after the session was very stimulating; a couple of the more curly questions that our conversation provoked were:

How to identify, and test for, unexpected bias?

The obvious ethical reaction to all of this is “great, let’s ensure we keep unwanted bias out of the model and our solution”.  What is much less obvious is how to do that.

Were the team behind the parole example conscious of the bias in that solution?  Let us assume they weren’t aware of it – in such a situation how would they (or you) identify that bias, and in addition, having established an operational solution how would you ensure none was introduced?

This is where, for me, machine learning is like a lens that amplifies human behaviours and bias.  It has the potential to help expose them, but how clearly, how soon, and at what cost?

How will your model react in the event of change over time?  I.e. if there is a fundamental shift in the (data) foundations on which the model was originally conceived and trained?

For example, Google is looking at moving back into the Chinese market, despite pulling out some years ago due to human rights concerns.  Hypothetical example: let’s assume that they have machine learning models built up, based on the data they currently have access to – i.e. does not include China’s current population of 1.3 billion.

What would happen if 1.3 billion Chinese people suddenly have access to a Google solution that is backed by a rules model that was not trained with them in mind?  Sure, Google’s data should be a fair representation of their current global user base, which will include Chinese – but wouldn’t adding 1.3 billion people potentially shift the model?  How will it react?  Will the responses it provides be biased against the new user population because hitherto they were not expected by the model?  Will the model be able to adapt over time, and if so how long will that be?


Please note that this post is based on rapidly scrawled notes in session and my recollection of subsequent discussions – my accuracy should be reasonable but may not be perfect.



Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s