It’s often interesting to look at extreme system scenarios and see what can be learnt from them. This webinar (https://www.youtube.com/watch?v=Fz1BJTephWw) from the Software Engineering Institute (SEI) at Carnegie Mellon University, presented by Marc Novakouski and Grace Lewis, looked at some extreme scenarios and had some interesting takeaways.
Rather than regurgitate the entire presentation here, I’ll touch on the essentials and then discuss some points of interest. If you need further info, have a look at the webinar: it’s pretty concise, the information is well presented, and it’s well worth a look in any case.
What is “The Edge”?
The edge refers to extreme system scenarios, such as those potentially experienced by first responders, people providing front-line humanitarian aid or combat personnel – specifically where this is being done out in the field. For example: disaster relief and on-site coordination.
These scenarios are very far from the office or metropolitan ones most of us typically architect for: infrastructure (power and networking) may be unavailable or limited, computing power may be limited (maybe only what you can carry with you, on foot), conditions on the ground can be unsafe (natural disaster, enemy activity), and mission duration & scope may be unknown or very fluid.
Relevant System Quality Attributes
The following seven attributes were covered, based on experience gained through SEI’s Tactical and AI-enabled Systems (TAS) initiative:
Many of these may appear as common sense once you think about the scenarios a bit, for example being reliable.
Autonomy is interesting. It’s the idea that systems need some degree of intelligence / autonomy so that they can perform some actions on their own whilst the user is dealing with mission or environmental concerns.
Modularity & Monitoring
What I took from the quality attribute discussion is that modularity & monitoring are key to several of them.
High reliability and survivability can be achieved when the architecture is modular, since components can then be replaced (e.g. by instantiating a new microservice instance). More than that, a modular system can also imply an array of instances providing redundancy. This second point led to one of the interesting ideas discussed: a system’s ability to scale down (or in) and still operate, if the edge situation demands it.
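To make the scale-down idea a bit more concrete, here is a minimal Python sketch of a pool of redundant instances that keeps serving requests as long as at least one instance survives. It is purely illustrative and not something shown in the webinar; the names Instance, RedundantPool and handle are hypothetical.

```python
class Instance:
    """A hypothetical service instance that can be lost in the field."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is unavailable")
        return f"{self.name} handled {request}"


class RedundantPool:
    """Tries each instance in turn; works as long as one is still alive."""

    def __init__(self, instances):
        self.instances = list(instances)

    def handle(self, request):
        for instance in self.instances:
            try:
                return instance.handle(request)
            except ConnectionError:
                continue  # fall through to the next redundant instance
        raise RuntimeError("no instances left to serve the request")


pool = RedundantPool([Instance("a"), Instance("b"), Instance("c")])
pool.instances[0].alive = False  # simulate losing nodes at the edge
pool.instances[1].alive = False
result = pool.handle("status-report")  # still served by the last instance
```

The pool has scaled in from three instances to one, yet the caller sees no difference. Only when every instance is gone does the request fail outright.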
Monitoring is obviously the mechanism that enables recovery – whether through user intervention or system automation – i.e. autonomy. Whilst autonomy can cover functional aspects it’s also useful for self-repair, working towards better reliability and survivability.
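As a rough illustration of monitoring driving automated recovery, the sketch below polls a set of modules and restarts any that report themselves unhealthy. It is an assumption-laden toy rather than anything from the webinar: Module, Monitor, is_healthy and restart are hypothetical names, and restart simply stands in for replacing a failed component.

```python
class Module:
    """A hypothetical component exposing a health check and a restart hook."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.restarts = 0

    def is_healthy(self):
        return self.healthy

    def restart(self):
        # Stand-in for replacing the component with a fresh instance.
        self.healthy = True
        self.restarts += 1


class Monitor:
    """Checks modules and repairs unhealthy ones without user intervention."""

    def __init__(self, modules):
        self.modules = list(modules)

    def check_once(self):
        repaired = []
        for module in self.modules:
            if not module.is_healthy():
                module.restart()
                repaired.append(module.name)
        return repaired


radio = Module("radio")
maps = Module("maps")
monitor = Monitor([radio, maps])

maps.healthy = False             # simulate a failure in the field
repaired = monitor.check_once()  # the monitor detects and repairs it
```

In a real system check_once would run on a timer and the health check would probe something meaningful, but the shape is the same: detection first, then repair, with the user left free to deal with the mission.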
Complexity vs Reliability
The autonomy discussion threw up AI as a possible way of achieving autonomy. To me that implies a level of complexity that might conflict with the desired level of reliability: the more complex a system is, the more likely it is to fail, because the number of failure points and failure scenarios increases.
I was able to pose a question on this and the response was elegant and – you guessed it – came back to modularity and monitoring. What I took from it was this: Modularity is the key – the idea is basically to encapsulate areas of complexity so that they are self-contained and isolated as much as possible (think IoC / dependency injection and programming against contracts). Having safely isolated the complexity, you can then use simple and robust architectures to harness the power of these complex modules – e.g. Pub/Sub.
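Here is how I picture that in code: a minimal Python sketch, under my own assumptions, of a complex module hidden behind a contract and wired into a deliberately simple pub/sub bus. The names Classifier, ComplexClassifier and Bus are hypothetical, and the "AI" is just a string check standing in for something far more elaborate.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class Classifier(ABC):
    """Contract that hides a complex module (e.g. an AI model)."""

    @abstractmethod
    def classify(self, reading: str) -> str: ...


class ComplexClassifier(Classifier):
    """Stand-in for a complex implementation that may fail."""

    def classify(self, reading: str) -> str:
        if not reading:
            raise ValueError("empty reading")
        return "hazard" if "smoke" in reading else "clear"


class Bus:
    """Deliberately simple pub/sub: topics map to lists of callbacks."""

    def __init__(self):
        self.subscribers = defaultdict(list)
        self.errors = []

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            try:
                callback(message)
            except Exception as exc:
                # A failing subscriber is recorded, not allowed to stop the bus.
                self.errors.append((topic, exc))


bus = Bus()
classifier: Classifier = ComplexClassifier()  # injected behind the contract
alerts = []

bus.subscribe("sensor", lambda r: bus.publish("alert", classifier.classify(r)))
bus.subscribe("alert", alerts.append)

bus.publish("sensor", "smoke detected")
bus.publish("sensor", "")  # the complex module fails here
bus.publish("sensor", "all quiet")
```

The failure in the complex module is contained and logged by the bus, and the next reading is processed as normal. Swapping ComplexClassifier for a simpler fallback would not require touching the bus or the subscribers at all.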
The majority of the discussion could apply to “edge” solution design at any level – the ideas discussed are not limited to microservices in their application. That said, microservices were covered, albeit briefly (there’s only so much you can cover in an hour). Certainly microservices have many attributes and advantages that make them well suited to edge scenarios where quality attributes such as reliability and survivability are important.