How can a good developer experience improve your security posture?
This post is aimed at security leaders and teams, but it’s likely (hopefully!) to be useful for engineers as well. We are all trying to solve the same problem – ship software swiftly and securely – and a little empathy for each other goes a long way. Before I dive into the why, I’ll give you the summary. The bottom line up front, as it were.
Some metrics that I’ve observed to be useful in measuring the joint effectiveness of the security/engineering team are listed below (with a sketch of how you might compute the first three just after the list):
Number of defects that reach production: this should be reducing over time – are we getting better at identifying and fixing potential issues earlier?
Time between a new issue being identified and it being remediated: this number should also reduce – it’s a good proxy for increased automation and a better signal-to-noise ratio in notifications.
Amount of time an engineer spends on defect resolution, away from feature development: another number you want to see come down. This is not just a security metric, but also an indication that the overall operational effectiveness of the system is improving. A falling count of tasks marked as ‘chore’ may also be useful to track here.
Developer sentiment on the security program: this may be the best way of getting qualitative feedback to inform the quantitative metrics. If your metrics are ‘good’ but the engineers are not happy about the friction then you still have work to do.
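Here’s a minimal sketch of what computing the first three could look like, assuming you can export findings from your issue tracker. The `Finding` shape and its field names are hypothetical – substitute whatever your own tooling exposes. The fourth metric, developer sentiment, comes from surveys and conversations rather than code.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median

@dataclass
class Finding:
    identified_at: datetime
    remediated_at: datetime | None  # None while still open
    stage: str                      # e.g. "ci", "staging", "production"
    engineer_hours: float           # dev time logged against the fix

def defects_reaching_production(findings: list[Finding]) -> int:
    """Metric 1: defects that were only caught in production."""
    return sum(1 for f in findings if f.stage == "production")

def median_days_to_remediate(findings: list[Finding]) -> float:
    """Metric 2: gap between identification and remediation."""
    closed = [f for f in findings if f.remediated_at is not None]
    return median((f.remediated_at - f.identified_at).days for f in closed)

def hours_spent_on_defects(findings: list[Finding]) -> float:
    """Metric 3: engineer time pulled away from feature work."""
    return sum(f.engineer_hours for f in findings)
```

Compute these per sprint or per quarter and you have the trend lines the rest of this post is about.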
Anyway, on with the longer part of this.
The hardest part of the job in security, especially in a smaller organization, is relying on influence rather than positional authority to drive security outcomes. This is because a lot of the work to deliver these outcomes is not done by you – ‘you’ being the security team. It’s the people who own the workloads who are on the hook for deploying and running the systems, not the security team. Successful security programs rely on earning trust and influencing positive outcomes across the organization.
This means that as security folks we need to work with engineers and product owners to help them understand security objectives – in the context of the systems they are running. You can measure the effectiveness of a security program in a variety of ways, and those metrics can then be used to communicate progress, inform further decisions, or justify requests for funding. Common metrics (be aware that the level of detail is important here) include time to patch, number of vulnerabilities discovered at various stages of the software development lifecycle (SDLC), time taken for security reviews, or quantity of security incidents.

General metrics – and what you’re trying to communicate with them – are a topic for another post, but I strongly believe that the best metrics allow you to inspect the operation of the organizational machine, rather than just the output. For example, the time between issue identification and resolution is much more useful than the raw number of issues. Context is so important with metrics, as that’s what gives leaders enough information to make an informed decision without needing to parse raw data. In fact, focusing on output metrics can confuse the message you’re trying to send or, at worst, influence bad decisions. “We haven’t had a major security incident, therefore everything is fine”, for example. This isn’t to say that you should take the FUD (fear, uncertainty, and doubt) approach either. You earn trust over time by being critical and considered, so that when your security program needs funding, re-prioritization, or deeper focus, you’re not met with a shrug that it’s just the shouty CISO demanding money again.
Which brings me to developer experience. Security accountability sits with the CISO, but the work happens in multiple parts of the business. Focusing on application security (AppSec), or the security of the systems you build, we can see that the engineers who build, deploy, and operate those systems are key. Security folks own the goals of an AppSec program, but to consistently achieve them we need to partner with engineering. Your specific implementation may vary but, at a high level, most AppSec programs are trying to do very similar things: we want to be able to reason about the security properties of the systems we build; identify where we have deviated from a desirable state; and have mechanisms that alert us to unexpected or undesirable change. So far, so high level. What this means in practice is that we need to understand what we’ve built and what can go wrong (doing some threat modeling of our system is the best foundation for that), then reduce the cognitive load on the humans who need to react when a trigger says remediation work has to happen. The easiest time to perform remediation (or cheapest, or least annoying, depending on your perspective) is early in the SDLC. However, you can alert developers *too* early. It may take some time, partnership, and trust-building before you identify the sweet spot for identifying and fixing software issues in your SDLC process.
The real goal here is to try to reduce the number of detectable or preventable software issues that reach production to as close to zero as possible. There will always be a need to respond to unexpected issues that manifest in a deployed system, because dependencies are hard, security researchers poke into a variety of systems, and things like log4shell happen. Being able to respond to those issues effectively is a consequence of having good development rigor.
So why is the developer experience a good metric? What does that even mean? The way I think about this has two parts… What *outcome* am I asking the developers to deliver? And what is the security program doing to reduce the effort that developers need to expend in getting there? The second one could also be framed as ‘how can I help the developers spend as much time on product features as possible while still maintaining the security posture of our system?’ That phrase ‘our system’ is important – all parts of the business are working together; it’s not an us-and-them situation. A big part of that is the security culture that you foster and grow in your organization, but that’s a topic for another post.
What does it look like to help developers spend as much time as possible on features while still identifying and addressing potential issues before they end up in prod? Again, two things...
Firstly, you need to understand how work happens. What are the steps from work planning and prioritization, to individual dev work, to code review and deployment? At what points are people working independently, and where are they working together? What are the steps to get from pull request to code running in production? Which stages of the SDLC are about figuring out the problem, and which are about integrating with other parts of the system? The actual steps are going to be specific to your organization and environment. This is also where you, as a security person, can – in partnership with the engineering team – identify the most useful place in the flow(s) to signal that some engineering work is needed to meet a security objective.
Secondly, you need to inspect the tooling to identify where you can use automation most effectively in your continuous integration and continuous deployment (CI/CD) flows to make it easier to identify and remediate issues. This is also where the security team may need to take on some engineering work (or, again, partner with your platform/infrastructure teams) to reduce the complexity at the point of developer consumption. The choice of tooling – and how it is implemented and, more importantly, measured – will have a significant impact on the developers. If you get this right (or at least right enough to start with, and are then prepared to tweak as you go) you’ll help both groups achieve the shared goal of shipping securely and swiftly. If you get this wrong you’ll likely annoy the people building software, lose trust, and make it harder to identify issues.
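To make ‘reducing complexity at the point of developer consumption’ a little more concrete, here is a minimal sketch of the kind of glue the security team might own: a thin layer that turns raw scanner output into one actionable message, and keeps un-actionable findings out of developers’ inboxes. The finding shape and field names are invented for illustration, not taken from any particular scanner.

```python
def to_actionable_message(finding: dict) -> str | None:
    """Translate one raw scanner finding into a message a developer can
    act on without reading the scanner's docs. Returns None when there
    is no fix available yet, so the finding can be routed to the
    security team's backlog instead of a developer's notifications."""
    fixed_in = finding.get("fixed_version")
    if fixed_in is None:
        return None  # not actionable by the developer; don't page them
    return (
        f"{finding['package']} {finding['installed_version']} has a known "
        f"vulnerability ({finding['advisory_id']}), fixed in {fixed_in}. "
        f"Suggested action: bump the dependency and re-run the tests."
    )
```

The code is trivial; the value is in the decision it encodes – developers only hear about things they can actually act on.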
So what does this look like, and how can we measure it? There will be multiple places where checks occur, and the ideal state is to minimize the amount of blocking that happens when a test runs. You’ll have multiple types of test; ideally these should be integrated with your other automated functional testing. There’s a significant difference between “a vulnerability has been found in package X, you must upgrade this now before you do anything else” and “package X has a vulnerability, it’s fixed in version n.n and you won’t be able to progress past a test environment until it’s resolved. You’re not blocked, but this is a heads up that you’ll need to sort this out before merging”.

Depending on the particular type of issue, it may be possible for tooling to automate the mechanics of the resolution – by automatically updating packages or dependencies, or by creating a pull request that includes code fixes. These processes are extremely useful, but they need to be well communicated, and you need to make sure that functional tests still pass when there are automated (or semi-automated) code updates. The worst thing you can introduce as a security team is a situation where ‘security made me use some tooling and it broke my stuff!’. A gradual approach to automated remediation is likely the best one.

The other thing to keep in mind is that you want to try to provide only actionable notifications/triggers/warnings to developers. This is going to take some work on the part of the security team. That’s OK though, because the value of this work is to reduce the impact on developers when they need to respond to these notifications and, more importantly, to build trust with them. The positive impact on a security program where the devs and security people are working together is significant.
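That ‘heads up now, hard stop later’ distinction can be encoded as a small policy that each pipeline stage consults. A minimal sketch, assuming a three-stage pipeline and severity labels – both invented here – rather than any particular scanner or CI system:

```python
from enum import Enum

class Stage(Enum):
    PULL_REQUEST = 1
    TEST_ENV = 2
    PRODUCTION = 3

# Illustrative policy: the stage at which each severity stops being a
# heads up and starts blocking promotion.
BLOCKS_FROM = {
    "critical": Stage.PULL_REQUEST,  # fix before merging
    "high": Stage.TEST_ENV,          # warn at PR, block beyond test env
    "medium": Stage.PRODUCTION,      # warn everywhere before production
}

def verdict(severity: str, stage: Stage) -> str:
    threshold = BLOCKS_FROM.get(severity)
    if threshold is None or stage.value < threshold.value:
        return "warn"   # actionable notification, pipeline continues
    return "block"      # promotion stops until the issue is remediated
```

The mapping itself should be agreed with engineering, reviewed regularly, and versioned like any other code.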
Which, finally, brings me to the metrics. These should be transparent to both security and engineering teams. You should regularly review them to understand where improvements can be made and where there’s too much friction. For example, friction may be experienced by engineers getting pulled out of the dev thought process, by engineers spending too much time dealing with security findings, or by security folks having to parse information and then present it back to engineering. Regular inspection of the system with equal input from all parts of the team helps keep the machine running efficiently.
This brings me back to the metrics I mentioned at the start:
Is the number of defects that reach production reducing over time? Are we getting better at identifying and fixing potential issues earlier?
Is the time between an issue being identified and remediated reducing over time? Have we got the right level of automation and a good signal-to-noise ratio in notifications?
Are engineers spending less time on defect resolution and more time on feature development? What is developer sentiment on the security program? If your metrics are ‘good’ but the engineers aren’t happy about the friction – what can you do differently?
So what should you do? The first thing is to actually talk to the engineering team! Find out what their work looks like and where they see friction. Communicate the goals of the security program as they relate to building software. It’s better to say “we want to identify potential security issues before we push to production because it gives us more time to resolve them” than “hey folks, I’ve bought this tool and now you must use it”.
The next thing, and this may well come up when you talk to the engineering team, is to understand what metrics they are currently using. How are they tracking time spent on various pieces of work? What does sprint planning or prioritization look like? How are those metrics communicated? The way the engineering team measures themselves provides a good opportunity for the security team to understand the impact (good or bad) of getting security work into engineering. Once you have that visibility you can improve how you get to the desired security outcomes. Your engineering partners will appreciate you for removing friction.
Ultimately, security is just another quality metric. We shouldn’t treat it as ‘other’ or ‘different’ – security work is just another component of the systems we build. The easier it is for engineers to achieve good security outcomes, the better the systems we build and run will be.