Keepers of the Machines: Examining How System Administrators Manage Software Updates Li et al., SOUPS 2019
Continuing through the SOUPS proceedings, I wanted to take some time to write about this paper by Frank Li et al. Their work, which focuses on the administrator's perspective on managing software updates, received the Distinguished Paper Award. I think what stood out about this paper is its focus on the admin. Most of the literature on updating focuses on the end user, which is useful; however, an admin who manages thousands of devices has to consider different things. In their work, they do an in-depth investigation of the process admins follow when updating the machines they manage.
Tell me about the data
The data that they collected as part of this study came from two sources: a survey and semi-structured interviews. The survey had 41 questions, and the participants were recruited through social media, blogs, and at the Large Installation System Administration (LISA) Conference. In total, 102 people took the survey. Two researchers coded the survey's open-ended questions using open coding. For the semi-structured interviews, they had 17 interview subjects who took part in interviews that ranged from 1 to 3 hours. As with the open-ended questions, two of the researchers coded the interview responses.
The population that they studied was mostly male, with 6/102 survey and 2/17 interview subjects being female. The survey respondents had a median of 11 years of experience, while the median for the interview subjects was 6 years. Another piece of demographic information that I found interesting was that roughly half of the participants in both the survey (56/102) and the semi-structured interviews (8/17) worked at organizations with more than 500 employees.
After reading some of those demographic details, you are probably thinking that the respondents might not be representative of system admins in general, and the authors would agree with you. In their paper, they call out that, because of where and how they recruited participants, their findings may not be representative of all admins.
Five Stages of Updating
From the data collected through the survey and the semi-structured interviews, they found that the update process for administrators followed five stages:
- Learning about updates from information sources
- Deciding to update based on update characteristics
- Preparing for update installation
- Deploying the update
- Handling post-deployment update issues
Where do you get your info
In both the survey and the interviews, they asked participants where they found out about updates. The participants provided a range of different sources which can be seen in Table 1 below.
Survey respondents reported using a median of 5 different types of sources. The authors confirm what I thought while reading this paper: admins could miss out on relevant information if they don’t follow a source at all or don’t diligently follow their usual sources. In addition, several of the sources named, like blog posts and social media, require the admin to take some action in order to receive any information.
After administrators find out about updates, they have to decide if they are going to actually deploy them. From the survey and interviews they found five general factors that affect whether or not they are going to apply an update:
Update type: The authors asked the surveyed admins how regularly they installed security and non-security updates. They found that 97/102 admins regularly installed security updates, while only 63/102 regularly installed non-security updates. The sentiment of regularly installing security updates was shared by the interview participants. The authors noted that while admins prioritize fixing security bugs and vulnerabilities, many platforms bundle security patches along with other features. This means that admins have to consider more than just improved security when applying updates.
Proper Preparation Prevents Poor Performance
Once admins have decided to roll out an update, they make preparations that fall into three categories: making backups or snapshots, preparing machines by changing configurations or dependencies, and testing updates for bugs.
In both the survey and semi-structured interviews, the authors asked participants about the problems that they’ve run into when trying to apply updates. The general sentiment was that it wasn’t an entirely easy process and that there was some risk even if it was small. The responses showed that some admins had worse experiences than others.
“I stopped applying updates because it was becoming more of a problem to apply them than not to. Production machines, they don’t get updates” (P12).
I think it’s important to note that P12 probably has the best of intentions and desires when it comes to maintaining their organization’s infrastructure, but they are jaded from their experience.
When asked about the testing strategies that they used for updates, most admins fell into one of two camps: staggered deployments or dedicated testing environments. Those that used staggered deployments had a few different approaches. One approach was applying updates based on priority levels, with lower priority machines getting the updates first to uncover any potential issues. Another approach was to have a group of end users help with update testing. Those that fell into the dedicated testing camp used dedicated hardware or relied on a quality assurance (QA) team to test updates.
The authors note that both of these strategies have their downsides. When using the staggered deployment approach, higher priority machines or certain groups of users are left more susceptible to potential attackers. And to perform dedicated testing, admins must have access to computing resources or employees that are dedicated to testing.
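The priority-based staggered deployment described above could be sketched roughly as follows. This is only an illustration, not anything taken from the paper: the `Machine` type, the numeric priority scale, and the `apply_update` and `is_healthy` callbacks are all hypothetical placeholders for whatever tooling an admin actually uses.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Machine:
    name: str
    priority: int  # hypothetical scale: a higher number means more critical


def staggered_rollout(
    machines: Iterable[Machine],
    apply_update: Callable[[Machine], None],  # hypothetical deployment hook
    is_healthy: Callable[[Machine], bool],    # hypothetical post-update check
) -> Tuple[List[Machine], List[Machine]]:
    """Update machines in waves, least critical first.

    Returns (updated, held_back). The rollout stops after the first wave
    in which any machine fails its health check.
    """
    ordered = sorted(machines, key=lambda m: m.priority)
    updated: List[Machine] = []
    held_back = list(ordered)

    for level in sorted({m.priority for m in ordered}):
        wave = [m for m in ordered if m.priority == level]
        for machine in wave:
            apply_update(machine)
            updated.append(machine)
            held_back.remove(machine)
        if not all(is_healthy(m) for m in wave):
            break  # a problem surfaced; keep it away from more critical machines

    return updated, held_back


fleet = [Machine("db-prod", 3), Machine("dev-box", 1), Machine("staging-web", 2)]
updated, held_back = staggered_rollout(
    fleet,
    apply_update=lambda m: None,                    # stand-in: does nothing
    is_healthy=lambda m: m.name != "staging-web",   # pretend staging breaks
)
print([m.name for m in updated])    # ['dev-box', 'staging-web']
print([m.name for m in held_back])  # ['db-prod']
```

The `held_back` list makes the downside concrete: the most critical machines are exactly the ones that stay unpatched while a problem in an earlier wave is investigated.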
When asked how they deployed updates, the respondents gave a variety of answers, ranging from in-house scripts to automatic updates to manual application. In addition, the method that they used depended on the machine being updated. The majority of respondents used some type of third-party update manager such as Ansible or Chef to apply updates, and about half created custom scripts to deploy updates. While the respondents detailed the importance of automation when managing hundreds of machines, they also noted the effort that it takes to get that automation right. Also, to no one’s surprise, manual updates were still common, with 40/102 survey respondents and 4/17 interview subjects saying they conducted manual updates.
The “when” to deploy is probably just as important as the how. When asked when they would apply updates, most of the interview subjects said they either followed a predictable schedule, such as a weekly patching program, or updated during off-hours.
It worked on my machine.
As noted before, applying updates can sometimes come with issues. When an update caused problems, the two most common responses were to uninstall the update or to roll back to a previous snapshot or backup. Neither strategy is great, because both return the machine to its previous state, which leaves it potentially vulnerable if the update included security patches. The two approaches reinforce the idea that admins want people to be able to get their jobs done: they are prioritizing functionality over security.
Let’s ask the boss
The admins in this study had different organizational policies or oversight when it came to applying updates. Some were given the autonomy to apply updates as they saw fit, some had to get buy-in from management before they could perform certain actions, and others had to comply with organizational policies or compliance requirements. Another unfortunate but not surprising finding was that several of the admins in this study commented on not having the budget to support admin operations surrounding updating, such as software to deploy updates.
So what should change?
In the paper, the authors make several suggestions to ease the pain of managing software updates.
When it came to finding out about updates, they suggest standardizing and consolidating update information into one place so that there’s a single source of truth. They also mention the possibility of outreach campaigns to promote updating and inform admins of vulnerabilities.
To ease the decision-making process of whether or not to install an update, the authors suggest splitting all-inclusive updates into updates of specific types. I think this is an interesting approach, but one that could do more harm than good, especially for platforms or software used by both enterprises and everyday users. I personally would like to see some data on how consistently users would install security-only patches.
To improve the deployment process, the authors encourage the usable security community to take a look at the common update tools to see how their interfaces could be improved. They also mention that dynamic software updating (DSU), which helps avoid the dreaded restart or downtime, could alleviate some of the problems associated with finding the right time to deploy. The authors concede that there are still a lot of unknowns surrounding the increased difficulty of developing patches that use DSU and the general usability of those systems.
Last, the authors speak to the need for a cultural shift at organizations to understand the importance of quickly applying updates, especially those of the security variety. Moreover, they mention that without proper resources for applying patches, security lapses like data breaches can occur. While they point out the importance of having organizational support when it comes to applying patches, they note that getting to an ideal state is not straightforward.
Frank Li and his co-authors point out that:
1) There are some similarities between the update process for admins and end users, but admins have different considerations and operate at different scales.
2) There are distinct pain points throughout the admin’s process, and there needs to be further research surrounding solutions to these pain points.