The Linux Foundation and the Laboratory for Innovation Science at Harvard (LISH) have released the second in a series of studies of the most commonly used and critical free and open source software packages. This second study focuses on the open source software most commonly deployed in private and public organizations, with a view to better assessing potential vulnerabilities and deciding where security support should be focused.
The first census in the series was published in 2015 and focused on Debian Linux software packages. This second study is based on analyses of the codebases of thousands of companies, and one of its main security findings is that approximately 80% of the lines of code in the top 50 packages were written by a set of only 136 developers.
Linux Open Source Software Census identifies most frequently used packages and potential security issues
Free and open source software (FOSS) was chosen as the subject of the second census in this series because of its ubiquity: as the report notes, tens of millions of FOSS projects now exist, and organizations of all types and sizes regularly rely on them (approximately 98% of codebases now include some FOSS component). However, decentralized distribution and the freedom to modify the code make it difficult to track and measure the security status of these projects. The recent Log4j vulnerability is a clear illustration of the problem.
The project starts with a simple question that had not previously been explored and documented in depth: which FOSS projects are the most widely used? Knowing which are the most common allows security resources to be prioritized accordingly. A preliminary report published in 2020 provided two unranked lists of the top 10 most commonly used open source packages, whereas this full and final report includes eight ranked lists of the top 500 (half of them at the package-version level).
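The basic idea of a usage ranking can be illustrated with a short Python sketch. It assumes usage data has already been reduced to a mapping from a codebase identifier to the package names found in that codebase; the function name `rank_packages` and the sample data are hypothetical, and the report's actual methodology is considerably more involved than a simple count.

```python
from collections import Counter


def rank_packages(codebases: dict[str, list[str]], top_n: int = 500) -> list[tuple[str, int]]:
    """Rank packages by how many distinct codebases include them.

    Each package is counted at most once per codebase, so a package
    vendored twice in the same codebase does not inflate its rank.
    """
    counts: Counter[str] = Counter()
    for packages in codebases.values():
        counts.update(set(packages))  # dedupe within a single codebase
    # Sort by descending usage count, then by name for a stable tie-break.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical scan results; not data from the report.
scans = {
    "codebase-a": ["lodash", "react", "lodash"],
    "codebase-b": ["lodash", "express"],
    "codebase-c": ["react", "lodash"],
}
print(rank_packages(scans, top_n=2))  # [('lodash', 3), ('react', 2)]
```

Keying the counter on (name, version) pairs instead of names alone would produce a version-level ranking along the same lines.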
The report emphasizes that it does not attempt to present a security profile of any open source software package; it simply identifies which packages are the most commonly used so that they can be prioritized for further analysis. Beyond guiding security reviews, this data also helps identify understaffed projects and those where outdated versions are still commonly used.
Lessons learned from the inventory of open source software
One of the key lessons the researchers drew from this project is the strong need for a standardized naming scheme for software components, an issue that also emerged during the first census. This is one area where the freedom to modify software creates serious difficulties in identifying and cataloging it, adding substantial time to the overall effort of reconciling inconsistent formats and naming conventions.
Documenting package versions also proved to be a serious problem. The census relied heavily on data provided by survey respondents, and in many cases a respondent reported a package version well beyond the most recent version in the official repository. Investigation showed that this was often because companies make their own internal updates and do not share them outside the organization.
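A check for this kind of discrepancy might look like the following Python sketch. It assumes simple dotted-integer version strings and a hypothetical lookup table of the latest upstream release for each package; real-world versioning schemes (pre-release tags, epochs, build metadata) would need a proper version parser.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Parse a dotted-integer version such as "2.17.1" into a comparable tuple.

    Trailing zero components are dropped so that "2.0" and "2.0.0" compare equal.
    Pre-release tags and other suffixes are NOT handled by this simplified parser.
    """
    parts = [int(p) for p in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def flag_ahead_of_upstream(reported: dict[str, str],
                           upstream_latest: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (package, reported, upstream) where the reported version is newer than upstream."""
    flagged = []
    for name, version in reported.items():
        latest = upstream_latest.get(name)
        if latest is not None and parse_version(version) > parse_version(latest):
            flagged.append((name, version, latest))
    return flagged


# Hypothetical survey data and upstream lookup table.
reported = {"libfoo": "3.4.0", "libbar": "1.2"}
upstream = {"libfoo": "3.2.1", "libbar": "1.10"}
print(flag_ahead_of_upstream(reported, upstream))  # [('libfoo', '3.4.0', '3.2.1')]
```

A flagged entry does not prove an internal fork; it only marks versions that need the kind of manual follow-up the researchers describe.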
From a security perspective, perhaps the most important finding is that a relative handful of developers are responsible for more than four-fifths of the code in the top 50 projects on each list: 136 developers wrote just over 80% of all that code, 23% of projects had a single developer responsible for more than 80% of the project's code, and 94% of projects had fewer than ten developers accounting for more than 90% of the code.
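The concentration figures above can be expressed as two simple measures, sketched here in Python under the assumption that each project's contributions have already been tallied as lines of code per developer. The function names and the sample project are hypothetical and not taken from the report.

```python
def top_developer_share(lines_by_dev: dict[str, int]) -> float:
    """Fraction of a project's lines attributed to its single largest contributor."""
    total = sum(lines_by_dev.values())
    return max(lines_by_dev.values()) / total if total else 0.0


def devs_to_reach(lines_by_dev: dict[str, int], threshold: float = 0.9) -> int:
    """Smallest number of developers whose combined lines exceed `threshold` of the total."""
    total = sum(lines_by_dev.values())
    covered = 0
    # Take the largest contributors first; this yields the minimum possible count.
    for count, lines in enumerate(sorted(lines_by_dev.values(), reverse=True), start=1):
        covered += lines
        if covered > threshold * total:
            return count
    return len(lines_by_dev)


# Hypothetical project: lines of code attributed to each contributor.
project = {"alice": 850, "bob": 100, "carol": 50}
print(top_developer_share(project))  # 0.85
print(devs_to_reach(project))        # 2
```

Under these definitions, the report's categories roughly correspond to `top_developer_share(p) > 0.8` for single-developer dominance and `devs_to_reach(p, 0.9) < 10` for the fewer-than-ten-developers case.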
The security of individual developer accounts is also a potentially underestimated issue, given that many packages featured in the various top 500 lists are hosted under such accounts. These accounts tend to have weaker protections than organizational accounts. The report notes that account takeovers on GitHub and other sites have increased recently, usually with the goal of inserting backdoors into projects. Developers can also simply “go rogue” for a number of reasons and unexpectedly withdraw access to their code or even intentionally corrupt it, as happened recently with the “colors.js” and “faker.js” libraries.
The study notes that government involvement could help. For example, the EU established a FOSS strategy in 2014 (renewed in 2020), but very few other nations have made comparable efforts in the open source space. The United States has been slowly building support for a “software bill of materials” (SBOM) that would require the open source components used in government systems to be cataloged and kept up to date. A push for such a measure began in 2014, but it did not start to become a federal requirement until an executive order from the Biden administration last year directed the Department of Commerce, acting through the National Telecommunications and Information Administration (NTIA), to publish the minimum elements for an SBOM.
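As a rough illustration of what an SBOM records, the following Python sketch models the data fields listed among the minimum elements published under that executive order: supplier name, component name, version, other unique identifiers, dependency relationships, the author of the SBOM data, and a timestamp. The class and field names are hypothetical; real SBOMs are usually exchanged in standard formats such as SPDX or CycloneDX rather than ad hoc structures like this one.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SbomComponent:
    supplier_name: str
    component_name: str
    version: str
    # Other unique identifiers, e.g. a package URL (purl) or CPE string.
    unique_identifiers: list[str] = field(default_factory=list)
    # Dependency relationships: names of components this one directly includes.
    depends_on: list[str] = field(default_factory=list)


@dataclass
class SbomDocument:
    author: str          # who produced the SBOM data
    timestamp: datetime  # when the SBOM data was assembled
    components: list[SbomComponent] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Report components whose core identifying fields are empty."""
        problems = []
        for comp in self.components:
            for name in ("supplier_name", "component_name", "version"):
                if not getattr(comp, name):
                    label = comp.component_name or "<unnamed>"
                    problems.append(f"{label}: missing {name}")
        return problems


# Hypothetical example; the second component lacks a supplier.
sbom = SbomDocument(
    author="example-build-pipeline",
    timestamp=datetime(2022, 3, 1, tzinfo=timezone.utc),
    components=[
        SbomComponent("Apache Software Foundation", "log4j-core", "2.17.1",
                      unique_identifiers=["pkg:maven/org.apache.logging.log4j/log4j-core@2.17.1"]),
        SbomComponent("", "internal-utils", "1.0", depends_on=["log4j-core"]),
    ],
)
print(sbom.missing_fields())  # ['internal-utils: missing supplier_name']
```

A completeness check like `missing_fields` is the simplest kind of validation such a catalog enables; tracking whether listed components are kept up to date would require comparing each entry against upstream releases.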