Vulnerabilities in open-source components — such as the widespread flaws disclosed 10 months ago in Log4j 2.0 — have forced data scientists to re-evaluate open-source code frequently used in machine learning analysis and model building.
According to a report by Anaconda, a data science platform company, over the past year 40% of the data scientists, business analysts and students surveyed have reduced their use of open source components, while a third have held their usage steady and only 7% have incorporated more open source code into their projects. Most respondents do not report to the IT department (only 18% do); the largest share (47%) work within their own data science or research and development group, according to Anaconda’s “State of Data Science 2022” report, published last week.
While software developers and IT have already begun verifying secure code, concerns about the security of open source software are a relatively new trend for the world of data science, says Peter Wang, co-founder and CEO of Anaconda.
“We see a huge proportion of people working in organizations where IT has created a very strict posture around open source and Python,” he says. “They’re not expert developers. … They’re data scientists and machine learning people who may not be very experienced developers at all, using whatever they could download to do their analysis, and then they turned it over to IT.”
The security of open source components – and the software supply chain in general – has become a primary consideration for software developers, enterprises and national governments over the past two years. In May, for example, the US National Institute of Standards and Technology (NIST) released software supply chain risk guidance. Additionally, a growing number of software vendors have joined the Linux Foundation’s Open Source Security Foundation (OpenSSF).
Overall, the maturity of organizations’ security efforts has improved. According to a June survey, about half of organizations have an open source security policy in place, leading to better performance on security readiness measures. Additionally, efforts to control open source risk have jumped 51% in the past 12 months, according to a security maturity study released September 21.
“[W]ith the focus on software supply chains, most organizations are taking a risk-based approach to application security,” Jason Schmitt, general manager of Synopsys Software Integrity Group, said in a statement announcing the study. “Such an approach recognizes that security is not limited to the code base; it includes the software development process, where security reviews and testing ‘go all over the place’ to continually improve security outcomes.”
Developers expand use of open source
Software vendors aren’t seeing any sort of decline in open source usage, according to other data. Instead, development organizations focus on improving the security of open source software and use security as the primary guide in component selection.
According to Tracy Miranda, head of open source at Chainguard, the self-reported move away from open source packages in the data science community likely indicates greater awareness of security issues rather than an actual abandonment of open source components in development.
While data science teams and development teams may have reacted differently to major security issues – such as Log4j 2.0 – companies moving away from an open source package have little recourse other than adopting a different package whose makers have put more emphasis on security, she says.
“Companies are leveraging open source as a way to increase their speed, so if they’re cutting back, where are they headed? Writing code in-house? Using third-party packaged versions?” Miranda says, adding that instead, “I think we can expect to see companies be more demanding about the quality of the open source they use, especially when it comes to security features.”
Data scientists are playing catch-up
While data science professionals work at companies that overwhelmingly (87%) allow open source software, about a quarter (26%) have minimal IT oversight of their open source choices, according to the Anaconda report. In 18% of companies, the IT department specifies only about half of the open source components available.
Maintainers of the most critical projects – of which there are hundreds, if not thousands – must use secure dependencies, test their own code, and validate the reliability of contributors. Maintainers should also publish a Security Scorecard – an initiative created by Google and now managed by the Open Source Security Foundation (OpenSSF), which assigns a security rating to a project based on nearly 20 different criteria.
While awareness is likely growing, there’s no quick fix, Miranda says.
“The reality is that the safest options didn’t exist before,” she says. “It makes sense to reduce unnecessary dependencies to reduce the attack surface, but it is difficult to do so once the dependency tree has grown.”