Amazon Develops AI to Find Private Data in Open Source Code



Amazon recently developed a new software service called “Secrets Detector” that scans source code and identifies potentially private information before it is exposed. What challenges do open source code and code repositories present, what will the new service do, and what steps can engineers take to avoid such mistakes?

The battle between closed-source and open-source software has been going on for decades, and each side has weaknesses it is reluctant to admit. Both have their own advantages and disadvantages. For example, closed-source software is arguably more secure because attackers do not have access to the source code, which hinders their ability to find vulnerabilities. However, any bugs that end up in the final version can only be fixed by those who own the source code, and that fix may never come.

Open source is exactly the opposite: making the source code publicly available lets anyone study it in depth, which also makes vulnerabilities easier to find. However, open source communities generally accept third-party contributions and suggestions, so bugs can be found and fixed much faster. Open source code has the added benefit of transparency: users can see exactly what the code is doing, which makes it very difficult to slip in malicious code unnoticed.

The public nature of open source code means engineers need to be extremely careful about what gets published. For example, IoT projects will almost certainly use credentials such as API keys, usernames, and passwords, so it is essential that these are removed from files before publishing. This is easy to do when a single user controls exactly which files are posted, but the introduction of version control services built on Git can make it much more difficult.

Several people can own a project, all of whom can push and pull code, and it only takes one of them to leave credentials in a file. The problem is compounded by the fact that Git keeps copies of older code and tracks every change. An API key deleted from the latest version of a file may therefore still be present in earlier commits stored in the version history.
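The version history problem can be illustrated with a toy model. The sketch below does not call Git; it assumes a hypothetical list of file snapshots (`history`) and a made-up key value, and simply shows that checking only the newest snapshot misses a key that older snapshots still contain.

```python
# Hypothetical snapshots of one file, oldest first: (commit id, file content).
history = [
    ("c1", 'API_KEY = "example-key-123"\n'),
    ("c2", 'API_KEY = "example-key-123"\nDEBUG = True\n'),
    ("c3", 'API_KEY = load_key()\nDEBUG = True\n'),  # key removed in this commit
]


def commits_exposing(history, secret):
    """Return the ids of all snapshots whose content still contains the secret."""
    return [commit_id for commit_id, content in history if secret in content]


latest_id, latest_content = history[-1]
print("example-key-123" in latest_content)           # False: latest version looks clean
print(commits_exposing(history, "example-key-123"))  # ['c1', 'c2']: history does not
```

In practice, a key that has ever been committed should be treated as exposed and revoked, since anyone with a copy of the repository can still read the older commits.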

Last year, Amazon released an automated code review tool called CodeGuru that helps users write high-quality code by checking its syntax, structure, and overall quality. Recognizing the security challenges posed by version control and open source code, Amazon recently released a new CodeGuru feature called Secrets Detector.

Powered by machine learning, the new system can analyze code and identify potentially private information, including usernames, passwords, credentials, and API keys. Amazon hopes the new system will prevent the accidental publication of such data, especially in widely used software. An example of what Secrets Detector could have prevented is the unintentional release of AWS credentials by an Uber design engineer in 2017.
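To give a feel for what scanning code for secrets involves, here is a minimal rule-based sketch. It is not Amazon's method, which the article describes as machine-learning based; it swaps in two simple regular expressions as an assumption. The function name `find_secrets`, the pattern names, and the sample text are all hypothetical, and the key shown is a placeholder in the documented AWS example format.

```python
import re

# Illustrative patterns only; a real detector covers far more formats.
PATTERNS = {
    # AWS access key IDs commonly start with "AKIA" followed by 16 characters.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # A quoted literal assigned to something that looks like a password field.
    "hardcoded_password": re.compile(
        r"(?:password|passwd|pwd)\s*[:=]\s*['\"][^'\"]+['\"]",
        re.IGNORECASE,
    ),
}


def find_secrets(text):
    """Return (line_number, pattern_name) pairs for every suspicious line."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings


sample = '''import boto3
AWS_KEY = "AKIAIOSFODNN7EXAMPLE"
db_password = "hunter2"
timeout = 30
'''

for line_number, name in find_secrets(sample):
    print(f"line {line_number}: possible {name}")
```

Fixed patterns like these only catch formats someone has already written a rule for, which is one reason a learned detector is attractive.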

Secrets Detector will be available to CodeGuru users at no additional cost and is expected to be a game-changer for software kept under version control. In addition to Java and Python code, the new system can check configuration files and documentation.

For developers, version control is an essential part of the development process. Accidental public exposure can occur when proper precautions are not taken while modifying code and releasing new versions. All it takes is one exhausted developer who makes a few changes to fix a flaw and then pushes the new version without removing the credentials.

One method engineers can use is to separate the credentials from the code. The repository contains only a blank configuration template that never holds real values, and each user who pulls the source code creates their own local credentials file containing their own private data. That local file is excluded from version control, so when code changes are committed and released, the credentials stay behind.
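The separation described above can be sketched as follows. The file names `credentials.example.json` and `credentials.json`, the JSON format, and the function `load_credentials` are assumptions made for this example; the idea is that only the template is committed, while the real file is listed in the project's `.gitignore`.

```python
import json
from pathlib import Path

# Hypothetical file names chosen for this example.
TEMPLATE_FILE = "credentials.example.json"  # committed; placeholder values only
LOCAL_FILE = "credentials.json"             # listed in .gitignore; never committed


def load_credentials(directory="."):
    """Read credentials from the untracked local file.

    Raises FileNotFoundError with a hint if the developer has not yet
    copied the template and filled in real values.
    """
    path = Path(directory) / LOCAL_FILE
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found; copy {TEMPLATE_FILE} to {LOCAL_FILE} "
            "and fill in your own values"
        )
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Application code would then call `load_credentials()` and read values such as `creds["api_key"]` from the result, so no key ever appears in a tracked source file. Note that an ignore rule only helps if the real file was never committed in the first place.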

However, this method depends on engineers understanding the importance of security and placing all private data in their own local credentials file. Nothing stops an engineer from replacing a variable that refers to the external credentials file with a hard-coded value.

This is why software systems like Amazon’s Secrets Detector can be powerful for engineers. If more than two engineers are working on a project, great attention should be paid to what information is stored, how it is stored, and where it is stored.


