Code generation systems like DeepMind’s AlphaCode, Amazon’s CodeWhisperer, and OpenAI’s Codex, which powers GitHub’s Copilot service, offer a tantalizing glimpse of what’s possible with AI in computer programming today. But so far, only a handful of such AI systems have been made freely available to the public and open-sourced, reflecting the commercial incentives of the companies building them.
In a bid to change that, artificial intelligence startup Hugging Face and ServiceNow Research, the R&D arm of ServiceNow, today launched BigCode, a new project that aims to develop “state-of-the-art” AI systems for code in an “open and accountable” way. The goal is to eventually release a data set large enough to train a code generation system, which will then be used to create a prototype using ServiceNow’s in-house GPU cluster. The prototype is planned as a 15 billion parameter model, larger than Codex (12 billion parameters) but smaller than AlphaCode (~41.4 billion parameters). In machine learning, parameters are the parts of an AI system learned from historical training data, and they essentially define the skill of the system on a problem, such as code generation.
Inspired by Hugging Face’s BigScience effort to open up highly sophisticated text-generating systems, BigCode will be open to anyone with a professional background in AI research who can commit time to the project, organizers say. The application form went live this afternoon.
“In general, we expect candidates to be affiliated with a research organization (academic or industrial) and to work on the technical/ethical/legal aspects of [large language models] for coding applications,” ServiceNow wrote in a blog post. “Once the [code-generating system] is trained, we will assess its abilities… We will strive to make the assessment easier and broader so that we can learn more about the [system’s] capabilities.”
By collaboratively developing a code generation system, which will be open-sourced under a license that allows developers to reuse it subject to certain terms and conditions, BigCode seeks to resolve some of the controversies that have arisen around the practice of AI-powered code generation, especially regarding fair use. The nonprofit Software Freedom Conservancy, among others, has criticized GitHub and OpenAI for using public source code, not all of which is permissively licensed, to train and monetize Codex. Codex is available through OpenAI’s paid API, while GitHub recently started charging for access to Copilot. For their part, GitHub and OpenAI continue to assert that Codex and Copilot do not violate any license terms.
The organizers of BigCode say they will work to ensure that only files from repositories with permissive licenses make it into the aforementioned training data set. Along the way, they say, they will work to establish “responsible” AI practices for training and sharing code-generating systems of all types, seeking input from relevant stakeholders before making policy decisions.
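To make the license-filtering idea concrete, here is a minimal sketch of how files might be screened by repository license before entering a training data set. It is not BigCode’s actual pipeline: the allowlist of SPDX identifiers, the record layout, and the function names `is_permissive` and `filter_files` are all assumptions made for illustration.

```python
# Hypothetical allowlist of permissive SPDX license identifiers.
# Which licenses a real project would accept is an assumption here.
PERMISSIVE_LICENSES = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"}


def is_permissive(license_id):
    """Return True only if the license identifier is on the allowlist."""
    return license_id is not None and license_id.strip() in PERMISSIVE_LICENSES


def filter_files(records):
    """Keep only files whose repository carries a permissive license.

    Each record is assumed to be a dict with 'repo', 'path' and 'license' keys.
    Files with a missing or unknown license are dropped, not guessed at.
    """
    return [r for r in records if is_permissive(r.get("license"))]


# Made-up sample records for demonstration only.
sample = [
    {"repo": "example/a", "path": "main.py", "license": "MIT"},
    {"repo": "example/b", "path": "lib.c", "license": "GPL-3.0-only"},
    {"repo": "example/c", "path": "util.go", "license": None},
    {"repo": "example/d", "path": "app.rs", "license": "Apache-2.0"},
]

kept = filter_files(sample)
print([r["repo"] for r in kept])
```

In this sketch, a file from a repository with no detectable license is excluded by default, the conservative choice when the aim is to include only code that is clearly permissively licensed.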
ServiceNow and Hugging Face did not provide any timelines for when the project would be completed. But they expect it to explore several forms of code generation over the next few months, including systems that automatically complete and synthesize code from snippets and natural language descriptions and work in a wide range of domains, tasks and programming languages.
Assuming the ethical, technical, and legal issues are ever resolved, AI-based coding tools could significantly reduce development costs while allowing coders to focus on more creative tasks. According to a study from the University of Cambridge, at least half of developer effort goes into debugging, not active programming, costing the software industry an estimated $312 billion a year.