Building ML models like open source software


Can ML models be developed and improved by a community of like-minded experts who work individually on them before “merging” the improvements into the original model? This revolutionary idea was proposed by Colin Raffel, currently an Assistant Professor in the Department of Computer Science at the University of North Carolina.

Raffel made the suggestion in a blog post last week, where he detailed his rationale and explained how such a venture might be possible. For the latter, he referred to work already done by researchers, including research he participated in as a resident of Google’s famous AI residency program before joining the university.

Pitfalls of pre-trained models

But first, why is this important? The most popular use of AI today revolves around transfer learning, which relies on pre-trained models to solve specific ML problems. These models are typically fine-tuned through additional training on a downstream task of interest and are a common way to implement ML in organizations around the world.

The elephant in the room here is the sheer cost of training the initial model, which puts it out of the reach of most data scientists and AI experts. For example, it was estimated that training the popular language model GPT-3 would cost around $4.6 million in computing resources, putting it beyond the reach of most and within the domain of a handful of large, well-funded companies.

And as Raffel observed, most pre-trained models are never updated and are left as is until a better model comes along. He noted that there is currently no standard approach for updating a published model to address its shortcomings, so models are left indefinitely “frozen” in the state in which they were released until they are superseded by a new model.

The popular Python programming language would never have incorporated features like variables, Unicode support, or many other widely used features if it had been developed using such an approach, Raffel claims.

“We should develop tools that will allow us to create pre-trained models the same way we create open source software. More specifically, models must be developed by a large community of stakeholders who continually update and improve them,” he wrote.

ML models developed by the community

Drawing inspiration from the development of open source software, Raffel suggests the use of community-developed models that are continuously improved, bringing ideas such as code merging and versioning into the realm of ML models.

He pointed out that models can already be trained effectively without updating every parameter, with the updates being communicated to a centralized server. Additionally, updates can be restricted to a small subset of parameters and compressed, significantly reducing the cost of storing and transmitting updates as models are trained.

Referring to work he was involved in on selecting a small subset of a model’s parameters to update, Raffel wrote: “We demonstrate that our approach allows updating a small fraction (as little as 0.5%) of the model’s parameters while achieving similar performance to training all parameters.”
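To make the idea of a small, transmittable update concrete, here is a minimal sketch in plain Python. It is not the method from Raffel's research: the function names are hypothetical, the "model" is just a list of numbers, and the rule for choosing which parameters to keep (largest change in magnitude) is an assumption made purely for illustration.

```python
# Illustrative sketch only. sparse_update and apply_update are hypothetical
# helpers, and top-k-by-magnitude selection is an assumed stand-in for
# whatever selection rule a real system would use.

def sparse_update(base, tuned, fraction):
    """Return {index: delta} for only the largest-magnitude changes."""
    deltas = [t - b for b, t in zip(base, tuned)]
    keep = max(1, round(len(deltas) * fraction))
    ranked = sorted(range(len(deltas)), key=lambda i: abs(deltas[i]), reverse=True)
    return {i: deltas[i] for i in sorted(ranked[:keep])}

def apply_update(base, update):
    """Apply a sparse update to a copy of the base parameters."""
    patched = list(base)
    for index, delta in update.items():
        patched[index] += delta
    return patched

base = [0.0] * 1000
tuned = list(base)
for index in (3, 10, 42, 77, 900):
    tuned[index] = 0.5          # five large changes
tuned[500] = 0.01               # one small change, dropped by the update

update = sparse_update(base, tuned, fraction=0.005)  # keep 0.5% of 1,000 values
patched = apply_update(base, update)
print(len(update), "of", len(base), "parameters transmitted")
```

Only the index-to-change mapping needs to be stored or sent, which is where the savings in storage and transmission would come from in this toy setting.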

To combat merge conflicts, Raffel suggests strategies such as starting from a solid baseline and averaging the updates from individual workers, although he concedes that this can degrade performance. He pointed to an improved model-merging method that he was involved in developing, and also referred to the work of other researchers on distributed training.
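The averaging strategy, and the reason it can hurt, can be seen in a toy example. This is a rough sketch under simple assumptions: models are plain lists of numbers, the contributor names and the merge_by_averaging helper are hypothetical, and this is the naive baseline rather than the improved merging method mentioned above.

```python
# Illustrative sketch only: naive merging by averaging each worker's change
# relative to a shared base model. All names here are hypothetical.

def merge_by_averaging(base, worker_models):
    """Average the per-parameter changes of all workers and add them to base."""
    count = len(worker_models)
    merged = []
    for i, base_value in enumerate(base):
        avg_delta = sum(model[i] - base_value for model in worker_models) / count
        merged.append(base_value + avg_delta)
    return merged

base = [1.0, 2.0, 3.0]
alice = [1.5, 2.0, 3.0]  # raised parameter 0
bob = [0.5, 2.0, 4.0]    # lowered parameter 0 and raised parameter 2

merged = merge_by_averaging(base, [alice, bob])
print(merged)  # [1.0, 2.0, 3.5]
```

The two contributors' opposing changes to parameter 0 cancel out entirely, and the second contributor's change to parameter 2 is halved. This is the model equivalent of a merge conflict, and it gives a sense of why plain averaging can degrade performance.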

The road ahead

Of course, there are still many other hurdles to overcome before the idea of an open source ML model community can take off. In his article, Raffel also addressed the challenges inherent in verifying community contributions, modularity, and backward compatibility.

“[The] development of these models is still in the dark ages compared to best practices in software development. The well-established concepts of open source software development are a source of inspiration for methods of creating pre-trained models that are continuously improved and developed collaboratively.

“Carrying out this research program will help divert the power of large companies working in isolation and allow models to be democratically developed by a distributed community of researchers,” he summed up.

Paul Mah is the editor of DSAITrends. A former systems administrator, programmer and computer teacher, he enjoys writing both code and prose.


