GitHub, the code hosting platform that houses much of the world’s open source software, has been busy physically locking away code to protect it against calamity.
The code has been buried 750 feet underground in a decommissioned coal mine on Spitsbergen, an island in Norway’s Arctic archipelago of Svalbard. The same mountain houses the Global Seed Vault, where more than a million seed samples are safely locked away so that, if a global disaster upsets the balance of biodiversity or wipes out valuable food sources, we will have the resources to start again.
GitHub’s two projects, the GitHub Archive Program and the Arctic Code Vault, serve a similar function, except that their focus is the open source code that has become essential to the modern world.
The old coal mine is home to the Arctic World Archive (AWA), a joint venture between Norwegian data storage company Piql and Store Norske, a state-owned coal mining company. The AWA is a storehouse for digitized artifacts such as Vatican archives, films, political histories, digitized art, and scientific information, and now open source code, all hidden away to protect them from the unknown.
In a way, the GitHub Archive Program and the GitHub Arctic Code Vault are two parts of the same project, although the Archive Program also runs separate projects for GitHub’s parent company, Microsoft. Within this project, the Archive Program is responsible for backing up all publicly available open source code on GitHub, not just in the Arctic Code Vault but in other locations too, and for ensuring that the code can still be made useful even in a world without computers or any understanding of software.
“We will protect this invaluable knowledge by storing multiple copies, on an ongoing basis, in various formats and data locations, including a very long-term archive designed to last at least 1,000 years,” explains GitHub.
In November, GitHub announced that, as a proof of concept, it had archived 6,000 of its most popular repositories and deposited them in the code vault. The code is stored on silver halide polyester film developed specifically for the AWA by Piql, encoded as images that resemble small QR codes. These images are far denser, however: each one contains 8.8 million microscopic pixels.
“It can withstand extreme electromagnetic exposure and has undergone extensive durability and accessibility tests,” Piql said in a statement about the film, which is also being used for other AWA projects.
The Arctic location of the AWA storage facility ensures that, even in the event of a long-term power failure, the temperature in the vault will stay below freezing, cold enough to preserve its contents for decades or more. For additional protection, the film is stored in a steel-walled container inside a sealed chamber.
In early February, satisfied with the results of the trial, GitHub took a snapshot of all active public repositories on its site for archiving in the vault. Piql then wrote the resulting 21TB of data to 186 reels of piqlFilm, each containing one kilometer of film.
“Our original plan was for our team to fly to Norway and personally escort the world’s open source code to the Arctic,” wrote Julia Metcalf, director of strategic programs at GitHub, in a recent blog post, “but as the world continues to experience a global pandemic, we have had to adjust our plans.
“We have remained in close contact with our partners, waiting for the moment when they could safely travel to Svalbard. We are pleased to report that the code was successfully deposited in the Arctic Code Vault on July 8, 2020.”
Storing code in a facility designed to survive whatever disasters the future may bring is one thing. Making sure our grandchildren’s grandchildren can understand what it is and how to use it is another. GitHub has already begun to tackle this problem by including on every reel a human-readable copy of the “Guide to the GitHub Code Vault,” available in five languages and written with input from the GitHub community.
Additionally, the project will eventually add a human-readable film reel called the Tech Tree, which the company says will consist primarily of existing works selected to provide a detailed understanding of modern computing, open source and its applications, modern software development, popular programming languages, and more.
“It will also include work that explains the many layers of technical foundations that make software possible: microprocessors, networks, electronics, semiconductors, and even pre-industrial technologies,” Metcalf explained. “This will give the heirs of the archive a better understanding of today’s world and its technologies, and may even help them recreate computers to use the archived software.”
Going forward, GitHub plans to update the archive roughly every five years.