The Future of Information Storage: DNA’s Use for Storing Data

Responsible for the construction of all living things, DNA encodes the biological data of an organism. It is, in essence, what instructs our bodies to create the proteins they need to function. Written in a four-letter alphabet of A, C, G and T, the human genome is a unique sequence more than 3 billion letters long. The seemingly endless combinations pass genetic information from one generation to the next, and are fundamentally the reason we look like our parents. While the genome itself does not build us, it provides our cells with the information they need to do their jobs, such as maintaining our organs or responding to illness. In recent years, the ACGT alphabet that stores biological data has begun to be understood in relation to the codes used in computing.

Despite differences in format, organic storage mechanisms such as DNA could operate much like traditional digital storage based on binary code. Developments over the past few years show that the ones and zeros of binary code can be mapped to the As, Cs, Gs and Ts of DNA. This approach may seem far-fetched today, but it is being hailed as a substitute for traditional hard drives, which are fast becoming environmentally and spatially unsustainable as the data produced in the world increases at an exponential rate. As outlined by AMPLYFI’s Dr. Lee Eccleshare in the whitepaper ‘Extracting Insight from Unstructured Data’, the amount of data we produce is growing at an ever-increasing rate, with past estimates putting it at a minimum of 2.5 quintillion bytes each day. With a data storage crisis looming and DNA sequencing becoming increasingly accessible and cost-effective, the motivation to explore storing digital information in DNA has never been higher.

DNA is routinely recovered and decoded from fossils that are hundreds of thousands of years old, which shows how well suited it is to long-term archival storage. DNA data storage is fast becoming the most cutting-edge answer to the long-term problems expected of current storage media, such as corruption, degraded hardware and the sustainability of manufacturing hundreds of millions of hard drives a year. DNA is now seen as the ultimate upgrade from traditional hard drives because it can store data in a high-capacity, high-density and consistently available format, which has led to it being described as the gold standard of archival storage. However, DNA is by no means a perfect medium, largely because it can mutate, the organic equivalent of a corrupted hard drive. While there are coding mechanisms to mitigate the impact of mutation on stored data, their development is progressing much more slowly than that of the encoding process itself. DNA sequencing technology is following a similar pattern, and lags far behind the accuracy needed for truly perfect retrieval of data stored in DNA.
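To give a feel for how a coding mechanism can soften the effect of mutation, here is a minimal Python sketch of one of the simplest possible schemes: keep several copies of the same strand and take a majority vote at each position. This is an illustration only, not a description of how real DNA storage systems work. The function name `majority_vote` and the sample strands are hypothetical, and the sketch assumes that errors are single-letter substitutions and that every copy has the same length.

```python
from collections import Counter


def majority_vote(reads: list[str]) -> str:
    """Rebuild a strand by keeping the most common base at each position.

    Assumes all reads have the same length and that errors are
    substitutions only (no inserted or deleted bases).
    """
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


original = "CACACATG"
reads = [
    "CACACATG",  # clean copy
    "CAGACATG",  # third base mutated from C to G
    "CACACATC",  # last base mutated from G to C
]

recovered = majority_vote(reads)
print(recovered == original)
```

Because each mutation appears in only one of the three copies, the other two copies outvote it and the original strand is recovered. The cost is storing the data three times, which is one reason more efficient error-correcting codes are of interest.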

Although the DNA in a human genome weighs only 6.41 picograms (a picogram is one trillionth of a gram), a single gram of DNA can theoretically store 215 million gigabytes of data, making it a small but extraordinarily mighty storage medium. At that scale, every piece of data recorded by the human race could be stored in a volume and weight comparable to a couple of pickup trucks.
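To put that density in perspective, the short Python calculation below combines the theoretical figure of 215 million gigabytes per gram with the estimate, quoted earlier in this article, of 2.5 quintillion bytes of data produced each day. It is a back-of-the-envelope sketch: it assumes decimal gigabytes (one billion bytes each) and ignores any overhead for error correction or indexing.

```python
BYTES_PER_GIGABYTE = 10**9            # assumption: decimal gigabytes
GIGABYTES_PER_GRAM = 215_000_000      # theoretical capacity quoted in this article
BYTES_PER_DAY = 25 * 10**17           # 2.5 quintillion bytes per day, as quoted earlier

bytes_per_gram = GIGABYTES_PER_GRAM * BYTES_PER_GIGABYTE
grams_per_day = BYTES_PER_DAY / bytes_per_gram

print(f"{grams_per_day:.1f} g of DNA per day")
```

Under those assumptions, a full day of the world's data would fit in roughly 11.6 grams of DNA.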

The process of DNA data storage can be summarised in three stages. The first is encoding the data into the ACGT sequences seen in DNA. The second is writing those sequences into physical DNA, otherwise known as synthesis. The third is decoding, where the DNA is sequenced to read back the stored data. Because DNA’s four-letter alphabet can be translated to and from binary, it has the potential to become a highly valuable solution to common data storage problems. With four possible letters at each position rather than two, each DNA base can represent two bits, double the information carried by a single binary digit.
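The encoding and decoding stages can be sketched in a few lines of Python. The mapping used here (00 to A, 01 to C, 10 to G, 11 to T) is an assumption chosen for illustration, as are the function names `encode` and `decode`; the article does not specify a particular scheme, and the synthesis and sequencing steps happen in a laboratory rather than in code.

```python
# Hypothetical two-bit mapping, chosen for illustration only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = b"DNA"
strand = encode(message)
print(strand)          # CACACATGCAAC
print(decode(strand))  # b'DNA'
```

The three-byte message is 24 bits long but needs only 12 bases, which is the doubling described above: each base carries two bits.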
