Why Big Memory Is a Big Deal for Big Data
Ever hear the saying “too much of anything is a bad thing”? That is exactly what is happening with data today. Information has become the lifeblood of business decision-making and operational improvement, yet the volume of data is outpacing the memory available to hold it. When that happens, performance and progress slow down or grind to a halt.
This problem is only expected to grow as real-time workloads and data-intensive applications become more common.
“We are in an age of information. The amount of data being created is significantly increasing every day. And it needs to be processed rapidly. Today’s computer systems, based on the von Neumann architecture, are no longer able to keep up with this influx,” says Jonathan Jiang, Chief Operating Officer at MemVerge, a big memory software provider.
Storage I/O Is Not the Answer
This is especially difficult in the biosciences space, where it is not uncommon for a data set to exceed a terabyte. “When data is bigger than memory, the research often cannot be completed. In many cases, the program will just report an error and exit,” Jiang explains.
To get around this, researchers have traditionally had to swap data between memory and disk. This back-and-forth, known as storage I/O, wastes a great deal of time just reading from and writing to disk. For example, in the middle of an analysis or experiment, researchers write intermediate data out for persistence, both to protect against program failures and to preserve it for future reproducibility.
While that data is being copied to or read back from storage, the researcher is forced to sit and wait for the transfer to complete, which can add up to hours of downtime. And if the workload fails midstream, the researcher loses all of that progress.
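To make that pattern concrete, here is a minimal sketch of the checkpoint-to-disk workflow described above, written in Python with NumPy. The file path and array size are illustrative assumptions rather than details from any real pipeline; the point is simply that every save and every restore pays the full cost of storage I/O before the analysis can continue.

```python
import time
import numpy as np

CHECKPOINT = "/scratch/analysis_checkpoint.npy"  # assumed scratch location

def run_stage(data: np.ndarray) -> np.ndarray:
    """Stand-in for one stage of a multistage analytic pipeline."""
    return np.log1p(data) * 2.0

# A large intermediate result (illustrative size; real genomics
# matrices can exceed a terabyte and may not fit in DRAM at all).
data = np.random.rand(50_000, 5_000)

# Persist the intermediate result so a crash does not lose progress.
start = time.perf_counter()
np.save(CHECKPOINT, data)                 # full write to disk
print(f"checkpoint write: {time.perf_counter() - start:.1f}s")

# After a failure, the whole checkpoint must be read back from disk
# before any computation can resume.
start = time.perf_counter()
data = np.load(CHECKPOINT)                # full read from disk
print(f"checkpoint read:  {time.perf_counter() - start:.1f}s")

result = run_stage(data)
```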
“One of the fundamental bottlenecks for performance of the von Neumann model is that data needs to be moved between memory and storage. And when you need to move data between a fast media and a slow media, your performance drops. As the amount of data continues to explode, that weakness in the computing infrastructure will be more pronounced,” says Jiang.
To address this problem, MemVerge has pioneered a new category of computing: big memory (Video 1), which allows applications to bypass traditional storage systems in favor of persistent memory. Jiang explains that this can result in a 10x performance improvement for data-intensive applications.
But for big memory to really take off, it will require software innovations like the ones MemVerge has made. The company’s snapshot technology eliminates I/O to storage and can recover terabytes of data from persistent memory in seconds.
Big Memory Offers Big Results
This is exactly the answer Analytical Biosciences, a leader in single-cell genomics, was looking for in its research to fight cancer and help stop the spread of COVID-19. The organization found that more than 50% of the time in its multistage analytic pipeline was spent just loading data from storage.
To overcome this storage bottleneck, accelerate its discoveries, and be able to make faster predictions, Analytical Biosciences turned to MemVerge for help.
“Our goal and what we enable is big memory computing, which allows us to keep all the data in memory all the time, thereby eliminating that storage I/O,” says Jiang. “Even the fastest high-end storage solutions available today are still an order of magnitude slower than what memory can do.”
With MemVerge’s big memory computing platform, Memory Machine, Analytical Biosciences used the solution’s memory snapshot technology to clone the data and write it to persistent memory. This enabled Analytical Biosciences to load data 800 times faster, eliminate 97% of its storage I/O, and reduce overall pipeline time by more than 60%.
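For readers who want a feel for what keeping data in persistent memory looks like at the code level, the sketch below maps a working data set onto a DAX-mounted persistent-memory filesystem (the /mnt/pmem0 path is an assumption) so that writes bypass the storage stack and a restart only needs to remap the region. This is a generic, app-direct illustration, not MemVerge’s Memory Machine API or its snapshot mechanism.

```python
import numpy as np

# Assumed DAX-mounted persistent-memory filesystem (for example, Intel
# Optane PMem exposed at /mnt/pmem0); the path is an assumption.
PMEM_FILE = "/mnt/pmem0/working_set.npy"

data = np.random.rand(10_000, 1_000)    # illustrative working set

# Place the working copy directly in persistent memory. With DAX, the
# mapping bypasses the storage stack: loads and stores go over the
# memory bus instead of through block I/O.
pmem_copy = np.memmap(PMEM_FILE, dtype=np.float64, mode="w+",
                      shape=data.shape)
pmem_copy[:] = data
pmem_copy.flush()                       # make the copy durable

# After a crash or restart, "recovering" the data is just remapping
# the region; there is no bulk read from storage to wait for.
restored = np.memmap(PMEM_FILE, dtype=np.float64, mode="r",
                     shape=(10_000, 1_000))
print(restored[0, :3])
```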
“In another use case for snapshots, you can move workloads seamlessly from on-prem data centers to cloud data centers, and between different clouds. There are many interesting new operational concepts that can be enabled with big memory,” Jiang explains.
A New Era of In-Memory Computing
In the past, it has been too expensive to keep all data in memory all the time. But recent advancements in persistent memory from Intel® have made the price per gigabyte much lower than that of traditional DRAM.
By utilizing Intel® Optane™ technology in its Memory Machine solution, MemVerge provides more capacity and persistence in memory, improving application performance, scalability, and reliability.
As applications become more data intensive and memory becomes faster, Jiang predicts every industry will change its applications to take advantage of a big-memory infrastructure.
For instance, the financial industry must deliver services with high performance and low latency, so big memory will be crucial for those firms to stay competitive and to move and share data faster. Big-memory computing can also help the media and entertainment industry, which, because of its large data sets, faces many of the same pipeline interruptions as the biosciences space.
App developers who have made accommodations for pulling data sets from storage bit by bit for performance reasons will have to rethink how they write their applications. “When the application developers and the IT operations organizations shift their mindset to big-memory computing, a lot more can be done,” says Jiang.
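As a rough illustration of the shift Jiang describes, the sketch below contrasts the chunked-read pattern many applications use today with the simpler load-once approach that becomes practical when an entire data set fits in big memory. The file name and chunk size are placeholders, and the second function assumes the data set really does fit in the available memory tier.

```python
import pandas as pd

DATA_FILE = "expression_matrix.csv"   # placeholder file name

# Today: stream the data set in chunks because it will not fit in
# DRAM, paying repeated storage I/O and complicating the algorithm.
def column_means_chunked(path: str, chunk_rows: int = 100_000) -> pd.Series:
    total, count = None, 0
    for chunk in pd.read_csv(path, chunksize=chunk_rows):
        partial = chunk.sum(numeric_only=True)
        total = partial if total is None else total + partial
        count += len(chunk)
    return total / count

# With big memory: load once, keep it resident, and write the
# analysis the straightforward way.
def column_means_in_memory(path: str) -> pd.Series:
    df = pd.read_csv(path)            # single load into memory
    return df.mean(numeric_only=True)
```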
To make it easier to adopt its big-memory technology, MemVerge provides an SDK that allows customers to take advantage of the underlying infrastructure, develop new applications, and make use of its capabilities directly.
“This will change the face of the data center. When memory can cross physical machine boundaries, the focus of applications will change. They won’t need to optimize around memory usage,” says Jiang. “When that happens, that’s when big memory will really take off.”
This article was edited by Georganne Benesch, Associate Content Director for insight.tech.