Machine learning, big data find promise in cloud
Nervana Systems is close to rolling out a microprocessor aimed at big data analytics, the embodiment of the company's efforts to speed up deep neural networks in hardware for a range of recognition tasks. The company believes the novel processor gives it an edge, and it hopes to have the chip up and running in its own cloud service late in 2016. Engineers at the startup are racing to develop and accelerate algorithms that find patterns in today's flood of digital data.
Nervana competes with giants such as Intel and Nvidia whose processors run most of today's algorithms for training neural nets. Web giants are also in the hunt, snapping up the best researchers in machine learning. Among the leaders, Google is said to be working on an accelerator chip of its own.
"This revolution in deep neural networks is akin to the invention of microprocessor, it is the solution to our big data problem which is the fundamental industry problem of the next 10-20 years," said Naveen Rao, CEO of Nervana.
Rao designed processors for ten years at Sun Microsystems and a string of startups before returning to academia to get a PhD in neuroscience. He did a brief stint in finance working on algorithmic trading, then researched chips that could mimic the brain at Qualcomm before co-founding Nervana.
The Nervana chip "is a culmination of all the things I've studied...we want to build inference machines to find structure in large data sets...that's the biggest problem of our time," said Rao.
"We own this business, and we are going to do everything we can to keep it," said Bill Dally, chief scientist of Nvidia.
It's early days for inventing new chip architectures for deep learning, said Dally, a veteran microprocessor researcher. He pointed to the recent ImageNet competition as an example of how fast the algorithms are still changing.
More than 50 academic and commercial research teams from around the world competed in the event, including groups from Google, NEC, Mitsubishi, Tencent and Qualcomm, and even a research team from China's Ministry of Public Security. A group from Microsoft Research won the competition, showing once again the potential for machines to recognise images faster and more accurately than humans, simply by using neural nets running on racks of Intel x86 processors.
A glimpse of Nervana's new architecture
Nervana's chip is designed to process tensors, the multidimensional arrays used as building blocks for neural networks. The startup packed as many of its tensor processing units as it could into a 28nm chip made at TSMC that is as large as any GPU or FPGA.
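To see why tensors are the natural primitive here, consider a single fully connected layer: its forward pass is just a matrix multiply plus a bias, the kind of dense tensor operation such units are built to execute. A minimal NumPy sketch, with layer sizes chosen for illustration rather than taken from Nervana's design:

```python
import numpy as np

# A fully connected layer as a tensor operation: y = relu(x @ W + b).
# Shapes are illustrative; real networks chain many such layers.
batch, n_in, n_out = 64, 784, 256

rng = np.random.default_rng(0)
x = rng.standard_normal((batch, n_in))         # input activations
W = rng.standard_normal((n_in, n_out)) * 0.01  # weight tensor
b = np.zeros(n_out)                            # bias tensor

y = np.maximum(x @ W + b, 0.0)                 # matrix multiply + ReLU
print(y.shape)                                 # (64, 256)
```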
The chip places memory elements next to each tensor processing unit, using an undisclosed 3D memory-stacking technology. It adopts a flat memory architecture, abandoning traditional cache hierarchies and shared-memory concepts.
"We completely tossed out the window things we studied in school about pipeline interlocking and cache coherency; we just wanted a ton of compute with local state and enough memory to use it," said Rao.
Each chip has about 2Tbit/s of aggregate throughput across multiple 25G serdes. Six external links form a 3D torus fabric used to build clusters; Nervana has so far simulated configurations up to a practical limit of 64 processors.
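Six links per chip is exactly what a 3D torus requires: one neighbour in each direction along the x, y and z axes, with the edges wrapping around. A small sketch of the neighbour arithmetic, assuming a 4x4x4 grid as one plausible layout for the 64-processor limit:

```python
# Neighbour addressing in a 3D torus. Each node has exactly six links:
# +/-1 along each of the x, y and z axes, with wraparound at the edges.
# The 4x4x4 grid (64 nodes) is an assumption matching the simulated limit.
DIM = 4

def torus_neighbours(x, y, z, dim=DIM):
    return [
        ((x + 1) % dim, y, z), ((x - 1) % dim, y, z),  # +/-x links
        (x, (y + 1) % dim, z), (x, (y - 1) % dim, z),  # +/-y links
        (x, y, (z + 1) % dim), (x, y, (z - 1) % dim),  # +/-z links
    ]

print(torus_neighbours(0, 0, 0))
# [(1, 0, 0), (3, 0, 0), (0, 1, 0), (0, 3, 0), (0, 0, 1), (0, 0, 3)]
```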
Neural nets are trained over the high-bandwidth proprietary links, while separate x16 PCI Express Gen 3.0 interconnects move data in and out of the chip clusters.
"Deep learning requires a lot of I/O, and we provide that," Rao said.
Nervana programs the chip using its own deep learning framework, which also runs on Nvidia's top-of-the-line Titan X graphics processors. The startup says its code runs almost three times faster on the Titan X than Nvidia's own algorithms, and in simulations its new processor delivered more than a 10x speed-up for its cloud service.
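One framework targeting both GPUs and the startup's own silicon implies a backend abstraction: the model definition stays the same while the execution engine is swapped underneath. A hypothetical sketch of that pattern; the class and backend names here are ours, not Nervana's API:

```python
# Hypothetical backend-swapping pattern: the same model code runs on
# different hardware by dispatching tensor ops to a pluggable backend.
# All names below are illustrative, not Nervana's framework.
import numpy as np

class Backend:
    def matmul(self, a, b):
        raise NotImplementedError

class CPUBackend(Backend):
    def matmul(self, a, b):
        return a @ b  # stand-in; a GPU backend would call device kernels

class DenseLayer:
    def __init__(self, w, backend):
        self.w, self.backend = w, backend

    def forward(self, x):
        # The layer never touches hardware directly; the backend decides
        # whether this multiply runs on a CPU, a Titan X or custom silicon.
        return self.backend.matmul(x, self.w)

layer = DenseLayer(np.eye(3), CPUBackend())
print(layer.forward(np.ones((2, 3))))
```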