If you took stock of the industry and social buzzwords of the past two years, "chip" would surely be on the list.
As AI technology is put into practice across all walks of life, applications are becoming more sophisticated, and deep learning's demands on chip computing power are rising accordingly. In the information age chips are everywhere, yet they remain a scarce resource.
The industry has two ways of coping: build a custom chip, or modify the model, reducing the computing power required by using a small model or a compressed model. Both approaches have advantages and disadvantages. Custom chips perform strongly, but their cost, development cycle, and risk are all high. Small or compressed models are cheap and quick to produce, but they lose accuracy, and it is hard to strike a good balance between high accuracy and high performance.
Excessive redundant computation in existing AI workloads, together with the limited capability of execution engines, constrains how much chip performance can be tapped. With chip supply and demand out of balance, the mainstream approaches struggle to raise productivity.
Some technical teams are taking a different route. An AI company called CoCoPie has announced that, through compression-compilation co-design, it can tap the potential of existing chips at the software level, which is expected to raise their performance. So we sought out Li Xiaofeng, the person in charge at CoCoPie. According to him, CoCoPie has already built products including CoCo-Gen and CoCo-Tune, which can run AI applications in real time without any additional AI hardware.
He told InfoQ: "CoCoPie's unique AI software technology stack solves the bottleneck holding back the development and adoption of on-device AI, and it is still unique in the industry. Test data and customer feedback show a very clear advantage over other solutions. We have a good chance of winning in the wave of device intelligence."
Can chip computing power really be multiplied by bypassing the hardware bottleneck? Is it feasible to improve chip performance at the software level? To understand the techniques CoCoPie uses and to answer these questions, InfoQ interviewed Li Xiaofeng.
Q: You address performance problems through compression-compilation co-design. What is the specific technical implementation, and which academic papers support it?
Li Xiaofeng: CoCoPie's technical core is the three professors on the founding team. They are highly accomplished and unusually diligent, and each is a leader in his field. Professor Wang Yanzhi focuses on AI model algorithms, Professor Ren Bin on AI model compilation, and Professor Gu Xi Peng on system engines. These research areas complement one another well and form an "iron triangle" of AI computing optimization in which each side is indispensable. Together they build the company's core competitiveness, and they are something of a match made in heaven.
First, the basics of AI model optimization. Running an AI task on a device is really a process of mapping an AI model onto a sequence of chip instructions. Compression and compilation are the two key steps. Compression optimizes the model itself: weight pruning and weight quantization reduce the model's size and complexity. Compilation then generates optimized executable code for the compressed model. Together they make the AI task run more efficiently and let the chip's capability be used more fully.
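To make the two compression techniques concrete, here is a minimal sketch in Python with NumPy. It illustrates generic magnitude-based weight pruning and symmetric int8 quantization only; it is not CoCoPie's implementation, and the function names `magnitude_prune` and `quantize_int8`, the 75% sparsity level, and the random weight matrix are all assumptions made for illustration.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    return np.where(np.abs(weights) > threshold, weights, 0.0)

def quantize_int8(weights):
    """Symmetric linear quantization to int8; returns (quantized values, scale)."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

# Hypothetical layer: a 64x64 weight matrix with random values.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)

pruned = magnitude_prune(w, sparsity=0.75)   # keep roughly 25% of the weights
q, scale = quantize_int8(pruned)             # store them as 8-bit integers
dequant = q.astype(np.float32) * scale       # values an inference engine would use

print("nonzero weights:", np.count_nonzero(pruned), "of", w.size)
print("max quantization error:", float(np.max(np.abs(dequant - pruned))))
```

Pruning reduces how many multiply-accumulate operations are needed, and quantization reduces how many bits each remaining weight occupies. Neither step by itself guarantees faster execution; that depends on whether the generated code can exploit the resulting sparsity and data types, which is where compilation comes in.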
But the industry currently does not do these two steps, compression and compilation, very well. Existing technologies do only compression, or only compilation, or do both but design them in isolation with no real co-design, so it is hard to achieve both inference accuracy and execution efficiency.
The core of CoCoPie's technology is the "co-design" of these two steps: compiler and hardware preferences are taken into account when designing the compression, and the compilation optimization method is chosen according to the compressed model when designing the compiler. Corresponding to the two steps, we designed two components for the CoCoPie framework: CoCo-Gen and CoCo-Tune. CoCo-Gen generates efficient executable code by combining pattern-based neural network pruning with pattern-based code generation, and CoCo-Tune significantly shortens the process of compressing and training DNN models.
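The idea behind pattern-based pruning can be sketched in a few lines of Python with NumPy. This is a simplified illustration under stated assumptions, not CoCo-Gen itself: the four-entry pattern library `PATTERNS`, the function name `pattern_prune`, and the rule of picking the pattern that retains the most weight magnitude are all hypothetical choices made for the example.

```python
import numpy as np

# Hypothetical pattern library: each pattern keeps 4 of the 9 positions
# in a 3x3 convolution kernel and zeroes the rest.
PATTERNS = np.array([
    [[1, 1, 0], [1, 1, 0], [0, 0, 0]],
    [[0, 1, 1], [0, 1, 1], [0, 0, 0]],
    [[0, 0, 0], [1, 1, 0], [1, 1, 0]],
    [[0, 0, 0], [0, 1, 1], [0, 1, 1]],
], dtype=np.float32)

def pattern_prune(kernels):
    """Prune each 3x3 kernel to one pattern from the library.

    kernels: array of shape (n, 3, 3).
    Returns (pruned kernels, index of the pattern chosen for each kernel).
    """
    # Total weight magnitude each pattern would retain, shape (n, num_patterns).
    retained = np.einsum("nij,pij->np", np.abs(kernels), PATTERNS)
    ids = np.argmax(retained, axis=1)
    return kernels * PATTERNS[ids], ids

rng = np.random.default_rng(0)
kernels = rng.normal(size=(8, 3, 3)).astype(np.float32)
pruned_kernels, pattern_ids = pattern_prune(kernels)

print("pattern chosen per kernel:", pattern_ids.tolist())
```

Because every pruned kernel follows one of a small number of known shapes, a code generator can group kernels by pattern index and emit one specialized, branch-free routine per pattern. That is the sense in which the compression is designed with the compiler in mind.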
CoCoPie's technology is general-purpose and can be applied widely to CPUs, GPUs, DSPs, and dedicated AI chips such as NPUs, APUs, and TPUs.
CoCoPie has published a large number of papers at top international conferences in related fields, spanning upper-layer AI application optimization, AI model design, compiler optimization, and low-level hardware-related optimization. In particular, an article introducing CoCoPie's technology was published in June of this year in Communications of the ACM, the flagship publication of the Association for Computing Machinery, in the same issue as this year's Turing Award coverage, which shows that the academic community highly recognizes CoCoPie.
Q: Can the current core products, CoCo-Gen and CoCo-Tune, be used independently?
Li Xiaofeng: These two products provide the key technologies for our AI model optimization. CoCo-Gen generates efficient executable code by combining pattern-based neural network pruning with pattern-based code generation; CoCo-Tune significantly shortens the DNN model compression and training process.
CoCo-Gen and CoCo-Tune can be used separately. They form the core of the CoCoPie toolchain, which is why they were developed first. As a bridge connecting upper-layer AI tasks and the underlying hardware, CoCoPie's product line will continue to add new members.
Q: You are addressing the chip shortage at the software level. Is there similar software technology elsewhere in the industry?
Li Xiaofeng: In the current on-device AI technology stack, only CoCoPie's optimization technology can match or exceed the performance of dedicated AI chips on mainstream chips, and this has been verified through extensive measurement. Currently known techniques emphasize either compression or compilation. We have not seen techniques that co-design the two, which is CoCoPie's patented technology.
Although current mainstream chips have great potential, tapping that potential requires converting AI tasks into suitable vector computation through the synergy of compression and compilation, while keeping the overall amount of computation under control. This is the key to CoCoPie's technology.
Q: Every technology has its strengths and weaknesses. What are the advantages and limitations of this one?
Li Xiaofeng: CoCoPie's advantage is that, on the one hand, it makes it possible to run many AI tasks that previously could not run properly on the device. On the other hand, AI tasks that previously required a dedicated on-device AI chip can now run on mainstream chips.
The execution of AI tasks will always be constrained by the chip's computing power. CoCoPie's technology has its own limitations, and the AI computing power it unlocks is not unlimited. In addition, CoCoPie currently focuses on AI inference tasks; accelerating AI training tasks specifically is not our focus.
Q: CoCoPie's technology is said to raise chip performance by 3-4 times and chip efficiency by 5-10 times. What is the benchmark? Can this level be reached on different chips?
Li Xiaofeng: These figures are actually measured, have passed peer review, and have been confirmed by customers. In other words, they are backed by technical papers and borne out by products in practice.
For example, in a comparison between a general-purpose chip and Google's TPU-V2: with CoCoPie, VGG-16 on a Samsung Galaxy S10 mobile device is nearly 18 times more efficient than on the TPU-V2, and ResNet-50 achieves a 4.7-fold efficiency improvement.
On the same Samsung Galaxy S10 platform, the activity-recognition models C3D and S3D run 17 times and 22 times faster, respectively, than with PyTorch Mobile. Running MobileNetV3, CoCoPie's speed increases by nearly 3 times and 4 times, respectively.
In addition, measurements with the Qualcomm Trepn Power Profiler show that CoCoPie's execution time is about 9 times shorter than TVM's, while its power draw is higher by less than 10%. In our work on DNN inference acceleration based on AQFP superconducting devices, low-temperature testing also showed our research to have the highest energy efficiency among all hardware devices.
Q: Performance gains do not come out of thin air. What does running this software require of the hardware environment?
Li Xiaofeng: Yes. CoCoPie's requirements are not high, and mainstream chips can meet them. Specifically, the chip needs vector computing capability, such as the Arm NEON instruction set, Intel's SSE and AVX instruction sets, or the RISC-V vector extension. These are standard in today's CPUs, GPUs, and APUs/NPUs. Of course, CoCoPie's technology can still work without vector computing capability, but the results will be reduced.
Q: What were the main challenges in putting the technology into practice?
Li Xiaofeng: The main challenge in practice is that our product line is not yet complete, while customers' needs are highly varied and the specific forms of service differ widely. So we have not yet begun large-scale commercial promotion. Instead we focus on key areas and key customers, such as representative mainstream chip providers, device makers, and software service providers, and selectively provide services in line with our product development strategy. As we go through this process, we will work with a variety of customer needs and keep exploring the best product and service system.
Q: Does this technology currently have real-world deployments?
Li Xiaofeng: We currently have more than a dozen customers we are working with. They span multiple fields and include Tencent, Didi, a well-known chip platform provider, a well-known mobile phone manufacturer, the Ministry of Communications, and the world-renowned service provider Cognizant, among others.
Q: Some argue that mainstream processors are the more favorable solution for real-time artificial intelligence. Do you agree with this view?
Li Xiaofeng: Yes. For on-device hardware, mainstream processors are the more favorable solution for real-time artificial intelligence.
1. In terms of function: on-device resources are limited and application scenarios vary enormously, while the functions of a dedicated AI processor are relatively fixed, so the unusually flexible functional demands on the device side pose a big challenge for it. If mainstream processors can handle AI workloads through software, there is of course no need to complicate matters with another component.
2. In terms of technology: what dedicated AI chips actually do is increase vector processing capability and improve memory access efficiency, and some add tensor computation units. As mentioned earlier, current mainstream chips in fact all have vector computing units. These may be weaker than a dedicated tensor processing chip, but they are generally sufficient for today's AI tasks, provided there are excellent model compression and compilation tools that can, through careful design, convert AI tasks into suitable vector computation and control the overall amount of computation.
3. In terms of cost: beyond the purchase price of the AI chip itself, there are hidden costs. An additional chip forces a redesign of the PCB, heat dissipation, and so on, and packaging is a further expense. Many devices, such as smart earphones and miniature medical devices, are sensitive to these factors.
4. It is important to emphasize that CoCoPie's technology does not exclude dedicated AI chips. As a full-stack AI technology, CoCoPie also supports AI processors and lets them deliver greater performance. So we also have an important role to play in specific application areas.
Q: Is CoCoPie a transitional product?
Li Xiaofeng: No. As on-device AI becomes ubiquitous, CoCoPie is the leader in advanced technology in this field, and we believe it has a great deal of room to grow. Internally we have a complete product development strategy. Future product forms will differ, but the core technology will remain the same.
Moreover, even supposing that dedicated AI chips one day become widespread, that would not hurt CoCoPie's room for survival. First, dedicated AI chips will never be more widespread than general-purpose chips, and tasks that a general-purpose chip can do well may still be done more cost-effectively on a general-purpose chip. Second, however far AI chips develop, they still need optimization technology, and our technology will further improve their capability. The same is true of general-purpose chips: a CPU or GPU, however cheap and high-performing, still needs the support of a high-performance compiler such as LLVM or NVCC.
In the long run, as the AI technology stack develops, demand for CoCoPie's software technology will only grow. Mobile SoCs have become not simpler but more powerful, with 8-core phones now common, and the demands on software technology have risen with them. In fact, AI's demand for computing power is outpacing the development of AI hardware capability. According to a research report from MIT in the United States, AI computation has in recent years been growing by a factor of 700 every two years. Hardware capability alone cannot keep up with that pace; software technology must do better as well.
Q: What do you think of the large-model technology that is so popular today?
Li Xiaofeng: Large models often achieve stronger AI capabilities when given enough training data. It is a way of exploring the unknown, much as the high-energy physics community keeps building higher-energy particle colliders in order to make new discoveries. But this needs to be looked at from two sides. As ever larger models are pursued, the amount of training data, training time, computing power, and energy consumption all keep increasing, while the marginal benefit keeps shrinking. This trend is clearly unsustainable, and in the future it may be practiced only for a few major challenges.
Here is a figure on what larger models gain. ResNet is a famous computer vision model released in 2015. An improved version of this model, called ResNeXt, appeared in 2017. Compared with ResNet, ResNeXt requires more computing resources (measured in total floating-point operations), yet its accuracy is only 0.5% higher.
Here is a figure on carbon emissions. According to a report in Forbes magazine, since deep learning took off in 2012, the computing resources needed to produce a best-in-class AI model have doubled on average every 3.4 months. This means that the energy required to train AI models increased 300,000-fold from 2012 to 2018.
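The two figures can be checked against each other with a few lines of Python. This is only a consistency check on the numbers quoted above, assuming steady exponential growth; the variable names are chosen for illustration.

```python
import math

growth_factor = 300_000   # overall increase quoted for 2012 to 2018
doubling_months = 3.4     # average doubling period quoted above

# Number of doublings needed to reach the overall growth factor.
doublings = math.log2(growth_factor)
# Time those doublings would take at one doubling every 3.4 months.
months = doublings * doubling_months

print(f"doublings needed: {doublings:.1f}")
print(f"elapsed time: {months:.0f} months (about {months / 12:.1f} years)")
```

A 300,000-fold increase corresponds to roughly 18 doublings, which at 3.4 months each takes a little over five years, in line with the 2012 to 2018 period cited.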
By way of contrast, deep learning's ability to learn still falls far short of even a baby's, let alone an adult's brain. And the adult brain runs on only about 20 watts, enough to power a single light bulb.
Clearly we cannot improve machine intelligence by enlarging models alone, and academia is constantly exploring new methods.
Q: Can you talk about the basic software industry, which is currently booming? And briefly, what is your assessment of the chip industry?
Li Xiaofeng: Basic software is becoming more and more important. There are two reasons. First, technology has developed very quickly in recent years, so there is real demand for basic software, and it is essential for the future. Second, a large number of high-quality engineers have been trained.
The chip industry will continue to flourish. The trend in science and technology is for the digital world to keep penetrating every aspect of the physical world, and this goes hand in hand with chips being embedded in all kinds of devices. In the previous wave of device intelligence, the core approach was to embed chips in devices and run applications on them. In this wave, the core approach is to run deep neural networks on devices. This is a powerful, sweeping trend, and it is CoCoPie's fundamental opportunity.
Li Xiaofeng told InfoQ that CoCoPie's technological lead is at least a few years, which is enough to be competitive in the computing field. CoCoPie's technology was not designed to solve the chip shortage but to make AI tasks ubiquitous. Easing the shortage is incidental, a by-product of CoCoPie's technical capabilities.
In Li Xiaofeng's view, given complex and diverse scenarios and devices, existing technology cannot fully exploit the capability of mainstream chips, and that is where CoCoPie's room for development lies. What can be said with confidence is that CoCoPie offers a new way to make the most of the chips we already have.