In recent years, artificial intelligence programs have been driving changes in computer chip design, while new computers in turn make new kinds of neural networks possible. A powerful feedback loop is under way.
At its core is the software technology that converts neural network programs to run on novel hardware, and a powerful open source project for doing so has recently emerged.
Apache TVM is a compiler that works differently from other compilers. Instead of converting a program into typical chip instructions for a CPU or GPU, it takes a network in TensorFlow (an end-to-end open source machine learning platform) or PyTorch form, studies the computational operations in the neural network, such as convolutions and other transformations, and works out how best to map those operations to hardware based on the dependencies between them.
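To make the idea of dependency-based mapping concrete, here is a minimal toy sketch in plain Python. It is not TVM's API or its actual algorithm: the operation names, the `GRAPH` layout, the `PREFERRED_UNIT` table, and the `plan` function are all hypothetical and exist only for illustration. The sketch orders the operations so that each one runs after the operations it depends on, then assigns each to an assumed hardware unit.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy network: each operation maps to the operations it depends on.
GRAPH = {
    "conv1": {"input"},
    "relu1": {"conv1"},
    "conv2": {"relu1"},
    "add": {"conv2", "relu1"},  # residual-style connection
    "output": {"add"},
}

# Hypothetical preference table: the unit each kind of operation is assumed
# to run best on. Real compilers derive this from measurements or cost models.
PREFERRED_UNIT = {
    "conv": "matrix_unit",
    "relu": "vector_unit",
    "add": "vector_unit",
}


def plan(graph):
    """Return (operation, unit) pairs in an order that respects dependencies."""
    order = TopologicalSorter(graph).static_order()
    schedule = []
    for op in order:
        kind = op.rstrip("0123456789")  # "conv1" -> "conv"
        # Anything without a preferred unit (e.g. input/output) stays on the host.
        schedule.append((op, PREFERRED_UNIT.get(kind, "host")))
    return schedule


if __name__ == "__main__":
    for op, unit in plan(GRAPH):
        print(f"{op:>7} -> {unit}")
```

A real compiler goes much further, for example by fusing adjacent operations and generating machine code for each target, but the dependency graph is the starting point for those decisions.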
At the heart of this effort is OctoML, a two-year-old startup that offers Apache TVM as a service and helps with AI operations as part of MLOps.
In the latest turn of the feedback loop between hardware and research, TVM's optimization process may already be shaping aspects of how artificial intelligence is developed.
Today, TVM is used exclusively for inference, the part of artificial intelligence in which a fully developed neural network makes predictions from new data. In the future, however, TVM will extend to training, the process of developing the neural network in the first place.
Apache TVM "Refining"
"Training and architecture search are on our roadmap," said Luis Ceze, OctoML co-founder and CEO, referring to the process of automatically designing neural network architectures by letting neural networks search for the best network design.
Will neural network developers use TVM to influence the way they train?
"If they haven't yet, I suspect it will start to affect their training methods," Ceze said. "Some people come to us with training jobs, and we can train the model," while tracking how the model performs after training.
The role of TVM and the OctoML service keeps expanding because the technology is a broader platform than a compiler.
"Users can think of TVM and OctoML together as a flexible, ML-based automation layer for accelerating machine learning models that run on all kinds of hardware," Ceze said. "Each of these pieces of hardware, whichever one it is, has its own way of writing and executing code. Today, writing that code and figuring out how to make the best use of the hardware is done by hand, by ML developers and hardware vendors."
This also means that the compiler and the service replace that manual tuning: today at the inference stage, when the model is ready to deploy, and tomorrow, perhaps, during actual development or training.
The key to TVM's appeal is higher performance in throughput and latency, as well as efficiency in computer power consumption. This matters more and more as neural networks grow larger and more challenging to run.
"There are some models that use a crazy amount of compute," Ceze observed, especially natural language processing models such as OpenAI's GPT-3, which are scaling toward trillions of neural weights, or parameters.
As these models scale up, they also bring "extreme costs," he said, "not just in training time but also in serving time. That is true of all modern machine learning models."
Therefore, unless models are optimized "by an order of magnitude," the most complex ones are not truly viable in production and remain research curiosities.
However, optimizing with TVM brings complexity of its own. "It takes a lot of work to get results the way they need to be," Ceze said.
OctoML simplifies things by making TVM more of a push-button experience. "From the end user's point of view, they upload a model, compare models, and optimize the values across a large set of hardware targets," Ceze said. "The key is that it is automatic: no sweat and tears from low-level engineers writing code." OctoML's development work is therefore to make sure models can be optimized for a growing range of hardware.
"The key here is to make the most of each piece of hardware. That means specializing the machine code for a particular machine learning model on a particular piece of hardware." Something like a typical convolutional neural network may be optimized to fit the specific hardware blocks of a particular hardware accelerator.
The results bear this out. In benchmark tests on the MLPerf test suite for neural network inference, OctoML achieved a top score for inference performance on the venerable ResNet (residual network) image recognition algorithm.
Partnerships reflect OctoML's value
OctoML's service has been in a pre-release, early-access state since December of last year.
To advance its platform strategy, OctoML announced earlier this month that it had raised $85 million in a Series C round from hedge fund Tiger Global Management and existing investors Addition, Madrona Venture Group and Amplify Partners. The round brings OctoML's total funding to $132 million.
The funding is part of OctoML's effort to extend Apache TVM to a growing range of AI hardware. Also this month, OctoML announced a partnership with British company ARM Ltd., which is in the process of being acquired by AI chip giant Nvidia. Before that, the company had announced partnerships with Advanced Micro Devices and Qualcomm.
The ARM partnership will extend the use of OctoML's service to licensees of the ARM CPU core, which dominates mobile phones, networking, and the Internet of Things.
Beyond the design of neural networks, the feedback loop may bring other changes. It may broaden how ML is deployed commercially, which is, after all, the point of MLOps.
Ceze predicts that optimization with TVM could greatly improve the portability of ML serving.
Because the cloud offers all kinds of trade-offs across all kinds of hardware offerings, being able to optimize on the fly for different hardware targets ultimately means being able to move more nimbly from one target to another.
"Essentially, being able to squeeze more performance out of any hardware target in the cloud is useful because it gives you more flexibility in choosing a target," is how Ceze described it. "Being able to optimize automatically gives portability, and portability gives choice.
That includes running on any hardware available in a cloud configuration, but also choosing the hardware that happens to meet the same SLA on measures such as latency, throughput, and cost."
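The choice Ceze describes can be pictured as a small selection problem: among the hardware targets that satisfy a service-level agreement, pick the cheapest. The sketch below is only an illustration under stated assumptions. The `Target` class, the `cheapest_target` function, the target names, and every number in `TARGETS` are made up for the example and are not measurements from OctoML or any cloud provider.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Target:
    """One hypothetical cloud hardware option with assumed measurements."""
    name: str
    latency_ms: float      # time per request
    throughput_rps: float  # requests served per second
    cost_per_hour: float   # price in dollars


def cheapest_target(
    targets: Iterable[Target],
    max_latency_ms: float,
    min_throughput_rps: float,
) -> Optional[Target]:
    """Return the cheapest target that meets the SLA, or None if none does."""
    eligible = [
        t for t in targets
        if t.latency_ms <= max_latency_ms
        and t.throughput_rps >= min_throughput_rps
    ]
    return min(eligible, key=lambda t: t.cost_per_hour, default=None)


# Made-up figures, purely for illustration.
TARGETS = [
    Target("cpu-small", latency_ms=40.0, throughput_rps=50.0, cost_per_hour=0.10),
    Target("cpu-large", latency_ms=18.0, throughput_rps=120.0, cost_per_hour=0.40),
    Target("gpu", latency_ms=5.0, throughput_rps=900.0, cost_per_hour=1.20),
]

if __name__ == "__main__":
    choice = cheapest_target(TARGETS, max_latency_ms=20.0, min_throughput_rps=100.0)
    print(choice.name if choice else "no target meets the SLA")
```

The point of the sketch is that better-optimized models change the inputs: if optimization lowers latency on a cheaper target, that target may become eligible under the same SLA.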