OPUS-MT Deployment
Introduction
OPUS-MT is an open-source neural machine translation project developed by the University of Helsinki. The models are built on the Marian NMT framework's high-performance Transformer architecture and trained on large-scale parallel corpora from the OPUS dataset. The Transformer architecture lets the OPUS-MT series capture text semantics accurately and render them as natural, fluent target-language text. Widely applied in scenarios such as cross-lingual text conversion, content localization, and multilingual communication, it provides efficient, accurate translation services for diverse translation requirements.
This chapter demonstrates the deployment, loading, and translation workflow of opus-mt-en-zh on edge devices. The following deployment method is provided:
- AidLite Python API
In this case, model inference runs on the device-side NPU computing unit; the example code calls the relevant interfaces to receive user input and return the translation results.
- Device: Rhino Pi-X1
- OS: Ubuntu 22.04
- Source Model: opus-mt-en-zh
- Quantized Precision: FP16
- Model Farm Reference: opus-mt-en-zh (FP16)
Supported Platforms
| Platform | Runtime Method |
|---|---|
| Rhino Pi-X1 | Ubuntu 22.04, AidLux |
Prerequisites
- Rhino Pi-X1 Hardware
- Ubuntu 22.04 OS or AidLux OS
Download Model Resources
mms list opus-mt
#------------------------ Available opus-mt series models ------------------------
Model Precision Chipset Backend
----- --------- ------- -------
opus-mt-en-es FP16 Qualcomm QCS8550 QNN2.36
opus-mt-en-zh FP16 Qualcomm QCS8550 QNN2.36
opus-mt-es-en FP16 Qualcomm QCS8550 QNN2.36
opus-mt-zh-en FP16 Qualcomm QCS8550 QNN2.36
# Download the model
mms get -m opus-mt-en-zh -p fp16 -c qcs8550 -b qnn2.36 -d /home/aidlux/opus-mt-en-zh
cd /home/aidlux/opus-mt-en-zh
# Extract
unzip opus-mt-en-zh_qcs8550_qnn233_fp16_aidlite.zip
💡Note
Developers can also search for and download models via the Model Farm web interface.
AidLite SDK Installation
Developers can also refer to the README.md within the model folder for SDK installation instructions.
- Ensure the QNN backend version is ≥ 2.31.
- Ensure the versions of aidlite-sdk and aidlite-qnnxxx are 2.3.x.
# Check AidLite & QNN versions
dpkg -l | grep aidlite
#------------------------ Example Output ------------------------
ii aidlite-qnn236 2.3.0.230 arm64 aidlux aidlite qnn236 backend plugin
ii aidlite-sdk 2.3.0.230 arm64 aidlux inference module sdk
Updating QNN & AidLite versions:
# Install AidLite SDK
sudo aid-pkg update
sudo aid-pkg install aidlite-sdk
sudo aid-pkg install aidlite-qnn236
# aidlite sdk c++ check
python3 -c "import aidlite; print(aidlite.get_library_version())"
# aidlite sdk python check
python3 -c "import aidlite; print(aidlite.get_py_library_version())"
AidLite Python API Deployment
Install Python Dependencies
cd /home/aidlux/opus-mt-en-zh/model_farm_opus-mt-en-zh_qcs8550_qnn233_fp16_aidlite/
pip install -r requirements.txt
Run Python API Example
cd /home/aidlux/opus-mt-en-zh/model_farm_opus-mt-en-zh_qcs8550_qnn233_fp16_aidlite/
python main.py
You can view the model inference latency and the translation result in the terminal:
{'input_ids': tensor([[ 906, 32, 3, 13193, 1657, 25, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1]])}
encoder time: 0.007657289505004883
decoder time: 0.004458427429199219
decoder time: 0.004180431365966797
decoder time: 0.004227876663208008
decoder time: 0.004235744476318359
decoder time: 0.004320621490478516
[4393, 17546, 6008, 25]
今天天气怎么样?
Developers can modify the input sentences in main.py.
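The log above shows the encoder running once per sentence and the decoder running once per generated token, which is the standard autoregressive pattern for encoder-decoder translation models. A minimal pure-Python sketch of that control flow follows; the stub functions stand in for the compiled QNN encoder/decoder graphs, and the function names and echo behavior are illustrative, not the actual implementation in main.py:

```python
def run_encoder(input_ids):
    # Encode the source sentence once into hidden states.
    # Stub: the real model produces Transformer encoder states on the NPU.
    return [float(t) for t in input_ids]

def run_decoder(encoder_states, generated):
    # Predict the next token id from the encoder states and the tokens so far.
    # Stub: echoes the source ids, then emits the end-of-sequence id 0.
    step = len(generated)
    return int(encoder_states[step]) if step < len(encoder_states) else 0

def greedy_translate(input_ids, eos_id=0, max_len=32):
    states = run_encoder(input_ids)   # encoder runs exactly once
    generated = []
    while len(generated) < max_len:
        next_id = run_decoder(states, generated)  # decoder runs per token
        if next_id == eos_id:
            break
        generated.append(next_id)
    return generated

print(greedy_translate([906, 32, 3, 13193]))  # → [906, 32, 3, 13193]
```

This is why one "encoder time" line and several "decoder time" lines appear per translation: total latency grows with the length of the generated output.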