OPUS-MT Deployment

Introduction

OPUS-MT is an open-source neural machine translation project from the University of Helsinki. Its models are built on the Marian NMT framework and trained on large-scale parallel corpora from the OPUS collection. Based on the Transformer architecture, the OPUS-MT series captures source-text semantics and produces natural, fluent translations in the target language. It is widely used for cross-lingual text conversion, content localization, and multilingual communication, providing efficient and accurate translation for diverse requirements.

This chapter demonstrates how to deploy, load, and run translation with opus-mt-en-zh on edge devices. The following deployment method is provided:

  • AidLite Python API

In this example, model inference runs on the device-side NPU computing unit. The code calls the relevant interfaces to receive user input and return translation results.

Supported Platforms

Platform      Runtime Method
--------      --------------
Rhino Pi-X1   Ubuntu 22.04, AidLux

Prerequisites

  1. Rhino Pi-X1 Hardware
  2. Ubuntu 22.04 OS or AidLux OS

Download Model Resources

bash
mms list opus-mt

#------------------------ Available opus-mt series models ------------------------
Model           Precision  Chipset           Backend
-----           ---------  -------           -------
opus-mt-en-es   FP16       Qualcomm QCS8550  QNN2.36
opus-mt-en-zh   FP16       Qualcomm QCS8550  QNN2.36
opus-mt-es-en   FP16       Qualcomm QCS8550  QNN2.36
opus-mt-zh-en   FP16       Qualcomm QCS8550  QNN2.36

# Download the model
mms get -m opus-mt-en-zh -p fp16 -c qcs8550 -b qnn2.36 -d /home/aidlux/opus-mt-en-zh
cd /home/aidlux/opus-mt-en-zh

# Extract
unzip opus-mt-en-zh_qcs8550_qnn233_fp16_aidlite.zip

💡Note

Developers can also search for and download models via the Model Farm web interface.

AidLite SDK Installation

Developers can also refer to the README.md within the model folder for SDK installation instructions.

  • Ensure the QNN backend version is ≥ 2.31.
  • Ensure the aidlite-sdk and aidlite-qnnxxx package versions are 2.3.x.
bash
# Check AidLite & QNN versions
dpkg -l | grep aidlite
#------------------------ Example Output ------------------------
ii  aidlite-qnn236       2.3.0.230         arm64        aidlux aidlite qnn236 backend plugin
ii  aidlite-sdk           2.3.0.230         arm64        aidlux inference module sdk

Updating QNN & AidLite versions:

bash
# Install AidLite SDK
sudo aid-pkg update
sudo aid-pkg install aidlite-sdk
sudo aid-pkg install aidlite-qnn236

# Check the AidLite SDK (C++ library) version via the Python binding
python3 -c "import aidlite; print(aidlite.get_library_version())"

# Check the AidLite SDK Python library version
python3 -c "import aidlite; print(aidlite.get_py_library_version())"
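The 2.3.x requirement above can also be checked programmatically. The sketch below is a minimal, hypothetical version gate: it assumes `aidlite.get_py_library_version()` returns a dotted version string such as "2.3.0.230" (as in the dpkg output above), and falls back to an example value when the `aidlite` module is not available off-device.

```python
# Hypothetical version check for the AidLite SDK requirement (>= 2.3.x).
# Assumes the version is a dotted string like "2.3.0.230".
def is_supported(version: str, required=(2, 3)) -> bool:
    """Return True if the major.minor part of `version` is >= `required`."""
    major_minor = tuple(int(p) for p in version.split(".")[:2])
    return major_minor >= required

if __name__ == "__main__":
    try:
        import aidlite  # only available on AidLux devices
        version = aidlite.get_py_library_version()
    except ImportError:
        version = "2.3.0.230"  # example value taken from the dpkg output above
    print(version, "supported:", is_supported(version))
```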

AidLite Python API Deployment

Install Python Dependencies

bash
cd /home/aidlux/opus-mt-en-zh/model_farm_opus-mt-en-zh_qcs8550_qnn233_fp16_aidlite/

pip install -r requirements.txt

Run Python API Example

bash
cd /home/aidlux/opus-mt-en-zh/model_farm_opus-mt-en-zh_qcs8550_qnn233_fp16_aidlite/

python main.py

You can view the model inference latency (in seconds) and the translation results in the terminal:

plain
{'input_ids': tensor([[  906,     32,      3, 13193,  1657,     25,      0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1]])}
encoder time: 0.007657289505004883
decoder time: 0.004458427429199219
decoder time: 0.004180431365966797
decoder time: 0.004227876663208008
decoder time: 0.004235744476318359
decoder time: 0.004320621490478516
[4393, 17546, 6008, 25]
今天天气怎么样?

Developers can modify the input sentences in main.py.
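The per-step decoder timings in the output above reflect MarianMT-style autoregressive generation: the encoder runs once over the tokenized input, then the decoder is invoked once per output token until the end-of-sequence token is produced. A minimal sketch of that loop follows, where `run_encoder` and `run_decoder_step` are hypothetical stand-ins for the on-device QNN inference calls in main.py, `EOS_ID = 0` is assumed from the trailing 0 in the example input_ids, and `BOS_ID` is an illustrative decoder start token:

```python
# Sketch of greedy autoregressive decoding as used by MarianMT-style models.
# run_encoder / run_decoder_step are hypothetical stand-ins for the actual
# on-device (QNN/AidLite) inference calls.

EOS_ID = 0      # assumed end-of-sequence id (matches the trailing 0 above)
BOS_ID = 65000  # hypothetical decoder start token id

def run_encoder(input_ids):
    """Stand-in: returns 'encoder states' (here, just the input ids)."""
    return input_ids

def run_decoder_step(encoder_states, generated):
    """Stand-in: returns the next token id (here, echoes the input)."""
    step = len(generated) - 1  # tokens generated so far
    return encoder_states[step] if step < len(encoder_states) else EOS_ID

def greedy_translate(input_ids, max_len=64):
    encoder_states = run_encoder(input_ids)  # encoder runs once
    generated = [BOS_ID]
    for _ in range(max_len):                 # decoder runs once per token
        next_id = run_decoder_step(encoder_states, generated)
        if next_id == EOS_ID:
            break
        generated.append(next_id)
    return generated[1:]                     # drop the start token

print(greedy_translate([906, 32, 3, 13193, 1657, 25]))
```

With the stub decoder, the loop simply echoes the input ids and stops at EOS; in main.py the real decoder step instead returns the argmax over the model's output logits, and the resulting ids are detokenized into the translated sentence.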