AidGen C++ API Documentation
AidLLM C++ API Documentation
💡Note
Before developing with AidGen-SDK C++, please be aware of the following basics:
- During compilation, include the header file located at /usr/local/include/aidlux/aidgen/aidllm.hpp
- During linking, specify the library file located at /usr/local/lib/libaidgen.so
- All interfaces are under the aplux::aidllm namespace
Inference Backend Type.enum LLmBackendType
AidllmSDK supports different inference backend frameworks for running LLM inference tasks. The available inference backends are listed below.
| Member Name | Type | Value | Description |
|---|---|---|---|
| TYPE_DEFAULT | uint8_t | 0 | Unknown backend type |
| TYPE_GENIE | uint8_t | 1 | Genie inference backend |
Inference Task State.enum LLMSentenceState
During an inference task, a single session may go through multiple stages. Developers can use these state codes to understand the current runtime status of the inference task.
| Member Name | Type | Value | Description |
|---|---|---|---|
| BEGIN | enum class | 0 | Session start segment |
| CONTINUE | enum class | 1 | Intermediate content during ongoing session inference |
| END | enum class | 2 | Session ending segment |
| COMPLETE | enum class | 3 | Current session completed successfully |
| ABORT | enum class | 4 | Current session terminated passively |
| ERROR | enum class | 5 | Current session inference error |
Interpreter Runtime State.enum LLMState
The overall state of the Aidllm interpreter during runtime. Developers can query this state to understand the interpreter's current working status.
| Member Name | Type | Value | Description |
|---|---|---|---|
| STANDIDLE | enum class | 0 | Idle standby state |
| BUSYING | enum class | 1 | Busy processing inference |
| ABORT | enum class | 2 | Inference has been terminated |
| ERROR | enum class | 3 | Inference encountered an error |
Log Level.enum LogLevel
AidllmSDK provides a logging API (see set_log_level() below). This enum defines the log levels that can be passed to it.
| Member Name | Type | Value | Description |
|---|---|---|---|
| INFO | uint8_t | 0 | Message |
| WARNING | uint8_t | 1 | Warning |
| ERROR | uint8_t | 2 | Error |
| FATAL | uint8_t | 3 | Fatal error |
Global Functions
Get Library Version.get_library_version()
Gets the version information string of the current Aidllm library.
| API | get_library_version |
| Description | Gets the version information of the Aidllm library |
| Parameters | void |
| Return Value | A string containing the library version information |
std::string version = aplux::aidllm::get_library_version();
printf("Current aidllm library version: %s\n", version.c_str());
Set Log Level.set_log_level()
Sets the minimum log output level for Aidllm. Logs below this level will not be output.
| API | set_log_level |
| Description | Sets the minimum log output level |
| Parameters | log_level: LogLevel enum value specifying the minimum log level |
| Return Value | void |
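The threshold rule described above ("logs below this level will not be output") can be pictured with a small sketch. The enum here is a local mirror of the LogLevel table, and passes_threshold is a hypothetical helper for illustration only; neither is part of the SDK.

```cpp
#include <cstdint>

// Local mirror of the LogLevel table (values 0-3); not the SDK's own definition.
enum class LogLevel : std::uint8_t { INFO = 0, WARNING = 1, ERROR = 2, FATAL = 3 };

// Hypothetical helper: true when a message at `message_level` would be kept
// under a minimum level of `minimum_level`.
inline bool passes_threshold(LogLevel message_level, LogLevel minimum_level) {
    return static_cast<std::uint8_t>(message_level) >=
           static_cast<std::uint8_t>(minimum_level);
}
```

With the minimum level set to ERROR, only ERROR and FATAL messages pass; INFO and WARNING are dropped.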
aplux::aidllm::set_log_level(aplux::aidllm::LogLevel::ERROR);
Set Log File Prefix.set_log_file_prefix()
Sets the log file name prefix for outputting logs to files with the specified prefix.
| API | set_log_file_prefix |
| Description | Sets the log file name prefix |
| Parameters | log_file_prefix: Log file name prefix string |
| Return Value | void |
aplux::aidllm::set_log_file_prefix("aidllm_log_");
Inference Callback Function Type.LLMCallback
The callback function type definition used during Aidllm inference. Developers need to implement a callback function of this type to handle inference results.
using LLMCallback = std::function<int32_t(LLMCallbackData& cb_data, void* user_data)>;
💡Note
The callback function return type is int32_t. Returning 0 indicates normal continuation of inference; a non-zero value can be used to control the inference flow.
Inference Callback Data Type.struct LLMCallbackData
During inference tasks, Aidllm uses developer-provided callback functions. This data type is the argument passed to that callback function, and developers can use it in custom callbacks to process inference results.
Member List
The LLMCallbackData struct contains the following members:
| Member | state |
| Type | enum LLMSentenceState |
| Default Value | |
| Description | Status code of the current inference session |
| Member | text |
| Type | std::string |
| Default Value | |
| Description | Result text of the inference task / message corresponding to special status codes |
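One way to use the state and text members together is to collect the streamed reply through user_data instead of printing it. The sketch below redeclares the documented types locally so it compiles without the SDK. Transcript and collect_reply are hypothetical names, and treating BEGIN, CONTINUE, and END as the segments that carry reply text is an assumption based on the state table.

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Stand-ins mirroring the documented types; not the SDK's own definitions.
enum class LLMSentenceState { BEGIN = 0, CONTINUE = 1, END = 2, COMPLETE = 3, ABORT = 4, ERROR = 5 };
struct LLMCallbackData {
    LLMSentenceState state;
    std::string text;
};
using LLMCallback = std::function<std::int32_t(LLMCallbackData& cb_data, void* user_data)>;

// Hypothetical accumulator handed to the callback through user_data.
struct Transcript {
    std::string reply;       // concatenated BEGIN / CONTINUE / END text
    std::string diagnostic;  // message delivered with COMPLETE / ABORT / ERROR
    bool failed = false;
};

inline std::int32_t collect_reply(LLMCallbackData& cb_data, void* user_data) {
    auto* out = static_cast<Transcript*>(user_data);
    if (out == nullptr) return 0;
    switch (cb_data.state) {
        case LLMSentenceState::BEGIN:
        case LLMSentenceState::CONTINUE:
        case LLMSentenceState::END:
            out->reply += cb_data.text;
            break;
        case LLMSentenceState::ERROR:
            out->failed = true;
            out->diagnostic = cb_data.text;
            break;
        case LLMSentenceState::COMPLETE:
        case LLMSentenceState::ABORT:
            out->diagnostic = cb_data.text;
            break;
    }
    return 0;  // 0 = continue normally, as stated in the note on the return value
}
```

The Transcript object has to outlive the inference call, because the callback only receives a raw pointer to it.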
Runtime Context Class.class LLMContext
While Aidllm runs, configuration must be set and runtime data must be passed between components. An LLMContext object holds this configuration and data.
Create Instance Object.create_instance()
To set runtime context information, you first need a configuration instance object. This function is used to create an instance object of type LLMContext.
| API | create_instance |
| Description | Used to construct an instance object of class LLMContext |
| Parameters | config_file: Initial configuration file, where key information such as backend type and model file names can be configured |
| Return Value | If it is nullptr, object construction failed; otherwise, it is a unique_ptr to an LLMContext object |
// Create a configuration instance object; report an error if the return value is null
std::unique_ptr<LLMContext> llm_context_ptr = LLMContext::create_instance("qwen2-7b/qwen2-7b.json");
if(llm_context_ptr == nullptr){
printf("Test sample: LLMContext create_instance failed.\n");
return EXIT_FAILURE;
}
Member List
The LLMContext object is used to manage runtime configuration information, including the following parameters:
| Member | config_file |
| Type | std::string |
| Default Value | |
| Description | Initial configuration file, the config file parameter passed when creating the object |
| Member | backend_type |
| Type | LLmBackendType |
| Default Value | LLmBackendType::TYPE_DEFAULT |
| Description | The developer is required to specify the inference backend in the config file. After initialization parses the config file, this field will be overwritten to indicate the backend type specified by the developer |
| Member | model_file_vec |
| Type | std::vector<std::string> |
| Default Value | |
| Description | The developer is required to specify model files in the config file. After initialization parses the config file, this field will be overwritten to indicate the model files specified by the developer |
| Member | config_overwrite_options |
| Type | std::string |
| Default Value | |
| Description | By setting this field, you can specify certain key parameters in the inference process, thereby affecting inference speed, inference results, etc. |
| Member | android_tmp_directory |
| Type | std::string |
| Default Value | |
| Description | This field is only valid on the Android platform. By setting this field, you can specify a directory for which the system user has valid permissions, for temporary use by the inference program |
Interpreter Class.class LLMInterpreter
An object instance of type LLMInterpreter is the main executor of inference operations and is used to carry out specific inference processes.
Create Instance Object.create_instance()
To perform inference-related operations, an inference interpreter is essential. This function is used to construct an instance object of the inference interpreter.
| API | create_instance |
| Description | Uses various data managed by the LLMContext object to construct an object of type LLMInterpreter |
| Parameters | llm_context: Reference to the unique_ptr of an LLMContext instance object (std::unique_ptr<LLMContext>&) |
| | reserve: Reserved field, default value is nullptr |
| Return Value | If it is nullptr, object construction failed; otherwise, it is a unique_ptr to an LLMInterpreter object |
// Use the LLMContext object pointer to create the interpreter object; report an error if the return value is null
std::unique_ptr<LLMInterpreter> llm_interpreter_ptr = LLMInterpreter::create_instance(llm_context_ptr);
if(llm_interpreter_ptr == nullptr){
printf("Test sample: LLMInterpreter create_instance failed.\n");
return EXIT_FAILURE;
}
Initialization Operation.initialize()
After the interpreter object is created, some initialization operations are required, such as environment checks and resource construction.
| API | initialize |
| Description | Completes the initialization work required for inference |
| Parameters | enable_profiler: Whether to enable the profiler, default value is false |
| | reserve: Reserved field, default value is nullptr |
| Return Value | A value of 0 indicates successful initialization; otherwise a non-zero value indicates failure |
// Initialize the interpreter; report an error if the return value is non-zero
int init_result = llm_interpreter_ptr->initialize();
if(init_result != EXIT_SUCCESS){
printf("Test sample: aidllm initialize failed.\n");
return EXIT_FAILURE;
}
Sampling Parameter Setup Operation.set_sampler()
After initialization completes successfully, sampling parameters can be set with this function to control the randomness, diversity, and quality of generated content.
| API | set_sampler |
| Description | Sets sampling parameters to control the randomness and diversity of LLM outputs. |
| Parameters | key: Name of the sampling parameter (the example below uses temp, top-k, and top-p) |
| | value: Parameter value represented as a string |
| Return Value | A value of 0 indicates success; a non-zero value indicates failure (e.g. invalid key or unsupported value format). |
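Because sampler values are passed as strings, numeric settings held in variables need to be formatted first. The helper below is a hypothetical sketch (sampler_value is not an SDK function) that produces the compact form used in this document's examples, such as "0.8" and "20". Which numeric formats the SDK actually accepts is not stated here, so the choice of format is an assumption.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: formats a numeric sampler setting as a compact decimal
// string (up to 6 significant digits, no trailing zeros).
inline std::string sampler_value(double v) {
    std::ostringstream out;
    out.imbue(std::locale::classic());  // always use '.' as the decimal separator
    out << v;
    return out.str();
}
```

Note that std::to_string(0.8) yields "0.800000", which is why a stream is used in this sketch.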
// Set sampling parameters
llm_interpreter_ptr->set_sampler("temp", "0.8");
llm_interpreter_ptr->set_sampler("top-k", "20");
llm_interpreter_ptr->set_sampler("top-p", "0.9");
Session Inference Operation.run()
After successful initialization, you can run dialog inference with the LLM. Developers provide a custom callback function to handle continuous inference results during the session.
| API | run |
| Description | Executes one session inference |
| Parameters | prompt: Prompt string |
| | cb: Callback function of type LLMCallback for handling continuous inference results during the session |
| | user_data: Pointer to user data, convenient for using this data in custom callback functions, default value is nullptr |
| Return Value | A value of 0 indicates the inference executed successfully; otherwise a non-zero value indicates failure |
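The prompt string in this section's example already contains chat-template markers. A small helper can build that string from a plain user message. build_single_turn_prompt is a hypothetical name, and the template is copied from the example prompt; whether other models expect a different template is not covered here.

```cpp
#include <string>

// Hypothetical helper: wraps one user message in the chat template used by the
// example prompt in this section.
inline std::string build_single_turn_prompt(const std::string& user_message) {
    return "<|im_start|>user\n" + user_message +
           "<|im_end|>\n<|im_start|>assistant\n";
}
```

Calling build_single_turn_prompt("Hello") returns the same string as the hard-coded prompt in the example.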
// Define callback function
LLMCallback dialog_callback = [&](LLMCallbackData& cb_data, void* user_data)->int32_t{
if(cb_data.state == LLMSentenceState::BEGIN){
printf("%s", cb_data.text.c_str());
}else if(cb_data.state == LLMSentenceState::CONTINUE){
printf("%s", cb_data.text.c_str());
fflush(stdout);
}else if(cb_data.state == LLMSentenceState::END){
printf("%s\n", cb_data.text.c_str());
}else if(cb_data.state == LLMSentenceState::COMPLETE){
printf("\n[COMPLETE]%s\n", cb_data.text.c_str());
}else if(cb_data.state == LLMSentenceState::ABORT){
printf("\n[ABORT]%s\n", cb_data.text.c_str());
}else if(cb_data.state == LLMSentenceState::ERROR){
printf("\n[ERROR]%s\n", cb_data.text.c_str());
}
return EXIT_SUCCESS;
};
// Execute inference
std::string prompt = "<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n";
int run_result = llm_interpreter_ptr->run(prompt, dialog_callback);
if(run_result != EXIT_SUCCESS){
printf("Test sample: aidllm run failed.\n");
return EXIT_FAILURE;
}
Query Inference State.state()
During inference, developers may need to query the current runtime state of the interpreter, such as determining whether it is idle or actively inferring.
| API | state |
| Description | Gets the current runtime state of the inference task |
| Parameters | state: Reference to an LLMState variable; the function will overwrite this variable with the current state |
| Return Value | A value of 0 indicates the query executed successfully; otherwise a non-zero value indicates failure |
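A typical use of a state query is polling until the interpreter is no longer busy, for example from a thread other than the one running inference. The sketch below is written against a stand-in query function so it compiles without the SDK. StateQuery and wait_while_busy are hypothetical names, and polling from a second thread is an assumption about usage, not something this document specifies.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Local mirror of the LLMState table; not the SDK's own definition.
enum class LLMState { STANDIDLE = 0, BUSYING = 1, ABORT = 2, ERROR = 3 };

// Stand-in for a call such as interpreter->state(s): fills `s`, returns 0 on success.
using StateQuery = std::function<int(LLMState&)>;

// Polls until the state is no longer BUSYING, the query fails, or `max_polls`
// is exhausted. Returns true only if a non-busy state was observed; `last`
// holds the most recent state read.
inline bool wait_while_busy(const StateQuery& query, LLMState& last,
                            int max_polls, std::chrono::milliseconds interval) {
    for (int i = 0; i < max_polls; ++i) {
        if (query(last) != 0) return false;          // query itself failed
        if (last != LLMState::BUSYING) return true;  // idle, aborted, or error
        std::this_thread::sleep_for(interval);
    }
    return false;  // still busy after max_polls attempts
}
```

The caller should inspect `last` afterwards, since a non-busy state can also be ABORT or ERROR.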
LLMState current_state = LLMState::STANDIDLE;
llm_interpreter_ptr->state(current_state);
printf("Current state: %d\n", (int)current_state);
Session Termination Operation.abort()
In some situations, users may want to interrupt the session that is currently running inference. This function is used to terminate inference.
⚠️Warning
It is strictly forbidden to call the abort function inside the callback function (LLMCallback), as this may cause deadlocks or undefined behavior.
| API | abort |
| Description | Terminates the currently running inference session |
| Parameters | reserve: Reserved field, default value is nullptr |
| Return Value | A value of 0 indicates successful termination; otherwise a non-zero value indicates failure |
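Since abort must not be called from inside the callback, one possible pattern is to let the callback only raise a flag and have a separate thread perform the abort. The sketch below shows that pattern against a stand-in class. FakeInterpreter and run_with_token_budget are hypothetical and exist only to make the example runnable; the real interpreter's threading behavior is not described here and should be checked before reusing this.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Minimal stand-in: run() streams tokens until abort() is called.
// It is NOT the SDK class; it only gives the pattern something to run against.
class FakeInterpreter {
public:
    int run(const std::function<int(const std::string&)>& on_token) {
        while (!stop_.load()) {
            on_token("tok");
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        return 0;
    }
    int abort() { stop_.store(true); return 0; }
private:
    std::atomic<bool> stop_{false};
};

// Runs inference on the calling thread. The callback only sets a flag; the
// watcher thread is the one that calls abort(). Returns the tokens seen.
inline int run_with_token_budget(FakeInterpreter& interp, int max_tokens) {
    std::atomic<bool> want_abort{false};
    std::atomic<bool> finished{false};
    int seen = 0;

    std::thread watcher([&] {
        while (!want_abort.load() && !finished.load()) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        if (want_abort.load()) interp.abort();  // called outside the callback
    });

    interp.run([&](const std::string&) {
        if (++seen >= max_tokens) want_abort.store(true);  // signal only
        return 0;
    });

    finished.store(true);
    watcher.join();
    return seen;
}
```

The token count can slightly exceed the budget, because tokens keep arriving until the watcher thread notices the flag.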
// Terminate inference in another thread
int abort_result = llm_interpreter_ptr->abort();
if(abort_result != EXIT_SUCCESS){
printf("Test sample: aidllm abort failed.\n");
return EXIT_FAILURE;
}
Final Release Operation.finalize()
As mentioned above, the interpreter object needs to run initialize() for initialization. Correspondingly, the interpreter also needs to run release operations to destroy previously created resources.
| API | finalize |
| Description | Completes necessary de-initialization and release operations |
| Parameters | reserve: Reserved field, default value is nullptr |
| Return Value | A value of 0 indicates the release operation executed successfully; otherwise a non-zero value indicates failure |
// Execute interpreter de-initialization; report an error if the return value is non-zero
int fin_result = llm_interpreter_ptr->finalize();
if(fin_result != EXIT_SUCCESS){
printf("Test sample: aidllm finalize failed.\n");
return EXIT_FAILURE;
}
Get Profiler.get_profiler()
When the profiler is enabled during initialization (enable_profiler = true), this function can be used to obtain the profiler object pointer for performance data collection and analysis. For detailed usage, refer to the "Profiler C++ API Documentation" section below.
| API | get_profiler |
| Description | Gets the pointer to the profiler object |
| Parameters | void |
| Return Value | If the profiler is enabled, returns a Profiler object pointer; if not enabled, returns nullptr |
// Enable profiler during initialization
int init_result = llm_interpreter_ptr->initialize(true);
// Get the profiler
aplux::aidgen::Profiler* profiler = llm_interpreter_ptr->get_profiler();
AidMLM C++ API Documentation
💡Note
Before developing with AidMLM-SDK C++, please be aware of the following basics:
- During compilation, include the header file located at /usr/local/include/aidlux/aidgen/aidmlm.hpp
- During linking, specify the library file located at /usr/local/lib/libaidgen.so
- All interfaces are under the aplux::aidmlm namespace
- AidMLM is designed for multimodal large model (vision-language model) inference, currently supporting Qwen2-VL and Qwen2.5-VL series models
Inference State.enum AidLLMState
During an AidMLM inference task, a single session may go through various stages. Developers can use these state codes to understand the current runtime status of the inference task.
| Member Name | Type | Value | Description |
|---|---|---|---|
| STAND | enum class | 0 | Not yet working |
| START | enum class | 1 | Inference started |
| BUSYING | enum class | 2 | Inference in progress |
| FINISH | enum class | 3 | Inference finished |
| COMPLETE | enum class | 4 | Inference completed fully or truncated |
| WAITING | enum class | 5 | Current token decoding failed, waiting for next decode |
| ABORT | enum class | 6 | Current inference terminated early by developer |
| ERROR | enum class | 7 | Inference failed due to exception |
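When logging callback activity, it helps to print state names instead of raw numbers. The sketch below mirrors the table above in a local enum and maps each value to its member name. state_name is a hypothetical helper and the enum is a stand-in, not the SDK definition.

```cpp
#include <string>

// Local mirror of the AidLLMState table; not the SDK's own definition.
enum class AidLLMState {
    STAND = 0, START = 1, BUSYING = 2, FINISH = 3,
    COMPLETE = 4, WAITING = 5, ABORT = 6, ERROR = 7
};

// Hypothetical helper: returns the member name for a state value.
inline std::string state_name(AidLLMState s) {
    switch (s) {
        case AidLLMState::STAND:    return "STAND";
        case AidLLMState::START:    return "START";
        case AidLLMState::BUSYING:  return "BUSYING";
        case AidLLMState::FINISH:   return "FINISH";
        case AidLLMState::COMPLETE: return "COMPLETE";
        case AidLLMState::WAITING:  return "WAITING";
        case AidLLMState::ABORT:    return "ABORT";
        case AidLLMState::ERROR:    return "ERROR";
    }
    return "UNKNOWN";  // value outside the documented range
}
```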
Log Level.enum LogLevel
| Member Name | Type | Value | Description |
|---|---|---|---|
| INFO | uint8_t | 0 | Message |
| WARNING | uint8_t | 1 | Warning |
| ERROR | uint8_t | 2 | Error |
| FATAL | uint8_t | 3 | Fatal error |
Model Type.enum ModelType
Specifies the type of multimodal model currently in use.
| Member Name | Type | Value | Description |
|---|---|---|---|
| RESERVED | enum class | 0 | Reserved type |
| QWEN2VL | enum class | 1 | Qwen2-VL model |
| QWEN25VL | enum class | 2 | Qwen2.5-VL model |
Inference Callback Data Type.struct AidLLMCBData
During AidMLM inference tasks, developer-provided callback functions are used. This data type is the argument passed to that callback function.
Member List
| Member | state |
| Type | enum AidLLMState |
| Default Value | |
| Description | Status code of the current inference session |
| Member | text |
| Type | std::string |
| Default Value | |
| Description | Result text of the inference task / message corresponding to special status codes |
Inference Callback Function Type.AidLLMCB
The callback function type definition used during AidMLM inference.
using AidLLMCB = std::function<void(AidLLMCBData& cb_data, void* user_data)>;
Image Data Type.struct ImageData
A struct for passing image data to the multimodal model.
Member List
| Member | img_pos |
| Type | int |
| Default Value | -1 |
| Description | Position index of the image in the prompt. If -1, the image is appended at the end of the prompt |
| Member | img_data |
| Type | uint8_t* |
| Default Value | nullptr |
| Description | Image data pointer, pointing to RGB format image pixel data. Developers need to pre-resize to the model's required width and height |
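Since img_data must point to RGB pixels, buffers that arrive in BGR order (as OpenCV's imread produces) need their channels swapped first. The sketch below does this without any imaging library. bgr_to_rgb is a hypothetical helper, and it assumes tightly packed 8-bit pixels with three channels.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: converts tightly packed 8-bit BGR pixels to RGB by
// swapping the first and third channel of every pixel.
inline std::vector<std::uint8_t> bgr_to_rgb(const std::vector<std::uint8_t>& bgr) {
    std::vector<std::uint8_t> rgb(bgr);
    for (std::size_t i = 0; i + 2 < rgb.size(); i += 3) {
        std::swap(rgb[i], rgb[i + 2]);
    }
    return rgb;
}
```

Because img_data is a raw pointer, the converted buffer has to stay alive for as long as the ImageData entry that refers to it is in use.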
Initialization Parameter Type.struct AidmlmInitParam
Configuration parameters required for AidMLM initialization.
Member List
| Member Name | Type | Default Value | Description |
|---|---|---|---|
| vision_model_path | std::string | | Vision encoder model file path |
| pos_emb_cos_path | std::string | | Position encoding cosine weight file path |
| pos_emb_sin_path | std::string | | Position encoding sine weight file path |
| embedding_weights_path | std::string | | Word embedding weights file path |
| window_attention_mask_path | std::string | | Window attention mask file path (Qwen2.5-VL only) |
| full_attention_mask_path | std::string | | Full attention mask file path (Qwen2.5-VL only) |
| llm_model_path_vec | std::vector<std::string> | | LLM model file path list |
| dbg_opt | std::string | | Debug options string |
| type | ModelType | ModelType::RESERVED | Multimodal model type |
| qwen2vl_cfg | Qwen2VLConfig | | Qwen2-VL model configuration |
| qwen25vl_cfg | Qwen25VLConfig | | Qwen2.5-VL model configuration |
| enable_profiler | bool | false | Whether to enable the profiler |
| genie_log_level | int | 1 | Genie backend log level (1=ERROR, 2=WARN, 3=INFO, 4=VERBOSE) |
| use_shared_buffer | bool | false | Whether to use shared buffer |
| use_mmap | bool | false | Whether to use memory-mapped model loading |
| use_genie_load_model_ex | bool | false | Whether to use Genie extended model loading |
Model Configuration Structs
AidMLM provides predefined model configuration structs to specify vision model configurations for different resolutions and parameter scales. Developers can choose the corresponding configuration based on the model in use.
| Config Struct Name | Model | Image Size | Embedding Dim |
|---|---|---|---|
| Qwen2VLConfig | Qwen2-VL | 644×644 | 1536 |
| Qwen25VLConfig | Qwen2.5-VL 3B | 392×392 | 2048 |
| Qwen25VL3B644Config | Qwen2.5-VL 3B | 644×644 | 2048 |
| Qwen25VL3B672Config | Qwen2.5-VL 3B | 672×672 | 2048 |
| Qwen25VL7B392Config | Qwen2.5-VL 7B | 392×392 | 3584 |
| Qwen25VL7B644Config | Qwen2.5-VL 7B | 644×644 | 3584 |
| Qwen25VL7B672Config | Qwen2.5-VL 7B | 672×672 | 3584 |
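The image size in this table determines how large the pre-resized RGB buffer must be. The sketch below copies the table into a lookup array and computes the buffer size. VisionInputSpec, find_spec, and rgb_buffer_bytes are hypothetical names, and the 3-bytes-per-pixel layout is an assumption based on the uint8_t RGB buffer described for ImageData.

```cpp
#include <cstddef>
#include <string>

// One row of the configuration table above.
struct VisionInputSpec {
    const char* config_name;
    int image_size;     // width and height in pixels (square input)
    int embedding_dim;
};

// Values copied from the configuration table above.
static const VisionInputSpec kVisionSpecs[] = {
    {"Qwen2VLConfig",       644, 1536},
    {"Qwen25VLConfig",      392, 2048},
    {"Qwen25VL3B644Config", 644, 2048},
    {"Qwen25VL3B672Config", 672, 2048},
    {"Qwen25VL7B392Config", 392, 3584},
    {"Qwen25VL7B644Config", 644, 3584},
    {"Qwen25VL7B672Config", 672, 3584},
};

// Returns the table row for `config_name`, or nullptr if the name is not listed.
inline const VisionInputSpec* find_spec(const std::string& config_name) {
    for (const VisionInputSpec& spec : kVisionSpecs) {
        if (config_name == spec.config_name) return &spec;
    }
    return nullptr;
}

// Bytes needed for one square, tightly packed 8-bit RGB image (assumed layout).
inline std::size_t rgb_buffer_bytes(int image_size) {
    return static_cast<std::size_t>(image_size) *
           static_cast<std::size_t>(image_size) * 3u;
}
```

For example, a 392×392 input needs 460992 bytes under this layout.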
Global Functions
Get Library Version.get_library_version()
| API | get_library_version |
| Description | Gets the version information of the AidMLM library |
| Parameters | void |
| Return Value | A string containing the library version information |
std::string version = aplux::aidmlm::get_library_version();
printf("Current aidmlm library version: %s\n", version.c_str());
Multimodal Inference Class.class Aidmlm
An object instance of type Aidmlm is the main executor of multimodal inference operations and is used to carry out vision-language model inference processes.
Construction and Destruction
Aidmlm objects are created via the default constructor.
aplux::aidmlm::Aidmlm mlm_ctx;
Set Log Level.set_log_level()
A static method that sets the minimum log output level for AidMLM.
| API | set_log_level |
| Description | Sets the minimum log output level (static method) |
| Parameters | log_level: LogLevel enum value |
| Return Value | void |
aplux::aidmlm::Aidmlm::set_log_level(aplux::aidmlm::LogLevel::INFO);
Set Log File Prefix.set_log_file_prefix()
A static method that sets the log file name prefix.
| API | set_log_file_prefix |
| Description | Sets the log file name prefix (static method) |
| Parameters | log_file: Log file name prefix string |
| Return Value | void |
aplux::aidmlm::Aidmlm::set_log_file_prefix("./test_mlm");
Initialization Operation.initialize()
Loads the multimodal model and initializes the inference environment.
| API | initialize |
| Description | Loads the model and completes the initialization work required for inference |
| Parameters | param: AidmlmInitParam struct reference containing model paths, configurations, and other initialization parameters |
| | enable_profiler: Whether to enable the profiler, default value is false |
| Return Value | A value of 0 indicates successful initialization; otherwise a non-zero value indicates failure |
aplux::aidmlm::AidmlmInitParam init_param;
init_param.type = aplux::aidmlm::ModelType::QWEN25VL;
init_param.vision_model_path = "/path/to/veg.serialized.bin.aidem";
init_param.pos_emb_cos_path = "/path/to/position_ids_cos.raw";
init_param.pos_emb_sin_path = "/path/to/position_ids_sin.raw";
init_param.embedding_weights_path = "/path/to/embedding_weights.raw";
init_param.window_attention_mask_path = "/path/to/window_attention_mask.raw";
init_param.full_attention_mask_path = "/path/to/full_attention_mask.raw";
init_param.llm_model_path_vec.push_back("/path/to/llm_model.serialized.bin.aidem");
init_param.use_genie_load_model_ex = true;
aplux::aidmlm::Aidmlm mlm_ctx;
if(mlm_ctx.initialize(init_param) != 0){
printf("AidMLM initialize failed.\n");
return EXIT_FAILURE;
}
Sampling Parameter Setup Operation.set_sampler()
After initialization completes successfully, sampling parameters can be set with this function to control the randomness, diversity, and quality of generated content.
| API | set_sampler |
| Description | Sets sampling parameters to control the randomness and diversity of LLM outputs. |
| Parameters | key: Name of the sampling parameter (the example below uses top-k and temp) |
| | value: Parameter value represented as a string |
| Return Value | A value of 0 indicates success; a non-zero value indicates failure (e.g. invalid key or unsupported value format). |
mlm_ctx.set_sampler("top-k", "20");
mlm_ctx.set_sampler("temp", "0.8");
Session Inference Operation.run()
After successful initialization, you can send image-text combined prompts to the multimodal model for inference.
💡Note
This function is not thread-safe. Only one thread can call the run method at a time.
| API | run |
| Description | Executes one multimodal session inference |
| Parameters | prompt: User prompt string |
| | sys_prompt: System prompt string |
| | img_vec: Reference to an ImageData vector containing the image data to input |
| | cb: Callback function of type AidLLMCB for handling inference results |
| | starting_round: Whether this is the start of a new conversation round (true for new conversation start) |
| Return Value | A value of 0 indicates the inference executed successfully; otherwise a non-zero value indicates failure |
// Define callback function
void my_callback(aplux::aidmlm::AidLLMCBData& cb_data, void* user_data){
if(cb_data.state == aplux::aidmlm::AidLLMState::START){
printf("[BOS]%s", cb_data.text.c_str());
}else if(cb_data.state == aplux::aidmlm::AidLLMState::FINISH){
printf("[EOS]%s\n", cb_data.text.c_str());
}else if(cb_data.state == aplux::aidmlm::AidLLMState::ERROR){
printf("[ERROR]%s\n", cb_data.text.c_str());
}else{
printf("%s", cb_data.text.c_str());
}
}
// Prepare image data (pre-resize to model-required dimensions, RGB format)
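cv::Mat img = cv::imread("test.jpg");
// imread returns an empty Mat when the file cannot be read
if(img.empty()){
printf("Failed to read test.jpg.\n");
return EXIT_FAILURE;
}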
cv::Mat img_rgb;
cv::cvtColor(img, img_rgb, cv::COLOR_BGR2RGB);
cv::Mat img_resized;
cv::resize(img_rgb, img_resized, cv::Size(392, 392));
aplux::aidmlm::ImageData img_data = {
.img_pos = -1,
.img_data = (uint8_t*)img_resized.data,
};
std::vector<aplux::aidmlm::ImageData> img_vec;
img_vec.push_back(img_data);
// Execute inference
std::string sys_prompt = "You are a helpful assistant.";
std::string user_prompt = "Please describe the scene in this image";
int run_result = mlm_ctx.run(user_prompt, sys_prompt, img_vec, my_callback, true);
if(run_result != 0){
printf("AidMLM run failed.\n");
return EXIT_FAILURE;
}
Session Termination Operation.abort()
Used to interrupt the currently running inference session.
| API | abort |
| Description | Terminates the currently running inference session |
| Parameters | reserve: Reserved field, default value is nullptr |
| Return Value | A value of 0 indicates successful termination; otherwise a non-zero value indicates failure |
Reset Operation.reset()
In multi-round conversation scenarios, when you need to process the next image or restart a conversation, call reset to clear internal state.
| API | reset |
| Description | Resets the internal state of the inference engine to prepare for the next inference |
| Parameters | void |
| Return Value | A value of 0 indicates successful reset; otherwise a non-zero value indicates failure |
// Reset after processing one image, prepare for the next
if(mlm_ctx.reset() != 0){
printf("AidMLM reset failed.\n");
return EXIT_FAILURE;
}
Final Release Operation.finalize()
Releases model resources and completes de-initialization.
| API | finalize |
| Description | Releases model resources and completes necessary de-initialization operations |
| Parameters | void |
| Return Value | A value of 0 indicates successful release; otherwise a non-zero value indicates failure |
if(mlm_ctx.finalize() != 0){
printf("AidMLM finalize failed.\n");
return EXIT_FAILURE;
}
Get Profiler.get_profiler()
When the profiler is enabled during initialization, this function can be used to obtain the Profiler object pointer. For detailed usage, refer to the "Profiler C++ API Documentation" section below.
| API | get_profiler |
| Description | Gets the pointer to the profiler object |
| Parameters | void |
| Return Value | If the profiler is enabled, returns a Profiler object pointer; if not enabled, returns nullptr |
// Enable profiler
aplux::aidmlm::AidmlmInitParam init_param;
init_param.enable_profiler = true;
// ... other parameter setup ...
mlm_ctx.initialize(init_param, true);
// Get performance data after inference
aplux::aidgen::Profiler* profiler = mlm_ctx.get_profiler();
// get_profiler() returns nullptr when the profiler is not enabled
if(profiler != nullptr){
aplux::aidgen::ProfileData data = profiler->get_data();
printf("Init time: %llu us\n", (unsigned long long)data.init_time_us);
printf("Time to first token: %llu us\n", (unsigned long long)data.time_to_first_token_us);
printf("Generate rate: %.2f tok/s\n", data.generate_rate);
printf("ViT execute time: %llu us\n", (unsigned long long)data.vit_execute_time_us);
}
Profiler C++ API Documentation
💡Note
Profiler-related interfaces are under the aplux::aidgen namespace (independent of aplux::aidllm and aplux::aidmlm).
- Header file path: /usr/local/include/aidlux/aidgen/profiler.hpp
- Both AidLLM and AidMLM use their respective get_profiler() methods to obtain a Profiler object for performance analysis
Performance Data Type.struct ProfileData
During inference, developers may want to monitor performance metrics at each stage. The ProfileData struct stores performance data collected during the inference process.
Member List
| Member Name | Type | Description |
|---|---|---|
| init_time_us | uint64_t | Initialization time (microseconds) |
| prompt_token_num | uint64_t | Number of input prompt tokens |
| prompt_processing_rate | float | Prompt processing rate (tok/s) |
| time_to_first_token_us | uint64_t | Time to first token (microseconds) |
| generated_token_num | uint64_t | Number of generated tokens |
| generate_rate | float | Token generation rate (tok/s) |
| generate_time_us | uint64_t | Total generation time (microseconds) |
| vit_execute_time_us | uint64_t | Vision model execution time (microseconds), AidMLM only |
| vit_init_time_us | uint64_t | Vision model initialization time (microseconds), AidMLM only |
| vit_preprocess_time_us | uint64_t | Vision model preprocessing time (microseconds), AidMLM only |
| vit_postprocess_time_us | uint64_t | Vision model postprocessing time (microseconds), AidMLM only |
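The raw counters in this table can be turned into derived figures for reports. The sketch below recomputes a generation rate from the token count and generation time, and converts microseconds to milliseconds. ProfileSample is a local stand-in holding only a few of the documented fields, and whether the SDK computes generate_rate in exactly this way is not stated, so treat the formula as an assumption.

```cpp
#include <cstdint>

// Local stand-in holding a few ProfileData fields from the table above;
// not the SDK struct.
struct ProfileSample {
    std::uint64_t time_to_first_token_us;
    std::uint64_t generated_token_num;
    std::uint64_t generate_time_us;
};

// Tokens per second from token count and generation time; 0 when no time elapsed.
inline double tokens_per_second(const ProfileSample& p) {
    if (p.generate_time_us == 0) return 0.0;
    return static_cast<double>(p.generated_token_num) * 1e6 /
           static_cast<double>(p.generate_time_us);
}

// Converts a microsecond counter to milliseconds.
inline double to_milliseconds(std::uint64_t microseconds) {
    return static_cast<double>(microseconds) / 1000.0;
}
```

For instance, 100 tokens generated in 2,000,000 microseconds works out to 50 tokens per second.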
Profiler Class.class Profiler
The Profiler class manages performance data collection during inference. It must be enabled during initialization (enable_profiler = true) to be used.
Get Performance Data.get_data()
Gets the currently collected performance data.
| API | get_data |
| Description | Gets the performance analysis data collected during inference |
| Parameters | void |
| Return Value | ProfileData struct containing performance metrics for each stage |
Reset Performance Data.reset()
Resets the collected performance data, typically called before starting a new inference round.
| API | reset |
| Description | Clears collected performance data and restores to initial state |
| Parameters | void |
| Return Value | void |
// Enable profiler during initialization
int init_result = llm_interpreter_ptr->initialize(true);
// Get the profiler
aplux::aidgen::Profiler* profiler = llm_interpreter_ptr->get_profiler();
// Execute inference...
llm_interpreter_ptr->run(prompt, dialog_callback);
// Get performance data (get_profiler() returns nullptr when the profiler is not enabled)
if(profiler != nullptr){
aplux::aidgen::ProfileData data = profiler->get_data();
printf("Time to first token: %llu us\n", (unsigned long long)data.time_to_first_token_us);
printf("Generate rate: %.2f tok/s\n", data.generate_rate);
printf("Generated token count: %llu\n", (unsigned long long)data.generated_token_num);
// Reset data, prepare for next inference round
profiler->reset();
}