
AidGen C++ API Documentation

AidLLM C++ API Documentation

💡Note

Before developing with AidGen-SDK C++, please be aware of the following basics:

  • During compilation, include the header file located at /usr/local/include/aidlux/aidgen/aidllm.hpp
  • During linking, specify the library file located at /usr/local/lib/libaidgen.so
  • All interfaces are under the aplux::aidllm namespace

Inference Backend Type: enum LLmBackendType

AidllmSDK can run LLM inference on different inference backend frameworks. The available backends are listed below.

Member Name | Type | Value | Description
TYPE_DEFAULT | uint8_t | 0 | Unknown backend type
TYPE_GENIE | uint8_t | 1 | Genie inference backend

Inference Task State: enum LLMSentenceState

During an inference task, a single session may go through multiple stages. Developers can use these state codes to understand the current runtime status of the inference task.

Member Name | Type | Value | Description
BEGIN | enum class | 0 | Session start segment
CONTINUE | enum class | 1 | Intermediate content while the session is still generating
END | enum class | 2 | Session ending segment
COMPLETE | enum class | 3 | Current session completed successfully
ABORT | enum class | 4 | Current session was terminated (aborted)
ERROR | enum class | 5 | Current session inference error
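When logging or debugging callbacks, it can help to turn these codes into names and to know when a session has ended. The sketch below uses a hypothetical local mirror of the enum (namespace `demo`, values copied from the table) so it compiles without aidllm.hpp; `state_name` and `is_terminal` are illustrative helpers, and `is_terminal` assumes that COMPLETE, ABORT and ERROR are the states that end a session.

```cpp
#include <string>

namespace demo {

// Hypothetical local mirror of aplux::aidllm::LLMSentenceState (values from the table above).
enum class LLMSentenceState { BEGIN = 0, CONTINUE = 1, END = 2, COMPLETE = 3, ABORT = 4, ERROR = 5 };

// Printable name for a state code, e.g. for log messages.
inline std::string state_name(LLMSentenceState s) {
    switch (s) {
        case LLMSentenceState::BEGIN:    return "BEGIN";
        case LLMSentenceState::CONTINUE: return "CONTINUE";
        case LLMSentenceState::END:      return "END";
        case LLMSentenceState::COMPLETE: return "COMPLETE";
        case LLMSentenceState::ABORT:    return "ABORT";
        case LLMSentenceState::ERROR:    return "ERROR";
    }
    return "UNKNOWN";  // not reachable for the values above
}

// Assumption: COMPLETE, ABORT and ERROR end a session, so no further
// callbacks are expected for it once one of these arrives.
inline bool is_terminal(LLMSentenceState s) {
    return s == LLMSentenceState::COMPLETE || s == LLMSentenceState::ABORT ||
           s == LLMSentenceState::ERROR;
}

}  // namespace demo
```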

Interpreter Runtime State: enum LLMState

The overall state of the Aidllm interpreter during runtime. Developers can query this state to understand the interpreter's current working status.

Member Name | Type | Value | Description
STANDIDLE | enum class | 0 | Idle standby state
BUSYING | enum class | 1 | Busy processing inference
ABORT | enum class | 2 | Inference has been terminated
ERROR | enum class | 3 | Inference encountered an error

Log Level: enum LogLevel

AidllmSDK provides logging APIs (described below) that take a log level, which is specified with this enum.

Member Name | Type | Value | Description
INFO | uint8_t | 0 | Informational message
WARNING | uint8_t | 1 | Warning
ERROR | uint8_t | 2 | Error
FATAL | uint8_t | 3 | Fatal error

Global Functions

Get Library Version: get_library_version()

Gets the version information string of the current Aidllm library.

API get_library_version
Description Gets the version information of the Aidllm library
Parameters void
Return Value A string containing the library version information
cpp
std::string version = aplux::aidllm::get_library_version();
printf("Current aidllm library version: %s\n", version.c_str());

Set Log Level: set_log_level()

Sets the minimum log output level for Aidllm. Logs below this level will not be output.

API set_log_level
Description Sets the minimum log output level
Parameters log_level: LogLevel enum value specifying the minimum log level
Return Value void
cpp
aplux::aidllm::set_log_level(aplux::aidllm::LogLevel::ERROR);
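The threshold rule can be stated as a single comparison. This is a minimal sketch of the documented semantics, not the SDK's internal logger; `demo::LogLevel` is a hypothetical local copy of the LogLevel values listed earlier and `should_emit` is an illustrative helper name.

```cpp
#include <cstdint>

namespace demo {

// Hypothetical local copy of the LogLevel values (INFO=0 ... FATAL=3).
enum class LogLevel : std::uint8_t { INFO = 0, WARNING = 1, ERROR = 2, FATAL = 3 };

// A message is output only if its level is at or above the configured minimum.
inline bool should_emit(LogLevel message_level, LogLevel min_level) {
    return static_cast<std::uint8_t>(message_level) >= static_cast<std::uint8_t>(min_level);
}

}  // namespace demo
```

With `set_log_level(LogLevel::ERROR)`, as in the example above, only ERROR and FATAL messages would be output.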

Set Log File Prefix: set_log_file_prefix()

Sets the file name prefix used when logs are written to files.

API set_log_file_prefix
Description Sets the log file name prefix
Parameters log_file_prefix: Log file name prefix string
Return Value void
cpp
aplux::aidllm::set_log_file_prefix("aidllm_log_");

Inference Callback Function Type: LLMCallback

The callback function type definition used during Aidllm inference. Developers need to implement a callback function of this type to handle inference results.

cpp
using LLMCallback = std::function<int32_t(LLMCallbackData& cb_data, void* user_data)>;

💡Note

The callback function return type is int32_t. Returning 0 indicates normal continuation of inference; a non-zero value can be used to control the inference flow.

Inference Callback Data Type: struct LLMCallbackData

During inference tasks, Aidllm invokes the developer-provided callback function and passes it an LLMCallbackData object. Custom callbacks read this object to process the inference results.

Member List

The LLMCallbackData struct contains the following members:

Member state
Type enum LLMSentenceState
Default Value
Description Status code of the current inference session
Member text
Type std::string
Default Value
Description Result text of the inference task / message corresponding to special status codes
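A common use of the user_data pointer is to collect the streamed text into a per-session object instead of printing it. The sketch below assumes simplified local mirrors of LLMCallbackData and LLMCallback (namespace `demo`, hypothetical) so it compiles on its own; `SessionResult` and `collect_callback` are illustrative names, not SDK APIs.

```cpp
#include <cstdint>
#include <functional>
#include <string>

namespace demo {

// Hypothetical local mirrors of the SDK types described above.
enum class LLMSentenceState { BEGIN = 0, CONTINUE = 1, END = 2, COMPLETE = 3, ABORT = 4, ERROR = 5 };
struct LLMCallbackData {
    LLMSentenceState state;
    std::string text;
};
using LLMCallback = std::function<std::int32_t(LLMCallbackData& cb_data, void* user_data)>;

// Per-session accumulator reached through the callback's user_data pointer.
struct SessionResult {
    std::string reply;         // text from BEGIN / CONTINUE / END segments
    std::string final_status;  // text delivered with COMPLETE / ABORT / ERROR
    bool ok = false;           // true only if the session reached COMPLETE
};

inline std::int32_t collect_callback(LLMCallbackData& cb_data, void* user_data) {
    auto* result = static_cast<SessionResult*>(user_data);
    switch (cb_data.state) {
        case LLMSentenceState::BEGIN:
        case LLMSentenceState::CONTINUE:
        case LLMSentenceState::END:
            result->reply += cb_data.text;
            break;
        case LLMSentenceState::COMPLETE:
            result->ok = true;
            result->final_status = cb_data.text;
            break;
        case LLMSentenceState::ABORT:
        case LLMSentenceState::ERROR:
            result->final_status = cb_data.text;
            break;
    }
    return 0;  // 0 = continue inference normally
}

}  // namespace demo
```

With the real SDK, the same function body would use the aplux::aidllm types, and `&result` would be passed as the user_data argument of LLMInterpreter::run().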

Runtime Context Class: class LLMContext

At runtime, Aidllm needs certain configuration information and must pass runtime-related data between its components. An LLMContext object holds this configuration and carries that data.

Create Instance Object: create_instance()

Before you can set runtime context information, you need a configuration object. This static function creates an LLMContext instance.

API create_instance
Description Used to construct an instance object of class LLMContext
Parameters config_file: Initial configuration file, where key information such as backend type and model file names can be configured
Return Value A std::unique_ptr<LLMContext> owning the new object; nullptr indicates that construction failed
cpp
// Create a configuration instance object; report an error if the return value is null
std::unique_ptr<LLMContext> llm_context_ptr = LLMContext::create_instance("qwen2-7b/qwen2-7b.json");
if(llm_context_ptr == nullptr){
    printf("Test sample: LLMContext create_instance failed.\n");
    return EXIT_FAILURE;
}

Member List

The LLMContext object is used to manage runtime configuration information, including the following parameters:

Member config_file
Type std::string
Default Value
Description Initial configuration file: the config_file argument passed to create_instance()
Member backend_type
Type LLmBackendType
Default Value LLmBackendType::TYPE_DEFAULT
Description The inference backend must be specified in the config file. After initialization parses the config file, this field is overwritten with the backend type specified there
Member model_file_vec
Type std::vector<std::string>
Default Value
Description The model files must be specified in the config file. After initialization parses the config file, this field is overwritten with the model files specified there
Member config_overwrite_options
Type std::string
Default Value
Description Setting this field overrides certain key inference parameters, which can affect inference speed, output, and other behavior
Member android_tmp_directory
Type std::string
Default Value
Description Valid only on Android. Specifies a directory that the current system user has permission to access; the inference program uses it for temporary files

Interpreter Class: class LLMInterpreter

An LLMInterpreter instance is the main executor of inference operations and carries out the actual inference process.

Create Instance Object: create_instance()

Every inference operation requires an interpreter. This function constructs an LLMInterpreter instance.

API create_instance
Description Uses various data managed by the LLMContext object to construct an object of type LLMInterpreter
Parameters llm_context: Reference to the unique_ptr of an LLMContext instance object (std::unique_ptr<LLMContext>&)
reserve: Reserved field, default value is nullptr
Return Value If it is nullptr, object construction failed; otherwise, it is a unique_ptr to an LLMInterpreter object
cpp
// Use the LLMContext object pointer to create the interpreter object; report an error if the return value is null
std::unique_ptr<LLMInterpreter> llm_interpreter_ptr = LLMInterpreter::create_instance(llm_context_ptr);
if(llm_interpreter_ptr == nullptr){
    printf("Test sample: LLMInterpreter create_instance failed.\n");
    return EXIT_FAILURE;
}

Initialization Operation: initialize()

After the interpreter object is created, some initialization operations are required, such as environment checks and resource construction.

API initialize
Description Completes the initialization work required for inference
Parameters enable_profiler: Whether to enable the profiler, default value is false
reserve: Reserved field, default value is nullptr
Return Value A value of 0 indicates successful initialization; otherwise a non-zero value indicates failure
cpp
// Initialize the interpreter; report an error if the return value is non-zero
int init_result = llm_interpreter_ptr->initialize();
if(init_result != EXIT_SUCCESS){
    printf("Test sample: aidllm initialize failed.\n");
    return EXIT_FAILURE;
}

Sampling Parameter Setup Operation: set_sampler()

After initialization completes successfully, sampling parameters can be set with this function to control the randomness, diversity, and quality of generated content.

API set_sampler
Description Sets sampling parameters to control the randomness and diversity of LLM outputs.
Parameters key: Name of the sampling parameter. Currently supported:
  • "temp": Controls output randomness (Temperature); smaller values are more conservative.
  • "top-k": Limits the sampling range to the top K tokens with the highest probability.
  • "top-p": Nucleus sampling; limits sampling to the smallest set of most-probable tokens whose cumulative probability reaches P.
value: Parameter value represented as a string:
  • For "temp": floating-point numeric string (e.g. "1.2").
  • For "top-k": integer numeric string (e.g. "20").
  • For "top-p": floating-point numeric string (e.g. "0.6").
Return Value A value of 0 indicates success; a non-zero value indicates failure (e.g. invalid key or unsupported value format).
cpp
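// Set sampling parameters; a non-zero return value indicates an invalid key or value format
if(llm_interpreter_ptr->set_sampler("temp", "0.8") != 0 ||
   llm_interpreter_ptr->set_sampler("top-k", "20") != 0 ||
   llm_interpreter_ptr->set_sampler("top-p", "0.9") != 0){
    printf("Test sample: aidllm set_sampler failed.\n");
    return EXIT_FAILURE;
}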
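To make the three keys concrete, the following sketch applies temperature, top-k and top-p (nucleus) filtering to a toy logit vector. It illustrates the standard technique only; the backend performs sampling internally and its exact order of operations may differ. The function names are hypothetical, and top-p is applied here to the probabilities renormalized over the top-k survivors, which is one common convention.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Softmax of logits / temperature. Requires non-empty logits and temperature > 0.
// Lower temperatures sharpen the distribution; higher ones flatten it.
inline std::vector<double> softmax_with_temperature(const std::vector<double>& logits,
                                                    double temperature) {
    const double max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<double> probs(logits.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        // Subtracting the max keeps exp() from overflowing without changing the result.
        probs[i] = std::exp((logits[i] - max_logit) / temperature);
        sum += probs[i];
    }
    for (double& p : probs) p /= sum;
    return probs;
}

// Returns the indices of tokens that survive top-k and then top-p filtering,
// from most to least probable. top_k == 0 disables the top-k cut.
inline std::vector<std::size_t> candidate_tokens(const std::vector<double>& probs,
                                                 std::size_t top_k, double top_p) {
    std::vector<std::size_t> order(probs.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&probs](std::size_t a, std::size_t b) { return probs[a] > probs[b]; });

    // Top-k: keep only the k most probable tokens.
    if (top_k > 0 && top_k < order.size()) order.resize(top_k);

    // Top-p: keep the smallest prefix whose renormalized mass reaches top_p.
    double kept_mass = 0.0;
    for (std::size_t idx : order) kept_mass += probs[idx];
    double cumulative = 0.0;
    std::size_t keep = 0;
    for (std::size_t idx : order) {
        cumulative += probs[idx] / kept_mass;
        ++keep;
        if (cumulative >= top_p) break;
    }
    order.resize(keep);
    return order;
}
```

A backend would then draw the next token from the surviving indices in proportion to their probabilities. In the example above, `set_sampler("top-k", "20")` corresponds to `top_k = 20` here.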

Session Inference Operation: run()

After successful initialization, you can run dialog inference with the LLM. A developer-supplied callback function receives the inference results as they stream in during the session.

API run
Description Executes one session inference
Parameters prompt: Prompt string
cb: Callback function of type LLMCallback for handling continuous inference results during the session
user_data: Pointer to user data, convenient for using this data in custom callback functions, default value is nullptr
Return Value A value of 0 indicates the inference executed successfully; otherwise a non-zero value indicates failure
cpp
// Define callback function
LLMCallback dialog_callback = [&](LLMCallbackData& cb_data, void* user_data)->int32_t{
    if(cb_data.state == LLMSentenceState::BEGIN){
        printf("%s", cb_data.text.c_str());
    }else if(cb_data.state == LLMSentenceState::CONTINUE){
        printf("%s", cb_data.text.c_str());
        fflush(stdout);
    }else if(cb_data.state == LLMSentenceState::END){
        printf("%s\n", cb_data.text.c_str());
    }else if(cb_data.state == LLMSentenceState::COMPLETE){
        printf("\n[COMPLETE]%s\n", cb_data.text.c_str());
    }else if(cb_data.state == LLMSentenceState::ABORT){
        printf("\n[ABORT]%s\n", cb_data.text.c_str());
    }else if(cb_data.state == LLMSentenceState::ERROR){
        printf("\n[ERROR]%s\n", cb_data.text.c_str());
    }
    return EXIT_SUCCESS;
};

// Execute inference
std::string prompt = "<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n";
int run_result = llm_interpreter_ptr->run(prompt, dialog_callback);
if(run_result != EXIT_SUCCESS){
    printf("Test sample: aidllm run failed.\n");
    return EXIT_FAILURE;
}
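The prompt above follows the ChatML template used by Qwen models. A small helper can keep that formatting in one place; `build_chatml_prompt` is a hypothetical convenience function that assumes a Qwen-style model, and other models would need their own chat template.

```cpp
#include <string>

// Builds a Qwen-style ChatML prompt in the same format as the run() example.
// The optional system message is emitted as its own turn when non-empty.
inline std::string build_chatml_prompt(const std::string& user_message,
                                       const std::string& system_message = "") {
    std::string prompt;
    if (!system_message.empty()) {
        prompt += "<|im_start|>system\n" + system_message + "<|im_end|>\n";
    }
    prompt += "<|im_start|>user\n" + user_message + "<|im_end|>\n";
    prompt += "<|im_start|>assistant\n";  // the model continues generating from here
    return prompt;
}
```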

Query Inference State: state()

During inference, developers may need to query the current runtime state of the interpreter, such as determining whether it is idle or actively inferring.

API state
Description Gets the current runtime state of the inference task
Parameters state: Reference to an LLMState variable; the function will overwrite this variable with the current state
Return Value A value of 0 indicates the query executed successfully; otherwise a non-zero value indicates failure
cpp
LLMState current_state = LLMState::STANDIDLE;
llm_interpreter_ptr->state(current_state);
printf("Current state: %d\n", (int)current_state);
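Because run() streams results through the callback, another thread may want to wait until the interpreter leaves BUSYING, for example after requesting abort(). Below is a minimal polling sketch under stated assumptions: `demo::LLMState` is a hypothetical mirror of the LLMState table, and the `query` function object stands in for a call such as `llm_interpreter_ptr->state(s)` that returns 0 on success. The helper name and the polling interval are illustrative choices.

```cpp
#include <chrono>
#include <functional>
#include <thread>

namespace demo {

// Hypothetical local mirror of the LLMState values.
enum class LLMState { STANDIDLE = 0, BUSYING = 1, ABORT = 2, ERROR = 3 };

// Polls `query` until the reported state is not BUSYING or the timeout expires.
// Returns true if a non-busy state was observed (stored in last_state),
// false on timeout or if the query itself fails.
inline bool wait_until_not_busy(const std::function<int(LLMState&)>& query,
                                std::chrono::milliseconds timeout,
                                std::chrono::milliseconds poll_interval,
                                LLMState& last_state) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (query(last_state) != 0) return false;          // query failed
        if (last_state != LLMState::BUSYING) return true;  // idle, aborted or errored
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(poll_interval);
    }
}

}  // namespace demo
```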

Session Termination Operation: abort()

Use this function to interrupt an inference session that is currently running, typically from a thread other than the one that called run().

⚠️Warning

It is strictly forbidden to call the abort function inside the callback function (LLMCallback), as this may cause deadlocks or undefined behavior.

API abort
Description Terminates the currently running inference session
Parameters reserve: Reserved field, default value is nullptr
Return Value A value of 0 indicates successful termination; otherwise a non-zero value indicates failure
cpp
// Terminate inference in another thread
int abort_result = llm_interpreter_ptr->abort();
if(abort_result != EXIT_SUCCESS){
    printf("Test sample: aidllm abort failed.\n");
    return EXIT_FAILURE;
}
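The pattern implied by the warning above is to run inference on a worker thread and call abort() from a different thread. The sketch shows that pattern with `FakeInterpreter`, a hypothetical stand-in for LLMInterpreter whose run() streams placeholder tokens until abort() sets an atomic flag, so the example compiles and runs without the SDK. It assumes std::thread support (some toolchains need `-pthread` when linking).

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <string>
#include <thread>

namespace demo {

// Hypothetical stand-in for LLMInterpreter, used only to show the threading pattern.
class FakeInterpreter {
public:
    // Streams placeholder tokens until abort() is requested.
    int run(const std::string& /*prompt*/, const std::function<void(const std::string&)>& on_token) {
        while (!abort_requested_.load()) {
            on_token("tok ");
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        return 1;  // stand-in convention: non-zero when the session was aborted
    }
    // Only sets a flag, so it is safe to call from another thread.
    int abort() {
        abort_requested_.store(true);
        return 0;
    }

private:
    std::atomic<bool> abort_requested_{false};
};

// Runs inference on a worker thread and aborts it from the calling thread
// once at least `min_tokens` tokens have arrived. Returns the token count.
inline int run_then_abort(FakeInterpreter& interp, int min_tokens) {
    std::atomic<int> tokens{0};
    std::thread worker([&] {
        interp.run("prompt", [&](const std::string&) { tokens.fetch_add(1); });
    });
    while (tokens.load() < min_tokens) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    interp.abort();  // called from this thread, never from inside the callback
    worker.join();   // wait for run() to return before using shared state
    return tokens.load();
}

}  // namespace demo
```

With the real interpreter, the worker body would call `llm_interpreter_ptr->run(prompt, dialog_callback)` and the controlling thread would call `llm_interpreter_ptr->abort()`, then join the worker before calling finalize().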

Final Release Operation: finalize()

Just as the interpreter must call initialize() before inference, it must call finalize() afterward to release the resources created earlier.

API finalize
Description Completes necessary de-initialization and release operations
Parameters reserve: Reserved field, default value is nullptr
Return Value A value of 0 indicates the release operation executed successfully; otherwise a non-zero value indicates failure
cpp
// Execute interpreter de-initialization; report an error if the return value is non-zero
int fin_result = llm_interpreter_ptr->finalize();
if(fin_result != EXIT_SUCCESS){
    printf("Test sample: aidllm finalize failed.\n");
    return EXIT_FAILURE;
}
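Because every error path in these examples returns early, it is easy to skip finalize(). One generic C++ option is a small scope guard that runs a cleanup action when it goes out of scope. `ScopeGuard` below is an illustrative helper, not part of the AidGen SDK.

```cpp
#include <functional>
#include <utility>

// Runs the stored action when the guard is destroyed, including on early returns.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> action) : action_(std::move(action)) {}
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
    ~ScopeGuard() {
        if (action_) action_();
    }
    // Cancels the cleanup, e.g. after the action was performed explicitly.
    void dismiss() { action_ = nullptr; }

private:
    std::function<void()> action_;
};
```

After initialize() succeeds, `ScopeGuard finalize_guard([&]{ llm_interpreter_ptr->finalize(); });` would release resources on any later return. A destructor cannot report the return value, so code that needs to check finalize() can call it explicitly and then call `dismiss()`.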

Get Profiler: get_profiler()

When the profiler is enabled during initialization (enable_profiler = true), this function can be used to obtain the profiler object pointer for performance data collection and analysis. For detailed usage, refer to the "Profiler C++ API Documentation" section below.

API get_profiler
Description Gets the pointer to the profiler object
Parameters void
Return Value If the profiler is enabled, returns a Profiler object pointer; if not enabled, returns nullptr
cpp
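// Enable profiler during initialization
int init_result = llm_interpreter_ptr->initialize(true);

// Get the profiler; nullptr means the profiler is not enabled
aplux::aidgen::Profiler* profiler = llm_interpreter_ptr->get_profiler();
if(profiler == nullptr){
    printf("Test sample: profiler is not enabled.\n");
    return EXIT_FAILURE;
}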

AidMLM C++ API Documentation

💡Note

Before developing with AidMLM-SDK C++, please be aware of the following basics:

  • During compilation, include the header file located at /usr/local/include/aidlux/aidgen/aidmlm.hpp
  • During linking, specify the library file located at /usr/local/lib/libaidgen.so
  • All interfaces are under the aplux::aidmlm namespace
  • AidMLM is designed for multimodal large model (vision-language model) inference, currently supporting Qwen2-VL and Qwen2.5-VL series models

Inference State: enum AidLLMState

During an AidMLM inference task, a single session may go through various stages. Developers can use these state codes to understand the current runtime status of the inference task.

Member Name | Type | Value | Description
STAND | enum class | 0 | Idle; inference not yet started
START | enum class | 1 | Inference started
BUSYING | enum class | 2 | Inference in progress
FINISH | enum class | 3 | Inference finished
COMPLETE | enum class | 4 | Inference completed, either fully or truncated
WAITING | enum class | 5 | Decoding of the current token failed; waiting for the next decode
ABORT | enum class | 6 | Current inference terminated early by the developer
ERROR | enum class | 7 | Inference failed due to an exception

Log Level: enum LogLevel

Member Name | Type | Value | Description
INFO | uint8_t | 0 | Informational message
WARNING | uint8_t | 1 | Warning
ERROR | uint8_t | 2 | Error
FATAL | uint8_t | 3 | Fatal error

Model Type: enum ModelType

Specifies the type of multimodal model currently in use.

Member Name | Type | Value | Description
RESERVED | enum class | 0 | Reserved type
QWEN2VL | enum class | 1 | Qwen2-VL model
QWEN25VL | enum class | 2 | Qwen2.5-VL model

Inference Callback Data Type: struct AidLLMCBData

During AidMLM inference tasks, developer-provided callback functions are used. This data type is the argument passed to that callback function.

Member List

Member state
Type enum AidLLMState
Default Value
Description Status code of the current inference session
Member text
Type std::string
Default Value
Description Result text of the inference task / message corresponding to special status codes

Inference Callback Function Type: AidLLMCB

The callback function type definition used during AidMLM inference.

cpp
using AidLLMCB = std::function<void(AidLLMCBData& cb_data, void* user_data)>;

Image Data Type: struct ImageData

A struct for passing image data to the multimodal model.

Member List

Member img_pos
Type int
Default Value -1
Description Position index of the image in the prompt. If -1, the image is appended at the end of the prompt
Member img_data
Type uint8_t*
Default Value nullptr
Description Image data pointer, pointing to RGB format image pixel data. Developers need to pre-resize to the model's required width and height
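The run() example later in this document uses OpenCV to convert and resize the image. If OpenCV is not available, the channel swap and a buffer-size check can be done by hand. The sketch assumes img_data must point to tightly packed, interleaved 8-bit RGB pixels with no row padding (consistent with the 3-channel 8-bit buffer in that example); the helper names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Byte count of a tightly packed, interleaved 8-bit RGB image (3 bytes per pixel).
inline std::size_t rgb_buffer_bytes(std::size_t width, std::size_t height) {
    return width * height * 3;
}

// Swaps the first and third channel of every pixel, turning BGR (OpenCV's default
// channel order) into RGB in place. A trailing partial pixel, if any, is left untouched.
inline void bgr_to_rgb_inplace(std::vector<std::uint8_t>& pixels) {
    for (std::size_t i = 0; i + 2 < pixels.size(); i += 3) {
        std::swap(pixels[i], pixels[i + 2]);
    }
}
```

For a 392×392 input, for instance, the buffer behind img_data would need 392 × 392 × 3 = 460,992 bytes.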

Initialization Parameter Type: struct AidmlmInitParam

Configuration parameters required for AidMLM initialization.

Member List

Member Name | Type | Default Value | Description
vision_model_path | std::string | | Vision encoder model file path
pos_emb_cos_path | std::string | | Position encoding cosine weight file path
pos_emb_sin_path | std::string | | Position encoding sine weight file path
embedding_weights_path | std::string | | Word embedding weights file path
window_attention_mask_path | std::string | | Window attention mask file path (Qwen2.5-VL only)
full_attention_mask_path | std::string | | Full attention mask file path (Qwen2.5-VL only)
llm_model_path_vec | std::vector<std::string> | | LLM model file path list
dbg_opt | std::string | | Debug options string
type | ModelType | ModelType::RESERVED | Multimodal model type
qwen2vl_cfg | Qwen2VLConfig | | Qwen2-VL model configuration
qwen25vl_cfg | Qwen25VLConfig | | Qwen2.5-VL model configuration
enable_profiler | bool | false | Whether to enable the profiler
genie_log_level | int | 1 | Genie backend log level (1=ERROR, 2=WARN, 3=INFO, 4=VERBOSE)
use_shared_buffer | bool | false | Whether to use a shared buffer
use_mmap | bool | false | Whether to load the model via memory mapping
use_genie_load_model_ex | bool | false | Whether to use Genie extended model loading

Model Configuration Structs

AidMLM provides predefined model configuration structs to specify vision model configurations for different resolutions and parameter scales. Developers can choose the corresponding configuration based on the model in use.

Config Struct Name | Model | Image Size | Embedding Dim
Qwen2VLConfig | Qwen2-VL | 644×644 | 1536
Qwen25VLConfig | Qwen2.5-VL 3B | 392×392 | 2048
Qwen25VL3B644Config | Qwen2.5-VL 3B | 644×644 | 2048
Qwen25VL3B672Config | Qwen2.5-VL 3B | 672×672 | 2048
Qwen25VL7B392Config | Qwen2.5-VL 7B | 392×392 | 3584
Qwen25VL7B644Config | Qwen2.5-VL 7B | 644×644 | 3584
Qwen25VL7B672Config | Qwen2.5-VL 7B | 672×672 | 3584
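Since img_data must already match the model's input size, it can help to keep the resize target next to the chosen config. The sketch below simply encodes the table above as data; the `VisionConfigInfo` struct and `find_vision_config` function are hypothetical helpers, and the config struct names are used only as lookup keys.

```cpp
#include <array>
#include <string>

struct VisionConfigInfo {
    const char* name;   // config struct name from the table
    const char* model;  // model and parameter scale
    int image_side;     // square input size in pixels (width == height)
    int embedding_dim;  // embedding dimension
};

// Values copied from the model configuration table.
const std::array<VisionConfigInfo, 7> kVisionConfigs{{
    {"Qwen2VLConfig",       "Qwen2-VL",      644, 1536},
    {"Qwen25VLConfig",      "Qwen2.5-VL 3B", 392, 2048},
    {"Qwen25VL3B644Config", "Qwen2.5-VL 3B", 644, 2048},
    {"Qwen25VL3B672Config", "Qwen2.5-VL 3B", 672, 2048},
    {"Qwen25VL7B392Config", "Qwen2.5-VL 7B", 392, 3584},
    {"Qwen25VL7B644Config", "Qwen2.5-VL 7B", 644, 3584},
    {"Qwen25VL7B672Config", "Qwen2.5-VL 7B", 672, 3584},
}};

// Returns the table row for a config name, or nullptr if the name is unknown.
inline const VisionConfigInfo* find_vision_config(const std::string& name) {
    for (const auto& cfg : kVisionConfigs) {
        if (name == cfg.name) return &cfg;
    }
    return nullptr;
}
```

The returned `image_side` could then serve as the resize target, for example `cv::Size(cfg->image_side, cfg->image_side)` in an OpenCV-based pipeline.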

Global Functions

Get Library Version: get_library_version()

API get_library_version
Description Gets the version information of the AidMLM library
Parameters void
Return Value A string containing the library version information
cpp
std::string version = aplux::aidmlm::get_library_version();
printf("Current aidmlm library version: %s\n", version.c_str());

Multimodal Inference Class: class Aidmlm

An object instance of type Aidmlm is the main executor of multimodal inference operations and is used to carry out vision-language model inference processes.

Construction and Destruction

Aidmlm objects are created via the default constructor.

cpp
aplux::aidmlm::Aidmlm mlm_ctx;

Set Log Level: set_log_level()

A static method that sets the minimum log output level for AidMLM.

API set_log_level
Description Sets the minimum log output level (static method)
Parameters log_level: LogLevel enum value
Return Value void
cpp
aplux::aidmlm::Aidmlm::set_log_level(aplux::aidmlm::LogLevel::INFO);

Set Log File Prefix: set_log_file_prefix()

A static method that sets the log file name prefix.

API set_log_file_prefix
Description Sets the log file name prefix (static method)
Parameters log_file: Log file name prefix string
Return Value void
cpp
aplux::aidmlm::Aidmlm::set_log_file_prefix("./test_mlm");

Initialization Operation: initialize()

Loads the multimodal model and initializes the inference environment.

API initialize
Description Loads the model and completes the initialization work required for inference
Parameters param: AidmlmInitParam struct reference containing model paths, configurations, and other initialization parameters
enable_profiler: Whether to enable the profiler, default value is false
Return Value A value of 0 indicates successful initialization; otherwise a non-zero value indicates failure
cpp
aplux::aidmlm::AidmlmInitParam init_param;
init_param.type = aplux::aidmlm::ModelType::QWEN25VL;
init_param.vision_model_path = "/path/to/veg.serialized.bin.aidem";
init_param.pos_emb_cos_path = "/path/to/position_ids_cos.raw";
init_param.pos_emb_sin_path = "/path/to/position_ids_sin.raw";
init_param.embedding_weights_path = "/path/to/embedding_weights.raw";
init_param.window_attention_mask_path = "/path/to/window_attention_mask.raw";
init_param.full_attention_mask_path = "/path/to/full_attention_mask.raw";
init_param.llm_model_path_vec.push_back("/path/to/llm_model.serialized.bin.aidem");
init_param.use_genie_load_model_ex = true;

aplux::aidmlm::Aidmlm mlm_ctx;
if(mlm_ctx.initialize(init_param) != 0){
    printf("AidMLM initialize failed.\n");
    return EXIT_FAILURE;
}

Sampling Parameter Setup Operation: set_sampler()

After initialization completes successfully, sampling parameters can be set with this function to control the randomness, diversity, and quality of generated content.

API set_sampler
Description Sets sampling parameters to control the randomness and diversity of LLM outputs.
Parameters key: Name of the sampling parameter. Currently supported:
  • "temp": Controls output randomness (Temperature); smaller values are more conservative.
  • "top-k": Limits the sampling range to the top K tokens with the highest probability.
  • "top-p": Nucleus sampling; limits sampling to the smallest set of most-probable tokens whose cumulative probability reaches P.
value: Parameter value represented as a string:
  • For "temp": floating-point numeric string (e.g. "1.2").
  • For "top-k": integer numeric string (e.g. "20").
  • For "top-p": floating-point numeric string (e.g. "0.6").
Return Value A value of 0 indicates success; a non-zero value indicates failure (e.g. invalid key or unsupported value format).
cpp
mlm_ctx.set_sampler("top-k", "20");
mlm_ctx.set_sampler("temp", "0.8");

Session Inference Operation: run()

After successful initialization, you can send image-text combined prompts to the multimodal model for inference.

💡Note

This function is not thread-safe. Only one thread can call the run method at a time.

API run
Description Executes one multimodal session inference
Parameters prompt: User prompt string
sys_prompt: System prompt string
img_vec: Reference to an ImageData vector containing the image data to input
cb: Callback function of type AidLLMCB for handling inference results
starting_round: Whether this is the start of a new conversation round (true for new conversation start)
Return Value A value of 0 indicates the inference executed successfully; otherwise a non-zero value indicates failure
cpp
// Define callback function
void my_callback(aplux::aidmlm::AidLLMCBData& cb_data, void* user_data){
    if(cb_data.state == aplux::aidmlm::AidLLMState::START){
        printf("[BOS]%s", cb_data.text.c_str());
    }else if(cb_data.state == aplux::aidmlm::AidLLMState::FINISH){
        printf("[EOS]%s\n", cb_data.text.c_str());
    }else if(cb_data.state == aplux::aidmlm::AidLLMState::ERROR){
        printf("[ERROR]%s\n", cb_data.text.c_str());
    }else{
        printf("%s", cb_data.text.c_str());
    }
}

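// Prepare image data (pre-resize to model-required dimensions, RGB format)
cv::Mat img = cv::imread("test.jpg");
if(img.empty()){
    printf("Failed to read test.jpg.\n");
    return EXIT_FAILURE;
}
cv::Mat img_rgb;
cv::cvtColor(img, img_rgb, cv::COLOR_BGR2RGB);
cv::Mat img_resized;
// 392x392 is the image size of Qwen25VLConfig (Qwen2.5-VL 3B)
cv::resize(img_rgb, img_resized, cv::Size(392, 392));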

aplux::aidmlm::ImageData img_data = {
    .img_pos = -1,
    .img_data = (uint8_t*)img_resized.data,
};
std::vector<aplux::aidmlm::ImageData> img_vec;
img_vec.push_back(img_data);

// Execute inference
std::string sys_prompt = "You are a helpful assistant.";
std::string user_prompt = "Please describe the scene in this image";
int run_result = mlm_ctx.run(user_prompt, sys_prompt, img_vec, my_callback, true);
if(run_result != 0){
    printf("AidMLM run failed.\n");
    return EXIT_FAILURE;
}

Session Termination Operation: abort()

Used to interrupt the currently running inference session.

API abort
Description Terminates the currently running inference session
Parameters reserve: Reserved field, default value is nullptr
Return Value A value of 0 indicates successful termination; otherwise a non-zero value indicates failure

Reset Operation: reset()

In multi-round conversation scenarios, when you need to process the next image or restart a conversation, call reset to clear internal state.

API reset
Description Resets the internal state of the inference engine to prepare for the next inference
Parameters void
Return Value A value of 0 indicates successful reset; otherwise a non-zero value indicates failure
cpp
// Reset after processing one image, prepare for the next
if(mlm_ctx.reset() != 0){
    printf("AidMLM reset failed.\n");
    return EXIT_FAILURE;
}

Final Release Operation: finalize()

Releases model resources and completes de-initialization.

API finalize
Description Releases model resources and completes necessary de-initialization operations
Parameters void
Return Value A value of 0 indicates successful release; otherwise a non-zero value indicates failure
cpp
if(mlm_ctx.finalize() != 0){
    printf("AidMLM finalize failed.\n");
    return EXIT_FAILURE;
}

Get Profiler: get_profiler()

When the profiler is enabled during initialization, this function can be used to obtain the Profiler object pointer. For detailed usage, refer to the "Profiler C++ API Documentation" section below.

API get_profiler
Description Gets the pointer to the profiler object
Parameters void
Return Value If the profiler is enabled, returns a Profiler object pointer; if not enabled, returns nullptr
cpp
// Enable profiler
aplux::aidmlm::AidmlmInitParam init_param;
init_param.enable_profiler = true;
// ... other parameter setup ...
mlm_ctx.initialize(init_param, true);

// Get performance data after inference; PRIu64 (from <cinttypes>) prints uint64_t portably
aplux::aidgen::Profiler* profiler = mlm_ctx.get_profiler();
if(profiler == nullptr){
    printf("Profiler is not enabled.\n");
    return EXIT_FAILURE;
}
aplux::aidgen::ProfileData data = profiler->get_data();
printf("Init time: %" PRIu64 " us\n", data.init_time_us);
printf("Time to first token: %" PRIu64 " us\n", data.time_to_first_token_us);
printf("Generate rate: %.2f tok/s\n", data.generate_rate);
printf("ViT execute time: %" PRIu64 " us\n", data.vit_execute_time_us);

Profiler C++ API Documentation

💡Note

Profiler-related interfaces are under the aplux::aidgen namespace (independent of aplux::aidllm and aplux::aidmlm).

  • Header file: /usr/local/include/aidlux/aidgen/profiler.hpp
  • Both AidLLM and AidMLM use their respective get_profiler() methods to obtain a Profiler object for performance analysis

Performance Data Type: struct ProfileData

During inference, developers may want to monitor performance metrics at each stage. The ProfileData struct stores performance data collected during the inference process.

Member List

Member Name | Type | Description
init_time_us | uint64_t | Initialization time (microseconds)
prompt_token_num | uint64_t | Number of input prompt tokens
prompt_processing_rate | float | Prompt processing rate (tok/s)
time_to_first_token_us | uint64_t | Time to first token (microseconds)
generated_token_num | uint64_t | Number of generated tokens
generate_rate | float | Token generation rate (tok/s)
generate_time_us | uint64_t | Total generation time (microseconds)
vit_execute_time_us | uint64_t | Vision model execution time (microseconds), AidMLM only
vit_init_time_us | uint64_t | Vision model initialization time (microseconds), AidMLM only
vit_preprocess_time_us | uint64_t | Vision model preprocessing time (microseconds), AidMLM only
vit_postprocess_time_us | uint64_t | Vision model postprocessing time (microseconds), AidMLM only
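The rate fields can be cross-checked against the token and time fields. The helpers below show the arithmetic under an assumption: that generate_rate is approximately generated_token_num divided by generate_time_us expressed in seconds. The SDK may measure its rates over slightly different intervals, so treat this as a sanity check rather than a definition; the function names are hypothetical.

```cpp
#include <cstdint>

// Throughput in tokens per second from a token count and a duration in microseconds.
inline double tokens_per_second(std::uint64_t tokens, std::uint64_t time_us) {
    if (time_us == 0) return 0.0;  // avoid division by zero when nothing was measured
    return static_cast<double>(tokens) * 1e6 / static_cast<double>(time_us);
}

// Average time per generated token in milliseconds.
inline double ms_per_token(std::uint64_t tokens, std::uint64_t time_us) {
    if (tokens == 0) return 0.0;
    return static_cast<double>(time_us) / 1000.0 / static_cast<double>(tokens);
}
```

For example, 100 tokens generated in 2,000,000 us gives 50 tok/s, or 20 ms per token; `tokens_per_second(data.generated_token_num, data.generate_time_us)` can be compared with `data.generate_rate`.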

Profiler Class: class Profiler

The Profiler class manages performance data collection during inference. It must be enabled during initialization (enable_profiler = true) to be used.

Get Performance Data: get_data()

Gets the currently collected performance data.

API get_data
Description Gets the performance analysis data collected during inference
Parameters void
Return Value ProfileData struct containing performance metrics for each stage

Reset Performance Data: reset()

Resets the collected performance data, typically called before starting a new inference round.

API reset
Description Clears collected performance data and restores to initial state
Parameters void
Return Value void
cpp
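// Enable profiler during initialization; report an error if the return value is non-zero
int init_result = llm_interpreter_ptr->initialize(true);
if(init_result != EXIT_SUCCESS){
    printf("Test sample: aidllm initialize failed.\n");
    return EXIT_FAILURE;
}

// Get the profiler; nullptr means the profiler is not enabled
aplux::aidgen::Profiler* profiler = llm_interpreter_ptr->get_profiler();
if(profiler == nullptr){
    printf("Test sample: profiler is not enabled.\n");
    return EXIT_FAILURE;
}

// Execute inference...
llm_interpreter_ptr->run(prompt, dialog_callback);

// Get performance data; PRIu64 (from <cinttypes>) prints uint64_t portably
aplux::aidgen::ProfileData data = profiler->get_data();
printf("Time to first token: %" PRIu64 " us\n", data.time_to_first_token_us);
printf("Generate rate: %.2f tok/s\n", data.generate_rate);
printf("Generated token count: %" PRIu64 "\n", data.generated_token_num);

// Reset data, prepare for next inference round
profiler->reset();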