AidGen C++ API

AidLLM C++ API Documentation

💡Note

Before developing with AidGen-SDK C++, please be aware of the following basics:

During compilation, include the header file located at /usr/local/include/aidlux/aidgen/aidllm.hpp
During linking, specify the library file located at /usr/local/lib/libaidgen.so
All interfaces are under the aplux::aidllm namespace

Inference Backend Type.enum LLmBackendType

The Aidllm SDK supports different inference backend frameworks for running LLM inference tasks. The available backends are listed below.

Member Name	Type	Value	Description
TYPE_DEFAULT	uint8_t	0	Unknown backend type
TYPE_GENIE	uint8_t	1	Genie inference backend

Inference Task State.enum LLMSentenceState

During an inference task, a single session may go through multiple stages. Developers can use these state codes to understand the current runtime status of the inference task.

Member Name	Type	Value	Description
BEGIN	enum class	0	Session start segment
CONTINUE	enum class	1	Intermediate content during ongoing session inference
END	enum class	2	Session ending segment
COMPLETE	enum class	3	Current session completed successfully
ABORT	enum class	4	Current session terminated passively
ERROR	enum class	5	Current session inference error

Interpreter Runtime State.enum LLMState

The overall state of the Aidllm interpreter during runtime. Developers can query this state to understand the interpreter's current working status.

Member Name	Type	Value	Description
STANDIDLE	enum class	0	Idle standby state
BUSYING	enum class	1	Busy processing inference
ABORT	enum class	2	Inference has been terminated
ERROR	enum class	3	Inference encountered an error

Log Level.enum LogLevel

The Aidllm SDK provides logging APIs (described under Global Functions below). This enum defines the log levels those APIs accept.

Member Name	Type	Value	Description
INFO	uint8_t	0	Message
WARNING	uint8_t	1	Warning
ERROR	uint8_t	2	Error
FATAL	uint8_t	3	Fatal error

Global Functions

Get Library Version.get_library_version()

Gets the version information string of the current Aidllm library.

API	get_library_version
Description	Gets the version information of the Aidllm library
Parameters	void
Return Value	A string containing the library version information

cpp

std::string version = aplux::aidllm::get_library_version();
printf("Current aidllm library version: %s\n", version.c_str());

Set Log Level.set_log_level()

Sets the minimum log output level for Aidllm. Logs below this level will not be output.

API	set_log_level
Description	Sets the minimum log output level
Parameters	log_level: LogLevel enum value specifying the minimum log level
Return Value	void

cpp

aplux::aidllm::set_log_level(aplux::aidllm::LogLevel::ERROR);
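
The threshold rule described above (messages below the configured level are dropped) can be pictured with a small stand-alone sketch. The enum here is a local mirror of the LogLevel table, and `should_emit` is a hypothetical helper rather than an SDK function; the SDK's internal filtering may differ in detail.

```cpp
#include <cassert>
#include <cstdint>

// Local mirror of the LogLevel table (assumption: values match the SDK header).
enum class LogLevel : uint8_t { INFO = 0, WARNING = 1, ERROR = 2, FATAL = 3 };

// Hypothetical helper: true when a message at `level` passes the minimum level.
constexpr bool should_emit(LogLevel level, LogLevel min_level) {
    return static_cast<uint8_t>(level) >= static_cast<uint8_t>(min_level);
}
```

With the minimum level set to ERROR as in the example above, this model lets ERROR and FATAL messages through and drops INFO and WARNING.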

Set Log File Prefix.set_log_file_prefix()

Sets the file name prefix used when logs are written to files.

API	set_log_file_prefix
Description	Sets the log file name prefix
Parameters	log_file_prefix: Log file name prefix string
Return Value	void

cpp

aplux::aidllm::set_log_file_prefix("aidllm_log_");

Inference Callback Function Type.LLMCallback

The callback function type definition used during Aidllm inference. Developers need to implement a callback function of this type to handle inference results.

cpp

using LLMCallback = std::function<int32_t(LLMCallbackData& cb_data, void* user_data)>;

💡Note

The callback function return type is int32_t. Returning 0 indicates normal continuation of inference; a non-zero value can be used to control the inference flow.

Inference Callback Data Type.struct LLMCallbackData

During inference tasks, Aidllm uses developer-provided callback functions. This data type is the argument passed to that callback function, and developers can use it in custom callbacks to process inference results.

Member List

The LLMCallbackData struct contains the following members:

Member	state
Type	enum LLMSentenceState
Default Value
Description	Status code of the current inference session

Member	text
Type	std::string
Default Value
Description	Result text of the inference task / message corresponding to special status codes
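
As a rough illustration of how the two members and the `user_data` pointer can work together, the sketch below collects streamed text into a `std::string`. The types are local stand-ins that mirror the documented definitions (they are not the SDK header), and `replay_session` is a hypothetical driver that only imitates how segments might arrive during a session.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Local stand-ins mirroring the documented types (not the SDK header).
enum class LLMSentenceState { BEGIN = 0, CONTINUE, END, COMPLETE, ABORT, ERROR };
struct LLMCallbackData {
    LLMSentenceState state;
    std::string text;
};
using LLMCallback = std::function<int32_t(LLMCallbackData& cb_data, void* user_data)>;

// Appends generated text to the std::string passed through user_data.
// Status messages (COMPLETE/ABORT/ERROR) are not part of the answer and are skipped.
inline int32_t collect_text(LLMCallbackData& cb_data, void* user_data) {
    auto* out = static_cast<std::string*>(user_data);
    if (out == nullptr) {
        return 1;  // non-zero: signal a problem (its exact effect is not specified above)
    }
    switch (cb_data.state) {
        case LLMSentenceState::BEGIN:
        case LLMSentenceState::CONTINUE:
        case LLMSentenceState::END:
            out->append(cb_data.text);
            break;
        default:
            break;
    }
    return 0;  // 0: continue inference normally
}

// Hypothetical driver that replays segments the way a session might deliver them.
inline std::string replay_session(const LLMCallback& cb) {
    std::string answer;
    LLMCallbackData segments[] = {
        {LLMSentenceState::BEGIN, "Hel"},
        {LLMSentenceState::CONTINUE, "lo"},
        {LLMSentenceState::END, "!"},
        {LLMSentenceState::COMPLETE, "done"},
    };
    for (auto& seg : segments) {
        if (cb(seg, &answer) != 0) break;
    }
    return answer;
}
```

In real use, the address of the output string would be passed as the `user_data` argument of `run()`, and the string must outlive the call.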

Runtime Context Class.class LLMContext

At runtime, Aidllm needs configuration settings and a way to pass runtime data between components. An LLMContext object holds this information.

Create Instance Object.create_instance()

To set runtime context information, you first need a configuration instance object. This function is used to create an instance object of type LLMContext.

API	create_instance
Description	Used to construct an instance object of class LLMContext
Parameters	config_file: Initial configuration file, where key information such as backend type and model file names can be configured
Return Value	If it is nullptr, object construction failed; otherwise, it is a std::unique_ptr that owns an LLMContext object

cpp

// Create a configuration instance object; report an error if the return value is null
std::unique_ptr<LLMContext> llm_context_ptr = LLMContext::create_instance("qwen2-7b/qwen2-7b.json");
if(llm_context_ptr == nullptr){
    printf("Test sample: LLMContext create_instance failed.\n");
    return EXIT_FAILURE;
}
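
Because a failed `create_instance()` only reports nullptr, it can help to rule out a wrong path before calling it. The following is a minimal sketch of such a pre-check; `config_file_readable` is a hypothetical helper, not part of the SDK, and it only verifies that the file can be opened, not that its contents are valid.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical pre-check: true if the file at `path` can be opened for reading.
inline bool config_file_readable(const std::string& path) {
    std::ifstream in(path);
    return in.good();
}
```

A caller might log the offending path when this returns false and skip the `create_instance()` call altogether.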

Member List

The LLMContext object is used to manage runtime configuration information, including the following parameters:

Member	config_file
Type	std::string
Default Value
Description	Initial configuration file, the config file parameter passed when creating the object

Member	backend_type
Type	LLmBackendType
Default Value	LLmBackendType::TYPE_DEFAULT
Description	The developer is required to specify the inference backend in the config file. After initialization parses the config file, this field will be overwritten to indicate the backend type specified by the developer

Member	model_file_vec
Type	std::vector<std::string>
Default Value
Description	The developer is required to specify model files in the config file. After initialization parses the config file, this field will be overwritten to indicate the model files specified by the developer

Member	config_overwrite_options
Type	std::string
Default Value
Description	By setting this field, you can specify certain key parameters in the inference process, thereby affecting inference speed, inference results, etc.

Member	android_tmp_directory
Type	std::string
Default Value
Description	This field is only valid on the Android platform. By setting this field, you can specify a directory for which the system user has valid permissions, for temporary use by the inference program

Interpreter Class.class LLMInterpreter

An object instance of type LLMInterpreter is the main executor of inference operations and is used to carry out specific inference processes.

Create Instance Object.create_instance()

To perform inference-related operations, an inference interpreter is essential. This function is used to construct an instance object of the inference interpreter.

API	create_instance
Description	Uses various data managed by the LLMContext object to construct an object of type LLMInterpreter
Parameters	llm_context: Reference to the unique_ptr of an LLMContext instance object (std::unique_ptr<LLMContext>&)
Parameters	reserve: Reserved field, default value is nullptr
Return Value	If it is nullptr, object construction failed; otherwise, it is a unique_ptr to an LLMInterpreter object

cpp

// Use the LLMContext object pointer to create the interpreter object; report an error if the return value is null
std::unique_ptr<LLMInterpreter> llm_interpreter_ptr = LLMInterpreter::create_instance(llm_context_ptr);
if(llm_interpreter_ptr == nullptr){
    printf("Test sample: LLMInterpreter create_instance failed.\n");
    return EXIT_FAILURE;
}

Initialization Operation.initialize()

After the interpreter object is created, some initialization operations are required, such as environment checks and resource construction.

API	initialize
Description	Completes the initialization work required for inference
Parameters	enable_profiler: Whether to enable the profiler, default value is false
Parameters	reserve: Reserved field, default value is nullptr
Return Value	A value of 0 indicates successful initialization; otherwise a non-zero value indicates failure

cpp

// Initialize the interpreter; report an error if the return value is non-zero
int init_result = llm_interpreter_ptr->initialize();
if(init_result != EXIT_SUCCESS){
    printf("Test sample: aidllm initialize failed.\n");
    return EXIT_FAILURE;
}

Sampling Parameter Setup Operation.set_sampler()

After initialization completes successfully, sampling parameters can be set with this function to control the randomness, diversity, and quality of generated content.

API	set_sampler
Description	Sets sampling parameters to control the randomness and diversity of LLM outputs.
Parameters	key: Name of the sampling parameter. Currently supported: `"temp"`: Controls output randomness (Temperature); smaller values are more conservative. `"top-k"`: Limits the sampling range to the top K tokens with the highest probability. `"top-p"`: Nucleus Sampling; limits to the token pool whose cumulative probability reaches P.
Parameters	value: Parameter value represented as a string: For `"temp"`: floating-point numeric string (e.g. `"1.2"`). For `"top-k"`: integer numeric string (e.g. `"20"`). For `"top-p"`: floating-point numeric string (e.g. `"0.6"`).
Return Value	A value of 0 indicates success; a non-zero value indicates failure (e.g. invalid key or unsupported value format).

cpp

// Set sampling parameters
llm_interpreter_ptr->set_sampler("temp", "0.8");
llm_interpreter_ptr->set_sampler("top-k", "20");
llm_interpreter_ptr->set_sampler("top-p", "0.9");
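
Since `set_sampler` takes every value as a string, numeric settings have to be formatted first. The sketch below shows one way to do that; `SamplerSettings`, `format_float`, and `to_sampler_pairs` are hypothetical names introduced for illustration, and the two-decimal formatting is an assumption about what is convenient, not a documented requirement.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical settings holder; field names are illustrative only.
struct SamplerSettings {
    double temp = 0.8;
    int top_k = 20;
    double top_p = 0.9;
};

// Formats a floating-point value with two decimals, e.g. 0.8 becomes "0.80".
inline std::string format_float(double v) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%.2f", v);
    return std::string(buf);
}

// Builds (key, value) string pairs using the documented key names.
inline std::vector<std::pair<std::string, std::string>> to_sampler_pairs(const SamplerSettings& s) {
    return {
        {"temp", format_float(s.temp)},
        {"top-k", std::to_string(s.top_k)},
        {"top-p", format_float(s.top_p)},
    };
}
```

Each pair would then be passed to `set_sampler(key, value)`, with the return value compared against 0 to catch an invalid key or value format.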

Session Inference Operation.run()

After successful initialization, you can run dialog inference with the LLM. Developers provide a custom callback function to handle continuous inference results during the session.

API	run
Description	Executes one session inference
Parameters	prompt: Prompt string
	cb: Callback function of type LLMCallback for handling continuous inference results during the session
	user_data: Pointer to user data, convenient for using this data in custom callback functions, default value is nullptr
Return Value	A value of 0 indicates the inference executed successfully; otherwise a non-zero value indicates failure

cpp

// Define callback function
LLMCallback dialog_callback = [&](LLMCallbackData& cb_data, void* user_data)->int32_t{
    if(cb_data.state == LLMSentenceState::BEGIN){
        printf("%s", cb_data.text.c_str());
    }else if(cb_data.state == LLMSentenceState::CONTINUE){
        printf("%s", cb_data.text.c_str());
        fflush(stdout);
    }else if(cb_data.state == LLMSentenceState::END){
        printf("%s\n", cb_data.text.c_str());
    }else if(cb_data.state == LLMSentenceState::COMPLETE){
        printf("\n[COMPLETE]%s\n", cb_data.text.c_str());
    }else if(cb_data.state == LLMSentenceState::ABORT){
        printf("\n[ABORT]%s\n", cb_data.text.c_str());
    }else if(cb_data.state == LLMSentenceState::ERROR){
        printf("\n[ERROR]%s\n", cb_data.text.c_str());
    }
    return EXIT_SUCCESS;
};

// Execute inference
std::string prompt = "<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n";
int run_result = llm_interpreter_ptr->run(prompt, dialog_callback);
if(run_result != EXIT_SUCCESS){
    printf("Test sample: aidllm run failed.\n");
    return EXIT_FAILURE;
}
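
The prompt string in the example above already contains the chat template markers, so the caller is responsible for wrapping the user text. A small helper along these lines can keep that wrapping in one place; `build_user_prompt` is a hypothetical name, and the template shown is simply the one used in the example (other models may expect a different template).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: wraps one user message in the chat template
// used by the example prompt above.
inline std::string build_user_prompt(const std::string& user_text) {
    return "<|im_start|>user\n" + user_text + "<|im_end|>\n<|im_start|>assistant\n";
}
```

Calling `build_user_prompt("Hello")` reproduces the prompt string from the example exactly.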

Query Inference State.state()

During inference, developers may need to query the current runtime state of the interpreter, such as determining whether it is idle or actively inferring.

API	state
Description	Gets the current runtime state of the inference task
Parameters	state: Reference to an LLMState variable; the function will overwrite this variable with the current state
Return Value	A value of 0 indicates the query executed successfully; otherwise a non-zero value indicates failure

cpp

LLMState current_state = LLMState::STANDIDLE;
llm_interpreter_ptr->state(current_state);
printf("Current state: %d\n", (int)current_state);
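
Printing the raw integer works, but a readable name is often easier to scan in logs. The sketch below maps each state to its name; the enum is a local mirror of the LLMState table (assumed to match the SDK header), and `llm_state_name` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Local mirror of the LLMState table (assumption: values match the SDK header).
enum class LLMState { STANDIDLE = 0, BUSYING = 1, ABORT = 2, ERROR = 3 };

// Hypothetical helper: readable name for logging.
inline const char* llm_state_name(LLMState s) {
    switch (s) {
        case LLMState::STANDIDLE: return "STANDIDLE";
        case LLMState::BUSYING:   return "BUSYING";
        case LLMState::ABORT:     return "ABORT";
        case LLMState::ERROR:     return "ERROR";
    }
    return "UNKNOWN";  // value outside the documented table
}
```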

Session Termination Operation.abort()

In some situations, users may want to interrupt the session that is currently running inference. This function is used to terminate inference.

⚠️Warning

It is strictly forbidden to call the abort function inside the callback function (LLMCallback), as this may cause deadlocks or undefined behavior.

API	abort
Description	Terminates the currently running inference session
Parameters	reserve: Reserved field, default value is nullptr
Return Value	A value of 0 indicates successful termination; otherwise a non-zero value indicates failure

cpp

// Terminate inference in another thread
int abort_result = llm_interpreter_ptr->abort();
if(abort_result != EXIT_SUCCESS){
    printf("Test sample: aidllm abort failed.\n");
    return EXIT_FAILURE;
}
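
One way to respect the warning above is to let the callback only record that a stop is wanted, and leave the actual `abort()` call to a different thread that watches the flag. The sketch below shows the callback side of that pattern; the types are local stand-ins mirroring the documented callback data, and `StopRequest` and `watch_for_stop_word` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>

// Local stand-ins mirroring the documented callback data (not the SDK header).
enum class LLMSentenceState { BEGIN = 0, CONTINUE, END, COMPLETE, ABORT, ERROR };
struct LLMCallbackData {
    LLMSentenceState state;
    std::string text;
};

// Hypothetical shared state: the callback raises the flag here, and a
// separate controller thread is the one that would call abort().
struct StopRequest {
    std::atomic<bool> requested{false};
    std::string stop_word;
};

// Callback body that never calls abort() itself; it only raises the flag.
inline int32_t watch_for_stop_word(LLMCallbackData& cb_data, void* user_data) {
    auto* stop = static_cast<StopRequest*>(user_data);
    if (stop != nullptr && !stop->stop_word.empty() &&
        cb_data.text.find(stop->stop_word) != std::string::npos) {
        stop->requested.store(true);
    }
    return 0;  // keep the callback itself on the normal path
}
```

This simple check looks at one segment at a time, so a stop word split across two segments would not be detected.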

Final Release Operation.finalize()

Every successful initialize() call should be paired with a finalize() call, which releases the resources created during initialization.

API	finalize
Description	Completes necessary de-initialization and release operations
Parameters	reserve: Reserved field, default value is nullptr
Return Value	A value of 0 indicates the release operation executed successfully; otherwise a non-zero value indicates failure

cpp

// Execute interpreter de-initialization; report an error if the return value is non-zero
int fin_result = llm_interpreter_ptr->finalize();
if(fin_result != EXIT_SUCCESS){
    printf("Test sample: aidllm finalize failed.\n");
    return EXIT_FAILURE;
}
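
Because the examples return early on errors, it is easy to skip `finalize()` by accident. A scope guard is one possible way to pair the two calls; the sketch below is a generic illustration, where `FinalizeGuard` is a hypothetical helper and `FakeInterpreter` is a stand-in that only mimics the documented `finalize()` shape.

```cpp
#include <cassert>

// Hypothetical RAII helper: calls finalize() on the wrapped object when the
// guard goes out of scope, so early returns do not skip the release step.
template <typename T>
class FinalizeGuard {
public:
    explicit FinalizeGuard(T& target) : target_(target) {}
    ~FinalizeGuard() { target_.finalize(); }  // return value is ignored here
    FinalizeGuard(const FinalizeGuard&) = delete;
    FinalizeGuard& operator=(const FinalizeGuard&) = delete;

private:
    T& target_;
};

// Stand-in with the same finalize() shape as documented (0 means success).
struct FakeInterpreter {
    int finalize_calls = 0;
    int finalize() {
        ++finalize_calls;
        return 0;
    }
};
```

With the real interpreter, the guard would be created right after `initialize()` succeeds, for example `FinalizeGuard<LLMInterpreter> guard(*llm_interpreter_ptr);`. The trade-off is that the destructor cannot report a failed release, so code that needs the return value should still call `finalize()` explicitly.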

Get Profiler.get_profiler()

When the profiler is enabled during initialization (enable_profiler = true), this function can be used to obtain the profiler object pointer for performance data collection and analysis. For detailed usage, refer to the "Profiler C++ API Documentation" section below.

API	get_profiler
Description	Gets the pointer to the profiler object
Parameters	void
Return Value	If the profiler is enabled, returns a Profiler object pointer; if not enabled, returns nullptr

cpp

// Enable profiler during initialization
int init_result = llm_interpreter_ptr->initialize(true);

// Get the profiler
aplux::aidgen::Profiler* profiler = llm_interpreter_ptr->get_profiler();

AidMLM C++ API Documentation

💡Note

Before developing with AidMLM-SDK C++, please be aware of the following basics:

During compilation, include the header file located at /usr/local/include/aidlux/aidgen/aidmlm.hpp
During linking, specify the library file located at /usr/local/lib/libaidgen.so
All interfaces are under the aplux::aidmlm namespace
AidMLM is designed for multimodal large model (vision-language model) inference, currently supporting Qwen2-VL and Qwen2.5-VL series models

Inference State.enum AidLLMState

During an AidMLM inference task, a single session may go through various stages. Developers can use these state codes to understand the current runtime status of the inference task.

Member Name	Type	Value	Description
STAND	enum class	0	Not yet working
START	enum class	1	Inference started
BUSYING	enum class	2	Inference in progress
FINISH	enum class	3	Inference finished
COMPLETE	enum class	4	Inference completed fully or truncated
WAITING	enum class	5	Current token decoding failed, waiting for next decode
ABORT	enum class	6	Current inference terminated early by developer
ERROR	enum class	7	Inference failed due to exception

Log Level.enum LogLevel

Member Name	Type	Value	Description
INFO	uint8_t	0	Message
WARNING	uint8_t	1	Warning
ERROR	uint8_t	2	Error
FATAL	uint8_t	3	Fatal error

Model Type.enum ModelType

Specifies the type of multimodal model currently in use.

Member Name	Type	Value	Description
RESERVED	enum class	0	Reserved type
QWEN2VL	enum class	1	Qwen2-VL model
QWEN25VL	enum class	2	Qwen2.5-VL model

Inference Callback Data Type.struct AidLLMCBData

During AidMLM inference tasks, developer-provided callback functions are used. This data type is the argument passed to that callback function.

Member List

Member	state
Type	enum AidLLMState
Default Value
Description	Status code of the current inference session

Member	text
Type	std::string
Default Value
Description	Result text of the inference task / message corresponding to special status codes

Inference Callback Function Type.AidLLMCB

The callback function type definition used during AidMLM inference.

cpp

using AidLLMCB = std::function<void(AidLLMCBData& cb_data, void* user_data)>;

Image Data Type.struct ImageData

A struct for passing image data to the multimodal model.

Member List

Member	img_pos
Type	int
Default Value	-1
Description	Position index of the image in the prompt. If -1, the image is appended at the end of the prompt

Member	img_data
Type	uint8_t*
Default Value	nullptr
Description	Image data pointer, pointing to RGB format image pixel data. Developers need to pre-resize to the model's required width and height
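
Since `img_data` is a bare pointer with no size field, the buffer length is implied by the model's input resolution. The sketch below computes the expected byte count under the assumption of tightly packed 8-bit RGB (3 bytes per pixel, no row padding); that layout is an assumption drawn from the description above, and `rgb_buffer_size` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for a tightly packed 8-bit RGB image (3 bytes per pixel).
// Assumption: the SDK reads width * height * 3 bytes from img_data.
constexpr std::size_t rgb_buffer_size(std::size_t width, std::size_t height) {
    return width * height * 3;
}
```

For a 392×392 input this gives 460992 bytes. When the pixels come from an OpenCV `cv::Mat`, checking `isContinuous()` before passing `data` helps confirm there is no row padding.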

Initialization Parameter Type.struct AidmlmInitParam

Configuration parameters required for AidMLM initialization.

Member List

Member Name	Type	Default Value	Description
vision_model_path	std::string		Vision encoder model file path
pos_emb_cos_path	std::string		Position encoding cosine weight file path
pos_emb_sin_path	std::string		Position encoding sine weight file path
embedding_weights_path	std::string		Word embedding weights file path
window_attention_mask_path	std::string		Window attention mask file path (Qwen2.5-VL only)
full_attention_mask_path	std::string		Full attention mask file path (Qwen2.5-VL only)
llm_model_path_vec	std::vector<std::string>		LLM model file path list
dbg_opt	std::string		Debug options string
type	ModelType	ModelType::RESERVED	Multimodal model type
qwen2vl_cfg	Qwen2VLConfig		Qwen2-VL model configuration
qwen25vl_cfg	Qwen25VLConfig		Qwen2.5-VL model configuration
enable_profiler	bool	false	Whether to enable the profiler
genie_log_level	int	1	Genie backend log level (1=ERROR, 2=WARN, 3=INFO, 4=VERBOSE)
use_shared_buffer	bool	false	Whether to use shared buffer
use_mmap	bool	false	Whether to use memory-mapped model loading
use_genie_load_model_ex	bool	false	Whether to use Genie extended model loading

Model Configuration Structs

AidMLM provides predefined model configuration structs to specify vision model configurations for different resolutions and parameter scales. Developers can choose the corresponding configuration based on the model in use.

Config Struct Name	Model	Image Size	Embedding Dim
Qwen2VLConfig	Qwen2-VL	644×644	1536
Qwen25VLConfig	Qwen2.5-VL 3B	392×392	2048
Qwen25VL3B644Config	Qwen2.5-VL 3B	644×644	2048
Qwen25VL3B672Config	Qwen2.5-VL 3B	672×672	2048
Qwen25VL7B392Config	Qwen2.5-VL 7B	392×392	3584
Qwen25VL7B644Config	Qwen2.5-VL 7B	644×644	3584
Qwen25VL7B672Config	Qwen2.5-VL 7B	672×672	3584
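
The table above determines the size to which input images must be resized. As an illustration, it can be mirrored in a small lookup so that resize code does not hard-code the numbers; `VisionConfigInfo` and `find_vision_config` are hypothetical names, and the array is a copy of the table, not the SDK's own config structs.

```cpp
#include <cassert>
#include <cstring>

// Local mirror of the table above; not the SDK's own config structs.
struct VisionConfigInfo {
    const char* name;
    int image_size;     // square input: image_size x image_size
    int embedding_dim;
};

constexpr VisionConfigInfo kVisionConfigs[] = {
    {"Qwen2VLConfig", 644, 1536},
    {"Qwen25VLConfig", 392, 2048},
    {"Qwen25VL3B644Config", 644, 2048},
    {"Qwen25VL3B672Config", 672, 2048},
    {"Qwen25VL7B392Config", 392, 3584},
    {"Qwen25VL7B644Config", 644, 3584},
    {"Qwen25VL7B672Config", 672, 3584},
};

// Returns the table row for `name`, or nullptr if the name is not listed.
inline const VisionConfigInfo* find_vision_config(const char* name) {
    for (const auto& cfg : kVisionConfigs) {
        if (std::strcmp(cfg.name, name) == 0) return &cfg;
    }
    return nullptr;
}
```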

Global Functions

Get Library Version.get_library_version()

API	get_library_version
Description	Gets the version information of the AidMLM library
Parameters	void
Return Value	A string containing the library version information

cpp

std::string version = aplux::aidmlm::get_library_version();
printf("Current aidmlm library version: %s\n", version.c_str());

Multimodal Inference Class.class Aidmlm

An object instance of type Aidmlm is the main executor of multimodal inference operations and is used to carry out vision-language model inference processes.

Construction and Destruction

Aidmlm objects are created via the default constructor.

cpp

aplux::aidmlm::Aidmlm mlm_ctx;

Set Log Level.set_log_level()

A static method that sets the minimum log output level for AidMLM.

API	set_log_level
Description	Sets the minimum log output level (static method)
Parameters	log_level: LogLevel enum value
Return Value	void

cpp

aplux::aidmlm::Aidmlm::set_log_level(aplux::aidmlm::LogLevel::INFO);

Set Log File Prefix.set_log_file_prefix()

A static method that sets the log file name prefix.

API	set_log_file_prefix
Description	Sets the log file name prefix (static method)
Parameters	log_file: Log file name prefix string
Return Value	void

cpp

aplux::aidmlm::Aidmlm::set_log_file_prefix("./test_mlm");

Initialization Operation.initialize()

Loads the multimodal model and initializes the inference environment.

API	initialize
Description	Loads the model and completes the initialization work required for inference
Parameters	param: AidmlmInitParam struct reference containing model paths, configurations, and other initialization parameters
Parameters	enable_profiler: Whether to enable the profiler, default value is false
Return Value	A value of 0 indicates successful initialization; otherwise a non-zero value indicates failure

cpp

aplux::aidmlm::AidmlmInitParam init_param;
init_param.type = aplux::aidmlm::ModelType::QWEN25VL;
init_param.vision_model_path = "/path/to/veg.serialized.bin.aidem";
init_param.pos_emb_cos_path = "/path/to/position_ids_cos.raw";
init_param.pos_emb_sin_path = "/path/to/position_ids_sin.raw";
init_param.embedding_weights_path = "/path/to/embedding_weights.raw";
init_param.window_attention_mask_path = "/path/to/window_attention_mask.raw";
init_param.full_attention_mask_path = "/path/to/full_attention_mask.raw";
init_param.llm_model_path_vec.push_back("/path/to/llm_model.serialized.bin.aidem");
init_param.use_genie_load_model_ex = true;

aplux::aidmlm::Aidmlm mlm_ctx;
if(mlm_ctx.initialize(init_param) != EXIT_SUCCESS){
    printf("AidMLM initialize failed.\n");
    return EXIT_FAILURE;
}

Sampling Parameter Setup Operation.set_sampler()

After initialization completes successfully, sampling parameters can be set with this function to control the randomness, diversity, and quality of generated content.

API	set_sampler
Description	Sets sampling parameters to control the randomness and diversity of LLM outputs.
Parameters	key: Name of the sampling parameter. Currently supported: `"temp"`: Controls output randomness (Temperature); smaller values are more conservative. `"top-k"`: Limits the sampling range to the top K tokens with the highest probability. `"top-p"`: Nucleus Sampling; limits to the token pool whose cumulative probability reaches P.
Parameters	value: Parameter value represented as a string: For `"temp"`: floating-point numeric string (e.g. `"1.2"`). For `"top-k"`: integer numeric string (e.g. `"20"`). For `"top-p"`: floating-point numeric string (e.g. `"0.6"`).
Return Value	A value of 0 indicates success; a non-zero value indicates failure (e.g. invalid key or unsupported value format).

cpp

mlm_ctx.set_sampler("top-k", "20");
mlm_ctx.set_sampler("temp", "0.8");

Session Inference Operation.run()

After successful initialization, you can send image-text combined prompts to the multimodal model for inference.

💡Note

This function is not thread-safe. Only one thread can call the run method at a time.

API	run
Description	Executes one multimodal session inference
Parameters	prompt: User prompt string
	sys_prompt: System prompt string
	img_vec: Reference to an ImageData vector containing the image data to input
	cb: Callback function of type AidLLMCB for handling inference results
	starting_round: Whether this is the start of a new conversation round (true for new conversation start)
	user_data: Pointer to user data, convenient for using this data in custom callback functions, default value is nullptr
Return Value	A value of 0 indicates the inference executed successfully; otherwise a non-zero value indicates failure

cpp

// Define callback function
void my_callback(aplux::aidmlm::AidLLMCBData& cb_data, void* user_data){
    if(cb_data.state == aplux::aidmlm::AidLLMState::START){
        printf("[BOS]%s", cb_data.text.c_str());
    }else if(cb_data.state == aplux::aidmlm::AidLLMState::FINISH){
        printf("[EOS]%s\n", cb_data.text.c_str());
    }else if(cb_data.state == aplux::aidmlm::AidLLMState::ERROR){
        printf("[ERROR]%s\n", cb_data.text.c_str());
    }else{
        printf("%s", cb_data.text.c_str());
    }
}

// Prepare image data (pre-resize to model-required dimensions, RGB format)
cv::Mat img = cv::imread("test.jpg");
cv::Mat img_rgb;
cv::cvtColor(img, img_rgb, cv::COLOR_BGR2RGB);
cv::Mat img_resized;
cv::resize(img_rgb, img_resized, cv::Size(392, 392));

aplux::aidmlm::ImageData img_data = {
    .img_pos = -1,
    .img_data = (uint8_t*)img_resized.data,
};
std::vector<aplux::aidmlm::ImageData> img_vec;
img_vec.push_back(img_data);

// Execute inference
std::string sys_prompt = "You are a helpful assistant.";
std::string user_prompt = "Please describe the scene in this image";
int run_result = mlm_ctx.run(user_prompt, sys_prompt, img_vec, my_callback, true);
if(run_result != EXIT_SUCCESS){
    printf("AidMLM run failed.\n");
    return EXIT_FAILURE;
}

Session Termination Operation.abort()

Used to interrupt the currently running inference session.

API	abort
Description	Terminates the currently running inference session
Parameters	reserve: Reserved field, default value is nullptr
Return Value	A value of 0 indicates successful termination; otherwise a non-zero value indicates failure

Reset Operation.reset()

In multi-round conversation scenarios, when you need to process the next image or restart a conversation, call reset to clear internal state.

API	reset
Description	Resets the internal state of the inference engine to prepare for the next inference
Parameters	void
Return Value	A value of 0 indicates successful reset; otherwise a non-zero value indicates failure

cpp

// Reset after processing one image, prepare for the next
if(mlm_ctx.reset() != EXIT_SUCCESS){
    printf("AidMLM reset failed.\n");
    return EXIT_FAILURE;
}

Final Release Operation.finalize()

Releases model resources and completes de-initialization.

API	finalize
Description	Releases model resources and completes necessary de-initialization operations
Parameters	void
Return Value	A value of 0 indicates successful release; otherwise a non-zero value indicates failure

cpp

if(mlm_ctx.finalize() != EXIT_SUCCESS){
    printf("AidMLM finalize failed.\n");
    return EXIT_FAILURE;
}

Get Profiler.get_profiler()

When the profiler is enabled during initialization, this function can be used to obtain the Profiler object pointer. For detailed usage, refer to the "Profiler C++ API Documentation" section below.

API	get_profiler
Description	Gets the pointer to the profiler object
Parameters	void
Return Value	If the profiler is enabled, returns a Profiler object pointer; if not enabled, returns nullptr

cpp

// Enable profiler
aplux::aidmlm::AidmlmInitParam init_param;
init_param.enable_profiler = true;
// ... other parameter setup ...
mlm_ctx.initialize(init_param, true);

// Get performance data after inference; get_profiler() returns nullptr when the profiler is disabled
aplux::aidgen::Profiler* profiler = mlm_ctx.get_profiler();
if(profiler != nullptr){
    aplux::aidgen::ProfileData data = profiler->get_data();
    printf("Init time: %lu us\n", data.init_time_us);
    printf("Time to first token: %lu us\n", data.time_to_first_token_us);
    printf("Generate rate: %.2f tok/s\n", data.generate_rate);
    printf("ViT execute time: %lu us\n", data.vit_execute_time_us);
}

Profiler C++ API Documentation

💡Note

Profiler-related interfaces are under the aplux::aidgen namespace (independent of aplux::aidllm and aplux::aidmlm).

The header file is located at /usr/local/include/aidlux/aidgen/profiler.hpp
Both AidLLM and AidMLM use their respective get_profiler() methods to obtain a Profiler object for performance analysis

Performance Data Type.struct ProfileData

During inference, developers may want to monitor performance metrics at each stage. The ProfileData struct stores performance data collected during the inference process.

Member List

Member Name	Type	Description
init_time_us	uint64_t	Initialization time (microseconds)
prompt_token_num	uint64_t	Number of input prompt tokens
prompt_processing_rate	float	Prompt processing rate (tok/s)
time_to_first_token_us	uint64_t	Time to first token (microseconds)
generated_token_num	uint64_t	Number of generated tokens
generate_rate	float	Token generation rate (tok/s)
generate_time_us	uint64_t	Total generation time (microseconds)
vit_execute_time_us	uint64_t	Vision model execution time (microseconds), AidMLM only
vit_init_time_us	uint64_t	Vision model initialization time (microseconds), AidMLM only
vit_preprocess_time_us	uint64_t	Vision model preprocessing time (microseconds), AidMLM only
vit_postprocess_time_us	uint64_t	Vision model postprocessing time (microseconds), AidMLM only
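
The count and time fields can be combined to cross-check the reported rates. The sketch below derives tokens per second from `generated_token_num` and `generate_time_us`; the struct is a local stand-in with only the fields used here, `tokens_per_second` is a hypothetical helper, and whether the SDK computes `generate_rate` in exactly this way is not stated in this document.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in with only the fields used here (mirrors the table above).
struct ProfileData {
    uint64_t generated_token_num = 0;
    uint64_t generate_time_us = 0;
};

// Tokens per second from a token count and elapsed microseconds.
// Returns 0 when no time has been recorded, to avoid dividing by zero.
inline double tokens_per_second(const ProfileData& d) {
    if (d.generate_time_us == 0) return 0.0;
    return static_cast<double>(d.generated_token_num) * 1e6 /
           static_cast<double>(d.generate_time_us);
}
```

For example, 50 generated tokens over 2,000,000 microseconds works out to 25 tokens per second.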

Profiler Class.class Profiler

The Profiler class manages performance data collection during inference. It must be enabled during initialization (enable_profiler = true) to be used.

Get Performance Data.get_data()

Gets the currently collected performance data.

API	get_data
Description	Gets the performance analysis data collected during inference
Parameters	void
Return Value	ProfileData struct containing performance metrics for each stage

Reset Performance Data.reset()

Resets the collected performance data, typically called before starting a new inference round.

API	reset
Description	Clears collected performance data and restores to initial state
Parameters	void
Return Value	void

cpp

// Enable profiler during initialization
int init_result = llm_interpreter_ptr->initialize(true);

// Get the profiler
aplux::aidgen::Profiler* profiler = llm_interpreter_ptr->get_profiler();

// Execute inference...
llm_interpreter_ptr->run(prompt, dialog_callback);

// Get performance data
aplux::aidgen::ProfileData data = profiler->get_data();
printf("Time to first token: %lu us\n", data.time_to_first_token_us);
printf("Generate rate: %.2f tok/s\n", data.generate_rate);
printf("Generated token count: %lu\n", data.generated_token_num);

// Reset data, prepare for next inference round
profiler->reset();
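
The `%lu` format used above matches `uint64_t` on typical 64-bit Linux targets, but not on every platform. Where portability matters, the `PRIu64` macro from `<cinttypes>` is one option; the sketch below wraps it in a hypothetical `format_us` helper.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a microsecond counter with PRIu64 so the format specifier matches
// uint64_t on every platform.
inline std::string format_us(uint64_t value_us) {
    char buf[32];  // 20 digits max for uint64_t, plus " us" and the terminator
    std::snprintf(buf, sizeof(buf), "%" PRIu64 " us", value_us);
    return std::string(buf);
}
```

The result could be printed with `printf("Time to first token: %s\n", format_us(data.time_to_first_token_us).c_str());`.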
