
Profiler result is not consistent with each run #794

Open
RavikumarLav opened this issue Dec 17, 2024 · 2 comments


RavikumarLav commented Dec 17, 2024

Hello,

I am using the code below to capture the runtime of model inference:

// Parse the TensorFlow Lite model and create the network.
armnnTfLiteParser::ITfLiteParserPtr parser = armnnTfLiteParser::ITfLiteParser::Create();

armnn::INetworkPtr network = parser->CreateNetworkFromBinaryFile("model_latest.tflite");

// Find the binding points for the input and output nodes  
armnnTfLiteParser::BindingPointInfo inputBindingInfo = parser->GetNetworkInputBindingInfo(0, "conv2d_input");
armnnTfLiteParser::BindingPointInfo outputBindingInfo = parser->GetNetworkOutputBindingInfo(0, "Identity");

// Create ArmNN runtime
armnn::IRuntime::CreationOptions options; // default options
armnn::IRuntimePtr runtime = armnn::IRuntime::Create(options);

armnn::Compute device = armnn::Compute::CpuAcc;
//armnn::Compute device = armnn::Compute::CpuRef;
armnn::IOptimizedNetworkPtr optNet = Optimize(*network, {device}, runtime->GetDeviceSpec());
// Load the optimized network onto the runtime device
armnn::NetworkId networkIdentifier;
runtime->LoadNetwork(networkIdentifier, std::move(optNet));

// Get the profiler registered for this network and enable profiling.
std::shared_ptr<armnn::IProfiler> profiler = runtime->GetProfiler(networkIdentifier);
profiler->EnableProfiling(true);

// Run Inference
armnn::InputTensors inputTensor = MakeInputTensors(inputBindingInfo, &input[0]);
armnn::OutputTensors outputTensor = MakeOutputTensors(outputBindingInfo, &output[0]);
armnn::Status ret = runtime->EnqueueWorkload(networkIdentifier, inputTensor, outputTensor);

// Print output
profiler->Print(std::cout);
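
For reference, MakeInputTensors and MakeOutputTensors are small helpers that wrap the raw buffers in ArmNN tensors; a typical implementation (a sketch following the usual ArmNN binding pattern, not taken verbatim from my code) looks like this:

// Hypothetical helpers following the common ArmNN pattern; names and details are assumptions.
armnn::InputTensors MakeInputTensors(const armnnTfLiteParser::BindingPointInfo& inputBinding,
                                     const void* inputData)
{
    armnn::TensorInfo inputTensorInfo = inputBinding.second;
    inputTensorInfo.SetConstant(true); // recent ArmNN releases require input TensorInfo to be marked constant
    return { { inputBinding.first, armnn::ConstTensor(inputTensorInfo, inputData) } };
}

armnn::OutputTensors MakeOutputTensors(const armnnTfLiteParser::BindingPointInfo& outputBinding,
                                       void* outputData)
{
    return { { outputBinding.first, armnn::Tensor(outputBinding.second, outputData) } };
}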

I am able to see the profiler result for each layer in JSON format.

Problem: running the .tflite model on an Arm Cortex-A78 core with CpuAcc as the backend, the runtime is different for each run of the same model.

For one model it varies from 0.8 ms to 1.2 ms.

I need to know how the runtime is measured: using the system clock or using Arm registers?

@Colm-in-Arm
Collaborator

Hello.

To answer your last question, event timing uses system clock time. Have a look at armnn/src/armnn/WallClockTimer.hpp for details on how it's done. Ultimately, it goes back to clock_gettime in /usr/include/time.h.
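
As a rough illustration (a simplified sketch, not the actual contents of WallClockTimer.hpp), such a wall-clock event timer boils down to something like:

// Simplified sketch of a wall-clock event timer; see armnn/src/armnn/WallClockTimer.hpp
// for the real implementation. std::chrono ultimately calls down to clock_gettime on Linux.
#include <chrono>

class WallClockTimerSketch
{
    using Clock = std::chrono::high_resolution_clock;

public:
    void Start() { m_Start = Clock::now(); }
    void Stop()  { m_Stop  = Clock::now(); }

    // Elapsed wall-clock time in milliseconds.
    double GetElapsedMs() const
    {
        return std::chrono::duration<double, std::milli>(m_Stop - m_Start).count();
    }

private:
    Clock::time_point m_Start;
    Clock::time_point m_Stop;
};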

In general, if you are looking at overall inference execution times, you should run multiple inferences and watch the trend. Depending on the type of model, there may be some operations executed on the first inference that will be cached for subsequent inferences of a loaded model.
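
For example, a minimal measurement loop along those lines (my sketch, reusing the objects from your snippet; it discards one warm-up run and needs <chrono>, <vector>, <numeric> and <iostream> in addition to the ArmNN headers):

// Illustrative only: discard the first (warm-up) inference, then average the rest.
const int kWarmupRuns = 1;
const int kTimedRuns  = 20;
std::vector<double> timesMs;

for (int i = 0; i < kWarmupRuns + kTimedRuns; ++i)
{
    auto t0 = std::chrono::steady_clock::now();
    runtime->EnqueueWorkload(networkIdentifier, inputTensor, outputTensor);
    auto t1 = std::chrono::steady_clock::now();

    if (i >= kWarmupRuns) // ignore first-inference setup and caching effects
    {
        timesMs.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
}

double avgMs = std::accumulate(timesMs.begin(), timesMs.end(), 0.0) / timesMs.size();
std::cout << "Average over " << kTimedRuns << " inferences: " << avgMs << " ms" << std::endl;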

Colm.


RavikumarLav commented Jan 17, 2025

Hello,

> In general if you are looking at overall inference execution times you should run multiple inferences and watch the trend. Depending on the type of model there may be some operations executed on the first inference that will be cached for subsequent inferences of a loaded model.

I am loading a single model; there is no dependency on a previous model, and this model's data is not required by any subsequent model.

To elaborate, there are two problems I am facing with respect to profiling.

  1. Running the above code for a single model sometimes gives 0.8 ms, and running the same executable again gives 1.4 ms.

  2. In the loop below, the profiler result is around 0.8 ms for the first iteration and around 0.3 ms for each of the remaining iterations.
     Note: the model is the same and is loaded only once; I tried both the same and different inputs, but the behaviour is the same.
for (int i = 0; i < 10; i++)
{
    armnn::InputTensors inputTensor = MakeInputTensors(inputBindingInfo, &input[i]);
    armnn::OutputTensors outputTensor = MakeOutputTensors(outputBindingInfo, &output[i]);

    // Enable profiling.
    profiler->EnableProfiling(true);

    // Run inference.
    armnn::Status ret = runtime->EnqueueWorkload(networkIdentifier, inputTensor, outputTensor);

    // Print output.
    profiler->Print(std::cout);
}
Why is the runtime high the first time?
