Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prototype JSON serializer for CallProfiler #323

Merged
merged 17 commits into from
Jun 5, 2024

Conversation

ThreeMonth03
Copy link
Collaborator

Following the issue : #302
This PR includes :

  1. Serialize the call profiler.
  2. Deserialize the call profiler.

Copy link
Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implement the serialization of the CallProfiler, except for CallProfiler.m_cancel_callbacks;.
I will implement the deserialization a few days later.
I have a question, should I serialize CallProfiler.m_cancel_callbacks;?
@yungyuc @tigercosmos , do you have any idea?

Comment on lines 215 to 239
TEST_F(CallProfilerTest, test_serialization)
{
pProfiler->reset();

foo1();

// Example:
// {
// "radix_tree":
// {
// "nodes":[
// {"key":-1,"name":"","data":{"start_time": "0","caller_name": "","total_time": 0,"call_count": 0,"is_running": 0},"children":[0]},
// {"key":0,"name":"void modmesh::detail::foo1()","data":{"start_time": "17745276708555250","caller_name": "void modmesh::detail::foo1()","total_time": 61002916,"call_count": 1,"is_running": 1},"children":[1]},
// {"key":1,"name":"void modmesh::detail::foo2()","data":{"start_time": "17745276708555458","caller_name": "void modmesh::detail::foo2()","total_time": 54002250,"call_count": 1,"is_running": 1},"children":[2]},
// {"key":2,"name":"void modmesh::detail::foo3()","data":{"start_time": "17745276743555833","caller_name": "void modmesh::detail::foo3()","total_time": 19001833,"call_count": 1,"is_running": 1},"children":[]}
// ],
// "current_node":-1,
// "unique_id":3
// }
// }

std::stringstream ss;
pProfiler->serialize(ss);
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the simple example of the serialization of the CallProfiler.

Copy link
Collaborator

@tigercosmos tigercosmos Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you only ran the function, but didn't verify the output.

here is what you can do:

  1. have a answer json file
  2. run the function and see if the output is equal to the answer.
  3. optional, verify the result matching the json schema (see more here)

Comment on lines 195 to 204
void serialize(std::ostream & outstream) const
{
outstream << R"({)";
outstream << R"("start_time": )" << start_time.time_since_epoch().count() << R"(,)";
outstream << R"("caller_name": ")" << caller_name << R"(",)";
outstream << R"("total_time": )" << total_time.count() << R"(,)";
outstream << R"("call_count": )" << call_count << R"(,)";
outstream << R"("is_running": )" << is_running;
outstream << R"(})";
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The serialization of the CallerProfile.

Comment on lines 75 to 127
void CallProfiler::serialize(std::ostream & outstream) const
{
outstream << R"({)";
outstream << R"("radix_tree": )";
serialize_radix_tree(outstream);
outstream << R"(})";
}

void CallProfiler::serialize_radix_tree(std::ostream & outstream) const
{
outstream << R"({)";
outstream << R"("nodes": [)";
serialize_radix_tree_nodes(*(m_radix_tree.get_current_node()), true, outstream);
outstream << R"(],)";
outstream << R"("current_node": )" << m_radix_tree.get_current_node()->key() << R"(,)";
outstream << R"("unique_id": )" << m_radix_tree.get_unique_node();
outstream << R"(})";
}

// NOLINTNEXTLINE(misc-no-recursion)
void CallProfiler::serialize_radix_tree_nodes(const RadixTreeNode<CallerProfile> & node, bool is_first_node, std::ostream & outstream) const
{
// Avoid the trailing comma.
if (!is_first_node)
{
outstream << R"(,)";
}
outstream << R"({)";
outstream << R"("key": )" << node.key() << R"(,)";
outstream << R"("name": ")" << node.name() << R"(",)";
outstream << R"("data": )";
node.data().serialize(outstream);
outstream << R"(,)";
outstream << R"("children": [)";
bool is_first_child = true;
for (const auto & child : node.children())
{
// Avoid the trailing comma.
if (!is_first_child)
{
outstream << R"(,)";
}
is_first_child = false;
outstream << child->key();
}
outstream << R"(])";
outstream << R"(})";
for (const auto & child : node.children())
{
// NOLINTNEXTLINE(misc-no-recursion)
serialize_radix_tree_nodes(*child, false, outstream);
}
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to avoid the multiprogramming, I implement these three functions under the CallProfiler.

@tigercosmos
Copy link
Collaborator

I implement the serialization of the CallProfiler, except for CallProfiler.m_cancel_callbacks;. I will implement the deserialization a few days later. I have a question, should I serialize CallProfiler.m_cancel_callbacks;? @yungyuc @tigercosmos , do you have any idea?

@ThreeMonth03 you don't need to handle it.


void CallProfiler::serialize_radix_tree(std::ostream & outstream) const
{
outstream << R"({)";
Copy link
Collaborator

@tigercosmos tigercosmos Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should not do the JSON serialization yourself. Consider using a library, such as RapidJson, to help you wrap the object into JSON format.

Here is what you can do:

  1. define a json schema
  2. implement the data structure in RadpidJson (or any other library)
  3. use the library to dump the json format

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should not do the JSON serialization yourself. Consider using a library, such as RapidJson, to help you wrap the object into JSON format.

Here is what you can do:

  1. define a json schema
  2. implement the data structure in RadpidJson (or any other library)
  3. use the library to dump the json format

Does it mean that I need to install RapidJson library in .github/workflows ?

Copy link
Collaborator

@tigercosmos tigercosmos Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to install RadpidJson in CMake, so you can use it in cpp files. (of course, also update the CI files)
You can do a study first before opting in RapidJson, perhaps there is a more suitable library for us.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like @ThreeMonth03 's current approach.

Developing the JSON writing code provides more freedom than using a thirdparty. A thirdparty introduces additional dependency and increases the complexity of the building system. JSON is easy, and writing a parser should be fun.

I do not see a good reason to use a thirdparty JSON library.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the discussion in Discord, I don't think we need a library for JSON, but if we need one, the current implementation is not general, we should have a JSON parser class instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the discussion in Discord, I don't think we need a library for JSON, but if we need one, the current implementation is not general, we should have a JSON parser class instead.

Does it mean that I should survey and implement a general json parser class?

Copy link
Member

@yungyuc yungyuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Add fixture (as golden) for the serialization results
  • Keep serialization code outside the object to be serialized
  • Avoid recursion from the implementation of the serializer

When you are ready for the next review, please leave a global comment stating so (write down something like "I am ready for the next review").

I think you do not need to handle cancel() and the callbacks at the moment.


foo1();

// Example:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please use a fixture (which can be just a string in this test file) to compare with the serialization results?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this example comment block have the same content as your fixture string? If yes, this comment block should be removed. And as mentioned above, the fixture strings should use good indentation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is.
I will remove this comment block.

@@ -72,6 +72,60 @@ void CallProfiler::print_profiling_result(const RadixTreeNode<CallerProfile> & n
}
}

void CallProfiler::serialize(std::ostream & outstream) const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to keep the serialization code outside the object to be serialized. Perhaps try to make a CallProfilerSerializer. What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me.
It can make CallProfiler compact.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ThreeMonth03 I think it's better to make the json parser as a separate class, so later other classes can utilize the json parser as well. Let me know if you would like to implement the generic json parser. I can help implement the parser if you don't have time.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ThreeMonth03 I think it's better to make the json parser as a separate class, so later other classes can utilize the json parser as well. Let me know if you would like to implement the generic json parser. I can help implement the parser if you don't have time.

I think I'm unable to implement the generic json parser.
@tigercosmos Could you give me a hand?

}

// NOLINTNEXTLINE(misc-no-recursion)
void CallProfiler::serialize_radix_tree_nodes(const RadixTreeNode<CallerProfile> & node, bool is_first_node, std::ostream & outstream) const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more readable to use some comment showing examples of the serialized results.

outstream << R"(})";
}

// NOLINTNEXTLINE(misc-no-recursion)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid recursion from the implementation

@yungyuc yungyuc added the performance Profiling, runtime, and memory consumption label Apr 30, 2024
Copy link
Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Add fixture (as golden) for the serialization results
  • Keep serialization code outside the object to be serialized
  • Avoid recursion from the implementation of the serializer

I am ready for the next review.
I fix the above problem in this commit.

@@ -310,6 +341,23 @@ class CallProfilerProbe
CallProfiler & m_profiler;
}; /* end struct CallProfilerProbe */

class CallProfilerSerializer
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I create CallProfilerSerializer and move the serialization function in to the CallProfilerSerializer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add some comment for the class CallProfilerSerializer. It may be good to point to the google test file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether the comment is enough to help reading.

Comment on lines 215 to 294
#ifdef _MSC_VER
std::string serializeExample = R"x({"radix_tree": {"nodes": [{"key": -1,"name": "","data": {"start_time": 0,"caller_name": "","total_time": 0,"call_count": 0,"is_running": 0},"children": [0]},{"key": 0,"name": "void __cdecl modmesh::detail::foo1(void)","data": {"start_time": 18348582416327166,"caller_name": "void __cdecl modmesh::detail::foo1(void)","total_time": 61001042,"call_count": 1,"is_running": 1},"children": [1]},{"key": 1,"name": "void __cdecl modmesh::detail::foo2(void)","data": {"start_time": 18348582416327416,"caller_name": "void __cdecl modmesh::detail::foo2(void)","total_time": 54000667,"call_count": 1,"is_running": 1},"children": [2]},{"key": 2,"name": "void __cdecl modmesh::detail::foo3(void)","data": {"start_time": 18348582451327708,"caller_name": "void __cdecl modmesh::detail::foo3(void)","total_time": 19000167,"call_count": 1,"is_running": 1},"children": []}],"current_node": -1,"unique_id": 3}})x";
#else
std::string serializeExample = R"x({"radix_tree": {"nodes": [{"key": -1,"name": "","data": {"start_time": 0,"caller_name": "","total_time": 0,"call_count": 0,"is_running": 0},"children": [0]},{"key": 0,"name": "void modmesh::detail::foo1()","data": {"start_time": 18348582416327166,"caller_name": "void modmesh::detail::foo1()","total_time": 61001042,"call_count": 1,"is_running": 1},"children": [1]},{"key": 1,"name": "void modmesh::detail::foo2()","data": {"start_time": 18348582416327416,"caller_name": "void modmesh::detail::foo2()","total_time": 54000667,"call_count": 1,"is_running": 1},"children": [2]},{"key": 2,"name": "void modmesh::detail::foo3()","data": {"start_time": 18348582451327708,"caller_name": "void modmesh::detail::foo3()","total_time": 19000167,"call_count": 1,"is_running": 1},"children": []}],"current_node": -1,"unique_id": 3}})x";
#endif

std::string start_time_str = R"("start_time": )";
std::string total_time_str = R"("total_time": )";
std::string caller_name_str = R"(,"caller_name": )";
std::string call_count_str = R"(,"call_count": )";

TEST_F(CallProfilerTest, test_serialization)
{
pProfiler->reset();

foo1();

// Example:
// {
// "radix_tree":
// {
// "nodes":[
// {"key":-1,"name":"","data":{"start_time": 0,"caller_name": "","total_time": 0,"call_count": 0,"is_running": 0},"children":[0]},
// {"key":0,"name":"void modmesh::detail::foo1()","data":{"start_time": 17745276708555250,"caller_name": "void modmesh::detail::foo1()","total_time": 61002916,"call_count": 1,"is_running": 1},"children":[1]},
// {"key":1,"name":"void modmesh::detail::foo2()","data":{"start_time": 17745276708555458,"caller_name": "void modmesh::detail::foo2()","total_time": 54002250,"call_count": 1,"is_running": 1},"children":[2]},
// {"key":2,"name":"void modmesh::detail::foo3()","data":{"start_time": 17745276743555833,"caller_name": "void modmesh::detail::foo3()","total_time": 19001833,"call_count": 1,"is_running": 1},"children":[]}
// ],
// "current_node":-1,
// "unique_id":3
// }
// }

std::stringstream ss;
CallProfilerSerializer::serialize(*pProfiler, ss);

int ss_start_time_pos0 = ss.str().find(start_time_str);
int ss_start_time_pos1 = ss.str().find(start_time_str, ss_start_time_pos0 + 1);
int ss_start_time_pos2 = ss.str().find(start_time_str, ss_start_time_pos1 + 1);
int ss_start_time_pos3 = ss.str().find(start_time_str, ss_start_time_pos2 + 1);
int ss_total_time_pos0 = ss.str().find(total_time_str);
int ss_total_time_pos1 = ss.str().find(total_time_str, ss_total_time_pos0 + 1);
int ss_total_time_pos2 = ss.str().find(total_time_str, ss_total_time_pos1 + 1);
int ss_total_time_pos3 = ss.str().find(total_time_str, ss_total_time_pos2 + 1);
int ss_caller_name_pos0 = ss.str().find(caller_name_str);
int ss_caller_name_pos1 = ss.str().find(caller_name_str, ss_caller_name_pos0 + 1);
int ss_caller_name_pos2 = ss.str().find(caller_name_str, ss_caller_name_pos1 + 1);
int ss_caller_name_pos3 = ss.str().find(caller_name_str, ss_caller_name_pos2 + 1);
int ss_call_count_pos0 = ss.str().find(call_count_str);
int ss_call_count_pos1 = ss.str().find(call_count_str, ss_call_count_pos0 + 1);
int ss_call_count_pos2 = ss.str().find(call_count_str, ss_call_count_pos1 + 1);
int ss_call_count_pos3 = ss.str().find(call_count_str, ss_call_count_pos2 + 1);

int example_start_time_pos0 = serializeExample.find(start_time_str);
int example_start_time_pos1 = serializeExample.find(start_time_str, example_start_time_pos0 + 1);
int example_start_time_pos2 = serializeExample.find(start_time_str, example_start_time_pos1 + 1);
int example_start_time_pos3 = serializeExample.find(start_time_str, example_start_time_pos2 + 1);
int example_total_time_pos0 = serializeExample.find(total_time_str);
int example_total_time_pos1 = serializeExample.find(total_time_str, example_total_time_pos0 + 1);
int example_total_time_pos2 = serializeExample.find(total_time_str, example_total_time_pos1 + 1);
int example_total_time_pos3 = serializeExample.find(total_time_str, example_total_time_pos2 + 1);
int example_caller_name_pos0 = serializeExample.find(caller_name_str);
int example_caller_name_pos1 = serializeExample.find(caller_name_str, example_caller_name_pos0 + 1);
int example_caller_name_pos2 = serializeExample.find(caller_name_str, example_caller_name_pos1 + 1);
int example_caller_name_pos3 = serializeExample.find(caller_name_str, example_caller_name_pos2 + 1);
int example_call_count_pos0 = serializeExample.find(call_count_str);
int example_call_count_pos1 = serializeExample.find(call_count_str, example_call_count_pos0 + 1);
int example_call_count_pos2 = serializeExample.find(call_count_str, example_call_count_pos1 + 1);
int example_call_count_pos3 = serializeExample.find(call_count_str, example_call_count_pos2 + 1);

// Validate the serialization result except for the start_time and total_time
EXPECT_EQ(ss.str().substr(0, ss_start_time_pos0 + start_time_str.size()), serializeExample.substr(0, example_start_time_pos0 + start_time_str.size()));
EXPECT_EQ(ss.str().substr(ss_caller_name_pos0, ss_total_time_pos0 + total_time_str.size() - ss_caller_name_pos0), serializeExample.substr(example_caller_name_pos0, example_total_time_pos0 + total_time_str.size() - example_caller_name_pos0));
EXPECT_EQ(ss.str().substr(ss_call_count_pos0, ss_start_time_pos1 - ss_call_count_pos0), serializeExample.substr(example_call_count_pos0, example_start_time_pos1 - example_call_count_pos0));
EXPECT_EQ(ss.str().substr(ss_caller_name_pos1, ss_total_time_pos1 + total_time_str.size() - ss_caller_name_pos1), serializeExample.substr(example_caller_name_pos1, example_total_time_pos1 + total_time_str.size() - example_caller_name_pos1));
EXPECT_EQ(ss.str().substr(ss_call_count_pos1, ss_start_time_pos2 - ss_call_count_pos1), serializeExample.substr(example_call_count_pos1, example_start_time_pos2 - example_call_count_pos1));
EXPECT_EQ(ss.str().substr(ss_caller_name_pos2, ss_total_time_pos2 + total_time_str.size() - ss_caller_name_pos2), serializeExample.substr(example_caller_name_pos2, example_total_time_pos2 + total_time_str.size() - example_caller_name_pos2));
EXPECT_EQ(ss.str().substr(ss_call_count_pos2, ss_start_time_pos3 - ss_call_count_pos2), serializeExample.substr(example_call_count_pos2, example_start_time_pos3 - example_call_count_pos2));
EXPECT_EQ(ss.str().substr(ss_caller_name_pos3, ss_total_time_pos3 + total_time_str.size() - ss_caller_name_pos3), serializeExample.substr(example_caller_name_pos3, example_total_time_pos3 + total_time_str.size() - example_caller_name_pos3));
EXPECT_EQ(ss.str().substr(ss_call_count_pos3), serializeExample.substr(example_call_count_pos3));
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the test of the serialization function.
I check the output except for the start time and the total time of the functions.

Comment on lines 75 to 201
// "nodes":[
// {"key":-1,"name":"","data":{"start_time": 0,"caller_name": "","total_time": 0,"call_count": 0,"is_running": 0},"children":[0]},
// {"key":0,"name":"void modmesh::detail::foo1()","data":{"start_time": 17745276708555250,"caller_name": "void modmesh::detail::foo1()","total_time": 61002916,"call_count": 1,"is_running": 1},"children":[1]},
// {"key":1,"name":"void modmesh::detail::foo2()","data":{"start_time": 17745276708555458,"caller_name": "void modmesh::detail::foo2()","total_time": 54002250,"call_count": 1,"is_running": 1},"children":[2]},
// {"key":2,"name":"void modmesh::detail::foo3()","data":{"start_time": 17745276743555833,"caller_name": "void modmesh::detail::foo3()","total_time": 19001833,"call_count": 1,"is_running": 1},"children":[]}
// ],
// "current_node":-1,
// "unique_id":3
// }

outstream << R"({)";
outstream << R"("nodes": [)";
CallProfilerSerializer::serialize_radix_tree_nodes(profiler.radix_tree().get_current_node(), outstream);
outstream << R"(],)";
outstream << R"("current_node": )" << profiler.radix_tree().get_current_node()->key() << R"(,)";
outstream << R"("unique_id": )" << profiler.radix_tree().get_unique_node();
outstream << R"(})";
}
void CallProfilerSerializer::serialize_radix_tree_nodes(const RadixTreeNode<CallerProfile> * node, std::ostream & outstream)
{
std::queue<const RadixTreeNode<CallerProfile> *> nodes_buffer;
nodes_buffer.push(node);
bool is_first_node = true;

// Serialize the nodes in a breadth-first manner.
while (!nodes_buffer.empty())
{
const int nodes_buffer_size = nodes_buffer.size();
for (int i = 0; i < nodes_buffer_size; i++)
{
const RadixTreeNode<CallerProfile> * current_node = nodes_buffer.front();
nodes_buffer.pop();
CallProfilerSerializer::serialize_radix_tree_node(*current_node, is_first_node, outstream);
is_first_node = false;
for (const auto & child : current_node->children())
{
nodes_buffer.push(child.get());
}
}
}
}

void CallProfilerSerializer::serialize_radix_tree_node(const RadixTreeNode<CallerProfile> & node, bool is_first_node, std::ostream & outstream)
{
// Example:
// {
// "key":-1,
// "name":"",
// "data":{"start_time": 0,"caller_name": "","total_time": 0,"call_count": 0,"is_running": 0},
// "children":[0]
// }

// Avoid the trailing comma.
if (!is_first_node)
{
outstream << R"(,)";
}
outstream << R"({)";
outstream << R"("key": )" << node.key() << R"(,)";
outstream << R"("name": ")" << node.name() << R"(",)";
outstream << R"("data": )";
CallProfilerSerializer::serialize_caller_profile(node.data(), outstream);
outstream << R"(,)";
outstream << R"("children": [)";
bool is_first_child = true;
for (const auto & child : node.children())
{
// Avoid the trailing comma.
if (!is_first_child)
{
outstream << R"(,)";
}
is_first_child = false;
outstream << child->key();
}
outstream << R"(])";
outstream << R"(})";
}

void CallProfilerSerializer::serialize_caller_profile(const CallerProfile & profile, std::ostream & outstream)
{
// Example:
// {
// "start_time": 0,
// "caller_name": "",
// "total_time": 0,
// "call_count": 0,
// "is_running": 0
// }

outstream << R"({)";
outstream << R"("start_time": )" << profile.start_time.time_since_epoch().count() << R"(,)";
outstream << R"("caller_name": ")" << profile.caller_name << R"(",)";
outstream << R"("total_time": )" << profile.total_time.count() << R"(,)";
outstream << R"("call_count": )" << profile.call_count << R"(,)";
outstream << R"("is_running": )" << profile.is_running;
outstream << R"(})";
}

} /* end namespace modmesh */
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I add the comment about the serialization format.
Furthermore, I traverse the radix tree by bfs with a queue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good to use a loop to traverse. Is there a performance concern?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea. It seems to be fine.
The argument are constant referenced.
And there is not any std::flush that may affect the performance.

Copy link
Member

@yungyuc yungyuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fixes. Now we can fine-tune the implementation.

  • Remove duplicated and/or lengthy code or data examples.
  • Avoid overly long lines.
  • (Style) blank line between functions.
  • Take care of looper incrementation semantics (++i).
  • Make example comments in serialize_radix_tree_node() and serialize_caller_profile() clearer.
  • Beautify the fixture strings.
  • Make the serializer testable.

@@ -72,6 +72,132 @@ void CallProfiler::print_profiling_result(const RadixTreeNode<CallerProfile> & n
}
}

void CallProfilerSerializer::serialize_call_profiler(const CallProfiler & profiler, std::ostream & outstream)
{
// Example:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need to keep examples everywhere. The example in this block of comments seems to duplicate with that in google test. Please keep the code here clean by removing the unnecessary comments.

And marking only "Example" in this block does not help a maintainer to read. The only common information between the code and comment in the function is "radix_tree". It does not help maintainability.

If anything in the example comments here should be kept, please make it concise. And try to use doxygen format if it fits.


void CallProfilerSerializer::serialize_radix_tree(const CallProfiler & profiler, std::ostream & outstream)
{
// Example:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also looks like unnecessary and unclear comments.

outstream << R"("unique_id": )" << profiler.radix_tree().get_unique_node();
outstream << R"(})";
}
void CallProfilerSerializer::serialize_radix_tree_nodes(const RadixTreeNode<CallerProfile> * node, std::ostream & outstream)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep a blank line between functions.

Is it possible to make clang-format capture this? (Apparently we haven't configured it this way, yet.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I use the clang-format tools, and it doesn't work.

nodes_buffer.push(node);
bool is_first_node = true;

// Serialize the nodes in a breadth-first manner.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good comment. It's informative and helps read the code.

while (!nodes_buffer.empty())
{
const int nodes_buffer_size = nodes_buffer.size();
for (int i = 0; i < nodes_buffer_size; i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use ++i for clearer semantics. Both ++i and i++ generates the same instructions in this construct, but the semantics differ. We want the former, not the latter.

If i is not a fundamental types, i++ could have unwanted side effects.

@@ -212,5 +212,86 @@ TEST_F(CallProfilerTest, cancel)
EXPECT_EQ(radix_tree().get_unique_node(), 0);
}

#ifdef _MSC_VER
std::string serializeExample = R"x({"radix_tree": {"nodes": [{"key": -1,"name": "","data": {"start_time": 0,"caller_name": "","total_time": 0,"call_count": 0,"is_running": 0},"children": [0]},{"key": 0,"name": "void __cdecl modmesh::detail::foo1(void)","data": {"start_time": 18348582416327166,"caller_name": "void __cdecl modmesh::detail::foo1(void)","total_time": 61001042,"call_count": 1,"is_running": 1},"children": [1]},{"key": 1,"name": "void __cdecl modmesh::detail::foo2(void)","data": {"start_time": 18348582416327416,"caller_name": "void __cdecl modmesh::detail::foo2(void)","total_time": 54000667,"call_count": 1,"is_running": 1},"children": [2]},{"key": 2,"name": "void __cdecl modmesh::detail::foo3(void)","data": {"start_time": 18348582451327708,"caller_name": "void __cdecl modmesh::detail::foo3(void)","total_time": 19000167,"call_count": 1,"is_running": 1},"children": []}],"current_node": -1,"unique_id": 3}})x";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The line (and that in the else fence) is too long! Please do not use more than 100 characters in a line for the fixture. And please use indentation when breaking the line.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix them after I write the gtest.

@@ -310,6 +341,23 @@ class CallProfilerProbe
CallProfiler & m_profiler;
}; /* end struct CallProfilerProbe */

class CallProfilerSerializer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add some comment for the class CallProfilerSerializer. It may be good to point to the google test file.

Comment on lines 75 to 201
// "nodes":[
// {"key":-1,"name":"","data":{"start_time": 0,"caller_name": "","total_time": 0,"call_count": 0,"is_running": 0},"children":[0]},
// {"key":0,"name":"void modmesh::detail::foo1()","data":{"start_time": 17745276708555250,"caller_name": "void modmesh::detail::foo1()","total_time": 61002916,"call_count": 1,"is_running": 1},"children":[1]},
// {"key":1,"name":"void modmesh::detail::foo2()","data":{"start_time": 17745276708555458,"caller_name": "void modmesh::detail::foo2()","total_time": 54002250,"call_count": 1,"is_running": 1},"children":[2]},
// {"key":2,"name":"void modmesh::detail::foo3()","data":{"start_time": 17745276743555833,"caller_name": "void modmesh::detail::foo3()","total_time": 19001833,"call_count": 1,"is_running": 1},"children":[]}
// ],
// "current_node":-1,
// "unique_id":3
// }

outstream << R"({)";
outstream << R"("nodes": [)";
CallProfilerSerializer::serialize_radix_tree_nodes(profiler.radix_tree().get_current_node(), outstream);
outstream << R"(],)";
outstream << R"("current_node": )" << profiler.radix_tree().get_current_node()->key() << R"(,)";
outstream << R"("unique_id": )" << profiler.radix_tree().get_unique_node();
outstream << R"(})";
}
void CallProfilerSerializer::serialize_radix_tree_nodes(const RadixTreeNode<CallerProfile> * node, std::ostream & outstream)
{
std::queue<const RadixTreeNode<CallerProfile> *> nodes_buffer;
nodes_buffer.push(node);
bool is_first_node = true;

// Serialize the nodes in a breadth-first manner.
while (!nodes_buffer.empty())
{
const int nodes_buffer_size = nodes_buffer.size();
for (int i = 0; i < nodes_buffer_size; i++)
{
const RadixTreeNode<CallerProfile> * current_node = nodes_buffer.front();
nodes_buffer.pop();
CallProfilerSerializer::serialize_radix_tree_node(*current_node, is_first_node, outstream);
is_first_node = false;
for (const auto & child : current_node->children())
{
nodes_buffer.push(child.get());
}
}
}
}

void CallProfilerSerializer::serialize_radix_tree_node(const RadixTreeNode<CallerProfile> & node, bool is_first_node, std::ostream & outstream)
{
// Example:
// {
// "key":-1,
// "name":"",
// "data":{"start_time": 0,"caller_name": "","total_time": 0,"call_count": 0,"is_running": 0},
// "children":[0]
// }

// Avoid the trailing comma.
if (!is_first_node)
{
outstream << R"(,)";
}
outstream << R"({)";
outstream << R"("key": )" << node.key() << R"(,)";
outstream << R"("name": ")" << node.name() << R"(",)";
outstream << R"("data": )";
CallProfilerSerializer::serialize_caller_profile(node.data(), outstream);
outstream << R"(,)";
outstream << R"("children": [)";
bool is_first_child = true;
for (const auto & child : node.children())
{
// Avoid the trailing comma.
if (!is_first_child)
{
outstream << R"(,)";
}
is_first_child = false;
outstream << child->key();
}
outstream << R"(])";
outstream << R"(})";
}

void CallProfilerSerializer::serialize_caller_profile(const CallerProfile & profile, std::ostream & outstream)
{
// Example:
// {
// "start_time": 0,
// "caller_name": "",
// "total_time": 0,
// "call_count": 0,
// "is_running": 0
// }

outstream << R"({)";
outstream << R"("start_time": )" << profile.start_time.time_since_epoch().count() << R"(,)";
outstream << R"("caller_name": ")" << profile.caller_name << R"(",)";
outstream << R"("total_time": )" << profile.total_time.count() << R"(,)";
outstream << R"("call_count": )" << profile.call_count << R"(,)";
outstream << R"("is_running": )" << profile.is_running;
outstream << R"(})";
}

} /* end namespace modmesh */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's good to use a loop to traverse. Is there a performance concern?


foo1();

// Example:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this example comment block have the same content as your fixture string? If yes, this comment block should be removed. And as mentioned above, the fixture strings should use good indentation.

std::stringstream ss;
CallProfilerSerializer::serialize(*pProfiler, ss);

int ss_start_time_pos0 = ss.str().find(start_time_str);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to do string search yourself? They look too convoluted to be useful tests. Could you please explain what you are doing here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The theory output is not always same as the fixture string.
The fixture string contains the information of the start_time and total_time of the program.
Thus, I only compare the string that doesn't include time information.

Copy link
Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Add more information into the json format, which includes:
  • address of nodes
  • function-key dictionary

outstream << R"("unique_id": )" << profiler.radix_tree().get_unique_node();
outstream << R"(})";
}
void CallProfilerSerializer::serialize_radix_tree_nodes(const RadixTreeNode<CallerProfile> * node, std::ostream & outstream)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I use the clang-format tools, and it doesn't work.

Comment on lines 75 to 201
// "nodes":[
// {"key":-1,"name":"","data":{"start_time": 0,"caller_name": "","total_time": 0,"call_count": 0,"is_running": 0},"children":[0]},
// {"key":0,"name":"void modmesh::detail::foo1()","data":{"start_time": 17745276708555250,"caller_name": "void modmesh::detail::foo1()","total_time": 61002916,"call_count": 1,"is_running": 1},"children":[1]},
// {"key":1,"name":"void modmesh::detail::foo2()","data":{"start_time": 17745276708555458,"caller_name": "void modmesh::detail::foo2()","total_time": 54002250,"call_count": 1,"is_running": 1},"children":[2]},
// {"key":2,"name":"void modmesh::detail::foo3()","data":{"start_time": 17745276743555833,"caller_name": "void modmesh::detail::foo3()","total_time": 19001833,"call_count": 1,"is_running": 1},"children":[]}
// ],
// "current_node":-1,
// "unique_id":3
// }

outstream << R"({)";
outstream << R"("nodes": [)";
CallProfilerSerializer::serialize_radix_tree_nodes(profiler.radix_tree().get_current_node(), outstream);
outstream << R"(],)";
outstream << R"("current_node": )" << profiler.radix_tree().get_current_node()->key() << R"(,)";
outstream << R"("unique_id": )" << profiler.radix_tree().get_unique_node();
outstream << R"(})";
}
void CallProfilerSerializer::serialize_radix_tree_nodes(const RadixTreeNode<CallerProfile> * node, std::ostream & outstream)
{
std::queue<const RadixTreeNode<CallerProfile> *> nodes_buffer;
nodes_buffer.push(node);
bool is_first_node = true;

// Serialize the nodes in a breadth-first manner.
while (!nodes_buffer.empty())
{
const int nodes_buffer_size = nodes_buffer.size();
for (int i = 0; i < nodes_buffer_size; i++)
{
const RadixTreeNode<CallerProfile> * current_node = nodes_buffer.front();
nodes_buffer.pop();
CallProfilerSerializer::serialize_radix_tree_node(*current_node, is_first_node, outstream);
is_first_node = false;
for (const auto & child : current_node->children())
{
nodes_buffer.push(child.get());
}
}
}
}

void CallProfilerSerializer::serialize_radix_tree_node(const RadixTreeNode<CallerProfile> & node, bool is_first_node, std::ostream & outstream)
{
// Example:
// {
// "key":-1,
// "name":"",
// "data":{"start_time": 0,"caller_name": "","total_time": 0,"call_count": 0,"is_running": 0},
// "children":[0]
// }

// Avoid the trailing comma.
if (!is_first_node)
{
outstream << R"(,)";
}
outstream << R"({)";
outstream << R"("key": )" << node.key() << R"(,)";
outstream << R"("name": ")" << node.name() << R"(",)";
outstream << R"("data": )";
CallProfilerSerializer::serialize_caller_profile(node.data(), outstream);
outstream << R"(,)";
outstream << R"("children": [)";
bool is_first_child = true;
for (const auto & child : node.children())
{
// Avoid the trailing comma.
if (!is_first_child)
{
outstream << R"(,)";
}
is_first_child = false;
outstream << child->key();
}
outstream << R"(])";
outstream << R"(})";
}

void CallProfilerSerializer::serialize_caller_profile(const CallerProfile & profile, std::ostream & outstream)
{
// Example:
// {
// "start_time": 0,
// "caller_name": "",
// "total_time": 0,
// "call_count": 0,
// "is_running": 0
// }

outstream << R"({)";
outstream << R"("start_time": )" << profile.start_time.time_since_epoch().count() << R"(,)";
outstream << R"("caller_name": ")" << profile.caller_name << R"(",)";
outstream << R"("total_time": )" << profile.total_time.count() << R"(,)";
outstream << R"("call_count": )" << profile.call_count << R"(,)";
outstream << R"("is_running": )" << profile.is_running;
outstream << R"(})";
}

} /* end namespace modmesh */
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea. It seems to be fine.
The argument are constant referenced.
And there is not any std::flush that may affect the performance.


foo1();

// Example:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is.
I will remove this comment block.

std::stringstream ss;
CallProfilerSerializer::serialize(*pProfiler, ss);

int ss_start_time_pos0 = ss.str().find(start_time_str);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The theory output is not always same as the fixture string.
The fixture string contains the information of the start_time and total_time of the program.
Thus, I only compare the string that doesn't include time information.

Comment on lines 78 to 90
// {
// "radix_tree":
// {
// "nodes":[
// {"key":-1,"name":"","data":{"start_time": 0,"caller_name": "","total_time": 0,"call_count": 0,"is_running": 0},"children":[0]},
// {"key":0,"name":"void modmesh::detail::foo1()","data":{"start_time": 17745276708555250,"caller_name": "void modmesh::detail::foo1()","total_time": 61002916,"call_count": 1,"is_running": 1},"children":[1]},
// {"key":1,"name":"void modmesh::detail::foo2()","data":{"start_time": 17745276708555458,"caller_name": "void modmesh::detail::foo2()","total_time": 54002250,"call_count": 1,"is_running": 1},"children":[2]},
// {"key":2,"name":"void modmesh::detail::foo3()","data":{"start_time": 17745276743555833,"caller_name": "void modmesh::detail::foo3()","total_time": 19001833,"call_count": 1,"is_running": 1},"children":[]}
// ],
// "current_node":-1,
// "unique_id":3
// }
// }
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The json format is not informative enough.
Actually, the key is the ID of the function, and the same function may appear in the different trie nodes.
So I should add the address of the node in the nodes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, the dictionary, that map the function to keyID, should be added into the radix tree.

Copy link
Member

@yungyuc yungyuc May 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the address of the node in the nodes, do you mean the address of the stack frame?

Maybe it is worth a try, but I am not 100% sure the stack frame address can be used to identify a node.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the address of the node in the nodes, do you mean the address of the stack frame?

Maybe it is worth a try, but I am not 100% sure the stack frame address can be used to identify a node.

Yes, it is.
I just record the relationship between nodes,
and I actually don't want to get the information from the real address.

By the way, I try to casting the pointer to the unsigned long when serializing.
It seems that we couldn't directly print the address in ubuntuOS.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the address of the node in the nodes, do you mean the address of the stack frame?
Maybe it is worth a try, but I am not 100% sure the stack frame address can be used to identify a node.

Yes, it is. I just record the relationship between nodes, and I actually don't want to get the information from the real address.

By the way, I try to casting the pointer to the unsigned long when serializing. It seems that we couldn't directly print the address in ubuntuOS.

You are right. Here is a bug.
https://github.com/solvcon/modmesh/actions/runs/9338060610/job/25700741584?pr=323
It seems that we couldn't identify a node by stack frame address because of the forbidden of the reinterpret_cast.
I think I should number the node by myself when Breadth-first searching.

@yungyuc yungyuc changed the title Serialize/Deserialize the call profiler Prototype JSON serializer for CallProfiler Jun 2, 2024
@yungyuc
Copy link
Member

yungyuc commented Jun 2, 2024

@ThreeMonth03 I retitled the PR for I did not see deserializer in it. But please correct me if I am wrong.

@ThreeMonth03
Copy link
Collaborator Author

@ThreeMonth03 I retitled the PR for I did not see deserializer in it. But please correct me if I am wrong.

You are right.

@ThreeMonth03
Copy link
Collaborator Author

ThreeMonth03 commented Jun 2, 2024

#323 (review)
#323 (review)

  • Remove duplicated and/or lengthy code or data examples.
  • (Style) blank line between functions.
  • Take care of looper incrementation semantics (++i).
  • Make example comments in serialize_radix_tree_node() and serialize_caller_profile() clearer.
  • Beautify the fixture strings.
  • Add the function-key dictionary into the json format.
  • Add the address unique number of nodes into the json format, which can indentify the relationship among nodes.
  • The reason could be referd here.
  • Avoid overly long lines.
  • Make the serializer testable.

@yungyuc I am ready for the next review.

@yungyuc
Copy link
Member

yungyuc commented Jun 2, 2024

Thanks, @ThreeMonth03 . Could you please resolve CI failures before requesting for a review?

Copy link
Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -212,5 +212,86 @@ TEST_F(CallProfilerTest, cancel)
EXPECT_EQ(radix_tree().get_unique_node(), 0);
}

#ifdef _MSC_VER
std::string serializeExample = R"x({"radix_tree": {"nodes": [{"key": -1,"name": "","data": {"start_time": 0,"caller_name": "","total_time": 0,"call_count": 0,"is_running": 0},"children": [0]},{"key": 0,"name": "void __cdecl modmesh::detail::foo1(void)","data": {"start_time": 18348582416327166,"caller_name": "void __cdecl modmesh::detail::foo1(void)","total_time": 61001042,"call_count": 1,"is_running": 1},"children": [1]},{"key": 1,"name": "void __cdecl modmesh::detail::foo2(void)","data": {"start_time": 18348582416327416,"caller_name": "void __cdecl modmesh::detail::foo2(void)","total_time": 54000667,"call_count": 1,"is_running": 1},"children": [2]},{"key": 2,"name": "void __cdecl modmesh::detail::foo3(void)","data": {"start_time": 18348582451327708,"caller_name": "void __cdecl modmesh::detail::foo3(void)","total_time": 19000167,"call_count": 1,"is_running": 1},"children": []}],"current_node": -1,"unique_id": 3}})x";
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix them after I write the gtest.

Comment on lines 145 to 211
void CallProfilerSerializer::serialize_radix_tree_node(const RadixTreeNode<CallerProfile> & node, bool is_first_node, std::ostream & outstream)
{
// Serialize the RadixTreeNode to the json format.

// Avoid the trailing comma for the first node.
if (!is_first_node)
{
outstream << R"(,)" << '\n';
outstream << R"( {)" << '\n';
}
else
{
outstream << R"({)" << '\n';
}

outstream << R"( "key": )" << node.key() << R"(,)" << '\n';
outstream << R"( "name": ")" << node.name() << R"(",)" << '\n';
CallProfilerSerializer::serialize_caller_profile(node.data(), outstream);
outstream << R"( "address": ")" << reinterpret_cast<std::uintptr_t>(&node) << R"(",)" << '\n';
CallProfilerSerializer::serialize_radix_tree_node_children(node.children(), outstream);

outstream << R"( })";
}

void CallProfilerSerializer::serialize_radix_tree_node_children(const CallProfilerSerializer::child_list_type & children, std::ostream & outstream)
{
// Serialize the children list in RadixTreeNode.
outstream << R"( children": [)";

// If the children list is empty, close the list at the same line.
if (children.empty())
{
outstream << R"(])" << '\n';
}

else
{
outstream << '\n';

bool is_first_child = true;
for (const auto & child : children)
{
// Avoid the trailing comma.
if (!is_first_child)
{
outstream << R"(,)" << '\n';
}
is_first_child = false;
outstream << R"( ")" << reinterpret_cast<std::uintptr_t>(child.get()) << R"(")";
}
outstream << '\n';
outstream << R"( ])" << '\n';
}
}

void CallProfilerSerializer::serialize_caller_profile(const CallerProfile & profile, std::ostream & outstream)
{
// Serialize the CallerProfile to the json format.
outstream << R"( "data": {)" << '\n';
outstream << R"( "start_time": )" << profile.start_time.time_since_epoch().count() << R"(,)" << '\n';
outstream << R"( "caller_name": ")" << profile.caller_name << R"(",)" << '\n';
outstream << R"( "total_time": )" << profile.total_time.count() << R"(,)" << '\n';
outstream << R"( "call_count": )" << profile.call_count << R"(,)" << '\n';
outstream << R"( "is_running": )" << profile.is_running << '\n';
outstream << R"( },)" << '\n';
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I add the indent for the output.
  2. I remove the lengthy comment.
  3. I split more function to improve the readability.

Comment on lines 168 to 175
class CallProfilerPK
{
private:
CallProfilerPK() = default;
friend CallProfilerSerializer;
};

const std::unordered_map<std::string, key_type> & get_id_map(CallProfilerPK const &) const { return m_id_map; }
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to expose the id_map to the serializer, I add the passkey in CallProfiler.

while (!nodes_buffer.empty())
{
const int nodes_buffer_size = nodes_buffer.size();
for (int i = 0; i < nodes_buffer_size; ++i)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the looper incrementation semantics.

Comment on lines 228 to 299
/* Here is the expected format of output:
{
"radix_tree": {
"current_node": "105553178001632",
"nodes": [{
"key": -1,
"name": "",
"data": {
"start_time": 0,
"caller_name": "",
"total_time": 0,
"call_count": 0,
"is_running": 0
},
"address": "105553178001632",
children": [
"105553178002080"
]
},
{
"key": 0,
"name": "void modmesh::detail::foo1()",
"data": {
"start_time": 1969175311810791,
"caller_name": "void modmesh::detail::foo1()",
"total_time": 61010334,
"call_count": 1,
"is_running": 1
},
"address": "105553178002080",
children": [
"105553178001408"
]
},
{
"key": 1,
"name": "void modmesh::detail::foo2()",
"data": {
"start_time": 1969175311811000,
"caller_name": "void modmesh::detail::foo2()",
"total_time": 54010041,
"call_count": 1,
"is_running": 1
},
"address": "105553178001408",
children": [
"105553178034176"
]
},
{
"key": 2,
"name": "void modmesh::detail::foo3()",
"data": {
"start_time": 1969175346815166,
"caller_name": "void modmesh::detail::foo3()",
"total_time": 19005625,
"call_count": 1,
"is_running": 1
},
"address": "105553178034176",
children": []
}
],
"id_map": {
"void modmesh::detail::foo3()": 2,
"void modmesh::detail::foo2()": 1,
"void modmesh::detail::foo1()": 0
},
"unique_id": 3
}
}
*/
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I haven't written the gtest.
As a temporarily replacement, here is the actually output of the serialization.
I've already dealt with the additional indent and newline when serializing the tree.

@ThreeMonth03
Copy link
Collaborator Author

ThreeMonth03 commented Jun 2, 2024

Thanks, @ThreeMonth03 . Could you please resolve CI failures before requesting for a review?

Ok, I will fix the bug about reinterpret_cast the address.

ThreeMonth03 and others added 4 commits June 4, 2024 21:48
There are some modification in this pull request:
 - [x] Replace node address with a unique number in the serialization result.
 - [x] Create a new file, [gtests/test_nopython_callprofiler.cpp](https://github.com/ThreeMonth03/modmesh/compare/debug...ThreeMonth03:modmesh:test?expand=1#diff-f2b5f43898d27aa295793363d34b89b0ef47c3beacb2ed3dd2756797f28f044c), to test the serializer reliably.
 - [x] Rewrite the serializer function, to rearrange the json format and increase the readability of the code.
Copy link
Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Remove duplicated and/or lengthy code or data examples.
  • (Style) blank line between functions.
  • Take care of looper incrementation semantics (++i).
  • Make example comments in serialize_radix_tree_node() and serialize_caller_profile() clearer.
  • Beautify the fixture strings.
  • Add the function-key dictionary into the json format.
  • Add the address unique number of nodes into the json format, which can indentify the relationship among nodes.
  • The reason could be referd here.
  • Avoid overly long lines.
  • Make the serializer testable.

@yungyuc I am ready for the next review.

Comment on lines +390 to +504
"start_time": 0,
"caller_name": "",
"total_time": 0,
"call_count": 0,
"is_running": 0
},
children": [
0,
1,
2
]
},
{
"unique_number": 0,
"key": 0,
"name": "void modmesh::detail::func1()",
"data": {
"start_time": 2074704984391166,
"caller_name": "void modmesh::detail::func1()",
"total_time": 61000834,
"call_count": 1,
"is_running": 1
},
children": [
3
]
},
{
"unique_number": 1,
"key": 1,
"name": "void modmesh::detail::func2()",
"data": {
"start_time": 2074705045392250,
"caller_name": "void modmesh::detail::func2()",
"total_time": 54001000,
"call_count": 1,
"is_running": 1
},
children": [
4
]
},
{
"unique_number": 2,
"key": 2,
"name": "void modmesh::detail::func3()",
"data": {
"start_time": 2074705118393708,
"caller_name": "void modmesh::detail::func3()",
"total_time": 38000208,
"call_count": 2,
"is_running": 1
},
children": []
},
{
"unique_number": 3,
"key": 1,
"name": "void modmesh::detail::func2()",
"data": {
"start_time": 2074704984391458,
"caller_name": "void modmesh::detail::func2()",
"total_time": 54000417,
"call_count": 1,
"is_running": 1
},
children": [
5
]
},
{
"unique_number": 4,
"key": 2,
"name": "void modmesh::detail::func3()",
"data": {
"start_time": 2074705080392791,
"caller_name": "void modmesh::detail::func3()",
"total_time": 19000417,
"call_count": 1,
"is_running": 1
},
children": []
},
{
"unique_number": 5,
"key": 2,
"name": "void modmesh::detail::func3()",
"data": {
"start_time": 2074705019391625,
"caller_name": "void modmesh::detail::func3()",
"total_time": 19000125,
"call_count": 1,
"is_running": 1
},
children": []
}
]
}
}
*/
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether I should show the output in the comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time being let's keep it here and if it's no good we can remove it later.

It's strange that I cannot see some of your comments like this inline. Github PR code review is really bad.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time being let's keep it here and if it's no good we can remove it later.

It's strange that I cannot see some of your comments like this inline. Github PR code review is really bad.

I don't get it.
Do you mean that I should add more comment based on the example in the comment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's OK. You don't need to do anything. I was complaining Github is not showing my review comments at the places I was expecting.

Comment on lines +44 to +337
else
{
// line 12 is the begin of children.
auto lineChildrenBegin = R"( children": [)";
EXPECT_EQ(linesNodeInfo[11], lineChildrenBegin);

// The second last line is the end of children.
auto lineChildrenEnd = R"( ])";
EXPECT_EQ(linesNodeInfo[numOfLinesNodeInfo - 2], lineChildrenEnd);

// Check the trailing comma for the children list.
for (int j = 12; j < numOfLinesNodeInfo - 3; j++)
{
EXPECT_EQ(linesNodeInfo[j].back(), ',');
}
// Expect the last element has no trailing comma.
EXPECT_NE(linesNodeInfo[numOfLinesNodeInfo - 3].back(), ',');

// Check the unique number of children list is correct.
std::unordered_set<int> childrenUniqueNumbers;
for (int j = 12; j < numOfLinesNodeInfo - 2; j++)
{
std::string uniqueNumberString = linesNodeInfo[j];
if (uniqueNumberString.back() == ',')
uniqueNumberString.pop_back();
int uniqueNumber = std::stoi(uniqueNumberString);
childrenUniqueNumbers.insert(uniqueNumber);
}

for (const auto & child : node->children())
{
int uniqueNumber = node_to_unique_number[child.get()];
EXPECT_TRUE(childrenUniqueNumbers.find(uniqueNumber) != childrenUniqueNumbers.end());
childrenUniqueNumbers.erase(uniqueNumber);
}
}
}
}

std::vector<std::vector<std::string>> CallProfilerTest::split_nodes_info(const std::vector<std::string> & lines)
{
int numOfLines = lines.size();
// Split the nodes serialization into a vector of strings.
std::vector<std::vector<std::string>> nodesInfo;
auto lineNodesBegin = R"( "nodes": [{)";
int curNodeInfoBegin = std::find(lines.begin(), lines.end(), lineNodesBegin) - lines.begin();
int curNodeInfoEnd = 0;

while (curNodeInfoEnd < numOfLines - 4)
{
std::vector<std::string> nodeInfo;
curNodeInfoEnd = std::find(lines.begin() + curNodeInfoBegin, lines.end(), R"( },)") - lines.begin();
if (curNodeInfoEnd == numOfLines)
{
curNodeInfoEnd = numOfLines - 4;
// Expect the last node has no trailing comma.
EXPECT_NE(lines[curNodeInfoEnd].back(), ',');
}

// Unify the first line and last line of the node to make the check easier.
nodeInfo.push_back(R"( {)");
for (int i = curNodeInfoBegin + 1; i <= curNodeInfoEnd; ++i)
{
if (i == numOfLines - 4)
nodeInfo.push_back(lines[i] + ",");
else
nodeInfo.push_back(lines[i]);
}
nodesInfo.push_back(nodeInfo);
curNodeInfoBegin = curNodeInfoEnd + 1;
}
return nodesInfo;
}

void CallProfilerTest::BFS_radix_tree(const RadixTreeNode<CallerProfile> * node, node_to_number_map_type & node_to_unique_number, number_to_node_map_type & unique_number_to_node)
{
// BFS the radix tree and number the nodes.
std::queue<const RadixTreeNode<CallerProfile> *> nodes_buffer;
nodes_buffer.push(node);
node_to_unique_number[node] = -1;
unique_number_to_node[-1] = node;
int unique_node_number = -1;

while (!nodes_buffer.empty())
{
const int nodes_buffer_size = nodes_buffer.size();
for (int i = 0; i < nodes_buffer_size; ++i)
{
const RadixTreeNode<CallerProfile> * cur_node = nodes_buffer.front();
nodes_buffer.pop();
for (const auto & child : cur_node->children())
{
nodes_buffer.push(child.get());
// Store the key and value in the two hash maps.
node_to_unique_number[child.get()] = ++unique_node_number;
unique_number_to_node[unique_node_number] = child.get();
}
}
}
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I check the format of the serialization result, including trailing comma ,newline and indent.
  • I BFS the radixtree again, and check the correctness of the information of the nodes, and the relationship among nodes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The testing code here is overly complex. I think they will break very soon as we move forward the development. At that time, don't hesitate to remove the tests and remake them.

I think the effective form of such tests will be in Python after #344 is done. Let's leave the tests here for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike to use names like BFS which is not intuitive from high level. The naming style mixes CAPITAL with snake_case and looks bad too. But let's leave it here for now.

Copy link
Collaborator Author

@ThreeMonth03 ThreeMonth03 Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike to use names like BFS which is not intuitive from high level. The naming style mixes CAPITAL with snake_case and looks bad too. But let's leave it here for now.

I'm curious whether there is a unified naming style/convention for modmesh.
I think I may refer the naming style for the next pull request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is but we have not yet time to document it. It may be good to write it down now. See #350

@@ -24,6 +24,7 @@ add_executable(
test_nopython_inout.cpp
test_nopython_radixtree.cpp
test_nopython_callprofiler.cpp
test_nopython_callprofilerserializer.cpp
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrite the origin gtest to gtests/test_nopython_callprofilerserializer.cpp.

@@ -310,6 +341,23 @@ class CallProfilerProbe
CallProfiler & m_profiler;
}; /* end struct CallProfilerProbe */

class CallProfilerSerializer
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether the comment is enough to help reading.

@@ -160,6 +170,16 @@ class RadixTree
RadixTreeNode<T> * get_current_node() const { return m_current_node; }
key_type get_unique_node() const { return m_unique_id; }

class CallProfilerPK
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good use of passkey

Comment on lines +44 to +337
else
{
// line 12 is the begin of children.
auto lineChildrenBegin = R"( children": [)";
EXPECT_EQ(linesNodeInfo[11], lineChildrenBegin);

// The second last line is the end of children.
auto lineChildrenEnd = R"( ])";
EXPECT_EQ(linesNodeInfo[numOfLinesNodeInfo - 2], lineChildrenEnd);

// Check the trailing comma for the children list.
for (int j = 12; j < numOfLinesNodeInfo - 3; j++)
{
EXPECT_EQ(linesNodeInfo[j].back(), ',');
}
// Expect the last element has no trailing comma.
EXPECT_NE(linesNodeInfo[numOfLinesNodeInfo - 3].back(), ',');

// Check the unique number of children list is correct.
std::unordered_set<int> childrenUniqueNumbers;
for (int j = 12; j < numOfLinesNodeInfo - 2; j++)
{
std::string uniqueNumberString = linesNodeInfo[j];
if (uniqueNumberString.back() == ',')
uniqueNumberString.pop_back();
int uniqueNumber = std::stoi(uniqueNumberString);
childrenUniqueNumbers.insert(uniqueNumber);
}

for (const auto & child : node->children())
{
int uniqueNumber = node_to_unique_number[child.get()];
EXPECT_TRUE(childrenUniqueNumbers.find(uniqueNumber) != childrenUniqueNumbers.end());
childrenUniqueNumbers.erase(uniqueNumber);
}
}
}
}

std::vector<std::vector<std::string>> CallProfilerTest::split_nodes_info(const std::vector<std::string> & lines)
{
int numOfLines = lines.size();
// Split the nodes serialization into a vector of strings.
std::vector<std::vector<std::string>> nodesInfo;
auto lineNodesBegin = R"( "nodes": [{)";
int curNodeInfoBegin = std::find(lines.begin(), lines.end(), lineNodesBegin) - lines.begin();
int curNodeInfoEnd = 0;

while (curNodeInfoEnd < numOfLines - 4)
{
std::vector<std::string> nodeInfo;
curNodeInfoEnd = std::find(lines.begin() + curNodeInfoBegin, lines.end(), R"( },)") - lines.begin();
if (curNodeInfoEnd == numOfLines)
{
curNodeInfoEnd = numOfLines - 4;
// Expect the last node has no trailing comma.
EXPECT_NE(lines[curNodeInfoEnd].back(), ',');
}

// Unify the first line and last line of the node to make the check easier.
nodeInfo.push_back(R"( {)");
for (int i = curNodeInfoBegin + 1; i <= curNodeInfoEnd; ++i)
{
if (i == numOfLines - 4)
nodeInfo.push_back(lines[i] + ",");
else
nodeInfo.push_back(lines[i]);
}
nodesInfo.push_back(nodeInfo);
curNodeInfoBegin = curNodeInfoEnd + 1;
}
return nodesInfo;
}

void CallProfilerTest::BFS_radix_tree(const RadixTreeNode<CallerProfile> * node, node_to_number_map_type & node_to_unique_number, number_to_node_map_type & unique_number_to_node)
{
// BFS the radix tree and number the nodes.
std::queue<const RadixTreeNode<CallerProfile> *> nodes_buffer;
nodes_buffer.push(node);
node_to_unique_number[node] = -1;
unique_number_to_node[-1] = node;
int unique_node_number = -1;

while (!nodes_buffer.empty())
{
const int nodes_buffer_size = nodes_buffer.size();
for (int i = 0; i < nodes_buffer_size; ++i)
{
const RadixTreeNode<CallerProfile> * cur_node = nodes_buffer.front();
nodes_buffer.pop();
for (const auto & child : cur_node->children())
{
nodes_buffer.push(child.get());
// Store the key and value in the two hash maps.
node_to_unique_number[child.get()] = ++unique_node_number;
unique_number_to_node[unique_node_number] = child.get();
}
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The testing code here is overly complex. I think they will break very soon as we move forward the development. At that time, don't hesitate to remove the tests and remake them.

I think the effective form of such tests will be in Python after #344 is done. Let's leave the tests here for now.

outstream << R"( })";
}

void CallProfilerSerializer::serialize_radix_tree_node_children(const CallProfilerSerializer::child_list_type & children, CallProfilerSerializer::node_to_number_map_type & node_to_unique_number, std::ostream & outstream)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is 216 char long. It is too long. But let's leave it here for now. It is hard to set a threshold for how long a line should be, and it is subjective. I do not want to set a number yet.

For a maintainer to quickly parse the argument by naked eyes, (i.e., readability's sake), the code style for this may be like (not clang-formatted):

void CallProfilerSerializer::serialize_radix_tree_node(
    const RadixTreeNode<CallerProfile> & node, bool is_first_node,
    CallProfilerSerializer::node_to_number_map_type & node_to_unique_number,
    std::ostream & outstream)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is 216 char long. It is too long. But let's leave it here for now. It is hard to set a threshold for how long a line should be, and it is subjective. I do not want to set a number yet.

For a maintainer to quickly parse the argument by naked eyes, (i.e., readability's sake), the code style for this may be like (not clang-formatted):

void CallProfilerSerializer::serialize_radix_tree_node(
    const RadixTreeNode<CallerProfile> & node, bool is_first_node,
    CallProfilerSerializer::node_to_number_map_type & node_to_unique_number,
    std::ostream & outstream)

I think the bad naming style results in the excessive long line,
but I have no idea how to name a type(by using or typedef)/varialble/function for a complex hierarchical structure.
Do you have any suggestions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long names are inevitable. It is not practical to find a general way to name things perfectly when the system is complex (like to have 100 classes and 3,000 functions).

The only practical approach I know is to constantly review and refactor.

Comment on lines +390 to +504
"start_time": 0,
"caller_name": "",
"total_time": 0,
"call_count": 0,
"is_running": 0
},
children": [
0,
1,
2
]
},
{
"unique_number": 0,
"key": 0,
"name": "void modmesh::detail::func1()",
"data": {
"start_time": 2074704984391166,
"caller_name": "void modmesh::detail::func1()",
"total_time": 61000834,
"call_count": 1,
"is_running": 1
},
children": [
3
]
},
{
"unique_number": 1,
"key": 1,
"name": "void modmesh::detail::func2()",
"data": {
"start_time": 2074705045392250,
"caller_name": "void modmesh::detail::func2()",
"total_time": 54001000,
"call_count": 1,
"is_running": 1
},
children": [
4
]
},
{
"unique_number": 2,
"key": 2,
"name": "void modmesh::detail::func3()",
"data": {
"start_time": 2074705118393708,
"caller_name": "void modmesh::detail::func3()",
"total_time": 38000208,
"call_count": 2,
"is_running": 1
},
children": []
},
{
"unique_number": 3,
"key": 1,
"name": "void modmesh::detail::func2()",
"data": {
"start_time": 2074704984391458,
"caller_name": "void modmesh::detail::func2()",
"total_time": 54000417,
"call_count": 1,
"is_running": 1
},
children": [
5
]
},
{
"unique_number": 4,
"key": 2,
"name": "void modmesh::detail::func3()",
"data": {
"start_time": 2074705080392791,
"caller_name": "void modmesh::detail::func3()",
"total_time": 19000417,
"call_count": 1,
"is_running": 1
},
children": []
},
{
"unique_number": 5,
"key": 2,
"name": "void modmesh::detail::func3()",
"data": {
"start_time": 2074705019391625,
"caller_name": "void modmesh::detail::func3()",
"total_time": 19000125,
"call_count": 1,
"is_running": 1
},
children": []
}
]
}
}
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the time being let's keep it here and if it's no good we can remove it later.

It's strange that I cannot see some of your comments like this inline. Github PR code review is really bad.

Copy link
Member

@yungyuc yungyuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are still defects in the code but I would like to merge the PR for now so that you can fix the defects in later PRs. The major issues are the formatting and testing as I commented inline.

But the code has basic tests. That's an achievement.

@yungyuc yungyuc merged commit 1f03e8e into solvcon:master Jun 5, 2024
13 checks passed
@yungyuc
Copy link
Member

yungyuc commented Jun 5, 2024

Thanks for the patch, @ThreeMonth03 . It's now merging. Since I squashed it some review comments disappeared from the PR page. Please find them in the View reviewed changes. A useful one is at https://github.com/solvcon/modmesh/pull/323/files/5f1eb6619dd22b1f8f61cab46baedbf1805791b0#r1626933409 .

@yungyuc yungyuc added the enhancement New feature or request label Jun 9, 2024
@ThreeMonth03 ThreeMonth03 deleted the debug branch June 10, 2024 03:43
@yungyuc yungyuc linked an issue Jun 24, 2024 that may be closed by this pull request
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request performance Profiling, runtime, and memory consumption
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Prototype profiler object serialization using JSON
3 participants