Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Congestion placement #2384

Closed

Conversation

behnam-rs
Copy link

Description

This change adds a congestion aware placement into vpr. This algorithm takes advantage of the crossing parameter in VPR to make the congestion matrix and adds it into the placement cost function.

Related Issue

Motivation and Context

Since VPR does not support congestion in placement, The final placement sometimes is unroutable. Therefore, this chagne help VPR to make a uncogested placement

How Has This Been Tested?

This new placer have been tested for a bunch of test-benches which were not routable via conventional SA placement algorithm.

Types of changes

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed

@github-actions github-actions bot added the VPR VPR FPGA Placement & Routing Tool label Aug 31, 2023
@vaughnbetz
Copy link
Contributor

Adding @saaramahmoudi and @soheilshahrouz as reviewers as they know this code.
@behnam-rs : you should have an invite to the vtr repositories in your email; if you accept it CI will start running automatically when you push code or make a pull request.

@@ -171,6 +171,11 @@ static const float cross_count[50] = {/* [0..49] */ 1.0, 1.0, 1.0, 1.0828,
2.5610, 2.5864, 2.6117, 2.6371, 2.6625, 2.6887, 2.7148, 2.7410, 2.7671,
2.7933};

double cong_matrix[400][400];
double cong_matrix_new[400][400];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed size arrays are not a good practice. What if the grid size is larger than 400x400? Even if grid size is smaller than 400x400, it unnecessarily increases the memory footprint. When the array is larger than what is actually needed, illegal accesses cannot be spotted by memory debugging tools.

If the scope of these arrays are limited to this file, it is better to define them as static.

vtr::NdMatrix provides the same funtionality and can be dynamically allocated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use vtr::NdMatrix for these congestion data structures. The usage is pretty straight-forward and can be resized using grid.width() and grid.height() when other dynamic arrays are resized.

A high level comment would help in here (as the other static data structures)
you can also show the dimenstion in comments like [0..grid.width()-1][0..grid.height()-1] to clarify usage.

@@ -171,6 +171,11 @@ static const float cross_count[50] = {/* [0..49] */ 1.0, 1.0, 1.0, 1.0828,
2.5610, 2.5864, 2.6117, 2.6371, 2.6625, 2.6887, 2.7148, 2.7410, 2.7671,
2.7933};

double cong_matrix[400][400];
double cong_matrix_new[400][400];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reasons for using double type instead of float? I assume using float type may make the code run faster.

double cong_matrix[400][400];
double cong_matrix_new[400][400];

float congestion_tradeoff = 1.0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the scope is limited to this file, it is better to make it static.

@@ -323,7 +328,7 @@ static bool driven_by_moved_block(const ClusterNetId net,

static float analyze_setup_slack_cost(const PlacerSetupSlacks* setup_slacks);

static e_move_result assess_swap(double delta_c, double t);
static e_move_result assess_swap(double delta_c, double t,double cong_delta_c,double cost,const t_placer_opts& placer_opts);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding space after each "," makes the code more readable.

Copy link
Contributor

@saaramahmoudi saaramahmoudi Sep 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think every new cost term that we add should be included in delta_c and pass it through assess_swap. This function just check that if the cost is reduced we accept the move, if increased we accept it with some probability. I can't see why we need to pass an individual cost for each term and why the function need to know about the place algorithm.

@@ -266,7 +271,7 @@ static void alloc_and_load_for_fast_cost_update(float place_cost_exp);

static void free_fast_cost_update();

static double comp_bb_cost(e_cost_methods method);
static double comp_bb_cost(e_cost_methods method,const t_place_algorithm& place_algorithm);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding space after each "," makes the code more readable.

if (physical_tile_type(bnum) != physical_tile) {
VTR_LOG_ERROR(
"Block %zu type (%s) does not match grid location (%zu,%zu, %d) type (%s).\n",
size_t(bnum), logical_block->name, i, j, layer_num, physical_tile->name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation is not right

|| !is_sub_tile_compatible(physical_tile, logical_block,
loc.sub_tile)) {
VTR_LOG_ERROR(
"Block %zu's location is (%d,%d,%d) but found in grid at (%zu,%zu,%d,%d).\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation

}
if (usage_check != place_ctx.grid_blocks.get_usage(tile_loc)) {
VTR_LOG_ERROR(
"%d block(s) were placed at location (%d,%d,%d), but location contains %d block(s).\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation needs to be corrected in check_block_placement_consistency()

@@ -2938,7 +3137,7 @@ static int check_placement_costs(const t_placer_costs& costs,
double bb_cost_check;
double timing_cost_check;

bb_cost_check = comp_bb_cost(CHECK);
bb_cost_check = comp_bb_cost(CHECK, place_algorithm);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where are congestion costs are checked?

@@ -65,6 +65,7 @@ static GridBlock init_grid_blocks() {
void t_placer_costs::update_norm_factors() {
if (place_algorithm.is_timing_driven()) {
bb_cost_norm = 1 / bb_cost;
cong_cost_norm = 1/ cong_cost;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if place_algorithm is not congestion aware?
Is it possible to divide by zero?

Copy link
Contributor

@saaramahmoudi saaramahmoudi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For c++ formatting, please run "make format" in the $VTR_ROOT directory and push the changes into the branch.

@@ -1928,9 +1934,10 @@ argparse::ArgumentParser create_arg_parser(std::string prog_name, t_options& arg
"Controls which placement algorithm is used. Valid options:\n"
" * bounding_box: Focuses purely on minimizing the bounding box wirelength of the circuit. Turns off timing analysis if specified.\n"
" * criticality_timing: Focuses on minimizing both the wirelength and the connection timing costs (criticality * delay).\n"
" * slack_timing: Focuses on improving the circuit slack values to reduce critical path delay.\n")
" * slack_timing: Focuses on improving the circuit slack values to reduce critical path delay.\n"
" * congestion_aware: Focuses on improving routability.\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to explain how being congestion aware would result in better routability.

place_timing_grp.add_argument(args.CongestionTradeoff, "--congest_tradeoff")
.help(
"Trade-off control the bouding value for the contestion matrix.\n"
" a value near routing channel width can be a good value.\n"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misspelled word : bouding -> bounding
While comment says that a value near routing channel width can be good, the default value set to 1. Is there a specific reason for the default value? If not, I think Soheil is right, just use place_chan_width.

@@ -171,6 +171,11 @@ static const float cross_count[50] = {/* [0..49] */ 1.0, 1.0, 1.0, 1.0828,
2.5610, 2.5864, 2.6117, 2.6371, 2.6625, 2.6887, 2.7148, 2.7410, 2.7671,
2.7933};

double cong_matrix[400][400];
double cong_matrix_new[400][400];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use vtr::NdMatrix for these congestion data structures. The usage is pretty straight-forward and can be resized using grid.width() and grid.height() when other dynamic arrays are resized.

A high level comment would help in here (as the other static data structures)
you can also show the dimenstion in comments like [0..grid.width()-1][0..grid.height()-1] to clarify usage.

double cong_matrix[400][400];
double cong_matrix_new[400][400];

float congestion_tradeoff = 1.0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got the same thing as Soheil. Here is my suggestion:

  1. Remove the current command line option that you added and use place_chan_width instead.
  2. Make this congestion_tradeoff a command line option with value between [0,1] and give the user control to change it if they want instead of a global variable in place.cpp.

@@ -323,7 +328,7 @@ static bool driven_by_moved_block(const ClusterNetId net,

static float analyze_setup_slack_cost(const PlacerSetupSlacks* setup_slacks);

static e_move_result assess_swap(double delta_c, double t);
static e_move_result assess_swap(double delta_c, double t,double cong_delta_c,double cost,const t_placer_opts& placer_opts);
Copy link
Contributor

@saaramahmoudi saaramahmoudi Sep 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think every new cost term that we add should be included in delta_c and pass it through assess_swap. This function just check that if the cost is reduced we accept the move, if increased we accept it with some probability. I can't see why we need to pass an individual cost for each term and why the function need to know about the place algorithm.

proposed_net_cost[net_id] = get_net_cost(net_id,
&ts_bb_coord_new[net_id]);
bb_delta_c += proposed_net_cost[net_id] - net_cost[net_id];
}

// bb_delta_c = get_cong_cost()-before_cost;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dead code in the comment

@@ -1994,10 +2081,13 @@ static double get_total_cost(t_placer_costs* costs, const t_placer_opts& placer_
if (placer_opts.place_algorithm == BOUNDING_BOX_PLACE) {
// in bounding box mode we only care about wirelength
total_cost = costs->bb_cost * costs->bb_cost_norm;
} else if (placer_opts.place_algorithm == CONGESTION_AWARE_PLACE) {
// in timing mode we include both wirelength and timing costs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment needs to be updated

@@ -2077,7 +2167,8 @@ static e_move_result assess_swap(double delta_c, double t) {

float fnum = vtr::frand();
float prob_fac = std::exp(-delta_c / t);
if (prob_fac > fnum) {
float prob_fac_cong = std::exp(-cong_delta_c / t);
if (prob_fac > fnum && (prob_fac_cong>fnum || placer_opts.place_algorithm != CONGESTION_AWARE_PLACE)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to calculate exp(-cong_delta_c/t) as an individual term in here. The delta_c already has that term

for(int i = 0; i < int(device_ctx.grid.width()); i++){
for(int j = 0; j < int(device_ctx.grid.height()); j++){
cong_matrix[i][j] = 0.0;
// cong_matrix_new[i][j] = 0.0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dead code in the comment

}
}

// cost = get_cong_cost();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dead code in the comment

@saaramahmoudi
Copy link
Contributor

saaramahmoudi commented Sep 1, 2023

@behnam-rs Is there any specific reason that you want to merge this into openfpga branch instead of the master? From what I saw in the code, there is nothing specific to TILEABLE feature and only the place engine changed.

@soheilshahrouz soheilshahrouz mentioned this pull request Aug 5, 2024
7 tasks
@behzadmehmood-rs
Copy link
Contributor

PR (#2672) supersedes this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
VPR VPR FPGA Placement & Routing Tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants