Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE-40] Add cross-billing project spend report API #3107

Merged
merged 37 commits into from
Dec 4, 2024

Conversation

calypsomatic
Copy link
Contributor

@calypsomatic calypsomatic commented Oct 31, 2024

https://broadworkbench.atlassian.net/browse/CORE-40

Adds a new spend reporting API that gets a spend report for each workspace the user has owner access to. Returns a SpendReportingResult, with each workspace broken down by category, and a summary for the total of all workspaces. Includes offset and pageSize parameters for paginating.
In the case of workspaces with non-UUID resourceIds or billing projects without billing accounts, they are simply dismissed without error and not included in the report.


PR checklist

  • Include the JIRA issue number in the PR description and title
  • Make sure Swagger is updated if API changes
    • ...and Orchestration's Swagger too!
  • If you changed anything in model/, then you should publish a new official rawls-model and update rawls-model in Orchestration's dependencies.
  • Get two thumbsworth of PR review
  • Verify all tests go green, including CI tests
  • Squash commits and merge to develop (branches are automatically deleted after merging)
  • Inform other teams of any substantial changes via Slack and/or email

Copy link
Contributor

@davidangb davidangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initial feedback!


pathPrefix(Segment) { projectId =>
pathEnd {
pathPrefix("spendReport") {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file must've been hit by scalafmt for some reason; adding in this new spendReport route is all I did

@@ -417,6 +405,19 @@ trait ApiServiceSpec
samDAO
)

val spendReportingBigQueryService = bigQueryServiceFactory.getServiceFromJson("json", GoogleProject("test-project"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just moved after the creation of workspaceServiceConstructor so I can pass it in as a parameter

Comment on lines 165 to 195
List(
SpendReportingForDateRange(
totalCost,
"0", // Ignoring credits for now; do we want to include them?
currency,
Option(start),
Option(end),
workspace = Some(workspaceName),
googleProjectId = Some(GoogleProject(projectId.value))
),
SpendReportingForDateRange(
computeCost,
"0",
currency,
Option(start),
Option(end),
workspace = Some(workspaceName),
googleProjectId = Some(GoogleProject(projectId.value)),
category = Some(TerraSpendCategories.Compute)
),
SpendReportingForDateRange(
storageCost,
"0",
currency,
Option(start),
Option(end),
workspace = Some(workspaceName),
googleProjectId = Some(GoogleProject(projectId.value)),
category = Some(TerraSpendCategories.Storage)
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might be able to clean this up a bit, like:

val baseSpend = SpendReportingForDateRange(
        "0",
        "0", // Ignoring credits for now; do we want to include them?
        currency,
        Option(start),
        Option(end),
        workspace = Some(workspaceName),
        googleProjectId = Some(GoogleProject(projectId.value))
      )

      // Each row gets a summary of compute, storage, and total
      List(
        baseSpend.copy(cost = totalCost),
        baseSpend.copy(cost = computeCost, category = Some(TerraSpendCategories.Compute)),
        baseSpend.copy(cost = storageCost, category = Some(TerraSpendCategories.Storage)),
      )

action: SamResourceAction,
ctx: RawlsRequestContext
): Future[Seq[SamUserResource]] =
resourceTypeName match {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The content of this mock method is mostly copied from listUserResources, with just the action included. Not sure if we need anything better

@calypsomatic calypsomatic changed the title add getownerworkspaces [CORE-40] Add cross-billing project spend report API Nov 19, 2024
val callback = new SamApiCallback[ListResourcesV2200Response]("listResourcesV2")

resourcesApi(ctx).listResourcesV2Async(
/* format = */ "hierarchical",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/* format = */ "hierarchical",
/* format = */ "flat",

hierarchical will not return actions that are included in roles

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, with hierarchical I get a policies object like so:

     "policies": [
        {
          "actions": [],
          "inherited": false,
          "isPublic": false,
          "policy": "owner",
          "roles": [
            {
              "actions": [
                "read_spend_report"
              ],
              "role": "owner"
            }
          ]
        }
      ]

but with flat it's just:

      "policies": [
        {
          "inherited": false,
          "isPublic": false,
          "policy": "owner"
        }
      ]

Looks like I want hierarchical?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, this does not seem right, I will look at what sam is doing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a full resource in the response for flat looks like

{
      "actions": [
        "read_spend_report"
      ],
      "authDomainGroups": [],
      "missingAuthDomainGroups": [],
      "policies": [
        {
          "inherited": false,
          "isPublic": false,
          "policy": "owner"
        }
      ],
      "resourceId": "eaacdc66-9ac5-4dd9-aeae-1a2c98f1eda8",
      "resourceType": "spend-profile",
      "roles": [
        "owner"
      ]
    }

The actions are not included in the policies. The is probably what it means to be flat.

The full resource for a hierarchical response looks like

{
      "authDomainGroups": [],
      "missingAuthDomainGroups": [],
      "policies": [
        {
          "actions": [],
          "inherited": false,
          "isPublic": false,
          "policy": "owner",
          "roles": [
            {
              "actions": [
                "read_spend_report"
              ],
              "role": "owner"
            }
          ]
        }
      ],
      "resourceId": "7900ff92-a1f0-4c3e-b051-e549e2211e32",
      "resourceType": "spend-profile"
    }

The actions are both at policy and role level. Also note there could be multiple policies that contain the action. Flat is really what you want because it hides all the details about how user has the action on the resource.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this further, flat vs hierarchical does not matter because nothing is used of the response other than the resource id. But hierarchical is much harder to use correctly. So I would either have this listResourcesWithActions function return only a list of resource ids or switch to flat

@@ -11,7 +11,10 @@ import org.broadinstitute.dsde.rawls.coordination.{DataSourceAccess, Uncoordinat
import org.broadinstitute.dsde.rawls.dataaccess._
import org.broadinstitute.dsde.rawls.dataaccess.slick.{TestDriverComponent, WorkflowRecord}
import org.broadinstitute.dsde.rawls.expressions.{BoundOutputExpression, OutputExpression}
import org.broadinstitute.dsde.rawls.jobexec.SubmissionMonitorActor.{ExecutionServiceStatusResponse, StatusCheckComplete}
import org.broadinstitute.dsde.rawls.jobexec.SubmissionMonitorActor.{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't make any changes to this file, it just somehow got mangled by scalafmt

@calypsomatic calypsomatic marked this pull request as ready for review November 25, 2024 20:47
@calypsomatic calypsomatic requested a review from a team as a code owner November 25, 2024 20:47
@calypsomatic calypsomatic requested review from dvoet, kevinmarete and davidangb and removed request for a team November 25, 2024 20:47
Copy link
Contributor

@davidangb davidangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thoughts inline!

getRoundedNumericValue("credits").toString,
currencyCode.toString,
category = Option(TerraSpendCategories.Compute)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each of these SpendReportingForDateRange objects include the same value for getRoundedNumericValue("credits"). Is this triple-counting credits?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On closer inspection, I think it shows the same credits three times, one for each category, but only counts it once in the total. Still less than ideal

val summary = SpendReportingForDateRange(
total.toString,
total_credits.toString,
currency.toString, // TODO: what to do about combined summary for currencies?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same question! Maybe for this version of the report we need to throw an error if we find multiple currencies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it shouldn't throw an error - for a single billing project it should be a single currency, but since this covers multiple billing projects it seems plausible that there might be different currencies. So I think ideally we would either not have a summary, or have a summary per currency? Not really sure what the best thing to do here is but I don't think it should be error throwing. We could potentially do that for a first pass and deal with it later on?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support throwing an error for now and in the future, we can choose provided multiple aggregated reports for each currency [{UsdAggregate}, {EuroAggregate},...etc}]

| project.id AS project_id,
| project.name AS project_name,
| currency,
| SUM(IFNULL((SELECT SUM(c.amount) FROM UNNEST(credits) c), 0)) as credits,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do credits also have a description like "Cloud Storage" or "Compute Engine" we could use to categorize them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know and I don't have the permissions to figure it out

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do see a type column defined for the credits record. If this CASE … WHEN doesn't work out, we could look at the type column. I think we'll really just need to test this empirically and see what results it gives.

Comment on lines 570 to 575
query = getAllUserWorkspaceQuery(billingMap, pageSize, offset)
queryJob = setUpAllUserWorkspaceQuery(query, start, end)

job: Job <- bigQueryService.use(_.runJob(queryJob)).unsafeToFuture().map(_.waitFor())
_ = logSpendQueryStats(job.getStatistics[JobStatistics.QueryStatistics])
result = job.getQueryResults()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be a great place to add tracing: let's get timing and visibility into how the BigQuery query is performing. Might be easiest to pull this logic out into a separate, traced, method

complete {
spendReportingConstructor(ctx).getSpendForAllWorkspaces(
startDate,
endDate.plusDays(1).minusMillis(1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fancy!

Copy link
Contributor

@kevinmarete kevinmarete left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a Scala expert, but I have added a few optional comments for consideration. I tested the API locally, and everything seems to work well. Nice job!

| SUM(CASE WHEN spend_category = 'Compute' THEN category_cost ELSE 0 END) AS compute_cost,
| SUM(CASE WHEN spend_category = 'Other' THEN category_cost ELSE 0 END) AS other_cost,
| currency,
| SUM(credits) as credits
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess to David's suggestion above we can split the credits by category similar to the category_costs

Suggested change
| SUM(credits) as credits
| SUM(CASE WHEN spend_category = 'Storage' THEN credits ELSE 0 END) AS storage_credits,
| SUM(CASE WHEN spend_category = 'Compute' THEN credits ELSE 0 END) AS compute_credits,
| SUM(CASE WHEN spend_category = 'Other' THEN credits ELSE 0 END) AS other_credits,

val summary = SpendReportingForDateRange(
total.toString,
total_credits.toString,
currency.toString, // TODO: what to do about combined summary for currencies?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support throwing an error for now and in the future, we can choose provided multiple aggregated reports for each currency [{UsdAggregate}, {EuroAggregate},...etc}]

Copy link
Contributor

@davidangb davidangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one last question about public workspaces and a note about accuracy of credits.

Since I'll be out, I'm giving this a proactive thumb. Since it's an API, we can release it and iterate on it before actually releasing user-facing features.

/* policies = */ util.List.of(),
/* roles = */ util.List.of(),
/* actions = */ util.List.of(action.value),
/* includePublic = */ false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will the includePublic = false prevent seeing spend reports for public workspaces, even if the calling user is an owner? As in, will our support/comms team be unable to see spend for our public workspaces?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I'll change it to true.

| project.id AS project_id,
| project.name AS project_name,
| currency,
| SUM(IFNULL((SELECT SUM(c.amount) FROM UNNEST(credits) c), 0)) as credits,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do see a type column defined for the credits record. If this CASE … WHEN doesn't work out, we could look at the type column. I think we'll really just need to test this empirically and see what results it gives.

@calypsomatic calypsomatic merged commit 6fafd0c into develop Dec 4, 2024
29 checks passed
@calypsomatic calypsomatic deleted the core-40-spend branch December 4, 2024 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants