[CORE-40] Add cross-billing project spend report API #3107

Merged
merged 37 commits into from
Dec 4, 2024
Changes from 10 commits
Commits
37 commits
c0ef330 add getownerworkspaces (calypsomatic, Oct 31, 2024)
1a396b9 update getOwnerWorkspaces, start to add getBilling (calypsomatic, Nov 1, 2024)
3a4dc82 interim getBillingForWorkspaces (calypsomatic, Nov 1, 2024)
56cf2bc update getbillingforworkspaces to get what we need (calypsomatic, Nov 4, 2024)
cc2c3bd update getbilling and query (calypsomatic, Nov 6, 2024)
2adf61b remove unnecessary changes (calypsomatic, Nov 6, 2024)
775616e update getbillingworkspaces to be one sql query (calypsomatic, Nov 7, 2024)
414207c start putting it all together (calypsomatic, Nov 7, 2024)
c5c736a add new spendreport api (calypsomatic, Nov 7, 2024)
81f843b query roughly working (calypsomatic, Nov 8, 2024)
0895f80 Merge branch 'develop' into core-40-spend (calypsomatic, Nov 14, 2024)
a023bbb update return value of cross billing spend report (calypsomatic, Nov 18, 2024)
0c6915b update how to extract spend report results (calypsomatic, Nov 19, 2024)
9a8694b add test skeleton (calypsomatic, Nov 19, 2024)
554256b get workspaces/bps from sam instead (calypsomatic, Nov 19, 2024)
9085110 Merge branch 'develop' into core-40-spend (calypsomatic, Nov 19, 2024)
8f52a84 some cleanup (calypsomatic, Nov 19, 2024)
b5fa11f Merge branch 'develop' into core-40-spend (calypsomatic, Nov 21, 2024)
9fd6bc0 fix unit tests (calypsomatic, Nov 21, 2024)
9ad4c7e move logic around and expand test (calypsomatic, Nov 21, 2024)
b07bbb1 compare spend report permission to workspace ownership (calypsomatic, Nov 21, 2024)
a0e9897 check own action on workspaces (calypsomatic, Nov 22, 2024)
9da40c0 fix rebase (calypsomatic, Nov 22, 2024)
6a03274 get workspaces by spend report action (calypsomatic, Nov 22, 2024)
2200b62 Merge branch 'develop' into core-40-spend (calypsomatic, Nov 22, 2024)
e1e31a1 add pagination (calypsomatic, Nov 22, 2024)
e4a13db include credits and deal with edge cases (calypsomatic, Nov 25, 2024)
85ad0f2 start to add trace (calypsomatic, Nov 25, 2024)
c6d966f Merge branch 'develop' into core-40-spend (calypsomatic, Nov 25, 2024)
0c4cbe1 scalafmt (calypsomatic, Nov 25, 2024)
7d9f379 pr comments round 1 (calypsomatic, Nov 26, 2024)
20dfd8c pr comments round 2 (calypsomatic, Nov 26, 2024)
cdc0b53 Merge branch 'develop' into core-40-spend (calypsomatic, Nov 27, 2024)
a6b900f remove extraneous todo (calypsomatic, Nov 27, 2024)
e079f56 include public workspaces (calypsomatic, Dec 4, 2024)
780974f hierarchical to flat (calypsomatic, Dec 4, 2024)
9dcfd0f whoops (calypsomatic, Dec 4, 2024)
49 changes: 49 additions & 0 deletions core/src/main/resources/swagger/api-docs.yaml
@@ -450,6 +450,55 @@ paths:
                $ref: '#/components/schemas/ErrorReport'
        500:
          $ref: '#/components/responses/RawlsInternalError'
  /api/billing/v2/spendReport:
    get:
      tags:
        - billing_v2
      summary: get spend report for all workspaces user has owner access to
      description: get spend report for all workspaces user has owner access to
      operationId: getSpendReportAllWorkspaces
      parameters:
        - name: startDate
          in: query
          description: start date of report (YYYY-MM-DD). Data included in report will start at 12 AM UTC on this date.
          required: true
          schema:
            type: string
            format: date
        - name: endDate
          in: query
          description: end date of report (YYYY-MM-DD). Data included in report will end at 11:59 PM UTC on this date.
          required: true
          schema:
            type: string
            format: date
      responses:
        200:
          description: Success
          content:
            'application/json':
              schema:
                $ref: '#/components/schemas/SpendReport'
        400:
          description: invalid spend report parameters
          content:
            'application/json':
              schema:
                $ref: '#/components/schemas/ErrorReport'
        403:
          description: You must be a project owner to view the spend report of a project
          content:
            'application/json':
              schema:
                $ref: '#/components/schemas/ErrorReport'
        404:
          description: The specified billing project could not be found
          content:
            'application/json':
              schema:
                $ref: '#/components/schemas/ErrorReport'
        500:
          $ref: '#/components/responses/RawlsInternalError'
  /api/billing/v2/{projectId}/spendReportConfiguration:
    get:
      tags:
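The startDate and endDate parameters above are calendar days in YYYY-MM-DD form, and their descriptions make both ends of the range inclusive in UTC. Here is a minimal sketch of that interpretation using java.time. It assumes the "ends at 11:59 PM UTC" wording is meant to cover the whole final day, so it uses an exclusive upper bound at the following midnight. The helper name reportWindow and the start-not-after-end check are illustrative assumptions, not the service's validateReportParameters.

```scala
import java.time.{Instant, LocalDate, ZoneOffset}
import java.time.format.DateTimeParseException

object ReportWindow {
  // Hypothetical helper: converts startDate/endDate query values into a UTC window.
  // The upper bound is exclusive (midnight after endDate), so every instant on
  // endDate, including 11:59 PM UTC, is inside the window.
  def reportWindow(startDate: String, endDate: String): Either[String, (Instant, Instant)] =
    try {
      val start = LocalDate.parse(startDate) // ISO-8601 calendar date, YYYY-MM-DD
      val end = LocalDate.parse(endDate)
      if (start.isAfter(end)) Left(s"startDate $startDate is after endDate $endDate")
      else
        Right(
          (
            start.atStartOfDay(ZoneOffset.UTC).toInstant,
            end.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant
          )
        )
    } catch {
      case e: DateTimeParseException => Left(s"invalid date: ${e.getParsedString}")
    }
}
```

With this reading, setting startDate and endDate to the same day yields a single 24-hour window.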
@@ -515,7 +515,8 @@ object Boot extends IOApp with LazyLogging {
billingRepository,
billingProfileManagerDAO,
samDAO,
-spendReportingServiceConfig
+spendReportingServiceConfig,
+workspaceServiceConstructor
)

val billingAdminServiceConstructor: RawlsRequestContext => BillingAdminService =
@@ -350,6 +350,16 @@ trait RawlsBillingProjectComponent {
.result
.map(_.headOption.map(RawlsBillingProjectRecord.toBillingProjectSpendExport))

def getBillingProjectsSpendConfiguration(
billingProjectNames: Seq[RawlsBillingProjectName]
): ReadAction[Seq[Option[BillingProjectSpendExport]]] =
rawlsBillingProjectQuery
.withProjectNames(billingProjectNames)
.result
.map(projectRecords =>
projectRecords.map(record => Some(RawlsBillingProjectRecord.toBillingProjectSpendExport(record)))
)

calypsomatic marked this conversation as resolved.
def insertOperations(operations: Seq[RawlsBillingProjectOperationRecord]): WriteAction[Unit] =
(rawlsBillingProjectOperationQuery ++= operations).map(_ => ())
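getBillingProjectsSpendConfiguration wraps every returned record in Some. getSpendExportConfigurations, later in this diff, removes that wrapper again with collect { case Some(export) => export }. The small sketch below uses a hypothetical SpendExport stand-in for BillingProjectSpendExport. It shows that the collect step is equivalent to flatten, and that both forms would also drop any None a future caller might introduce. The identifier is spendExport rather than export because export is a reserved word in Scala 3.

```scala
object OptionRoundTrip {
  // Hypothetical stand-in for BillingProjectSpendExport.
  final case class SpendExport(billingProjectName: String, spendExportTable: Option[String])

  // Shape of getBillingProjectsSpendConfiguration: every record becomes Some(record).
  def wrapAll(records: Seq[SpendExport]): Seq[Option[SpendExport]] = records.map(Some(_))

  // Shape of getSpendExportConfigurations: keep only the defined values.
  def unwrapWithCollect(options: Seq[Option[SpendExport]]): Seq[SpendExport] =
    options.collect { case Some(spendExport) => spendExport }

  // Equivalent, shorter form.
  def unwrapWithFlatten(options: Seq[Option[SpendExport]]): Seq[SpendExport] = options.flatten
}
```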

@@ -17,15 +17,19 @@ import org.broadinstitute.dsde.rawls.dataaccess.{SamDAO, SlickDataSource}
import org.broadinstitute.dsde.rawls.metrics.{GoogleInstrumented, HitRatioGauge, RawlsInstrumented}
import org.broadinstitute.dsde.rawls.model.{SpendReportingAggregationKeyWithSub, _}
import org.broadinstitute.dsde.rawls.spendreporting.SpendReportingService._
import org.broadinstitute.dsde.rawls.workspace.WorkspaceService
import org.broadinstitute.dsde.workbench.google2.GoogleBigQueryService
import org.broadinstitute.dsde.rawls.{RawlsException, RawlsExceptionWithErrorReport}
import org.broadinstitute.dsde.workbench.model.google.GoogleProject
import org.joda.time.format.ISODateTimeFormat
import org.joda.time.{DateTime, Days}
import spray.json.JsArray
import org.broadinstitute.dsde.rawls.model.WorkspaceJsonSupport.WorkspaceListResponseFormat

import scala.concurrent.{ExecutionContext, Future}
import scala.jdk.CollectionConverters._
import scala.math.BigDecimal.RoundingMode
import org.broadinstitute.dsde.rawls.model.WorkspaceAccessLevels.Owner

object SpendReportingService {
def constructor(
@@ -34,15 +38,18 @@ object SpendReportingService {
billingRepository: BillingRepository,
bpmDao: BillingProfileManagerDAO,
samDAO: SamDAO,
-spendReportingServiceConfig: SpendReportingServiceConfig
+spendReportingServiceConfig: SpendReportingServiceConfig,
+workspaceServiceConstructor: RawlsRequestContext => WorkspaceService
)(ctx: RawlsRequestContext)(implicit executionContext: ExecutionContext): SpendReportingService =
-new SpendReportingService(ctx,
-dataSource,
-bigQueryService,
-billingRepository: BillingRepository,
-bpmDao,
-samDAO,
-spendReportingServiceConfig
+new SpendReportingService(
+ctx,
+dataSource,
+bigQueryService,
+billingRepository: BillingRepository,
+bpmDao,
+samDAO,
+spendReportingServiceConfig,
+workspaceServiceConstructor
)

val SpendReportingMetrics = "spendReporting"
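Boot now passes workspaceServiceConstructor, a RawlsRequestContext => WorkspaceService function, into SpendReportingService.constructor. This lets the spend service build a WorkspaceService bound to the caller's own request context, which is what getOwnerWorkspaces does with workspaceServiceConstructor(ctx). Below is a minimal sketch of that curried-constructor pattern. RequestContext, WorkspaceLister, and SpendService are hypothetical stand-ins, not Rawls classes.

```scala
object ConstructorInjection {
  // Stand-in for RawlsRequestContext: carries the calling user's identity.
  final case class RequestContext(userEmail: String)

  // Stand-in for WorkspaceService: a per-request service bound to one context.
  final class WorkspaceLister(ctx: RequestContext, workspacesByUser: Map[String, Seq[String]]) {
    def listWorkspaces(): Seq[String] = workspacesByUser.getOrElse(ctx.userEmail, Seq.empty)
  }

  // Stand-in for SpendReportingService: it receives a factory rather than an
  // instance, so it can create a WorkspaceLister for whatever context it holds.
  final class SpendService(ctx: RequestContext, workspaceListerConstructor: RequestContext => WorkspaceLister) {
    def ownedWorkspaces(): Seq[String] = workspaceListerConstructor(ctx).listWorkspaces()
  }

  object SpendService {
    // Mirrors the two-parameter-list shape of SpendReportingService.constructor:
    // shared dependencies first, then the per-request context.
    def constructor(workspaceListerConstructor: RequestContext => WorkspaceLister)(ctx: RequestContext): SpendService =
      new SpendService(ctx, workspaceListerConstructor)
  }
}
```

Partially applying constructor with only the shared dependencies gives a RequestContext => SpendService function, the same shape Boot wires up for each service.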
@@ -133,6 +140,65 @@ object SpendReportingService {
SpendReportingResults(aggregations.map(aggregateRows(allRows, _)).toList, summary)
}

def extractCrossBillingProjectSpendReportingResults(
allRows: List[FieldValueList],
start: DateTime,
end: DateTime,
names: Map[GoogleProjectId, WorkspaceName]
): List[SpendReportingResults] = {

val spendDetails = allRows.flatMap { row =>
val projectId = GoogleProjectId(row.get("project_id").getStringValue)
val workspaceName = names.getOrElse(
projectId,
throw RawlsExceptionWithErrorReport(
StatusCodes.InternalServerError,
s"unexpected project $projectId returned by BigQuery"
)
)
val totalCost = row.get("total_cost").getDoubleValue.toString
val computeCost = row.get("compute_cost").getDoubleValue.toString
val storageCost = row.get("storage_cost").getDoubleValue.toString
val currency = row.get("currency").getStringValue

// Each row gets a summary of compute, storage, and total
List(
SpendReportingForDateRange(
totalCost,
"0", // Ignoring credits for now; do we want to include them?
currency,
Option(start),
Option(end),
workspace = Some(workspaceName),
googleProjectId = Some(GoogleProject(projectId.value))
),
SpendReportingForDateRange(
computeCost,
"0",
currency,
Option(start),
Option(end),
workspace = Some(workspaceName),
googleProjectId = Some(GoogleProject(projectId.value)),
category = Some(TerraSpendCategories.Compute)
),
SpendReportingForDateRange(
storageCost,
"0",
currency,
Option(start),
Option(end),
workspace = Some(workspaceName),
googleProjectId = Some(GoogleProject(projectId.value)),
category = Some(TerraSpendCategories.Storage)
)
Contributor

Each of these SpendReportingForDateRange objects includes the same value for getRoundedNumericValue("credits"). Is this triple-counting credits?

Contributor Author

Good point.

Contributor Author

On closer inspection, I think it shows the same credits three times, one for each category, but only counts them once in the total. Still less than ideal.

)
Contributor

You might be able to clean this up a bit, like:

val baseSpend = SpendReportingForDateRange(
        "0",
        "0", // Ignoring credits for now; do we want to include them?
        currency,
        Option(start),
        Option(end),
        workspace = Some(workspaceName),
        googleProjectId = Some(GoogleProject(projectId.value))
      )

      // Each row gets a summary of compute, storage, and total
      List(
        baseSpend.copy(cost = totalCost),
        baseSpend.copy(cost = computeCost, category = Some(TerraSpendCategories.Compute)),
        baseSpend.copy(cost = storageCost, category = Some(TerraSpendCategories.Storage)),
      )

}

// TODO what format should the result be? Do we need a new model?
spendDetails.map(details => SpendReportingResults(List.empty, details))
}

}
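extractCrossBillingProjectSpendReportingResults turns each BigQuery row into three entries: total, compute, and storage. The three entries share every field except cost and category, which is why the reviewer suggests building one base value and copying it. The sketch below is a self-contained version of that approach. SpendEntry and CostRow are simplified, hypothetical stand-ins for SpendReportingForDateRange and the BigQuery FieldValueList, and credits are fixed at "0" as in the diff.

```scala
object SpendFanOut {
  sealed trait Category
  case object Compute extends Category
  case object Storage extends Category

  // Simplified, hypothetical stand-in for SpendReportingForDateRange.
  final case class SpendEntry(
      cost: String,
      credits: String,
      currency: String,
      workspace: String,
      googleProjectId: String,
      category: Option[Category] = None // None marks the per-workspace total
  )

  // Simplified, hypothetical stand-in for one row of the cross-project query.
  final case class CostRow(projectId: String, totalCost: Double, computeCost: Double, storageCost: Double, currency: String)

  def fanOut(rows: List[CostRow], names: Map[String, String]): List[SpendEntry] =
    rows.flatMap { row =>
      // Fail loudly, as the diff does, if BigQuery returns an unexpected project.
      val workspace =
        names.getOrElse(row.projectId, throw new IllegalStateException(s"unexpected project ${row.projectId}"))
      val base = SpendEntry("0", "0", row.currency, workspace, row.projectId)
      List(
        base.copy(cost = row.totalCost.toString),
        base.copy(cost = row.computeCost.toString, category = Some(Compute)),
        base.copy(cost = row.storageCost.toString, category = Some(Storage))
      )
    }
}
```

Because credits live in the shared base value, any nonzero credit would appear in all three entries. That is the display issue raised in the review thread above.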

class SpendReportingService(
@@ -142,7 +208,8 @@ class SpendReportingService(
billingRepository: BillingRepository,
bpmDao: BillingProfileManagerDAO,
samDAO: SamDAO,
-spendReportingServiceConfig: SpendReportingServiceConfig
+spendReportingServiceConfig: SpendReportingServiceConfig,
+workspaceServiceConstructor: RawlsRequestContext => WorkspaceService
)(implicit val executionContext: ExecutionContext)
extends LazyLogging
with RawlsInstrumented {
@@ -198,6 +265,20 @@ class SpendReportingService(
)
}

def getSpendExportConfigurations(projects: Seq[RawlsBillingProjectName]): Future[Seq[BillingProjectSpendExport]] =
dataSource
.inTransaction(_.rawlsBillingProjectQuery.getBillingProjectsSpendConfiguration(projects))
.recover { case _: RawlsException =>
throw RawlsExceptionWithErrorReport(
StatusCodes.BadRequest,
s"billing account not found on billing project" // TODO: identify problem
)
}
.map { exportOptions =>
exportOptions.collect { case Some(export) => export }

}

def getWorkspaceGoogleProjects(projectName: RawlsBillingProjectName): Future[Map[GoogleProjectId, WorkspaceName]] =
dataSource.inTransaction(_.workspaceQuery.listWithBillingProject(projectName)).map {
_.collect {
@@ -222,11 +303,8 @@ class SpendReportingService(
// all of which have optional subAggregationKeys and convert to Set[SpendReportingAggregationKey]
val queryKeys = aggregations.flatMap(a => Set(Option(a.key), a.subAggregationKey).flatten)
val tableName = config.spendExportTable.getOrElse(spendReportingServiceConfig.defaultTableName)
-val timePartitionColumn: String = {
-val isBroadTable = tableName == spendReportingServiceConfig.defaultTableName
-// The Broad table uses a view with a different column name.
-if (isBroadTable) spendReportingServiceConfig.defaultTimePartitionColumn else "_PARTITIONTIME"
-}
+val timePartitionColumn: String = getTimePartitionColumn(tableName)

s"""
| SELECT
| SUM(cost) as cost,
@@ -237,8 +315,70 @@ class SpendReportingService(
| AND $timePartitionColumn BETWEEN @startDate AND @endDate
| AND project.id in UNNEST(@projects)
| GROUP BY currency ${queryKeys.map(_.bigQueryGroupByClause()).mkString}
-|""".stripMargin
-.replace("REPLACE_TIME_PARTITION_COLUMN", timePartitionColumn)
+|""".stripMargin.replace("REPLACE_TIME_PARTITION_COLUMN", timePartitionColumn)
}

private def getTimePartitionColumn(tableName: String): String = {
val isBroadTable = tableName == spendReportingServiceConfig.defaultTableName
// The Broad table uses a view with a different column name.
if (isBroadTable) spendReportingServiceConfig.defaultTimePartitionColumn else "_PARTITIONTIME"
}

def getAllUserWorkspaceQuery(billingProjects: Map[BillingProjectSpendExport, Seq[GoogleProjectId]]): String = {
val baseQuery = s"""
| SELECT
| project.id AS project_id,
| project.name AS project_name,
| currency,
| CASE
| WHEN service.description IN ('Cloud Storage') THEN 'Storage'
| WHEN service.description IN ('Compute Engine', 'Google Kubernetes Engine') THEN 'Compute'
| ELSE 'Other'
| END AS spend_category,
| SUM(CAST(cost AS FLOAT64)) AS category_cost
| FROM
| _BILLING_ACCOUNT_TABLE
| where
| project.id in _PROJECT_ID_LIST AND
| _PARTITIONTIME BETWEEN @startDate AND @endDate
| GROUP BY
| project_id,
| project_name,
| currency,
| spend_category""".stripMargin.trim

val bpSubQuery = billingProjects
.map { bp =>
val tableName = bp._1.spendExportTable.getOrElse(spendReportingServiceConfig.defaultTableName)
val timePartitionColumn: String = getTimePartitionColumn(tableName)
baseQuery
.replace("_PARTITIONTIME", timePartitionColumn)
.replace("_BILLING_ACCOUNT_TABLE", tableName)
.replace("_PROJECT_ID_LIST", "(" + bp._2.map(id => s""""${id.value}"""").mkString(", ") + ")")
}
.mkString("\nUNION ALL\n")

s"""WITH spend_categories AS (
|$bpSubQuery
|)
|SELECT
| project_id,
| project_name,
| SUM(category_cost) AS total_cost,
| SUM(CASE WHEN spend_category = 'Storage' THEN category_cost ELSE 0 END) AS storage_cost,
| SUM(CASE WHEN spend_category = 'Compute' THEN category_cost ELSE 0 END) AS compute_cost,
| SUM(CASE WHEN spend_category = 'Other' THEN category_cost ELSE 0 END) AS other_cost,
| currency
|FROM
| spend_categories
|GROUP BY
| project_id,
| project_name,
| currency
|ORDER BY
| total_cost DESC
|limit 5
|""".stripMargin.trim
}
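getAllUserWorkspaceQuery builds one copy of the per-category subquery for each billing-project spend export table. For each copy it substitutes the partition column, table name, and project list into placeholders, then joins the copies with UNION ALL inside a WITH clause. Below is a simplified sketch of that string assembly. The placeholder tokens follow the diff, but ExportTable, buildQuery, and the trimmed SELECT list are hypothetical.

```scala
object CrossProjectQuery {
  // Hypothetical stand-in: one spend export table and the Google projects billed to it.
  final case class ExportTable(tableName: String, partitionColumn: String, projectIds: Seq[String])

  // Trimmed per-table subquery; the placeholder tokens follow the diff.
  private val subquery =
    """SELECT project.id AS project_id, currency, SUM(CAST(cost AS FLOAT64)) AS category_cost
      |FROM _BILLING_ACCOUNT_TABLE
      |WHERE project.id IN _PROJECT_ID_LIST AND _PARTITIONTIME BETWEEN @startDate AND @endDate
      |GROUP BY project_id, currency""".stripMargin

  def buildQuery(tables: Seq[ExportTable]): String = {
    val unioned = tables
      .map { t =>
        // Double-quoted string literals, e.g. ("p1", "p2"), as the diff produces.
        val idList = t.projectIds.map(id => "\"" + id + "\"").mkString("(", ", ", ")")
        subquery
          .replace("_PARTITIONTIME", t.partitionColumn)
          .replace("_BILLING_ACCOUNT_TABLE", t.tableName)
          .replace("_PROJECT_ID_LIST", idList)
      }
      .mkString("\nUNION ALL\n")
    s"WITH spend_categories AS (\n$unioned\n)\nSELECT * FROM spend_categories"
  }
}
```

This sketch, like the diff, writes project IDs directly into the SQL text. The single-project query instead passes them as the @projects array parameter with UNNEST, which avoids quoting them by hand.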

def setUpQuery(
@@ -270,6 +410,23 @@ class SpendReportingService(
JobInfo.newBuilder(queryConfig).build()
}

def setUpAllUserWorkspaceQuery(
query: String,
start: DateTime,
end: DateTime
): JobInfo = {
def queryParam(value: String): QueryParameterValue =
QueryParameterValue.newBuilder().setType(StandardSQLTypeName.STRING).setValue(value).build()

val queryConfig = QueryJobConfiguration
.newBuilder(query)
.addNamedParameter("startDate", queryParam(toISODateString(start)))
.addNamedParameter("endDate", queryParam(toISODateString(end)))
.build()

JobInfo.newBuilder(queryConfig).build()
}
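Once that job runs, the outer SELECT built by getAllUserWorkspaceQuery pivots the per-category rows from spend_categories. It produces one row per (project_id, project_name, currency), with SUM(CASE WHEN spend_category = ... THEN category_cost ELSE 0 END) columns, ordered by total_cost descending. Below is an in-memory Scala equivalent that can help when reasoning about or unit-testing expected results. CategoryRow and ProjectSpend are hypothetical types, and the limit parameter stands in for the query's trailing limit 5.

```scala
object SpendPivot {
  // One row of the spend_categories CTE (hypothetical stand-in).
  final case class CategoryRow(projectId: String, projectName: String, currency: String, category: String, cost: Double)

  // One row of the outer query's result (hypothetical stand-in).
  final case class ProjectSpend(
      projectId: String,
      projectName: String,
      currency: String,
      totalCost: Double,
      storageCost: Double,
      computeCost: Double,
      otherCost: Double
  )

  def pivot(rows: Seq[CategoryRow], limit: Int): Seq[ProjectSpend] = {
    // Sum only the rows whose category matches, like SUM(CASE WHEN ... ELSE 0 END).
    def sumFor(group: Seq[CategoryRow], category: String): Double =
      group.filter(_.category == category).map(_.cost).sum

    rows
      .groupBy(r => (r.projectId, r.projectName, r.currency)) // GROUP BY project_id, project_name, currency
      .toSeq
      .map { case ((id, name, currency), group) =>
        ProjectSpend(
          id,
          name,
          currency,
          group.map(_.cost).sum, // total_cost
          sumFor(group, "Storage"),
          sumFor(group, "Compute"),
          sumFor(group, "Other")
        )
      }
      .sortBy(-_.totalCost) // ORDER BY total_cost DESC
      .take(limit)
  }
}
```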

def logSpendQueryStats(stats: JobStatistics.QueryStatistics): Unit = {
if (stats.getCacheHit) cacheHitRate().hit() else cacheHitRate().miss()
bytesProcessedCounter += stats.getEstimatedBytesProcessed
@@ -359,4 +516,58 @@ class SpendReportingService(
case ex: Exception =>
Future.failed(RawlsExceptionWithErrorReport(ErrorReport(StatusCodes.InternalServerError, ex)))
}

def getSpendForAllWorkspaces(
start: DateTime,
end: DateTime
): Future[List[SpendReportingResults]] = {
validateReportParameters(start, end)
for {
workspaces <- getOwnerWorkspaces()
projectNames = workspaces
.map(wsResp =>
wsResp.workspace.googleProject -> WorkspaceName(wsResp.workspace.namespace, wsResp.workspace.name)
)
.toMap // Map[GoogleProjectId, WorkspaceName]
calypsomatic marked this conversation as resolved.
billing <- getBillingSpendExportsForWorkspaces(workspaces)
query = getAllUserWorkspaceQuery(billing)
queryJob = setUpAllUserWorkspaceQuery(query, start, end)

job: Job <- bigQueryService.use(_.runJob(queryJob)).unsafeToFuture().map(_.waitFor())
_ = logSpendQueryStats(job.getStatistics[JobStatistics.QueryStatistics])
result = job.getQueryResults()
} yield result.getValues.asScala.toList match {
case Nil =>
throw RawlsExceptionWithErrorReport(
StatusCodes.NotFound,
s"no spend data found between dates ${toISODateString(start)} and ${toISODateString(end)}"
) // TODO update this
case rows => extractCrossBillingProjectSpendReportingResults(rows, start, end, projectNames)
}
}

def getOwnerWorkspaces(): Future[Seq[WorkspaceListResponse]] =
workspaceServiceConstructor(ctx).listWorkspaces(WorkspaceFieldSpecs(), -1) map {
case JsArray(jsArray) =>
val workspaces = jsArray.map(_.convertTo[WorkspaceListResponse])
workspaces.filter(_.accessLevel == Owner)
case _ => throw new IllegalArgumentException("Expected a JsArray")
}
calypsomatic marked this conversation as resolved.

def getBillingSpendExportsForWorkspaces(
workspaces: Seq[WorkspaceListResponse]
): Future[Map[BillingProjectSpendExport, Seq[GoogleProjectId]]] = {
val groupedWorkspaces = workspaces.groupBy(_.workspace.namespace)
val billingProjects = groupedWorkspaces.keys.map(RawlsBillingProjectName).toList
// billingProjects.map(project =>
// requireProjectAction(project, SamBillingProjectActions.readSpendReport)
// ) // TODO a better way to handle this?

getSpendExportConfigurations(billingProjects).map { exportConfigs =>
exportConfigs.map { config =>
config -> groupedWorkspaces(config.billingProjectName.value).map(ws => ws.workspace.googleProject)
}.toMap
}
}
calypsomatic marked this conversation as resolved.

}
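getSpendForAllWorkspaces runs three in-memory steps before BigQuery is called:

1. getOwnerWorkspaces keeps only workspaces whose access level is Owner.
2. A GoogleProjectId -> WorkspaceName map is built so that result rows can be labeled.
3. getBillingSpendExportsForWorkspaces groups the Google projects by workspace namespace (the billing project name) and pairs each group with its spend export configuration.

Below is a sketch of those steps. It uses hypothetical stand-in types (WorkspaceInfo, ExportConfig) and synchronous code in place of the Future-based service calls.

```scala
object WorkspaceGrouping {
  sealed trait AccessLevel
  case object Owner extends AccessLevel
  case object Reader extends AccessLevel

  // Hypothetical stand-ins for WorkspaceListResponse and BillingProjectSpendExport.
  final case class WorkspaceInfo(namespace: String, name: String, googleProject: String, accessLevel: AccessLevel)
  final case class ExportConfig(billingProjectName: String, spendExportTable: Option[String])

  // Like getOwnerWorkspaces: keep only workspaces the caller owns.
  def ownerWorkspaces(all: Seq[WorkspaceInfo]): Seq[WorkspaceInfo] = all.filter(_.accessLevel == Owner)

  // Like projectNames in getSpendForAllWorkspaces: Google project -> "namespace/name".
  def projectNames(workspaces: Seq[WorkspaceInfo]): Map[String, String] =
    workspaces.map(ws => ws.googleProject -> s"${ws.namespace}/${ws.name}").toMap

  // Like getBillingSpendExportsForWorkspaces: one entry per billing project with a config.
  def exportsForWorkspaces(
      workspaces: Seq[WorkspaceInfo],
      lookupConfigs: Seq[String] => Seq[ExportConfig] // stands in for getSpendExportConfigurations
  ): Map[ExportConfig, Seq[String]] = {
    val byNamespace = workspaces.groupBy(_.namespace)
    lookupConfigs(byNamespace.keys.toList).map { config =>
      config -> byNamespace(config.billingProjectName).map(_.googleProject)
    }.toMap
  }
}
```

In this sketch, a namespace with no configuration record drops out of the result, so its workspaces are never queried.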