ReqMgr2 MicroService Pileup
This wiki provides a description of the MSPileup service, as well as the final pileup data structure adopted and a plain English description of the logic to be implemented.
The architecture of the MSPileup service follows the MVC (Model, View, Controller) pattern with MongoDB as the back-end database for storage. The service is based on the WMCore REST framework and is located in the WMCore/MicroService/MSPileup area. It consists of the following modules:
- MSPileupObj.py provides details of the MSPileup data object
- MSPileupData.py provides data layer business logic, i.e. it creates, deletes, modifies MS pileup objects in MongoDB database
- MSPileup.py is the API layer which defines all MSPileup APIs
- Service/Data.py module defines all HTTP APIs (MSPileup and others used by MS micro-services)
- RestApiHub.py defines MSPileup HTTP end-point within WMCore REST server.
Below, you can find a detailed description of the individual layers.
MSPileup provides a rich set of RESTful APIs that allow clients to fetch, upload, modify and delete pileup documents:
echo "HTTP GET API calls"
curl -v "http://localhost:8249/ms-pileup/data/access?pileupName=pileup"
curl -v "http://localhost:8249/ms-pileup/data/access?campaign=Campaign"
# and we can use filters to extract certain fields from the doc, e.g.
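# (single quotes protect the inner double quotes from the shell; -g stops curl from treating [] as a glob)
curl -v -g 'http://localhost:8249/ms-pileup/data/access?pileupName=pileup&filters=["campaigns","pileupType"]'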
echo "HTTP POST API calls"
# first create document
curl -v -X POST -H "Content-type: application/json" -d '{"bla":1}' \
http://localhost:8249/ms-pileup/data/access
# second, query the data via provided JSON query (spec)
curl -v -X POST -H "Content-type: application/json" \
-d '{"query":{"pileupName":"bla"}}' \
http://localhost:8249/ms-pileup/data/access
# third, use query and filters
curl -v -X POST -H "Content-type: application/json" \
-d '{"query":{"pileupName": "bla"}, "filters":["campaigns"]}' \
http://localhost:8249/ms-pileup/data/access
echo "HTTP PUT API call"
curl -v -X PUT -H "Content-type: application/json" -d '{"bla":1}' \
http://localhost:8249/ms-pileup/data/access
echo "HTTP DELETE API calls"
curl -v -X DELETE -H "Content-type: application/json" -d '{"bla":1}' \
http://localhost:8249/ms-pileup/data/access
curl -v -X DELETE -H "Content-type: application/json" -d '{"pileupName":"pileup"}' \
http://localhost:8249/ms-pileup/data/access
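The same POST query can also be prepared from Python. The following is a minimal sketch using only the standard library; the helper name `build_query_request` is hypothetical and not part of WMCore, and the request is only built here, not sent, so it does not need a running server.

```python
import json
import urllib.request

# Endpoint taken from the curl examples above
BASE_URL = "http://localhost:8249/ms-pileup/data/access"


def build_query_request(pileup_name, filters=None):
    """Build (but do not send) a POST request carrying a JSON query spec.

    Hypothetical helper for illustration only.
    """
    spec = {"query": {"pileupName": pileup_name}}
    if filters:
        spec["filters"] = list(filters)
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_query_request("bla", filters=["campaigns"])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it against a running service:
#   urllib.request.urlopen(req).read()
```

Building the payload with `json.dumps` avoids the shell-quoting pitfalls of hand-written JSON in curl commands.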
The MSPileup data layer defines the following MSPileup data-structure:
{"pileupName": string,
"pileupType": string,
"insertTime": integer,
"lastUpdateTime": integer,
"expectedRSEs": list of strings,
"currentRSEs": list of strings,
"fullReplicas": integer,
"campaigns": list of strings,
"containerFraction": float,
"replicationGrouping": string,
"activatedOn": integer,
"deactivatedOn": integer,
"pileupSize": integer,
"rulesList": list of strings,
"active": boolean
}
where:
- `pileupName`: pileup dataset
- `pileupType`: either "premix" or "classic", identifying which pileup data it corresponds to.
- `insertTime`: the timestamp (seconds since epoch in GMT) of when this pileup id was first defined in the microservice (by a user)
- `lastUpdateTime`: the timestamp (seconds since epoch in GMT) of the last modification made to this pileup structure (by a user)
- `expectedRSEs`: to be provided by someone (P&R and/or DM), e.g. ["Disk1", "Disk2"]. This list of RSEs needs to be properly validated against the known RSE names, ensuring that each entry corresponds to a Disk RSE.
- `currentRSEs`: to be filled up by the micro-service itself, e.g. ["Disk3", "Disk4"], as a result of continuous data location. Once there is a lock for the pileup id in question, the RSE name should be written here.
- `fullReplicas`: will eventually supersede `expectedRSEs`. To be used whenever the micro-service is mature enough and automatic data placement decisions can be performed.
- `active`: whether the pileup id is meant to be used by any campaign/workflow. `False` means it's no longer used and that replicas should be removed from Disk.
- `campaigns`: a list of campaigns
- `containerFraction`: a real number that corresponds to the fraction of blocks to be replicated in a given RSE (default value is 1.0, full container at the same RSE - only supported case at this very moment)
- `replicationGrouping`: matching the grouping granularity field provided by Rucio. Allowed values would be DATASET (data placement by block) or ALL (whole container is placed under the same RSE)
- `activatedOn`: seconds since epoch in GMT, the date when this pileup last became active
- `deactivatedOn`: seconds since epoch in GMT, the date when this pileup last became inactive
- `pileupSize`: to be updated by the microservice itself with the current size of the pileup in bytes
- `rulesList`: list of strings (rules) used to lock the pileup id.
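The field types and allowed values described above can be checked with a small validator. This is a minimal sketch, not the actual MSPileupObj.py logic: the names `SCHEMA`, `ALLOWED` and `validate` are hypothetical, and it does not validate RSE names against Rucio.

```python
# Expected Python type per field, taken from the data-structure listing above.
SCHEMA = {
    "pileupName": str, "pileupType": str,
    "insertTime": int, "lastUpdateTime": int,
    "expectedRSEs": list, "currentRSEs": list,
    "fullReplicas": int, "campaigns": list,
    "containerFraction": float, "replicationGrouping": str,
    "activatedOn": int, "deactivatedOn": int,
    "pileupSize": int, "rulesList": list, "active": bool,
}
# Enumerated values named in the field descriptions
ALLOWED = {
    "pileupType": {"premix", "classic"},
    "replicationGrouping": {"DATASET", "ALL"},
}


def validate(doc):
    """Return a list of problems found in doc; an empty list means it looks fine."""
    errors = []
    for key, expected in SCHEMA.items():
        if key not in doc:
            errors.append(f"missing field: {key}")
            continue
        value = doc[key]
        # bool is a subclass of int in Python, so reject it for non-boolean fields
        bool_mismatch = isinstance(value, bool) and expected is not bool
        if bool_mismatch or not isinstance(value, expected):
            errors.append(f"{key}: expected {expected.__name__}, got {type(value).__name__}")
        elif key in ALLOWED and value not in ALLOWED[key]:
            errors.append(f"{key}: value {value!r} is not allowed")
    return errors


good = {
    "pileupName": "/a/b/PREMIX", "pileupType": "premix",
    "insertTime": 1700000000, "lastUpdateTime": 1700000000,
    "expectedRSEs": ["Disk1"], "currentRSEs": [],
    "fullReplicas": 1, "campaigns": ["Campaign"],
    "containerFraction": 1.0, "replicationGrouping": "ALL",
    "activatedOn": 1700000000, "deactivatedOn": 0,
    "pileupSize": 0, "rulesList": [], "active": True,
}
bad = dict(good, pileupType="other", pileupSize="big")
print(validate(good))
print(validate(bad))
```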
These pileup objects are supposed to be stored in a central database (MongoDB?) in a single document, with a list data structure like:
[<pileup object 1>, <pileup object 2>, ..., <pileup object n>]
This section describes some architecture design choices, assumptions for the service and important information that needs to be recorded in the logs.
Some assumptions to be considered for this microservice are:
- rules will be created against a single RSE (for a single DID)
- service is supposed to be a singleton (depending on the cost-benefit, supporting multiple instances would be a bonus)
- polling cycle will very likely be no shorter than one hour
- no need to adopt database transactions (likely worst case would be to perform the same action from multiple instances, overwriting documents in MongoDB)
There are 3 main tasks to be executed by this service, all part of the same MSPileup microservice and running in the same service instance. These tasks should be executed sequentially, but their internal logic can apply concurrent processing:
- Monitoring task (listing status of rule ids and persisting it in the database)
- Inactive pileup task (clean up of rule ids that belong to inactive pileup objects)
- Active pileup task (rule creation or deletion for active pileup objects)
Log records should be created whenever appropriate, including a short summary by the end of each polling cycle. Nonetheless, here is a non-exhaustive list of critical logs to have:
- start and end of each of the major activities (monitoring, active, inactive); and time spent.
- rule creation (containing the DID, RSE and rule id)
- rule deletion (containing the DID and rule id)
- rule completion (either fully satisfied or partial)
- if a rule is not completed (state != OK), then it would be useful to print its state as well
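The sequential execution of the three tasks, together with the start/end and time-spent log records, could look roughly like the sketch below. The function `run_cycle` and the placeholder task bodies are hypothetical; the real tasks would talk to MongoDB and Rucio.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("mspileup-sketch")


def run_cycle(tasks):
    """Run the tasks one after another, logging start, end and elapsed time."""
    summary = {}
    for name, func in tasks:
        logger.info("starting %s task", name)
        start = time.time()
        func()
        summary[name] = time.time() - start
        logger.info("finished %s task in %.3f s", name, summary[name])
    # short summary at the end of the polling cycle
    logger.info("cycle summary: %s", summary)
    return summary


# Placeholder task bodies that only record the order in which they ran
order = []
tasks = [
    ("monitoring", lambda: order.append("monitoring")),
    ("inactive", lambda: order.append("inactive")),
    ("active", lambda: order.append("active")),
]
summary = run_cycle(tasks)
```

Each task is free to use concurrency internally; only the outer loop is sequential in this sketch.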
The monitoring task is supposed to iterate over all the MongoDB documents, fetch the current state of each rule ID and persist this information back in MongoDB. A short algorithm for it can be described as follows:
- Read pileup document from MongoDB with filter `active=true`
- For each rule id in `rulesList`:
  - query Rucio for that rule id (e.g.: `afd122143kjmdskj`) and fetch its state
  - if state=OK, log that the rule has been satisfied and add that RSE to the `currentRSEs` (unique)
  - otherwise, calculate the rule completion based on the 3 `locks_*` fields and remove that RSE from the `currentRSEs` (if existent)
- now that all the known rules have been inspected, persist the up-to-date pileup doc in MongoDB
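The monitoring steps above can be sketched as follows. This is an illustration under assumptions, not the service code: `monitor_pileup` and `get_rule` are hypothetical names, and the rule dictionaries are assumed to carry `state`, `rse_expression` and the counters `locks_ok_cnt`, `locks_replicating_cnt` and `locks_stuck_cnt`. A plain dictionary stands in for Rucio.

```python
def monitor_pileup(doc, get_rule):
    """Refresh currentRSEs of one pileup doc from the state of its rules.

    get_rule is a hypothetical callable mapping a rule id to a rule dict.
    """
    current = set(doc["currentRSEs"])
    for rule_id in doc["rulesList"]:
        rule = get_rule(rule_id)
        rse = rule["rse_expression"]
        if rule["state"] == "OK":
            print(f"rule {rule_id} on {rse} is satisfied")
            current.add(rse)  # the set keeps entries unique
        else:
            # completion derived from the three locks_* counters (assumed names)
            total = (rule["locks_ok_cnt"] + rule["locks_replicating_cnt"]
                     + rule["locks_stuck_cnt"])
            done = 100.0 * rule["locks_ok_cnt"] / total if total else 0.0
            print(f"rule {rule_id} on {rse}: state={rule['state']}, {done:.1f}% complete")
            current.discard(rse)  # no error if the RSE was not listed
    doc["currentRSEs"] = sorted(current)
    return doc  # the caller would persist this back in MongoDB


# Fake rule store standing in for Rucio
rules = {
    "rule-a": {"state": "OK", "rse_expression": "Disk1",
               "locks_ok_cnt": 4, "locks_replicating_cnt": 0, "locks_stuck_cnt": 0},
    "rule-b": {"state": "REPLICATING", "rse_expression": "Disk2",
               "locks_ok_cnt": 1, "locks_replicating_cnt": 3, "locks_stuck_cnt": 0},
}
doc = {"pileupName": "/a/b/PREMIX", "rulesList": ["rule-a", "rule-b"],
       "currentRSEs": ["Disk2"]}
updated = monitor_pileup(doc, rules.__getitem__)
print(updated["currentRSEs"])
```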
The inactive pileup task is supposed to look at pileup documents that have been set to inactive. The main goal here is to ensure that there are no Rucio rules left in the system (of course, for the relevant DID and the Rucio account adopted by our microservice). Pileup documents that are updated as a result of this logic should have their data persisted back in MongoDB. A short algorithm for it can be described as follows:
- Read pileup document from MongoDB with filter `active=false`
- for each DID and Rucio account, get a list of all the existent rules
  - make a Rucio call to delete that rule id, then:
    - remove the rule id from `rulesList` (if any) and remove the RSE name from `currentRSEs` (if any)
  - make a log record if the DID + Rucio account tuple does not have any existent rules
    - and set `rulesList` and `currentRSEs` to an empty list
- once all the relevant rules have been removed, persist an up-to-date version of the pileup data structure in MongoDB
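A minimal sketch of the inactive-pileup clean up is shown below. The callables `list_rules` and `delete_rule` are hypothetical stand-ins for the Rucio wrapper, and `list_rules` is assumed to return only the rules owned by the microservice account, each with an `id` and an `rse_expression`.

```python
def clean_inactive(doc, list_rules, delete_rule):
    """Delete every rule of an inactive pileup and update the doc accordingly.

    list_rules(did) -> list of rule dicts; delete_rule(rule_id) -> None.
    Both are hypothetical callables used for illustration.
    """
    rules = list_rules(doc["pileupName"])
    if not rules:
        print(f"no rules left for {doc['pileupName']}")
        doc["rulesList"], doc["currentRSEs"] = [], []
        return doc
    for rule in rules:
        delete_rule(rule["id"])
        print(f"deleted rule {rule['id']} for {doc['pileupName']}")
        if rule["id"] in doc["rulesList"]:
            doc["rulesList"].remove(rule["id"])
        if rule["rse_expression"] in doc["currentRSEs"]:
            doc["currentRSEs"].remove(rule["rse_expression"])
    return doc  # the caller would persist this back in MongoDB


deleted = []
existing = [
    {"id": "r1", "rse_expression": "Disk1"},
    {"id": "r2", "rse_expression": "Disk2"},
]
doc = {"pileupName": "/a/b/PREMIX", "rulesList": ["r1", "r2"],
       "currentRSEs": ["Disk1"]}
cleaned = clean_inactive(doc, lambda did: existing, deleted.append)
print(cleaned, deleted)
```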
The active pileup task is supposed to look at pileup documents that are active in the system. Its main goal is to ensure that the pileup DID has all the requested rules (and nothing beyond them), according to the pileup object configuration. Pileup documents that are updated as a result of this logic should have their data persisted back in MongoDB.
There are two possible candidates for this implementation, as listed below.
CANDIDATE 1:
- Read pileup document from MongoDB with filter `active=true`
- if `expectedRSEs` is different than `currentRSEs`, then further data placement is required (it's possible that data removal is required!)
- for each `expectedRSEs` not in `currentRSEs`:
  - check if there is an ongoing Rucio rule for that DID + Rucio account + RSE (we might want to remove RSE from this call?)
  - if there is a Rucio rule, then ensure that it's listed under `rulesList`, otherwise add it
  - if there is none, then a new rule needs to be created. First, check whether the RSE has enough space available for that
    - if it does, create the rule and append the rule id to `rulesList`
    - otherwise, make a log record saying that there is not enough space
- once all the relevant rules have been created, persist an up-to-date version of the pileup data structure in MongoDB
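Candidate 1 can be sketched as follows. All three callables (`find_rule`, `has_space`, `create_rule`) are hypothetical placeholders for Rucio calls, and the space check simply compares free bytes against `pileupSize`, which is an assumption of this sketch.

```python
def place_candidate1(doc, find_rule, has_space, create_rule):
    """Ensure every expected RSE that is not yet current has a rule.

    find_rule(did, rse) -> rule id or None; has_space(rse, size) -> bool;
    create_rule(did, rse) -> new rule id. All hypothetical.
    """
    if set(doc["expectedRSEs"]) == set(doc["currentRSEs"]):
        return doc  # nothing to place
    for rse in doc["expectedRSEs"]:
        if rse in doc["currentRSEs"]:
            continue
        rule_id = find_rule(doc["pileupName"], rse)
        if rule_id is None:
            if not has_space(rse, doc["pileupSize"]):
                print(f"not enough space at {rse} for {doc['pileupName']}")
                continue
            rule_id = create_rule(doc["pileupName"], rse)
            print(f"created rule {rule_id} for {doc['pileupName']} at {rse}")
        if rule_id not in doc["rulesList"]:
            doc["rulesList"].append(rule_id)
    return doc  # the caller would persist this back in MongoDB


existing = {"Disk2": "r2"}           # RSE -> ongoing rule id
free = {"Disk2": 500, "Disk3": 10}   # RSE -> free bytes
doc = {"pileupName": "/a/b/PREMIX", "pileupSize": 100,
       "expectedRSEs": ["Disk1", "Disk2", "Disk3"],
       "currentRSEs": ["Disk1"], "rulesList": ["r1"]}
result = place_candidate1(
    doc,
    find_rule=lambda did, rse: existing.get(rse),
    has_space=lambda rse, size: free.get(rse, 0) >= size,
    create_rule=lambda did, rse: f"new-{rse}",
)
print(result["rulesList"])
```

Note that this candidate never deletes anything, so rules on RSEs that are no longer expected would need separate handling.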
CANDIDATE 2:
- Read pileup document from MongoDB with filter `active=true`
- for each DID and Rucio account, get a list of all the existent rules
  - if the rule RSE name is in `expectedRSEs`, then add the rule id in `rulesList` (keeping uniqueness)
  - elif the rule RSE name is in `currentRSEs`, then (this rule should no longer exist):
    - make a Rucio call to delete this rule, remove the RSE from `currentRSEs` and remove the rule id from `rulesList` (if any)
  - else - thus RSE not in `expectedRSEs` nor in `currentRSEs` - it means that the rule has been created by someone else, delete it!
    - make a Rucio call to delete this rule and delete the rule id from `rulesList` (if any)
- for each RSE in `expectedRSEs` that does not have an ongoing rule and/or that is not listed under `currentRSEs`
  - create a new rule for the DID + Rucio account + RSE and add the rule id to the `rulesList` list
- once rules have been removed or created, persist an up-to-date version of the pileup data structure in MongoDB
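Candidate 2 can be sketched in the same style. The callables are again hypothetical stand-ins for Rucio calls. Since the "and/or" condition in the last loop is still open, this sketch creates a rule only for expected RSEs that have no ongoing rule, which is one possible reading.

```python
def place_candidate2(doc, list_rules, delete_rule, create_rule):
    """Reconcile existing rules with expectedRSEs, deleting everything else.

    list_rules(did) -> list of {'id', 'rse_expression'} dicts;
    delete_rule(rule_id) -> None; create_rule(did, rse) -> new rule id.
    """
    expected = set(doc["expectedRSEs"])
    current = set(doc["currentRSEs"])
    rules_list = list(doc["rulesList"])
    ruled_rses = set()
    for rule in list_rules(doc["pileupName"]):
        rid, rse = rule["id"], rule["rse_expression"]
        if rse in expected:
            if rid not in rules_list:
                rules_list.append(rid)  # keep uniqueness
            ruled_rses.add(rse)
        else:
            # Either a stale rule (RSE in currentRSEs) or one created by
            # someone else: both branches of the algorithm delete it.
            delete_rule(rid)
            current.discard(rse)
            if rid in rules_list:
                rules_list.remove(rid)
    for rse in doc["expectedRSEs"]:
        if rse not in ruled_rses:
            rules_list.append(create_rule(doc["pileupName"], rse))
    doc["rulesList"] = rules_list
    doc["currentRSEs"] = sorted(current)
    return doc  # the caller would persist this back in MongoDB


deleted = []
existing = [
    {"id": "r1", "rse_expression": "A"},
    {"id": "r3", "rse_expression": "C"},
    {"id": "r9", "rse_expression": "D"},
]
doc = {"pileupName": "/a/b/PREMIX", "expectedRSEs": ["A", "B"],
       "currentRSEs": ["A", "C"], "rulesList": ["r1", "r3"]}
result = place_candidate2(doc, lambda did: existing, deleted.append,
                          lambda did, rse: f"new-{rse}")
print(result["rulesList"], result["currentRSEs"], deleted)
```

Compared with candidate 1, this variant also removes rules on RSEs that are no longer expected, at the cost of listing all rules for the DID in every cycle.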
These candidates need to be further discussed and a decision made.
NOTE: this sub-section can be removed once we know these answers.
Q.1: what is the definition of `currentRSEs`?
a) RSEs that have a Rucio rule (unsatisfied rucio rule)
b) RSEs that have a complete copy of the data (a satisfied rucio rule) <----- Alan's preference!
Q.2: what happens if a given rule id is deleted now and we try to delete it again an hour from now?
a) does the rule deletion fail the second time?
b) or does it set a new expiration time for 2h from now?
NOTE: this sub-section can be removed once we know these answers.
- Rucio wrapper client: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Services/Rucio/Rucio.py#L645
- Rucio pycurl based wrapper: https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/MicroService/Tools/PycurlRucio.py