Collection replicas table not getting cleaned up #471

Open
dynamic-entropy opened this issue Apr 18, 2023 · 8 comments · May be fixed by dmwm/rucio-flux#260

@dynamic-entropy
Contributor

The following RSE has 0 file replicas but still shows two empty dataset replicas.

rucio list-datasets-rse T0_CH_CERN_Tape_Test --long
+---------------------------------------------------------------------------------------------------------+---------------------------+---------------------------+
| DID                                                                                                     | LOCAL FILES/TOTAL FILES   | LOCAL BYTES/TOTAL BYTES   |
|---------------------------------------------------------------------------------------------------------+---------------------------+---------------------------|
| cms:/GenericTTbar/HC-CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/AODSIM#3517e1b6-76e3-11e7-a0c8-02163e00d7b3 | 0/43                      | 0/108723314200            |
| cms:/GenericTTbar/HC-CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/AODSIM#35197562-76e3-11e7-a0c8-02163e00d7b3 | 0/3                       | 0/4801598954              |
+---------------------------------------------------------------------------------------------------------+---------------------------+---------------------------+
@dynamic-entropy
Contributor Author

dynamic-entropy commented Apr 18, 2023

What is the intended mode of operation for the collection_replicas table?

  1. Is it append-only, with replicas only marked unavailable and never deleted? In that case there is an issue with the API.
  2. Or are deleted replicas supposed to be cleaned up (rows removed)? In that case the cleanup is not happening: at the moment we have 9201779 rows with available_bytes and available_replicas both equal to 0.

@dynamic-entropy
Contributor Author

@amanrique1

@dynamic-entropy
Contributor Author

Hi @yuyiguo
This is the case with tables in both prod and int instances. Do you think the reasons could be similar to usage counter inconsistencies? #213

@yuyiguo
Member

yuyiguo commented Jun 30, 2023

I am not sure I understand your question. Please write the problem and how to reproduce it. @dynamic-entropy

@dynamic-entropy
Contributor Author

Hello Yuyi

As mentioned in the description, the collection_replicas table has two rows for this RSE.
You can see them either with the CLI (command in the description) or by querying the table directly.

However, the replicas table has no rows for this RSE. The collection_replicas rows can be queried using:
select * from cms_rucio_prod.collection_replicas where rse_id = (select id from cms_rucio_prod.rses where rse='T0_CH_CERN_Tape_Test');

@yuyiguo
Member

yuyiguo commented Jun 30, 2023

The datasets have no availability for this RSE/T0_CH_CERN_Tape_Test. What is your point? Did you get inconsistent results? If so, can you show them?

@dynamic-entropy
Contributor Author

Yes, this issue stems from the assumption that

"unavailable" space in Rucio is data that is not at the RSE but is expected to be.

Please see #212

So, let's check the RSE usage for T0_CH_CERN_Tape_Test:

➜  ~ rucio list-rse-usage T0_CH_CERN_Tape_Test
USAGE:
------
  rse_id: 14366bf923854ad39ebad6d6b19644c4
  source: expired
  used: 0.000 B
  updated_at: 2023-07-01 05:10:17
  rse: T0_CH_CERN_Tape_Test
------
  rse_id: 14366bf923854ad39ebad6d6b19644c4
  source: obsolete
  used: 0.000 B
  updated_at: 2023-07-01 05:11:00
  rse: T0_CH_CERN_Tape_Test
------
  rse_id: 14366bf923854ad39ebad6d6b19644c4
  source: rucio
  used: 0.000 B
  files: 0
  updated_at: 2023-04-13 22:28:08
  rse: T0_CH_CERN_Tape_Test
------
  rse_id: 14366bf923854ad39ebad6d6b19644c4
  source: unavailable  👈
  used: 113.525 GB 👈
  updated_at: 2022-09-12 18:00:01
  rse: T0_CH_CERN_Tape_Test
------

You will see that the unavailable space corresponds to the sum of the two replicas in the CollectionReplicas table, as shown by

rucio list-datasets-rse T0_CH_CERN_Tape_Test --long
+---------------------------------------------------------------------------------------------------------+---------------------------+---------------------------+
| DID                                                                                                     | LOCAL FILES/TOTAL FILES   | LOCAL BYTES/TOTAL BYTES   |
|---------------------------------------------------------------------------------------------------------+---------------------------+---------------------------|
| cms:/GenericTTbar/HC-CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/AODSIM#3517e1b6-76e3-11e7-a0c8-02163e00d7b3 | 0/43                      | 0/108723314200            |
| cms:/GenericTTbar/HC-CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/AODSIM#35197562-76e3-11e7-a0c8-02163e00d7b3 | 0/3                       | 0/4801598954              |
+---------------------------------------------------------------------------------------------------------+---------------------------+---------------------------+

or from the table:

select * from cms_rucio_prod.collection_replicas where rse_id = (select id from cms_rucio_prod.rses where rse='T0_CH_CERN_Tape_Test');
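
As a quick sanity check of that correspondence, the two TOTAL BYTES values from the CLI output can be added up and compared with the 113.525 GB reported under the unavailable source. This is only a sketch; it assumes the CLI reports decimal units (1 GB = 10^9 bytes), which is what makes the numbers match.

```python
# Dataset sizes (TOTAL BYTES) copied from the list-datasets-rse output above.
dataset_total_bytes = [108_723_314_200, 4_801_598_954]

total_bytes = sum(dataset_total_bytes)

# Assumption: decimal units, i.e. 1 GB = 10**9 bytes.
total_gb = total_bytes / 1e9

print(total_bytes)
print(f"{total_gb:.3f} GB")
```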

However, there are no pending transfers to T0_CH_CERN_Tape_Test:

  1. There is no 'unavailable' replica in the Replicas table.
select * from cms_rucio_prod.replicas where rse_id = (select id from cms_rucio_prod.rses where rse='T0_CH_CERN_Tape_Test');
  2. There are no 'unavailable' locks (or locks in any other state) in the Locks table.
select * from cms_rucio_prod.locks where rse_id = (select id from cms_rucio_prod.rses where rse='T0_CH_CERN_Tape_Test');
  3. There are no 'unavailable' locks in the DatasetLocks table.
select * from cms_rucio_prod.dataset_locks where rse_id = (select id from cms_rucio_prod.rses where rse='T0_CH_CERN_Tape_Test');
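
The checks above could be combined into a single query that lists collection replicas with zero availability on an RSE that holds no file replicas at all. The sketch below runs against an in-memory SQLite database with a deliberately simplified, hypothetical schema: the table names follow the queries above and the available_bytes / available_replicas columns follow the wording used earlier in this thread, but the real Rucio schema has more columns and may name them differently. The function name, the RSE ids and the dataset names are placeholders for illustration only.

```python
import sqlite3


def orphaned_collection_replicas(conn, rse_id):
    """Return (scope, name) of collection replicas on rse_id that report
    zero availability while the RSE has no file replicas at all."""
    query = """
        SELECT cr.scope, cr.name
        FROM collection_replicas AS cr
        WHERE cr.rse_id = ?
          AND cr.available_bytes = 0
          AND cr.available_replicas = 0
          AND NOT EXISTS (
              SELECT 1 FROM replicas AS r WHERE r.rse_id = cr.rse_id
          )
        ORDER BY cr.scope, cr.name
    """
    return conn.execute(query, (rse_id,)).fetchall()


# Hypothetical, simplified schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE replicas (scope TEXT, name TEXT, rse_id TEXT);
    CREATE TABLE collection_replicas (
        scope TEXT, name TEXT, rse_id TEXT,
        available_bytes INTEGER, available_replicas INTEGER
    );
    INSERT INTO collection_replicas VALUES
        ('cms', 'dataset_a', 'rse_empty', 0, 0),
        ('cms', 'dataset_b', 'rse_empty', 0, 0),
        ('cms', 'dataset_c', 'rse_busy', 10, 1);
    INSERT INTO replicas VALUES ('cms', 'file_1', 'rse_busy');
""")

stale = orphaned_collection_replicas(conn, "rse_empty")
healthy = orphaned_collection_replicas(conn, "rse_busy")
print(stale)
print(healthy)
```

On the toy data, only the two zero-availability rows on the empty RSE are reported; the RSE that still holds a file replica yields nothing.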

Let me know if you have more questions.

@dynamic-entropy closed this as not planned on Feb 5, 2024
@dynamic-entropy
Contributor Author

Hello @ericvaandering
While looking into the source of those 12 files, I realised we are not running the abacus-collection-replica daemon: https://rucio.cern.ch/documentation/bin/rucio-abacus-collection-replica

This is the reason why our collection_replicas tables are never cleaned up.
I don't know whether it will clear the existing backlog as well, but it should.


3 participants