
Migration failures #605

Closed · belforte opened this issue May 13, 2019 · 203 comments · May be fixed by #606
@belforte
Member

Sorry to add more problems.
There have been a few persistent failures to migrate datasets from Global to Phys03. While it surely makes sense that a migration can fail when Global can't be read, I'd like to see what exactly went wrong, if nothing else to know when to try again.

I looked at the DBS migration logs via vocms055, but they have no useful information, only a series of time stamps. Even when grepping for a given (failed) migration request id, I found only one line listing the id, but no detail.

Is there maybe a verbosity level that could be changed, temporarily?

Examples of datasets which failed to migrate:

/NonPrD0_pT-1p2_y-2p4_pp_13TeV_pythia8/RunIILowPUAutumn18DR-102X_upgrade2018_realistic_v15-v1/AODSIM

/DYJetsToLL_M-50_TuneCUETHS1_13TeV-madgraphMLM-herwigpp/RunIISummer16MiniAODv3-PUMoriond17_94X_mcRun2_asymptotic_v3-v2/MINIAODSIM

@belforte
Member Author

I was too optimistic.
Migrations keep failing even now that DBS is working. How can we debug/solve this?
Example (I submitted this half an hour ago or so):

{'migration_status': 9, 'create_by': '[email protected]', 'migration_url': 'https://cmsweb.cern.ch/dbs/prod/global/DBSReader', 'last_modified_by': '[email protected]', 'creation_date': 1557774029, 'retry_count': 3, 'migration_input': '/NonPrD0_pT-1p2_y-2p4_pp_13TeV_pythia8/RunIILowPUAutumn18DR-102X_upgrade2018_realistic_v15-v1/AODSIM#4a4e2ab5-1acc-4fae-9b5a-83c85ae7ffe4', 'migration_request_id': 2738818, 'last_modification_date': 1557775662}

@belforte
Member Author

And this is all that the logs I can find have to say about it:

belforte@vocms055/srv-logs> grep 2738818 */dbsmigration/dbsmigration-20190513.log
vocms0136/dbsmigration/dbsmigration-20190513.log:--------------------getResource--  Mon May 13 19:13:38 2019 Migration request ID: 2738818
vocms0163/dbsmigration/dbsmigration-20190513.log:--------------------getResource--  Mon May 13 19:00:33 2019 Migration request ID: 2738818
vocms0163/dbsmigration/dbsmigration-20190513.log:--------------------getResource--  Mon May 13 19:06:35 2019 Migration request ID: 2738818
vocms0165/dbsmigration/dbsmigration-20190513.log:--------------------getResource--  Mon May 13 19:22:41 2019 Migration request ID: 2738818

@yuyiguo
Member

yuyiguo commented May 13, 2019 via email

@yuyiguo
Member

yuyiguo commented May 13, 2019

It looks to me like it was a timeout.
I am trying to re-enable it.

@yuyiguo
Member

yuyiguo commented May 13, 2019

I re-enabled the migration. Let's see if it can be done this time.

@belforte
Member Author

Sorry @yuyiguo, I do not know what you mean by re-enable.
What I did this morning was to remove the existing failed migration request via the removeMigration API
and submit a new one, which eventually failed again.
Do you mean that I should submit it again, or that you did this already?
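
(For reference, that remove-and-resubmit cycle with the dbs3-client Python API looks roughly like the sketch below; the DBSMigrate instance URL and the argument shape of removeMigration are my assumptions, so treat it as a sketch rather than a verified recipe.)

from dbs.apis.dbsClient import DbsApi

# Destination migration service (phys03); URL pattern assumed from the
# migration_url fields quoted in this thread.
apiMig = DbsApi(url='https://cmsweb.cern.ch/dbs/prod/phys03/DBSMigrate')

failed_id = 2738818  # the terminally failed (status 9) request quoted above

# Drop the failed request, then submit a fresh one for the same block.
# (removeMigration is assumed to take a dict carrying the request id.)
apiMig.removeMigration({'migration_rqst_id': failed_id})

migration_data = {
    'migration_url': 'https://cmsweb.cern.ch/dbs/prod/global/DBSReader',
    'migration_input': '/NonPrD0_pT-1p2_y-2p4_pp_13TeV_pythia8/'
                       'RunIILowPUAutumn18DR-102X_upgrade2018_realistic_v15-v1/'
                       'AODSIM#4a4e2ab5-1acc-4fae-9b5a-83c85ae7ffe4',
}
print(apiMig.submitMigration(migration_data))  # the reply carries the new request id

# Later, poll it as in the IPython sessions below:
# status = apiMig.statusMigration(migration_rqst_id=<new id>)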

@belforte
Member Author

Tried another block, failed again:

In [60]: status = apiMig.statusMigration(migration_rqst_id=id)

In [61]: status[0]
Out[61]:
{'create_by': '[email protected]',
'creation_date': 1557801649,
'last_modification_date': 1557801952,
'last_modified_by': '[email protected]',
'migration_input': '/NonPrD0_pT-1p2_y-2p4_pp_13TeV_pythia8/RunIILowPUAutumn18DR-102X_upgrade2018_realistic_v15-v1/AODSIM#07f3510e-6c05-4546-9d54-af9f646fc910',
'migration_request_id': 2738841,
'migration_status': 3,
'migration_url': 'https://cmsweb.cern.ch/dbs/prod/global/DBSReader',
'retry_count': 0}

In [62]:

@belforte
Member Author

Looking at the ASO logs, all migrations are failing today :-(
I am checking past days now.

@belforte
Member Author

As I wrote to the users, DBS migration failures have been 'unheard of' until now, so we do not have a counter or anything. I have to grep log files, e.g. zgrep -i migrat stderr.log-20190506.gz|grep "Migration id"|cut -d: -f 9|grep terminally . Sorry that I do not paste here a way for others to reproduce, but here is a per-day count of migration requests that ended in a given status; I only count terminal statuses.
I only have logs since Apr 24.

day       success    fail
Apr 24     73         0
Apr 25    326         0
Apr 26   1280         0
Apr 27     38         0
Apr 28     22         0
Apr 29    139         0
Apr 30    260         1
May  1    191         2
May  2     86         3
May  3    179         4
May  4    201         8
May  5     40         3
May  6    218         3
May  7     94         2
May  8      0        40
May  9      3       392
May 10      0       413
May 11      0       840
May 12      0       326
May 13      0       352
May 14      1       297
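
(A minimal sketch of that counting, mirroring the shell pipeline above; the Publisher log format is not documented here, so the field position and the 'terminally' marker come from the pipeline and everything else is an assumption.)

import glob
import gzip
import re
from collections import Counter

terminal_failures = Counter()

for path in glob.glob('stderr.log-*.gz'):
    day = re.search(r'(\d{8})', path).group(1)      # e.g. 20190506, from the file name
    with gzip.open(path, 'rt', errors='replace') as log:
        for line in log:
            if 'Migration id' not in line:
                continue
            fields = line.split(':')
            if len(fields) >= 9 and 'terminally' in fields[8]:   # same field as cut -d: -f 9
                terminal_failures[day] += 1
            # successes would be tallied here as well, but the marker used for
            # them in the Publisher log is not shown in this thread

for day in sorted(terminal_failures):
    print(day, terminal_failures[day])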

@belforte
Member Author

I have submitted a new migration request after the rollback, let's see:

In [68]: status = apiMig.statusMigration(migration_rqst_id=id)

In [69]: status[0]
Out[69]:
{'create_by': '[email protected]',
'creation_date': 1557837005,
'last_modification_date': 1557837005,
'last_modified_by': '[email protected]',
'migration_input': '/NonPrD0_pT-1p2_y-2p4_pp_13TeV_pythia8/RunIILowPUAutumn18DR-102X_upgrade2018_realistic_v15-v1/AODSIM#e38bd65b-5772-4b06-b0ce-a646d9a93fb5',
'migration_request_id': 2738882,
'migration_status': 0,
'migration_url': 'https://cmsweb.cern.ch/dbs/prod/global/DBSReader',
'retry_count': None}

@yuyiguo
Member

yuyiguo commented May 14, 2019

Sorry Stefano, I did not explain what I meant by "re-enabled": I changed the migration status from 9 to 0 in the database, so that the migration request can be reprocessed by the migration server. 9 means permanently failed.
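
(For later readers, a small helper capturing the migration_status values as they surface in this thread: 0 for a freshly submitted request, 3 for a failed attempt that will still be retried, 9 for a permanent failure after the retries are exhausted. The intermediate values 1 and 2 for "in progress" and "completed" are my assumption of the usual DBS convention, so double-check against the server code.)

# Sketch: interpret the dicts returned by apiMig.statusMigration(...)
MIGRATION_STATUS = {
    0: 'pending (just submitted)',
    1: 'in progress (assumed)',
    2: 'completed (assumed)',
    3: 'failed, will be retried',
    9: 'terminally failed (no more retries)',
}

def describe(status_dict):
    code = status_dict['migration_status']
    return '%s, retry_count=%s: %s' % (
        MIGRATION_STATUS.get(code, 'unknown status %s' % code),
        status_dict.get('retry_count'),
        status_dict['migration_input'],
    )

# e.g. describe(status[0]) on the outputs pasted in this thread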

@belforte
Member Author

OK. Not something I have an API for; anyhow it should be the same as the migration delete plus a new request, which is what I did.
Alas, the new request which I posted earlier failed:
In [73]: status[0]
Out[73]:
{'create_by': '[email protected]',
'creation_date': 1557837005,
'last_modification_date': 1557839875,
'last_modified_by': '[email protected]',
'migration_input': '/NonPrD0_pT-1p2_y-2p4_pp_13TeV_pythia8/RunIILowPUAutumn18DR-102X_upgrade2018_realistic_v15-v1/AODSIM#e38bd65b-5772-4b06-b0ce-a646d9a93fb5',
'migration_request_id': 2738882,
'migration_status': 3,
'migration_url': 'https://cmsweb.cern.ch/dbs/prod/global/DBSReader',
'retry_count': 1}

We can wait for the 3 retries, but it does not look promising.
Back to: are there logs to check anywhere? Maybe I should look at Phys03Writer?

@yuyiguo
Member

yuyiguo commented May 14, 2019

I re-enabled yesterday's migration again. The problem is that the DBS global reader is under high load. When we migrate from global to phys03, the migration server uses the DBS API to read the block from global just like any other client. If the block is big, it will time out.

@yuyiguo
Member

yuyiguo commented May 14, 2019

It failed again.

@belforte
Member Author

I can understand that, but I would like to see confirmation of this timeout in the logs. I can look at the block size, but maybe we also need to include the parents. OTOH the load should not be very high now, should it?

@yuyiguo
Member

yuyiguo commented May 14, 2019

The thing we need to figure out is which blockdump call in the dbsGlobalReader logs comes from the phys03 migration server. Let me check.

@yuyiguo
Member

yuyiguo commented May 14, 2019

I am working on it. Please don't restart or delete these two migrations.

@belforte
Member Author

OK. thanks.

@belforte
Member Author

Let me know if I can help.

@yuyiguo
Member

yuyiguo commented May 14, 2019

INFO:cherrypy.access:[14/May/2019:16:12:06] vocms0163.cern.ch 2001:1458:201:a8::100:1a9 "GET /dbs/prod/global/DBSReader/blockdump?block_name=%2FHIMinimumBias9%2FHIRun2018A-v1%2FRAW%232ad44aae-509b-4554-be3f-6802f1d66056 HTTP/1.1" 500 Internal Server Error [data: - in 135 out 200100579 us ] [auth: OK "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=lijimene/CN=817301/CN=Lina Marcela Jimenez Becerra" "" ] [ref: "" "PycURL/7.19.3 libcurl/7.35.0 OpenSSL/1.0.1r zlib/1.2.8 c-ares/1.10.0" ]

Is this in the time frame of what you were doing?
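
(One way to pick out the suspicious calls from access-log lines like the one above: filter the blockdump requests and look at the status code and the large number before "us", here 200100579, i.e. roughly 200 seconds, which looks like the request duration in microseconds; that reading of the log format is an assumption.)

import re
import sys

# Feed this script lines from the DBSGlobalReader access log, e.g.
#   grep blockdump dbsglobal-access.log | python slow_blockdump.py
# (the log file and script names are hypothetical)
PATTERN = re.compile(
    r'blockdump\?block_name=(?P<block>\S+) HTTP/1\.1" (?P<status>\d{3}).*?'
    r'out (?P<micros>\d+) us'
)

for line in sys.stdin:
    m = PATTERN.search(line)
    if not m:
        continue
    seconds = int(m.group('micros')) / 1e6
    if m.group('status') != '200' or seconds > 100:
        print('%.1f s, HTTP %s, %s' % (seconds, m.group('status'), m.group('block')))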

@belforte
Member Author

No, this must be a test from Lina.

@belforte
Member Author

Ah sorry... ignore that, I misread the question.

@belforte
Member Author

The last example I posted here has
'create_by': '[email protected]',
'creation_date': 1557837005,
and that date is: Tue May 14 14:30:05 CEST 2019

There must be a place in hell for those who write timestamps in logs without a timezone.
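
(The DBS creation_date/last_modification_date fields appear to be plain Unix timestamps; a minimal sketch of converting them unambiguously, which reproduces the CEST time quoted above.)

from datetime import datetime, timezone
from zoneinfo import ZoneInfo   # Python 3.9+; earlier versions can use pytz instead

creation_date = 1557837005      # from the status dict above

utc = datetime.fromtimestamp(creation_date, tz=timezone.utc)
cern = utc.astimezone(ZoneInfo('Europe/Zurich'))
print(utc.isoformat())    # 2019-05-14T12:30:05+00:00
print(cern.isoformat())   # 2019-05-14T14:30:05+02:00, i.e. Tue May 14 14:30:05 CEST 2019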

@yuyiguo
Member

yuyiguo commented May 14, 2019

The 500 error may be related to the redeployment script. See here: https://gitlab.cern.ch/cms-http-group/doc/issues/155#note_2584183

@belforte
Member Author

I submitted the migration request well after DBS was restarted this morning.

@belforte
Member Author

Migration status from the ASO Publisher today:
0 successes, 129 failures. There were 31 failures 2 hours ago.

@belforte
Member Author

OK, I now understand your point about https://gitlab.cern.ch/cms-http-group/doc/issues/155#note_2584183. Let's see how it goes once that is solved. Thanks.

@yuyiguo
Member

yuyiguo commented May 14, 2019

Yes, let's wait for Lina to restart DBS with the new configuration. Then I will re-enable one of the migrations and see if we can get it going.

@yuyiguo
Member

yuyiguo commented May 14, 2019

It did not look good, see below. The migration server got a timeout while downloading the block from the source (DBSGlobalReader). This is after the last restart at 15:00 GMT.

Tue May 14 15:47:32 2019HTTP Error 502: Proxy Error
--------------------getResource--  Tue May 14 15:47:37 2019 Migration request ID: 2738882

Tue May 14 15:57:43 2019HTTP Error 502: Proxy Error
--------------------getResource--  Tue May 14 15:57:48 2019 Migration request ID: 2738882

Tue May 14 16:07:56 2019HTTP Error 502: Proxy Error
--------------------getResource--  Tue May 14 16:08:01 2019 Migration request ID: 2738882

Tue May 14 16:18:07 2019HTTP Error 502: Proxy Error
--------------------getResource--  Tue May 14 16:18:12 2019 Migration request ID: 2738882

@yuyiguo
Member

yuyiguo commented May 17, 2019

Yes, Alan.

@amaltaro
Contributor

Here it is: cms-sw/cmsdist#4983 (untested :D)

@yuyiguo
Member

yuyiguo commented May 17, 2019

It looks good to me. Thanks Alan.
@h4d4 Lina, could you please build and deploy this on testbed? I will test it.

@yuyiguo
Member

yuyiguo commented May 17, 2019

@bbockelm @amaltaro
Regarding #2, the DBS-specific fix suggested in #606: we are going to fix it in https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/Database/DBCore.py#L123, right?

Fixing individual DBS APIs may soon see the same problem pop up in other places. The heaviest DBS APIs make more than one SQL call and depend on the previous result, and DBS is the heaviest user here. If a fix works for DBS, it should work for others too.

@h4d4

h4d4 commented May 17, 2019

@yuyiguo Yuyi, are you referring to cms-sw/cmsdist#4983?

@h4d4

h4d4 commented May 17, 2019

@yuyiguo @amaltaro Sure, I'm going to prepare the PR and put it on testbed today.
I'm going to include it in HG1905.

@amaltaro
Contributor

As a follow-up to this three-digit-long GH thread, I created a WMCore issue to be addressed at the beginning of next week: dmwm/WMCore#9207
If you have further suggestions/concerns, please make them there. But let's avoid going beyond 100 comments again, if possible :)

@belforte
Member Author

belforte commented May 17, 2019 via email

@h4d4

h4d4 commented May 17, 2019

@yuyiguo Yuyi,

I've built tag HG1905j including cms-sw/cmsdist#4983, and I've deployed it on testbed.
Tag history changes for the validation results are here.

I'll keep an eye on the validation results, in order to coordinate when it should go to production if the tests are successful.

Best Regards, Lina.

@yuyiguo
Member

yuyiguo commented May 17, 2019 via email

@belforte
Member Author

Thanks all. I just did a quick check of blockDump on cmsweb-testbed (which was timing out yesterday) and it works in a couple of seconds now :-)

@yuyiguo
Member

yuyiguo commented May 19, 2019

@h4d4
Lina,

I did the validation tests. Everything is good on cmsweb-testbed except migration.

The problem is that for testing the migration I need a source DBS and a destination DBS. In the cmsweb-testbed testing, the testbed is the destination DBS. In the regular case the source would be DBS global, but DBS global does not work for blockdump due to the SQLAlchemy version. So I switched to my VM as the source DBS, and I got errors that the migration server could not build the migration list. After some debugging, with a few restarts of the migration server on cmsweb-testbed with some prints added, I found the reason was "Failed to connect to dbs3-test2.cern.ch port 8443: No route to host". That kind of surprised me; I know my VM works when I do everything on the node. So I did a simple test by logging in to lxplus and curling to the DBS server on my VM, and I got the same connection error as the migration server on cmsweb-testbed.

I am not sure whether this is a feature of the new VMs or of cmsweb. I recall that in the past I was able to access DBS servers on personal VMs as long as I was inside CERN. I then tried to open ports 8443/443, but failed. However, I did test blockdump on cmsweb-testbed and it worked well, and blockdump is what caused the past migration failures.

At this point I don't know what we need to do about the deployment on prod. If we want to continue with full validation, @vkuznet Valentin or others may help with the port-opening issue, and then I will test again. If the current test is enough, you may deploy on production.
Thanks,
Yuyi

@belforte
Member Author

Well, if we deploy on production and migration still fails, we are no worse off than we are now. We may as well try.

@amaltaro
Contributor

@yuyiguo how about triggering a migration from testbed to prod (given that the failure happens on the reader side)? A rough sketch of such a request is at the end of this comment. If you want, I have a few datasets available only in testbed, e.g.:
[u'/RelValProdMinBias/DMWM_Test-ProdMinBias_TaskChain_ProdMinBias_Agent122_Validation_Privv2-v11/GEN-SIM',
u'/RelValProdMinBias/DMWM_Test-DIGIPROD1_TaskChain_ProdMinBias_Agent122_Validation_Privv2-v11/GEN-SIM-RAW',
u'/RelValProdMinBias/DMWM_Test-RECOPROD1_TaskChain_ProdMinBias_Agent122_Validation_Privv2-v11/AODSIM',
u'/RelValProdMinBias/DMWM_Test-RECOPROD1_TaskChain_ProdMinBias_Agent122_Validation_Privv2-v11/GEN-SIM-RECO']

Otherwise, I'm sending you (just did) a trick via private email that should allow you to connect to services running on your VM. Don't forget you need to allow your DN in the authmap.json file (check your crontab entry to see what I'm talking about).
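
(The testbed-to-prod request sketched with the dbs3-client, untested; the testbed DBSReader URL below is an assumed instance name, so adjust it to whatever the testbed actually exposes.)

from dbs.apis.dbsClient import DbsApi

# Destination: the production phys03 migration service (where the block should land).
apiMig = DbsApi(url='https://cmsweb.cern.ch/dbs/prod/phys03/DBSMigrate')

# Source: the testbed reader, which now runs the patched blockdump.
migration_data = {
    'migration_url': 'https://cmsweb-testbed.cern.ch/dbs/int/global/DBSReader',
    'migration_input': '/RelValProdMinBias/DMWM_Test-ProdMinBias_TaskChain_ProdMinBias_Agent122_Validation_Privv2-v11/GEN-SIM',
}
print(apiMig.submitMigration(migration_data))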

@h4d4

h4d4 commented May 20, 2019

Reading the latest inputs, my understanding is that additional tests are still needed. I'll keep an eye on it.

@vkuznet
Contributor

vkuznet commented May 20, 2019 via email

@yuyiguo
Member

yuyiguo commented May 20, 2019

@h4d4
Lina and All,
All tests were successful. Please deploy DBS 3.7.8 with HG1902X for production.
Thanks to Valentin and Alan for the help with the ports.
Yuyi

@h4d4

h4d4 commented May 20, 2019

@yuyiguo Yuyi, All,

Do you all agree with scheduling this intervention for early tomorrow morning, or are you suggesting we do it ASAP?

I need to announce it on hn-cms-cerncompannounce.

Best Regards, Lina.

@yuyiguo
Member

yuyiguo commented May 20, 2019 via email

@belforte
Member Author

Tomorrow morning will be OK; better not to make changes to production late in the day,
and one day more or less makes no difference.

Big thanks to all of you!

@h4d4

h4d4 commented May 20, 2019

@yuyiguo Yuyi, All,

Thanks for the feedback. I'll therefore run a production intervention on the DBS servers tomorrow at 9:00 AM Geneva time. I'm going to send the announcement to 'hn-cms-cerncompannounce'.

Best Regards, Lina.

@yuyiguo
Member

yuyiguo commented May 20, 2019 via email

@yuyiguo
Member

yuyiguo commented May 20, 2019

@h4d4
Lina,
We need to bring the DBS int DB up to the data level of prod. It will take one or two days to dump the data from prod to int, and the cmsweb-testbed DBS will not be available during that time. What is a good time to do this?
Cheers,
Yuyi

@h4d4

h4d4 commented May 20, 2019

@yuyiguo Yuyi,
Since HG1906 preproduction validation starts tomorrow, DBS testbed outages could affect validation of services such as DAS. @vkuznet
I see two possibilities:

  • Start the procedure now (if it is really urgent).
  • Do it once HG1906 preproduction validation ends, on 3rd June.

Best Regards, Lina.

@amaltaro
Contributor

Lina, I'm not sure I understand what you're proposing. We run the production intervention tomorrow morning; then, once it's done, you start working on the new testbed release.

Am I missing anything?

@belforte
Member Author

The question is: who is using the DBS Oracle INT instance in testbed?

@amaltaro
Contributor

amaltaro commented May 20, 2019

I suggest we discuss the prod --> int DB copy/dump in another thread, so as not to mix too many things here. And yes, the WMCore team is using the cmsweb-testbed int database almost daily.

@h4d4

h4d4 commented May 21, 2019

@yuyiguo @vkuznet @belforte @amaltaro
I've created this issue regarding the required procedure 'int DB dump'. Please comment there.
Best Regards, Lina.

yuyiguo closed this as completed Apr 30, 2020