
Add NATS integration and implement PdmV use case #617

Merged · 6 commits into dmwm:master · Jan 16, 2020
Conversation

vkuznet
Contributor

@vkuznet vkuznet commented Dec 3, 2019

This PR contains the following changes:

  • it adds the NATSManager from CMSMonitoring and configures it via external configuration
  • it adds publishing of messages upon dataset/file creation to accommodate the PdmV (McM) use cases discussed in CMSMONIT-161. In particular, we need to see real-time progress of the following:
    • dataset type changes, e.g. from PRODUCTION to VALID
    • dataset growth, e.g. when a dataset is updated with new block/file info (since event_count is only provided at the file-injection level, I updated the DBS API at the insertFile level)

The interaction with NATS is made optional and safe, i.e. if something happens during NATS publication (e.g. an exception is thrown), the DBS code is neither affected nor blocked. All NATS publications are done asynchronously, so there are no blocking requests.
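A minimal sketch of this optional-and-safe pattern, assuming a manager object with a `publish` method; the class and method names here are illustrative, not the actual CMSMonitoring/DBS code:

```python
import threading

class SafeNATSPublisher:
    """Exception-safe, non-blocking publisher: failures are printed,
    never raised, and each publication runs in a daemon thread so the
    calling DBS API is never blocked."""

    def __init__(self, manager):
        # `manager` is any object exposing a publish(msgs) method
        self.manager = manager

    def _publish(self, msgs):
        try:
            self.manager.publish(msgs)
        except Exception as exc:
            # swallow the error: a NATS problem must not affect the DBS call
            print("NATS publish failure: %s" % exc)

    def publish(self, msgs):
        thread = threading.Thread(target=self._publish, args=(msgs,))
        thread.daemon = True  # never keep the server alive just for NATS
        thread.start()
        return thread
```

With this shape, a broker outage or a malformed message surfaces only as a log line, while the insert API returns at normal speed.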

@vkuznet
Contributor Author

vkuznet commented Dec 3, 2019

Yuyi, I think I found appropriate places for the NATS integration and implemented the McM use case. I suggest performing profile measurements of the insert APIs with and without NATS; that will give you confidence about whether NATS adds any load on DBS. I made NATS fully configurable, e.g. if it is not present in the configuration then no NATS interaction will happen. To configure NATS you'll need the following parameters in your config file: nats_server=<server> and nats_topics=[]. The latter should usually be an empty list (it will only be required if we decide to publish DBS data on dedicated topics).
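A sketch of how such optional configuration might be handled; `NATSManagerStub` is a hypothetical stand-in for CMSMonitoring's NATSManager, whose real constructor signature is not shown in this thread:

```python
class NATSManagerStub:
    """Placeholder for CMSMonitoring's NATSManager (real signature not shown here)."""
    def __init__(self, server, topics):
        self.server = server
        self.topics = topics

def setup_nats(config):
    """Create a NATS manager only when nats_server is configured;
    return None so callers skip publishing entirely when NATS is absent."""
    server = config.get('nats_server')
    if not server:
        return None  # no NATS entry in the config: no NATS interaction at all
    topics = config.get('nats_topics') or []  # usually an empty list
    return NATSManagerStub(server, topics)
```

Callers then guard every publication with `if manager is not None`, which is what makes the integration fully opt-in.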

To include NATS, changes to the dbs spec file will be required to add the CMSMonitoring dependency. I'll prepare a separate PR for that and link it here.

Please review, ask questions and let me know.

@vkuznet
Contributor Author

vkuznet commented Dec 3, 2019

PR#5401 provides the necessary changes to the dbs spec to include the CMSMonitoring dependency.

@vkuznet
Contributor Author

vkuznet commented Dec 3, 2019

PR#821 contains the necessary changes to the DBS configuration to include/exclude NATS.

@yuyiguo
Member

yuyiguo commented Dec 3, 2019

Valentin,
Glad you figured it out quickly. As I said, I have a lot on my plate and it will take a while for me to work on NATS. Feel free to test it yourself if you like.
Yuyi

@vkuznet
Contributor Author

vkuznet commented Dec 3, 2019

Yuyi, don't worry; as I said, I'll help as much as I can. For testing I can build new DBS RPMs and set them up somewhere, but I need to know how to inject some data. If you can give me some examples that would be great; at a minimum, how to inject a new dataset would be sufficient. Of course I understand that I need to configure DBS with the proper DB backend (whatever test DB you're using).

Or, if you have working VM and can give me access to it, I can easily patch DBS over there and try it out.

@yuyiguo
Member

yuyiguo commented Dec 3, 2019

Valentin,
The integration DBS DB was just updated from production, and you have all the info since it is used by the k8s setup. We can run the DBS unit tests as a first pass.

@vkuznet
Contributor Author

vkuznet commented Dec 3, 2019

So, can I adjust DBS in k8s then?

@yuyiguo
Member

yuyiguo commented Dec 3, 2019

yes.

@vkuznet
Contributor Author

vkuznet commented Dec 4, 2019

Yuyi, I deployed DBS with NATS on the k8s cluster (it is part of DBSGlobalWriter). Could you please let me know how to run a dataset injection and/or a writer test so that I can see how the code is working.

@yuyiguo
Member

yuyiguo commented Dec 4, 2019

Valentin,

There are a lot of examples at https://github.com/dmwm/DBS/tree/master/Client/utils
You need to change the url to the k8s one. You may want to try https://github.com/dmwm/DBS/blob/master/Client/utils/updateDatasetType.py to see if you get a NATS message.
Yuyi

@vkuznet
Contributor Author

vkuznet commented Dec 4, 2019

Yuyi,
could you please provide details on what is required to get access to the DBS writer. I tried insertdataset.py and it produced the following output:

# the url I access
https://cmsweb-test.cern.ch/dbs/int/global/DBSWriter
# the output I get
['dataset', 'primary_ds_name', 'physics_group_name', 'xtcrosssection', 'acquisition_era_name', 'processing_version', 'dataset_access_type', 'data_tier_name', 'processed_ds_name']
Traceback (most recent call last):
  File "insertdataset.py", line 20, in <module>
    print(dbs3api.insertDataset(dataset))
  File "/afs/cern.ch/user/v/valya/workspace/DBS/Client/src/python/dbs/apis/dbsClient.py", line 370, in insertDataset
    return self.__callServer("datasets", data = datasetObj, callmethod='POST' )
  File "/afs/cern.ch/user/v/valya/workspace/DBS/Client/src/python/dbs/apis/dbsClient.py", line 201, in __callServer
    self.__parseForException(http_error)
  File "/afs/cern.ch/user/v/valya/workspace/DBS/Client/src/python/dbs/apis/dbsClient.py", line 228, in __parseForException
    raise HTTPError(http_error.url, data['exception'], data['message'], http_error.header, http_error.body)
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 403: You are not allowed to access this resource, authz denied

I can access DBSReader though, e.g.

# if I use this call
scurl -v "https://cmsweb-test.cern.ch/dbs/int/global/DBSReader/datasets?dataset=/ZMM*/*/*"

So, what I need is

  • which URL to use; I thought I could use https://cmsweb-test.cern.ch/dbs/int/global/DBSWriter but I'm not sure it is the correct one
  • do I need to set up an X509 proxy and/or user certificates, and if so, how do I set them up so the DBS API will read them
  • can you post a simple curl call to insert a dataset into DBS? It is much easier to understand than dealing with lots of Python wrappers (DBS APIs); e.g. if we need to make a POST request to a URL with some JSON, I can do that easily with curl without bothering to set up the dbs client. For testing purposes it would be sufficient and probably faster.

@vkuznet
Contributor Author

vkuznet commented Dec 4, 2019

From the DBSWriter logs on k8s I see this

INFO:cherrypy.access:[04/Dec/2019:17:13:46] dbs-global-w-6dfdc68885-mjfj8 137.138.31.19 "POST /dbs/int/global/DBSWriter/datasets HTTP/1.1" 403 Forbidden [data: 341 in 111 out 1650 us ] [auth: OK "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=valya/CN=443502/CN=Valentin Y Kuznetsov" "" ] [ref: "" "DBSClient/Unknown/" ]

which implies that my DN is not authorized to write to DBS. What do I need to change to get access?

@yuyiguo
Member

yuyiguo commented Dec 4, 2019

You were not in the DBS operator group. I tried to add you via SiteDB but failed. I opened a ticket on SiteDB and cc'ed you.

Just wondering: I thought you had written to DBS in the past, didn't you?

@vkuznet
Contributor Author

vkuznet commented Dec 4, 2019

I never wrote to DBS, so yes, I need permission. Also, we now use CRIC instead of SiteDB. I think CRIC is managed by [email protected] or [email protected]

@vkuznet
Contributor Author

vkuznet commented Dec 4, 2019 via email

@yuyiguo
Member

yuyiguo commented Dec 4, 2019

The SiteDB page should redirect to CRIC. I wrote to them.

@vkuznet
Contributor Author

vkuznet commented Dec 4, 2019

Yuyi, I now seem to have access since I'm getting a different error, but I don't know what's wrong. Please advise:

https://cmsweb-test.cern.ch/dbs/int/global/DBSWriter
{'dataset': '/cmsnats_pri/cmsnats-v101/GEN-SIM-DIGI-RAW', 'primary_ds_name': 'cmsnats_pri', 'physics_group_name': 'Tracker', 'xtcrosssection': 123, 'acquisition_era_name': 'cmsnats', 'processing_version': 101, 'dataset_access_type': 'VALID', 'data_tier_name': 'GEN-SIM-DIGI-RAW', 'processed_ds_name': 'cmsnats-v101'}
Traceback (most recent call last):
  File "insertdataset.py", line 28, in <module>
    print(dbs3api.insertDataset(dataset))
  File "/afs/cern.ch/user/v/valya/workspace/DBS/Client/src/python/dbs/apis/dbsClient.py", line 370, in insertDataset
    return self.__callServer("datasets", data = datasetObj, callmethod='POST' )
  File "/afs/cern.ch/user/v/valya/workspace/DBS/Client/src/python/dbs/apis/dbsClient.py", line 201, in __callServer
    self.__parseForException(http_error)
  File "/afs/cern.ch/user/v/valya/workspace/DBS/Client/src/python/dbs/apis/dbsClient.py", line 228, in __parseForException
    raise HTTPError(http_error.url, data['exception'], data['message'], http_error.header, http_error.body)
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 412: insertDataset must have: dataset,                                          primary_ds_name, processed_ds_name, data_tier_name

I already inserted the acquisition era, and I printed the dict I'm trying to insert, but apparently DBS complains that insertDataset must have "dataset, primary_ds_name, processed_ds_name, data_tier_name", which are present in my dict.

@vkuznet
Contributor Author

vkuznet commented Dec 4, 2019

this is how the error looks in the DBS server logs

INFO:cherrypy.access:[04/Dec/2019:20:21:50] dbs-global-w-6dfdc68885-mjfj8 137.138.54.48 "POST /dbs/int/global/DBSWriter/datasets HTTP/1.1" 412 Precondition Failed [data: 316 in 180 out 90276 us ] [auth: OK "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=valya/CN=443502/CN=Valentin Y Kuznetsov" "" ] [ref: "" "DBSClient/Unknown/" ]
INFO:cherrypy.access:REQUEST [04/Dec/2019 20:24:26] 137.138.31.19 37788 POST /datasets [/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=valya/CN=443502/CN=Valentin Y Kuznetsov] [{}]
ERROR:cherrypy.error:Wed Dec  4 19:24:26 2019 dbsException-missing-data: insertDataset must have: dataset,                                          primary_ds_name, processed_ds_name, data_tier_name
INFO:cherrypy.access:[04/Dec/2019:20:24:26] dbs-global-w-6dfdc68885-mjfj8 137.138.31.19 "POST /dbs/int/global/DBSWriter/datasets HTTP/1.1" 412 Precondition Failed [data: 316 in 180 out 30963 us ] [auth: OK "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=valya/CN=443502/CN=Valentin Y Kuznetsov" "" ] [ref: "" "DBSClient/Unknown/" ]

@vkuznet
Contributor Author

vkuznet commented Dec 4, 2019

and this is the client code I used:

from __future__ import print_function
# DBS-3 imports
from dbs.apis.dbsClient import DbsApi
import os

url = os.getenv('DBS_WRITER_URL', "https://cmsweb-test.cern.ch/dbs/int/global/DBSWriter")
print(url)
# API object
dbs3api = DbsApi(url=url)

#acq_era = {'acquisition_era_name': 'cmsnats', 'description': 'testing_insert_era',
#           'start_date': 1234567890}
#print(dbs3api.insertAcquisitionEra(acq_era))

dataset = {'primary_ds_name': 'cmsnats_pri',
           'physics_group_name': 'Tracker',
           'processed_ds_name': 'cmsnats-v101',
           'dataset_access_type': 'VALID',
           'xtcrosssection': 123,
           'data_tier_name': 'GEN-SIM-DIGI-RAW',
           'acquisition_era_name': 'cmsnats',
           'processing_version': 101}

# build the canonical /primary/processed/tier dataset path
dataset['dataset'] = '/%s/%s/%s' % (dataset['primary_ds_name'],
                                    dataset['processed_ds_name'],
                                    dataset['data_tier_name'])

print(dataset)
print(dbs3api.insertDataset(dataset))

@yuyiguo
Member

yuyiguo commented Dec 4, 2019 via email

@vkuznet
Contributor Author

vkuznet commented Dec 4, 2019 via email

@yuyiguo
Member

yuyiguo commented Dec 4, 2019 via email

@vkuznet
Contributor Author

vkuznet commented Dec 4, 2019

Yuyi,
it is working!!! Here is the output from the DBS logs:

INFO:cherrypy.access:REQUEST [04/Dec/2019 22:07:19] 137.138.31.19 37756 PUT /datasets [/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=valya/CN=443502/CN=Valentin Y Kuznetsov] [{'dataset_access_type': u'PRODUCTION', 'dataset': u'/ZMM_13TeV_TuneCP5-pythia8/RunIIAutumn18NanoAODv5-SNBHP_Nano1June2019_SNB_HP_102X_upgrade2018_realistic_v19-v2/NANOAODSIM'}]
INFO:cherrypy.access:[04/Dec/2019:22:07:19] dbs-global-w-6dfdc68885-gw2zm 137.138.31.19 "PUT /dbs/int/global/DBSWriter/datasets?dataset_access_type=PRODUCTION&dataset=%2FZMM_13TeV_TuneCP5-pythia8%2FRunIIAutumn18NanoAODv5-SNBHP_Nano1June2019_SNB_HP_102X_upgrade2018_realistic_v19-v2%2FNANOAODSIM HTTP/1.1" 200 OK [data: 2 in 4 out 271749 us ] [auth: OK "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=valya/CN=443502/CN=Valentin Y Kuznetsov" "" ] [ref: "" "DBSClient/Unknown/" ]
INFO:cherrypy.access:REQUEST [04/Dec/2019 22:07:19] 137.138.31.19 54056 PUT /datasets [/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=valya/CN=443502/CN=Valentin Y Kuznetsov] [{'dataset_access_type': u'VALID', 'dataset': u'/ZMM_13TeV_TuneCP5-pythia8/RunIIAutumn18NanoAODv5-SNBHP_Nano1June2019_SNB_HP_102X_upgrade2018_realistic_v19-v2/NANOAODSIM'}]
INFO:cherrypy.access:[04/Dec/2019:22:07:20] dbs-global-w-6dfdc68885-gw2zm 137.138.31.19 "PUT /dbs/int/global/DBSWriter/datasets?dataset_access_type=VALID&dataset=%2FZMM_13TeV_TuneCP5-pythia8%2FRunIIAutumn18NanoAODv5-SNBHP_Nano1June2019_SNB_HP_102X_upgrade2018_realistic_v19-v2%2FNANOAODSIM HTTP/1.1" 200 OK [data: 2 in 4 out 119986 us ] [auth: OK "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=valya/CN=443502/CN=Valentin Y Kuznetsov" "" ] [ref: "" "DBSClient/Unknown/" ]

and here are the messages I got in my subscriber:

./nats-sub -t "cms.dbs.>"
Listening on [cms-nats.cern.ch/cms.dbs.>]
2019/12/04 22:07:19 /ZMM_13TeV_TuneCP5-pythia8/RunIIAutumn18NanoAODv5-SNBHP_Nano1June2019_SNB_HP_102X_upgrade2018_realistic_v19-v2/NANOAODSIM PRODUCTION
2019/12/04 22:07:19 /ZMM_13TeV_TuneCP5-pythia8/RunIIAutumn18NanoAODv5-SNBHP_Nano1June2019_SNB_HP_102X_upgrade2018_realistic_v19-v2/NANOAODSIM PRODUCTION
2019/12/04 22:07:19 /ZMM_13TeV_TuneCP5-pythia8/RunIIAutumn18NanoAODv5-SNBHP_Nano1June2019_SNB_HP_102X_upgrade2018_realistic_v19-v2/NANOAODSIM VALID
2019/12/04 22:07:20 /ZMM_13TeV_TuneCP5-pythia8/RunIIAutumn18NanoAODv5-SNBHP_Nano1June2019_SNB_HP_102X_upgrade2018_realistic_v19-v2/NANOAODSIM VALID

But I don't understand why I get two requests in DBS, which leads to two messages per single change of the dataset access type. From the DBS log I see two PUT requests:

INFO:cherrypy.access:REQUEST [04/Dec/2019 22:07:19] 137.138.31.19 37756 PUT /datasets [/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=valya/CN=443502/CN=Valentin Y Kuznetsov] [{'dataset_access_type': u'PRODUCTION', 'dataset': u'/ZMM_13TeV_TuneCP5-pythia8/RunIIAutumn18NanoAODv5-SNBHP_Nano1June2019_SNB_HP_102X_upgrade2018_realistic_v19-v2/NANOAODSIM'}]
INFO:cherrypy.access:[04/Dec/2019:22:07:19] dbs-global-w-6dfdc68885-gw2zm 137.138.31.19 "PUT /dbs/int/global/DBSWriter/datasets?dataset_access_type=PRODUCTION&dataset=%2FZMM_13TeV_TuneCP5-pythia8%2FRunIIAutumn18NanoAODv5-SNBHP_Nano1June2019_SNB_HP_102X_upgrade2018_realistic_v19-v2%2FNANOAODSIM HTTP/1.1" 200 OK [data: 2 in 4 out 271749 us ] [auth: OK "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=valya/CN=443502/CN=Valentin Y Kuznetsov" "" ] [ref: "" "DBSClient/Unknown/" ]

The first uses the /datasets API call payload data (JSON), while the second uses the /dbs/int/global/DBSWriter/datasets?dataset_access_type=PRODUCTION&dataset=%2FZMM_13TeV_TuneCP5-pythia8%2FRunIIAutumn18NanoAODv5-SNBHP_Nano1June2019_SNB_HP_102X_upgrade2018_realistic_v19-v2%2FNANOAODSIM API without a payload. I doubt that this is related to my changes, since I didn't modify the DBS API logic; I only added NATS messaging. It is something that you should check.

To sum up, the new DBS image cmssw/dbs now contains this patch, which adds the NATS manager and yields messages. If you want to test the impact of NATS, I think we need to run a series of tests with and without NATS in identical environments and measure the timing of the DBS APIs.
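One way to run such timing comparisons, as a sketch; the two stub functions below stand in for real DBS insert API calls with and without a NATS publication:

```python
import timeit

def measure(call, repeat=3, number=1000):
    """Best-of-N average seconds per invocation; taking the minimum of
    several runs reduces scheduler and GC noise."""
    return min(timeit.repeat(call, repeat=repeat, number=number)) / number

published = []

def insert_plain():
    # stand-in for a DBS insert API call without NATS
    pass

def insert_with_nats():
    # stand-in for the same call plus a NATS publication
    published.append("/a/b/c VALID")

baseline = measure(insert_plain)
with_nats = measure(insert_with_nats)
print("NATS overhead per call: %.2e s" % (with_nats - baseline))
```

In a real measurement the stubs would be replaced by calls to the deployed insert APIs, run once against a build with NATS enabled and once with it disabled.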

@yuyiguo
Member

yuyiguo commented Dec 4, 2019 via email

@vkuznet
Contributor Author

vkuznet commented Dec 5, 2019

Yuyi, I updated the PR with print statements (not exceptions) about possible NATS failures (if any), and all NATS calls are within try/except blocks so they will not affect the usual DBS API flow. From my side the PR is done; you can review it and decide when to merge. I would suggest that you first put it on your VM and run your usual unit tests with/without NATS; then we can add the code to cmsweb-testbed, where we'll have some data flow and can observe it in NATS.

If you need more info please feel free to let me know; I can easily chat and even show you a demo of how NATS behaves when I make changes in DBS. But I also want to understand the time scale you foresee for this issue.

@vkuznet
Contributor Author

vkuznet commented Dec 16, 2019

@yuyiguo, do you have time to review (and possibly merge) this PR before the holidays? Since we have monthly upgrade cycles, I would prefer to include this functionality in the next cmsweb upgrade so that we can properly test it and decide whether to enable it.

@vkuznet
Contributor Author

vkuznet commented Jan 14, 2020

@yuyiguo, can we include this PR in this cmsweb release upgrade cycle? I understand that you may still need to test the functionality before enabling NATS, but if we include the code without enabling it, it will be in place and we can avoid waiting for yet another cmsweb upgrade cycle.

@yuyiguo
Member

yuyiguo commented Jan 14, 2020

Hi Valentin,
I will look into the PRs and test them tomorrow. But if I test them on my VM, how does the message publishing work? Do you also get messages from my VMs?

Also, please disable the DBS DB connection from your testing server. I need that DB for DBS testing of the cmsweb-testbed because we have a new DBS server deployed.
Yuyi

@vkuznet
Contributor Author

vkuznet commented Jan 15, 2020

Publishing is automatic, no configuration is needed, and it is decentralized, so messages can come from any host.

I'm not sure about the DBS DB connection; which testing server are you referring to? I tested NATS using the k8s DBS deployment and didn't make any special setup for it, so I doubt I need to do anything.

Regarding testing messages: the server can post messages at any time; we only need to run a subscriber to consume them. But since authentication is involved, you need a login/password. Feel free to ping me tomorrow on Slack or send an email and I can show you how to run it; it is just one executable (no setup) that listens for messages.

@yuyiguo
Member

yuyiguo commented Jan 15, 2020

[VK] I'm not sure about DBS DB connection, which testing server you're referring? I tested NATS using k8s DBS deployment, I didn't make any special setup for it. Therefore I doubt I need to do anything.

Valentin,
You were running a k8s DBS deployment. Is this DBS server still running? If so, you need to stop it because it uses the same DBS database. We cannot test DBS while more than one server is running against the same DBS DB.

Thanks,
Yuyi

@vkuznet
Contributor Author

vkuznet commented Jan 15, 2020

Yes, my k8s server is running, and we also deployed a pre-production k8s cluster where DBS is running. On top of that we have DBS in the VM-based pre-production cluster. We may have other k8s deployments in the future. You should provide clear instructions on how DBS should be handled. My understanding is that unless we write something to DBS, the DBs are fine to use read-only.

In my k8s cluster I use the cms_dbs3_int_XXX owner/account. You should check with Muhammad which account he used in the k8s pre-production cluster he set up. You should start coordinating this more closely, as we need to test things in parallel and/or protect your codebase to avoid conflicts.

Meanwhile, please let me know when you'll address this PR.

@yuyiguo
Member

yuyiguo commented Jan 15, 2020

@vkuznet @muhammadimranfarooqi
Valentin,
the idea that "unless we write something to DBS, the read-only DB is fine to use" is wrong.
The only official use of this DBS DB account (cms_dbs3_int_XXX owner/account) is the DBS in the VM-based pre-production cluster. You should stop using it for your k8s server ASAP. I don't have time to maintain a list of DBS DBs. If you want one, you should ask for your own account. I allowed you to use it just to help you run a quick test to prove the concept; I did not mean that you could use it long term. I hope you understand.

I had a clear exchange with Lina about this DB account being used on the k8s pre-production cluster. I asked her to stop the server as soon as the quick test was done. But since the credential is in the system, it was used by Muhammad. I will create a new DBS account for the k8s pre-production cluster. Muhammad, could you please stop the k8s DBS pre-production servers for now? Otherwise I cannot do anything for the DBS validation tests.

Thanks,
Yuyi

@vkuznet
Contributor Author

vkuznet commented Jan 15, 2020

Yuyi, please use this ticket for NATS-related questions. All other issues should be redirected to the appropriate tickets (e.g. k8s and cmsweb related issues to GitLab). I don't want this thread to become a confusing mix of topics unrelated to the issue in question.

@yuyiguo
Member

yuyiguo commented Jan 15, 2020

Valentin,
the insertBulkBlock function:
https://github.com/dmwm/DBS/pull/617/files#diff-90376ab25a2a1f8676d4af50998bd00eR275
is the most-used function in DBS.

Basically, the DBS client inserts into DBS block by block. A new dataset is also inserted when the first block of the dataset is inserted. So what you need is to fire a NATS message when insertBulkBlock is called. Could you update your PR?

I put one comment on the print statement in the code. Other than that, the rest of the code looks OK to me.

Thanks,
Yuyi

@vkuznet
Contributor Author

vkuznet commented Jan 15, 2020

Please be clear about where to add the publication. Should I remove it from insertDataset and put it into insertBulkBlock? If the latter, how do I extract the dataset name from indata? Please provide the syntax of how your JSON data is structured in those APIs.

@yuyiguo
Member

yuyiguo commented Jan 15, 2020

You don't need to remove it from insertDataset; what you need is to add publication to insertBulkBlock. The input JSON is most of the time a huge data block, and DBS does not extract data from it at the web layer. I am afraid we would have to extract data twice from this data block, causing a big memory footprint and adding execution time to the API. We are already at the edge of timeouts.

Regarding the data structure, you may look at the example https://github.com/dmwm/DBS/blob/master/Client/utils/blockdump.dict.

@vkuznet
Contributor Author

vkuznet commented Jan 15, 2020

Yuyi, I added the code to insertBulkBlock, and there is no need to extract the data twice since the code already does the JSON decoding (i.e. your indata is a dict, so I only needed to get the appropriate key). Please have a look at the updated code. I extracted the dataset info based on your blockdump.dict structure.
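The extraction described above can be sketched as follows; the helper name is illustrative, and the assumed layout (a 'dataset' sub-dictionary inside the block dump, per Client/utils/blockdump.dict) is an assumption based on this thread, not the verified DBS schema:

```python
def dataset_from_block_dump(indata):
    """Pull the dataset name and access type out of an already-decoded
    bulk-block dict. Missing keys yield None rather than raising, so the
    lookup stays safe inside the API flow."""
    dataset = indata.get('dataset', {})  # assumed 'dataset' sub-dict
    return dataset.get('dataset'), dataset.get('dataset_access_type')
```

Because the dict is already decoded by the time insertBulkBlock runs, this lookup is O(1) and adds no second parse of the payload.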

@vkuznet
Contributor Author

vkuznet commented Jan 16, 2020

Yuyi, regarding the big memory footprint: if you change your unstructured JSON input to a structured one, all of your memory issues will be gone, since the latter can be parsed line by line instead of loading the entire dict into RAM. I already pointed out how this can be done in dmwm/DBS/issue/599

@yuyiguo
Member

yuyiguo commented Jan 16, 2020

I have an urgent matter to handle, so I may not get back to this until tomorrow or next week. I will look into your changes as soon as I have time and will let you know.

@yuyiguo
Member

yuyiguo commented Jan 16, 2020

I am going to merge this PR now. I understand that the data you want to broadcast is the dataset info; no blocks or files are involved.

@yuyiguo yuyiguo merged commit c5f7378 into dmwm:master Jan 16, 2020