
DBS API BulkBlock input size control #599

Open
yuyiguo opened this issue Apr 15, 2019 · 24 comments

Comments

@yuyiguo
Member

yuyiguo commented Apr 15, 2019

@bbockelm @amaltaro @belforte @vkuznet and @ALL
The DBS database has grown over time and the DBS servers are under increasing load. We no longer have the luxury of loading huge blocks. The most recent issue was a block with 500 files, sized about 200 MB, with 1,643,229 lumi sections. The block's data could not even be loaded in its entirety through the front end.

Now is the time to start looking into how DBS should enforce limits. What are reasonable limits? Limits on block size, on the number of files, and on the number of lumi sections in a block?

Currently, WMAgent uses a limit of 500 files per block in total, but the files vary a lot.
I am not sure what limit CRAB puts in place.

@vkuznet
Contributor

vkuznet commented Apr 15, 2019 via email

@belforte
Member

About the last point: the CRAB Publisher is currently configured for 100 files/block. There is also a limit of 100k on how many lumis can be in an input block. Since one job cannot cross block boundaries, this gives a maximum of 100k lumis in one output file if someone processes data which are "at the edge" and wants the output in DBS.

Is this the time to question whether and how we want to store the lumi list for nanoAOD? So far the above limitation results in "can read nanoAOD with CRAB, but only with splitByFile". But I suspect nobody has tried to store in DBS the output of "nanoAOD skimming", which results in even more lumis/file. In the end someone could have a search analysis producing one file with maybe 100 events, but all lumis in all CMS runs!

@belforte
Member

Valentin, protecting DBS from code which ran astray is good, but here we also need to define how we should use DBS so that it keeps working smoothly for us. Breaking large inputs into pieces may avoid FE timeouts, but do we really need to push those enormous JSON lists into Oracle?

@amaltaro
Contributor

Thanks for starting this discussion, Yuyi.
Another possibility would be to break this bulkBlock API into multiple/many different APIs, such that we can send less data in each call (of course, at the cost of a higher number of HTTP requests and more micromanagement on the client side).
It looks like we could have a different insert API for file_conf_list, file_parent_list and files (which contain the lumis); a hypothetical sketch is shown at the end of this comment.

We also have to come up with better thresholds for the clients (aka CRAB and WMAgent). Imposing these limitations will always be sub-optimal though, and people need to be aware that it won't come for free (like small blocks here and there).

BTW, that JSON is very large because we post the info to DBS with the keys/binds already formatted for the DAO, which saves quite some CPU cycles on the DBS server side; that is why the volume is large.
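
For illustration of the multiple-API idea above, a purely hypothetical sketch of what split client calls could look like; none of these insert* method names exist in the current DBS client API, and the payload keys simply mirror the pieces of the bulk-block structure named above:

# Hypothetical sketch only: none of these insert* methods exist in the DBS client today.
def insert_block_in_pieces(api, blockdump, chunk_size=50):
    """Send a block to DBS in several smaller calls instead of one bulk POST."""
    api.insertFileConfigs(blockdump['file_conf_list'])    # hypothetical endpoint
    api.insertFileParents(blockdump['file_parent_list'])  # hypothetical endpoint
    files = blockdump['files']                            # the files carry the lumi lists
    for i in range(0, len(files), chunk_size):
        api.insertFiles(files[i:i + chunk_size])          # hypothetical endpoint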

@belforte
Member

Alan, which kind of dataset was that huge block for? I do not see how 500 files could be a problem, but 1.6 M lumis in ASCII-formatted JSON really sounds like a lot to digest. Why do we store the lumi list in a relational DB? Is it only for answering the question "give me the file(s) in this dataset which contain lumi X from run R"? I do not see that question as useful for highly compacted data tiers.

@vkuznet
Contributor

vkuznet commented Apr 16, 2019 via email

@belforte
Member

CRAB uses the format from this example to fill the structure to be passed to insertBulkBlock
https://github.com/dmwm/DBS/blob/master/Client/tests/dbsclient_t/unittests/blockdump.dict
since we could not find any other documentation.
In that format, every file is described by a set of dictionaries, one of which is the lumi list of {'run': int, 'lumi': int} entries; a sketch of the shape is shown below.
We have not touched that code since it was written early in DBS3 history.
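
For reference, a minimal sketch of the shape being discussed, based on the blockdump.dict example linked above; the exact key names (e.g. run_num / lumi_section_num) are quoted from memory and should be checked against that file:

# Sketch of the per-file lumi list inside an insertBulkBlock payload
# (key names should be verified against the blockdump.dict example linked above).
blockdump = {
    "files": [
        {
            "logical_file_name": "/store/data/.../file.root",
            "event_count": 1000,
            "file_lumi_list": [
                # one entry per (run, lumi) pair; 1.6M of these is what made the payload huge
                {"run_num": 1, "lumi_section_num": 27},
                {"run_num": 1, "lumi_section_num": 29},
            ],
        },
    ],
    "file_conf_list": [],    # per-file configuration records
    "file_parent_list": [],  # file parentage records
}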

I would distinguish three things here:

  1. how to pass that list efficiently (ranges may not gain more than a few O(1) factors, since there are many gaps and lumis are scattered almost at random in the initial RAW files; see the small illustration after this list)
  2. how to store that information (i.e. which kind of query and/or retrieval we want)
  3. when to store it (i.e. for which files/datasets)
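
To illustrate point 1, a small standalone sketch (not part of any DBS or CRAB code) of compacting a lumi list into ranges; the gain is large for contiguous lumis but vanishes when they are scattered:

# Illustration only: collapse a list of lumi numbers into inclusive [start, end] ranges.
def to_ranges(lumis):
    """Return the sorted lumis as a list of inclusive [start, end] ranges."""
    ranges = []
    for lumi in sorted(lumis):
        if ranges and lumi == ranges[-1][1] + 1:
            ranges[-1][1] = lumi          # extend the current range
        else:
            ranges.append([lumi, lumi])   # start a new range
    return ranges

print(to_ranges([1, 2, 3, 4, 10]))   # [[1, 4], [10, 10]]   -> large gain
print(to_ranges([1, 5, 9, 13, 17]))  # one range per lumi   -> no gain for scattered lumis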

@yuyiguo
Member Author

yuyiguo commented Apr 16, 2019

Thanks all for the discussion here. For the huge block discussed above, you may find more details at https://its.cern.ch/jira/browse/CMSCOMPPR-5196.

Regarding the input data format, the link Stefano pointed out is the current input requirement. We designed this format because we want the data to be inserted into DBS without reformatting, but if this format is the problem, we can definitely redesign it to reduce the input volume. However, we have the 300-minute limit set on the server, and reformatting the data will increase the time and memory needed in DBS. What do we want to trade here? Even if we reduce the data volume passed into DBS, inserting 1.6 million lumis into DBS is still a big challenge. So my idea is to find a balance.

@yuyiguo
Member Author

yuyiguo commented Apr 16, 2019

Breaking the bulk block insertion into multiple APIs is what DBS2 did. Everyone who experienced DBS2 knows what the problems were; I will not discuss them here. I do not think we should go back down that route unless we really want to redesign DBS for DBS4.

@belforte
Member

belforte commented Apr 16, 2019 via email

@belforte
Member

And clearly for a GenSim dataset there is absolutely no reason to be prepared to answer "give me the file which contains lumi number X". Why do we push that list into an Oracle table? Masochism?

@vkuznet
Contributor

vkuznet commented Apr 16, 2019 via email

@belforte
Member

@vkuznet clearly a leaner protocol will help. Maybe such a change can be kept inside the current DBS client API to avoid changes to WMA/CRAB?
E.g. flat lists are surely efficient, but they are error-prone when handed to naive code writers (like me), while a well-coded and validated method can take the verbose structure and compress it as well as possible.
Why not start with insertBulkBlock returning an error when it thinks the input is too large (see the sketch below)?
The limits can then be relaxed once it is able to reduce the input to a more compact structure and evaluate that.

OTOH I hope we can also make some progress on what exactly we need from DBS. Even if we could store 10M lumis for one block, do we really want to do it?
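
A minimal sketch of the kind of server-side guard suggested above, assuming a CherryPy-based service like DBS; the threshold values and the 'file_lumi_list' key are assumptions for illustration, not current DBS behaviour:

# Sketch only: reject an insertBulkBlock payload that exceeds illustrative limits.
import cherrypy

MAX_FILES_PER_BLOCK = 500       # assumed threshold, for illustration
MAX_LUMIS_PER_BLOCK = 1000000   # assumed threshold, for illustration

def check_bulkblock_size(blockdump):
    """Raise HTTP 413 if the bulk-block payload is larger than the configured limits."""
    files = blockdump.get('files', [])
    nlumis = sum(len(f.get('file_lumi_list', [])) for f in files)
    if len(files) > MAX_FILES_PER_BLOCK or nlumis > MAX_LUMIS_PER_BLOCK:
        raise cherrypy.HTTPError(
            413, 'bulk block too large: %d files, %d lumis' % (len(files), nlumis))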

@vkuznet
Contributor

vkuznet commented Apr 16, 2019 via email

@yuyiguo
Member Author

yuyiguo commented Apr 16, 2019 via email

@bbockelm
Contributor

Is it possible the issue is not the size of the lumi information but rather how we are loading it?

That is, any web server worth its salt should be able to easily handle a 200MB POST -- however, it's going to be extremely difficult to manage such a thing if all 200MB have to be buffered to memory at once! @vkuznet - does the frontend need to load the full POST before it can start proxying the request to the remote side?

A few thoughts:

  1. How many APIs (or API implementations) need "fixed"?
  2. Do we need to switch to a streaming JSON decoder/encoder? Is there a reason to render the whole structure in memory inside DBS?
  3. If we are treating the lumi information as opaque blobs, why not compress them and never fully decompress on the server side?

Looks like a little medicine in the implementation might be able to go a long way, especially with respect to a streaming JSON decoder.
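
As a sketch of the streaming-decoder idea in point 2, the third-party ijson package (not something DBS currently uses) can walk the 'files' array of a bulk-block payload one record at a time instead of materialising the whole 200 MB document in memory; the key names are assumed from the blockdump format:

# Sketch only: incrementally decode the 'files' array of a bulk-block JSON payload
# with ijson, so the full document never has to sit in memory at once.
import ijson

def iter_files(stream):
    """Yield one file record at a time from a blockdump JSON stream."""
    for file_rec in ijson.items(stream, 'files.item'):
        yield file_rec

with open('blockdump.json', 'rb') as stream:   # hypothetical input file
    for rec in iter_files(stream):
        print(rec.get('logical_file_name'), len(rec.get('file_lumi_list', [])))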

@vkuznet
Contributor

vkuznet commented Apr 18, 2019 via email

@belforte
Member

belforte commented Apr 18, 2019 via email

@belforte
Member

belforte commented Apr 18, 2019 via email

@vkuznet
Contributor

vkuznet commented Apr 18, 2019 via email

@vkuznet
Contributor

vkuznet commented Apr 19, 2019

Here is a fully working example of a jsonstreamer (save it as jsonstreamer.py):

#!/usr/bin/env python
import types   # needed for the GeneratorType check below
import cherrypy
import json
from json import JSONEncoder

def jsonstreamer(func):
    """JSON streamer decorator"""
    def wrapper (self, *args, **kwds):
        """Decorator wrapper"""
        cherrypy.response.headers['Content-Type'] = "application/json"
        func._cp_config = {'response.stream': True}
        data = func (self, *args, **kwds)
        yield '{"data": ['
        if  isinstance(data, dict):
            for chunk in JSONEncoder().iterencode(data):
                yield chunk
        elif  isinstance(data, list) or isinstance(data, types.GeneratorType):
            sep = ''
            for rec in data:
                if  sep:
                    yield sep
                for chunk in JSONEncoder().iterencode(rec):
                    yield chunk
                if  not sep:
                    sep = ', '
        else:
            msg = 'jsonstreamer, improper data type %s' % type(data)
            raise Exception(msg)
        yield ']}'
    return wrapper

@jsonstreamer
def test(data):
    return data

data = {"foo":1, "bla":[1,2,3,4,5]}
print('JSON dumps')
print(json.dumps(data))
print('JSON stream')
for chunk in test(data):
    print(chunk)

Now if you run it with python ./jsonstreamer.py you'll get the following output:

JSON dumps
{"foo": 1, "bla": [1, 2, 3, 4, 5]}
JSON stream
{"data": [
{
"foo"
:
1
,
"bla"
:
[1
, 2
, 3
, 4
, 5
]
}
]}

Now we only need to write the server side, which will read the chunks and then compose the JSON object.
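
A minimal sketch of that server side, assuming it receives the chunks yielded by the jsonstreamer above: it just accumulates them in a buffer and decodes the result (a real server would read from the request body and could bound the buffer size):

# Sketch only: collect the chunks of a JSON stream and decode them back into an object.
import json
from io import StringIO

def decode_stream(chunks):
    """Accumulate stream chunks in a buffer and decode the resulting JSON document."""
    buf = StringIO()
    for chunk in chunks:
        buf.write(chunk)
    return json.loads(buf.getvalue())

# e.g. with chunks like the ones produced by the jsonstreamer above:
chunks = ['{"data": [', '{"foo": 1, "bla": [1, 2, 3, 4, 5]}', ']}']
print(decode_stream(chunks))   # {'data': [{'foo': 1, 'bla': [1, 2, 3, 4, 5]}]}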

@vkuznet
Contributor

vkuznet commented Apr 19, 2019

You can check it out with a more sophisticated nested Python structure, e.g.

rdict = {"fl":[1,2,3], 'name': 'bla'}
data = {"foo":1, "nested":[rdict for _ in range(10)]}
print('JSON dumps')
print(json.dumps(data))
print('JSON stream')
for chunk in test(data):
    print(chunk)

but I will not paste the output of this since it is kind of big.

@vkuznet
Contributor

vkuznet commented Apr 19, 2019

And now I have completed the full example; you can see it here: https://gist.github.com/vkuznet/e90b5a7cc92005df7d33877abde3206f

It provides the following:

  • jsonstreamer decorator
  • test function which uses this decorator
  • example based on StringIO to hold json stream
  • decoder to read json stream
  • function to measure memory usage of objects

If you run the code you'll get the following output:

JSON dumps
{"foo": 1, "bla": [1, 2, 3, 4, 5]}
size: 592
JSON stream
{"foo": 1, "bla": [1, 2, 3, 4, 5]}
size: 71
decoded output
{"foo": 1, "bla": [1, 2, 3, 4, 5]}
size: 908

So even in this basic example the original dict {"foo": 1, "bla": [1, 2, 3, 4, 5]} consumes 592 bytes, its JSON stream representation consumes only 71 bytes, while the decoded object consumes 908 bytes. As you can see, the JSON stream is 8x smaller than the original object and 12x smaller than the decoded one. You may ask the relevant question of why the decoded object is larger than the original one. The answer is related to the way Python allocates memory (in short, it allocates more than necessary). Feel free to use more sophisticated/realistic DBS dicts to see the numbers.
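
For anyone who wants to reproduce this kind of comparison, a rough sketch of one way to measure the sizes (the gist has its own measurement function, which may differ in detail; the exact numbers also depend on the Python version):

# Sketch only: approximate the memory footprint of an object versus its JSON string.
import json
import sys

def total_size(obj):
    """Recursively sum sys.getsizeof over an object and its contents."""
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k) + total_size(v) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set)):
        size += sum(total_size(item) for item in obj)
    return size

data = {"foo": 1, "bla": [1, 2, 3, 4, 5]}
print(total_size(data))                  # in-memory dict footprint
print(sys.getsizeof(json.dumps(data)))   # footprint of the serialized string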

@vkuznet
Contributor

vkuznet commented Jan 16, 2020

The corresponding PR which provides support for different input formats can be found here: #618
