This project is unwieldy:

- the `update-databanks` script is difficult to understand and poorly commented;
- it uses whynot, which by default is configured to update the production database, so testing is doubly difficult.

I propose we rewrite the process of updating and generating the databanks to solve the issues above.

This requires a lot of thought to determine which responsibilities belong in this process and which can be managed by separate services. For example, should a generated databank (e.g. hssp) be a separate service that watches for changes in its dependencies or checks to see what's missing?
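As a rough illustration of the "checks to see what's missing" option, here is a minimal Python sketch of such a check. It assumes a hypothetical flat layout with one source file per entry (e.g. `<id>.cif`) and one generated file per entry (e.g. `<id>.hssp`); the function name, directories and suffixes are illustrative only, not the real databank structure. It applies the same rule Make uses: a target is rebuilt if it is missing or older than its prerequisite.

```python
from pathlib import Path


def stale_entries(source_dir: Path, target_dir: Path,
                  source_suffix: str = ".cif",
                  target_suffix: str = ".hssp") -> list[str]:
    """Return the IDs whose generated file is missing or out of date.

    Hypothetical layout: source_dir holds <id><source_suffix> files and
    target_dir holds the generated <id><target_suffix> files.
    """
    stale = []
    for src in sorted(source_dir.glob(f"*{source_suffix}")):
        # Strip the full suffix so multi-part suffixes like ".cif.gz" work too.
        entry_id = src.name[: -len(source_suffix)]
        dst = target_dir / f"{entry_id}{target_suffix}"
        # Make-style rule: rebuild if the output is absent or older than its input.
        if not dst.exists() or dst.stat().st_mtime < src.stat().st_mtime:
            stale.append(entry_id)
    return stale
```

A separate service could run a check like this periodically (or on a file-system event) and queue only the returned IDs for regeneration, which keeps each generated databank responsible for its own bookkeeping.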
I've started https://github.com/cmbi/databanks2 since this issue was opened, but recently I've been wondering whether it's the right approach. Like this repository, it uses Makefiles (albeit in a much better way).
The problem is that it offers no API, which would help solve issues like cmbi/mrs#44 and #3.
The current databanks scripts do not scale at all. Distributed storage and processing platforms offer a much better way to process the PDB and mmCIF files to create the other databanks. The problem is that all we have is a couple of large Supermicro machines rather than many commodity servers, and the network speed is only 100 Mbps. The market leader in distributed processing is Hadoop with HDFS; however, HDFS works best with a smaller number of large files, whereas the databanks consist of many small files.
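To put the 100 Mbps figure in perspective, here is a back-of-the-envelope estimate of how long it takes just to copy data between machines. This is a minimal Python sketch that assumes the ideal line rate (no protocol overhead) and decimal units; the 100 GB input size is purely hypothetical, not a measured size of any databank.

```python
def transfer_time_seconds(size_gb: float, link_mbps: float = 100.0) -> float:
    """Idealised time in seconds to move size_gb gigabytes over a link_mbps link.

    Assumes decimal units (1 GB = 10**9 bytes, 1 Mbps = 10**6 bit/s) and full
    line rate with no protocol overhead, so real transfers will be slower.
    """
    bits = size_gb * 1e9 * 8
    return bits / (link_mbps * 1e6)


if __name__ == "__main__":
    # Hypothetical 100 GB of input files; not a measured databank size.
    seconds = transfer_time_seconds(100)
    print(f"{seconds:.0f} s (~{seconds / 3600:.1f} h)")  # prints "8000 s (~2.2 h)"
```

Under these assumptions every full copy of the hypothetical 100 GB costs about 2.2 hours at 100 Mbps, versus 800 s on a 1 Gbps link, which is why shuffling data between a few nodes over the current network eats into the benefit of distributing the work.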
I'm in favour of moving to a distributed platform like Hadoop.