Hello, thank you for your great work.

Before you fully released the training data, I studied your prediction code and noticed that when the pretrained Boltz model is used for prediction and no precomputed MSA results exist locally, it calls the MSA server specified by `msa_server_url`. When a single prediction contains multiple protein chains, it calls the `run_mmseqs2` function with `use_pairing` set to `True`.
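To make sure I am reading the control flow correctly, here is a minimal sketch of my understanding. The function `plan_msa_request` and its arguments are hypothetical names I made up for illustration; this is not the actual Boltz code.

```python
from typing import Dict, List, Union


def plan_msa_request(
    chains: Dict[str, str],
    precomputed: Dict[str, str],
) -> Dict[str, Union[List[str], bool]]:
    """Hypothetical helper: decide which chains need a server MSA.

    chains maps chain id -> protein sequence.
    precomputed maps chain id -> path of a local MSA file, if one exists.
    """
    # Only chains without a local MSA are sent to the MSA server.
    to_generate = sorted(cid for cid in chains if cid not in precomputed)
    # Assumption: pairing is only requested when more than one chain is queried.
    return {"chain_ids": to_generate, "use_pairing": len(to_generate) > 1}


multi = plan_msa_request({"A": "MKV", "B": "MRV"}, precomputed={})
single = plan_msa_request({"A": "MKV"}, precomputed={})
```

Under this reading, `multi` requests pairing for chains A and B, while `single` does not.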
I'd like to ask about the results of `run_mmseqs2` with `use_pairing=True`. It seems that the results for each chain are paired with those of other chains (if they share the same key; see the key definition at line 215 of `boltz/src/boltz/main.py` in commit 9d88b09), and that these are then combined with the unpaired results. For each entity, the results are written to a separate CSV file. When the CSV results are parsed, there is a deduplication step, so each row of a chain's MSA is either paired or unpaired, with no overlap: sequences in the unpaired results are removed if they already appear in the paired results. From the paper, it seems that taxonomy information is used to pair the MSA results. Is the purpose of this step to obtain paired MSAs?
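To make the deduplication I am describing concrete, here is a minimal sketch of how I understand it. The function `merge_msa` and the toy sequences are hypothetical and only illustrate my reading; the real parsing code may differ.

```python
from typing import List, Tuple


def merge_msa(paired: List[str], unpaired: List[str]) -> List[Tuple[str, bool]]:
    """Hypothetical merge for one chain: returns (sequence, is_paired) rows.

    Paired rows are kept as they are so that row order stays aligned across
    chains. Unpaired rows are appended only if the sequence has not been
    seen yet, either in the paired rows or earlier in the unpaired rows.
    """
    rows = [(seq, True) for seq in paired]
    seen = set(paired)
    for seq in unpaired:
        if seq not in seen:
            seen.add(seq)
            rows.append((seq, False))
    return rows


# Toy example: "MKV" is already paired, so its unpaired copy is dropped,
# and the repeated "MQV" is kept only once.
rows = merge_msa(paired=["MKV", "MRV"], unpaired=["MKV", "MQV", "MQV"])
```

If this reading is right, every sequence in the final per-chain MSA carries exactly one of the two labels.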
Before you release the raw data processing pipeline, I thought I could use the prediction pipeline's data processing to obtain MSAs for training. To speed up MSA retrieval, I set up a local ColabFold MSA server and replaced the API URL. I have already obtained some MSAs this way. Does this mean I no longer need steps 5 and 6 from your `training.md`, since I already have paired results from `run_mmseqs2` with `use_pairing=True`?
Thanks in advance!