Error in sEV_recognizer - output path file creation issue w/ multiple samples? #13
You have the right understanding of how to input files for SEVtras. However, this is a bug in SEVtras. The code "os.mkdir" cannot create a directory if the parent directory does not exist. So, if you use the argument of
By the way, the output "1 1" means that SEVtras found only one representative gene for sEV identification. So I recommend lowering the argument
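For readers who want to see the behaviour described above in isolation, here is a minimal sketch using only the Python standard library. It does not touch SEVtras itself; the nested path simply mirrors the one in the traceback, and the temporary directory is an assumption made so the example is safe to run anywhere.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Same shape as the failing path: out_path + '/tmp_out/' + sample,
    # where the sample name itself contains sub-directories.
    nested = os.path.join(
        root, "sev_results", "tmp_out",
        "raw_data", "sample1", "raw_feature_bc_matrix",
    )

    # os.mkdir creates only the last path component, so it fails
    # when any parent directory is missing.
    try:
        os.mkdir(nested)
        mkdir_failed = False
    except FileNotFoundError:
        mkdir_failed = True

    # os.makedirs creates every missing parent as well; exist_ok=True
    # makes a repeated call harmless.
    os.makedirs(nested, exist_ok=True)
    os.makedirs(nested, exist_ok=True)
    makedirs_ok = os.path.isdir(nested)

    print("os.mkdir failed:", mkdir_failed)
    print("os.makedirs succeeded:", makedirs_ok)
```

Whether the fix inside SEVtras takes this exact form is up to the maintainer; the sketch only illustrates why a sample name containing slashes triggers the error.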
Dear Dr. He, Thank you so much for the quick response! I will try lowering the alpha to 0.09. I had 3 other questions that came up in the meantime:
In my previous runs, it took over 1 hour to complete 1 of 6 samples with 32 CPUs (predefine_threads set to 30) and 128 GB of memory. After looking at some of your other threads, I see that this runtime might be unusually long (#10). I am running SEVtras on a Mac (via a Python script) on 10x Genomics scRNA-seq data, and the multiprocessing appears to be working. Any ideas on how to speed up processing?
I was wondering if SEVtras should be run on all samples, or if SEVtras should be split and run on samples by condition. For example, I am working with a dataset that contains 3 healthy and 3 diseased samples -- should I run SEVtras once on all 6, or opt to do 2 runs of 3 samples?
Finally, I was hoping to inquire about some of the downstream steps after sEV_recognizer is done running.
Thank you again for your time and all your package support!
Thank you for your meaningful questions to SEVtras.
Dear Dr. He, Thank you for your detailed responses! Your support is greatly appreciated.
I wanted to clarify 3_2 --> how would we go about adding cell type information to raw and unfiltered data? Most standard workflows seem to apply some sort of filtering (e.g., by number of genes) or regression (e.g., of mitochondrial genes) prior to the cell type annotation steps. Are you suggesting that we bypass this standard workflow when working with the adata_cell.raw object? Or are you instead suggesting that we should just run the standard preprocessing pipeline, but using the raw_matrix (rather than the filtered_matrix)?
On a different note, I proceeded with the analysis, setting 'Xraw = False' and using adata_cell (regular 10x filtered matrix, regular Seurat preprocessing steps), similar to the recommendations you made earlier. I was able to run ESAI_calculator successfully, but noticed that my ESAIumap plots look slightly different. For some reason, my ESAIumap visualization appears to have a high alpha for the yellow sEV cluster, but in the example plot you provided, the sEV cluster is removed (to better visualize the sEV cell type contributions).
If 3_2 refers to how to add cell type information, my suggestion is similar to the answer to 3_3. You can obtain the cell type information based on your own processing procedure, such as filtering or regressing. The input for SEVtras likewise depends on your own judgment, with the choice of
Hi 123chrisc, can I get your email? I want to get the code to generate adata_cell in ESAI_calculator. (My email is [email protected])
Hello! I appear to be running into issues with initializing the correct file paths/sample file. I am a relatively new python coder, so my apologies if I am providing insufficient information. Please let me know and I will do my best to correct.
For context, my directory is formatted as such:
raw_data
--sample1
----raw_feature_bc_matrix
------barcodes.tsv.gz
------features.tsv.gz
------matrix.tsv.gz
--sample2
----raw_feature_bc_matrix
------barcodes.tsv.gz
------features.tsv.gz
------matrix.tsv.gz
... --sample6
Within the raw_data folder, I have the sample_file.txt file (attached), containing the relative paths to my files. I initially tried entering the absolute file paths (as recommended by the documentation: 'Here, first parameter was the abosulte path of each sample row by row.'), but received the same error as below.
When I run the following code:
SEVtras.sEV_recognizer(input_path='./',sample_file='./raw_data/sample_file.txt', out_path='./sev_results', species='Homo',dir_origin=False,predefine_threads=30)
I receive the following output:
0 1
1 1
FileNotFoundError Traceback (most recent call last)
/tmp/ipykernel_90051/889495166.py in
----> 1 SEVtras.sEV_recognizer(input_path='./',sample_file='./raw_data/sample_file.txt', out_path='./sev_results', species='Homo',dir_origin=False,predefine_threads=30)
~/anaconda3/envs/SEVtras_env/lib/python3.7/site-packages/SEVtras/main.py in sEV_recognizer(sample_file, out_path, input_path, species, predefine_threads, get_only, score_t, search_UMI, alpha, dir_origin)
155 pass
156 else:
--> 157 os.mkdir(str(out_path) + '/tmp_out/' + sample)
158
159 adata.write(str(out_path) + '/tmp_out/' + sample + '/raw_' + sample + '.h5ad')
FileNotFoundError: [Errno 2] No such file or directory: './sev_results/tmp_out/raw_data/sample1/raw_feature_bc_matrix'
I am hoping for some guidance on how to tackle, or some increased clarity on the correct file naming/path procedures. Thank you!
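One possible stopgap, sketched below under stated assumptions, is to create the missing parent directories before calling sEV_recognizer. The helper `precreate_sample_parents` is hypothetical and not part of SEVtras. It assumes that sample_file.txt holds one sample path per line and that the output directory is built as `out_path + '/tmp_out/' + sample`, as the traceback suggests. Only the parents are created, so the `os.mkdir` call at line 157 can still create the final directory itself.

```python
import os


def precreate_sample_parents(sample_file, out_path):
    """Hypothetical helper (not part of SEVtras).

    For every non-empty line in sample_file, create the parent of
    out_path/tmp_out/<sample> so a later os.mkdir on that path works.
    Returns the list of parent directories that were ensured.
    """
    parents = []
    with open(sample_file) as handle:
        for line in handle:
            sample = line.strip()
            if not sample:
                continue  # skip blank lines
            # Mirror the string concatenation seen in the traceback.
            target = os.path.normpath(str(out_path) + "/tmp_out/" + sample)
            parent = os.path.dirname(target)
            os.makedirs(parent, exist_ok=True)
            parents.append(parent)
    return parents
```

With the layout above, this would be called as `precreate_sample_parents('./raw_data/sample_file.txt', './sev_results')` before the sEV_recognizer call. If SEVtras derives the sample name differently from what the traceback shows, the helper would need adjusting.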