Hi, here's a summary of the contents of the pull request:
Please let me know if this doesn't work for you. It's supposed to work, but on one system that I tested, I needed to force re-installation of miniconda in order to get BISCUT's Python dependencies to install correctly (via `reticulate::install_miniconda(force = T)`).
[...] breakpoint file directory as an input. Both functions can use parallelization when the cores argument is specified. I made some other optimizations to speed runtime. Besides printing output files, I also made the main function return peak information within R. Here is example usage:
The output directory and reference files (abslocs, genelocs) can now be user-specified, as can parameters that could previously be changed only by editing the source code (telcent_thres, amplitude_threshold, confidence interval, n bootstraps). The previously hardcoded data and parameter values are used as defaults when the user does not specify them.
The code suppressed many errors, which could keep the user from learning about potential problems. I removed most of this suppression and resolved the resulting errors and warnings. There are probably still some that could be triggered by situations or system configurations that I didn't test.
Questions/issues
There appeared to be an error in the chromosome coordinates file (now at inst/extdata/SNP6_hg19_chromosome_locs_200605.txt). Every chromosome has q_start1 = centromere + offset except for chr15, where instead q_start1 = q_end1 (= 2409817111). It appears that the q_start1 value was accidentally copied from q_end1, unless there is something I don't understand about the format. I changed the value to match the other chromosomes. Please let me know if it should be changed back.
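For reference, a quick scan along the following lines would flag this kind of anomaly (a q arm whose start equals its end). This is only a sketch, written in Python for brevity. It assumes a tab-delimited table with a header row; the `chromosome` column name and every value other than chr15's 2409817111 are made up for illustration, so the real file may need different column names.

```python
import csv
import io

# Hypothetical excerpt of a chromosome coordinates table. Only the column
# names q_start1/q_end1 and the chr15 value 2409817111 come from the PR text;
# the "chromosome" column name and all other numbers are invented.
SAMPLE = """chromosome\tq_start1\tq_end1
chr14\t100\t200
chr15\t2409817111\t2409817111
chr16\t300\t400
"""


def find_zero_length_q_arms(handle):
    """Return the chromosomes whose q_start1 equals q_end1."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row["chromosome"]
        for row in reader
        if int(row["q_start1"]) == int(row["q_end1"])
    ]


suspicious = find_zero_length_q_arms(io.StringIO(SAMPLE))
print(suspicious)
```

To run it against a real file, the same function could be passed an open file handle in place of the in-memory sample.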
FYI, `process_for_ggplot_jagged()` doesn't run, partly because it wants to load files ending in plotpeaks.txt. I couldn't find any code in the project that generates such files. Therefore, `do_biscut()` currently just calls `process_for_ggplot()`.
`plot_fig_2()` is runnable, but the plots look odd. I couldn't tell which plot(s) from the paper they are supposed to resemble. If there isn't a continuing need for the function, I could remove the `filter_BISCUT_arms()` call from the main function to avoid generating unused files.
Previously, there were hard-coded genes of interest, `c('CDKN2A','TERT','MYC','BAP1','TERC','TP53','ARID1A','EGFR','PPM1D')`, in BISCUT_peak_finding.R. I removed this declaration, as the user can instead call `filter_BISCUT_knowngenes()` (via an R wrapper function, `extract_gene_results()`) on any desired genes. However, since `process_for_ggplot_jagged()` isn't running, I am not sure of the use case for `filter_BISCUT_knowngenes()`.
Suggestions:
[...] `?do_biscut()` or `?make_breakpoint_files()`. Therefore, some of this information could be removed from the README. If you think it is a good idea, a compressed segment file, perhaps covering fewer samples than the PANCAN file, could be saved in the repository so that the README could give runnable code on an example segment file.
[...] `do_biscut()` is currently both returned within R and printed to output files, in somewhat different formats. If you are interested, I could further prune down output data and make all output file writing optional. I could also reduce the breakpoint file directory to a single breakpoint file, and make `do_biscut()` accept either the single file or a data.table/data.frame version of it.