
How to find the selected isoforms

Sina Majidian edited this page Feb 4, 2025 · 1 revision

To find information on selected isoforms, please follow these instructions:

FastOMA/Nextflow prints progress updates as it runs. If you saved this output to a file (here fastoma_log), you can use grep to extract the relevant lines:

 $ grep "infer_roothogs " fastoma_log | tail -3
[xx/yyy] process > infer_roothogs (1)             [100%] 1 of 1 ✔
[xx/yyy] process > infer_roothogs (1)             [100%] 1 of 1 ✔

Each of these lines starts with an id of the form xx/yyy. This id is the beginning of the path of the relevant directory inside the work folder, which is located in the same folder as FastOMA's output folder (fastoma_output_dir). Replace xx/yyy with the id from your own output and cd into that directory. The id is only a prefix of the full directory name, so press Tab to let the terminal autocomplete the rest: typing $ cd work/94/930a and pressing Tab results in cd work/94/930abc390e3b83a310bb5bdbcbdbd9

You should then be able to see the selected_isoforms folder:

$ ls selected_isoforms/
ARTHA_selected_isoforms.tsv  SOLCW_selected_isoforms.tsv  TS117_selected_isoforms.tsv

$ head -n 3 selected_isoforms/TS222_selected_isoforms.tsv
Sopim_TS222_01T000001.1 Sopim_TS222_01T000001.1
Sopim_TS222_01T000002.1 Sopim_TS222_01T000002.1
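
The log lookup and Tab completion described above can also be scripted. The following is a minimal sketch, not part of FastOMA: the function names roothogs_hash and roothogs_workdir are hypothetical, and it assumes that the id in the log is a lowercase hexadecimal prefix of the form xx/yyy, as in the example output above.

```shell
# Print the id (e.g. 94/930abc) found on the last infer_roothogs line of a saved log.
# Assumption: the id is two hex characters, a slash, then more hex characters.
roothogs_hash() {
    grep "infer_roothogs" "$1" | tail -n 1 | grep -oE '[0-9a-f]{2}/[0-9a-f]+' | head -n 1
}

# Expand that id to the full task directory under the work folder.
# $1 = saved log file, $2 = work folder (defaults to ./work).
roothogs_workdir() {
    prefix=$(roothogs_hash "$1")
    [ -n "$prefix" ] || return 1
    # The glob completes the directory name, as Tab completion would.
    for d in "${2:-work}/$prefix"*; do
        if [ -d "$d" ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1
}
```

For example, `cd "$(roothogs_workdir fastoma_log work)"` followed by `ls selected_isoforms/` would reproduce the manual steps, provided the log and the work folder are in the current directory.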

If you don't have the FastOMA log to find the relevant directory, you can run find inside the work folder:

find . -name .command.sh  | xargs grep "fastoma-infer-roothogs"

The output of this command shows the directory inside work that contains selected_isoforms. There might be several hits, in which case you can run ls on each one and check which of them contains selected_isoforms.
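
The manual check of each hit can be folded into the search itself. This is a sketch under the same assumptions as the find command above (each task directory has a .command.sh that mentions fastoma-infer-roothogs); the function name find_isoform_dirs is hypothetical.

```shell
# List task directories under the work folder whose .command.sh mentions
# fastoma-infer-roothogs AND which contain a selected_isoforms folder.
# $1 = work folder (defaults to ./work).
find_isoform_dirs() {
    find "${1:-work}" -name .command.sh -exec grep -l "fastoma-infer-roothogs" {} + |
    while IFS= read -r script; do
        dir=$(dirname "$script")
        if [ -d "$dir/selected_isoforms" ]; then
            printf '%s\n' "$dir"
        fi
    done
}
```

Running `find_isoform_dirs work` from the folder that holds work prints one directory per line. If more than one line appears, the work folder probably holds several runs.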

Regarding the number of gained genes, we use the pyham package for phylostratigraphy. The count of gained genes also includes all genes that were not mapped to any group, as well as isoforms that were not selected. Note that this only affects the counts at the extant species level; the counts at ancestral levels (internal nodes) are accurate.
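
To get a rough idea of how many non-selected isoforms a species contributes, the per-species TSV files can be summarised. This is a sketch that rests on an assumption not stated on this page: that the first column lists all isoforms of a gene separated by semicolons and the second column holds the selected one. Check a few lines of your own files before relying on it. The function name count_unselected is hypothetical.

```shell
# Print "<genes><TAB><non-selected isoforms>" for one *_selected_isoforms.tsv file.
# Assumption: column 1 = all isoforms of a gene joined by ';', column 2 = selected isoform.
count_unselected() {
    awk '
        NF >= 2 {
            n = split($1, iso, ";")   # number of isoforms listed for this gene
            genes++
            unselected += n - 1       # every isoform except the selected one
        }
        END { printf "%d\t%d\n", genes, unselected }
    ' "$1"
}
```

For example, `count_unselected selected_isoforms/ARTHA_selected_isoforms.tsv` prints two numbers for that species. If the assumption about the columns holds, the second number is the part of the extant-level gained-gene count that comes from non-selected isoforms rather than from unmapped genes.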