API documentation

When the Python interactive console is entered, a set of modules and variables is loaded automatically for data access. The loaded variables are as follows:

  1. cluster_viewing, module cluster_viewing
  2. cluster_enrichment, module cluster_enrichment
  3. cluster_export, module cluster_export
  4. ses, instance of Session
  5. clu, instance of Clusters
  6. iid, instance of Index
  7. ide, instance of IdentificationLUT
  8. rks, instance of RankedSpectra

This documentation covers only frequently used methods and properties; some internal components are not documented.

Clusters are stored as Graph objects from graph_tool, which are mentioned frequently below. To understand or manipulate them in depth, see the graph_tool documentation.

cluster_viewing

draw_cluster_interactive(graph, force_draw_edge=False)

This function is equivalent to entering a cluster ID in the cluster viewer, except that the input is a graph object instead of an ID. It draws the cluster in an interactive window. If the supplied cluster has more than 10000 edges, edges are not drawn, to improve performance. To override this behaviour, set force_draw_edge to True.
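The edge-drawing rule above can be summarized as a small predicate. This is only a sketch of the documented behaviour; should_draw_edges and EDGE_DRAW_LIMIT are hypothetical names, not part of cluster_viewing.

```python
# Hypothetical names; this only mirrors the documented rule, not the module's code.
EDGE_DRAW_LIMIT = 10_000


def should_draw_edges(num_of_edges: int, force_draw_edge: bool = False) -> bool:
    """Return True if edges would be drawn for a cluster with num_of_edges edges."""
    # Edges are skipped only when the cluster has MORE edges than the limit
    # and no override is given.
    return force_draw_edge or num_of_edges <= EDGE_DRAW_LIMIT


print(should_draw_edges(12_000))                        # False
print(should_draw_edges(12_000, force_draw_edge=True))  # True
```

In the console, forcing edges for a large cluster would look like cluster_viewing.draw_cluster_interactive(clu.get_cluster(1301).graph, force_draw_edge=True).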

draw_cluster_save(graph, path=Path.cwd(), resolution=(4000, 4000))

This function is equivalent to the save command in the cluster viewer, except that the input is a graph object instead of an ID. It draws the cluster to the file specified by path. resolution is a tuple of horizontal and vertical pixel counts that sets the resolution of the image.

cluster_enrichment

After clustering, clusters do not contain identification information for the spectra within them. Cluster enrichment looks up this information and stores it in the clusters. When a cluster is visualized, it is enriched first so that nodes can be colored by identification. Since enrichment involves random access to the identification database, it takes a considerable amount of time when applied to many clusters.

enrich_clusters(update=False, num_of_threads=os.cpu_count())

Enrich all clusters. If update is False, clusters that are already enriched are skipped. This method uses multiprocessing to improve performance. The number of processes is specified by num_of_threads and defaults to the number of logical CPUs.

enrich_one(cluster_id, update=False)

Enrich one cluster. If the supplied cluster is already enriched and update is False, nothing is computed.

cluster_export

export_clusters(file, num_of_threads=os.cpu_count())

Export all clusters to the text file specified by file, which expects a Path object. This method uses multiprocessing to improve performance. The number of processes is specified by num_of_threads and defaults to the number of logical CPUs.

export_one(file, cluster_id)

Export one cluster to the text file specified by file, which expects a Path object. If the file already exists, the cluster is appended to it.

Session

name

Session name, specified by the --name= argument.

creation_time

The time when this session was created.

config

Configuration used in this clustering session, specified by --config=. Printing it with print() lists all parameters.

ms_exp_files

Array of Path objects pointing to all MS experiment files.

iden_files

Array of Path objects pointing to identification files.

Clusters

When iterated, it fetches rows one by one from the SQLite database and yields Cluster objects. To limit RAM consumption, the graph property of the yielded objects is lazy: the graph data is fetched from the database only when the property is accessed.
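The lazy-loading pattern described above can be sketched with sqlite3 and functools.cached_property. The table layout, LazyCluster and iter_clusters are hypothetical stand-ins for illustration, not the real Clusters schema; a plain dict plays the role of the pickled graph_tool Graph.

```python
import pickle
import sqlite3
from functools import cached_property


class LazyCluster:
    """Hypothetical stand-in for Cluster: metadata is eager, the graph is lazy."""

    def __init__(self, conn, cluster_id, num_of_nodes):
        self._conn = conn
        self.cluster_id = cluster_id
        self.num_of_nodes = num_of_nodes

    @cached_property
    def graph(self):
        # Runs only on first access; the unpickled result is cached on the instance.
        row = self._conn.execute(
            "SELECT graph FROM clusters WHERE id = ?", (self.cluster_id,)
        ).fetchone()
        return pickle.loads(row[0])


def iter_clusters(conn):
    # Fetch metadata row by row; the (possibly large) graph column is not read here.
    for cluster_id, num_of_nodes in conn.execute(
        "SELECT id, num_of_nodes FROM clusters ORDER BY id"
    ):
        yield LazyCluster(conn, cluster_id, num_of_nodes)


# Demo with an in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clusters (id INTEGER PRIMARY KEY, num_of_nodes INTEGER, graph BLOB)")
conn.executemany(
    "INSERT INTO clusters VALUES (?, ?, ?)",
    [
        (1, 2, pickle.dumps({"edges": [(0, 1)]})),
        (2, 3, pickle.dumps({"edges": [(0, 1), (1, 2)]})),
    ],
)
largest = max(iter_clusters(conn), key=lambda c: c.num_of_nodes)
print(largest.cluster_id, largest.graph["edges"])  # 2 [(0, 1), (1, 2)]
```

Because only the small metadata columns are read during iteration, something like max(clu, key=lambda c: c.num_of_nodes) can scan every cluster without unpickling any graph.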

file_path

File path to the SQLite database storing all clusters.

get_cluster(cluster_id)

Return the Cluster object with the given ID.

exists(cluster_id)

Check whether a cluster with the given ID exists.

Cluster

num_of_nodes

Number of nodes in this cluster.

num_of_edges

Number of edges in this cluster.

num_of_identifications

Number of identifications in this cluster. Not None only if the cluster is enriched.

identifications

List of string representations of all identifications found in the cluster. Not None only if the cluster is enriched.

major_identification

The identification shared by the most identified spectra. Not None only if the cluster is enriched.

identified_ratio

The ratio of identified spectra in this cluster; 1 if every spectrum has an identification in the imported identification files. Not None only if the cluster is enriched.
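To make identified_ratio and major_identification concrete, here is a minimal sketch that computes both from a list of per-spectrum identifications, where None means "not identified". The function names and the tie-breaking rule (first identification seen wins) are assumptions, not the project's implementation.

```python
from collections import Counter
from typing import Optional, Sequence


def identified_ratio(idens: Sequence[Optional[str]]) -> float:
    """Fraction of spectra with a non-None identification (cluster assumed non-empty)."""
    return sum(i is not None for i in idens) / len(idens)


def major_identification(idens: Sequence[Optional[str]]) -> Optional[str]:
    """Identification shared by the most identified spectra; ties go to the first seen."""
    counts = Counter(i for i in idens if i is not None)
    # Counter.most_common orders equal counts by first insertion.
    return counts.most_common(1)[0][0] if counts else None


idens = ["A/2", "A/2", None, "B/2"]
print(identified_ratio(idens))      # 0.75
print(major_identification(idens))  # A/2
```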

average_precursor_mass

Average precursor mass of the spectra in this cluster.

graph

Graph object from graph_tool. This property is lazy: the pickled object is loaded when it is first accessed. The graph contains the following property maps:

  1. Vertex property map iid, internal IDs of spectra.
  2. Edge property map dps, dot products of the spectrum pairs connected by edges.

If the graph is enriched, it contains these additional property maps:

  1. Vertex property map ide, identifications of spectra: an integer key used to look up the string value stored in the graph property map ide; -1 if a spectrum has no identification.
  2. Vertex property map prb, probability of identification.
  3. Graph property map ide, Python dictionary of all identifications keyed by integers.
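The two ide maps above form a simple lookup-table encoding. The sketch below shows that encoding in plain Python lists and dicts; in a real enriched graph the per-vertex keys would live in graph.vp['ide'] and the table in graph.gp['ide']. The function name and the key-assignment order (0, 1, 2, ... by first appearance) are assumptions.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def encode_identifications(
    vertex_idens: Sequence[Optional[str]],
) -> Tuple[List[int], Dict[int, str]]:
    """Return (per-vertex integer keys, {key: identification}); -1 marks no identification."""
    key_of: Dict[str, int] = {}
    vertex_keys: List[int] = []
    for iden in vertex_idens:
        if iden is None:
            vertex_keys.append(-1)
            continue
        # Assign keys in order of first appearance (an assumption for this sketch).
        vertex_keys.append(key_of.setdefault(iden, len(key_of)))
    return vertex_keys, {key: iden for iden, key in key_of.items()}


keys, table = encode_identifications(["X/2", None, "Y/3", "X/2"])
print(keys)   # [0, -1, 1, 0]
print(table)  # {0: 'X/2', 1: 'Y/3'}
```

Storing each distinct identification string once and giving vertices small integers keeps the per-vertex map compact, which is why finding vertices by identification means first looking up the key whose value matches.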

For details about property maps, see the graph_tool documentation.

Index

Index is an array of lightweight representations of spectra. The entry of a spectrum can be fetched with index_instance[internal_id].

Entry

Entry is a pointer to a spectrum.

get_file_path()

Get the path to the MS experiment file containing the spectrum. Return a Path object.

get_identification()

Search the identification database and get the identification of the spectrum. Return None if it is not identified.

get_spectrum()

Return a Spectrum object. This method reads the MS experiment file and extracts the peaks of the spectrum from it.

Spectrum

Spectrum contains the peak information of a spectrum.

mz

Array of MZ values of peaks.

intensity

Array of intensity values of peaks.

ranked

A boolean indicating whether this spectrum is rank-transformed.

clip(mz_range=None, copy=True)

Remove all peaks outside the specified MZ range. mz_range is a tuple of lower and upper limits; if it is None, the value in the session configuration is used. If copy is True, a new Spectrum object is returned. Otherwise, the removal is applied to the current instance, and the current instance is returned.
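A minimal NumPy sketch of the clipping step, operating on bare mz and intensity arrays rather than a Spectrum object. Treating both limits as inclusive is an assumption; the real method may differ at the boundaries.

```python
import numpy as np


def clip(mz: np.ndarray, intensity: np.ndarray, mz_range: tuple):
    """Keep only peaks with lower <= mz <= upper (inclusive bounds are an assumption)."""
    lower, upper = mz_range
    keep = (mz >= lower) & (mz <= upper)
    # Boolean indexing always returns new arrays, i.e. the copy=True behaviour.
    return mz[keep], intensity[keep]


mz = np.array([50.0, 150.0, 250.0, 2100.0])
intensity = np.array([1.0, 2.0, 3.0, 4.0])
clipped_mz, clipped_intensity = clip(mz, intensity, (100.0, 2000.0))
print(clipped_mz.tolist())  # [150.0, 250.0]
```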

remove_precursor(removal_range=None, true_precursor_mass=None, copy=True)

Remove all peaks near the precursor mass. removal_range is a tuple of lower and upper offsets relative to the precursor mass. For example, if (-20, 20) is used and the precursor mass is 400, peaks within 380-420 are removed. true_precursor_mass is a boolean that tells the method whether the precursor mass should be divided by the precursor charge. If removal_range or true_precursor_mass is None, the value in the session configuration is used. If copy is True, a new Spectrum object is returned. Otherwise, the removal is applied to the current instance, and the current instance is returned.
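The worked example above ((-20, 20) around 400 removes 380-420) can be reproduced with a short NumPy sketch. It takes the precursor value as already computed, so the true_precursor_mass division by charge is left out; inclusive window edges are an assumption.

```python
import numpy as np


def remove_precursor(mz, intensity, precursor_mass, removal_range=(-20.0, 20.0)):
    """Drop peaks with precursor_mass + lower <= mz <= precursor_mass + upper."""
    lower = precursor_mass + removal_range[0]
    upper = precursor_mass + removal_range[1]
    # Keep only peaks strictly outside the removal window.
    keep = (mz < lower) | (mz > upper)
    return mz[keep], intensity[keep]


mz = np.array([300.0, 379.9, 380.0, 400.0, 420.0, 420.1])
intensity = np.ones_like(mz)
kept_mz, _ = remove_precursor(mz, intensity, precursor_mass=400.0)
print(kept_mz.tolist())  # [300.0, 379.9, 420.1]
```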

rank_transform(num_of_peaks=None, bins_per_th=None, copy=True, normalize=True)

Rank-transform the spectrum. num_of_peaks specifies how many of the highest peaks are kept. bins_per_th sets the bin resolution as the number of bins per Thomson on the MZ axis; for example, the bin size is 0.25 Th if bins_per_th is 4. normalize decides whether the ranks of peaks are normalized to become intensities. If num_of_peaks or bins_per_th is None, the value in the session configuration is used. If copy is True, a new Spectrum object is returned. Otherwise, the transformation is applied to the current instance, and the current instance is returned.
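One plausible reading of the rank transformation, sketched in NumPy. Everything beyond "keep the highest peaks, bin at bins_per_th, turn ranks into intensities" is an assumption: the strongest kept peak gets rank n and the weakest 1, only the most intense peak per bin survives, and normalize scales the ranks to a unit vector. The project's actual scheme may differ.

```python
import numpy as np


def rank_transform(mz, intensity, num_of_peaks=50, bins_per_th=4, normalize=True):
    """Return (sorted integer MZ bins, rank values) for the strongest peaks (a sketch)."""
    mz = np.asarray(mz, dtype=np.float64)
    intensity = np.asarray(intensity, dtype=np.float64)
    order = np.argsort(intensity, kind="stable")[::-1]  # peaks by descending intensity
    # bins_per_th = 4 gives 0.25 Th wide bins.
    bins = np.floor(mz[order] * bins_per_th).astype(np.int64)
    # np.unique returns first-occurrence indices, i.e. the most intense peak per bin.
    _, first = np.unique(bins, return_index=True)
    kept = np.sort(first)[:num_of_peaks]  # back in descending-intensity order
    kept_bins = bins[kept]
    ranks = np.arange(len(kept), 0, -1, dtype=np.float64)  # strongest -> n, weakest -> 1
    if normalize:
        ranks /= np.linalg.norm(ranks)  # unit length (assumed normalization)
    by_bin = np.argsort(kept_bins)
    return kept_bins[by_bin], ranks[by_bin]


bins, ranks = rank_transform(
    [100.0, 100.1, 200.0, 300.0], [5.0, 9.0, 1.0, 3.0], num_of_peaks=2, normalize=False
)
print(bins.tolist(), ranks.tolist())  # [400, 1200] [2.0, 1.0]
```

Here 100.0 and 100.1 fall into the same 0.25 Th bin (400), so only the more intense one is kept before the top two peaks are ranked.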

ranked_dot_product(target)

Calculate the dot product of two rank-transformed spectra. target expects another Spectrum object. If either spectrum is not rank-transformed, None is returned.
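A sketch of a dot product between two binned, rank-transformed spectra: only bins present in both spectra contribute. It assumes each spectrum is given as unique integer bins plus matching values (as a rank transform would produce) and omits the "return None if not rank-transformed" check of the real method.

```python
import numpy as np


def ranked_dot_product(bins_a, values_a, bins_b, values_b):
    """Sum of value products over MZ bins shared by both spectra (bins assumed unique)."""
    _, idx_a, idx_b = np.intersect1d(bins_a, bins_b, assume_unique=True, return_indices=True)
    return float(np.dot(values_a[idx_a], values_b[idx_b]))


bins_a, vals_a = np.array([1, 2, 3]), np.array([1.0, 2.0, 3.0])
bins_b, vals_b = np.array([2, 3, 4]), np.array([10.0, 20.0, 30.0])
print(ranked_dot_product(bins_a, vals_a, bins_b, vals_b))  # 80.0
```

If both value vectors are normalized to unit length, the result lies between 0 and 1, with 1 for identical spectra.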

plot(against=None, x_limit=(0, 2000), verificative=True, show_iden=True, highlight=True)

Plot the spectrum alone or against another spectrum specified by against, which expects a Spectrum object. x_limit is a tuple of the lower and upper ends of the MZ range shown. verificative decides whether the spectrum is first processed according to the session configuration, which includes clipping, precursor peak removal and rank transformation. If show_iden is True, the identification database is searched, and if the spectrum has an identification, its string representation is printed in the plotted figure. If highlight is True and the spectrum has an identification, the spectrum is compared with a theoretical spectrum of the identification, and peaks matching ion peaks in the theoretical spectrum are colored by ion type.

IdentificationLUT

file_path

File path to the SQLite database storing all imported identifications.

get_identification(ms_exp_file_name, native_id)

Get the identification by the MS experiment file name specified by ms_exp_file_name and the scan number specified by native_id. ms_exp_file_name is the file name without the parent path and the extension. For example, if the path of a file is /path/to/file.mzXML, the supplied name should be file. Return None if no identification is found.
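Because ses.ms_exp_files holds Path objects, the expected name can be derived with pathlib's Path.stem, which drops the parent directories and the last extension. ms_exp_file_name is a hypothetical helper name used only for illustration.

```python
from pathlib import Path


def ms_exp_file_name(path) -> str:
    """Strip the parent directories and the last extension from a file path."""
    return Path(path).stem


print(ms_exp_file_name("/path/to/file.mzXML"))  # file
```

In the console this could be combined as ide.get_identification(ms_exp_file_name(ses.ms_exp_files[0]), native_id), where native_id is whatever scan number you are looking up. Note that stem removes only the last suffix, so a name such as run.raw.mzXML becomes run.raw.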

RankedSpectra

All rank-transformed spectra that were used to compute dot products. This is the data uploaded to the GPU.

mz

2D array. The first dimension indexes spectra and the second holds the MZ values of the peaks of each spectrum.

intensity

2D array. The first dimension indexes spectra and the second holds the intensities of the peaks of each spectrum.

num_of_peaks

An integer giving the number of peaks in each spectrum. All spectra must have the same number of peaks.

Examples

# Get cluster with ID 1301
cluster = clu.get_cluster(1301)

# Get vertices with identification n[33]MM[147]ENINAFQK[160]/2 (the cluster must be enriched)
graph = cluster.graph
search = 'n[33]MM[147]ENINAFQK[160]/2'
# gp['ide'] maps integer keys to identification strings; find the key whose value matches
key = next((k for k, v in graph.gp['ide'].items() if v == search), None)
vertices = [vtx for vtx in graph.vertices() if graph.vp['ide'][vtx] == key]

# Get the edge with lowest dot product
# iter_edges yields arrays of [source, target, dps]
edges_and_dp = graph.iter_edges([graph.ep['dps']])
source, target, _ = min(edges_and_dp, key=lambda e: e[-1])
edge = graph.edge(int(source), int(target))

# Plot the spectrum pair of the edge
s_internal_id = graph.vp['iid'][edge.source()]
t_internal_id = graph.vp['iid'][edge.target()]
s_entry = iid[s_internal_id]
t_entry = iid[t_internal_id]
s_spectrum = s_entry.get_spectrum()
t_spectrum = t_entry.get_spectrum()
s_spectrum.plot(t_spectrum)

# Get largest cluster
largest = max(clu, key=lambda c: c.num_of_nodes)

# Get clusters containing a specific identification (unenriched clusters are skipped)
search = 'n[33]VFVDDGLISLK[160]/2'
clusters = [c for c in clu if c.identifications is not None and search in c.identifications]