When you enter the Python interactive console, a set of modules and variables is loaded automatically for data access. The loaded variables are as follows:
- cluster_viewing, module cluster_viewing
- cluster_enrichment, module cluster_enrichment
- cluster_export, module cluster_export
- ses, instance of Session
- clu, instance of Clusters
- iid, instance of Index
- ide, instance of IdentificationLUT
- rks, instance of RankedSpectra
This documentation covers only the frequently used methods and properties; some internal components are not documented.
Clusters are stored as Graph objects from graph_tool, which are mentioned frequently in this documentation. To understand them better or manipulate them in depth, see the graph_tool documentation.
This function is equivalent to entering a cluster ID in the cluster viewer, except that the input graph is a graph object instead of an ID. It draws the cluster in an interactive window. If the supplied cluster has more than 10000 edges, the edges are not drawn, to improve performance. To override this behaviour, set force_draw_edge to True.
This function is equivalent to the save command in the cluster viewer, except that the input graph is a graph object instead of an ID. It draws the cluster to the file specified by path. resolution is a tuple of horizontal and vertical pixel counts that specifies the resolution of the image.
After clustering, clusters contain no information about the identifications of the spectra within them. Cluster enrichment adds this information and stores it inside the clusters. When a cluster is visualized, enrichment is performed first so that nodes can be colored by identification. Because enrichment requires random access to the identification database, it takes a considerable amount of time when applied to many clusters.
Enrich all clusters. Clusters that are already enriched are skipped if update is False. This method uses multiprocessing to improve performance. The number of processes is specified by num_of_threads; if it is not specified, all logical CPUs are used.
Enrich one cluster. If the supplied cluster is already enriched and update is False, no calculation is done.
Export all clusters to the text file specified by file, which expects a Path object. This method uses multiprocessing to improve performance. The number of processes is specified by num_of_threads; if it is not specified, all logical CPUs are used.
Export one cluster to the text file specified by file, which expects a Path object. If the file already exists, the cluster is appended to it.
Session name, specified by the --name= argument.
The time when this session was created.
Configuration used in this clustering session, specified by --config=. All parameters are shown when it is printed with print().
Array of Path objects pointing to all MS experiment files.
Array of Path objects pointing to identification files.
When iterated, it fetches the SQLite database row by row and yields Cluster objects. To prevent high RAM consumption, the graph property of the yielded objects is lazy: the graph data is fetched from the database only when the property is accessed.
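The lazy loading described above can be illustrated with a small, self-contained sketch. The class LazyCluster and the loader callable are hypothetical stand-ins and not the tool's actual Cluster implementation; the sketch only shows that the expensive load is deferred until the first access and then reused.

```python
from functools import cached_property


class LazyCluster:
    """Hypothetical stand-in for a cluster whose graph is loaded on demand."""

    def __init__(self, cluster_id, loader):
        self.id = cluster_id
        self._loader = loader  # assumed callable that fetches graph data by ID

    @cached_property
    def graph(self):
        # Runs only on first access; the result is then cached on the instance.
        return self._loader(self.id)


load_calls = []


def fake_loader(cluster_id):
    # Stands in for a database read; records each call so laziness is visible.
    load_calls.append(cluster_id)
    return {"cluster": cluster_id, "edges": []}


cluster = LazyCluster(7, fake_loader)
calls_before_access = len(load_calls)  # nothing has been loaded yet
first = cluster.graph                  # triggers the single load
second = cluster.graph                 # served from the cache
```

In practice this means that iterating over many clusters and reading only lightweight properties should stay cheap, while touching graph on every cluster pays the loading cost each time.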
File path to the SQLite database storing all clusters.
Returns a Cluster object.
Checks whether a cluster exists.
Number of nodes in this cluster.
Number of edges in this cluster.
Number of identifications in this cluster. Not None only if the cluster is enriched.
List of string representations of all identifications found in the cluster. Not None only if the cluster is enriched.
The identification shared by the most identified spectra. Not None only if the cluster is enriched.
The ratio of identified spectra. 1 if all spectra have an identification in the imported identification files. Not None only if the cluster is enriched.
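As an illustration of how the ratio and the most common identification relate to per-spectrum identifications, here is a minimal sketch on plain Python data. The function summarize_identifications is hypothetical, and the way the tool computes these values internally is an assumption; the sketch only borrows the documented convention that -1 marks a spectrum without identification.

```python
from collections import Counter


def summarize_identifications(vertex_keys, lookup):
    """Return (ratio of identified spectra, most common identification string).

    vertex_keys: one integer key per spectrum, -1 meaning "not identified".
    lookup: dictionary mapping integer keys to identification strings.
    """
    identified = [key for key in vertex_keys if key != -1]
    if not identified:
        return 0.0, None
    ratio = len(identified) / len(vertex_keys)
    major_key, _count = Counter(identified).most_common(1)[0]
    return ratio, lookup[major_key]


# Four spectra: two share identification 0, one has identification 1,
# and one is unidentified.
lookup = {0: 'n[33]MM[147]ENINAFQK[160]/2', 1: 'n[33]VFVDDGLISLK[160]/2'}
ratio, major = summarize_identifications([0, 0, 1, -1], lookup)
```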
Average value of precursor mass of spectra in this cluster.
Graph object from graph_tool. This property is lazy: the pickled object is loaded when it is accessed for the first time. The graph contains the property maps listed below:
- Vertex property map iid, internal ID of spectra.
- Edge property map dps, dot products of the spectrum pairs connected by edges.
If the graph is enriched, it contains additional property maps listed below:
- Vertex property map ide, identifications of spectra: an integer that is the key to the string value stored in the graph property map ide. -1 if a spectrum has no identification.
- Vertex property map prb, probability of identification.
- Graph property map ide, Python dictionary of all identifications keyed by integers.
For details about property maps, see the graph_tool documentation.
Index is an array of lightweight representations of spectra. The entry for a spectrum can be fetched with index_instance[internal_id].
Entry is a pointer to a spectrum.
Get the path to the MS experiment file containing the spectrum. Returns a Path object.
Search the identification database and get the identification of the spectrum. Returns None if the spectrum is not identified.
Returns a Spectrum object. This method reads the MS experiment file and extracts the peaks of the spectrum from it.
Spectrum contains the peak information of a spectrum.
Array of MZ values of peaks.
Array of intensity values of peaks.
A boolean indicating whether this spectrum is rank-transformed.
Remove all peaks that are outside the specified MZ range. mz_range is a tuple of the lower limit and the upper limit. If it is None, the method uses the value in the session configuration. Returns a new Spectrum object if copy is True. Otherwise, the removal is applied to the current instance, and the current instance is returned.
Remove all peaks near the precursor mass. removal_range is a tuple of offsets added to the precursor mass to obtain the lower and upper bounds. For example, if (-20, 20) is used and the precursor mass is 400, peaks within 380-420 will be removed. true_precursor_mass is a boolean that tells the method whether the precursor mass should be divided by the precursor charge. If removal_range or true_precursor_mass is None, the method uses the value in the session configuration. Returns a new Spectrum object if copy is True. Otherwise, the removal is applied to the current instance, and the current instance is returned.
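The range arithmetic in the example above can be sketched on plain lists as follows. The function remove_precursor_peaks is a hypothetical illustration, not the tool's method; treating the bounds as inclusive is an assumption, and the handling of true_precursor_mass is left out.

```python
def remove_precursor_peaks(mz, intensity, precursor_mass, removal_range):
    """Drop peaks whose m/z lies within the window around the precursor mass."""
    lower = precursor_mass + removal_range[0]
    upper = precursor_mass + removal_range[1]
    # Keep only peaks outside [lower, upper]; inclusive bounds are an assumption.
    kept = [(m, i) for m, i in zip(mz, intensity) if not (lower <= m <= upper)]
    kept_mz = [m for m, _ in kept]
    kept_intensity = [i for _, i in kept]
    return kept_mz, kept_intensity


mz = [150.0, 379.9, 380.0, 400.0, 420.0, 420.1, 600.0]
intensity = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kept_mz, kept_intensity = remove_precursor_peaks(mz, intensity, 400.0, (-20, 20))
```

With a precursor mass of 400 and a range of (-20, 20), the window is 380 to 420, so only the peaks at 150.0, 379.9, 420.1 and 600.0 remain.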
Rank-transform the spectrum. num_of_peaks specifies how many of the highest peaks are extracted. bins_per_th specifies the bin resolution as the number of bins per Thomson on the MZ axis. For example, the bin size is 0.25 Th if bins_per_th is 4. normalize decides whether the ranks of the peaks are normalized to become intensities. If num_of_peaks or bins_per_th is None, the method uses the value in the session configuration. Returns a new Spectrum object if copy is True. Otherwise, the transformation is applied to the current instance, and the current instance is returned.
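To make the parameters more concrete, here is a rough sketch of one way a rank transformation could work. It is not the tool's implementation: the function rank_transform, the rule that the highest peak receives the largest rank, the handling of peaks that share a bin, and the Euclidean normalization are all assumptions made for illustration.

```python
import math


def rank_transform(mz, intensity, num_of_peaks, bins_per_th, normalize=True):
    """Return a dict mapping bin index to rank value for the highest peaks."""
    # Keep the num_of_peaks most intense peaks (fewer if the spectrum is small).
    top = sorted(zip(mz, intensity), key=lambda p: p[1], reverse=True)[:num_of_peaks]
    ranked = {}
    for position, (m, _) in enumerate(top):
        bin_index = int(m * bins_per_th)  # each bin is 1 / bins_per_th Th wide
        rank = float(len(top) - position)  # assumed: highest peak, largest rank
        # Assumed: if two peaks share a bin, the higher rank is kept.
        ranked[bin_index] = max(ranked.get(bin_index, 0.0), rank)
    if normalize and ranked:
        # Assumed normalization: scale the ranks to unit Euclidean length.
        norm = math.sqrt(sum(r * r for r in ranked.values()))
        ranked = {b: r / norm for b, r in ranked.items()}
    return ranked


mz = [100.10, 100.30, 250.60, 300.00]
intensity = [10.0, 50.0, 30.0, 5.0]
raw = rank_transform(mz, intensity, num_of_peaks=3, bins_per_th=4, normalize=False)
unit = rank_transform(mz, intensity, num_of_peaks=3, bins_per_th=4, normalize=True)
```

With bins_per_th set to 4, the peaks at 100.10 and 100.30 fall into different 0.25 Th bins (400 and 401), and the weakest peak at 300.00 is dropped because only three peaks are kept.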
Calculate the dot product of two rank-transformed spectra. target expects another Spectrum object. If either of the spectra is not rank-transformed, None is returned instead.
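The calculation can be sketched on a simple sparse representation. Here each rank-transformed spectrum is assumed to be a dictionary mapping bin index to value, and None stands in for a spectrum that is not rank-transformed; both conventions and the function dot_product are hypothetical and chosen only for this illustration.

```python
def dot_product(source_bins, target_bins):
    """Sum the products of values in bins that both spectra occupy."""
    if source_bins is None or target_bins is None:
        # Mirrors the documented behaviour of returning None in this case.
        return None
    # Iterate over the smaller dictionary to limit the number of lookups.
    if len(source_bins) > len(target_bins):
        source_bins, target_bins = target_bins, source_bins
    return sum(
        value * target_bins[bin_index]
        for bin_index, value in source_bins.items()
        if bin_index in target_bins
    )


a = {400: 1.0, 401: 3.0, 1002: 2.0}
b = {401: 2.0, 1002: 1.0, 1200: 3.0}
shared = dot_product(a, b)       # bins 401 and 1002 overlap: 3*2 + 2*1
missing = dot_product(a, None)   # one side is not rank-transformed
```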
Plot the spectrum alone or against another spectrum specified by against, which expects a Spectrum object. x_limit is a tuple of the lower and upper ends of the MZ range to show. verificative decides whether the spectrum is first processed according to the session configuration, which includes clipping, removing precursor peaks and rank-transformation. If show_iden is True, the identification database is searched and, if the spectrum has an identification, its string representation is printed in the plotted figure. If highlight is True and the spectrum has an identification, the spectrum is compared with a theoretical spectrum of the identification, and peaks matching ion peaks in the theoretical spectrum are colored by ion type.
File path to the SQLite database storing all imported identifications.
Get the identification by the MS experiment file name specified by ms_exp_file_name and the scan number specified by native_id. ms_exp_file_name is the name of the file without the parent path and the extension. For example, if the path of a file is /path/to/file.mzXML, the supplied name should be file. Returns None if no identification is found.
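If the file location is already held as a Path object, the expected name can be derived with the standard library. This is a small convenience sketch; the variable names are illustrative only.

```python
from pathlib import Path

ms_exp_file = Path('/path/to/file.mzXML')
# stem drops the parent directories and the final extension.
ms_exp_file_name = ms_exp_file.stem
```

Note that stem removes only the last suffix, so a name with two extensions would keep the first one.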
All rank-transformed spectra that were used to compute dot products. This is the data uploaded to the GPU.
2D array. The first dimension is spectra and the second dimension is the MZ values of peaks of each spectrum.
2D array. The first dimension is spectra and the second dimension is the intensity of peaks of each spectrum.
An integer giving the number of peaks in each spectrum. All spectra must have the same number of peaks.
# Get cluster with ID 1301
cluster = clu.get_cluster(1301)
# Get vertices with identification n[33]MM[147]ENINAFQK[160]/2
graph = cluster.graph
search = 'n[33]MM[147]ENINAFQK[160]/2'
for key, value in graph.gp['ide'].items():
    # Keys are integers; values are the identification strings
    if value == search:
        break
vertices = [vtx for vtx in graph.iter_vertices() if key == graph.vp['ide'][vtx]]
# Get the edge with lowest dot product
edges_and_dp = graph.iter_edges([graph.ep['dps']])
edge = min(edges_and_dp, key=lambda e: e[-1])
edge = graph.edge(edge[0], edge[1])
# Plot the spectrum pair of the edge
s_internal_id = graph.vp['iid'][edge.source()]
t_internal_id = graph.vp['iid'][edge.target()]
s_entry = iid[s_internal_id]
t_entry = iid[t_internal_id]
s_spectrum = s_entry.get_spectrum()
t_spectrum = t_entry.get_spectrum()
s_spectrum.plot(t_spectrum)
# Get largest cluster
largest = max(clu, key=lambda c: c.num_of_nodes)
# Get clusters containing a specific identification
search = 'n[33]VFVDDGLISLK[160]/2'
# identifications is None for clusters that are not enriched, so skip those
clusters = [c for c in clu if c.identifications and search in c.identifications]