Commit 795293e: improve documentation

vdumont86 committed Apr 4, 2017
1 parent a498928 commit 795293e
Showing 78 changed files with 25,003 additions and 5 deletions.
4 changes: 4 additions & 0 deletions docs/.buildinfo
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 713428654893a1da7141b1e1a48e11c6
tags: 645f666f9bcd5a90fca523b33c5a78b7
Empty file added docs/.nojekyll
Binary file added docs/_images/jupyter1.png
Binary file added docs/_images/jupyter2.png
Binary file added docs/_images/overview.png
1,406 changes: 1,406 additions & 0 deletions docs/_sources/backup/backup.rst.txt

Large diffs are not rendered by default.

697 changes: 697 additions & 0 deletions docs/_sources/backup/excess_power.rst.txt

Large diffs are not rendered by default.

41 changes: 41 additions & 0 deletions docs/_sources/calculate_psd.rst.txt
.. _psdestimate:

Calculate Power Spectral Density (PSD)
======================================

The instrument's noise Power Spectral Density (PSD) will be used to whiten the data and help reveal the existence, or absence, of repetitive patterns and correlation structures in the signal. It will also determine the total bandwidth spanned by each of the filters that will subsequently be created. The first thing to do before calculating the PSD is to ensure that the time series data is converted into an array of floating-point values. ::

# Convert time series as array of float
data = ts_data.astype(numpy.float64)

The PSD is calculated by splitting the signal into overlapping segments and scanning through each segment to calculate an individual periodogram. The periodograms from each segment are then averaged, reducing the variance of the individual power measurements. In order to proceed, we need to define the averaging method, ``avg_method``, that will be used to measure the PSD from the data. This can be specified with the ``--psd-estimation`` option. ::

# Average method to measure PSD from the data
avg_method = args.psd_estimation

One also needs to specify the length of each segment, ``seg_len``, as well as the separation between two consecutive segments, ``seg_stride``. Both parameters can be defined in seconds with the ``--psd-segment-length`` and ``--psd-segment-stride`` arguments respectively, and are then converted into samples. ::

# The segment length for PSD estimation in samples
seg_len = int(args.psd_segment_length * args.sample_rate)
# The separation between consecutive segments in samples
seg_stride = int(args.psd_segment_stride * args.sample_rate)

We then use `Welch's method <https://en.wikipedia.org/wiki/Welch%27s_method>`_ to perform the power spectral density estimate using the `welch <http://ligo-cbc.github.io/pycbc/latest/html/_modules/pycbc/psd/estimate.html#welch>`_ module from the ``pycbc.psd`` library. This computes the discrete Fourier transform for each PSD segment to produce individual periodograms, then computes the squared magnitude of the result. The individual periodograms are averaged using the user-defined averaging method, ``avg_method``, and the function returns the frequency series, ``fd_psd``, which stores the power measurement for each frequency bin. ::

# Lifted from the psd.from_cli module
fd_psd = psd.welch(data,avg_method=avg_method,seg_len=seg_len,seg_stride=seg_stride)
# Plot the power spectral density
plot_spectrum(fd_psd)
# We need this for the SWIG functions
lal_psd = fd_psd.lal()

One can display the power measurements, frequency array and frequency between consecutive samples, :math:`\Delta f` in Hertz, by printing the following variables: ::

print 'Display power measurements of the first 10 frequency bins'
print fd_psd[:10]
print 'Display central frequency of the first 10 bins'
print fd_psd.sample_frequencies[:10]
print 'Display the frequency separation between bins'
print fd_psd.delta_f

:math:`\Delta f` corresponds to the inverse of a segment's length, which is the smallest frequency (i.e. longest period) of detectable signals in each segment. The frequency range spans from 0 to the Nyquist frequency, i.e. half the sampling rate.
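
As a quick numerical illustration (hypothetical values, not defaults of the pipeline), the bin spacing and frequency range follow directly from the segment settings: ::

# Hypothetical settings for illustration only
sample_rate = 64.0          # samples per second
psd_segment_length = 8.0    # seconds per PSD segment

# Frequency resolution: the inverse of the segment length
delta_f = 1.0 / psd_segment_length    # 0.125 Hz
# Nyquist frequency: half the sampling rate
nyquist = sample_rate / 2.0           # 32 Hz
# Number of frequency bins from 0 to Nyquist inclusive
n_bins = int(nyquist / delta_f) + 1   # 257 bins
print("delta_f = %f Hz, Nyquist = %f Hz, bins = %d" % (delta_f, nyquist, n_bins))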
15 changes: 15 additions & 0 deletions docs/_sources/calculate_spectral_correlation.rst.txt
Two point spectral correlation
==============================

This part determines how much data on either side of the Tukey window is to be discarded. Nominally, this means that one will lose ``window_fraction`` * ``args.psd_segment_length`` of data to corruption from the window, i.e. this data is simply discarded. This is tuned to give an integer offset when used with ``args.psd_segment_length`` equal to 8; smaller windows will give fractional offsets, but larger powers of two will still preserve this (probably not a big deal in the end). ::

window_fraction = 0
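
For instance (a hypothetical sketch with made-up values, not the ones prescribed by the pipeline), the number of corrupted samples clipped from each side of a segment follows directly from these parameters: ::

# Hypothetical values for illustration only
window_fraction = 0.25    # fraction of the segment tapered by the window
psd_segment_length = 8.0  # seconds
sample_rate = 64.0        # samples per second
# Samples discarded on each side of the segment (see the tile construction step)
clip_samples = int(psd_segment_length * window_fraction * sample_rate / 2)
print("Discarding %d samples on each side of every segment" % clip_samples)  # 64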

The two-point spectral correlation is then computed with the :ref:`calculate_spectral_correlation <calculate_spectral_correlation>` function, which returns both the Tukey window applied to the original time series data and the actual two-point spectral correlation function for the whitened frequency series from the applied whitening window. ::

# Do two point spectral correlation
window, spec_corr = calculate_spectral_correlation(seg_len,'tukey',window_fraction=window_fraction)
window = window.data.data
window_sigma_sq = numpy.mean(window**2)
# Pre scale the window by its root mean squared -- see eqn 11 of EP document
#window /= numpy.sqrt(window_sigma_sq)
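
The commented-out normalization step above can be illustrated with a standalone sketch (assuming SciPy's ``scipy.signal.windows.tukey`` as a stand-in for the window returned by the pipeline): ::

import numpy
from scipy.signal.windows import tukey

# Hypothetical segment of 512 samples with 25% of it tapered
window = tukey(512, alpha=0.25)
# Mean squared value of the window...
window_sigma_sq = numpy.mean(window**2)
# ...used to pre-scale the window by its root mean square
window /= numpy.sqrt(window_sigma_sq)
print("Window mean square after rescaling: %f" % numpy.mean(window**2))  # ~1.0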
37 changes: 37 additions & 0 deletions docs/_sources/check_filtering_settings.rst.txt
Checking filtering settings
===========================

The first thing to check is that the frequency of the high-pass filter (if defined) is below the minimum frequency of the filter bank. Indeed, a high-pass filter will only let through frequencies that are higher than the cutoff frequency (here defined by the ``strain_high_pass`` argument). If the high-pass frequency is greater than the minimum frequency in the filter bank, signals with frequencies lower than the cutoff frequency will be attenuated. ::

if args.min_frequency < args.strain_high_pass:
    print >>sys.stderr, "Warning: strain high pass frequency %f is greater than the tile minimum frequency %f --- this is likely to cause strange output below the bandpass frequency" % (args.strain_high_pass, args.min_frequency)

In case the maximum frequency of the filter bank is not defined, we set it equal to the Nyquist frequency, i.e. half the sampling rate; this makes sense since any signal above the Nyquist frequency cannot be resolved at the given sampling rate. ::

if args.max_frequency is None:
    args.max_frequency = args.sample_rate / 2.0

If the bandwidth of the finest filter (``--tile-bandwidth`` argument, see section :ref:`construct_args <construct_args>`) or the number of frequency channels (``--channels`` argument) is not defined, but the total spectral band is (``data_band``), one can then determine all the filter settings as follows: ::


if args.tile_bandwidth is None and args.channels is None:
    # Exit program with error message
    exit("Either --tile-bandwidth or --channels must be specified to set up time-frequency plane")
else:
    # Check that the tile maximum frequency is larger than its minimum frequency
    assert args.max_frequency >= args.min_frequency
    # Define spectral band of data
    data_band = args.max_frequency - args.min_frequency
    # Check whether tile bandwidth or channel number is defined
    if args.tile_bandwidth is not None:
        # Define number of possible filter bands
        nchans = args.channels = int(data_band / args.tile_bandwidth) - 1
    elif args.channels is not None:
        # Define filter bandwidth
        band = args.tile_bandwidth = data_band / (args.channels + 1)
    assert args.channels > 1
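
As a worked example (with hypothetical numbers rather than pipeline defaults), suppose the spectral band spans 0 to 32 Hz: ::

# Hypothetical spectral band for illustration only
min_frequency = 0.0
max_frequency = 32.0
data_band = max_frequency - min_frequency     # 32 Hz
# Specifying the tile bandwidth fixes the number of channels...
tile_bandwidth = 1.0
nchans = int(data_band / tile_bandwidth) - 1  # 31 channels
# ...and, conversely, specifying the number of channels fixes the bandwidth
channels = 31
band = data_band / (channels + 1)             # 1 Hz
print("nchans = %d, band = %f Hz" % (nchans, band))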

The minimum frequency to be explored can be user-defined by using the ``--min-frequency`` option. ::

# Lowest frequency of the first filter
flow = args.min_frequency
13 changes: 13 additions & 0 deletions docs/_sources/compute_channel_renormalization.rst.txt
Normalization of virtual channel
================================

The virtual channels will be used during the Excess Power analysis to explore different frequency ranges around each PSD segment and look for possible triggers. Each channel is renormalized using the :ref:`compute_channel_renomalization <compute_channel_renomalization>` internal function. ::

# Initialise dictionary
mu_sq_dict = {}
# nc_sum additional channel adds
for nc_sum in range(0, int(math.log(nchans, 2))):
    min_band = (len(filter_bank[0].data.data)-1) * filter_bank[0].deltaF / 2
    print tprint(t0,t1),"Calculation for %d %d Hz channels" % (nc_sum+1, min_band)
    nc_sum = 2**nc_sum - 1
    mu_sq_dict[nc_sum] = compute_channel_renomalization(nc_sum, filter_bank, spec_corr, nchans)
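
The progression of summing levels can be sketched in isolation (assuming a hypothetical bank of 16 channels, independent of the actual pipeline state): ::

import math

# Hypothetical number of narrow-band channels
nchans = 16
# At each level, 2**level channels are summed; nc_sum stores that count minus one
for level in range(0, int(math.log(nchans, 2))):
    nc_sum = 2**level - 1
    print("Level %d sums %d narrow-band channels (nc_sum = %d)" % (level, nc_sum+1, nc_sum))
# Level 0 sums 1 narrow-band channels (nc_sum = 0)
# Level 1 sums 2 narrow-band channels (nc_sum = 1)
# Level 2 sums 4 narrow-band channels (nc_sum = 3)
# Level 3 sums 8 narrow-band channels (nc_sum = 7)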
33 changes: 33 additions & 0 deletions docs/_sources/construct_tiles.rst.txt
.. _tilebandwidth:

Constructing tiles of different bandwidth
=========================================

First and foremost, we define a clipping region in the data that will be used to remove window corruption; it is non-zero if the ``window_fraction`` variable is set to a non-zero value. ::

print tprint(t0,t1),"Beginning tile construction..."
# Clip the boundaries to remove window corruption
clip_samples = int(args.psd_segment_length * window_fraction * args.sample_rate / 2)

In order to perform a multi-resolution search, tiles of many different bandwidths and durations will be scanned. We first need to set up a loop such that the maximum number of additional channels is equal to the base-2 logarithm of the total number of channels. The number of narrow-band channels to be summed at each level, ``nc_sum + 1``, is then equal to 2 to the power of the current number of additional channels. ::

for nc_sum in range(0, int(math.log(nchans, 2)))[::-1]: # nc_sum additional channel adds
    nc_sum = 2**nc_sum - 1
    print tprint(t0,t1,t2),"Summing %d narrow band channels..." % (nc_sum+1)

The undersampling rate for this tile can be calculated using the channel frequency band and the number of narrow band channels to be summed such that the bandwidth of the tile is equal to ``band * (nc_sum + 1)``. ::

us_rate = int(round(1.0 / (2 * band*(nc_sum+1) * ts_data.delta_t)))
print >>sys.stderr, "Undersampling rate for this level: %f" % (args.sample_rate/us_rate)
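
To make this relation concrete (hypothetical numbers only): with a 1 Hz channel band, two summed channels and a 64 Hz sampling rate, the undersampling factor works out as follows: ::

# Hypothetical values for illustration only
band = 1.0            # bandwidth of one narrow channel, in Hz
nc_sum = 1            # one additional channel, i.e. two channels summed
delta_t = 1.0 / 64.0  # sampling period of the time series, in seconds
# A tile of bandwidth band*(nc_sum+1) only needs to be sampled at twice that bandwidth
us_rate = int(round(1.0 / (2 * band * (nc_sum + 1) * delta_t)))
print("Keep every %d-th sample, effective rate %f Hz" % (us_rate, (1.0/delta_t)/us_rate))  # 16, 4 Hz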

"Virtual" wide bandwidth channels are constructed by summing the samples from multiple channels, and correcting for the overlap between adjacent channel filters. We then define the normalised channel at the current level and create a time frequency map for this tile using the :ref:`make_indp_tiles <make_indp_tiles>` internal function. In other word, we are constructing multiple sub-tiles for which we can determined the respective energy in the given frequency band. ::

mu_sq = mu_sq_dict[nc_sum]
sys.stderr.write("\t...calculating tiles...")
if clip_samples > 0:
    tiles = make_indp_tiles(tf_map[:,clip_samples:-clip_samples:us_rate], nc_sum, mu_sq)
else:
    tiles = make_indp_tiles(tf_map[:,::us_rate], nc_sum, mu_sq)
sys.stderr.write(" TF-plane is %dx%s samples... " % tiles.shape)
print >>sys.stderr, " done"
print "Tile energy mean: %f, var %f" % (numpy.mean(tiles), numpy.var(tiles))
41 changes: 41 additions & 0 deletions docs/_sources/create_filter_bank.rst.txt
.. _filterbank:

Computing the filter bank
=========================

The filter bank will create band-pass filters for each channel in the PSD frequency domain. The :ref:`create_filter_bank <create_filter_bank>` function will scan the bandwidth from the central frequency of the first channel (i.e. ``flow+band/2``) to the final frequency of the last channel (i.e. ``band*nchans``) in increments equal to the frequency band. The filter's total extent in Fourier space is actually twice the stated bandwidth (FWHM). ::

# Define filters
filter_bank, fdb = create_filter_bank(fd_psd.delta_f, flow+band/2, band, nchans, fd_psd, spec_corr)

This function returns two arrays: the ``filter_bank`` array, which is a list of `COMPLEX16FrequencySeries <http://software.ligo.org/docs/lalsuite/lal/struct_c_o_m_p_l_e_x16_frequency_series.html>`_ arrays corresponding to each channel's filter, and the ``fdb`` array, which provides the frequency series of each filter. The length of each array is equal to the total number of channels (i.e. ``nchans``). The filter's data, :math:`\Delta f` value, and first and last frequencies of any channel's filter can be displayed as follows: ::

# Print data of first channel's filter
print filter_bank[0].data.data
# Print frequency separation between 2 values in the first channel's filter
print filter_bank[0].deltaF
# Print first frequency of the first channel's filter
print filter_bank[0].f0
# Print last frequency of the first channel's filter (equal to twice the channel's bandwidth)
print filter_bank[0].f0+(len(filter_bank[0].data.data)-1)*filter_bank[0].deltaF
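
A quick consistency check (a sketch reusing the variables defined above, not part of the pipeline) confirms that each filter spans twice the channel bandwidth: ::

# Frequency extent of the first channel's filter, derived from its metadata
first_freq = filter_bank[0].f0
last_freq = filter_bank[0].f0 + (len(filter_bank[0].data.data)-1)*filter_bank[0].deltaF
# The span should equal twice the channel bandwidth (see the FWHM remark above)
print("Filter span: %f Hz (expected %f Hz)" % (last_freq - first_freq, 2*band))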

Further in the analysis, the following filter inner products will be used:
1. ``white_filter_ip``: Whitened filter inner products computed with themselves.
2. ``unwhite_filter_ip``: Unwhitened filter inner products computed with themselves.
3. ``white_ss_ip``: Whitened filter inner products computed between input adjacent filters.
4. ``unwhite_ss_ip``: Unwhitened filter inner products computed between input adjacent filters.

::

# This is necessary to compute the mu^2 normalizations
white_filter_ip = compute_filter_ips_self(filter_bank, spec_corr, None)
unwhite_filter_ip = compute_filter_ips_self(filter_bank, spec_corr, lal_psd)
# These two are needed for the unwhitened mean square sum (hrss)
white_ss_ip = compute_filter_ips_adjacent(filter_bank, spec_corr, None)
unwhite_ss_ip = compute_filter_ips_adjacent(filter_bank, spec_corr, lal_psd)

Some extra plots can also be made ::

tdb = convert_to_time_domain(fdb,sample_rate)
plot_bank(fdb)
plot_filters(tdb,flow,band)
47 changes: 47 additions & 0 deletions docs/_sources/create_tf_plane.rst.txt
Create time-frequency map
=========================

We initialise a 2D zero array for a time-frequency map (``tf_map``), which will be computed for each frequency-domain filter associated with each PSD segment and in which the filtered time series for each frequency channel will be stored. The number of rows corresponds to the total number of frequency channels, defined by the ``nchans`` variable. The number of columns corresponds to the segment length in samples (i.e. the number of samples covering one segment), defined by the ``seg_len`` variable. ::

# Initialise 2D zero array for time-frequency map
tf_map = numpy.zeros((nchans, seg_len), dtype=numpy.complex128)

We also initialise a zero vector for a temporary filter bank (``tmp_filter_bank``) that will store, for a given channel, the filter's values from the original filter bank (``filter_bank``) for that channel only. The length of the temporary filter bank is equal to the length of the PSD frequency series (``fd_psd``). ::

# Initialise 1D zero array
tmp_filter_bank = numpy.zeros(len(fd_psd), dtype=numpy.complex128)

We then loop over all the frequency channels. While in the loop, we first re-initialise the temporary filter bank with zero values everywhere along the frequency series. We then determine the first and last frequency of each channel and re-define the values of the filter in that frequency range based on the values from the original channel's filter from the original filter bank. ::

# Loop over all the channels
print tprint(t0,t1),"Filtering all %d channels..." % nchans
for i in range(nchans):
    # Reset filter bank series
    tmp_filter_bank *= 0.0
    # Index of starting frequency
    f1 = int(filter_bank[i].f0/fd_psd.delta_f)
    # Index of ending frequency
    f2 = int((filter_bank[i].f0 + 2*band)/fd_psd.delta_f)+1
    # (FIXME: Why is there a factor of 2 here?)
    tmp_filter_bank[f1:f2] = filter_bank[i].data.data * 2

We then extract the frequency series from the filter bank for that channel, which will be used as a template waveform to filter the actual data from the channel. ::

# Define the template to filter the frequency series with
template = types.FrequencySeries(tmp_filter_bank, delta_f=fd_psd.delta_f, copy=False)

Finally, we use the `matched_filter_core <http://ligo-cbc.github.io/pycbc/latest/html/pycbc.filter.html>`_ module from the ``pycbc.filter.matchedfilter`` library to filter the frequency series from the channel. This will return both a time series containing the complex signal-to-noise matched filtered against the data, and a frequency series containing the correlation vector. ::

# Create filtered series
filtered_series = filter.matched_filter_core(template,fs_data,h_norm=None,psd=None,
                                             low_frequency_cutoff=filter_bank[i].f0,
                                             high_frequency_cutoff=filter_bank[i].f0+2*band)

The `matched filter <http://en.wikipedia.org/wiki/Matched_filter>`_ is the optimal linear filter for maximizing the signal-to-noise ratio (SNR) in the presence of additive stochastic noise. The filtered time series is stored in the time-frequency map and can be used to produce a spectrogram of the segment of data being analysed. ::

# Include filtered series in the map
tf_map[i,:] = filtered_series[0].numpy()
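
The principle can be demonstrated with a self-contained toy example (pure NumPy, unrelated to the pycbc implementation): sliding a known template across noisy data produces a correlation that peaks where the signal is hidden: ::

import numpy

# Build a toy data stream: unit-variance noise plus a template buried at sample 300
rng = numpy.random.RandomState(0)
template = numpy.sin(2 * numpy.pi * 0.1 * numpy.arange(64))
data = rng.normal(size=1024)
data[300:300+len(template)] += template
# Matched filtering amounts to a sliding correlation of the data with the template
snr = numpy.correlate(data, template, mode='valid')
print("Peak correlation at sample %d" % numpy.argmax(numpy.abs(snr)))  # ~300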

The time-frequency map is a 2D array with a length that corresponds to the number of channels and a width equal to the number of samples present in one segment of data, i.e. the segment's length in seconds times the sampling rate. The map can finally be plotted with a :math:`\Delta t` corresponding to the sampling period of the original dataset (i.e. the inverse of the original sampling rate), and a :math:`\Delta f` equal to the bandwidth of one channel. ::

plot_spectrogram(numpy.abs(tf_map).T,tmp_ts_data.delta_t,fd_psd.delta_f,ts_data.sample_rate,start_time,end_time,fname='%s/tf.png'%(segfolder))
56 changes: 56 additions & 0 deletions docs/_sources/create_xml.rst.txt
Extracting GPS time range
=========================

We use the `LIGOTimeGPS <http://software.ligo.org/docs/lalsuite/lal/group___l_a_l_datatypes.html#ss_LIGOTimeGPS>`_ structure from the ``glue.lal`` package to *store the starting and ending time in the dataset to nanosecond precision and synchronized to the Global Positioning System time reference*. Once both times are defined, the range of values is stored in a semi-open interval using the `segment <http://software.ligo.org/docs/glue/glue.__segments.segment-class.html>`_ module from the ``glue.segments`` package. ::

# Starting epoch relative to GPS starting epoch
start_time = LIGOTimeGPS(args.analysis_start_time or args.gps_start_time)
# Ending epoch relative to GPS ending epoch
end_time = LIGOTimeGPS(args.analysis_end_time or args.gps_end_time)
# Represent the range of values in the semi-open interval
inseg = segment(start_time,end_time)
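
The behaviour of ``segment`` can be sketched in isolation (assuming only that the ``glue.segments`` package is installed; plain numbers stand in for LIGOTimeGPS values): ::

from glue.segments import segment

# A hypothetical analysis range
inseg = segment(100, 200)
# Duration of the segment
print(abs(inseg))                 # 100
# Intersection with an overlapping segment
print(inseg & segment(150, 250))  # segment(150, 200)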

Prepare output file for given time range
========================================

::

xmldoc = ligolw.Document()
xmldoc.appendChild(ligolw.LIGO_LW())
ifo = args.channel_name.split(":")[0]
proc_row = register_to_xmldoc(xmldoc, __program__, args.__dict__, ifos=[ifo],version=glue.git_version.id, cvs_repository=glue.git_version.branch, cvs_entry_time=glue.git_version.date)
# Figure out the data we actually analyzed
outseg = determine_output_segment(inseg, args.psd_segment_length, args.sample_rate, window_fraction)
ss = append_search_summary(xmldoc, proc_row, ifos=(station,), inseg=inseg, outseg=outseg)
for sb in event_list:
    sb.process_id = proc_row.process_id
    sb.search = proc_row.program
    #sb.ifo, sb.channel = args.channel_name.split(":")
    sb.ifo, sb.channel = station, setname
xmldoc.childNodes[0].appendChild(event_list)
fname = make_filename(station, inseg)
utils.write_filename(xmldoc, fname, gz=fname.endswith("gz"), verbose=True)

Plot trigger results
====================

::

events = SnglBurstTable.read(fname+'.gz')
#del events[10000:]
plot = events.plot('time', 'central_freq', "duration", "bandwidth", color="snr")
#plot = events.plot('time', 'central_freq', color='snr')
#plot.set_yscale("log")
plot.set_ylim(1e-0, 250)
t0 = 1153742417
plot.set_xlim(t0 + 0*60, t0 + 1*60)
#plot.set_xlim(t0 + 28, t0 + 32)
pyplot.axvline(t0 + 30, color='r')
cb = plot.add_colorbar(cmap='viridis')
plot.savefig("triggers.png")