Commit 795293e: improve documentation

vdumont86 committed Apr 4, 2017
1 parent a498928 commit 795293e
Showing 78 changed files with 25,003 additions and 5 deletions.
4 changes: 4 additions & 0 deletions docs/.buildinfo
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 713428654893a1da7141b1e1a48e11c6
tags: 645f666f9bcd5a90fca523b33c5a78b7
Empty file added docs/.nojekyll
Binary file added docs/_images/jupyter1.png
Binary file added docs/_images/jupyter2.png
Binary file added docs/_images/overview.png
1,406 changes: 1,406 additions & 0 deletions docs/_sources/backup/backup.rst.txt

Large diffs are not rendered by default.

697 changes: 697 additions & 0 deletions docs/_sources/backup/excess_power.rst.txt

Large diffs are not rendered by default.

41 changes: 41 additions & 0 deletions docs/_sources/calculate_psd.rst.txt
.. _psdestimate:

Calculate Power Spectral Density (PSD)
======================================

The instrument's noise Power Spectral Density (PSD) will be used to whiten the data and help reveal the existence, or absence, of repetitive patterns and correlation structures in the signal. It will also determine the total bandwidth spanned by each of the filters that will subsequently be created. The first thing to do before calculating the PSD is to ensure that the time series data is converted into an array of floating-point values. ::

# Convert time series as array of float
data = ts_data.astype(numpy.float64)

The PSD is calculated by splitting the signal into overlapping segments and scanning through each segment to calculate an individual periodogram. The periodograms from each segment are then averaged, reducing the variance of the individual power measurements. In order to proceed, we need to define the averaging method, ``avg_method``, that will be used to measure the PSD from the data. This can be specified with the ``--psd-estimation`` option. ::

# Average method to measure PSD from the data
avg_method = args.psd_estimation

One also needs to specify the length of each segment, ``seg_len``, as well as the separation between two consecutive segments, ``seg_stride``. Both parameters can be defined in seconds with the ``--psd-segment-length`` and ``--psd-segment-stride`` arguments respectively, and are then converted into samples. ::

# The segment length for PSD estimation in samples
seg_len = int(args.psd_segment_length * args.sample_rate)
# The separation between consecutive segments in samples
seg_stride = int(args.psd_segment_stride * args.sample_rate)

We then use `Welch's method <https://en.wikipedia.org/wiki/Welch%27s_method>`_ to perform the power spectral density estimate using the `welch <http://ligo-cbc.github.io/pycbc/latest/html/_modules/pycbc/psd/estimate.html#welch>`_ module from the ``pycbc.psd`` library. This computes the discrete Fourier transform for each PSD segment to produce individual periodograms, then computes the squared magnitude of the result. The individual periodograms are averaged using the user-defined averaging method, ``avg_method``, and the function returns the frequency series, ``fd_psd``, which stores the power measurement for each frequency bin. ::

# Lifted from the psd.from_cli module
fd_psd = psd.welch(data,avg_method=avg_method,seg_len=seg_len,seg_stride=seg_stride)
# Plot the power spectral density
plot_spectrum(fd_psd)
# We need this for the SWIG functions
lal_psd = fd_psd.lal()

One can display the power measurements, frequency array and frequency between consecutive samples, :math:`\Delta f` in Hertz, by printing the following variables: ::

print 'Display power measurements of the first 10 frequency bins'
print fd_psd[:10]
print 'Display central frequency of the first 10 bins'
print fd_psd.sample_frequencies[:10]
print 'Display the frequency separation between bins'
print fd_psd.delta_f

:math:`\Delta f` corresponds to the inverse of a segment's length, which is the smallest frequency (i.e. longest period) of detectable signals in each segment. The frequency range spans from 0 to the Nyquist frequency, i.e. half the sampling rate.
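
As a quick numerical illustration (hypothetical values, not defaults of the pipeline), the bin spacing and frequency range follow directly from the segment settings: ::

# Hypothetical settings for illustration only
sample_rate = 64.0          # samples per second
psd_segment_length = 8.0    # seconds per PSD segment

# Frequency resolution: the inverse of the segment length
delta_f = 1.0 / psd_segment_length    # 0.125 Hz
# Nyquist frequency: half the sampling rate
nyquist = sample_rate / 2.0           # 32 Hz
# Number of frequency bins from 0 to Nyquist inclusive
n_bins = int(nyquist / delta_f) + 1   # 257 bins
print("delta_f = %f Hz, Nyquist = %f Hz, bins = %d" % (delta_f, nyquist, n_bins))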
15 changes: 15 additions & 0 deletions docs/_sources/calculate_spectral_correlation.rst.txt
Two point spectral correlation
==============================

This part determines how much data on either side of the Tukey window is to be discarded. Nominally, this means that one will lose ``window_fraction`` * ``args.psd_segment_length`` of data to corruption from the window, i.e. this data is simply discarded. This is tuned to give an integer offset when used with ``args.psd_segment_length`` equal to 8; smaller windows will give fractional offsets, but larger powers of two will still preserve this (probably not a big deal in the end). ::

window_fraction = 0
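
For instance (a hypothetical sketch with made-up values, not the ones prescribed by the pipeline), the number of corrupted samples clipped from each side of a segment follows directly from these parameters: ::

# Hypothetical values for illustration only
window_fraction = 0.25    # fraction of the segment tapered by the window
psd_segment_length = 8.0  # seconds
sample_rate = 64.0        # samples per second
# Samples discarded on each side of the segment (see the tile construction step)
clip_samples = int(psd_segment_length * window_fraction * sample_rate / 2)
print("Discarding %d samples on each side of every segment" % clip_samples)  # 64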

The two-point spectral correlation is then computed with the :ref:`calculate_spectral_correlation <calculate_spectral_correlation>` function, which returns both the Tukey window applied to the original time series data and the actual two-point spectral correlation function for the whitened frequency series from the applied whitening window. ::

# Do two point spectral correlation
window, spec_corr = calculate_spectral_correlation(seg_len,'tukey',window_fraction=window_fraction)
window = window.data.data
window_sigma_sq = numpy.mean(window**2)
# Pre scale the window by its root mean squared -- see eqn 11 of EP document
#window /= numpy.sqrt(window_sigma_sq)
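
The commented-out normalization step above can be illustrated with a standalone sketch (assuming SciPy's ``scipy.signal.windows.tukey`` as a stand-in for the window returned by the pipeline): ::

import numpy
from scipy.signal.windows import tukey

# Hypothetical segment of 512 samples with 25% of it tapered
window = tukey(512, alpha=0.25)
# Mean squared value of the window...
window_sigma_sq = numpy.mean(window**2)
# ...used to pre-scale the window by its root mean square
window /= numpy.sqrt(window_sigma_sq)
print("Window mean square after rescaling: %f" % numpy.mean(window**2))  # ~1.0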
37 changes: 37 additions & 0 deletions docs/_sources/check_filtering_settings.rst.txt
Checking filtering settings
===========================

The first thing to check is that the frequency of the high-pass filter (if defined) is below the minimum frequency of the filter bank. Indeed, a high-pass filter will only let through frequencies that are higher than the cutoff frequency (here defined by the ``strain_high_pass`` argument). If the high-pass frequency is greater than the minimum frequency in the filter bank, signals with frequencies lower than the cutoff frequency will be attenuated. ::

if args.min_frequency < args.strain_high_pass:
    print >>sys.stderr, "Warning: strain high pass frequency %f is greater than the tile minimum frequency %f --- this is likely to cause strange output below the bandpass frequency" % (args.strain_high_pass, args.min_frequency)

In case the maximum frequency of the filter bank is not defined, we set it equal to the Nyquist frequency, i.e. half the sampling rate; this makes sense since any signal above the Nyquist frequency cannot be resolved at the given sampling rate. ::

if args.max_frequency is None:
    args.max_frequency = args.sample_rate / 2.0

If the bandwidth of the finest filter (``--tile-bandwidth`` argument, see section :ref:`construct_args <construct_args>`) or the number of frequency channels (``--channels`` argument) is not defined, but the total spectral band is (``data_band``), one can then determine all the filter settings as follows: ::


if args.tile_bandwidth is None and args.channels is None:
    # Exit program with error message
    exit("Either --tile-bandwidth or --channels must be specified to set up time-frequency plane")
else:
    # Check that the tile maximum frequency is larger than its minimum frequency
    assert args.max_frequency >= args.min_frequency
    # Define spectral band of data
    data_band = args.max_frequency - args.min_frequency
    # Check whether tile bandwidth or channel number is defined
    if args.tile_bandwidth is not None:
        # Define number of possible filter bands
        nchans = args.channels = int(data_band / args.tile_bandwidth) - 1
    elif args.channels is not None:
        # Define filter bandwidth
        band = args.tile_bandwidth = data_band / (args.channels + 1)
    assert args.channels > 1
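
As a worked example (with hypothetical numbers rather than pipeline defaults), suppose the spectral band spans 0 to 32 Hz: ::

# Hypothetical spectral band for illustration only
min_frequency = 0.0
max_frequency = 32.0
data_band = max_frequency - min_frequency     # 32 Hz
# Specifying the tile bandwidth fixes the number of channels...
tile_bandwidth = 1.0
nchans = int(data_band / tile_bandwidth) - 1  # 31 channels
# ...and, conversely, specifying the number of channels fixes the bandwidth
channels = 31
band = data_band / (channels + 1)             # 1 Hz
print("nchans = %d, band = %f Hz" % (nchans, band))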

The minimum frequency to be explored can be user-defined by using the ``--min-frequency`` option. ::

# Lowest frequency of the first filter
flow = args.min_frequency
13 changes: 13 additions & 0 deletions docs/_sources/compute_channel_renormalization.rst.txt
Normalization of virtual channel
================================

The virtual channels will be used during the Excess Power analysis to explore different frequency ranges around each PSD segment and look for possible triggers. Each channel is renormalized using the :ref:`compute_channel_renomalization <compute_channel_renomalization>` internal function. ::

# Initialise dictionary
mu_sq_dict = {}
# nc_sum additional channel adds
for nc_sum in range(0, int(math.log(nchans, 2))):
    min_band = (len(filter_bank[0].data.data)-1) * filter_bank[0].deltaF / 2
    print tprint(t0,t1),"Calculation for %d %d Hz channels" % (nc_sum+1, min_band)
    nc_sum = 2**nc_sum - 1
    mu_sq_dict[nc_sum] = compute_channel_renomalization(nc_sum, filter_bank, spec_corr, nchans)
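
The progression of summing levels can be sketched in isolation (assuming a hypothetical bank of 16 channels, independent of the actual pipeline state): ::

import math

# Hypothetical number of narrow-band channels
nchans = 16
# At each level, 2**level channels are summed; nc_sum stores that count minus one
for level in range(0, int(math.log(nchans, 2))):
    nc_sum = 2**level - 1
    print("Level %d sums %d narrow-band channels (nc_sum = %d)" % (level, nc_sum+1, nc_sum))
# Level 0 sums 1 narrow-band channels (nc_sum = 0)
# Level 1 sums 2 narrow-band channels (nc_sum = 1)
# Level 2 sums 4 narrow-band channels (nc_sum = 3)
# Level 3 sums 8 narrow-band channels (nc_sum = 7)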
33 changes: 33 additions & 0 deletions docs/_sources/construct_tiles.rst.txt
.. _tilebandwidth:

Constructing tiles of different bandwidth
=========================================

First and foremost, we define a clipping region in the data that will be used to remove window corruption; it is non-zero if the ``window_fraction`` variable is set to a non-zero value. ::

print tprint(t0,t1),"Beginning tile construction..."
# Clip the boundaries to remove window corruption
clip_samples = int(args.psd_segment_length * window_fraction * args.sample_rate / 2)

In order to perform a multi-resolution search, tiles of many different bandwidths and durations will be scanned. We first need to set up a loop such that the maximum number of additional channels is equal to the base-2 logarithm of the total number of channels. The number of narrow-band channels to be summed at each level, ``nc_sum + 1``, is then equal to 2 to the power of the current number of additional channels. ::

for nc_sum in range(0, int(math.log(nchans, 2)))[::-1]: # nc_sum additional channel adds
    nc_sum = 2**nc_sum - 1
    print tprint(t0,t1,t2),"Summing %d narrow band channels..." % (nc_sum+1)

The undersampling rate for this tile can be calculated using the channel frequency band and the number of narrow band channels to be summed such that the bandwidth of the tile is equal to ``band * (nc_sum + 1)``. ::

us_rate = int(round(1.0 / (2 * band*(nc_sum+1) * ts_data.delta_t)))
print >>sys.stderr, "Undersampling rate for this level: %f" % (args.sample_rate/us_rate)
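
To make this relation concrete (hypothetical numbers only): with a 1 Hz channel band, two summed channels and a 64 Hz sampling rate, the undersampling factor works out as follows: ::

# Hypothetical values for illustration only
band = 1.0            # bandwidth of one narrow channel, in Hz
nc_sum = 1            # one additional channel, i.e. two channels summed
delta_t = 1.0 / 64.0  # sampling period of the time series, in seconds
# A tile of bandwidth band*(nc_sum+1) only needs to be sampled at twice that bandwidth
us_rate = int(round(1.0 / (2 * band * (nc_sum + 1) * delta_t)))
print("Keep every %d-th sample, effective rate %f Hz" % (us_rate, (1.0/delta_t)/us_rate))  # 16, 4 Hz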

"Virtual" wide bandwidth channels are constructed by summing the samples from multiple channels, and correcting for the overlap between adjacent channel filters. We then define the normalised channel at the current level and create a time frequency map for this tile using the :ref:`make_indp_tiles <make_indp_tiles>` internal function. In other word, we are constructing multiple sub-tiles for which we can determined the respective energy in the given frequency band. ::

mu_sq = mu_sq_dict[nc_sum]
sys.stderr.write("\t...calculating tiles...")
if clip_samples > 0:
    tiles = make_indp_tiles(tf_map[:,clip_samples:-clip_samples:us_rate], nc_sum, mu_sq)
else:
    tiles = make_indp_tiles(tf_map[:,::us_rate], nc_sum, mu_sq)
sys.stderr.write(" TF-plane is %dx%s samples... " % tiles.shape)
print >>sys.stderr, " done"
print "Tile energy mean: %f, var %f" % (numpy.mean(tiles), numpy.var(tiles))
41 changes: 41 additions & 0 deletions docs/_sources/create_filter_bank.rst.txt
.. _filterbank:

Computing the filter bank
=========================

The filter bank will create band-pass filters for each channel in the PSD frequency domain. The :ref:`create_filter_bank <create_filter_bank>` function will scan the bandwidth from the central frequency of the first channel (i.e. ``flow+band/2``) to the final frequency of the last channel (i.e. ``band*nchans``) in increments equal to the frequency band. The filter's total extent in Fourier space is actually twice the stated bandwidth (FWHM). ::

# Define filters
filter_bank, fdb = create_filter_bank(fd_psd.delta_f, flow+band/2, band, nchans, fd_psd, spec_corr)

This function returns two arrays: the ``filter_bank`` array, which is a list of `COMPLEX16FrequencySeries <http://software.ligo.org/docs/lalsuite/lal/struct_c_o_m_p_l_e_x16_frequency_series.html>`_ arrays corresponding to each channel's filter, and the ``fdb`` array, which provides the frequency series of each filter. The length of each array is equal to the total number of channels (i.e. ``nchans``). The filter's data, :math:`\Delta f` value, and first and last frequencies of any channel's filter can be displayed as follows: ::

# Print data of first channel's filter
print filter_bank[0].data.data
# Print frequency separation between 2 values in the first channel's filter
print filter_bank[0].deltaF
# Print first frequency of the first channel's filter
print filter_bank[0].f0
# Print last frequency of the first channel's filter (equal to twice the channel's bandwidth)
print filter_bank[0].f0+(len(filter_bank[0].data.data)-1)*filter_bank[0].deltaF
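
A quick consistency check (a sketch reusing the variables defined above, not part of the pipeline) confirms that each filter spans twice the channel bandwidth: ::

# Frequency extent of the first channel's filter, derived from its metadata
first_freq = filter_bank[0].f0
last_freq = filter_bank[0].f0 + (len(filter_bank[0].data.data)-1)*filter_bank[0].deltaF
# The span should equal twice the channel bandwidth (see the FWHM remark above)
print("Filter span: %f Hz (expected %f Hz)" % (last_freq - first_freq, 2*band))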

Further in the analysis, the following filter inner products will be used:
1. ``white_filter_ip``: Whitened filter inner products computed with themselves.
2. ``unwhite_filter_ip``: Unwhitened filter inner products computed with themselves.
3. ``white_ss_ip``: Whitened filter inner products computed between input adjacent filters.
4. ``unwhite_ss_ip``: Unwhitened filter inner products computed between input adjacent filters.

::

# This is necessary to compute the mu^2 normalizations
white_filter_ip = compute_filter_ips_self(filter_bank, spec_corr, None)
unwhite_filter_ip = compute_filter_ips_self(filter_bank, spec_corr, lal_psd)
# These two are needed for the unwhitened mean square sum (hrss)
white_ss_ip = compute_filter_ips_adjacent(filter_bank, spec_corr, None)
unwhite_ss_ip = compute_filter_ips_adjacent(filter_bank, spec_corr, lal_psd)

Some extra plots can also be made ::

tdb = convert_to_time_domain(fdb,sample_rate)
plot_bank(fdb)
plot_filters(tdb,flow,band)
47 changes: 47 additions & 0 deletions docs/_sources/create_tf_plane.rst.txt
Create time-frequency map
=========================

We initialise a 2D zero array for a time-frequency map (``tf_map``), which will be computed for each frequency-domain filter associated with each PSD segment and in which the filtered time series for each frequency channel will be stored. The number of rows corresponds to the total number of frequency channels, defined by the ``nchans`` variable. The number of columns corresponds to the segment length in samples (i.e. the number of samples covering one segment), defined by the ``seg_len`` variable. ::

# Initialise 2D zero array for time-frequency map
tf_map = numpy.zeros((nchans, seg_len), dtype=numpy.complex128)

We also initialise a zero vector for a temporary filter bank (``tmp_filter_bank``) that will store, for a given channel, the filter's values from the original filter bank (``filter_bank``) for that channel only. The length of the temporary filter bank is equal to the length of the PSD frequency series (``fd_psd``). ::

# Initialise 1D zero array
tmp_filter_bank = numpy.zeros(len(fd_psd), dtype=numpy.complex128)

We then loop over all the frequency channels. While in the loop, we first re-initialise the temporary filter bank with zero values everywhere along the frequency series. We then determine the first and last frequency of each channel and re-define the values of the filter in that frequency range based on the values from the original channel's filter from the original filter bank. ::

# Loop over all the channels
print tprint(t0,t1),"Filtering all %d channels..." % nchans
for i in range(nchans):
    # Reset filter bank series
    tmp_filter_bank *= 0.0
    # Index of starting frequency
    f1 = int(filter_bank[i].f0/fd_psd.delta_f)
    # Index of ending frequency
    f2 = int((filter_bank[i].f0 + 2*band)/fd_psd.delta_f)+1
    # (FIXME: Why is there a factor of 2 here?)
    tmp_filter_bank[f1:f2] = filter_bank[i].data.data * 2

We then extract the frequency series from the filter bank for that channel, which will be used as a template waveform to filter the actual data from the channel. ::

# Define the template to filter the frequency series with
template = types.FrequencySeries(tmp_filter_bank, delta_f=fd_psd.delta_f, copy=False)

Finally, we use the `matched_filter_core <http://ligo-cbc.github.io/pycbc/latest/html/pycbc.filter.html>`_ module from the ``pycbc.filter.matchedfilter`` library to filter the frequency series from the channel. This will return both a time series containing the complex signal-to-noise matched filtered against the data, and a frequency series containing the correlation vector. ::

# Create filtered series
filtered_series = filter.matched_filter_core(template,fs_data,h_norm=None,psd=None,
                                             low_frequency_cutoff=filter_bank[i].f0,
                                             high_frequency_cutoff=filter_bank[i].f0+2*band)

The `matched filter <http://en.wikipedia.org/wiki/Matched_filter>`_ is the optimal linear filter for maximizing the signal-to-noise ratio (SNR) in the presence of additive stochastic noise. The filtered time series is stored in the time-frequency map and can be used to produce a spectrogram of the segment of data being analysed. ::

# Include filtered series in the map
tf_map[i,:] = filtered_series[0].numpy()
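
The principle can be demonstrated with a self-contained toy example (pure NumPy, unrelated to the pycbc implementation): sliding a known template across noisy data produces a correlation that peaks where the signal is hidden: ::

import numpy

# Build a toy data stream: unit-variance noise plus a template buried at sample 300
rng = numpy.random.RandomState(0)
template = numpy.sin(2 * numpy.pi * 0.1 * numpy.arange(64))
data = rng.normal(size=1024)
data[300:300+len(template)] += template
# Matched filtering amounts to a sliding correlation of the data with the template
snr = numpy.correlate(data, template, mode='valid')
print("Peak correlation at sample %d" % numpy.argmax(numpy.abs(snr)))  # ~300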

The time-frequency map is a 2D array with a length that corresponds to the number of channels and a width equal to the number of samples present in one segment of data, i.e. the segment's length in seconds times the sampling rate. The map can finally be plotted with a :math:`\Delta t` corresponding to the sampling period of the original dataset (i.e. the inverse of the original sampling rate), and a :math:`\Delta f` equal to the bandwidth of one channel. ::

plot_spectrogram(numpy.abs(tf_map).T,tmp_ts_data.delta_t,fd_psd.delta_f,ts_data.sample_rate,start_time,end_time,fname='%s/tf.png'%(segfolder))
56 changes: 56 additions & 0 deletions docs/_sources/create_xml.rst.txt
Extracting GPS time range
=========================

We use the `LIGOTimeGPS <http://software.ligo.org/docs/lalsuite/lal/group___l_a_l_datatypes.html#ss_LIGOTimeGPS>`_ structure from the ``glue.lal`` package to *store the starting and ending time in the dataset to nanosecond precision and synchronized to the Global Positioning System time reference*. Once both times are defined, the range of values is stored in a semi-open interval using the `segment <http://software.ligo.org/docs/glue/glue.__segments.segment-class.html>`_ module from the ``glue.segments`` package. ::

# Starting epoch relative to GPS starting epoch
start_time = LIGOTimeGPS(args.analysis_start_time or args.gps_start_time)
# Ending epoch relative to GPS ending epoch
end_time = LIGOTimeGPS(args.analysis_end_time or args.gps_end_time)
# Represent the range of values in the semi-open interval
inseg = segment(start_time,end_time)
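
The behaviour of ``segment`` can be sketched in isolation (assuming only that the ``glue.segments`` package is installed; plain numbers stand in for LIGOTimeGPS values): ::

from glue.segments import segment

# A hypothetical analysis range
inseg = segment(100, 200)
# Duration of the segment
print(abs(inseg))                 # 100
# Intersection with an overlapping segment
print(inseg & segment(150, 250))  # segment(150, 200)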

Prepare output file for given time range
========================================

::

xmldoc = ligolw.Document()
xmldoc.appendChild(ligolw.LIGO_LW())
ifo = args.channel_name.split(":")[0]
proc_row = register_to_xmldoc(xmldoc, __program__, args.__dict__, ifos=[ifo],version=glue.git_version.id, cvs_repository=glue.git_version.branch, cvs_entry_time=glue.git_version.date)
# Figure out the data we actually analyzed
outseg = determine_output_segment(inseg, args.psd_segment_length, args.sample_rate, window_fraction)
ss = append_search_summary(xmldoc, proc_row, ifos=(station,), inseg=inseg, outseg=outseg)
for sb in event_list:
    sb.process_id = proc_row.process_id
    sb.search = proc_row.program
    #sb.ifo, sb.channel = args.channel_name.split(":")
    sb.ifo, sb.channel = station, setname
xmldoc.childNodes[0].appendChild(event_list)
fname = make_filename(station, inseg)
utils.write_filename(xmldoc, fname, gz=fname.endswith("gz"), verbose=True)

Plot trigger results
====================

::

events = SnglBurstTable.read(fname+'.gz')
#del events[10000:]
plot = events.plot('time', 'central_freq', "duration", "bandwidth", color="snr")
#plot = events.plot('time', 'central_freq', color='snr')
#plot.set_yscale("log")
plot.set_ylim(1e-0, 250)
t0 = 1153742417
plot.set_xlim(t0 + 0*60, t0 + 1*60)
#plot.set_xlim(t0 + 28, t0 + 32)
pyplot.axvline(t0 + 30, color='r')
cb = plot.add_colorbar(cmap='viridis')
plot.savefig("triggers.png")