Estimator class for J-Resolved (2DJ) datasets, enabling use of our CUPID
method for Pure Shift spectra. For a tutorial on the basic functionality
this provides, see Using Estimator2DJ.
Note
To create an instance of Estimator2DJ, you are advised to use one of
the following methods if any are appropriate:
Apply baseline correction to the estimator’s data.
The algorithm applied is described in [1]. This uses an implementation
provided by pybaselines.
Parameters:
min_length – From the pybaselines docs: Any region of consecutive baseline
points less than min_length is considered to be a false
positive and all points in the region are converted to peak points.
A higher min_length ensures less points are falsely assigned as
baseline points.
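The effect of min_length can be illustrated with a small NumPy sketch. This is not the pybaselines implementation: the function name enforce_min_length and the boolean-mask representation of baseline points are assumptions made purely for illustration.

```python
import numpy as np

def enforce_min_length(baseline_mask, min_length):
    """Reassign runs of baseline points shorter than min_length as peak points."""
    mask = np.asarray(baseline_mask, dtype=bool).copy()
    # Pad with False so that every run of True has a detectable start and end.
    padded = np.concatenate(([False], mask, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    stops = np.flatnonzero(edges == -1)
    for start, stop in zip(starts, stops):
        if stop - start < min_length:
            mask[start:stop] = False  # too short: treat as a false positive
    return mask

# True marks a point classified as baseline.
mask = np.array([1, 1, 0, 1, 1, 1, 1, 0, 1], dtype=bool)
cleaned = enforce_min_length(mask, min_length=3)
```

Only the run of four baseline points survives; the runs of length two and one are converted to peak points.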
Manipulate an estimation result. After the result has been changed,
it is subjected to optimisation.
There are five types of edit that you can make:
Add new oscillators with defined parameters.
Remove oscillators.
Merge multiple oscillators into a single oscillator.
Split an oscillator into many oscillators.
Unique to 2DJ: Mirror an oscillator. This allows you to add a new
oscillator with the same parameters as an oscillator in the result,
except with the following frequencies:
The parameters of new oscillators to be added. Should be of shape
(n,2*(1+self.dim)), where n is the number of new
oscillators to add. Even when one oscillator is being added this
should be a 2D array, i.e.
1D data:
params=np.array([[a,φ,f,η]])
2D data:
params = np.array([[a, φ, f₁, f₂, η₁, η₂]])
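As a quick check of the shape requirement, the following sketch builds arrays for 1D and 2D data. The numerical values and the helper check_shape are hypothetical and are not part of the estimator's API.

```python
import numpy as np

# 1D data: one oscillator -> shape (1, 4), columns are a, φ, f, η
params_1d = np.array([[1.0, 0.0, 150.0, 5.0]])

# 2D data: two oscillators -> shape (2, 6), columns are a, φ, f₁, f₂, η₁, η₂
params_2d = np.array([
    [1.0, 0.0, 5.0, 150.0, 5.0, 5.0],
    [2.0, 0.0, -5.0, 150.0, 5.0, 5.0],
])

def check_shape(params, dim):
    """Return True if params is a 2D array of shape (n, 2 * (1 + dim))."""
    return params.ndim == 2 and params.shape[1] == 2 * (1 + dim)
```

A flat array such as np.array([1.0, 0.0, 150.0, 5.0]) fails this check, which is why the extra pair of brackets is needed even for a single oscillator.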
rm_oscs – An iterable of ints for the indices of oscillators to remove from
the result.
merge_oscs – An iterable of iterables. Each sub-iterable denotes the indices of
oscillators to merge together. For example, [[0,2],[6,7]]
would mean that oscillators 0 and 2 are merged, and oscillators 6
and 7 are merged. A merge involves removing all the oscillators,
and creating a new oscillator with the sum of amplitudes, and the
average of phases, frequencies and damping factors.
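A minimal sketch of the merge rule described for merge_oscs, assuming the amplitude sits in column 0 and that a plain arithmetic mean is used for the remaining columns. The function merge_oscillators is hypothetical, and appending the merged oscillator at the end is an arbitrary choice.

```python
import numpy as np

def merge_oscillators(params, indices):
    """Replace the oscillators at `indices` with a single merged oscillator."""
    indices = list(indices)
    group = params[indices]
    merged = group.mean(axis=0)      # average phases, frequencies, dampings
    merged[0] = group[:, 0].sum()    # amplitudes are summed, not averaged
    remaining = np.delete(params, indices, axis=0)
    return np.vstack([remaining, merged])

params = np.array([
    [1.0, 0.1, 100.0, 5.0],
    [2.0, 0.3, 102.0, 7.0],
    [4.0, 0.0, 50.0, 1.0],
])
result = merge_oscillators(params, [0, 1])
```

Here oscillators 0 and 1 collapse into one oscillator with amplitude 3, phase 0.2, frequency 101 and damping factor 6.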
split_oscs –
A dictionary with ints as keys, denoting the oscillators to split.
The values should themselves be dicts, with the following permitted
key/value pairs:
"separation" - A list of length equal to self.dim.
Indicates the frequency separation of the split oscillators in Hz.
If not specified, this will be the spectral resolution in each
dimension.
"number" - An int indicating how many oscillators to split
into. If not specified, this will be 2.
"amp_ratio" A list of floats with length equal to the number of
oscillators to be split into (see "number"). Specifies the
relative amplitudes of the oscillators. If not specified, the amplitudes
will be equal.
As an example for a 1D estimator:
split_oscs = {
    2: {
        "separation": 1.,  # if 1D, don't need a list
    },
    5: {
        "number": 3,
        "amp_ratio": [1., 2., 1.],
    },
}
Here, 2 oscillators will be split.
Oscillator 2 will be split into 2 (default) oscillators with
equal amplitude (default). These will be separated by 1Hz.
Oscillator 5 will be split into 3 oscillators with relative
amplitudes 1:2:1. These will be separated by self.sw()[0]/self.default_pts()[0] Hz (default).
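The defaults described above can be made explicit with a short sketch. The helper resolve_split_options is hypothetical; it only fills in the documented defaults and does not reproduce how the estimator actually places the new oscillators.

```python
def resolve_split_options(options, sw, pts):
    """Fill in the documented defaults for one entry of split_oscs."""
    number = options.get("number", 2)
    # Default separation: spectral resolution sw / pts in each dimension.
    separation = options.get("separation", [s / p for s, p in zip(sw, pts)])
    if not isinstance(separation, (list, tuple)):
        separation = [separation]  # scalar shorthand for 1D data
    amp_ratio = options.get("amp_ratio", [1.0] * number)
    if len(amp_ratio) != number:
        raise ValueError("amp_ratio must have one entry per new oscillator")
    return {
        "number": number,
        "separation": list(separation),
        "amp_ratio": list(amp_ratio),
    }

split_oscs = {
    2: {"separation": 1.0},
    5: {"number": 3, "amp_ratio": [1.0, 2.0, 1.0]},
}
# Assumed experiment parameters: 1000 Hz sweep width, 2048 points.
resolved = {
    index: resolve_split_options(options, sw=(1000.0,), pts=(2048,))
    for index, options in split_oscs.items()
}
```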
mirror_oscs – An iterable of oscillators to mirror (see the description above).
estimate_kwargs – Keyword arguments to provide to the call to estimate(). Note
that "initial_guess" and "region_unit" are set internally and
will be ignored if given.
(Optional, but highly advised) Generate a frequency-filtered “sub-FID”
corresponding to a specified region of interest.
(Optional) Generate an initial guess using the Minimum Description
Length (MDL) [2] and Matrix Pencil Method (MPM) [3][4][5][6].
Apply numerical optimisation to determine a final estimate of the signal
parameters. The optimisation routine employed is the Trust Newton Conjugate
Gradient (NCG) algorithm ([7], Algorithm 7.2).
Parameters:
region – The frequency range of interest. Should be of the form [left,right]
where left and right are the left and right bounds of the region
of interest in Hz or ppm (see region_unit). If None, the
full signal will be considered, though for sufficiently large and
complex signals performance is likely to be poor and slow.
noise_region – If region is not None, this must be of the form [left,right]
too. This should specify a frequency range where no noticeable signals
reside, i.e. only noise exists.
region_unit – One of "hz" or "ppm". Specifies the units that region
and noise_region have been given as.
initial_guess –
If None, an initial guess will be generated using the MPM
with the MDL being used to estimate the number of oscillators
present.
If an int, the MPM will be used to compute the initial guess with
the value given being the number of oscillators.
If a NumPy array, this array will be used as the initial guess.
hessian –
Specifies how to construct the Hessian matrix.
If "exact", the exact Hessian will be used.
If "gauss-newton", the Hessian will be approximated as is
done with the Gauss-Newton method. See the “Derivation from
Newton’s method” section of this article.
mode – A string containing a subset of the characters "a" (amplitudes),
"p" (phases), "f" (frequencies), and "d" (damping factors).
Specifies which types of parameters should be considered for optimisation.
In most scenarios, you are likely to want the default value, "apfd".
amp_thold –
A value that imposes a threshold for deleting oscillators of
negligible amplitude.
If None, does nothing.
If a float, oscillators with amplitudes satisfying aₘ < amp_thold × ‖a‖₂ will be
removed from the parameter array, where ‖a‖₂ is the Euclidean norm of the vector of
all the oscillator amplitudes. It is advised to set amp_thold
at least a couple of orders of magnitude below 1.
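A sketch of the thresholding step, assuming the criterion is that an oscillator is removed when its amplitude is smaller than amp_thold times the Euclidean norm of all amplitudes, with amplitudes stored in column 0. The function remove_negligible is hypothetical.

```python
import numpy as np

def remove_negligible(params, amp_thold):
    """Drop oscillators whose amplitude is below amp_thold * ||a||_2."""
    if amp_thold is None:
        return params  # thresholding disabled
    amps = params[:, 0]
    cutoff = amp_thold * np.linalg.norm(amps)
    return params[amps >= cutoff]

params = np.array([
    [10.0, 0.0, 100.0, 5.0],
    [0.001, 0.0, 200.0, 5.0],   # negligible compared with the others
    [5.0, 0.0, 300.0, 5.0],
])
kept = remove_negligible(params, amp_thold=1e-3)
```

With these values the cutoff is roughly 0.011, so only the second oscillator is removed.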
phase_variance – Whether or not to include the variance of oscillator phases in the cost
function. This should be set to True in cases where the signal being
considered is derived from well-phased data.
mpm_trim – Specifies the maximal size allowed for the filtered signal when
undergoing the Matrix Pencil. If None, no trimming is applied
to the signal. If an int, and the filtered signal has a size
greater than mpm_trim, this signal will be set as
signal[:mpm_trim].
nlp_trim – Specifies the maximal size allowed for the filtered signal when undergoing
nonlinear programming. By default (None), no trimming is applied to
the signal. If an int, and the filtered signal has a size greater than
nlp_trim, this signal will be set as signal[:nlp_trim].
max_iterations – A value specifying the number of iterations the routine may run
through before it is terminated. If None, a default number
of maximum iterations is set, based on the data dimension and
the value of hessian.
negative_amps –
Indicates how to treat oscillators which have gained negative
amplitudes during the optimisation.
"remove" will result in such oscillators being purged from
the parameter estimate. The optimisation routine will then be
re-run recursively until no oscillators have a negative
amplitude.
"flip_phase" will retain oscillators with negative
amplitudes, but the amplitudes will be multiplied by -1,
and a π radians phase shift will be applied.
"ignore" will do nothing (negative amplitude oscillators will remain).
output_mode –
Dictates what information is sent to stdout.
If None, nothing will be sent.
If 0, only a message on the outcome of the optimisation will
be sent.
If a positive int k, information on the cost function,
gradient norm, and trust region radius is sent every kth
iteration.
save_trajectory –
If True, a list of parameters at each iteration will be saved, and
accessible via the trajectory attribute.
Warning
Not implemented yet!
epsilon – Sets the convergence criterion. Convergence will occur when
.
eta –
Criterion for accepting an update. An update will be accepted if
the ratio of the actual reduction and the predicted reduction is
greater than eta:
initial_trust_radius – The initial value of the radius of the trust region.
max_trust_radius – The largest permitted radius for the trust region.
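The acceptance test controlled by eta can be sketched as follows. This is a generic trust-region acceptance check rather than the estimator's own code; the function accept_update and its arguments are hypothetical.

```python
def accept_update(cost_old, cost_new, predicted_reduction, eta):
    """Accept a step if actual / predicted reduction exceeds eta."""
    actual_reduction = cost_old - cost_new
    if predicted_reduction <= 0:
        # The quadratic model predicts no improvement, so reject the step.
        return False
    return actual_reduction / predicted_reduction > eta

good_step = accept_update(cost_old=10.0, cost_new=9.0, predicted_reduction=2.0, eta=0.15)
poor_step = accept_update(cost_old=10.0, cost_new=9.9, predicted_reduction=2.0, eta=0.15)
```

The first step achieves half of the predicted reduction and is accepted; the second achieves only 5% and is rejected.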
check_neg_amps_every – For every iteration that is a multiple of this, negative amplitudes
will be checked for and dealt with if found.
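The "flip_phase" treatment of negative amplitudes leaves the signal unchanged, because -a·exp(iφ) equals a·exp(i(φ + π)). A minimal sketch, assuming amplitudes in column 0 and phases in column 1; wrapping the phase into [-π, π) is an added assumption, and flip_negative_amps is a hypothetical name.

```python
import numpy as np

def flip_negative_amps(params):
    """Make negative amplitudes positive and shift their phases by π."""
    out = params.copy()
    negative = out[:, 0] < 0
    out[negative, 0] *= -1
    # Add π, then wrap the result into the interval [-π, π).
    out[negative, 1] = (out[negative, 1] + 2 * np.pi) % (2 * np.pi) - np.pi
    return out

params = np.array([
    [-2.0, -np.pi / 2, 100.0, 5.0],
    [1.0, 0.3, 50.0, 2.0],
])
flipped = flip_negative_amps(params)
```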
“The pickle module is not secure. Only unpickle data you trust.
It is possible to construct malicious pickle data which will
execute arbitrary code during unpickling. Never unpickle data
that could have come from an untrusted source, or that could have
been tampered with.”
You should only use from_pickle on files that you are 100%
certain were generated using to_pickle(). If you load
pickled data from a .pkl file, and the resulting output is not an
estimator object, an error will be raised.
Parameters:
path – The path to the pickle file. Do not include the .pkl suffix.
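The type check described above can be illustrated with the standard library alone. This sketch is not the from_pickle implementation: load_checked is a hypothetical helper, and a plain dict stands in for an estimator object.

```python
import os
import pickle
import tempfile

def load_checked(path, expected_type):
    """Unpickle path + ".pkl" and verify the type of the result."""
    with open(f"{path}.pkl", "rb") as fh:
        obj = pickle.load(fh)  # only do this with files you trust
    if not isinstance(obj, expected_type):
        raise TypeError(
            f"Expected {expected_type.__name__}, got {type(obj).__name__}"
        )
    return obj

with tempfile.TemporaryDirectory() as tmp:
    stem = os.path.join(tmp, "demo")  # note: no ".pkl" suffix
    with open(f"{stem}.pkl", "wb") as fh:
        pickle.dump({"a": 1}, fh)
    loaded = load_checked(stem, dict)
    try:
        load_checked(stem, list)
        rejected = False
    except TypeError:
        rejected = True
```

Note that the type check happens after unpickling, so it does not protect against malicious files; it only catches honest mistakes.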
Construct chemical shifts which reflect the experiment parameters.
Parameters:
pts – The number of points to construct the shifts with in each dimension.
If None, and self.default_pts is a tuple of ints, it will be
used.
unit – Must be one of "hz" or "ppm".
flip – If True, the shifts will be returned in descending order, as is
conventional in NMR. If False, the shifts will be in ascending order.
meshgrid – If shifts are being derived for an N-dimensional signal (N > 1),
setting this argument to True will return N-dimensional arrays
corresponding to all combinations of points in each dimension. If
False, an iterable of 1D arrays will be returned.
Construct time-points which reflect the experiment parameters.
Parameters:
pts – The number of points to construct the time-points with in each dimension.
If None, and self.default_pts is a tuple of ints, it will be
used.
start_time –
The start time in each dimension. If set to None, the initial
point in each dimension will be 0.0. To set non-zero start times,
a list of floats or strings can be used.
If floats are used, they specify the first value in each
dimension in seconds.
Strings of the form f'{N}dt', where N is an integer, may be
used, which indicates a certain multiple of the dwell time.
meshgrid – If time-points are being derived for an N-dimensional signal (N > 1),
setting this argument to True will return N-dimensional arrays
corresponding to all combinations of points in each dimension. If
False, an iterable of 1D arrays will be returned.
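The following sketch shows one way the start_time and meshgrid options could behave. It assumes a dwell time of 1 / sw in each dimension and is not the estimator's implementation; make_timepoints is a hypothetical function.

```python
import numpy as np

def make_timepoints(pts, sw, start_time=None, meshgrid=False):
    """Build time axes; start_time entries may be floats (s) or "<N>dt" strings."""
    if start_time is None:
        start_time = [0.0] * len(pts)
    axes = []
    for n, width, start in zip(pts, sw, start_time):
        dt = 1.0 / width  # assumed dwell time
        if isinstance(start, str):
            if not start.endswith("dt"):
                raise ValueError(f"Invalid start time: {start!r}")
            start = int(start[:-2]) * dt  # e.g. "2dt" -> 2 * dt
        axes.append(start + dt * np.arange(n))
    if meshgrid and len(axes) > 1:
        # "ij" indexing keeps the axis order (dimension 1, dimension 2, ...).
        return tuple(np.meshgrid(*axes, indexing="ij"))
    return tuple(axes)

t1, t2 = make_timepoints((2, 4), (50.0, 100.0), start_time=[0.0, "2dt"])
grid1, grid2 = make_timepoints((2, 4), (50.0, 100.0), meshgrid=True)
```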
pts – The number of points to construct the time-points with in each dimension.
If None, and self.default_pts is a tuple of ints, it will be
used.
snr – The signal-to-noise ratio. If None then no noise will be added
to the FID.
decibels – If True, the snr is taken to be in units of decibels. If False,
it is taken to be simply the ratio of the signal power over the
noise power.
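To make the two interpretations of snr concrete, here is a sketch that adds complex Gaussian noise at a target signal-to-noise ratio. It assumes power is measured as the mean squared magnitude and that decibels follow 10·log₁₀ of the power ratio; add_noise is a hypothetical function, not the estimator's method.

```python
import numpy as np

def add_noise(fid, snr, decibels=True, rng=None):
    """Return fid plus complex white noise at the requested SNR."""
    if snr is None:
        return fid.copy()  # noiseless
    rng = np.random.default_rng() if rng is None else rng
    ratio = 10 ** (snr / 10) if decibels else snr  # signal power / noise power
    signal_power = np.mean(np.abs(fid) ** 2)
    noise_power = signal_power / ratio
    sigma = np.sqrt(noise_power / 2)  # power split between real and imaginary
    noise = rng.normal(0.0, sigma, fid.shape) + 1j * rng.normal(0.0, sigma, fid.shape)
    return fid + noise
```

For example, snr=20 with decibels=True corresponds to a power ratio of 100, the same as snr=100 with decibels=False.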
indirect_modulation –
Acquisition mode in the indirect dimension if the data is 2D.
If the data is 1D, this argument is ignored.
None - hypercomplex dataset:
"amp" - amplitude modulated pair:
"phase" - phase-modulated pair:
None will lead to an array of shape (n1,n2). amp and phase
will lead to an array of shape (2,n1,n2), with fid[0] and
fid[1] being the two components of the pair.
Manually phase the data using a Graphical User Interface.
Parameters:
max_p1 – The largest permitted first order correction (rad). Set this to a larger
value than the default (10π) if you anticipate having to apply a
very large first order correction.
Create a new instance from a 2DJ Spinach simulation.
Parameters:
shifts – A list or tuple of chemical shift values for each spin.
couplings – The scalar couplings present in the spin system. Given shifts is of
length n, couplings should be an iterable with entries of the form
(i1,i2,coupling), where 1<=i1,i2<=n are the indices of
the two spins involved in the coupling, and coupling is the value
of the scalar coupling in Hz. None will set all spins to be
uncoupled.
pts – The number of points the signal comprises.
sw – The sweep width of the signal (Hz).
offset – The transmitter offset (Hz).
field – The magnetic field strength (T).
nucleus – The identity of the nucleus targeted in the pulse sequence.
snr – The signal-to-noise ratio of the resulting signal, in decibels. None
produces a noiseless signal.
lb – Line broadening (exponential damping) to apply to the signal.
The first point will be unaffected by damping, and the final point will
be multiplied by np.exp(-lb). The default results in the final
point being decreased in value by a factor of roughly 1000.
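The damping described for lb can be sketched as an exponential window running from 1 at the first point to np.exp(-lb) at the last. The default of 6.91 used here is an assumption, chosen because exp(-6.91) is approximately 1/1000; apply_line_broadening is a hypothetical function acting on the last axis.

```python
import numpy as np

def apply_line_broadening(fid, lb=6.91):
    """Multiply fid by an exponential decaying from 1 to exp(-lb)."""
    pts = fid.shape[-1]
    window = np.exp(-lb * np.linspace(0.0, 1.0, pts))
    return fid * window

damped = apply_line_broadening(np.ones(8))
```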
The figure includes a contour plot of the 2DJ spectrum, a 1D plot of the
first slice through the indirect dimension, plots of estimated multiplets,
and a plot of cupid_spectrum().
Frequency threshold for multiplet prediction. All oscillators that make
up a multiplet are assumed to obey the following expression:
where is the central frequency of the multiplet, and f_t is
the threshold.
high_resolution_pts – Indicates the number of points used to generate the multiplet structures
and cupid_spectrum(). Should be greater than or equal to
self.default_pts[1].
ratio_1d_2d – The relative heights of the regions containing the 1D spectra and the
2DJ spectrum.
axes_left – The position of the left edge of the axes, in figure coordinates. Should be between 0. and 1..
axes_right – The position of the right edge of the axes, in figure coordinates. Should
be between 0. and 1..
axes_top – The position of the top edge of the axes, in figure coordinates. Should
be between 0. and 1..
axes_bottom – The position of the bottom edge of the axes, in figure coordinates. Should
be between 0. and 1..
axes_region_separation – The extent by which adjacent regions are separated in the figure.
xaxis_label_height – The vertical location of the x-axis label, in figure coordinates. Should
be between 0. and 1., though you are likely to want this to be
only slightly larger than 0..
contour_base – The lowest level for the contour levels in the 2DJ spectrum plot.
contour_nlevels – The number of contour levels in the 2DJ spectrum plot.
contour_factor – The geometric scaling factor for adjacent contours in the 2DJ spectrum
plot.
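The three contour parameters above plausibly define a geometric series of levels. The following sketch shows that construction; contour_levels is a hypothetical helper rather than the library's code.

```python
import numpy as np

def contour_levels(base, nlevels, factor):
    """Levels base, base * factor, base * factor**2, ..."""
    return base * factor ** np.arange(nlevels)

levels = contour_levels(base=100.0, nlevels=4, factor=2.0)
```

An array like this can be passed as the levels argument when drawing a contour plot with matplotlib.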
contour_lw – The linewidth of contours in the 2DJ spectrum plot.
contour_color – The color of the 2DJ spectrum plot.
jres_sinebell – If True, applies sine-bell apodisation to the 2DJ spectrum.
multiplet_colors – Describes how to color multiplets. See color cycle for options.
multiplet_lw – Line width of multiplet spectra.
multiplet_vertical_shift – The vertical displacement of adjacent multiplets, as a multiple of
multiplet_lw. Set to 0. if you want all multiplets to lie on the
same line.
multiplet_show_center_freq – If True, lines are plotted on the 2DJ spectrum indicating the central
frequency of each multiplet.
multiplet_show_45 – If True, lines are plotted on the 2DJ spectrum indicating the 45° line
along which peaks lie in each multiplet.
marker_size – The size of markers indicating positions of peaks on the 2DJ contour plot.
marker_shape – The shape of markers indicating positions of peaks on the 2DJ contour plot.
kwargs – Keyword arguments provided to matplotlib.pyplot.figure. Allowed arguments include
figsize, facecolor, edgecolor, etc.
Returns:
fig – The result figure. This can be saved to various formats using the
savefig method.
axs – A (2,N) NumPy array of the axes used for plotting. The first row
of axes contains the 1D plots. The second row contains the 2DJ
contour plots.
force_overwrite – If path already exists and force_overwrite is set to False,
the user will be asked to confirm whether they are happy to
overwrite the file. If True, the file will be overwritten
without prompt.
fprint – Specifies whether or not to print information to the terminal.
Return an FID where direct dimension frequencies are perturbed such that:
This should yield a signal where all components in a multiplet are centered
at the spin’s chemical shift in the direct dimension, akin to performing
a 45° tilt.
pts – The number of points to construct the signal from. If None,
self.default_pts will be used.
indirect_modulation –
Acquisition mode in the indirect dimension.
None - hypercomplex dataset:
"amp" - amplitude modulated pair:
"phase" - phase-modulated pair:
None will lead to an array of shape (n1,n2). amp and phase
will lead to an array of shape (2,n1,n2), with fid[0] and
fid[1] being the two components of the pair.
Perform estimation on the entire signal via estimation of
frequency-filtered sub-bands.
This method splits the signal up into nsubbands equally-sized regions
and extracts parameters from each region before finally concatenating all
the results together.
Warning
This method is a work-in-progress. It is unlikely to produce decent
results at the moment! I aim to improve the way that regions are
created in the future.
Parameters:
noise_region – Specifies a frequency range where no noticeable signals reside, i.e. only
noise exists.
noise_region_unit – One of "hz" or "ppm". Specifies the units that noise_region
has been given in.
nsubbands – The number of sub-bands to break the signal into. If None, the number
will be set as the nearest integer to the data size divided by 500.
estimate_kwargs – Keyword arguments to give to estimate(). Note that region
and initial_guess will be ignored.
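The default choice of nsubbands and the equal-width split can be sketched as below. The function subband_regions is hypothetical; it assumes the spectrum spans offset ± sw / 2 in Hz and that "data size" means the number of points in the direct dimension.

```python
import numpy as np

def subband_regions(size, sw, offset, nsubbands=None):
    """Split the spectral window into equally-sized [left, right] regions (Hz)."""
    if nsubbands is None:
        # Nearest integer to size / 500, with at least one sub-band.
        nsubbands = max(1, round(size / 500))
    # Edges run from high to low frequency, as is conventional in NMR.
    edges = np.linspace(offset + sw / 2, offset - sw / 2, nsubbands + 1)
    return [(float(left), float(right)) for left, right in zip(edges[:-1], edges[1:])]

regions = subband_regions(size=2048, sw=1000.0, offset=500.0)
```

With 2048 points the default gives four sub-bands, each 250 Hz wide.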
Save the estimator to a byte stream using Python’s pickling protocol.
Parameters:
path – Path of file to save the byte stream to. Do not include the
".pkl" suffix. If None, ./estimator_<x>.pkl will be
used, where <x> is the first number that doesn’t cause a clash
with an already existent file.
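The default naming rule can be sketched with pathlib. Starting the count at 1 is an assumption, and default_pickle_path is a hypothetical helper rather than part of to_pickle.

```python
import tempfile
from pathlib import Path

def default_pickle_path(directory="."):
    """Return the first estimator_<x>.pkl path that does not exist yet."""
    x = 1  # assumed starting index
    while (candidate := Path(directory) / f"estimator_{x}.pkl").exists():
        x += 1
    return candidate

with tempfile.TemporaryDirectory() as tmp:
    first = default_pickle_path(tmp).name
    (Path(tmp) / first).touch()  # simulate saving an estimator
    second = default_pickle_path(tmp).name
```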
force_overwrite –
Defines behaviour if the specified path already exists:
If False, the user will be prompted if they are happy
overwriting the current file.
If True, the current file will be overwritten without prompt.
fprint – Specifies whether or not to print information to the terminal.
pts – The number of points to construct the dataset from.
force_overwrite – If False, and <path>/<expno>/ already exists, the user will
be asked if they are happy to overwrite. If True, overwriting
will take place without asking.
pts – The number of points to construct the multiplets from.
thold –
Frequency threshold for multiplet prediction. All oscillators that make
up a multiplet are assumed to obey the following expression:
where is the central frequency of the multiplet, and f_t is
thold
force_overwrite – If False, and any directories that will be written to already exist,
you will be prompted to confirm that you are happy to overwrite. If True,
overwriting will take place without asking.
fmt – Must be one of "txt" or "pdf". If you wish to generate a PDF, you
must have a LaTeX installation. See LaTeX (Optional).
description – Descriptive text to add to the top of the file.
sig_figs – The number of significant figures to give to parameters. If
None, the full value will be used. By default this is set to 5.
sci_lims – Given a value (-x,y) with ints x and y, any parameter p
with a value which satisfies p<10**-x or p>=10**y will be
expressed in scientific notation. If None, scientific notation
will never be used.
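The interaction between sig_figs and sci_lims can be sketched as follows. The default sci_lims of (-2, 3), the use of the absolute value of the parameter, and the treatment of zero are all assumptions; format_parameter is a hypothetical helper.

```python
import math

def format_parameter(value, sig_figs=5, sci_lims=(-2, 3)):
    """Format value to sig_figs, switching to scientific notation per sci_lims."""
    if sig_figs is None:
        return str(value)  # full value
    if value == 0:
        return f"{0:.{sig_figs - 1}f}"
    magnitude = abs(value)
    if sci_lims is not None:
        x, y = -sci_lims[0], sci_lims[1]
        if magnitude < 10 ** -x or magnitude >= 10 ** y:
            return f"{value:.{sig_figs - 1}e}"
    # Fixed-point: choose the decimals needed for sig_figs significant figures.
    decimals = max(sig_figs - 1 - math.floor(math.log10(magnitude)), 0)
    return f"{value:.{decimals}f}"
```

With sci_lims=(-2, 3), values below 0.01 or at or above 1000 are written in scientific notation.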
integral_mode –
One of "relative" or "absolute".
If "relative", the smallest integral will be set to 1,
and all other integrals will be scaled accordingly.
If "absolute", the absolute integral will be computed. This
should be used if you wish to directly compare different datasets.
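A short sketch of the two integral modes, assuming the integrals have already been computed and are positive. The function scale_integrals is hypothetical.

```python
import numpy as np

def scale_integrals(integrals, integral_mode="relative"):
    """Scale integrals so the smallest is 1 ("relative") or leave them as-is."""
    integrals = np.asarray(integrals, dtype=float)
    if integral_mode == "relative":
        return integrals / integrals.min()
    if integral_mode == "absolute":
        return integrals
    raise ValueError('integral_mode must be "relative" or "absolute"')

relative = scale_integrals([2.0, 4.0, 6.0])
absolute = scale_integrals([2.0, 4.0, 6.0], integral_mode="absolute")
```

Relative integrals are convenient within one dataset, but they discard the overall scale, which is why absolute integrals are needed to compare different datasets.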
force_overwrite –
Defines behaviour if the specified path already exists:
If False, the user will be prompted if they are happy
overwriting the current file.
If True, the current file will be overwritten without prompt.
fprint – Specifies whether or not to print information to the terminal.
pdflatex_exe –
The path to the system’s pdflatex executable.
Note
You are unlikely to need to set this manually. It is primarily
present to specify the path to pdflatex.exe on Windows when
the NMR-EsPy GUI has been loaded from TopSpin.