wdtools

Submodules

Package Contents

Classes

GFP

Generative Fitting Pipeline.

SpecTools

Spectrum processing tools and functions.

LineProfiles

Class to fit Voigt profiles to the Balmer absorption lines of DA white dwarfs, and then infer stellar labels.

Functions

gaia_cov(parallax_error, pmra_error, pmdec_error, parallax_pmra_corr, parallax_pmdec_corr, pmra_pmdec_corr)

log_exp_dec_prior(parallax, L=1350)

log_mvnorm(x, mu, cov)

get_post_samples(obj, walkers, burn, steps, progress=False, L=1350)

get_vt_samples(obj, nsample)

get_distance_samples(obj, nburn=100.0, nsample=1000.0, progress=False)

get_distance_mode(obj, L=1350)

plot_orbits(name, obj, rv, e_rv, nmc=10000, norbit=50)

teff3d(teff, logg)

logg3d(teff, logg)

corr3d(teff, logg)

class wdtools.GFP(resolution=3, specclass='DA')

Generative Fitting Pipeline.

label_sc(self, label_array)

Label scaler to transform Teff and logg to [0,1] interval based on preset bounds.

Parameters

label_array (array) – Unscaled array with Teff in the first column and logg in the second column

Returns

Scaled array

Return type

array

inv_label_sc(self, label_array)

Inverse label scaler to transform Teff and logg from [0,1] to original scale based on preset bounds.

Parameters

label_array (array) – Scaled array with Teff in the first column and logg in the second column

Returns

Unscaled array

Return type

array
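The two scalers above amount to a min-max transform and its inverse. The following is a minimal sketch of that idea only; the bounds and the function names `scale_labels` and `unscale_labels` are hypothetical and are not taken from wdtools, which uses its own preset bounds.

```python
# Hypothetical bounds, chosen only for illustration; GFP uses its own presets.
TEFF_BOUNDS = (5000.0, 80000.0)
LOGG_BOUNDS = (6.5, 9.5)


def scale_labels(rows, bounds=(TEFF_BOUNDS, LOGG_BOUNDS)):
    """Map each (teff, logg) row onto [0, 1] using fixed (low, high) bounds."""
    return [
        [(value - low) / (high - low) for value, (low, high) in zip(row, bounds)]
        for row in rows
    ]


def unscale_labels(rows, bounds=(TEFF_BOUNDS, LOGG_BOUNDS)):
    """Invert scale_labels: map [0, 1] values back to physical units."""
    return [
        [value * (high - low) + low for value, (low, high) in zip(row, bounds)]
        for row in rows
    ]


labels = [[12000.0, 8.0], [42500.0, 7.25]]   # Teff in column 0, logg in column 1
scaled = scale_labels(labels)
restored = unscale_labels(scaled)
```

Round-tripping through both functions should return the original labels up to floating-point error.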

spec_sc(self, spec)
inv_spec_sc(self, spec)
generator(self, H, n_pix)
synth_spectrum_sampler(self, wl, teff, logg, rv, specclass=None)

Generates synthetic spectra from labels using the neural network, translated by some radial velocity. These are _not_ interpolated onto the requested wavelength grid; the interpolation is performed only once, after the Gaussian convolution with the instrument resolution in GFP.spectrum_sampler. Use GFP.spectrum_sampler in most cases.

Parameters
  • wl (array) – Array of spectral wavelengths (included for completeness, not used by this function)

  • teff (float) – Effective surface temperature of sampled spectrum

  • logg (float) – log surface gravity of sampled spectrum (cgs)

  • rv (float) – Radial velocity (redshift) of sampled spectrum in km/s

  • specclass (str ['DA', 'DB']) – Whether to use hydrogen-rich (DA) or helium-rich (DB) atmospheric models. If None, uses default.

Returns

Synthetic spectrum with the desired parameters, translated by the radial velocity. As stated in the description, it is not interpolated onto the supplied wavelength grid.

Return type

array
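As an illustration of what translating a spectrum by a radial velocity means, here is a minimal sketch of a non-relativistic Doppler shift of a wavelength grid. The helper name `doppler_shift_wl` is hypothetical, and the exact formula used inside wdtools is not stated in this reference.

```python
C_KMS = 299792.458  # speed of light in km/s


def doppler_shift_wl(wl, rv_kms):
    """Shift wavelengths by a radial velocity in km/s (positive = redshift).

    Uses the non-relativistic approximation, which is an assumption here.
    """
    factor = 1.0 + rv_kms / C_KMS
    return [w * factor for w in wl]


rest = [4862.68, 6564.61]                # H-beta and H-alpha centroids in Angstroms
shifted = doppler_shift_wl(rest, 100.0)  # redshift by 100 km/s
delta_alpha = shifted[1] - rest[1]       # about 2.19 Angstroms at H-alpha
```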

spectrum_sampler(self, wl, teff, logg, *polyargs, specclass=None)

Wrapper function that talks to the generative neural network in scaled units, and also performs the Gaussian convolution to instrument resolution.

Parameters
  • wl (array) – Array of spectral wavelengths on which to generate the synthetic spectrum

  • teff (float) – Effective surface temperature of sampled spectrum

  • logg (float) – log surface gravity of sampled spectrum (cgs)

  • polyargs (float, optional) – All subsequent positional arguments are assumed to be coefficients for the additive Chebyshev polynomial. If none are provided, no polynomial is added to the model spectrum.

  • specclass (str, optional) – Whether to use hydrogen-rich (DA) or helium-rich (DB) atmospheric models. If None, reverts to default.

Returns

Synthetic spectrum with desired parameters, interpolated onto the supplied wavelength grid and convolved with the instrument resolution.

Return type

array
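To make the role of `polyargs` concrete, the sketch below evaluates an additive Chebyshev polynomial on a wavelength grid and adds it to a model spectrum. It assumes the wavelengths are rescaled linearly onto [-1, 1] and that the first coefficient multiplies the constant term; neither detail is confirmed by this reference, and `chebyshev_additive` is a hypothetical name.

```python
def chebyshev_additive(wl, coeffs):
    """Evaluate sum_k coeffs[k] * T_k(x), with wl rescaled linearly onto [-1, 1]."""
    lo, hi = min(wl), max(wl)
    values = []
    for w in wl:
        x = 2.0 * (w - lo) / (hi - lo) - 1.0
        t_prev, t_curr = 1.0, x          # T_0(x) and T_1(x)
        total = 0.0
        for k, c in enumerate(coeffs):
            if k == 0:
                total += c * t_prev
            elif k == 1:
                total += c * t_curr
            else:
                # Recurrence: T_k = 2 x T_{k-1} - T_{k-2}
                t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
                total += c * t_curr
        values.append(total)
    return values


wl = [4000.0, 5000.0, 6000.0]
model = [1.0, 0.8, 1.0]                  # stand-in for a normalized model spectrum
poly = chebyshev_additive(wl, [0.01, 0.02, 0.03])
model_with_poly = [m + p for m, p in zip(model, poly)]
no_poly = chebyshev_additive(wl, [])     # no coefficients: nothing is added
```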

spline_norm_DA(self, wl, fl, ivar, kwargs=dict(k=3, sfac=1, niter=3), crop=None)

Masks out Balmer lines, fits a smoothing spline to the continuum, and returns a continuum-normalized spectrum.

Parameters
  • wl (array) – Array of observed spectral wavelengths.

  • fl (array) – Array of observed spectral fluxes.

  • ivar (array) – Array of observed inverse-variance.

  • kwargs (dict, optional) – Keyword arguments that are passed to the spline normalization function

  • crop (tuple, optional) – Defines a start and end wavelength to crop the spectrum to before continuum-normalization.

Returns

If crop is None, returns a 2-tuple of (normalized_flux, normalized_ivar). If a crop region is provided, then returns a 3-tuple of (cropped_wavelength, cropped_normalized_flux, cropped_normalized_ivar).

Return type

tuple

fit_spectrum(self, wl, fl, ivar=None, prior_teff=None, mcmc=False, fullspec=False, polyorder=0, norm_kw=dict(k=1, sfac=0.5, niter=0), nwalkers=25, burn=25, ndraws=25, threads=1, progress=True, plot_init=False, make_plot=True, plot_corner=False, plot_corner_full=False, plot_trace=False, savename=None, DA=True, crop=(3600, 7500), verbose=True, lines=['alpha', 'beta', 'gamma', 'delta', 'eps', 'h8'], lmfit_kw=dict(method='leastsq', epsfcn=0.1), rv_kw=dict(plot=False, distance=100, nmodel=2, edge=15), nteff=3, rv_line='alpha', corr_3d=False)

Main fitting routine, takes a continuum-normalized spectrum and fits it with MCMC to recover stellar labels.

Parameters
  • wl (array) – Array of observed spectral wavelengths

  • fl (array) – Array of observed spectral fluxes, continuum-normalized. We recommend using the included normalize_balmer function from wdtools.spectrum to normalize DA spectra, and the generic continuum_normalize function for DB spectra.

  • ivar (array) – Array of observed inverse-variance for uncertainty estimation. If this is not available, use ivar = None to infer a constant inverse variance mask using a second-order beta-sigma algorithm. In this case, since the errors are approximated, the chi-square likelihood may be inexact - treat returned uncertainties with caution.

  • prior_teff (tuple, optional) – Tuple of (mean, sigma) to define a Gaussian prior on the effective temperature parameter. This is especially useful if there is strong prior knowledge of temperature from photometry. If not provided, a flat prior is used.

  • mcmc (bool, optional) – Whether to run MCMC, or simply return the errors estimated by LMFIT

  • fullspec (bool, optional) – Whether to fit the entire continuum-normalized spectrum, or only the Balmer lines.

  • polyorder (int, optional) – Order of additive Chebyshev polynomial during the fitting process. This can usually be left at zero unless the continuum normalization is poor.

  • norm_kw (dict, optional) – Dictionary of keyword arguments that are passed to the spline normalization routine.

  • nwalkers (int, optional) – Number of independent MCMC ‘walkers’ that will explore the parameter space

  • burn (int, optional) – Number of steps to run and discard at the start of sampling to ‘burn-in’ the posterior parameter distribution. If initializing from a high-probability point, keep this value high to avoid under-estimating uncertainties.

  • ndraws (int, optional) – Number of ‘production’ steps after the burn-in. The final number of posterior samples will be nwalkers * ndraws.

  • threads (int, optional) – Number of threads for distributed sampling.

  • progress (bool, optional) – Whether to show a progress bar during the MCMC sampling.

  • plot_init (bool, optional) – Whether to plot the continuum-normalization routine

  • make_plot (bool, optional) – If True, produces a plot of the best-fit synthetic spectrum over the observed spectrum.

  • plot_corner (bool, optional) – Makes a corner plot of the fitted stellar labels

  • plot_corner_full (bool, optional) – Makes a corner plot of all sampled parameters, the stellar labels plus any Chebyshev coefficients if polyorder > 0

  • plot_trace (bool, optional) – If True, plots the trace of posterior samples of each parameter for the production steps. Can be used to visually determine the quality of mixing of the chains, and ascertain if a longer burn-in is required.

  • savename (str, optional) – If provided, the corner plot and best-fit plot will be saved as PDFs in the working folder.

  • DA (bool, optional) – Whether the star is a DA white dwarf or not. As of now, this must be set to True.

  • crop (tuple, optional) – The region to crop the supplied spectrum before proceeding with the fit. Can be used to exclude low-SN regions at the edge of the spectrum.

  • verbose (bool, optional) – If True, the routine prints several progress statements to the terminal.

  • lines (array, optional) – List of Balmer lines to utilize in the fit. Defaults to all from H-alpha to H8.

  • lmfit_kw (dict, optional) – Dictionary of keyword arguments to the LMFIT solver

  • rv_kw (dict, optional) – Dictionary of keyword arguments to the RV fitting routine

  • nteff (int, optional) – Number of equidistant temperatures to try as initialization points for the minimization routine.

  • rv_line (str, optional) – Which Balmer line to use for the radial velocity fit. We recommend ‘alpha’.

  • corr_3d (bool, optional) – If True, applies 3D corrections from Tremblay et al. (2013) to stellar parameters before returning them.

Returns

Returns the fitted stellar labels along with a reduced chi-square statistic with the format: [[labels], [e_labels], redchi]. If polyorder > 0, then the returned arrays include the Chebyshev coefficients. The radial velocity (and RV error) are always the last elements in the array, so if polyorder > 0, the label array will have temperature, surface gravity, the Chebyshev coefficients, and then RV.

Return type

array
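Because the layout of the returned arrays changes with polyorder, it can help to unpack them by position. The following is a minimal sketch that relies only on the ordering documented above (Teff, logg, any Chebyshev coefficients, then RV). The helper `unpack_fit_result` is hypothetical, and the numbers in `mock_result` are invented for illustration rather than produced by a real fit.

```python
def unpack_fit_result(result):
    """Split [[labels], [e_labels], redchi] into named (value, error) pairs.

    Assumes the documented ordering: Teff, logg, Chebyshev coefficients, RV.
    """
    labels, e_labels, redchi = result
    return {
        "teff": (labels[0], e_labels[0]),
        "logg": (labels[1], e_labels[1]),
        # Everything between logg and the final RV entry is a polynomial term.
        "cheb": list(zip(labels[2:-1], e_labels[2:-1])),
        "rv": (labels[-1], e_labels[-1]),
        "redchi": redchi,
    }


# Invented values mimicking a fit that returned two Chebyshev coefficients.
mock_result = [
    [12500.0, 8.02, 0.01, -0.003, 25.0],
    [150.0, 0.05, 0.002, 0.001, 4.0],
    1.08,
]
fit = unpack_fit_result(mock_result)
```

With polyorder = 0 the "cheb" entry is simply an empty list, so the same helper covers both cases.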

class wdtools.SpecTools(plot_continuum=False, smoothing=1e-15, filter_skylines=True, crop=True)

Spectrum processing tools and functions.

continuum_normalize(self, wl, fl, ivar=None)

Continuum-normalization with smoothing splines that avoid a pre-made list of absorption lines for DA and DB spectra. To normalize spectra that only have Balmer lines (DA), we recommend using the normalize_balmer function instead. Also crops the spectrum to the 3700 - 7000 Angstrom range.

Parameters
  • wl (array) – Wavelength array of spectrum

  • fl (array) – Flux array of spectrum

  • ivar (array, optional) – Inverse variance array. If None, will return only the normalized wavelength and flux.

Returns

Tuple of cropped wavelength, cropped and normalized flux, and (if ivar is not None) cropped and normalized inverse variance array.

Return type

tuple

normalize_line(self, wl, fl, ivar, centroid, distance, make_plot=False, return_centre=False)

Continuum-normalization of a single absorption line by fitting a linear model added to a Voigt profile to the spectrum, and dividing out the linear model.

Parameters
  • wl (array) – Wavelength array of spectrum

  • fl (array) – Flux array of spectrum

  • ivar (array, optional) – Inverse variance array. If None, will return only the normalized wavelength and flux.

  • centroid (float) – The theoretical centroid of the absorption line that is being fitted, in wavelength units.

  • distance (float) – Distance in Angstroms away from the line centroid to include in the fit. Should include the entire absorption line wings with minimal continuum.

  • make_plot (bool, optional) – Whether to plot the linear + Voigt fit. Use for debugging.

Returns

Tuple of cropped wavelength, cropped and normalized flux, and (if ivar is not None) cropped and normalized inverse variance array.

Return type

tuple

normalize_balmer(self, wl, fl, ivar=None, lines=['alpha', 'beta', 'gamma', 'delta'], skylines=False, make_plot=False, make_subplot=False, make_stackedplot=False, centroid_dict=dict(alpha=6564.61, beta=4862.68, gamma=4341.68, delta=4102.89, eps=3971.2, h8=3890.12), distance_dict=dict(alpha=300, beta=200, gamma=120, delta=75, eps=50, h8=25), sky_fill=np.nan)

Continuum-normalization of any spectrum by fitting each line individually.

Fits every absorption line by fitting a linear model added to a Voigt profile to the spectrum, and dividing out the linear model. All normalized lines are concatenated and returned. For statistical and plotting purposes, two adjacent lines should not have overlapping regions (governed by the distance_dict).

Parameters
  • wl (array) – Wavelength array of spectrum

  • fl (array) – Flux array of spectrum

  • ivar (array, optional) – Inverse variance array. If None, will return only the normalized wavelength and flux.

  • lines (array-like, optional) – Array of which Balmer lines to include in the fit. Can be any combination of [‘alpha’, ‘beta’, ‘gamma’, ‘delta’, ‘eps’, ‘h8’]

  • skylines (bool, optional) – If True, masks out pre-selected telluric features and replace them with np.nan.

  • make_plot (bool, optional) – Whether to plot the continuum-normalized spectrum.

  • make_subplot (bool, optional) – Whether to plot each individual fit of the linear + Voigt profiles. Use for debugging.

  • make_stackedplot (bool, optional) – Plot continuum-normalized lines stacked with a common centroid, vertically displaced for clarity.

  • centroid_dict (dict, optional) – Dictionary of centroid names and theoretical wavelengths. Change this if your wavelength calibration is different from SDSS.

  • distance_dict (dict, optional) – Dictionary of centroid names and distances from the centroid to include in the normalization process. Should include the entire wings of each line and minimal continuum. No two adjacent lines should have overlapping regions.

  • sky_fill (float) – What value to replace the telluric features with on the normalized spectrum. Defaults to np.nan.

Returns

Tuple of cropped wavelength, cropped and normalized flux, and (if ivar is not None) cropped and normalized inverse variance array.

Return type

tuple
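The requirement that adjacent lines have non-overlapping regions can be checked before calling the routine. Below is a minimal sketch using the default centroid_dict and distance_dict values from the signature above; the helper `overlapping_lines` is hypothetical and not part of wdtools.

```python
CENTROIDS = dict(alpha=6564.61, beta=4862.68, gamma=4341.68,
                 delta=4102.89, eps=3971.2, h8=3890.12)
DISTANCES = dict(alpha=300, beta=200, gamma=120, delta=75, eps=50, h8=25)


def overlapping_lines(centroid_dict, distance_dict):
    """Return (blue_line, red_line) pairs whose fitting windows overlap."""
    # Sort the lines by wavelength so that only neighbours are compared.
    names = sorted(distance_dict, key=lambda name: centroid_dict[name])
    pairs = []
    for blue, red in zip(names, names[1:]):
        blue_upper = centroid_dict[blue] + distance_dict[blue]
        red_lower = centroid_dict[red] - distance_dict[red]
        if blue_upper > red_lower:
            pairs.append((blue, red))
    return pairs


default_overlaps = overlapping_lines(CENTROIDS, DISTANCES)
# Widening the H-epsilon window by 10 Angstroms collides with both neighbours.
widened_overlaps = overlapping_lines(CENTROIDS, dict(DISTANCES, eps=60))
```

With the defaults no pair overlaps, whereas widening the `eps` window to 60 Angstroms makes it run into both H8 and H-delta.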

find_nearest(self, array, value)
interpolate(self, wl, flux, target_wl=np.arange(4000, 8000))
linear(self, wl, p1, p2)

Linear polynomial of degree 1

chisquare(self, residual)

Chi^2 statistics from residual

Unscaled chi^2 statistic from an array of residuals (does not account for uncertainties).
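A minimal sketch of an unscaled chi^2, assuming it is the plain sum of squared residuals (the description suggests this, but the implementation is not shown in this reference). The data and model values are invented for illustration.

```python
def chisquare(residual):
    """Unscaled chi^2: the sum of squared residuals, with no error weighting."""
    return sum(r * r for r in residual)


data = [1.0, 0.9, 0.7, 0.9]     # invented normalized fluxes
model = [1.0, 0.8, 0.8, 1.0]    # invented model fluxes
residual = [d - m for d, m in zip(data, model)]
chi2 = chisquare(residual)
```

Since no inverse variance enters the sum, the value is only meaningful for comparing models against the same data.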

find_centroid(self, wl, flux, centroid, half_window=25, window_step=2, n_fit=12, make_plot=False, pltname='', debug=False, normalize=True)

Statistical inference of spectral redshift by iteratively fitting Voigt profiles to cropped windows around the line centroid.

Parameters
  • wl (array) – Wavelength array of spectrum

  • flux (array) – Flux array of spectrum

  • centroid (float) – Theoretical wavelength of line centroid

  • half_window (float, optional) – Distance in Angstroms from the theoretical centroid to include in the fit

  • window_step (float, optional) – Step size in Angstroms to reduce the half-window size after each fitting iteration

  • n_fit (int, optional) – Number of iterated fits to perform

  • make_plot (bool, optional) – Whether to plot the absorption line with all fits overlaid.

  • pltname (str, optional) – If not ‘’, saves the plot to the supplied path with whatever extension you specify.

Returns

Tuple of 3 values: the mean fitted centroid across iterations, the propagated uncertainty reported by the fitting routine, and the standard deviation of the centroid across all iterations. We find the latter is a good estimator of statistical uncertainty in the fitted centroid.

Return type

tuple
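The shrinking-window iteration and its summary statistics can be sketched as follows. To stay dependency-free, a depth-weighted mean wavelength stands in for the Voigt-profile fit, so this is not the wdtools method; the helper names and the synthetic line are hypothetical. Only the window schedule (half_window reduced by window_step on each of n_fit iterations) and the mean and standard deviation of the centroid across iterations follow the description above.

```python
import math
import statistics


def window_centroid(wl, flux, center, half_window):
    """Depth-weighted mean wavelength inside [center - hw, center + hw]."""
    pairs = [(w, 1.0 - f) for w, f in zip(wl, flux) if abs(w - center) <= half_window]
    total_depth = sum(d for _, d in pairs)
    return sum(w * d for w, d in pairs) / total_depth


def iterate_centroid(wl, flux, center, half_window=25, window_step=2, n_fit=12):
    """Repeat the estimate on shrinking windows; return (mean, scatter)."""
    estimates = []
    for i in range(n_fit):
        hw = half_window - i * window_step
        if hw <= 0:
            break
        estimates.append(window_centroid(wl, flux, center, hw))
    return statistics.mean(estimates), statistics.stdev(estimates)


# Synthetic normalized H-alpha line, displaced from 6564.61 to 6566.0 Angstroms.
wl = [6500.0 + 0.25 * i for i in range(521)]
flux = [1.0 - 0.5 * math.exp(-0.5 * ((w - 6566.0) / 1.5) ** 2) for w in wl]
mean_centroid, scatter = iterate_centroid(wl, flux, 6564.61)
```

The scatter across window sizes plays the role of the third returned value, the standard deviation of the centroid across iterations.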

doppler_shift(self, wl, fl, dv)
xcorr_rv(self, wl, fl, temp_wl, temp_fl, init_rv=0, rv_range=500, npoint=None)
quad_max(self, rv, cc)
get_one_rv(self, wl, fl, temp_wl, temp_fl, r1=1000, p1=100, r2=100, p2=100, plot=False)
get_rv(self, wl, fl, ivar, temp_wl, temp_fl, N=100, kwargs={})
spline_norm(self, wl, fl, ivar, exclude_wl, sfac=1, k=3, plot=False, niter=0)
get_line_rv(self, wl, fl, ivar, centroid, template=None, return_template=False, distance=50, edge=10, nmodel=2, plot=False, rv_kwargs={}, init_width=20, init_amp=5)
class wdtools.LineProfiles(verbose=False, plot_profiles=False, n_trees=25, n_bootstrap=25, lines=['alpha', 'beta', 'gamma', 'delta'], optimizer='leastsq')

Class to fit Voigt profiles to the Balmer absorption lines of DA white dwarfs, and then infer stellar labels.

Probabilistic prediction uses 100 bootstrapped random forest models with 25 trees each, trained on 5326 spectra from the Sloan Digital Sky Survey. Ground truth labels are taken from Tremblay et al. (2019). Line profiles are fit using the LMFIT package via chi^2 minimization.

linear(self, wl, p1, p2)
chisquare(self, residual)
initialize(self)

Initializes the random forest models by training them on the pre-supplied dataset of parameters. This only needs to be done once for each combination of absorption lines. The model is then pickled and saved for future use in the models/ directory.

fit_line(self, wl, flux, centroid, window=400, edges=200, make_plot=False)

Fit a Voigt profile around a specified centroid on the spectrum.

The continuum is normalized at each absorption line via a simple linear polynomial through the edges. Window size and edge size can be modified.

Parameters
  • wl (array) – Wavelength array of spectrum

  • flux (array) – Flux array of spectrum

  • centroid (float) – The theoretical centroid of the absorption line that is being fitted, in wavelength units.

  • window (float, optional) – How many Angstroms away from the line centroid are included in the fit (in both directions). This should be large enough to include the absorption line as well as some continuum on either side.

  • edges (float, optional) – What distance in Angstroms around each line (measured from the line center outwards) to exclude from the continuum-fitting step. This should be large enough to cover most of the absorption line whilst leaving some continuum intact on either side.

  • make_plot (bool, optional) – Make a plot of the fit.

Returns

A result instance from the lmfit package, from which fitted parameters and fit statistics can be extracted.

Return type

lmfit result object
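The roles of window and edges in the continuum step can be sketched as follows: pixels farther than `edges` but no farther than `window` from the centroid define a straight-line continuum, which is then divided out. This is a stdlib-only illustration on a synthetic spectrum with hypothetical helper names; it omits the Voigt fit itself and is not the wdtools implementation.

```python
import math


def fit_edge_continuum(wl, flux, centroid, window, edges):
    """Least-squares line through pixels with edges < |wl - centroid| <= window."""
    pts = [(w, f) for w, f in zip(wl, flux) if edges < abs(w - centroid) <= window]
    n = len(pts)
    mean_w = sum(w for w, _ in pts) / n
    mean_f = sum(f for _, f in pts) / n
    slope = (sum((w - mean_w) * (f - mean_f) for w, f in pts)
             / sum((w - mean_w) ** 2 for w, _ in pts))
    intercept = mean_f - slope * mean_w
    return slope, intercept


def normalize_window(wl, flux, centroid, window, edges):
    """Crop to the window and divide the flux by the fitted continuum line."""
    slope, intercept = fit_edge_continuum(wl, flux, centroid, window, edges)
    return [(w, f / (slope * w + intercept))
            for w, f in zip(wl, flux) if abs(w - centroid) <= window]


# Synthetic spectrum: sloped continuum with a Gaussian-shaped dip at H-beta.
centroid = 4862.68
wl = [4400.0 + 2.0 * i for i in range(461)]
continuum = [2.0 - 1e-4 * (w - 4400.0) for w in wl]
flux = [c * (1.0 - 0.4 * math.exp(-0.5 * ((w - centroid) / 20.0) ** 2))
        for w, c in zip(wl, continuum)]
normalized = normalize_window(wl, flux, centroid, window=400, edges=200)
line_minimum = min(f for _, f in normalized)
```

If `edges` is too small, line wings leak into the continuum pixels and bias the fitted line downwards, which is why the parameter should cover most of the absorption line.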

fit_balmer(self, wl, flux, make_plot=False)

Fits Voigt profiles to the first three Balmer lines (H-alpha, H-beta, and H-gamma). Returns all 18 fitted parameters.

Parameters
  • wl (array) – Wavelength array of spectrum

  • flux (array) – Flux array of spectrum

  • make_plot (bool, optional) – Plot all individual Balmer fits.

Returns

Array of 18 Balmer parameters, 6 for each line. If the profile fit fails, returns array of 18 np.nan values.

Return type

array
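Since a failed fit is signalled by NaN values rather than an exception, callers may want to check the array before using it. The sketch below splits the 18 values into three groups of six and returns None on failure. It assumes the values are ordered line by line, H-alpha first, which this reference does not confirm; `split_balmer_parameters` and the placeholder values are hypothetical.

```python
import math


def split_balmer_parameters(params, lines=("alpha", "beta", "gamma"), per_line=6):
    """Return {line: [values]} or None if the fit failed (any NaN present).

    Assumes the parameters are stored line by line in the order of `lines`.
    """
    if len(params) != per_line * len(lines):
        raise ValueError("unexpected number of Balmer parameters")
    if any(math.isnan(p) for p in params):
        return None
    return {
        name: list(params[i * per_line:(i + 1) * per_line])
        for i, name in enumerate(lines)
    }


good = [float(i) for i in range(18)]    # placeholder values, not real fit output
failed = [float("nan")] * 18            # what a failed profile fit looks like
by_line = split_balmer_parameters(good)
failed_result = split_balmer_parameters(failed)
```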

train(self, x_data, y_data)

Trains ensemble of random forests on the provided data. Does not require scaling. You shouldn’t ever need to use this directly.

Parameters
  • x_data (array) – Input data, independent variables

  • y_data (array) – Output data, dependent variables

labels_from_parameters(self, balmer_parameters, quantile=0.67)

Predicts stellar labels from Balmer line parameters.

Parameters

balmer_parameters (array) – Array of fitted Balmer parameters from the fit_balmer function.

Returns

Array of predicted stellar labels with the following format: [Teff, e_Teff, logg, e_logg].

Return type

array

save(self, modelname='wd')
load(self, modelname='wd')
labels_from_spectrum(self, wl, flux, make_plot=False, quantile=0.67)

Wrapper function that directly predicts stellar labels from a provided spectrum. Performs continuum-normalization, fits Balmer profiles, and uses the bootstrap ensemble of random forests to infer labels.

Parameters
  • wl (array) – Array of spectrum wavelengths.

  • flux (array) – Array of spectrum fluxes. Can be normalized or un-normalized.

  • make_plot (bool, optional) – Plot all the individual Balmer-Voigt fits.

  • quantile (float, optional) – Which quantile of the fitted labels to use for the bootstrap error estimation. Defaults to 0.67, which corresponds approximately to a 1-sigma uncertainty.

Returns

Array of predicted stellar labels with the following format: [Teff, e_Teff, logg, e_logg].

Return type

array
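One way to turn an ensemble of bootstrap predictions into a value and an error is to take the median and the half-width of the central interval containing the requested quantile of predictions. The sketch below shows that interpretation only; how wdtools actually applies `quantile` is not specified in this reference, and the helper names and prediction values are invented.

```python
import statistics


def interpolated_quantile(sorted_values, q):
    """Quantile of pre-sorted data using linear interpolation between points."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac


def summarize_bootstrap(predictions, quantile=0.67):
    """Median and half-width of the central `quantile` interval (an assumption)."""
    values = sorted(predictions)
    tail = (1.0 - quantile) / 2.0
    low = interpolated_quantile(values, tail)
    high = interpolated_quantile(values, 1.0 - tail)
    return statistics.median(values), 0.5 * (high - low)


# Invented Teff predictions from 101 hypothetical bootstrap models.
teff_predictions = [12000.0 + 10.0 * i for i in range(101)]
teff, e_teff = summarize_bootstrap(teff_predictions)
```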

wdtools.gaia_cov(parallax_error, pmra_error, pmdec_error, parallax_pmra_corr, parallax_pmdec_corr, pmra_pmdec_corr)
wdtools.log_exp_dec_prior(parallax, L=1350)
wdtools.log_mvnorm(x, mu, cov)
wdtools.get_post_samples(obj, walkers, burn, steps, progress=False, L=1350)
wdtools.get_vt_samples(obj, nsample)
wdtools.get_distance_samples(obj, nburn=100.0, nsample=1000.0, progress=False)
wdtools.get_distance_mode(obj, L=1350)
wdtools.plot_orbits(name, obj, rv, e_rv, nmc=10000, norbit=50)
wdtools.teff3d(teff, logg)
wdtools.logg3d(teff, logg)
wdtools.corr3d(teff, logg)