Random Forest Regression

Introduction

This method fits the Balmer lines of white dwarf spectra with parametric Voigt profiles, deriving each line's full width at half maximum (FWHM) and amplitude. These line parameters are passed to a random forest regression model that predicts the stellar labels of effective temperature and surface gravity. Currently, the model uses the first four Balmer lines (or any subset of them), and ships pre-trained on roughly 5000 spectra from the Sloan Digital Sky Survey with stellar labels calculated by [3].

[1] Daniel Foreman-Mackey, David W. Hogg, Dustin Lang, and Jonathan Goodman. emcee: The MCMC Hammer. PASP, 125(925):306, March 2013. doi:10.1086/670067.

[2] D. Koester. White dwarf spectra and atmosphere models. Mem. SAI, 81:921–931, 2010.

[3] P.-E. Tremblay, E. Cukanovaite, N. P. Gentile Fusillo, T. Cunningham, and M. A. Hollands. Fundamental parameter accuracy of DA and DB white dwarfs in Gaia Data Release 2. MNRAS, 482(4):5222–5232, February 2019. arXiv:1811.03084, doi:10.1093/mnras/sty3067.

API

class wdtools.LineProfiles(verbose=False, plot_profiles=False, n_trees=25, n_bootstrap=25, lines=['alpha', 'beta', 'gamma', 'delta'], optimizer='leastsq')

Class to fit Voigt profiles to the Balmer absorption lines of DA white dwarfs, and then infer stellar labels.

Probabilistic prediction uses an ensemble of bootstrapped random forest models (the number of models and the number of trees per model are set by n_bootstrap and n_trees), trained on 5326 spectra from the Sloan Digital Sky Survey. Ground-truth labels are taken from Tremblay et al. (2019). Line profiles are fit with the LMFIT package via chi^2 minimization.
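The FWHM of a Voigt profile has no closed form, so it is usually computed from the Gaussian and Lorentzian widths with an empirical approximation. The sketch below uses one widely used approximation; whether this package reports exactly this expression is an assumption, and the names voigt_fwhm, sigma, and gamma are illustrative.

```python
import math


def voigt_fwhm(sigma, gamma):
    """Approximate FWHM of a Voigt profile (illustrative helper).

    sigma: standard deviation of the Gaussian component
    gamma: half width at half maximum of the Lorentzian component
    """
    fwhm_gauss = 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma
    fwhm_lorentz = 2.0 * gamma
    # Empirical combination of the two widths; exact in the pure-Gaussian
    # limit and very close in the pure-Lorentzian limit.
    return 0.5346 * fwhm_lorentz + math.sqrt(
        0.2166 * fwhm_lorentz ** 2 + fwhm_gauss ** 2
    )


# Equal Gaussian and Lorentzian widths, in arbitrary wavelength units.
example_fwhm = voigt_fwhm(1.0, 1.0)
```

With gamma set to zero the expression reduces to the familiar Gaussian result, about 2.355 times sigma.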

fit_balmer(wl, flux, make_plot=False)

Fits Voigt profiles to the first three Balmer lines (H-alpha, H-beta, and H-gamma). Returns all 18 fitted parameters.

Parameters
  • wl (array) – Wavelength array of spectrum

  • flux (array) – Flux array of spectrum

  • make_plot (bool, optional) – Plot all individual Balmer fits.

Returns

Array of 18 Balmer parameters, 6 for each line. If the profile fit fails, returns array of 18 np.nan values.

Return type

array
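A flat array of 18 values is awkward to inspect, so it can help to regroup it by line and to check for a failed fit before going further. The sketch below assumes the parameters are stored line by line in the order H-alpha, H-beta, H-gamma; that ordering, and the helper names split_by_line and fit_failed, are assumptions and not part of the package.

```python
import math

LINE_NAMES = ("alpha", "beta", "gamma")  # assumed ordering of the lines
PARAMS_PER_LINE = 6


def split_by_line(balmer_parameters):
    """Group a flat parameter sequence into one list of 6 values per line."""
    params = list(balmer_parameters)
    expected = len(LINE_NAMES) * PARAMS_PER_LINE
    if len(params) != expected:
        raise ValueError(f"expected {expected} parameters, got {len(params)}")
    return {
        name: params[i * PARAMS_PER_LINE:(i + 1) * PARAMS_PER_LINE]
        for i, name in enumerate(LINE_NAMES)
    }


def fit_failed(balmer_parameters):
    """True if any parameter is NaN (a failed fit is documented as all NaN)."""
    return any(math.isnan(p) for p in balmer_parameters)


# Placeholder values standing in for a real fit result.
dummy_parameters = [float(i) for i in range(18)]
by_line = split_by_line(dummy_parameters)
failed = fit_failed([float("nan")] * 18)
```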

fit_line(wl, flux, centroid, window=400, edges=200, make_plot=False)

Fit a Voigt profile around a specified centroid on the spectrum.

The continuum is normalized at each absorption line by fitting a linear polynomial through the edges of the window. The window size and edge size can be modified.

Parameters
  • wl (array) – Wavelength array of spectrum

  • flux (array) – Flux array of spectrum

  • centroid (float) – The theoretical centroid of the absorption line that is being fitted, in wavelength units.

  • window (float, optional) – How many Angstroms away from the line centroid are included in the fit (in both directions). This should be large enough to include the absorption line as well as some continuum on either side.

  • edges (float, optional) – What distance in Angstroms around each line (measured from the line center outwards) to exclude from the continuum-fitting step. This should be large enough to cover most of the absorption line whilst leaving some continuum intact on either side.

  • make_plot (bool, optional) – Make a plot of the fit.

Returns

A result instance from the lmfit package, from which fitted parameters and fit statistics can be extracted.

Return type

lmfit result object
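The roles of window and edges in the continuum step can be illustrated with a minimal sketch. It crops the spectrum to the window, fits a straight line to the pixels lying farther than edges from the centroid, and divides by that line. This is a standard-library illustration of the idea under those assumptions, not the package's own code; normalize_continuum and the synthetic spectrum are hypothetical.

```python
import math


def normalize_continuum(wl, flux, centroid, window=400.0, edges=200.0):
    """Crop to +/- window around centroid and divide by a straight line
    fitted to the pixels farther than `edges` from the centroid."""
    # Keep only pixels inside the fitting window.
    cropped = [(w, f) for w, f in zip(wl, flux) if abs(w - centroid) <= window]
    # Continuum pixels: inside the window but outside the masked line core.
    cont = [(w, f) for w, f in cropped if abs(w - centroid) > edges]
    if len(cont) < 2:
        raise ValueError("too few continuum pixels; adjust window or edges")
    n = len(cont)
    mean_w = sum(w for w, _ in cont) / n
    mean_f = sum(f for _, f in cont) / n
    # Ordinary least-squares straight line through the continuum pixels.
    sxx = sum((w - mean_w) ** 2 for w, _ in cont)
    sxy = sum((w - mean_w) * (f - mean_f) for w, f in cont)
    slope = sxy / sxx
    intercept = mean_f - slope * mean_w
    crop_wl = [w for w, _ in cropped]
    norm_flux = [f / (slope * w + intercept) for w, f in cropped]
    return crop_wl, norm_flux


# Synthetic test spectrum: sloped continuum with a Gaussian absorption dip
# near the H-beta wavelength (value in Angstroms, rounded for illustration).
centroid = 4861.0
wl = [4300.0 + i for i in range(1101)]
flux = [
    (2.0 - 0.0002 * (w - centroid))
    * (1.0 - 0.5 * math.exp(-0.5 * ((w - centroid) / 30.0) ** 2))
    for w in wl
]
crop_wl, norm_flux = normalize_continuum(wl, flux, centroid)
```

If edges is too small, part of the line wing enters the straight-line fit and the normalized continuum sits below unity, which is why the edge region should cover most of the absorption line.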

initialize()

Initializes the random forest models by training them on the pre-supplied dataset of parameters. This only needs to be done once for each combination of absorption lines. The model is then pickled and saved for future use in the models/ directory.

labels_from_parameters(balmer_parameters, quantile=0.67)

Predicts stellar labels from Balmer line parameters.

Parameters

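  • balmer_parameters (array) – Array of fitted Balmer parameters from the fit_balmer function.

  • quantile (float, optional) – Which quantile of the fitted labels to use for the bootstrap error estimation. Defaults to 0.67.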

Returns

Array of predicted stellar labels with the following format: [Teff, e_Teff, logg, e_logg].

Return type

array

labels_from_spectrum(wl, flux, make_plot=False, quantile=0.67)

Wrapper function that directly predicts stellar labels from a provided spectrum. Performs continuum-normalization, fits Balmer profiles, and uses the bootstrap ensemble of random forests to infer labels.

Parameters
  • wl (array) – Array of spectrum wavelengths.

  • flux (array) – Array of spectrum fluxes. Can be normalized or un-normalized.

  • make_plot (bool, optional) – Plot all the individual Balmer-Voigt fits.

  • quantile (float, optional) – Which quantile of the fitted labels to use for the bootstrap error estimation. Defaults to 0.67, which corresponds approximately to a 1-sigma uncertainty.

Returns

Array of predicted stellar labels with the following format: [Teff, e_Teff, logg, e_logg].

Return type

array
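One way to turn the predictions of a bootstrap ensemble into a value and an uncertainty is sketched below: take the median prediction as the estimate and the chosen quantile of the absolute deviations from it as the error. The exact statistic used by this package is not stated here, so this recipe, the helper names, and the sample numbers are all assumptions for illustration.

```python
import statistics


def quantile(values, q):
    """Linearly interpolated q-th quantile (0 <= q <= 1) of a sequence."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


def summarize_ensemble(predictions, q=0.67):
    """Return (estimate, error) from the per-model predictions of one label."""
    estimate = statistics.median(predictions)
    deviations = [abs(p - estimate) for p in predictions]
    return estimate, quantile(deviations, q)


# Made-up Teff predictions (in K) from five bootstrapped models.
teff_predictions = [11800.0, 12100.0, 12000.0, 11950.0, 12250.0]
teff, e_teff = summarize_ensemble(teff_predictions)
```

A larger quantile widens the reported error bar, since it takes in more of the spread between models.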

train(x_data, y_data)

Trains the ensemble of random forests on the provided data. The inputs do not require scaling. You shouldn’t ever need to use this directly.

Parameters
  • x_data (array) – Input data, independent variables

  • y_data (array) – Output data, dependent variables
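Bootstrapping an ensemble means training each model on a resample of the data drawn with replacement. The sketch below shows only that resampling step, using the standard library; fitting a random forest to each resample is left out. The function bootstrap_resamples, its seed argument, and the toy data are hypothetical.

```python
import random


def bootstrap_resamples(x_data, y_data, n_bootstrap=25, seed=0):
    """Draw n_bootstrap resamples of (x, y) pairs with replacement."""
    if len(x_data) != len(y_data):
        raise ValueError("x_data and y_data must have the same length")
    rng = random.Random(seed)  # seeded so the draws are reproducible
    n = len(x_data)
    resamples = []
    for _ in range(n_bootstrap):
        # Same size as the original set; rows may repeat or be left out.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x_data[i] for i in idx]
        ys = [y_data[i] for i in idx]
        resamples.append((xs, ys))
    return resamples


# Toy data: two line parameters per spectrum, labels are [Teff, logg].
x_data = [[30.1, 0.52], [41.7, 0.61], [25.3, 0.47], [36.8, 0.58]]
y_data = [[12000.0, 8.0], [15000.0, 8.2], [10500.0, 7.9], [13500.0, 8.1]]
resamples = bootstrap_resamples(x_data, y_data)
```

Each model in the ensemble would then be trained on one (xs, ys) pair, and the spread of their predictions gives the uncertainty estimate.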