Baseline Correction
This example looks at the first step in TAP data preprocessing: baseline correction. The baseline is the portion of the flux, in time, where the measured signal should be zero. This occurs either before a pulse has been initiated or after enough time has passed for all of the gas to diffuse out of the reactor. However, due to instrument drift and ionization effects, the outlet response may exhibit a voltage shift over all time points. This shift is assumed to be constant over time, so a non-linear baseline is not accounted for.
Below is an example of a flux that has a baseline shift:
# importing tapsap, pandas, numpy and the data
import tapsap
import pandas as pd
import numpy as np
import plotly
plotly.offline.init_notebook_mode()
argon_data = pd.read_csv('../tapsap/data/argon_100C_subset.csv')
times = argon_data['times'].values
pulse_with_shift = argon_data['pulse_10'].values + 0.3
# plotting the 10th pulse
tapsap.plot_tap(times, pulse_with_shift)
Traditional baseline correction
The traditional method of baseline correction is to determine where in time the flux is not changing (slope of zero) and take the average of the flux over those baseline points. For example, visual inspection shows that the baseline can be approximated from 4.5 to 5 seconds, where the mean of the baseline should be approximately 0.3. This mean is verified below:
# grabbing the time indices of the baseline
time_baseline_index_start = np.argmin(np.abs(times - 4.5))
len_time = len(times)
# taking the mean of the baseline
baseline_mean = np.mean(pulse_with_shift[time_baseline_index_start:len_time])
print(baseline_mean)
0.30016202985621687
The baseline mean is in fact approximately 0.3. In tapsap, the baseline_correction function reproduces this result. The function can take a baseline time range, a baseline amount, or neither, in which case the last 95% of the points is taken as the baseline. The function returns a dictionary containing the 'flux' and the 'baseline_amount'. Examples of each option are given below by plotting the baseline-corrected flux.
# using a known baseline coefficient
temp_flux = tapsap.baseline_correction(pulse_with_shift, times, baseline_amount=0.3)['flux']
tapsap.plot_tap(times, temp_flux)
# using a time range
temp_flux = tapsap.baseline_correction(pulse_with_shift, times, baseline_time_range=[4.5, 5])
tapsap.plot_tap(times, temp_flux['flux'])
# using the last 95% of the time
temp_flux = tapsap.baseline_correction(pulse_with_shift, times)
tapsap.plot_tap(times, temp_flux['flux'])
# printing the baseline_amount using the last 95% of the time
temp_flux['baseline_amount']
0.299987322790209
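The three options above (a known amount, a time range, or a trailing portion of the signal) can be sketched with plain NumPy. This is a minimal illustration on a synthetic pulse, not the tapsap implementation: the helper name estimate_baseline, the tail_fraction parameter and its value of 0.05 are assumptions made only for this sketch.

```python
import numpy as np

def estimate_baseline(flux, times, baseline_time_range=None,
                      baseline_amount=None, tail_fraction=0.05):
    """Return (corrected_flux, baseline) assuming a constant baseline shift.

    Hypothetical helper for illustration; not part of tapsap.
    """
    flux = np.asarray(flux, dtype=float)
    times = np.asarray(times, dtype=float)
    if baseline_amount is None:
        if baseline_time_range is not None:
            # average the flux inside the user-supplied time window
            start, end = baseline_time_range
            mask = (times >= start) & (times <= end)
        else:
            # assumption: fall back to the trailing tail_fraction of the samples
            n_tail = max(1, int(len(times) * tail_fraction))
            mask = np.zeros(len(times), dtype=bool)
            mask[-n_tail:] = True
        baseline_amount = float(np.mean(flux[mask]))
    return flux - baseline_amount, baseline_amount

# synthetic pulse that has decayed to essentially zero by 4.5 s, shifted by 0.3
times = np.linspace(0, 5, 501)
pulse = times * np.exp(-times / 0.2)
shifted = pulse + 0.3

corrected_known, amount_known = estimate_baseline(shifted, times, baseline_amount=0.3)
corrected_range, amount_range = estimate_baseline(shifted, times, baseline_time_range=[4.5, 5])
corrected_tail, amount_tail = estimate_baseline(shifted, times)
print(amount_known, amount_range, amount_tail)
```

All three calls subtract a single constant from the flux, which is why this approach only works when the shift does not vary over the pulse.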
Baseline correction via the Gamma distribution
Sometimes the flux does not reach a baseline. Slow reactions or molecules sticking to the inert material may result in a flux that never returns to a baseline within the measurement window. To address this, the flux can be approximated by a statistical distribution of the molecules with respect to time. More specifically, the Gamma distribution is used, as it is connected to the velocity of molecules in Knudsen diffusion and to a series of CSTRs. This method first determines the approximate Gamma distribution from the peak residence time (the time of the maximum of the flux, approximately 0.2 seconds in this example) and then uses the area of the Gamma distribution to correct the area of the flux. The function (baseline_gamma) does not require any inputs beyond the flux and the time.
# using the gamma distribution
temp_flux = tapsap.baseline_gamma(pulse_with_shift, times)
tapsap.plot_tap(times, temp_flux['flux'])
print(temp_flux['baseline_amount'])
0.26439835038884296
The baseline amount provided by baseline_gamma is not quite what is expected (0.26 rather than 0.3). This could indicate either that the reaction is not complete or that the peak residence time cannot be accurately measured due to the noise. With that in mind, the process is repeated using a smoothed version of the flux:
# using the gamma distribution with a clean flux
temp_smoothed_flux = tapsap.smooth_flux_gam(pulse_with_shift)
result = tapsap.baseline_gamma(temp_smoothed_flux, times)
tapsap.plot_tap(times, pulse_with_shift - result['baseline_amount'])
# printing the baseline amount estimated from the smoothed flux
print(result['baseline_amount'])
Smoothing the flux resulted in the baseline amount being closer to the initial value of 0.3 without any estimation of where the baseline starts and ends.
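The general idea of recovering a baseline from a Gamma-shaped curve, without choosing a baseline window, can be illustrated with a simplified stand-in: fix the Gamma scale from the peak time, then fit an amplitude and a constant offset by linear least squares. This is not how baseline_gamma is implemented (which, per the description above, works with areas). The helper names gamma_pdf and gamma_baseline, and the fixed shape of 3, are assumptions of this sketch.

```python
import math
import numpy as np

def gamma_pdf(t, shape, scale):
    """Gamma probability density evaluated at t >= 0."""
    t = np.asarray(t, dtype=float)
    norm = math.gamma(shape) * scale ** shape
    return t ** (shape - 1) * np.exp(-t / scale) / norm

def gamma_baseline(flux, times, shape=3.0):
    """Fit flux ~ amplitude * gamma_pdf + baseline and return both coefficients.

    Hypothetical helper; the fixed shape parameter is an assumption.
    """
    flux = np.asarray(flux, dtype=float)
    times = np.asarray(times, dtype=float)
    peak_time = times[np.argmax(flux)]
    if peak_time <= 0:
        raise ValueError("peak must occur at a positive time")
    # the mode of a Gamma distribution is (shape - 1) * scale
    scale = peak_time / (shape - 1.0)
    # linear least squares on the two columns [gamma curve, constant offset]
    design = np.column_stack([gamma_pdf(times, shape, scale), np.ones_like(times)])
    (amplitude, baseline), *_ = np.linalg.lstsq(design, flux, rcond=None)
    return float(baseline), float(amplitude)

# noise-free synthetic flux: Gamma-shaped pulse peaking at 0.2 s, shifted by 0.3
times = np.linspace(0, 5, 5001)
flux = 2.0 * gamma_pdf(times, shape=3.0, scale=0.1) + 0.3

baseline, amplitude = gamma_baseline(flux, times)
print(baseline, amplitude)
```

Because the scale is derived from the location of the maximum, noise that moves the apparent peak also moves the fitted curve and therefore the estimated offset, which is consistent with smoothing helping in the example above.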
Application to a Transient object
The above code can be applied to each column of the dataframe, but it is also available as the baseline_correct method of the Transient class. The baseline_correct method takes the arguments baseline_time_range (a list with the start and end time), baseline_amount (a float giving the correction amount) and smooth_flux (smoothing the flux before baseline_gamma is applied). Unlike the baseline_correction function, if neither baseline_time_range nor baseline_amount is given, the baseline_gamma function is used. The example below processes the data using a given time range.
# read in the data
data = tapsap.read_tdms('../tapsap/data/argon_100C.tdms')
transient_info = data.species_data['AMU_40_1']
# apply baseline_correct to all pulses
transient_info.baseline_correct(baseline_time_range = [4.5, 5])
tapsap.plot_tap(transient_info.times, transient_info.flux.iloc[:,10].values)
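Applying a time-range correction to every pulse at once amounts to a column-wise operation on a dataframe whose columns are pulses. The sketch below shows this with pandas on synthetic data. The helper name baseline_correct_frame and the layout (rows are time points, columns are pulses) are assumptions for illustration and are not taken from the Transient class.

```python
import numpy as np
import pandas as pd

def baseline_correct_frame(flux_frame, times, baseline_time_range):
    """Subtract each pulse's mean over the baseline window.

    Hypothetical helper; rows are time points and columns are pulses.
    """
    times = np.asarray(times, dtype=float)
    start, end = baseline_time_range
    mask = (times >= start) & (times <= end)
    # one baseline value per column (pulse)
    baselines = flux_frame.loc[mask].mean(axis=0)
    # subtracting a Series from a DataFrame aligns the Series with the columns
    return flux_frame - baselines, baselines

# three synthetic pulses sharing one shape but with different baseline shifts
times = np.linspace(0, 5, 501)
pulse = times * np.exp(-times / 0.2)
shifts = [0.1, 0.3, 0.5]
flux_frame = pd.DataFrame({i: pulse + s for i, s in enumerate(shifts)})

corrected, baselines = baseline_correct_frame(flux_frame, times, [4.5, 5])
print(baselines.values)
```

Each pulse gets its own baseline estimate here, so drift between pulses is handled even though the shift within a single pulse is still treated as constant.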