Non-linear drift calibration
Non-linear drift calibration¶
This is an example where there exists non-linear drift within an inert TAP experiment. This example is used to compare the traditional calibration vs TEAK. The goal of this demonstration is to show how to use TEAK. The data set consists of two pulses in gases (Argon and CO2) and measurements of the fragmentation patterns of CO2. The calibration is performed on the measured gas species (Argon and CO2).
Reading in the data and removing known outgassing.
# imports
import tapsap
import copy
import plotly
import plotly.express as px
import pandas as pd
import numpy as np
plotly.offline.init_notebook_mode()
# read in the data
data = tapsap.read_tdms('../../TAPdata/CO2_intensity_1_18.5_eV.tdms')
# take the moments of the raw data
for i in list(data.species_data.keys()):
data.species_data[i].set_moments()
# making a copy of the data
traditional_data = copy.deepcopy(data)
teak_data = copy.deepcopy(data)
# species to examine
all_species_list = ['AMU_12_1', 'AMU_16_1', 'AMU_28_1' , 'AMU_40.1_1', 'AMU_44_1']
all_species_list_pretty = ['Mass 12', 'Mass 16', 'Mass 28' , 'Ar', 'CO2']
species_list = ['AMU_40.1_1', 'AMU_44_1']
species_pretty_names = ['Ar', 'CO2']
pulse_array = np.arange(0, data.species_data[i].num_pulse)
known_outgassing = [0,1,13, 93, 116, 154, 176, 198, 220, 249, 263]
pulse_array = np.delete(pulse_array, known_outgassing)
For a base comparision, the zeroth moment of Argon and CO2 are plotted against each other without any baseline correction nor calibration. Both series voltage values are close to one another and should have minimal scaling difference. However, there exists significant drift within the experiment. Some of the M0 measurements are relatively close (near pulse 150) while others differ significantly (pulse 75).
# plotting the raw values
df = pd.DataFrame()
for i, temp_species in enumerate(species_list):
df[species_pretty_names[i]] = data.species_data[temp_species].df_moments['M0']
df = df.drop(known_outgassing)
tapsap.plot_tap(pulse_array, df, legend = True, x_lab= 'Pulse Number', y_lab= 'M0')
Traditional calibration¶
All data is baseline corrected. The traditional method selects a time range to determine the baselien (4.7 to 5 seconds).
# apply baseline correction to each gas species
for i in all_species_list:
traditional_data.species_data[i].baseline_correct(baseline_time_range=[4.7,5])
traditional_data.species_data[i].set_moments()
Below is code for determining the mean and standard deviation of each flux after baseline correction. This is not necessary in preprocessing the data, but is used to show evidence of the Gamma distribution as the residence time distribution. More specifically, if the mean and standard deviation show a linear trend, then this follows the same trend within the Gamma distribution.
df = pd.DataFrame()
species_mean = []
species_sd = []
for i in all_species_list:
# removing the first index since it is an outlier
temp_data = traditional_data.species_data[i].df_moments['M0'].values
temp_data = np.delete(temp_data, known_outgassing)
species_mean.append(np.median(temp_data))
species_sd.append(np.std(temp_data))
species_mean = np.array(species_mean)
species_sd = np.array(species_sd)
df['mean'] = species_mean
df['sd'] = species_sd
df['species'] = np.array(all_species_list_pretty)
# plot the mean and standard deviation of the data after baseline correction
fig = px.scatter(df, x='mean', y='sd',color='species', template='plotly_white')
fig.update_traces(marker = dict(size = 20))
fig.update_layout(
xaxis_title='Mean',
yaxis_title='Standard Deviation',
font={
'size': 30
}
)
fig.show()
The mean values calculated earlier can also be used to determine the calibration coefficient for each gas with respect to argon. This is the typically process to determine a calibration coefficient within an inert experiement. More specificially, the calibration coefficient is the mean argon zeroth moment is divided by each species mean zeroth moment.
# Determining the calibration coefficients manually
# recall that the 3rd species is the argon and all others are ordered by mass number
calibration_coefs = species_mean[3] / species_mean
calibration_coefs = [calibration_coefs[3], calibration_coefs[4]]
print(calibration_coefs)
[1.0, 1.04656179085164]
Using the calibration coefficient, each CO2 flux is scaled to be the same mean argon zeroth moment.
# Traditional calibration
for i, temp_species in enumerate(species_list):
traditional_data.species_data[temp_species].calibrate_flux(calibration_coefs[i])
traditional_data.species_data[temp_species].set_moments()
Below shows a plot of the Argon vs CO2 moments after calibration. The difference between individual pulse indices match better than previously shown on the raw data.
# plotting the traditionally calibrated values
df = pd.DataFrame()
for i, temp_species in enumerate(species_list):
df[species_pretty_names[i]] = traditional_data.species_data[temp_species].df_moments['M0']
df = df.drop(known_outgassing)
tapsap.plot_tap(pulse_array, df, legend = True, x_lab= 'Pulse Number', y_lab= 'M0')
TEAK Calibration¶
Again, before any calibration, the data is baseline corrected. Note that the TEAK method uses information about the residence time distribution to determine the baseline coefficient. TEAK does require more time in computation due to the smoothing applied.
for i in all_species_list:
teak_data.species_data[i].baseline_correct(smooth_flux = True)
teak_data.species_data[i].set_moments()
The second step of TEAK is to calibrate the inert species (Argon) to itself such that it is consistent in all pulse numbers.
# First self calibrate the argon data
teak_data.species_data['AMU_40.1_1'].calibrate_flux(reference_index = 4, constraints = False, smooth_flux = True, huber_loss = True)
# plotting the teak calibrated values of argon
for i, temp_species in enumerate(species_list):
teak_data.species_data[temp_species].set_moments(smooth_flux = True) #
df = pd.DataFrame()
df['Ar'] = teak_data.species_data['AMU_40.1_1'].df_moments['M0']
df = df.drop(known_outgassing)
tapsap.plot_tap(pulse_array, df, legend = True, x_lab= 'Pulse Number', y_lab= 'M0')
print(np.std(df['Ar'].values))
0.0004605553355567907
After self calibrating the Argon response, the Argon is set to the reference species of CO2. Finally, the CO2 values are calibrated agains the Argon.
# next set reference gas for 44
teak_data.species_data['AMU_44_1'].reference_gas = teak_data.species_data['AMU_40.1_1']
teak_data.species_data['AMU_44_1'].calibrate_flux(smooth_flux = True, huber_loss = True, constraints = True)
A simple plot of the Argon and CO2 compared to one another. The Argon and CO2 almost perfectly align in this case.
# plotting the teak calibrated values for argon and co2
for i in species_list:
teak_data.species_data[i].set_moments(smooth_flux = True)
df = pd.DataFrame()
df['Ar'] = teak_data.species_data['AMU_40.1_1'].df_moments['M0']
df['CO2'] = teak_data.species_data['AMU_44_1'].df_moments['M0']
df = df.drop(known_outgassing)
tapsap.plot_tap(pulse_array, df, legend = True, x_lab= 'Pulse Number', y_lab= 'M0')
print(np.std(df['CO2']))
0.00046049770990623057
Finally, the traditional and TEAK methods of calibration are compared to one another below. TEAK is able to deal with the drift in the measurements as well as provide a closer representation of the CO2 compared to Argon.
# Comparing traditional and TEAK
df = pd.DataFrame()
df['Ar Traditional'] = traditional_data.species_data['AMU_40.1_1'].df_moments['M0']
df['CO2 Traditional'] = traditional_data.species_data['AMU_44_1'].df_moments['M0']
df['Ar TEAK'] = teak_data.species_data['AMU_40.1_1'].df_moments['M0']
df['CO2 TEAK'] = teak_data.species_data['AMU_44_1'].df_moments['M0']
df = df.drop(known_outgassing)
tapsap.plot_tap(pulse_array, df, legend = True, x_lab= 'Pulse Number', y_lab= 'M0')
To compare the two processes quantitatively, the root mean square error (RMSE) is calculated from the difference of Argon to CO2 for each method. The ratio of the traditional and TEAK RMSE reveals that TEAK reduces the error by a factor of approximately 30 times.
traditional_rmse = tapsap.rmse(traditional_data.species_data['AMU_40.1_1'].df_moments['M0'], traditional_data.species_data['AMU_44_1'].df_moments['M0'])
teak_rmse = tapsap.rmse(teak_data.species_data['AMU_40.1_1'].df_moments['M0'], teak_data.species_data['AMU_44_1'].df_moments['M0'])
print("Traditional MSE: ", traditional_rmse**2)
print("TEAK MSE: ", teak_rmse**2)
print("Reduction in error:", traditional_rmse**2 / teak_rmse**2 )
Traditional MSE: 0.009547081965082783 TEAK MSE: 1.1165141179115335e-05 Reduction in error: 855.0793771368363
fig = px.histogram(df, x="Ar Traditional", nbins=10, template='plotly_white')
fig.update_layout(
font={
'size': 30
}
)
fig.show()
fig = px.histogram(df, x="Ar TEAK", nbins=10, template='plotly_white')
fig.update_layout(
font={
'size': 30
}
)
fig.show()