Part 1: Getting started
This notebook describes some early exploration of data downloaded from the Glooko website. It includes both Dexcom CGM readings and Omnipod 5 insulin pump information.
My daughter was diagnosed in November and is still well within the diabetes "honeymoon" phase. The honeymoon phase happens after a Type 1 diabetic first starts receiving insulin. For some reason, the pancreas decides to kick into gear again, producing some of its own insulin in decreasing and unpredictable quantities. This can last from a few months to a few years. All of this means that it's very unlikely I'll find much in the way of insightful correlations. But I still want to look.
I am curious about smaller, short-term patterns. Some initial questions I have include:
- What is the typical level right before she eats/has a bolus?
- How does it change for different meals?
- How high is the peak immediately following that meal?
- How long does that peak last? (how peaky is the peak? What is its KURTOSIS - yeah, 4th statistical moment baby)
- What's the lowest she reaches before the next meal?
- Can cumulative blood glucose (BG) tell us anything? (i.e., taking the integral)
But probably the biggest questions are:
- Am I using the right carb ratio for each of her meals?
- Do I need to adjust her basal rate?
- Is her insulin correction right?
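One way I might get at the first question (the level right before a bolus) is to pair each bolus with the most recent CGM reading at or before it. Here is a rough sketch using `pd.merge_asof`. The frames `cgm` and `bolus` are made-up stand-ins for the real data, and the 10-minute tolerance is my own assumption about how stale a reading can be before I stop trusting it.

```python
import pandas as pd

# Synthetic stand-ins for the CGM and bolus data (values are made up)
cgm = pd.DataFrame({
    "time": pd.to_datetime([
        "2023-02-14 07:00", "2023-02-14 07:05", "2023-02-14 07:10",
        "2023-02-14 07:15", "2023-02-14 07:20", "2023-02-14 07:25",
    ]),
    "bg": [110, 112, 115, 118, 125, 140],
})
bolus = pd.DataFrame({
    "time": pd.to_datetime(["2023-02-14 07:12", "2023-02-14 07:26"]),
    "carbs_input": [30, 15],
})

# For each bolus, take the latest CGM reading at or before the bolus time,
# but only if that reading is at most 10 minutes old (assumed tolerance).
pre_bolus = pd.merge_asof(
    bolus.sort_values("time"),
    cgm.sort_values("time"),
    on="time",
    direction="backward",
    tolerance=pd.Timedelta("10min"),
)
print(pre_bolus)
```

Both frames have to be sorted by `time` for `merge_asof` to work. Boluses with no reading inside the tolerance window come back with a missing `bg`, which seems safer than silently matching a reading from hours earlier.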
import pandas as pd
import numpy as np
def read_all(data_folder):
    """
    Read all of the data and return a series of pandas data frames

    Example usage:
        df_cgm, df_bolus, df_basal, df_insulin = read_all(r"data")

    Args:
        data_folder (str): path to the root folder containing all the data files. The file names
            and structure should be left exactly as they were in the initial download from Glooko.

    Returns:
        A 4-element tuple
        - **df_cgm** (pandas df): pandas dataframe containing CGM time series data
        - **df_bolus** (pandas df): Bolus data
        - **df_basal** (pandas df): Basal data
        - **df_insulin** (pandas df): Insulin corrections data
    """
    # Load and format CGM data
    df_cgm_data = pd.read_csv(data_folder + r"/cgm_data.csv", header=1,
                              names=["time", "bg", "sn_cgm"])
    df_cgm_data["time"] = pd.to_datetime(df_cgm_data["time"])
    # Load and format bolus data
    df_bolus_data = pd.read_csv(data_folder + r"/Insulin data/bolus_data.csv", header=1,
                                names=["time", "insulin_type", "bg_input", "carbs_input", "carb_ratio",
                                       "insulin_delivered", "initial_delivery", "extended_delivery", "sn_omni"])
    df_bolus_data["time"] = pd.to_datetime(df_bolus_data["time"])
    # Load and format basal data
    df_basal_data = pd.read_csv(data_folder + r"/Insulin data/basal_data.csv", header=1,
                                names=["time", "insulin_type", "duration", "percentage", "rate", "insulin_delivered",
                                       "sn_omni"])
    df_basal_data["time"] = pd.to_datetime(df_basal_data["time"])
    # Load and format insulin data
    df_insulin_data = pd.read_csv(data_folder + r"/Insulin data/insulin_data.csv", header=1,
                                  names=["time", "total_bolus", "total_insulin", "total_basal", "sn_omni"])
    df_insulin_data["time"] = pd.to_datetime(df_insulin_data["time"])
    return df_cgm_data, df_bolus_data, df_basal_data, df_insulin_data
# Load all of the data into pandas dataframes
df_cgm, df_bolus, df_basal, df_insulin = read_all(r"../data")
Bolus data
Everything I want to look at initially I can get from the bolus data file. The bolus sheet includes the total insulin delivered, but it doesn't say explicitly how much was due to the insulin correction and how much was due to carbs. As a first pass, I'll compute the carb dose by dividing the carbs input by the carb ratio, then subtract that from the insulin delivered to get the portion that was a "glucose correction". This is only an approximation, because the actual dose also depends on additional factors like insulin-on-board (IOB).
df_bolus["carb_correction"] = np.divide(df_bolus["carbs_input"], df_bolus["carb_ratio"])
df_bolus["insulin_correction"] = df_bolus["insulin_delivered"] - df_bolus["carb_correction"]
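To sanity-check the arithmetic, here is a small worked example with made-up numbers. With 30 g of carbs and a carb ratio of 15 g per unit, the carb dose is 30 / 15 = 2.0 U, so a 2.5 U bolus leaves 0.5 U as the correction. I'm also assuming (I haven't confirmed this against the Glooko export) that a correction-only bolus might have a blank carb ratio, so the sketch forces the carb dose to zero whenever no carbs were entered.

```python
import numpy as np
import pandas as pd

# Made-up bolus rows: a meal bolus, a correction-only bolus, and a reduced meal bolus
bolus = pd.DataFrame({
    "carbs_input": [30.0, 0.0, 45.0],
    "carb_ratio": [15.0, np.nan, 15.0],
    "insulin_delivered": [2.5, 0.5, 2.8],
})

# Carb dose = grams of carbs / grams covered by one unit of insulin.
# Where no carbs were entered, the carb dose is set to 0 (assumption),
# which also avoids NaN from a missing or zero ratio.
carb_dose = (bolus["carbs_input"] / bolus["carb_ratio"]).where(
    bolus["carbs_input"] > 0, 0.0
)
bolus["carb_correction"] = carb_dose
bolus["insulin_correction"] = bolus["insulin_delivered"] - carb_dose
print(bolus)
```

The third row comes out slightly negative, meaning less insulin was delivered than the carbs alone would call for. In the real data I'd expect that kind of row whenever IOB or a below-target BG reduces the dose, which is exactly the complication mentioned above.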
import hvplot.pandas #noqa
import holoviews as hv
# Use holoviews to quickly plot CGM data
df_cgm.hvplot.line(x='time', y='bg',
ylabel='Blood glucose (mg/dL)', xlabel='Time',
height=500, width=620, color='lightgray')*\
df_cgm.hvplot.scatter(x='time', y='bg')*\
hv.HLine(150).opts(
color='k',
line_dash='dashed',
line_width=2.0,
)
Extracting key information
Now that I've got a plot, it's going to be important to extract information that might tell me something about glucose and insulin patterns. I'm going to try to extract information like:
- times and amplitudes of peaks
- times and amplitudes of troughs
- time it takes for BG to drop back to 150 after a meal
import matplotlib.pyplot as plt
%matplotlib widget
# start by taking the diff of the CGM data (compute the derivative)
df_cgm["d_bg"] = [0] + np.diff(df_cgm["bg"]).tolist()
# Plot diff and cgm data
fig,ax = plt.subplots(2,1, sharex=True)
ax[0].plot(df_cgm["time"], df_cgm["bg"],'.-')
ax[0].grid(True)
ax[0].set_title('CGM data')
ax[0].set_ylabel('Blood glucose (mg/dL)')
ax[1].plot(df_cgm["time"], df_cgm["d_bg"],'.-')
ax[1].set_ylim([-50,50])
ax[1].grid(True)
ax[1].set_title('d_cgm/d_t')
ax[1].set_ylabel('BG (mg/dL) - difference')
fig.tight_layout()
The diff data is really bumpy. To extract peaks and troughs we need to find the places where the "derivative" crosses zero, and that's really hard to do when the signal is noisy. I'm going to try smoothing the time series data first using a Gaussian smoother. Luckily there's already a simple smoother in SciPy, so let's give that a whirl!
from scipy.ndimage import gaussian_filter1d
g_filter = gaussian_filter1d(df_cgm['bg'],4)
df_cgm['bg_filt'] = g_filter
fig2, ax2 = plt.subplots(1,1)
ax2.plot(df_cgm['time'], df_cgm['bg'], 'k', label='Original')
ax2.plot(df_cgm['time'], g_filter, '--r', label='Filtered')
# x-limits are matplotlib date numbers (days since 1970-01-01); this zooms in on roughly one day
ax2.set_xlim([19402.098437682915, 19403.020563564292])
ax2.set_title('Gaussian filtering')
ax2.set_ylabel('Blood glucose (mg/dL)')
ax2.set_xlabel('Time')
ax2.tick_params(rotation=45)
ax2.grid()
fig2.tight_layout()
It looks like a Gaussian filter with a standard deviation of 4 samples is a decent place to start, although you can see that the smoothed version does not quite capture the full magnitude of the peaks and troughs (we'll sort that out later). Let's see how that affects the difference plot.
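To get a feel for how much height the smoother shaves off a peak, here is a quick sketch on a synthetic post-meal bump. Everything about the bump (baseline, height, width) is made up, and I'm assuming the CGM reports a reading every 5 minutes, which makes a sigma of 4 samples about 20 minutes wide.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

sample_minutes = 5   # assumed CGM sampling interval
sigma_samples = 4    # same sigma as used above
sigma_minutes = sigma_samples * sample_minutes

# Synthetic bump: 100 mg/dL baseline plus an 80 mg/dL Gaussian-shaped rise
t = np.arange(0, 240, sample_minutes)
bg = 100 + 80 * np.exp(-0.5 * ((t - 120) / 30) ** 2)

# Smooth with the same filter and measure how much of the peak is lost
bg_filt = gaussian_filter1d(bg, sigma_samples)
peak_loss = bg.max() - bg_filt.max()
print(f"sigma = {sigma_minutes} min, peak lowered by {peak_loss:.1f} mg/dL")
```

Narrower bumps lose more height than wide ones, so the smoothed series is probably better suited to locating peaks in time than to reading off their amplitude.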
# Re-calculate the diff data using the filtered CGM data
df_cgm["d_bg"] = [0] + np.diff(df_cgm["bg_filt"]).tolist()
# Plot filtered CGM data with differenced data
fig3,ax3 = plt.subplots(2,1, sharex=True)
ax3[0].plot(df_cgm["time"], df_cgm["bg_filt"],'.-')
ax3[0].grid(True)
ax3[0].set_title('CGM data')
ax3[0].set_ylabel('Blood glucose (mg/dL)')
ax3[1].plot(df_cgm["time"], df_cgm["d_bg"],'.-')
# Draw a zero line so the sign changes in the difference are easy to spot
ax3[1].axhline(0, color='r', linewidth=2)
ax3[1].set_ylim([-10,10])
ax3[1].grid(True)
ax3[1].set_title('d_cgm/d_t')
ax3[1].set_ylabel('BG (mg/dL) - difference')
# Same one-day window as the filtering plot (matplotlib date numbers)
ax3[1].set_xlim([19402.098437682915, 19403.020563564292])
ax3[1].tick_params(rotation=45)
fig3.tight_layout()
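With the smoothed difference in hand, peaks and troughs should show up wherever the difference changes sign. Below is a minimal sketch of that idea on a tiny made-up array. The helper name `find_extrema` is hypothetical, and this is just one simple way to do it, not necessarily what I'll end up using.

```python
import numpy as np

def find_extrema(values):
    """Return (peak_idx, trough_idx) where the first difference changes sign."""
    sign = np.sign(np.diff(values))
    # Sample i+1 is a peak when the slope before it is positive and after it is negative
    peaks = np.where((sign[:-1] > 0) & (sign[1:] < 0))[0] + 1
    # Sample i+1 is a trough when the slope goes from negative to positive
    troughs = np.where((sign[:-1] < 0) & (sign[1:] > 0))[0] + 1
    return peaks, troughs

# Made-up smoothed BG values (mg/dL)
bg_filt = np.array([100, 120, 150, 140, 110, 90, 95, 130, 125])
peaks, troughs = find_extrema(bg_filt)
print("peaks at", peaks, "troughs at", troughs)
```

On the real data the indices can be used to look up the times and amplitudes, e.g. `df_cgm.iloc[peaks]`. One limitation: a perfectly flat stretch (difference exactly zero) is skipped by this check. `scipy.signal.find_peaks` is another option if thresholds on height or prominence turn out to be useful.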