Water Quality in Canada

Water Quality in Canada#

Jenny Lee, September 2022

Note

Kindly note that this web page solely offers a synopsis of the Water Quality in Canada project. The comprehensive set of scripts and codes used in the project is available in the corresponding GitHub repository.

Project Outline#

Data science toolbox used in this project

Data analysis
Data cleaning and wrangling
Interactive visualizations with Plotly

In this project, long-term monitoring data from Statistics Canada[1] will be utilized. To assess long-term water quality, Statistics Canada gathered data on 145 distinct parameters from 21 observation sites.

Among the 145 distinct water quality parameters, a detailed investigation will focus on five specific parameters: pH, dissolved oxygen, water temperature, dissolved sulphate, and dissolved phosphorus. For a deeper understanding of each parameter, please refer to the table below.

Parameter	Description
pH	A measure of the hydrogen ion activity in a solution. It indicates how acidic or basic water is.
Water Temperature	Variances in water temperature influence the types of aquatic organisms present, triggering chain reactions in aquatic ecosystems.
Dissolved Oxygen	Crucial for aquatic organisms’ respiration, low dissolved oxygen levels can lead to increased mortality of fish eggs and harm aquatic ecosystems.
Dissolved Sulphate	The oxidized form of sulphur, essential for biological processes, but high concentrations can be detrimental to aquatic organisms.
Dissolved Phosphorus	Essential for plant and animal growth, high concentrations can accelerate vegetation and algae growth, disrupting aquatic ecosystems.

Description from the table was retrieved from British Columbia government website[2].

Overall Process#

Data Preprocessing#

Since the entire dataset will be quite large, let’s first look into data from a single location.

(92418, 11)

	SITE_NO	DATE_TIME_HEURE	FLAG_MARQUEUR	VALUE_VALEUR	SDL_LDE	MDL_LDM	VMV_CODE	UNIT_UNITE	VARIABLE	VARIABLE_FR	STATUS_STATUT
0	BC08NL0001	2000-01-11 09:15	NaN	0.0170	0.0020	NaN	100216	MG/L	ALUMINUM TOTAL	ALUMINIUM TOTAL	P
1	BC08NL0001	2000-01-11 09:15	NaN	0.0003	0.0001	NaN	100250	MG/L	ARSENIC TOTAL	ARSENIC TOTAL	P
2	BC08NL0001	2000-01-11 09:15	NaN	19.6000	0.0100	NaN	100493	UG/L	BARIUM EXTRACTABLE	BARYUM EXTRACTIBLE	P
3	BC08NL0001	2000-01-11 09:15	NaN	0.0192	0.0002	NaN	100217	MG/L	BARIUM TOTAL	BARYUM TOTAL	P
4	BC08NL0001	2000-01-11 09:15	<	0.0020	0.0020	NaN	100474	UG/L	BERYLLIUM EXTRACTABLE	BÉRYLLIUM EXTRACTIBLE	P

Note that the dataset comprises 92,418 rows and 11 columns. Given the substantial size of the data when compiled from all 21 locations, we decided to selectively choose specific columns and rows of interest before proceeding further.

Now, the process involves gathering data from all locations. It’s important to note that only five parameters are being used, and these will be filtered during the data cleaning process. The displayed data frame below has undergone extensive preprocessing, and you can find the corresponding codes in the project GitHub repository. For viewing purposes, only data for Northwest Territories is displayed.

(14976, 11)

	PROVINCE	LOCATION	YEAR	SITE_NAME	SITE_TYPE	LONGITUDE	LATITUDE	VARIABLE	VALUE	UNIT	COLOR
0	Northwest Territories	arctic	2000	HORNADAY RIVER BELOW UNNAMED EASTERN TRIBUTARY	RIVER/RIVIÈRE	-122.4022	68.7542	pH	7.9825	PH	#1f77b4
1	Northwest Territories	arctic	2000	HORNADAY RIVER BELOW UNNAMED EASTERN TRIBUTARY	RIVER/RIVIÈRE	-122.4022	68.7542	Dissolved Phosphorus	0.0070	mg/L	#1f77b4
2	Northwest Territories	arctic	2000	HORNADAY RIVER BELOW UNNAMED EASTERN TRIBUTARY	RIVER/RIVIÈRE	-122.4022	68.7542	Dissolved Sulphate	10.0500	mg/L	#1f77b4
3	Northwest Territories	arctic	2000	HORNADAY RIVER BELOW UNNAMED EASTERN TRIBUTARY	RIVER/RIVIÈRE	-122.4022	68.7542	Water Temperature	6.9500	°C	#1f77b4
4	Northwest Territories	arctic	2002	HORNADAY RIVER BELOW UNNAMED EASTERN TRIBUTARY	RIVER/RIVIÈRE	-122.4022	68.7542	pH	7.8550	PH	#1f77b4

Summarization#

In our analysis, we also explore the provincial average of the parameter concentrations (values). The dataframe presented below offers a condensed view of the previous dataset, presenting the provincial average values for each year concerning the five selected parameters. For viewing purposes, only data for Alberta is displayed.

(1024, 5)

	PROVINCE	VARIABLE	YEAR	UNIT	VALUE
0	Alberta	Dissolved Oxygen	2000	mg/L	10.402069
1	Alberta	Dissolved Oxygen	2001	mg/L	10.925248
2	Alberta	Dissolved Oxygen	2002	mg/L	10.539742
3	Alberta	Dissolved Oxygen	2003	mg/L	10.522747
4	Alberta	Dissolved Oxygen	2004	mg/L	10.126394

Final Deliverable#

Line Graph#

Here, we utilize df_summary to create a line graph, which encompasses provincial average data of parameter concentrations from 2000 to 2019. Please be aware that certain data points are missing; the presence of discontinued lines in the plot indicates the absence of particular data.

Overall, parameter concentrations and values appear to remain relatively stable. While there are fluctuations on a yearly basis, there isn’t a significant change in the overall trend. One noteworthy observation is that Quebec exhibits a notably high concentration of dissolved phosphorus compared to other provinces.

Data-Plotted Map#

Now, we proceed to create a map with data plotted based on density. In this graph, the size of the bubbles will represent the parameter concentration or values. To facilitate comparison, we’ve categorized all parameter values into three groups: < first quartile, > Second quartile (median), and > third quartile. We will begin by developing functions that enable us to categorize the parameter values uniformly.

Data points in the < First quartile category represent the lower 25% of the value spectrum.
Data points in the > Second quartile category cover the 25% to 75% range of the value spectrum.
Data points in the > Third quartile category encompass the upper 25% of the value spectrum.

pH#

Dissolved Oxygen#

Water Temperature#

Combined Map with `iPywidget`#

With the assistance of ipywidgets, a dropdown menu has been incorporated to facilitate switching between parameters on a map diagram.

The code for generating the mapped figure with dropdowns is concealed below. Please expand to view.

Show code cell source Hide code cell source

# Define a function to assign category to parameter values
def create_scale(all_values, value):
    small = np.percentile(value, 25)
    large = np.percentile(value, 75)
    if (all_values <= small): return "< First Percentile"
    elif (all_values > small) & (all_values <= large): return "> First Percentile"
    elif (all_values >= large): return "> Third Percentile"

# Apply the scale to associated values 
def apply_scale(all_values):
    num_small = 5
    num_medium = 10
    num_large = 20
    if (all_values == "< First Percentile"): return num_small
    elif (all_values == "> First Percentile"): return num_medium
    elif (all_values == "> Third Percentile"): return num_large

def add_category(df):
    df["Category"] = "NA"
    df["Scale"] = np.NaN
    for variable in df["VARIABLE"].unique():
        mydf = df[df["VARIABLE"] == variable]
        quantile_variable = mydf.VALUE.astype(float)
        for index, row in df.iterrows():
            if row.VARIABLE == variable:
                myval = row.VALUE
                category = create_scale(myval, quantile_variable)
                df.at[index, "Category"] = category
                scaled_value = apply_scale(category)
                df.at[index, "Scale"] = scaled_value
    return df

# Create a scatterplot 
def create_map_with_widget(df):
    add_category(df)
    years = df["YEAR"].unique().tolist()

    hovertemplated="<br>".join([
        "<b>Province:  %{customdata[0]}</b>",
        "Longitude: %{customdata[2]}",
        "Latitude: %{customdata[1]}",
        "Location: %{customdata[3]}",
        "Variable: %{customdata[4]}",
        "Value: %{customdata[5]}",
        "Unit: %{customdata[6]}",
        "Category: %{customdata[7]}"])

    frames = [{"name":"frame_{}".format(year),
               "data":[{"type":"scattermapbox",
                        "lat":df.loc[df["YEAR"] == year]["LATITUDE"],
                        "lon":df.loc[df["YEAR"] == year]["LONGITUDE"],
                        "marker":go.scattermapbox.Marker(
                            size=df.loc[df["YEAR"] == year]["Scale"],
                            color=df.loc[df["YEAR"] == year]["COLOR"]),
                        "customdata":np.stack((df.loc[df["YEAR"] == year]["PROVINCE"],
                                   df.loc[df["YEAR"] == year]["LATITUDE"],
                                   df.loc[df["YEAR"] == year]["LONGITUDE"],
                                   df.loc[df["YEAR"] == year]["LOCATION"],
                                   df.loc[df["YEAR"] == year]["VARIABLE"],
                                   df.loc[df["YEAR"] == year]["VALUE"],
                                   df.loc[df["YEAR"] == year]["UNIT"],
                                   df.loc[df["YEAR"] == year]["Category"]), axis=1),
                        "hovertemplate": hovertemplated + "<extra></extra>"}]} for year in years]

    sliders = [{"transition":{"duration":0}, "x":0.08, "len":0.9,
                "currentvalue":{"font":{"size":12},"prefix":"Year: ", "visible":True, "xanchor":"left"},
                "steps":[{"label":year, "method":"animate",
                          "args":[["frame_{}".format(year)],
                                  {'mode':'immediate', 'frame':{'duration':100, 'redraw': True}, 
                                   'transition':{'duration':50}}]} for year in years]}]

    play_button = [{'type':'buttons','showactive':True,'x':0.045, 'y':-0.1,
                    'buttons':[{'label':'►','method':'animate',
                                'args':[None,{'frame':{'duration':100, 'redraw':True},
                                              'transition':{'duration':65},
                                              'fromcurrent':True,'mode':'immediate'}]}]}]

    layout = go.Layout(sliders=sliders, updatemenus=play_button,
                       mapbox = {'style': "carto-positron", 'center': {'lon':-98, 'lat': 58},'zoom': 2.5})

    myfig = go.Figure(data=frames[0]["data"], layout=layout, frames=frames)
    myfig.update_layout(width=800, height=500, margin=dict(l=20, r=20, t=0, b=20))
    myfig.show()

# Link the ipywidget with parameters and created scatterplot
variables = widgets.Dropdown(options=sorted(df.VARIABLE.unique()), 
                             value=df["VARIABLE"].unique().tolist()[0], description="Variable: ")

def created_map_with_widget(change):
    IPython.display.clear_output(wait=True)
    df1 = df.loc[df["VARIABLE"] == variables.value].copy().sort_values('YEAR')
    display(variables)
    create_map_with_widget(df1)


variables.observe(created_map_with_widget, names="value")
created_map_with_widget("")

Water Quality in Canada

Contents

Water Quality in Canada#

Project Outline#

Overall Process#

Data Preprocessing#

Summarization#

Final Deliverable#

Line Graph#

Data-Plotted Map#

pH#

Dissolved Oxygen#

Water Temperature#

Combined Map with `iPywidget`#

References#

Water Quality in Canada

Contents

Water Quality in Canada#

Project Outline#

Overall Process#

Data Preprocessing#

Summarization#

Final Deliverable#

Line Graph#

Data-Plotted Map#

pH#

Dissolved Oxygen#

Water Temperature#

Combined Map with iPywidget#

References#

Combined Map with `iPywidget`#