Water Quality in Canada#

Jenny Lee, September 2022

Note

Kindly note that this web page solely offers a synopsis of the Water Quality in Canada project. The comprehensive set of scripts and codes used in the project is available in the corresponding GitHub repository.

Project Outline#

In this project, long-term monitoring data from Statistics Canada[1] will be utilized. To assess long-term water quality, Statistics Canada gathered data on 145 distinct parameters from 21 observation sites.

Among the 145 distinct water quality parameters, a detailed investigation will focus on five specific parameters: pH, dissolved oxygen, water temperature, dissolved sulphate, and dissolved phosphorus. For a deeper understanding of each parameter, please refer to the table below.

Parameter

Description

pH

A measure of the hydrogen ion activity in a solution. It indicates how acidic or basic water is.

Water Temperature

Variances in water temperature influence the types of aquatic organisms present, triggering chain reactions in aquatic ecosystems.

Dissolved Oxygen

Crucial for aquatic organisms’ respiration, low dissolved oxygen levels can lead to increased mortality of fish eggs and harm aquatic ecosystems.

Dissolved Sulphate

The oxidized form of sulphur, essential for biological processes, but high concentrations can be detrimental to aquatic organisms.

Dissolved Phosphorus

Essential for plant and animal growth, high concentrations can accelerate vegetation and algae growth, disrupting aquatic ecosystems.

Description from the table was retrieved from British Columbia government website[2].

Overall Process#

Data Preprocessing#

Since the entire dataset will be quite large, let’s first look into data from a single location.

(92418, 11)
SITE_NO DATE_TIME_HEURE FLAG_MARQUEUR VALUE_VALEUR SDL_LDE MDL_LDM VMV_CODE UNIT_UNITE VARIABLE VARIABLE_FR STATUS_STATUT
0 BC08NL0001 2000-01-11 09:15 NaN 0.0170 0.0020 NaN 100216 MG/L ALUMINUM TOTAL ALUMINIUM TOTAL P
1 BC08NL0001 2000-01-11 09:15 NaN 0.0003 0.0001 NaN 100250 MG/L ARSENIC TOTAL ARSENIC TOTAL P
2 BC08NL0001 2000-01-11 09:15 NaN 19.6000 0.0100 NaN 100493 UG/L BARIUM EXTRACTABLE BARYUM EXTRACTIBLE P
3 BC08NL0001 2000-01-11 09:15 NaN 0.0192 0.0002 NaN 100217 MG/L BARIUM TOTAL BARYUM TOTAL P
4 BC08NL0001 2000-01-11 09:15 < 0.0020 0.0020 NaN 100474 UG/L BERYLLIUM EXTRACTABLE BÉRYLLIUM EXTRACTIBLE P

Note that the dataset comprises 92,418 rows and 11 columns. Given the substantial size of the data when compiled from all 21 locations, we decided to selectively choose specific columns and rows of interest before proceeding further.

Now, the process involves gathering data from all locations. It’s important to note that only five parameters are being used, and these will be filtered during the data cleaning process. The displayed data frame below has undergone extensive preprocessing, and you can find the corresponding codes in the project GitHub repository. For viewing purposes, only data for Northwest Territories is displayed.

(14976, 11)
PROVINCE LOCATION YEAR SITE_NAME SITE_TYPE LONGITUDE LATITUDE VARIABLE VALUE UNIT COLOR
0 Northwest Territories arctic 2000 HORNADAY RIVER BELOW UNNAMED EASTERN TRIBUTARY RIVER/RIVIÈRE -122.4022 68.7542 pH 7.9825 PH #1f77b4
1 Northwest Territories arctic 2000 HORNADAY RIVER BELOW UNNAMED EASTERN TRIBUTARY RIVER/RIVIÈRE -122.4022 68.7542 Dissolved Phosphorus 0.0070 mg/L #1f77b4
2 Northwest Territories arctic 2000 HORNADAY RIVER BELOW UNNAMED EASTERN TRIBUTARY RIVER/RIVIÈRE -122.4022 68.7542 Dissolved Sulphate 10.0500 mg/L #1f77b4
3 Northwest Territories arctic 2000 HORNADAY RIVER BELOW UNNAMED EASTERN TRIBUTARY RIVER/RIVIÈRE -122.4022 68.7542 Water Temperature 6.9500 °C #1f77b4
4 Northwest Territories arctic 2002 HORNADAY RIVER BELOW UNNAMED EASTERN TRIBUTARY RIVER/RIVIÈRE -122.4022 68.7542 pH 7.8550 PH #1f77b4

Summarization#

In our analysis, we also explore the provincial average of the parameter concentrations (values). The dataframe presented below offers a condensed view of the previous dataset, presenting the provincial average values for each year concerning the five selected parameters. For viewing purposes, only data for Alberta is displayed.

(1024, 5)
PROVINCE VARIABLE YEAR UNIT VALUE
0 Alberta Dissolved Oxygen 2000 mg/L 10.402069
1 Alberta Dissolved Oxygen 2001 mg/L 10.925248
2 Alberta Dissolved Oxygen 2002 mg/L 10.539742
3 Alberta Dissolved Oxygen 2003 mg/L 10.522747
4 Alberta Dissolved Oxygen 2004 mg/L 10.126394

Final Deliverable#

Line Graph#

Here, we utilize df_summary to create a line graph, which encompasses provincial average data of parameter concentrations from 2000 to 2019. Please be aware that certain data points are missing; the presence of discontinued lines in the plot indicates the absence of particular data.

Overall, parameter concentrations and values appear to remain relatively stable. While there are fluctuations on a yearly basis, there isn’t a significant change in the overall trend. One noteworthy observation is that Quebec exhibits a notably high concentration of dissolved phosphorus compared to other provinces.

Data-Plotted Map#

Now, we proceed to create a map with data plotted based on density. In this graph, the size of the bubbles will represent the parameter concentration or values. To facilitate comparison, we’ve categorized all parameter values into three groups: < first quartile, > Second quartile (median), and > third quartile. We will begin by developing functions that enable us to categorize the parameter values uniformly.

  • Data points in the < First quartile category represent the lower 25% of the value spectrum.

  • Data points in the > Second quartile category cover the 25% to 75% range of the value spectrum.

  • Data points in the > Third quartile category encompass the upper 25% of the value spectrum.

pH#

Dissolved Oxygen#

Water Temperature#

Combined Map with iPywidget#

With the assistance of ipywidgets, a dropdown menu has been incorporated to facilitate switching between parameters on a map diagram.

The code for generating the mapped figure with dropdowns is concealed below. Please expand to view.

Hide code cell source
# Define a function to assign category to parameter values
def create_scale(all_values, value):
    small = np.percentile(value, 25)
    large = np.percentile(value, 75)
    if (all_values <= small): return "< First Percentile"
    elif (all_values > small) & (all_values <= large): return "> First Percentile"
    elif (all_values >= large): return "> Third Percentile"

# Apply the scale to associated values 
def apply_scale(all_values):
    num_small = 5
    num_medium = 10
    num_large = 20
    if (all_values == "< First Percentile"): return num_small
    elif (all_values == "> First Percentile"): return num_medium
    elif (all_values == "> Third Percentile"): return num_large

def add_category(df):
    df["Category"] = "NA"
    df["Scale"] = np.NaN
    for variable in df["VARIABLE"].unique():
        mydf = df[df["VARIABLE"] == variable]
        quantile_variable = mydf.VALUE.astype(float)
        for index, row in df.iterrows():
            if row.VARIABLE == variable:
                myval = row.VALUE
                category = create_scale(myval, quantile_variable)
                df.at[index, "Category"] = category
                scaled_value = apply_scale(category)
                df.at[index, "Scale"] = scaled_value
    return df

# Create a scatterplot 
def create_map_with_widget(df):
    add_category(df)
    years = df["YEAR"].unique().tolist()

    hovertemplated="<br>".join([
        "<b>Province:  %{customdata[0]}</b>",
        "Longitude: %{customdata[2]}",
        "Latitude: %{customdata[1]}",
        "Location: %{customdata[3]}",
        "Variable: %{customdata[4]}",
        "Value: %{customdata[5]}",
        "Unit: %{customdata[6]}",
        "Category: %{customdata[7]}"])

    frames = [{"name":"frame_{}".format(year),
               "data":[{"type":"scattermapbox",
                        "lat":df.loc[df["YEAR"] == year]["LATITUDE"],
                        "lon":df.loc[df["YEAR"] == year]["LONGITUDE"],
                        "marker":go.scattermapbox.Marker(
                            size=df.loc[df["YEAR"] == year]["Scale"],
                            color=df.loc[df["YEAR"] == year]["COLOR"]),
                        "customdata":np.stack((df.loc[df["YEAR"] == year]["PROVINCE"],
                                   df.loc[df["YEAR"] == year]["LATITUDE"],
                                   df.loc[df["YEAR"] == year]["LONGITUDE"],
                                   df.loc[df["YEAR"] == year]["LOCATION"],
                                   df.loc[df["YEAR"] == year]["VARIABLE"],
                                   df.loc[df["YEAR"] == year]["VALUE"],
                                   df.loc[df["YEAR"] == year]["UNIT"],
                                   df.loc[df["YEAR"] == year]["Category"]), axis=1),
                        "hovertemplate": hovertemplated + "<extra></extra>"}]} for year in years]

    sliders = [{"transition":{"duration":0}, "x":0.08, "len":0.9,
                "currentvalue":{"font":{"size":12},"prefix":"Year: ", "visible":True, "xanchor":"left"},
                "steps":[{"label":year, "method":"animate",
                          "args":[["frame_{}".format(year)],
                                  {'mode':'immediate', 'frame':{'duration':100, 'redraw': True}, 
                                   'transition':{'duration':50}}]} for year in years]}]

    play_button = [{'type':'buttons','showactive':True,'x':0.045, 'y':-0.1,
                    'buttons':[{'label':'►','method':'animate',
                                'args':[None,{'frame':{'duration':100, 'redraw':True},
                                              'transition':{'duration':65},
                                              'fromcurrent':True,'mode':'immediate'}]}]}]

    layout = go.Layout(sliders=sliders, updatemenus=play_button,
                       mapbox = {'style': "carto-positron", 'center': {'lon':-98, 'lat': 58},'zoom': 2.5})

    myfig = go.Figure(data=frames[0]["data"], layout=layout, frames=frames)
    myfig.update_layout(width=800, height=500, margin=dict(l=20, r=20, t=0, b=20))
    myfig.show()

# Link the ipywidget with parameters and created scatterplot
variables = widgets.Dropdown(options=sorted(df.VARIABLE.unique()), 
                             value=df["VARIABLE"].unique().tolist()[0], description="Variable: ")

def created_map_with_widget(change):
    IPython.display.clear_output(wait=True)
    df1 = df.loc[df["VARIABLE"] == variables.value].copy().sort_values('YEAR')
    display(variables)
    create_map_with_widget(df1)


variables.observe(created_map_with_widget, names="value")
created_map_with_widget("")

References#