Water Quality in Canada#
Jenny Lee, September 2022
Note
Kindly note that this web page solely offers a synopsis of the Water Quality in Canada project. The comprehensive set of scripts and codes used in the project is available in the corresponding GitHub repository.
Project Outline#
Data science toolbox used in this project
Data analysis
Data cleaning and wrangling
Interactive visualizations with
Plotly
In this project, long-term monitoring data from Statistics Canada[1] will be utilized. To assess long-term water quality, Statistics Canada gathered data on 145 distinct parameters from 21 observation sites.
Among the 145 distinct water quality parameters, a detailed investigation will focus on five specific parameters: pH, dissolved oxygen, water temperature, dissolved sulphate, and dissolved phosphorus. For a deeper understanding of each parameter, please refer to the table below.
Parameter |
Description |
---|---|
pH |
A measure of the hydrogen ion activity in a solution. It indicates how acidic or basic water is. |
Water Temperature |
Variances in water temperature influence the types of aquatic organisms present, triggering chain reactions in aquatic ecosystems. |
Dissolved Oxygen |
Crucial for aquatic organisms’ respiration, low dissolved oxygen levels can lead to increased mortality of fish eggs and harm aquatic ecosystems. |
Dissolved Sulphate |
The oxidized form of sulphur, essential for biological processes, but high concentrations can be detrimental to aquatic organisms. |
Dissolved Phosphorus |
Essential for plant and animal growth, high concentrations can accelerate vegetation and algae growth, disrupting aquatic ecosystems. |
Description from the table was retrieved from British Columbia government website[2].
Overall Process#
Data Preprocessing#
Since the entire dataset will be quite large, let’s first look into data from a single location.
(92418, 11)
SITE_NO | DATE_TIME_HEURE | FLAG_MARQUEUR | VALUE_VALEUR | SDL_LDE | MDL_LDM | VMV_CODE | UNIT_UNITE | VARIABLE | VARIABLE_FR | STATUS_STATUT | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | BC08NL0001 | 2000-01-11 09:15 | NaN | 0.0170 | 0.0020 | NaN | 100216 | MG/L | ALUMINUM TOTAL | ALUMINIUM TOTAL | P |
1 | BC08NL0001 | 2000-01-11 09:15 | NaN | 0.0003 | 0.0001 | NaN | 100250 | MG/L | ARSENIC TOTAL | ARSENIC TOTAL | P |
2 | BC08NL0001 | 2000-01-11 09:15 | NaN | 19.6000 | 0.0100 | NaN | 100493 | UG/L | BARIUM EXTRACTABLE | BARYUM EXTRACTIBLE | P |
3 | BC08NL0001 | 2000-01-11 09:15 | NaN | 0.0192 | 0.0002 | NaN | 100217 | MG/L | BARIUM TOTAL | BARYUM TOTAL | P |
4 | BC08NL0001 | 2000-01-11 09:15 | < | 0.0020 | 0.0020 | NaN | 100474 | UG/L | BERYLLIUM EXTRACTABLE | BÉRYLLIUM EXTRACTIBLE | P |
Note that the dataset comprises 92,418 rows and 11 columns. Given the substantial size of the data when compiled from all 21 locations, we decided to selectively choose specific columns and rows of interest before proceeding further.
Now, the process involves gathering data from all locations. It’s important to note that only five parameters are being used, and these will be filtered during the data cleaning process. The displayed data frame below has undergone extensive preprocessing, and you can find the corresponding codes in the project GitHub repository. For viewing purposes, only data for Northwest Territories is displayed.
(14976, 11)
PROVINCE | LOCATION | YEAR | SITE_NAME | SITE_TYPE | LONGITUDE | LATITUDE | VARIABLE | VALUE | UNIT | COLOR | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Northwest Territories | arctic | 2000 | HORNADAY RIVER BELOW UNNAMED EASTERN TRIBUTARY | RIVER/RIVIÈRE | -122.4022 | 68.7542 | pH | 7.9825 | PH | #1f77b4 |
1 | Northwest Territories | arctic | 2000 | HORNADAY RIVER BELOW UNNAMED EASTERN TRIBUTARY | RIVER/RIVIÈRE | -122.4022 | 68.7542 | Dissolved Phosphorus | 0.0070 | mg/L | #1f77b4 |
2 | Northwest Territories | arctic | 2000 | HORNADAY RIVER BELOW UNNAMED EASTERN TRIBUTARY | RIVER/RIVIÈRE | -122.4022 | 68.7542 | Dissolved Sulphate | 10.0500 | mg/L | #1f77b4 |
3 | Northwest Territories | arctic | 2000 | HORNADAY RIVER BELOW UNNAMED EASTERN TRIBUTARY | RIVER/RIVIÈRE | -122.4022 | 68.7542 | Water Temperature | 6.9500 | °C | #1f77b4 |
4 | Northwest Territories | arctic | 2002 | HORNADAY RIVER BELOW UNNAMED EASTERN TRIBUTARY | RIVER/RIVIÈRE | -122.4022 | 68.7542 | pH | 7.8550 | PH | #1f77b4 |
Summarization#
In our analysis, we also explore the provincial average of the parameter concentrations (values). The dataframe presented below offers a condensed view of the previous dataset, presenting the provincial average values for each year concerning the five selected parameters. For viewing purposes, only data for Alberta is displayed.
(1024, 5)
PROVINCE | VARIABLE | YEAR | UNIT | VALUE | |
---|---|---|---|---|---|
0 | Alberta | Dissolved Oxygen | 2000 | mg/L | 10.402069 |
1 | Alberta | Dissolved Oxygen | 2001 | mg/L | 10.925248 |
2 | Alberta | Dissolved Oxygen | 2002 | mg/L | 10.539742 |
3 | Alberta | Dissolved Oxygen | 2003 | mg/L | 10.522747 |
4 | Alberta | Dissolved Oxygen | 2004 | mg/L | 10.126394 |
Final Deliverable#
Line Graph#
Here, we utilize df_summary
to create a line graph, which encompasses provincial average data of parameter concentrations from 2000 to 2019. Please be aware that certain data points are missing; the presence of discontinued lines in the plot indicates the absence of particular data.
Overall, parameter concentrations and values appear to remain relatively stable. While there are fluctuations on a yearly basis, there isn’t a significant change in the overall trend. One noteworthy observation is that Quebec exhibits a notably high concentration of dissolved phosphorus compared to other provinces.
Data-Plotted Map#
Now, we proceed to create a map with data plotted based on density. In this graph, the size of the bubbles will represent the parameter concentration or values. To facilitate comparison, we’ve categorized all parameter values into three groups: < first quartile, > Second quartile (median), and > third quartile. We will begin by developing functions that enable us to categorize the parameter values uniformly.
Data points in the
< First quartile
category represent the lower 25% of the value spectrum.Data points in the
> Second quartile
category cover the 25% to 75% range of the value spectrum.Data points in the
> Third quartile
category encompass the upper 25% of the value spectrum.
pH#
Dissolved Oxygen#
Water Temperature#
Combined Map with iPywidget
#
With the assistance of ipywidgets, a dropdown menu has been incorporated to facilitate switching between parameters on a map diagram.
The code for generating the mapped figure with dropdowns is concealed below. Please expand to view.
Show code cell source
# Define a function to assign category to parameter values
def create_scale(all_values, value):
small = np.percentile(value, 25)
large = np.percentile(value, 75)
if (all_values <= small): return "< First Percentile"
elif (all_values > small) & (all_values <= large): return "> First Percentile"
elif (all_values >= large): return "> Third Percentile"
# Apply the scale to associated values
def apply_scale(all_values):
num_small = 5
num_medium = 10
num_large = 20
if (all_values == "< First Percentile"): return num_small
elif (all_values == "> First Percentile"): return num_medium
elif (all_values == "> Third Percentile"): return num_large
def add_category(df):
df["Category"] = "NA"
df["Scale"] = np.NaN
for variable in df["VARIABLE"].unique():
mydf = df[df["VARIABLE"] == variable]
quantile_variable = mydf.VALUE.astype(float)
for index, row in df.iterrows():
if row.VARIABLE == variable:
myval = row.VALUE
category = create_scale(myval, quantile_variable)
df.at[index, "Category"] = category
scaled_value = apply_scale(category)
df.at[index, "Scale"] = scaled_value
return df
# Create a scatterplot
def create_map_with_widget(df):
add_category(df)
years = df["YEAR"].unique().tolist()
hovertemplated="<br>".join([
"<b>Province: %{customdata[0]}</b>",
"Longitude: %{customdata[2]}",
"Latitude: %{customdata[1]}",
"Location: %{customdata[3]}",
"Variable: %{customdata[4]}",
"Value: %{customdata[5]}",
"Unit: %{customdata[6]}",
"Category: %{customdata[7]}"])
frames = [{"name":"frame_{}".format(year),
"data":[{"type":"scattermapbox",
"lat":df.loc[df["YEAR"] == year]["LATITUDE"],
"lon":df.loc[df["YEAR"] == year]["LONGITUDE"],
"marker":go.scattermapbox.Marker(
size=df.loc[df["YEAR"] == year]["Scale"],
color=df.loc[df["YEAR"] == year]["COLOR"]),
"customdata":np.stack((df.loc[df["YEAR"] == year]["PROVINCE"],
df.loc[df["YEAR"] == year]["LATITUDE"],
df.loc[df["YEAR"] == year]["LONGITUDE"],
df.loc[df["YEAR"] == year]["LOCATION"],
df.loc[df["YEAR"] == year]["VARIABLE"],
df.loc[df["YEAR"] == year]["VALUE"],
df.loc[df["YEAR"] == year]["UNIT"],
df.loc[df["YEAR"] == year]["Category"]), axis=1),
"hovertemplate": hovertemplated + "<extra></extra>"}]} for year in years]
sliders = [{"transition":{"duration":0}, "x":0.08, "len":0.9,
"currentvalue":{"font":{"size":12},"prefix":"Year: ", "visible":True, "xanchor":"left"},
"steps":[{"label":year, "method":"animate",
"args":[["frame_{}".format(year)],
{'mode':'immediate', 'frame':{'duration':100, 'redraw': True},
'transition':{'duration':50}}]} for year in years]}]
play_button = [{'type':'buttons','showactive':True,'x':0.045, 'y':-0.1,
'buttons':[{'label':'►','method':'animate',
'args':[None,{'frame':{'duration':100, 'redraw':True},
'transition':{'duration':65},
'fromcurrent':True,'mode':'immediate'}]}]}]
layout = go.Layout(sliders=sliders, updatemenus=play_button,
mapbox = {'style': "carto-positron", 'center': {'lon':-98, 'lat': 58},'zoom': 2.5})
myfig = go.Figure(data=frames[0]["data"], layout=layout, frames=frames)
myfig.update_layout(width=800, height=500, margin=dict(l=20, r=20, t=0, b=20))
myfig.show()
# Link the ipywidget with parameters and created scatterplot
variables = widgets.Dropdown(options=sorted(df.VARIABLE.unique()),
value=df["VARIABLE"].unique().tolist()[0], description="Variable: ")
def created_map_with_widget(change):
IPython.display.clear_output(wait=True)
df1 = df.loc[df["VARIABLE"] == variables.value].copy().sort_values('YEAR')
display(variables)
create_map_with_widget(df1)
variables.observe(created_map_with_widget, names="value")
created_map_with_widget("")