# Understanding Air Quality in Canada and Its Impact on Environment
Jenny Lee, September 2022

```{note}
Kindly note that this web page solely offers a synopsis of the **Air Quality in Canada** project. The comprehensive set of scripts and codes used in the project is available in the corresponding [GitHub repository](https://github.com/jlee2843/data-visualizations-portfolio/tree/main/air-quality).
```

## Project Outline

In [27]:
import pandas as pd 
import numpy as np
import IPython
from IPython.display import YouTubeVideo
from IPython.display import HTML
from myst_nb import glue
from schemdraw.flow import *
from schemdraw import flow
import plotly.io as pio
import plotly
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
import matplotlib.pyplot as plt
from plotly.subplots import make_subplots
import plotly.offline as py
from IPython import display
import scipy.stats as stats

```{admonition} Data science toolbox used in this project
:class: tip, dropdown
- Data analysis
- Data cleaning and wrangling
- Interactive visualizations with `Plotly`
```

In this project, I integrated three different types of datasets: air pollutant emissions, wildfires in Canada, and Canadian population estimates. The original data sources were as follows:

- **Air Pollution Emissions Across Provinces**: Canada's Air Pollutant Emissions Inventory[^ref1]
- **Wildfires in Canada**: Canadian National Fire Database[^ref2]
- **Canadian Population Estimates**: Population Estimates from Statistics Canada[^ref3]

The Government of Canada[^ref4] identifies sulfur oxides, nitrogen oxides, volatile organic compounds, particulate matter, carbon monoxide, ammonia, and ground-level ozone as the most common air contaminants in Canada.

Canada's Air Pollutant Inventory[^ref5] lists a total of thirteen air pollutants. This project focuses on five of the common contaminants named above: sulfur oxides, nitrogen oxides, volatile organic compounds, carbon monoxide, and ammonia.

## Overall Process

### Gathering Provincial Pollutant Emissions per Capita

To investigate the air quality in Canada from the past to the present, we start by identifying pollutants that significantly impact air quality and health. We have chosen $\text{NO}_X$, $\text{NH}_3$, $\text{CO}$, $\text{SO}_X$, and $\text{VOC}$ as major pollutants for closer examination. 

First, we gather data for all Canadian provinces from Environment and Climate Change Canada's Air Pollutant Emissions Inventory[^ref1]. The data covers 1990 to 2020.

In [5]:
pd.read_csv("projects/environmental-health/csv/provincial_df.csv")

| Unnamed: 0 | Province | Pollutant | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | ... | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AB | NH3 | 95261.64 | 96392.61 | 100809.7 | 102888.8 | 108909.3 | 117873.2 | 123629.1 | 126274.3 | ... | 132641.5 | 137630.6 | 139707.6 | 141095.1 | 139937.6 | 132619.5 | 124020.9 | 130492.1 | 131440.5 | 133161.8 |
| 1 | AB | CO | 1810389.0 | 1674240.0 | 1694347.0 | 1670384.0 | 1713860.0 | 1699242.0 | 1707362.0 | 1747507.0 | ... | 1015160.0 | 1020173.0 | 1044806.0 | 1083451.0 | 985516.9 | 950438.0 | 1009270.0 | 1001194.0 | 992062.8 | 890235.1 |
| 2 | AB | SOX | 512405.2 | 524144.2 | 564717.8 | 571473.9 | 594900.2 | 569212.6 | 556679.1 | 522989.3 | ... | 343134.3 | 333877.8 | 314187.7 | 291592.5 | 259725.4 | 239568.8 | 241440.0 | 225758.9 | 221287.8 | 182536.3 |
| 3 | AB | NOX | 613284.3 | 585705.3 | 609463.6 | 640884.4 | 692547.1 | 718124.9 | 749396.5 | 815873.3 | ... | 691938.5 | 649050.7 | 643710.3 | 656735.7 | 630979.6 | 603268.9 | 627271.9 | 625976.4 | 625636.7 | 567697.2 |
| 4 | AB | VOC | 643752.2 | 624633.8 | 642477.6 | 654384.4 | 669557.5 | 679792.4 | 708522.4 | 678362.3 | ... | 478680.3 | 518824.5 | 556411.1 | 572615.3 | 517421.8 | 476740.1 | 474301.4 | 501074.8 | 492636.6 | 456932.6 |
| 5 | BC | NH3 | 22995.99 | 22310.89 | 22883.12 | 23423.89 | 23446.15 | 24276.14 | 24313.75 | 24261.03 | ... | 19673.45 | 19597.05 | 19957.01 | 19406.65 | 19454.27 | 19898.55 | 20144.79 | 20962.11 | 21088.67 | 22697.67 |
| 6 | BC | CO | 2340344.0 | 2353114.0 | 2155700.0 | 2363510.0 | 2295990.0 | 2238614.0 | 2110270.0 | 2027979.0 | ... | 736230.4 | 737041.8 | 661969.6 | 660414.8 | 630995.6 | 660545.0 | 681319.9 | 693402.9 | 679666.7 | 645152.3 |
| 7 | BC | SOX | 110747.4 | 90689.85 | 79350.32 | 77422.06 | 88186.66 | 82358.97 | 78644.31 | 82107.32 | ... | 71221.32 | 73210.98 | 75419.82 | 75847.22 | 62540.8 | 68443.73 | 71916.75 | 73356.02 | 68922.92 | 69915.39 |
| 8 | BC | NOX | 285967.7 | 270828.1 | 262126.9 | 271899.1 | 294263.9 | 296947.8 | 306607.3 | 316168.5 | ... | 231269.5 | 225107.5 | 228222.3 | 222718.9 | 213184.7 | 210834.8 | 215964.2 | 223692.2 | 220584.7 | 209468.8 |
| 9 | BC | VOC | 374973.2 | 376762.9 | 355183.1 | 390269.2 | 376905.4 | 379777.0 | 364154.1 | 354883.3 | ... | 167721.3 | 165183.6 | 166548.6 | 159386.5 | 156313.0 | 144381.4 | 142284.1 | 141676.0 | 139949.6 | 131211.0 |


Next, we examine provincial emissions of air pollutants per capita. To do this, we first merge the provincial air pollutant data with the population estimates[^ref3].

We clean the two datasets further by renaming some columns so that they can be merged. The resulting dataset contains provincial air pollutant emissions and population estimates for each year from 1990 to 2020. We then divide each emissions `Value` by the corresponding population estimate to obtain emissions `Per Capita`.
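
The merge and division described here could look roughly like the sketch below. It is not the project's actual code (that lives in the GitHub repository): the `Year` and `Population` column names and all population figures are assumptions made up for illustration, while the emission values are copied from the table above.

```python
import pandas as pd

# Wide emissions table shaped like provincial_df.csv (values taken from this page).
emissions_wide = pd.DataFrame({
    "Province": ["AB", "AB", "BC", "BC"],
    "Pollutant": ["NH3", "CO", "NH3", "CO"],
    "1990": [95261.64, 1810389.0, 22995.99, 2340344.0],
    "1991": [96392.61, 1674240.0, 22310.89, 2353114.0],
})

# Invented population estimates; NOT Statistics Canada figures.
population = pd.DataFrame({
    "Province": ["AB", "AB", "BC", "BC"],
    "Year": [1990, 1991, 1990, 1991],
    "Population": [2_500_000, 2_600_000, 3_300_000, 3_400_000],
})

# Reshape from one column per year to one row per (province, pollutant, year).
emissions_long = emissions_wide.melt(
    id_vars=["Province", "Pollutant"], var_name="Year", value_name="Value"
)
emissions_long["Year"] = emissions_long["Year"].astype(int)

# Attach the matching population to every emission row, then divide.
merged = emissions_long.merge(
    population, on=["Province", "Year"], how="left", validate="many_to_one"
)
merged["Per Capita"] = merged["Value"] / merged["Population"]
print(merged)
```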

Then, we look at the mean per-capita emissions of each air pollutant in each province.

In [13]:
pd.read_csv("projects/environmental-health/csv/pollutant_capita_mean.csv")

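| Unnamed: 0 | Province | Pollutant | Mean |
|---|---|---|---|
| 0 | AB | CO | 0.426834 |
| 1 | AB | NH3 | 0.038578 |
| 2 | AB | NOX | 0.209131 |
| 3 | AB | SOX | 0.13182 |
| 4 | AB | VOC | 0.179616 |
| 5 | BC | CO | 0.340157 |
| 6 | BC | NH3 | 0.005356 |
| 7 | BC | NOX | 0.06406 |
| 8 | BC | SOX | 0.018102 |
| 9 | BC | VOC | 0.063772 |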
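
A table of this shape can be produced with a grouped mean. The sketch below is one plausible way to do it, assuming a long-format table with a `Per Capita` column; the `Year` column and all numbers are invented for illustration and are not the project's data.

```python
import pandas as pd

# Invented per-capita emissions in long format (one row per province, pollutant, year).
per_capita = pd.DataFrame({
    "Province": ["AB", "AB", "AB", "AB", "BC", "BC", "BC", "BC"],
    "Pollutant": ["CO", "CO", "NH3", "NH3", "CO", "CO", "NH3", "NH3"],
    "Year": [1990, 1991, 1990, 1991, 1990, 1991, 1990, 1991],
    "Per Capita": [0.50, 0.40, 0.04, 0.02, 0.30, 0.20, 0.006, 0.004],
})

# Average over the years within each (province, pollutant) pair.
pollutant_capita_mean = (
    per_capita.groupby(["Province", "Pollutant"], as_index=False)["Per Capita"]
    .mean()
    .rename(columns={"Per Capita": "Mean"})
)
print(pollutant_capita_mean)
```

Grouping on both `Province` and `Pollutant` keeps the pollutants separate, so a province with high CO emissions does not mask a low NH3 figure.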


### Environmental Impact Caused by Air Pollution

To investigate the environmental implications of air pollution, we examine the frequency of wildfires in Canada and study how their occurrences are related to air pollution.

The Canadian National Fire Database[^ref2] provides diverse information on wildfires in Canada. We explore the total number of wildfires and the total area burned in each province. Using this information, we can assess whether `Number of Fire` or `Area Burned (hectre)` correlates more strongly with PM2.5 emissions.

Below is the resulting dataframe after some data cleaning.

In [21]:
pm25_fire = pd.read_csv("projects/environmental-health/csv/pm25_fire.csv").drop("Unnamed: 0", axis=1)
pm25_fire

|   | Province | Total Emission | Number of Fire | Area Burned (hectre) |
|---|---|---|---|---|
| 0 | AB | 13308220.0 | 38015.0 | 5923754.0 |
| 1 | BC | 2600960.0 | 56698.0 | 5123188.0 |
| 2 | MB | 3077027.0 | 10953.0 | 6672828.0 |
| 3 | NB | 621120.7 | 9985.0 | 29089.4 |
| 4 | NL | 487498.2 | 2937.0 | 414891.7 |
| 5 | NS | 645516.9 | 8841.0 | 21594.8 |
| 6 | ON | 5625880.0 | 35041.0 | 4865412.0 |
| 7 | PE | 123533.5 | 11.0 | 21.60128 |
| 8 | QC | 3890464.0 | 21222.0 | 8173951.0 |
| 9 | SK | 14223730.0 | 16964.0 | 14885770.0 |
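
One way to arrive at a per-province table like this is to sum yearly fire records and join them to the PM2.5 totals. The sketch below only illustrates that idea: the input column names `Year`, `Fires`, and `Hectares` are assumptions, and every number is invented rather than taken from the Canadian National Fire Database.

```python
import pandas as pd

# Invented yearly fire records (hypothetical column names and values).
fires = pd.DataFrame({
    "Province": ["AB", "AB", "BC", "BC"],
    "Year": [2019, 2020, 2019, 2020],
    "Fires": [1000, 700, 800, 650],
    "Hectares": [880000.0, 3300.0, 21000.0, 14000.0],
})

# Invented PM2.5 totals per province.
pm25 = pd.DataFrame({
    "Province": ["AB", "BC"],
    "Total Emission": [900000.0, 170000.0],
})

# Sum fire counts and burned area over all years for each province.
# The output names mirror the column labels used on this page.
fire_totals = fires.groupby("Province", as_index=False).agg(
    **{
        "Number of Fire": ("Fires", "sum"),
        "Area Burned (hectre)": ("Hectares", "sum"),
    }
)

# Keep only provinces present in both tables.
pm25_fire_sketch = pm25.merge(fire_totals, on="Province", how="inner")
print(pm25_fire_sketch)
```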


## Final Deliverable

First, we examine a **stacked bar graph** illustrating emissions per capita across all provinces. This graph also lets us observe annual variations in pollutant emissions. The graph is interactive and built with the `Plotly` library. To explore a specific year, drag the handle on the slider at the bottom of the graph.

In [17]:
with open('projects/environmental-health/figure/stacked_bar.json', 'r') as f:
    fig = pio.from_json(f.read())
fig.show()

Second, we construct a **bar graph** to illustrate the cumulative contribution of air pollutants for each province.

In [18]:
with open('projects/environmental-health/figure/bar_fig.json', 'r') as f:
    fig = pio.from_json(f.read())
fig.show()

Lastly, we create a **pie graph** to visualize provincial contributions to total air pollutant emissions. This pie graph facilitates a simpler observation of the provincial breakdown of pollutant emissions.

In [19]:
with open('projects/environmental-health/figure/pie_fig.json', 'r') as f:
    fig = pio.from_json(f.read())
fig.show()

Now we examine the combined wildfire and PM2.5 data. Here, we've created subplots of scatterplots to illustrate any relationships present between the two datasets. We observe similar trends in the wildfire data and PM2.5 emission data, as expected. However, what do these patterns truly signify?

In [20]:
with open('projects/environmental-health/figure/fire_fig.json', 'r') as f:
    fig = pio.from_json(f.read())
fig.show()

In [25]:
# Calculate the Pearson correlation coefficient between wildfire occurrence and PM2.5 emissions.
corr, pval=stats.pearsonr(pm25_fire["Total Emission"],pm25_fire["Area Burned (hectre)"])
print("Correlation coefficient for total area burnt and total PM2.5 emission: " + str(corr))
corr, pval=stats.pearsonr(pm25_fire["Total Emission"],pm25_fire["Number of Fire"])
print("Correlation coefficient for number of fire and total PM2.5 emission: " + str(corr))

Correlation coefficient for total area burnt and total PM2.5 emission: 0.7947497297236447
Correlation coefficient for number of fire and total PM2.5 emission: 0.38717314196102626


The output above shows that the total area burnt by wildfires and PM2.5 emissions have a correlation coefficient of $0.795$, whereas the total number of wildfires and PM2.5 emissions have a correlation coefficient of $0.387$. Of the two, the total area burnt is therefore the better predictor of PM2.5 emissions across provinces, although with only ten provinces in the sample this is an indication rather than a firm conclusion.
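
As a cross-check, Pearson's coefficient can be computed directly from its definition, $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$, using the rounded values displayed in the `pm25_fire` table. The helper name `pearson_r` is introduced here for illustration only; because the inputs are the rounded display values, the results may differ slightly from the printed output in the last decimal places.

```python
import numpy as np

# Values copied from the rounded pm25_fire table shown earlier (AB through SK).
total_emission = np.array([
    13308220.0, 2600960.0, 3077027.0, 621120.7, 487498.2,
    645516.9, 5625880.0, 123533.5, 3890464.0, 14223730.0,
])
number_of_fire = np.array([
    38015.0, 56698.0, 10953.0, 9985.0, 2937.0,
    8841.0, 35041.0, 11.0, 21222.0, 16964.0,
])
area_burned = np.array([
    5923754.0, 5123188.0, 6672828.0, 29089.4, 414891.7,
    21594.8, 4865412.0, 21.60128, 8173951.0, 14885770.0,
])


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


r_area = pearson_r(total_emission, area_burned)
r_count = pearson_r(total_emission, number_of_fire)
print(f"area burned vs PM2.5:     {r_area:.3f}")
print(f"number of fires vs PM2.5: {r_count:.3f}")
```

Both values should land close to the coefficients printed by `stats.pearsonr` above, with area burned showing the stronger linear relationship.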

## References
[^ref1]: Environment and Climate Change Canada. (n.d.). Canada's Air Pollutant Emissions Inventory. Retrieved September 2021, from https://data.ec.gc.ca/data/substances/monitor/canada-s-air-pollutant-emissions-inventory/APEI_Tables_Canada_Provinces_Territories/?lang
[^ref2]: Canadian Forest Service. (n.d.). Canadian National Fire Database. Retrieved September 2021, from https://cwfis.cfs.nrcan.gc.ca/ha/nfdb
[^ref3]: Statistics Canada. (n.d.). Population Estimates. Retrieved September 2021, from https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1710000901
[^ref4]: Government of Canada. (n.d.). Common Contaminants - Air Pollution. Retrieved September 2021, from https://www.canada.ca/en/environment-climate-change/services/air-pollution/pollutants/common-contaminants.html
[^ref5]: Environment and Climate Change Canada. (n.d.). Canada's Air Pollutant Inventory. Retrieved September 2021, from https://data.ec.gc.ca/data/substances/monitor/canada-s-air-pollutant-emissions-inventory/APEI_Tables_Canada_Provinces_Territories/?lang