Get started with open reproducible science! (API version) 🌏 📈
Define Open Reproducible Science: Open reproducible science makes the process of scientific research available to everyone, so that anyone can access, understand, and build on it. It allows methods, data, and outcomes to be shared and collaborated on easily and transparently. Put simply, open reproducible science is a way of doing research that others can access, understand, and replicate without restriction. It makes the entire process transparent and ensures that results are freely available.
Choose one of the open source tools that you have learned about (i.e. Shell, Git/GitHub, Jupyter Notebook, Python) and explain how it supports open reproducible science: GitHub provides a platform where we can share and collaborate on our code and discuss issues about it. This course gave me hands-on experience with Python code for geospatial applications, shared through Git, which makes the work shareable, replicable, and freely available.
Does this Jupyter Notebook file have a machine-readable name? Explain your answer.
Yes, this Jupyter notebook has a machine-readable name, Reproducible Science!, with the extension .ipynb. However, to make the file name easy to read and parse, we should avoid spaces and special characters, using only dashes or underscores to separate words. The file has therefore been renamed to Bangalore_TimeSeries.ipynb, removing the special character '!' and using an underscore between the words.
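The renaming rule above can be sketched in a few lines of Python. This is only an illustration: `make_machine_readable` is a hypothetical helper, not part of this notebook or of any library, and it assumes that every run of spaces or special characters should become a single underscore.

```python
import re
from pathlib import Path


def make_machine_readable(filename: str) -> str:
    """Return a file name that uses only letters, digits, dashes and underscores.

    Hypothetical helper for illustration; it is not part of the notebook.
    """
    path = Path(filename)
    # Replace each run of spaces or special characters in the stem with one underscore
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", path.stem).strip("_")
    # Keep the extension, lower-cased, so tools still recognise the file type
    return stem + path.suffix.lower()


clean_name = make_machine_readable("Reproducible Science!.ipynb")
print(clean_name)
```

A name that already follows the rule, such as Bangalore_TimeSeries.ipynb, passes through unchanged.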
Suggestions for creating readable, well-documented scientific workflows that are easier to reproduce
I can write clean code by:
a) Following the standard syntax
b) Creating meaningful variable and function names
c) Using modular programming
d) Proper documentation
Advantages of clean code include:
a) Easy to find logical errors in dev environment
b) Easy to maintain the code in production environment
c) Easily readable and understandable
d) Makes the code reusable
e) Helps in code fixes and further enhancements
Assignment: Climate Coding Challenge
Study Area: Bengaluru, India
Bengaluru, also known as Bangalore, lies in the southeast of the South Indian state of Karnataka at an elevation of 921 m. It covers an area of 741 km². Bangalore has a tropical savanna climate with distinct wet and dry seasons. Because of its high elevation, Bangalore has a moderate climate throughout the year, although occasional heat waves can make summers hotter.
I am using data from the NOAA NCEI Climate Data Online service, from the Global Historical Climatology Network Daily (GHCND) station IN009010100 (BANGALORE, IN).
Data: Precipitation and Temperature Dataset from 1901-2024
API used : National Centers for Environmental Information (NCEI), NOAA website
url: https://www.ncei.noaa.gov/
import pandas as pd
bng_ind_url = (
    'https://www.ncei.noaa.gov/access/services/data/v1'
    '?dataset=daily-summaries'
    '&dataTypes=TAVG,TMAX,TMIN,PRCP'
    '&stations=IN009010100'
    '&startDate=1901-01-01'
    '&endDate=2024-04-15'
    '&includeStationName=true'
    '&includeStationLocation=1'
    '&units=metric')
bng_ind_url
'https://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=TAVG,TMAX,TMIN,PRCP&stations=IN009010100&startDate=1901-01-01&endDate=2024-04-15&includeStationName=true&includeStationLocation=1&units=metric'
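As an alternative to typing the query string by hand, the same request URL can be assembled from a dictionary of parameters with the standard library. This is a sketch only: the names `BASE_URL`, `query_params`, and `built_url` are mine, and the parameter values are copied from the URL above.

```python
from urllib.parse import urlencode

# Endpoint and parameters copied from the URL used in this notebook
BASE_URL = "https://www.ncei.noaa.gov/access/services/data/v1"

query_params = {
    "dataset": "daily-summaries",
    "dataTypes": "TAVG,TMAX,TMIN,PRCP",
    "stations": "IN009010100",
    "startDate": "1901-01-01",
    "endDate": "2024-04-15",
    "includeStationName": "true",
    "includeStationLocation": "1",
    "units": "metric",
}

# safe="," keeps the comma-separated dataTypes list readable instead of %2C
built_url = f"{BASE_URL}?{urlencode(query_params, safe=',')}"
print(built_url)
```

Keeping the parameters in a dictionary makes it easier to change the station or the date range without touching the rest of the string.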
bgl_ind_df = pd.read_csv(
    bng_ind_url, index_col='DATE', parse_dates=True, na_values=['NaN'])
bgl_ind_df
DATE | STATION | NAME | LATITUDE | LONGITUDE | ELEVATION | PRCP | TAVG | TMAX | TMIN |
---|---|---|---|---|---|---|---|---|---|
1901-01-01 | IN009010100 | BANGALORE, IN | 12.967 | 77.583 | 921.0 | 0.0 | NaN | NaN | NaN |
1901-01-02 | IN009010100 | BANGALORE, IN | 12.967 | 77.583 | 921.0 | 0.0 | NaN | NaN | NaN |
1901-01-03 | IN009010100 | BANGALORE, IN | 12.967 | 77.583 | 921.0 | 0.0 | NaN | NaN | NaN |
1901-01-04 | IN009010100 | BANGALORE, IN | 12.967 | 77.583 | 921.0 | 0.0 | NaN | NaN | NaN |
1901-01-05 | IN009010100 | BANGALORE, IN | 12.967 | 77.583 | 921.0 | 0.0 | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2024-04-11 | IN009010100 | BANGALORE, IN | 12.967 | 77.583 | 921.0 | NaN | 29.0 | 36.0 | 21.6 |
2024-04-12 | IN009010100 | BANGALORE, IN | 12.967 | 77.583 | 921.0 | NaN | 28.9 | 35.8 | 22.6 |
2024-04-13 | IN009010100 | BANGALORE, IN | 12.967 | 77.583 | 921.0 | NaN | 29.3 | 34.6 | 22.5 |
2024-04-14 | IN009010100 | BANGALORE, IN | 12.967 | 77.583 | 921.0 | NaN | 29.6 | 35.4 | 22.3 |
2024-04-15 | IN009010100 | BANGALORE, IN | 12.967 | 77.583 | 921.0 | NaN | 28.9 | 35.4 | 22.0 |
43275 rows × 9 columns
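To see what `index_col='DATE'` and `parse_dates=True` do without calling the API, the same `pd.read_csv` options can be tried on a tiny in-memory sample. The CSV text below is a simplified, made-up stand-in for the NCEI response (only a few columns, two rows taken from the table above), so treat it as a sketch rather than the real file layout.

```python
import io

import pandas as pd

# Simplified stand-in for the downloaded CSV (assumed layout, two rows only)
sample_csv = io.StringIO(
    "STATION,DATE,PRCP,TAVG\n"
    "IN009010100,1901-01-01,0.0,\n"
    "IN009010100,2024-04-15,,28.9\n"
)

sample_df = pd.read_csv(sample_csv, index_col="DATE", parse_dates=True)

# The DATE column becomes a DatetimeIndex, and empty fields become NaN
print(type(sample_df.index))
print(sample_df.isna().sum())
```

Counting missing values per column this way is a quick check before plotting, since the early records have no temperature and the latest ones have no precipitation.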
# Check that the data was imported into a pandas DataFrame
type(bgl_ind_df)
pandas.core.frame.DataFrame
bgl_ind_df = bgl_ind_df[['PRCP', 'TAVG','TMAX','TMIN']]
bgl_ind_df
DATE | PRCP | TAVG | TMAX | TMIN |
---|---|---|---|---|
1901-01-01 | 0.0 | NaN | NaN | NaN |
1901-01-02 | 0.0 | NaN | NaN | NaN |
1901-01-03 | 0.0 | NaN | NaN | NaN |
1901-01-04 | 0.0 | NaN | NaN | NaN |
1901-01-05 | 0.0 | NaN | NaN | NaN |
... | ... | ... | ... | ... |
2024-04-11 | NaN | 29.0 | 36.0 | 21.6 |
2024-04-12 | NaN | 28.9 | 35.8 | 22.6 |
2024-04-13 | NaN | 29.3 | 34.6 | 22.5 |
2024-04-14 | NaN | 29.6 | 35.4 | 22.3 |
2024-04-15 | NaN | 28.9 | 35.4 | 22.0 |
43275 rows × 4 columns
Plotting the precipitation column (PRCP) and temperature columns (TAVG, TMAX, TMIN) vs time
bgl_ind_df.plot()
<Axes: xlabel='DATE'>
Plotting the precipitation column (PRCP) vs time
bgl_ind_df.plot(
y='PRCP',
title='Bengaluru-Precipitation',
xlabel='Date ',
ylabel='Precipitation in (mm)')
<Axes: title={'center': 'Bengaluru-Precipitation'}, xlabel='Date ', ylabel='Precipitation in (mm)'>
Plotting Average Temperature vs Time
# Plot the temperature vs time
bgl_ind_df.plot(
y='TAVG',
title='Bengaluru Average Temperature',
xlabel='Date ',
ylabel='Temperature in (Degree Celsius)'
)
<Axes: title={'center': 'Bengaluru Average Temperature'}, xlabel='Date ', ylabel='Temperature in (Degree Celsius)'>
Converting TAVG temperature values from Celsius to Fahrenheit
# Convert from Celsius to Fahrenheit
# Considering the average temperature (TAVG)
bgl_ind_df.loc[:,'TFah'] = (bgl_ind_df.loc[:,'TAVG'] * (9/5) ) + 32
bgl_ind_df
/tmp/ipykernel_898/1808574486.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  bgl_ind_df.loc[:,'TFah'] = (bgl_ind_df.loc[:,'TAVG'] * (9/5) ) + 32
DATE | PRCP | TAVG | TMAX | TMIN | TFah |
---|---|---|---|---|---|
1901-01-01 | 0.0 | NaN | NaN | NaN | NaN |
1901-01-02 | 0.0 | NaN | NaN | NaN | NaN |
1901-01-03 | 0.0 | NaN | NaN | NaN | NaN |
1901-01-04 | 0.0 | NaN | NaN | NaN | NaN |
1901-01-05 | 0.0 | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... |
2024-04-11 | NaN | 29.0 | 36.0 | 21.6 | 84.20 |
2024-04-12 | NaN | 28.9 | 35.8 | 22.6 | 84.02 |
2024-04-13 | NaN | 29.3 | 34.6 | 22.5 | 84.74 |
2024-04-14 | NaN | 29.6 | 35.4 | 22.3 | 85.28 |
2024-04-15 | NaN | 28.9 | 35.4 | 22.0 | 84.02 |
43275 rows × 5 columns
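The SettingWithCopyWarning appears because the DataFrame was created by selecting columns from a larger DataFrame, so pandas cannot tell whether the new column is meant for the selection or for the original. The values are still computed correctly here. One common way to avoid the warning is to take an explicit copy when subsetting. The sketch below uses a small made-up DataFrame (`full_df` and `subset_df` are illustrative names, not variables from this notebook).

```python
import pandas as pd

# Small made-up stand-in for the downloaded data
full_df = pd.DataFrame(
    {
        "PRCP": [0.0, float("nan")],
        "TAVG": [20.0, 29.0],
        "ELEVATION": [921.0, 921.0],
    },
    index=pd.to_datetime(["1980-01-01", "2024-04-11"]),
)

# .copy() makes the selection an independent DataFrame,
# so adding a column to it no longer triggers the warning
subset_df = full_df[["PRCP", "TAVG"]].copy()
subset_df["TFah"] = subset_df["TAVG"] * 9 / 5 + 32

print(subset_df)
```

The original DataFrame is left untouched, which also makes it safe to re-run the cell.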
Writing a Python function to convert temperature values from Celsius to Fahrenheit
def convert_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return (9/5 * celsius) + 32
bgl_ind_df['fahrenheit_column'] = bgl_ind_df['TAVG'].apply(convert_to_fahrenheit)
bgl_ind_df
/tmp/ipykernel_898/2487292740.py:5: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  bgl_ind_df['fahrenheit_column'] = bgl_ind_df['TAVG'].apply(convert_to_fahrenheit)
DATE | PRCP | TAVG | TMAX | TMIN | TFah | fahrenheit_column |
---|---|---|---|---|---|---|
1901-01-01 | 0.0 | NaN | NaN | NaN | NaN | NaN |
1901-01-02 | 0.0 | NaN | NaN | NaN | NaN | NaN |
1901-01-03 | 0.0 | NaN | NaN | NaN | NaN | NaN |
1901-01-04 | 0.0 | NaN | NaN | NaN | NaN | NaN |
1901-01-05 | 0.0 | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... |
2024-04-11 | NaN | 29.0 | 36.0 | 21.6 | 84.20 | 84.20 |
2024-04-12 | NaN | 28.9 | 35.8 | 22.6 | 84.02 | 84.02 |
2024-04-13 | NaN | 29.3 | 34.6 | 22.5 | 84.74 | 84.74 |
2024-04-14 | NaN | 29.6 | 35.4 | 22.3 | 85.28 | 85.28 |
2024-04-15 | NaN | 28.9 | 35.4 | 22.0 | 84.02 | 84.02 |
43275 rows × 6 columns
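The two Fahrenheit columns are identical because a function built only from arithmetic works on a whole pandas Series at once, so `.apply` is optional. A minimal sketch with made-up values (`celsius_to_fahrenheit` and `tavg_c` are illustrative names, not the ones used in this notebook):

```python
import pandas as pd


def celsius_to_fahrenheit(celsius):
    """Convert Celsius to Fahrenheit; accepts a number or a pandas Series."""
    return celsius * 9 / 5 + 32


# A few made-up average temperatures, including a missing value
tavg_c = pd.Series([20.0, 29.0, float("nan")], name="TAVG")

via_apply = tavg_c.apply(celsius_to_fahrenheit)   # element by element
vectorized = celsius_to_fahrenheit(tavg_c)        # whole Series at once

print(vectorized)
```

Both approaches give the same result, and missing values stay missing. On long records like this one, the vectorized call is usually the faster of the two.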
Subsetting and Resampling the data (1980-2024)
# Subset the data to look at 1980-2024
bgl_ind_1980_2024_df = bgl_ind_df.loc['1980-01-01':'2024-04']
bgl_ind_1980_2024_df
DATE | PRCP | TAVG | TMAX | TMIN | TFah | fahrenheit_column |
---|---|---|---|---|---|---|
1980-01-01 | 0.0 | 20.0 | 30.0 | 14.0 | 68.00 | 68.00 |
1980-01-02 | NaN | 21.0 | 28.0 | NaN | 69.80 | 69.80 |
1980-01-03 | 0.0 | 21.4 | 28.0 | 15.0 | 70.52 | 70.52 |
1980-01-04 | 0.0 | 20.7 | 28.0 | 15.0 | 69.26 | 69.26 |
1980-01-05 | 0.0 | 20.3 | 27.0 | 15.0 | 68.54 | 68.54 |
... | ... | ... | ... | ... | ... | ... |
2024-04-11 | NaN | 29.0 | 36.0 | 21.6 | 84.20 | 84.20 |
2024-04-12 | NaN | 28.9 | 35.8 | 22.6 | 84.02 | 84.02 |
2024-04-13 | NaN | 29.3 | 34.6 | 22.5 | 84.74 | 84.74 |
2024-04-14 | NaN | 29.6 | 35.4 | 22.3 | 85.28 | 85.28 |
2024-04-15 | NaN | 28.9 | 35.4 | 22.0 | 84.02 | 84.02 |
16036 rows × 6 columns
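The slice above relies on pandas partial string indexing: on a DatetimeIndex, both ends of a `.loc` slice are included, and a month string such as '2024-04' covers the whole month. A small sketch on made-up daily data (`daily_df` and `subset` are illustrative names):

```python
import pandas as pd

# Made-up daily data that extends beyond both ends of the slice
dates = pd.date_range("1979-12-30", "2024-05-02", freq="D")
daily_df = pd.DataFrame({"TAVG": 25.0}, index=dates)

# Both endpoints are inclusive; '2024-04' means up to the end of April 2024
subset = daily_df.loc["1980-01-01":"2024-04"]

print(subset.index.min(), subset.index.max())
```

In this notebook the downloaded record stops on 2024-04-15, so the subset ends there rather than on 30 April.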
# Resample the data to yearly mean values, with each year starting on 1 May
bgl_ind_yearly_mean = bgl_ind_1980_2024_df.resample('YS-MAY').mean()
bgl_ind_yearly_mean
DATE | PRCP | TAVG | TMAX | TMIN | TFah | fahrenheit_column |
---|---|---|---|---|---|---|
1979-05-01 | 2.986792 | 24.210744 | 31.276190 | 18.041667 | 75.579339 | 75.579339 |
1980-05-01 | 2.149355 | 23.731680 | 29.747604 | 18.833333 | 74.717025 | 74.717025 |
1981-05-01 | 1.909635 | 23.357300 | 29.555340 | 18.880427 | 74.043140 | 74.043140 |
1982-05-01 | 1.599344 | 24.100284 | 30.163729 | 19.156641 | 75.380511 | 75.380511 |
1983-05-01 | 3.018153 | 23.490808 | 29.018000 | 19.153759 | 74.283454 | 74.283454 |
1984-05-01 | 1.368301 | 23.718733 | 29.847516 | 18.776254 | 74.693719 | 74.693719 |
1985-05-01 | 1.292208 | 23.377500 | 29.532381 | 18.859364 | 74.079500 | 74.079500 |
1986-05-01 | 4.331392 | 23.689779 | 29.506462 | 18.811847 | 74.641602 | 74.641602 |
1987-05-01 | 2.274760 | 23.998892 | 29.813770 | 19.321453 | 75.198006 | 75.198006 |
1988-05-01 | 2.764169 | 23.545326 | 29.605694 | 18.791078 | 74.381586 | 74.381586 |
1989-05-01 | 2.824437 | 23.769337 | 29.649836 | 18.896377 | 74.784807 | 74.784807 |
1990-05-01 | 1.676897 | 23.751003 | 29.561842 | 19.209025 | 74.751805 | 74.751805 |
1991-05-01 | 3.917940 | 23.228169 | 29.040717 | 18.465414 | 73.810704 | 73.810704 |
1992-05-01 | 2.066238 | 23.389972 | 29.409627 | 18.600000 | 74.101950 | 74.101950 |
1993-05-01 | 2.938801 | 23.650139 | 29.374121 | 18.926736 | 74.570249 | 74.570249 |
1994-05-01 | 2.274154 | 23.470914 | 29.149045 | 19.011034 | 74.247645 | 74.247645 |
1995-05-01 | 3.023279 | 23.840997 | 29.788000 | 18.920847 | 74.913795 | 74.913795 |
1996-05-01 | 2.972277 | 23.367787 | 29.450156 | 19.009894 | 74.062017 | 74.062017 |
1997-05-01 | 3.340127 | 24.263085 | 30.514241 | 20.082332 | 75.673554 | 75.673554 |
1998-05-01 | 5.394643 | 23.964463 | 29.878466 | 19.394444 | 75.136033 | 75.136033 |
1999-05-01 | 3.841799 | 23.465193 | 29.458333 | 19.131987 | 74.237348 | 74.237348 |
2000-05-01 | 8.882812 | 23.564110 | 29.577746 | 19.056579 | 74.415397 | 74.415397 |
2001-05-01 | 5.906061 | 23.779178 | 29.766181 | 19.242628 | 74.802521 | 74.802521 |
2002-05-01 | 2.144444 | 24.133425 | 30.470605 | 19.430968 | 75.440166 | 75.440166 |
2003-05-01 | 2.027900 | 24.130411 | 30.660299 | 19.349032 | 75.434740 | 75.434740 |
2004-05-01 | 3.343614 | 23.390137 | 29.420950 | 18.975321 | 74.102247 | 74.102247 |
2005-05-01 | 8.056771 | 23.522466 | 29.568555 | 19.049684 | 74.340438 | 74.340438 |
2006-05-01 | 5.482569 | 23.938904 | 30.100281 | 19.279755 | 75.090027 | 75.090027 |
2007-05-01 | 8.857862 | 23.570799 | 29.414571 | 19.208091 | 74.427438 | 74.427438 |
2008-05-01 | 7.975172 | 23.784110 | 29.924011 | 19.114634 | 74.811397 | 74.811397 |
2009-05-01 | 7.068085 | 24.141758 | 29.923034 | 19.673209 | 75.455165 | 75.455165 |
2010-05-01 | 6.247368 | 23.710685 | 29.531476 | 19.455937 | 74.679233 | 74.679233 |
2011-05-01 | 5.211429 | 23.797507 | 30.409722 | 19.309091 | 74.835512 | 74.835512 |
2012-05-01 | 5.314286 | 24.285753 | 30.968956 | 19.567492 | 75.714356 | 75.714356 |
2013-05-01 | 6.118343 | 23.774521 | 30.210137 | 19.502572 | 74.794137 | 74.794137 |
2014-05-01 | 7.795122 | 23.889863 | 30.270959 | 19.380328 | 75.001753 | 75.001753 |
2015-05-01 | 5.883951 | 24.392623 | 30.613960 | 20.054517 | 75.906721 | 75.906721 |
2016-05-01 | 4.855782 | 24.260548 | 30.901290 | 19.514590 | 75.668986 | 75.668986 |
2017-05-01 | 9.550000 | 24.094521 | 30.370890 | 19.572188 | 75.370137 | 75.370137 |
2018-05-01 | 5.368493 | 24.214521 | 30.422368 | 19.727692 | 75.586137 | 75.586137 |
2019-05-01 | 4.847368 | 24.361475 | 30.606977 | 20.058576 | 75.850656 | 75.850656 |
2020-05-01 | 6.039785 | 23.961219 | 30.132168 | 19.729043 | 75.130194 | 75.130194 |
2021-05-01 | 7.696111 | 23.917534 | 29.929293 | 19.930868 | 75.051562 | 75.051562 |
2022-05-01 | 9.674302 | 23.647527 | 29.496497 | 19.271154 | 74.565549 | 74.565549 |
2023-05-01 | 6.725532 | 24.627616 | 30.688060 | 20.342424 | 76.329709 | 76.329709 |
Plot your resampled data 📈
# Plot mean annual temperature values
# using hvplot
import hvplot.pandas
(bgl_ind_1980_2024_df.hvplot(
y='TAVG',
title='Bengaluru Mean Annual Temperature',
xlabel='DATE',
ylabel='Temperature in (C)')
+
bgl_ind_yearly_mean.hvplot(
y='TAVG',
title='Bengaluru Mean Annual Temperature',
xlabel='DATE',
ylabel='Temperature in (C)',shared_axes=False)).cols(1)
# Plot the annual mean of the daily maximum temperature (TMAX)
# using hvplot
import hvplot.pandas
(bgl_ind_1980_2024_df.hvplot(
y='TMAX',
title='Bengaluru Annual Maximum Temperature',
xlabel='DATE',
ylabel='Temperature in (C)')
+
bgl_ind_yearly_mean.hvplot(
y='TMAX',
title='Bengaluru Annual Maximum Temperature',
xlabel='DATE',
ylabel='Temperature in (C)',shared_axes=False)).cols(1)
Description of the Plot 📈
As seen from the graph, the annual mean of the daily maximum temperature (TMAX) at Bangalore generally hovers around 30 degrees Celsius. It rose to about 31 degrees Celsius in 2012 and 2016. Overall, from 1980 to 2023, it stays within the range of 29 to 31 degrees Celsius.
Temperature rising!! Bangalore
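As a rough check on this headline, the first and last five yearly TAVG means from the resampled table can be compared. The values below are copied from that table (years labelled by their 1 May start). This is only a crude comparison under that assumption, not a formal trend test.

```python
from statistics import mean

# Yearly mean TAVG in degrees Celsius, copied from the resampled table
first_five = {
    1980: 23.731680, 1981: 23.357300, 1982: 24.100284,
    1983: 23.490808, 1984: 23.718733,
}
last_five = {
    2019: 24.361475, 2020: 23.961219, 2021: 23.917534,
    2022: 23.647527, 2023: 24.627616,
}

early_mean = mean(first_five.values())
late_mean = mean(last_five.values())
change = late_mean - early_mean

print(f"1980-1984: {early_mean:.2f} C, 2019-2023: {late_mean:.2f} C, change: {change:+.2f} C")
```

The later period comes out roughly 0.4 degrees Celsius warmer on average, although individual years vary by a similar amount.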
%%capture
%%bash
# Converting to HTML
jupyter nbconvert Bangalore_TimeSeries.ipynb --to html