Get started with open reproducible science! (API version) 🌏 📈


Define Open Reproducible Science: Open reproducible science makes the process of scientific research available to everyone, so that anyone can access, understand, and build on it. Methods, data, and outcomes can then be shared and worked on collaboratively and transparently. Put simply, it is a way of doing research that others can access, understand, and replicate without restriction: the entire process is transparent and the results are freely available.

Choose one of the open source tools that you have learned about (i.e. Shell, Git/GitHub, Jupyter Notebook, Python) and explain how it supports open reproducible science: GitHub provides a platform where we can share our code, collaborate on it, and discuss issues about it. This course gave me hands-on experience with Python code applied to the geospatial domain, delivered through Git, which keeps the work shareable, replicable, and freely available.

Does this Jupyter Notebook file have a machine-readable name? Explain your answer.
Yes, after renaming. The notebook was originally named Reproducible Science! with the extension .ipynb. To keep a file name easy to read and parse, words should be separated only by dashes or underscores, and other special characters should be avoided. The file has therefore been renamed to Bangalore_TimeSeries.ipynb, which drops the special character '!' and uses an underscore between the words.
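
The naming advice above can be sketched in a few lines of Python. The helper name to_machine_readable is hypothetical, and the rule it applies (keep letters, digits, dashes, and underscores; turn every run of other characters into one underscore) is just one reasonable reading of that advice, not a standard.

```python
import re
from pathlib import Path


def to_machine_readable(filename: str) -> str:
    """Return a filename whose stem uses only letters, digits, '-' and '_'."""
    path = Path(filename)
    # Replace every run of other characters (spaces, '!', ...) with one underscore.
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", path.stem)
    # Drop underscores left dangling at either end, then restore the extension.
    return stem.strip("_") + path.suffix


print(to_machine_readable("Reproducible Science!.ipynb"))
# prints: Reproducible_Science.ipynb
```

A name that already follows the rule, such as Bangalore_TimeSeries.ipynb, passes through unchanged.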

Suggestions for creating readable, well-documented scientific workflows that are easier to reproduce

I can write clean code by:
a) Following the standard syntax
b) Creating meaningful variable and function names
c) Using modular programming
d) Proper documentation

Advantages of clean code include:
a) Easy to find logical errors in dev environment
b) Easy to maintain the code in production environment
c) Easily readable and understandable
d) Makes the code reusable
e) Helps in code fixes and further enhancements

Assignment: Climate Coding Challenge


Study Area: Bengaluru, India
Bengaluru, also known as Bangalore, lies in the southeast of the South Indian state of Karnataka at an elevation of 921 m and covers an area of 741 square km. It has a tropical savanna climate with distinct wet and dry seasons. Because of its high elevation, Bangalore has a moderate climate throughout the year, although occasional heat waves can make summers hotter.

I am using data from NOAA NCEI Climate Data Online (station ID: GHCND:IN009010100).

Data: Precipitation and Temperature Dataset from 1901-2024

API used : National Centers for Environmental Information (NCEI), NOAA website
url: https://www.ncei.noaa.gov/

In [1]:
import pandas as pd 
In [2]:
# Request URL, split at parameter boundaries for readability
bng_ind_url = (
    'https://www.ncei.noaa.gov/access/services/data/v1'
    '?dataset=daily-summaries'
    '&dataTypes=TAVG,TMAX,TMIN,PRCP'
    '&stations=IN009010100'
    '&startDate=1901-01-01&endDate=2024-04-15'
    '&includeStationName=true'
    '&includeStationLocation=1'
    '&units=metric')
bng_ind_url
Out[2]:
'https://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=TAVG,TMAX,TMIN,PRCP&stations=IN009010100&startDate=1901-01-01&endDate=2024-04-15&includeStationName=true&includeStationLocation=1&units=metric'
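
As an alternative to writing the query string by hand, the same URL can be assembled from a dictionary of parameters with the standard library. This is only a sketch: the names NCEI_BASE_URL, params, and bng_ind_url_from_params are introduced here for illustration, and the parameter values are copied from the URL above.

```python
from urllib.parse import urlencode

NCEI_BASE_URL = "https://www.ncei.noaa.gov/access/services/data/v1"

# Query parameters copied from the URL above, in the same order.
params = {
    "dataset": "daily-summaries",
    "dataTypes": "TAVG,TMAX,TMIN,PRCP",
    "stations": "IN009010100",
    "startDate": "1901-01-01",
    "endDate": "2024-04-15",
    "includeStationName": "true",
    "includeStationLocation": "1",
    "units": "metric",
}

# safe="," keeps the commas in dataTypes instead of encoding them as %2C.
bng_ind_url_from_params = f"{NCEI_BASE_URL}?{urlencode(params, safe=',')}"
print(bng_ind_url_from_params)
```

Keeping the parameters in a dictionary makes it easier to change the station or the date range without retyping the whole string.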
In [4]:
bgl_ind_df = pd.read_csv(
    bng_ind_url,
    index_col='DATE',
    parse_dates=True,
    na_values=['NaN'])
bgl_ind_df
Out[4]:
STATION NAME LATITUDE LONGITUDE ELEVATION PRCP TAVG TMAX TMIN
DATE
1901-01-01 IN009010100 BANGALORE, IN 12.967 77.583 921.0 0.0 NaN NaN NaN
1901-01-02 IN009010100 BANGALORE, IN 12.967 77.583 921.0 0.0 NaN NaN NaN
1901-01-03 IN009010100 BANGALORE, IN 12.967 77.583 921.0 0.0 NaN NaN NaN
1901-01-04 IN009010100 BANGALORE, IN 12.967 77.583 921.0 0.0 NaN NaN NaN
1901-01-05 IN009010100 BANGALORE, IN 12.967 77.583 921.0 0.0 NaN NaN NaN
... ... ... ... ... ... ... ... ... ...
2024-04-11 IN009010100 BANGALORE, IN 12.967 77.583 921.0 NaN 29.0 36.0 21.6
2024-04-12 IN009010100 BANGALORE, IN 12.967 77.583 921.0 NaN 28.9 35.8 22.6
2024-04-13 IN009010100 BANGALORE, IN 12.967 77.583 921.0 NaN 29.3 34.6 22.5
2024-04-14 IN009010100 BANGALORE, IN 12.967 77.583 921.0 NaN 29.6 35.4 22.3
2024-04-15 IN009010100 BANGALORE, IN 12.967 77.583 921.0 NaN 28.9 35.4 22.0

43275 rows × 9 columns

In [5]:
# Check that the data was imported into a pandas DataFrame
type(bgl_ind_df)
Out[5]:
pandas.core.frame.DataFrame
In [6]:
bgl_ind_df = bgl_ind_df[['PRCP', 'TAVG','TMAX','TMIN']]
bgl_ind_df
Out[6]:
PRCP TAVG TMAX TMIN
DATE
1901-01-01 0.0 NaN NaN NaN
1901-01-02 0.0 NaN NaN NaN
1901-01-03 0.0 NaN NaN NaN
1901-01-04 0.0 NaN NaN NaN
1901-01-05 0.0 NaN NaN NaN
... ... ... ... ...
2024-04-11 NaN 29.0 36.0 21.6
2024-04-12 NaN 28.9 35.8 22.6
2024-04-13 NaN 29.3 34.6 22.5
2024-04-14 NaN 29.6 35.4 22.3
2024-04-15 NaN 28.9 35.4 22.0

43275 rows × 4 columns
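
The output above shows that precipitation is present in 1901 while temperature is missing, and the reverse in April 2024. A quick way to quantify such gaps is to count missing values and find the first valid date per column. The sketch below does this on a tiny illustrative frame built from two of the rows shown above; sample_df, missing_counts, and first_valid are names introduced here, and in the notebook the same calls would be applied to bgl_ind_df.

```python
import numpy as np
import pandas as pd

# Illustrative frame: two rows copied from the output above,
# one from each end of the record.
sample_df = pd.DataFrame(
    {
        "PRCP": [0.0, np.nan],
        "TAVG": [np.nan, 29.0],
        "TMAX": [np.nan, 36.0],
        "TMIN": [np.nan, 21.6],
    },
    index=pd.to_datetime(["1901-01-01", "2024-04-11"]),
)
sample_df.index.name = "DATE"

# Number of missing values in each column.
missing_counts = sample_df.isna().sum()

# First date on which each column holds a non-missing value.
first_valid = {
    name: sample_df[name].first_valid_index() for name in sample_df.columns
}

print(missing_counts)
print(first_valid)
```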

Plotting the precipitation column (PRCP) and the temperature columns (TAVG, TMAX, TMIN) vs time

In [7]:
bgl_ind_df.plot()
Out[7]:
<Axes: xlabel='DATE'>
[Figure: line plot of PRCP, TAVG, TMAX, and TMIN against DATE]

Plotting the precipitation column (PRCP) vs time

In [8]:
bgl_ind_df.plot(
    y='PRCP',
    title='Bengaluru-Precipitation',
    xlabel='Date',
    ylabel='Precipitation (mm)')
Out[8]:
<Axes: title={'center': 'Bengaluru-Precipitation'}, xlabel='Date', ylabel='Precipitation (mm)'>
[Figure: "Bengaluru-Precipitation", precipitation against date]

Plotting Average Temperature vs Time

In [9]:
# Plot the average temperature vs time
bgl_ind_df.plot(
    y='TAVG',
    title='Bengaluru Average Temperature',
    xlabel='Date',
    ylabel='Temperature (degrees Celsius)'
)
Out[9]:
<Axes: title={'center': 'Bengaluru Average Temperature'}, xlabel='Date', ylabel='Temperature (degrees Celsius)'>
[Figure: "Bengaluru Average Temperature", average temperature against date]

Converting TAVG temperature values from Celsius to Fahrenheit

In [10]:
# Convert from Celsius to Fahrenheit
# using the average temperature (TAVG)
bgl_ind_df.loc[:,'TFah'] = (bgl_ind_df.loc[:,'TAVG'] * (9/5) ) + 32 
bgl_ind_df
/tmp/ipykernel_898/1808574486.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  bgl_ind_df.loc[:,'TFah'] = (bgl_ind_df.loc[:,'TAVG'] * (9/5) ) + 32
Out[10]:
PRCP TAVG TMAX TMIN TFah
DATE
1901-01-01 0.0 NaN NaN NaN NaN
1901-01-02 0.0 NaN NaN NaN NaN
1901-01-03 0.0 NaN NaN NaN NaN
1901-01-04 0.0 NaN NaN NaN NaN
1901-01-05 0.0 NaN NaN NaN NaN
... ... ... ... ... ...
2024-04-11 NaN 29.0 36.0 21.6 84.20
2024-04-12 NaN 28.9 35.8 22.6 84.02
2024-04-13 NaN 29.3 34.6 22.5 84.74
2024-04-14 NaN 29.6 35.4 22.3 85.28
2024-04-15 NaN 28.9 35.4 22.0 84.02

43275 rows × 5 columns
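
The SettingWithCopyWarning printed above is typically raised when a column is assigned on a DataFrame that was itself produced by selecting from another DataFrame, as bgl_ind_df was in the column-selection cell. A common remedy, sketched here on a tiny illustrative frame, is to take an explicit .copy() of the selection before adding columns. The names full_df and subset_df are hypothetical and not part of the notebook, and that the selection is the cause here is an inference from the earlier cell.

```python
import numpy as np
import pandas as pd

# Illustrative frame with values taken from the rows shown above.
full_df = pd.DataFrame(
    {
        "NAME": ["BANGALORE, IN", "BANGALORE, IN"],
        "PRCP": [0.0, np.nan],
        "TAVG": [np.nan, 29.0],
    },
    index=pd.to_datetime(["1901-01-01", "2024-04-11"]),
)

# .copy() makes the column subset an independent DataFrame,
# so adding a column to it is unambiguous and leaves full_df untouched.
subset_df = full_df[["PRCP", "TAVG"]].copy()
subset_df["TFah"] = subset_df["TAVG"] * 9 / 5 + 32

print(subset_df)
```

In the notebook this would amount to appending .copy() to the line that selects the PRCP, TAVG, TMAX, and TMIN columns.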

Writing a Python function to convert temperature values from Celsius to Fahrenheit

In [11]:
def convert_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return (9/5 * celsius) + 32

bgl_ind_df['fahrenheit_column'] = bgl_ind_df['TAVG'].apply(convert_to_fahrenheit)
bgl_ind_df
/tmp/ipykernel_898/2487292740.py:5: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  bgl_ind_df['fahrenheit_column'] = bgl_ind_df['TAVG'].apply(convert_to_fahrenheit)
Out[11]:
PRCP TAVG TMAX TMIN TFah fahrenheit_column
DATE
1901-01-01 0.0 NaN NaN NaN NaN NaN
1901-01-02 0.0 NaN NaN NaN NaN NaN
1901-01-03 0.0 NaN NaN NaN NaN NaN
1901-01-04 0.0 NaN NaN NaN NaN NaN
1901-01-05 0.0 NaN NaN NaN NaN NaN
... ... ... ... ... ... ...
2024-04-11 NaN 29.0 36.0 21.6 84.20 84.20
2024-04-12 NaN 28.9 35.8 22.6 84.02 84.02
2024-04-13 NaN 29.3 34.6 22.5 84.74 84.74
2024-04-14 NaN 29.6 35.4 22.3 85.28 85.28
2024-04-15 NaN 28.9 35.4 22.0 84.02 84.02

43275 rows × 6 columns
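
Because the conversion is plain arithmetic, the same function can also be called on the whole column at once instead of through .apply. The sketch below compares the two on three TAVG values taken from the table above; the variable names element_wise and vectorised are introduced here for illustration.

```python
import pandas as pd


def convert_to_fahrenheit(celsius):
    """Convert a temperature (scalar or pandas Series) from Celsius to Fahrenheit."""
    return (9 / 5 * celsius) + 32


# Values taken from the TAVG column shown above.
tavg = pd.Series([29.0, 28.9, 29.3], name="TAVG")

# .apply calls the function once per value.
element_wise = tavg.apply(convert_to_fahrenheit)
# Passing the Series directly does the arithmetic on the whole column in one step.
vectorised = convert_to_fahrenheit(tavg)

print(vectorised.round(2).tolist())
```

Both forms give the same numbers; on a column with tens of thousands of rows the direct call is usually the faster of the two.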

Subsetting and Resampling the data (1980-2024)

In [12]:
# Subset the data to look at 1980-2024
bgl_ind_1980_2024_df = bgl_ind_df.loc['1980-01-01':'2024-04']
bgl_ind_1980_2024_df
Out[12]:
PRCP TAVG TMAX TMIN TFah fahrenheit_column
DATE
1980-01-01 0.0 20.0 30.0 14.0 68.00 68.00
1980-01-02 NaN 21.0 28.0 NaN 69.80 69.80
1980-01-03 0.0 21.4 28.0 15.0 70.52 70.52
1980-01-04 0.0 20.7 28.0 15.0 69.26 69.26
1980-01-05 0.0 20.3 27.0 15.0 68.54 68.54
... ... ... ... ... ... ...
2024-04-11 NaN 29.0 36.0 21.6 84.20 84.20
2024-04-12 NaN 28.9 35.8 22.6 84.02 84.02
2024-04-13 NaN 29.3 34.6 22.5 84.74 84.74
2024-04-14 NaN 29.6 35.4 22.3 85.28 85.28
2024-04-15 NaN 28.9 35.4 22.0 84.02 84.02

16036 rows × 6 columns
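
The slice above ends with the month-only label '2024-04'. On a DatetimeIndex, pandas treats such a partial date string as covering the whole month, so every April 2024 row is kept. A small sketch on an illustrative index (dates, demo, and subset are names introduced here):

```python
import pandas as pd

# Illustrative daily index around the end of April 2024.
dates = pd.date_range("2024-03-30", "2024-05-02", freq="D")
demo = pd.Series(range(len(dates)), index=dates)

# A month-only end label includes every day of that month.
subset = demo.loc["2024-04-01":"2024-04"]
print(subset.index.min().date(), subset.index.max().date())
# prints: 2024-04-01 2024-04-30
```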

In [13]:
# Resample the data to yearly mean values (years starting on 1 May)
bgl_ind_yearly_mean = bgl_ind_1980_2024_df.resample('YS-MAY').mean()
bgl_ind_yearly_mean
Out[13]:
PRCP TAVG TMAX TMIN TFah fahrenheit_column
DATE
1979-05-01 2.986792 24.210744 31.276190 18.041667 75.579339 75.579339
1980-05-01 2.149355 23.731680 29.747604 18.833333 74.717025 74.717025
1981-05-01 1.909635 23.357300 29.555340 18.880427 74.043140 74.043140
1982-05-01 1.599344 24.100284 30.163729 19.156641 75.380511 75.380511
1983-05-01 3.018153 23.490808 29.018000 19.153759 74.283454 74.283454
1984-05-01 1.368301 23.718733 29.847516 18.776254 74.693719 74.693719
1985-05-01 1.292208 23.377500 29.532381 18.859364 74.079500 74.079500
1986-05-01 4.331392 23.689779 29.506462 18.811847 74.641602 74.641602
1987-05-01 2.274760 23.998892 29.813770 19.321453 75.198006 75.198006
1988-05-01 2.764169 23.545326 29.605694 18.791078 74.381586 74.381586
1989-05-01 2.824437 23.769337 29.649836 18.896377 74.784807 74.784807
1990-05-01 1.676897 23.751003 29.561842 19.209025 74.751805 74.751805
1991-05-01 3.917940 23.228169 29.040717 18.465414 73.810704 73.810704
1992-05-01 2.066238 23.389972 29.409627 18.600000 74.101950 74.101950
1993-05-01 2.938801 23.650139 29.374121 18.926736 74.570249 74.570249
1994-05-01 2.274154 23.470914 29.149045 19.011034 74.247645 74.247645
1995-05-01 3.023279 23.840997 29.788000 18.920847 74.913795 74.913795
1996-05-01 2.972277 23.367787 29.450156 19.009894 74.062017 74.062017
1997-05-01 3.340127 24.263085 30.514241 20.082332 75.673554 75.673554
1998-05-01 5.394643 23.964463 29.878466 19.394444 75.136033 75.136033
1999-05-01 3.841799 23.465193 29.458333 19.131987 74.237348 74.237348
2000-05-01 8.882812 23.564110 29.577746 19.056579 74.415397 74.415397
2001-05-01 5.906061 23.779178 29.766181 19.242628 74.802521 74.802521
2002-05-01 2.144444 24.133425 30.470605 19.430968 75.440166 75.440166
2003-05-01 2.027900 24.130411 30.660299 19.349032 75.434740 75.434740
2004-05-01 3.343614 23.390137 29.420950 18.975321 74.102247 74.102247
2005-05-01 8.056771 23.522466 29.568555 19.049684 74.340438 74.340438
2006-05-01 5.482569 23.938904 30.100281 19.279755 75.090027 75.090027
2007-05-01 8.857862 23.570799 29.414571 19.208091 74.427438 74.427438
2008-05-01 7.975172 23.784110 29.924011 19.114634 74.811397 74.811397
2009-05-01 7.068085 24.141758 29.923034 19.673209 75.455165 75.455165
2010-05-01 6.247368 23.710685 29.531476 19.455937 74.679233 74.679233
2011-05-01 5.211429 23.797507 30.409722 19.309091 74.835512 74.835512
2012-05-01 5.314286 24.285753 30.968956 19.567492 75.714356 75.714356
2013-05-01 6.118343 23.774521 30.210137 19.502572 74.794137 74.794137
2014-05-01 7.795122 23.889863 30.270959 19.380328 75.001753 75.001753
2015-05-01 5.883951 24.392623 30.613960 20.054517 75.906721 75.906721
2016-05-01 4.855782 24.260548 30.901290 19.514590 75.668986 75.668986
2017-05-01 9.550000 24.094521 30.370890 19.572188 75.370137 75.370137
2018-05-01 5.368493 24.214521 30.422368 19.727692 75.586137 75.586137
2019-05-01 4.847368 24.361475 30.606977 20.058576 75.850656 75.850656
2020-05-01 6.039785 23.961219 30.132168 19.729043 75.130194 75.130194
2021-05-01 7.696111 23.917534 29.929293 19.930868 75.051562 75.051562
2022-05-01 9.674302 23.647527 29.496497 19.271154 74.565549 74.565549
2023-05-01 6.725532 24.627616 30.688060 20.342424 76.329709 76.329709

Plot your resampled data 📈

In [14]:
# Plot daily average temperature (top) and its yearly mean (bottom)
# using hvplot
import hvplot.pandas
(bgl_ind_1980_2024_df.hvplot(
    y='TAVG',
    title='Bengaluru Daily Average Temperature',
    xlabel='DATE',
    ylabel='Temperature (°C)')
+
bgl_ind_yearly_mean.hvplot(
    y='TAVG',
    title='Bengaluru Mean Annual Temperature',
    xlabel='DATE',
    ylabel='Temperature (°C)',
    shared_axes=False)).cols(1)
Out[14]:
In [15]:
# Plot daily maximum temperature (top) and its yearly mean (bottom)
# using hvplot
import hvplot.pandas
(bgl_ind_1980_2024_df.hvplot(
    y='TMAX',
    title='Bengaluru Daily Maximum Temperature',
    xlabel='DATE',
    ylabel='Temperature (°C)')
+
bgl_ind_yearly_mean.hvplot(
    y='TMAX',
    title='Bengaluru Annual Mean of Daily Maximum Temperature',
    xlabel='DATE',
    ylabel='Temperature (°C)',
    shared_axes=False)).cols(1)
Out[15]:

Description of the Plot 📈
As seen from the yearly plot of maximum temperature, the annual mean of daily maximum temperature (TMAX) in Bangalore generally hovers around 30 degrees Celsius. It rose to about 31 degrees Celsius in the years starting May 2012 and May 2016. Overall, from 1980 to 2023, it stays within roughly 29 to 31 degrees Celsius. The annual mean of the average temperature (TAVG) is lower, at roughly 23 to 25 degrees Celsius.

Temperature rising!! Bangalore
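
One way to put a number on a headline like this is to fit a straight line through the yearly means and report its slope. The sketch below is a minimal least-squares version; the helper name trend_per_decade is hypothetical, and the series it runs on is synthetic (NOT the Bengaluru data), built with a known warming rate so the result can be checked.

```python
import numpy as np
import pandas as pd


def trend_per_decade(yearly: pd.Series) -> float:
    """Least-squares linear trend of a yearly series, in units per decade."""
    clean = yearly.dropna()
    years = clean.index.year.to_numpy(dtype=float)
    values = clean.to_numpy(dtype=float)
    # Degree-1 polynomial fit returns (slope, intercept).
    slope_per_year, _intercept = np.polyfit(years, values, deg=1)
    return slope_per_year * 10


# Synthetic series (NOT the Bengaluru data):
# 23.5 degrees C in 1980, warming by 0.02 degrees C per year.
index = pd.date_range("1980-01-01", "2023-01-01", freq="YS")
synthetic = pd.Series(23.5 + 0.02 * (index.year.to_numpy() - 1980), index=index)

print(round(trend_per_decade(synthetic), 3))
```

In the notebook, the same helper could be applied to bgl_ind_yearly_mean['TAVG'], ideally after dropping the first and last rows, which cover only part of a year.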

In [1]:
%%capture
%%bash
# Convert the notebook to HTML
jupyter nbconvert Bangalore_TimeSeries.ipynb --to html