Home Uncategorized ASSESSMENT 4

Uncategorized

ASSESSMENT 4

In this Assignment, you will use Python Pandas. The assignment is designed in sections, with each section having examples. You will run the code that is provided and record the results.

Select Data from Pandas Dataframes

Indexing and Selections From Pandas Dataframes

There are two kinds of indexing in pandas dataframes: location-based and label-based.

In the lesson introducing pandas dataframes, you learned that these data structures have an inherent tabular structure (i.e. rows and columns with header names) that support selecting data with indexing, such as selecting individual cells identified by their location at the intersection of rows and columns.

You can also select data from pandas dataframes without knowing the location of that data within the pandas dataframe, using specific labels such as a column name.

Location-based Indexing

After working with indexing for Python lists and numpy arrays, you are familiar with location-based indexing. You already know that Python location-based indexing begins with [0], and you have learned how to use location-based indexing to query data within Python lists or numpy arrays.

You can use location-based indexing to query pandas dataframes using the attribute .iloc and providing the row and column selection as ranges (i.e. start and stop locations along the rows and columns).

Just like with numpy arrays, the range provided is inclusive of the first value, but not the second value.

This means that you need to use the range [0:1] to select the first index, so your selection begins at [0] but does not include [1] (the second index).

For example, you can select the first row and the first column of a pandas dataframes providing the range [0:1] for the row selection and then providing the range [0:1] for the column selection.

dataframe.iloc[0:1, 0:1]

Label-based Indexing

Pandas dataframes can also be queried using label-based indexing.

This feature of pandas dataframes is very useful because you can create an index for pandas dataframes using a specific column (i.e. label) that you want to use for organizing and querying your data.

For example, you can create an index from a specific column of values, and then use the attribute .loc to select data from the pandas dataframes using a value that is found in that index.

dataframe.set_index(“column”)

dataframe.loc[[value]]

Filtering Data Values

In addition to using indexing, you can also select or filter data from pandas dataframes by querying for values that met a certain criteria.

For example, you can select data in a pandas dataframe based on specific values within a column using:

dataframe[

dataframe[“column”]

== value]

This will return all rows containing that value within the specified column.

On this page, you will review how indexing works for pandas dataframes and you will learn how to select data from pandas dataframes using both indexing and filtering of data values.

Import Python Packages and Get Data

Begin by importing the necessary Python packages and then downloading and importing data into pandas dataframes.

You can use the earthpy package to download the data files, os to set the working directory, and pandas to import data files into pandas dataframes.

# Import necessary packages

import os

import pandas as pd

import earthpy as et

# URL for .csv with avg monthly precip data

avg_monthly_precip

_url = “

https://ndownloader.figshare.com/files/12710618

“

# Download file

et.data.get_data(url=avg_monthly_precip_url)

Downloading from
https://ndownloader.figshare.com/files/12710618

‘/root/earth-analytics/data/earthpy-downloads/avg-precip-months-seasons.csv’

# Set working directory to earth-analytics

os.chdir(os.path.join(et.io.HOME, “earth-analytics”))

# Import data from .csv file

fname = os.path.join(“data”, “earthpy-downloads”,

“avg-precip-months-seasons.csv”)

avg_monthly_precip = pd.read_csv(fname)

avg_monthly_precip

Select Data Using Location Index (.iloc)

You can use .iloc to select individual rows and columns or a series of rows and columns by providing the range (i.e. start and stop locations along the rows and columns) that you want to select.

# Select first row and first column

avg_monthly_precip.iloc[0:1, 0:1]

# Select first two rows and first column

avg_monthly_precip.iloc[0:2, 0:1]

# Select first row and first two columns

avg_monthly_precip.iloc[0:1, 0:2]

# Select first row with all columns

avg_monthly_precip.iloc[0:1, :]

# Select first column with all rows

avg_monthly_precip.iloc[:, 0:1]

Select Data Using Label Index (.loc)

In addition to selecting data based on location, you can also select data based on labels.

To do this, you first create a new index using a column of values that you want to use for organizing and querying your data.

For example, you can create an index from a specific column of values using:

dataframe.set_index(“column”)

# Create new dataframe with `months` as index

avg_monthly_precip_index

= avg_monthly_precip.set_index(“months”)

avg_monthly_precip_index

Test that months no longer functions as a column by attempting to select that column name:

avg_monthly_precip_index[[“months”]]

The following error is returned:

KeyError: “None of [Index([‘months’], dtype=’object’)] are in the [columns]”

Notice that the error message indicates that the value months is not in the index. This is because months is actually now the index!

After setting an index, you can use .loc to select data from the pandas dataframe using a value that is found in that index.

When selecting text string values, you need to specify the text string with quotations “”, as shown below for the text string “Aug”.

# Select Aug using months index

avg_monthly_precip_index.loc[[“Aug”]]

Select Data Using Columns

In addition to location-based and label-based indexing, you can also select data from pandas dataframes by selecting entire columns using the column names.

For example, you can select all data from a specific column in a pandas dataframe using:

dataframe[“column”]

which provides the data from the column as a pandas series, which is a one-dimensional array. A pandas series is useful for selecting columns for plotting using matplotlib.

# Select the `months` column as series

avg_monthly_precip[“months”]

0 Jan

1 Feb

2 Mar

3 Apr

4 May

5 June

6 July

7 Aug

8 Sept

9 Oct

10 Nov

11 Dec

Name: months, dtype: object

You can also specify that you want an output that is also a pandas dataframe using:

dataframe[[“column”]]

which includes a second set of brackets [] to indicate that the output should be a pandas dataframe.

# Select the `months` column as dataframe

avg_monthly_precip[[“months”]]

Notice that your results are now a pandas dataframe.

You can also select all data from multiple columns in a pandas dataframe using:

dataframe[[“column”, “column”]]

Since the results of your selection are also a pandas dataframe, you can assign the results to a new pandas dataframe.

For example, you can create a new pandas dataframe that only contains the months and seasons columns, effectively dropping the precip values.

# Save months and seasons to new dataframe

avg_monthly_precip_text

= avg_monthly_precip[[‘months’, ‘seasons’]]

avg_monthly_precip_text

Filter Data Using Specific Values

In addition to location-based and label-based indexing, you can select or filter data based on specific values within a column using:

dataframe[dataframe[“column”] == value]

This will return all rows containing that value within the specified column.

If you are selecting data using a text string column, you need to provide the value within parentheses (e.g. “text”).

For example, you can select all rows that have a seasons value of Summer.

# Select rows with Summer in seasons

avg_monthly_precip[avg_monthly_precip[“seasons”] == “Summer”]

Again, you can also save the output to a new dataframe by setting it equal to the output of the filter.

For example, you can filter for the values in the months column that are equal to Jan and save the output to a new dataframe.

# Select rows with Jan in months

jan_avg_precip

= avg_monthly_precip[avg_monthly_precip[“months”] == “Jan”]

jan_avg_precip

You can also select data based on numeric values. Note that these selections on numeric values do not require the use of quotations “” because they are not text strings.

For example, you can select all rows that have a specific value in precip such as 1.62.

# Select rows equal to 1.62 in precip

avg_monthly_precip[avg_monthly_precip[“precip”] == 1.62]

You can also filter using a comparison operator on numeric values.

For example, you can select all rows from the dataframe that have precipitation value greater than 2.0 inches by filtering on the precip column using the greater than > operator.

# Save rows with values greater than 2.0 to new dataframe

gt2_avg_monthly_precip

= avg_monthly_precip[avg_monthly_precip[“precip”] > 2.0]

gt2_avg_monthly_precip

Turn in your highest-quality paper
Get a qualified writer to help you with

“ ASSESSMENT 4 ”

Get high-quality paper

NEW! AI matching with writer

Hire a Writer

Client Reviews

4.9

Sitejabber

4.6

Trustpilot

4.8

Our Guarantees

100% Confidentiality

Information about customers is confidential and never disclosed to third parties.

Original Writing

We complete all papers from scratch. You can get a plagiarism report.

Timely Delivery

No missed deadlines – 97% of assignments are completed in time.

Money Back

If you're confident that a writer didn't follow your order details, ask for a refund.

New to Your Trusted Assignment Help Service? Sign up & Save

Calculate the price of your order

Type of paper needed:

Pages:

You will get a personal manager and a discount.

Academic level:

We'll send you the first draft for approval by at

Total price:

$0.00

Power up Your Academic Success with the
Team of Professionals. We’ve Got Your Back.

Power up Your Study Success with Experts We’ve Got Your Back.

Order Now Order Now