Day 2: Star Wars and strings

Welcome to class!

Announcements

Gratitude Journal

The `.str` functions in pandas

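.str.strip: Strip whitespace (or a chosen set of characters) from both ends of each string.
.str.replace: Replace one string of characters, or a regex match, with another.
.str.split: Split each string into a list of pieces around a delimiter.
.str.join: Join the pieces of each list in a Series into one string, using a delimiter.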
Python for Data Science: Strings
Pandas Documentation

`.str.strip()`

import numpy as np
import pandas as pd

s = pd.Series(['1. Ant.  ', '2. Bee!\n', '3. Cat?\t', '4. Beat?\t', np.nan])

s.str.strip()

s.str.strip('123.!? \n\t')

s.str.strip('1234.!? \n\t')
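The difference between the two calls above comes down to which characters are in the strip set. A minimal sketch, reusing the same `s` (the variable names `whitespace_only`, `without_4`, and `with_4` are only for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series(['1. Ant.  ', '2. Bee!\n', '3. Cat?\t', '4. Beat?\t', np.nan])

# No argument: only leading and trailing whitespace is removed.
whitespace_only = s.str.strip()

# With an argument: any character in the set is removed from both ends,
# stopping at the first character that is not in the set.
without_4 = s.str.strip('123.!? \n\t')   # '4' is not in the set
with_4 = s.str.strip('1234.!? \n\t')     # '4' is in the set

print(without_4[3])  # the leading '4. ' survives
print(with_4[3])     # the leading '4. ' is stripped
```

Because stripping stops at the first character outside the set, the leading `4` in the fourth entry blocks everything behind it until `4` is added to the set. Missing values stay missing.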

`.str.replace()`

s.str.replace('Ant.', 'Man')
s.str.replace('a', 8)  # raises TypeError: the replacement must be a string or callable
s.str.replace('a', '8')
s.str.replace('a', '8', case = False)
s.str.replace('a|e', '8', case = False, regex = True)

s.str.replace(r'\d', '', regex = True)
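Whether the pattern is read as literal text or as a regular expression depends on the `regex` argument. In pandas 2.0 and later the default is literal matching, so it is safest to pass `regex` explicitly. A small sketch using the same `s` (the result names are only for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series(['1. Ant.  ', '2. Bee!\n', '3. Cat?\t', '4. Beat?\t', np.nan])

# regex=False: the pattern is literal text, so '.' means a period.
no_periods = s.str.replace('.', '', regex=False)

# regex=True: the pattern is a regular expression.
no_digits = s.str.replace(r'\d', '', regex=True)            # drop every digit
vowels = s.str.replace('a|e', '8', case=False, regex=True)  # a, A, e, E -> 8

print(no_periods.tolist())
print(no_digits.tolist())
print(vowels.tolist())
```

Writing regex patterns as raw strings (`r'\d'`) avoids Python treating the backslash as an escape sequence.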

`.str.split()`

s2 = pd.Series(['1-20', '21-50', '51-80', '81-100', np.nan])
s3 = pd.Series(
    [
        "this is a regular sentence",
        "https://docs.python.org/3/tutorial/index.html",
        np.nan
    ]
)

s2.str.split()
s3.str.split()
s2.str.split(pat="-")
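By default `.str.split()` returns a list in every row, which is awkward to work with. A short sketch of two common ways to get at the pieces, using the same `s2` (the result names are only for illustration):

```python
import numpy as np
import pandas as pd

s2 = pd.Series(['1-20', '21-50', '51-80', '81-100', np.nan])

# Default: each entry becomes a list of pieces.
as_lists = s2.str.split('-')

# expand=True spreads the pieces across the columns of a DataFrame.
as_columns = s2.str.split('-', expand=True)

# .str[i] picks the i-th piece out of each list.
first_piece = s2.str.split('-').str[0]

print(as_lists)
print(as_columns)
print(first_piece)
```

The pieces are still strings after the split; convert them with `.astype(float)` or `pd.to_numeric` if you need numbers.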

`.str.join()` or `.str.cat()`

two_columns = s2.str.split("-", expand = True).rename(
   columns = {0: 'minimum', 1: 'maximum'})

two_columns.fillna("").agg("__".join, axis = 1)

two_columns.minimum.str.cat(two_columns.maximum, sep = "__")
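For comparison, here is a sketch of `.str.join()` itself, which works on a Series whose entries are lists, next to `.str.cat()` with `na_rep` to handle missing values (the names `joined`, `parts`, and `combined` are only for illustration):

```python
import numpy as np
import pandas as pd

s2 = pd.Series(['1-20', '21-50', '51-80', '81-100', np.nan])

# .str.join glues the pieces of each list back together with a delimiter.
joined = s2.str.split('-').str.join('__')

# .str.cat concatenates two Series element by element.
# na_rep replaces missing values before joining; without it the result is NaN.
parts = s2.str.split('-', expand=True)
combined = parts[0].str.cat(parts[1], sep='__', na_rep='')

print(joined)
print(combined)
```

`.str.join()` leaves a missing row missing, while `.str.cat()` with `na_rep=''` turns it into just the separator, much like the `fillna("")` approach above.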

Fixing the column names

Here is some code to get you started:

url = 'https://github.com/fivethirtyeight/data/raw/master/star-wars-survey/StarWars.csv'

starwars_data = pd.read_csv(url, encoding = "ISO-8859-1", skiprows = 2, header = None)
starwars_cols = pd.read_csv(url, encoding = "ISO-8859-1", nrows = 2, header = None)

starwars_cols.iloc[0,:].str.upper().str.replace(" ", "!")
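One possible way to turn two header rows into a single row of clean names is to forward-fill the first row and paste it onto the second. The sketch below uses a tiny made-up CSV, not the real survey file, and the cleaning rules (lowercase, underscores) are just one reasonable choice:

```python
import io
import pandas as pd

# Hypothetical miniature of a file with a two-row header (not the real data).
csv_text = (
    "RespondentID,Which films have you seen?,,Gender\n"
    ",Episode I,Episode II,Response\n"
    "1,Episode I,,Male\n"
    "2,,Episode II,Female\n"
)
cols = pd.read_csv(io.StringIO(csv_text), nrows=2, header=None)
data = pd.read_csv(io.StringIO(csv_text), skiprows=2, header=None)

# Row 0 holds the question, left blank for follow-on columns: fill it forward.
question = cols.iloc[0, :].ffill()
# Row 1 holds the answer option: replace blanks with an empty string.
option = cols.iloc[1, :].fillna("")

names = (
    question.str.cat(option, sep="_")
    .str.lower()
    .str.replace(r"[^a-z0-9]+", "_", regex=True)  # runs of other characters -> "_"
    .str.strip("_")
)
data.columns = names.tolist()
print(data.columns.tolist())
```

For the real data you will probably also want to shorten the long question text with a few `.str.replace()` calls before assigning the names.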

Validating statistical summaries

len(), .query(), and .value_counts() will be your friends.
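A sketch of how these three can be combined to check a reported percentage by hand. The DataFrame and its column names (`seen_any`, `gender`) are made up for illustration:

```python
import pandas as pd

# Made-up responses, for illustration only.
df = pd.DataFrame({
    "seen_any": ["Yes", "Yes", "No", "Yes", "No", "Yes"],
    "gender": ["Male", "Female", "Female", "Male", "Male", "Female"],
})

# len() gives the number of rows; .query() filters rows with a string expression.
n_total = len(df)
n_seen = len(df.query("seen_any == 'Yes'"))
share_seen = n_seen / n_total

# .value_counts() tallies each category; normalize=True returns proportions.
counts = df["gender"].value_counts()
shares = df.query("seen_any == 'Yes'")["gender"].value_counts(normalize=True)

print(n_total, n_seen, round(share_seen, 2))
print(counts)
print(shares)
```

Pay attention to the denominator: a share of all respondents and a share of those who have seen a film are different numbers.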

Validating visuals

You’re going to make a lot of bar charts!

Simple bar chart tutorial.
Make Altair do the counting for you! Tutorials here and here.

Updated on 12 Oct 2020