Day 2: Star Wars and strings

Welcome to class!

Announcements

What’s something you’re grateful for today?


The .str functions in pandas


.str.strip()

import numpy as np
import pandas as pd

s = pd.Series(['1. Ant.  ', '2. Bee!\n', '3. Cat?\t', '4. Beat?\t', np.nan])

s.str.strip()

s.str.strip('123.!? \n\t')

s.str.strip('1234.!? \n\t')
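
The three calls above differ only in which characters are stripped. A minimal sketch of what each one returns, using the same Series (the variable names are only for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series(['1. Ant.  ', '2. Bee!\n', '3. Cat?\t', '4. Beat?\t', np.nan])

# No argument: only leading and trailing whitespace is removed.
default_strip = s.str.strip()

# The argument is a set of characters, not a substring. Characters are removed
# from both ends until one outside the set is reached. '4' is not in this set,
# so the fourth entry keeps its '4. ' prefix.
without_4 = s.str.strip('123.!? \n\t')

# Adding '4' to the set cleans the fourth entry as well.
with_4 = s.str.strip('1234.!? \n\t')

print(without_4.tolist())  # ['Ant', 'Bee', 'Cat', '4. Beat', nan]
print(with_4.tolist())     # ['Ant', 'Bee', 'Cat', 'Beat', nan]
```

Missing values pass through unchanged in all three cases.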


.str.replace()

s.str.replace('Ant.', 'Man')
s.str.replace('a', 8)  # raises TypeError: repl must be a string or callable
s.str.replace('a', '8')
s.str.replace('a', '8', case = False)
s.str.replace('a|e', '8', case = False, regex = True)

s.str.replace(r'\d', '', regex = True)
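
Whether a pattern such as 'a|e' is read as a regular expression depends on the regex argument, and its default changed to False in pandas 2.0. A small sketch of the difference, using the same Series (the variable names are only for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series(['1. Ant.  ', '2. Bee!\n', '3. Cat?\t', '4. Beat?\t', np.nan])

# Literal match: no entry contains the three characters 'a|e', so nothing changes.
literal = s.str.replace('a|e', '8', regex=False)

# Regular expression: '|' means "or", and case=False also matches 'A' and 'E'.
pattern = s.str.replace('a|e', '8', case=False, regex=True)

# A raw string keeps the backslash in '\d' (any digit) intact.
no_digits = s.str.replace(r'\d', '', regex=True)

print(pattern.tolist()[:4])    # ['1. 8nt.  ', '2. B88!\n', '3. C8t?\t', '4. B88t?\t']
print(no_digits.tolist()[:4])  # ['. Ant.  ', '. Bee!\n', '. Cat?\t', '. Beat?\t']
```

Passing regex explicitly keeps the result the same across pandas versions.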


.str.split()

s2 = pd.Series(['1-20', '21-50', '51-80', '81-100', np.nan])
s3 = pd.Series(
    [
        "this is a regular sentence",
        "https://docs.python.org/3/tutorial/index.html",
        np.nan
    ]
)

s2.str.split()
s3.str.split()
s2.str.split(pat="-")
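
Each call above returns a Series of lists. A short sketch of what those lists look like and how to pull one piece back out with .str[i] (the variable names are only for illustration):

```python
import numpy as np
import pandas as pd

s2 = pd.Series(['1-20', '21-50', '51-80', '81-100', np.nan])
s3 = pd.Series(
    [
        "this is a regular sentence",
        "https://docs.python.org/3/tutorial/index.html",
        np.nan
    ]
)

# With no pattern, the split is on whitespace. The URL has none,
# so it comes back as a one-element list.
words = s3.str.split()

# With pat="-", each range becomes [minimum, maximum].
ranges = s2.str.split(pat="-")

# .str[i] indexes into each list; the pieces are still strings.
upper = ranges.str[1]

print(words.iloc[0])         # ['this', 'is', 'a', 'regular', 'sentence']
print(len(words.iloc[1]))    # 1
print(upper.tolist()[:4])    # ['20', '50', '80', '100']
```

The missing value stays missing at every step.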

.str.join() or .str.cat()

two_columns = s2.str.split("-", expand = True).rename(
   columns = {0: 'minimum', 1: 'maximum'})

two_columns.fillna("").agg("__".join, axis = 1)

two_columns.minimum.str.cat(two_columns.maximum, sep = "__")
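
The heading also names .str.join(), which works on a Series of lists rather than on two columns. A sketch comparing the three approaches on the same data; note how each one treats the missing row (the variable names are only for illustration):

```python
import numpy as np
import pandas as pd

s2 = pd.Series(['1-20', '21-50', '51-80', '81-100', np.nan])

# .str.join() glues the elements of each list back together.
joined = s2.str.split("-").str.join("__")

two_columns = s2.str.split("-", expand=True).rename(
    columns={0: 'minimum', 1: 'maximum'})

# .str.cat() combines two Series element by element; a missing value stays missing.
cat_result = two_columns['minimum'].str.cat(two_columns['maximum'], sep="__")

# The fillna("") + agg version turns the missing row into the bare separator '__'.
agg_result = two_columns.fillna("").agg("__".join, axis=1)

print(joined.tolist()[:4])   # ['1__20', '21__50', '51__80', '81__100']
print(agg_result.iloc[4])    # __
```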


Fixing the column names

Here is some code to get you started:

url = 'https://github.com/fivethirtyeight/data/raw/master/star-wars-survey/StarWars.csv'

starwars_data = pd.read_csv(url, encoding = "ISO-8859-1", skiprows = 2, header = None)
starwars_cols = pd.read_csv(url, encoding = "ISO-8859-1", nrows = 2, header = None)

starwars_cols.iloc[0,:].str.upper().str.replace(" ", "!")
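
One possible way to go from two header rows to a single row of clean names is sketched below. It is not the required solution: the tiny CSV is made up so the example runs offline, and the naming scheme (question, then answer, in snake_case) is an assumption.

```python
import io
import pandas as pd

# Hypothetical stand-in for the survey file: two header rows, then data.
csv_text = (
    "RespondentID,Which films have you seen?,,Gender\n"
    ",Episode I,Episode II,Response\n"
    "1,Yes,No,Male\n"
    "2,No,Yes,Female\n"
)

cols = pd.read_csv(io.StringIO(csv_text), nrows=2, header=None)
data = pd.read_csv(io.StringIO(csv_text), skiprows=2, header=None)

# A question spans several columns, so carry it forward over the blanks.
question = cols.iloc[0, :].ffill()
answer = cols.iloc[1, :].fillna("")

names = (
    question.str.cat(answer, sep="__")
    .str.lower()
    .str.replace(r"[^a-z0-9]+", "_", regex=True)  # runs of other characters -> '_'
    .str.strip("_")
)

data.columns = names
print(list(data.columns))
```

The same chain of .str calls should carry over to starwars_cols, though the real headers will likely need a few extra replacements.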

Validating statistical summaries

len(), .query(), and .value_counts() will be your friends.
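
A minimal sketch of how the three fit together when checking a published number. The data frame and its column names are made up for illustration:

```python
import pandas as pd

# Hypothetical cleaned survey data.
survey = pd.DataFrame({
    "seen_any": ["Yes", "Yes", "No", "Yes", "No"],
    "gender": ["Male", "Female", "Female", "Male", "Male"],
})

# len() gives the number of rows, before and after filtering.
n_respondents = len(survey)

# .query() filters rows with a string expression.
seen = survey.query("seen_any == 'Yes'")
n_seen = len(seen)

# .value_counts(normalize=True) gives proportions instead of counts.
gender_share = seen["gender"].value_counts(normalize=True)

print(n_respondents, n_seen)  # 5 3
print(gender_share)
```

Comparing counts like n_seen against the totals reported in the article is a quick check that the filter matches theirs.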


Validating visuals

You’re going to make a lot of bar charts!