Welcome to class!
Announcements
Gratitude Journal
The .str functions in pandas
- .str.strip: Strip white space (or other characters) from both ends of each string.
- .str.replace: Replace one string of characters with another.
- .str.split: Separate a character string into a list of pieces.
- .str.join: Join the elements of each list into a single string.
- Python for Data Science: Strings
- Pandas Documentation
.str.strip()
import numpy as np
import pandas as pd
s = pd.Series(['1. Ant. ', '2. Bee!\n', '3. Cat?\t', '4. Beat?\t', np.nan])
s.str.strip()
s.str.strip('123.!? \n\t')
s.str.strip('1234.!? \n\t')
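The difference between the last two calls is easy to miss. As a minimal sketch: the argument to .str.strip() is treated as a set of characters to remove from both ends, not as a prefix or suffix. The result names below (default_strip, partial_strip, full_strip) are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series(['1. Ant. ', '2. Bee!\n', '3. Cat?\t', '4. Beat?\t', np.nan])

# Default: only leading and trailing whitespace is removed.
default_strip = s.str.strip()

# With a character set: any of these characters is removed from both ends.
# '4' is not in the set, so the last label keeps its leading '4. '.
partial_strip = s.str.strip('123.!? \n\t')

# Adding '4' to the set cleans the last label as well.
full_strip = s.str.strip('1234.!? \n\t')

print(partial_strip.tolist())
print(full_strip.tolist())
```

Missing values pass through unchanged, so the np.nan entry stays missing in all three results.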
.str.replace()
s.str.replace('Ant.', 'Man')
s.str.replace('a', 8)  # raises TypeError: the replacement must be a string
s.str.replace('a', '8')
s.str.replace('a', '8', case = False)
s.str.replace('a|e', '8', case = False, regex = True)
s.str.replace(r'\d', '', case = False, regex = True)
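Whether the pattern is read as plain text or as a regular expression depends on the regex argument, whose default changed to False in pandas 2.0. The sketch below passes regex explicitly so the behavior does not depend on the installed version; the result names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(['1. Ant. ', '2. Bee!\n', '3. Cat?\t', '4. Beat?\t', np.nan])

# Literal replacement: regex=False treats 'Ant.' as plain text.
literal = s.str.replace('Ant.', 'Man', regex=False)

# Pattern replacement: 'a|e' means "a or e"; case=False also matches 'A'.
vowels = s.str.replace('a|e', '8', case=False, regex=True)

# r'\d' matches any digit; the raw string avoids escape-sequence warnings.
no_digits = s.str.replace(r'\d', '', regex=True)

print(vowels.tolist())
```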
.str.split()
s2 = pd.Series(['1-20', '21-50', '51-80', '81-100', np.nan])
s3 = pd.Series(
    [
        "this is a regular sentence",
        "https://docs.python.org/3/tutorial/index.html",
        np.nan
    ]
)
s2.str.split()
s3.str.split()
s2.str.split(pat="-")
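To compare what these calls return, here is a small sketch on the same s2 values. It also uses expand=True, which spreads the pieces across DataFrame columns instead of keeping a list in each row. The result names are illustrative.

```python
import numpy as np
import pandas as pd

s2 = pd.Series(['1-20', '21-50', '51-80', '81-100', np.nan])

# With no pattern, split breaks on whitespace, so each range stays in one piece.
on_whitespace = s2.str.split()

# Splitting on '-' returns a list of pieces for each row.
as_lists = s2.str.split(pat='-')

# expand=True spreads the pieces across the columns of a DataFrame.
as_columns = s2.str.split(pat='-', expand=True)

print(as_lists.tolist())
print(as_columns)
```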
.str.join() or .str.cat()
two_columns = s2.str.split("-", expand = True).rename(
    columns = {0: 'minimum', 1: 'maximum'})
two_columns.fillna("").agg("__".join, axis = 1)
two_columns.minimum.str.cat(two_columns.maximum, sep = "__")
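For contrast, a short sketch of .str.join() itself, which works on a Series whose entries are lists, next to .str.cat(), which combines two Series element by element. The names rejoined, parts, and combined are illustrative.

```python
import numpy as np
import pandas as pd

s2 = pd.Series(['1-20', '21-50', '51-80', '81-100', np.nan])

# .str.join glues the elements of each list back together with a delimiter.
rejoined = s2.str.split('-').str.join('__')

# .str.cat combines two Series element by element.
parts = s2.str.split('-', expand=True)
combined = parts[0].str.cat(parts[1], sep='__')

print(rejoined.tolist())
```

Both approaches leave the missing row as missing, whereas the fillna("") version above turns it into a bare "__".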
Fixing the column names
Here is some code to get you started:
url = 'https://github.com/fivethirtyeight/data/raw/master/star-wars-survey/StarWars.csv'
starwars_data = pd.read_csv(url, encoding = "ISO-8859-1", skiprows = 2, header = None)
starwars_cols = pd.read_csv(url, encoding = "ISO-8859-1", nrows = 2, header = None)
starwars_cols.iloc[0,:].str.upper().str.replace(" ", "!")
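One possible way to go from two header rows to a single set of clean names is sketched below. It uses a small made-up header_rows table in place of starwars_cols so it runs without a download. The cell values and the cleaning rules (lower case, no punctuation, underscores) are assumptions for illustration, not the required answer.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the two header rows (hypothetical values, not the real file).
header_rows = pd.DataFrame([
    ['RespondentID', 'Which films have you seen?', np.nan, 'Age'],
    [np.nan, 'Episode I', 'Episode II', 'Response'],
])

# Carry each question forward over the blank cells that follow it.
questions = header_rows.iloc[0, :].ffill()
answers = header_rows.iloc[1, :].fillna('')

# Combine both rows, then tidy: lower case, no punctuation, underscores.
clean_names = (
    questions.str.cat(answers, sep=' ')
    .str.strip()
    .str.lower()
    .str.replace(r'[^a-z0-9 ]', '', regex=True)
    .str.replace(r'\s+', '_', regex=True)
)

print(clean_names.tolist())
```

The cleaned names could then be assigned with something like starwars_data.columns = clean_names, provided the two tables have the same number of columns.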
Validating statistical summaries
len(), .query(), and .value_counts() will be your friends.
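As a rough sketch of how these three fit together when checking a published number, the example below uses a tiny made-up survey table. The column names and values are hypothetical, not taken from the Star Wars data.

```python
import pandas as pd

# Hypothetical toy survey data; column names are made up for illustration.
survey = pd.DataFrame({
    'seen_any': ['Yes', 'Yes', 'No', 'Yes', 'No', 'Yes'],
    'gender': ['Male', 'Female', 'Female', 'Male', 'Male', 'Female'],
})

# How many respondents are there in total?
n_total = len(survey)

# How many have seen at least one film?
n_seen = len(survey.query("seen_any == 'Yes'"))

# Share of each gender among those who have seen a film.
gender_share = (
    survey.query("seen_any == 'Yes'")
    .gender.value_counts(normalize=True)
)

print(n_total, n_seen)
print(gender_share)
```

Checking the row count after each filter is a quick way to confirm that the denominator matches the one used in the summary you are validating.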
Validating visuals
You’re going to make a lot of bar charts!
- Simple bar chart tutorial.
- Make Altair do the counting for you! Tutorials here and here.