Day 4: Exporting JSON

Question 5

Let’s work through Question 5 using the mtcars data.

Load packages and data

import pandas as pd
import numpy as np
import json

url_cars = ""
cars = pd.read_json(url_cars)
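Because `url_cars` is left blank above, here is a self-contained sketch of the same step that reads a small JSON string instead of a URL. It assumes the file is a JSON array with one object per car. `raw_json` and its two rows are hypothetical, made up for illustration. Recent pandas versions deprecate passing a literal JSON string to `pd.read_json()`, so the string is wrapped in `io.StringIO` to make it look like a file.

```python
import io

import pandas as pd

# hypothetical stand-in for the file at url_cars: two rows
# stored as a JSON array of records (one object per car)
raw_json = """
[
    {"car": "Mazda RX4", "mpg": 21.0, "cyl": 6, "hp": 110},
    {"car": "Datsun 710", "mpg": 22.8, "cyl": 4, "hp": 93}
]
"""

# wrap the string in StringIO so read_json treats it as a file;
# orient="records" says the JSON is a list of row objects
cars = pd.read_json(io.StringIO(raw_json), orient="records")
print(cars)
```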

Find all the missing values

# method 1: find "official" null values
# hp, wt, and vs

# method 2: just look at the data
# car, hp, wt, vs, gear

# method 3: look at summaries
# the values in 'gear' look funny

# method 4: count up categories
# looks like 4 rows are blank
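Methods 1, 3, and 4 can be sketched in code on a small mtcars-style frame. The values below are hypothetical (not the real data), with missing values coded three ways: as a null, as `999`, and as an empty string. Note that `isna()` only sees the nulls, which is why the other checks are needed.

```python
import pandas as pd

# hypothetical mini mtcars-style data; missing values are coded
# three different ways: null (None), 999, and an empty string
cars = pd.DataFrame({
    "car": ["Mazda RX4", "", "Datsun 710", "", "Valiant"],
    "hp": [110, None, 93, 175, 105],
    "wt": [2.62, 2.875, None, 3.44, 3.46],
    "gear": [4, 4, 999, 3, 3],
})

# method 1: count "official" nulls per column (999 and "" are missed)
null_counts = cars.isna().sum()
print(null_counts)

# method 3: summary statistics; a max of 999 gears stands out
gear_summary = cars["gear"].describe()
print(gear_summary)

# method 4: count categories; blank car names show up as ""
car_counts = cars["car"].value_counts(dropna=False)
print(car_counts)
```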

Reformat the missing values

Remember, you need to reformat your missing values to make them consistent!

The examples in the pandas `replace()` documentation might give you some ideas.
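One idea from those examples: `replace()` also accepts a nested dictionary, `{column: {old_value: new_value}}`, which limits each replacement to a single column. This is handy if a code like `999` could be a real value in some other column. The sketch below uses a small hypothetical frame.

```python
import numpy as np
import pandas as pd

# hypothetical data: 999 is a missing-value code only in "gear",
# while "" marks missing car names
cars = pd.DataFrame({
    "car": ["Mazda RX4", "", "Valiant"],
    "gear": [4, 999, 3],
    "disp": [160.0, 108.0, 225.0],
})

# nested dict: {column: {old_value: new_value}} applies each
# replacement only to the named column
cars_new = cars.replace({"gear": {999: np.nan}, "car": {"": np.nan}})
print(cars_new)
```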

# There are a lot of functions
# we could use to give the missing values
# a consistent format.

# `replace()` is one of the easiest
# let's change everything to np.nan
cars_new = cars.replace(999, np.nan).replace("", np.nan)

# or equivalently:
cars_new = cars.replace([999, ""], np.nan)

# did we get them all?
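One way to answer "did we get them all?" is to confirm that no placeholder codes remain and that every missing value now shows up in `isna()`. This sketch runs on a small hypothetical frame, not the real `cars` data.

```python
import numpy as np
import pandas as pd

# hypothetical data with all three missing-value codes
cars = pd.DataFrame({
    "car": ["Mazda RX4", "", "Valiant"],
    "hp": [110, None, 105],
    "gear": [4, 999, 3],
})
cars_new = cars.replace([999, ""], np.nan)

# count any placeholder codes still left anywhere in the frame
leftover = cars_new.isin([999, ""]).sum().sum()
print("placeholders left:", leftover)

# every missing value should now be a real NaN that isna() can see
print(cars_new.isna().sum())
```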

Saving JSON files from a pandas dataframe

You can save a DataFrame as a JSON file like this:

# save the new data as a json file
# ("cars_new.json" is just an example filename)
cars_new.to_json("cars_new.json")
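A minimal sketch of saving a file and reading it back, assuming a small hypothetical `cars_new` frame and a temporary folder so nothing is left on disk. It shows that `NaN` is written to the file as JSON `null` and comes back as `NaN`.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# hypothetical cleaned data with one missing value
cars_new = pd.DataFrame({
    "car": ["Mazda RX4", "Datsun 710"],
    "hp": [110.0, np.nan],
})

with tempfile.TemporaryDirectory() as tmp:
    # "cars_new.json" is an illustrative filename
    path = os.path.join(tmp, "cars_new.json")
    cars_new.to_json(path, orient="records")

    # the missing value is stored as JSON null in the file...
    with open(path) as f:
        text = f.read()

    # ...and is read back as NaN
    round_trip = pd.read_json(path, orient="records")

print(text)
print(round_trip)
```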

The `df.to_json()` documentation shows how to change the way the JSON is organized (by row, by column, etc.) using the `orient` argument.
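To see how `orient` changes the layout, this sketch converts a hypothetical two-row frame with several orientations and describes the top-level shape of each result: the sorted dictionary keys for the nested layouts, or the number of rows for `records`.

```python
import json

import pandas as pd

# hypothetical two-row frame
cars = pd.DataFrame({"car": ["Mazda RX4", "Datsun 710"], "hp": [110, 93]})

shapes = {}
for orient in ["columns", "index", "split", "records", "table"]:
    parsed = json.loads(cars.to_json(orient=orient))
    # dicts are described by their top-level keys, lists by their length
    shapes[orient] = sorted(parsed) if isinstance(parsed, dict) else len(parsed)
    print(f"{orient:>8}: {shapes[orient]}")
```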

This is the format we would like to see in the report:

    {
        "car": "Mazda RX4",
        "mpg": 21,
        "cyl": 6,
        "disp": 160,
        "hp": 110,
        "drat": 3.9,
        "wt": 2.62,
        "qsec": 16.46,
        "vs": 0,
        "am": 1,
        "gear": 4,
        "carb": 4
    }
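The format above is one element of the list that `orient="records"` produces. A sketch for pulling out a single row in that format, using a hypothetical one-row frame built from the values shown:

```python
import json

import pandas as pd

# hypothetical one-row frame holding the values from the example above
cars_new = pd.DataFrame([{
    "car": "Mazda RX4", "mpg": 21.0, "cyl": 6, "disp": 160.0, "hp": 110,
    "drat": 3.9, "wt": 2.62, "qsec": 16.46, "vs": 0, "am": 1,
    "gear": 4, "carb": 4,
}])

# orient="records" gives a list of row objects; take the first one
records = json.loads(cars_new.to_json(orient="records"))
print(json.dumps(records[0], indent=4))
```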

And here are the various options:

# %%
# Question 5 wants us to "include one record example"
# in our md report that "has a missing value"

# you can print the data as a JSON string like this:
json_data = cars_new.to_json()
print(json_data)

# but everything ends up on one long line,
# which won't look good in our report.
# instead, parse the string and pretty-print
# it with indentation. in this default format,
# the json is organized by column
json_data = cars_new.to_json()
json_object = json.loads(json_data)
json_formatted_str = json.dumps(json_object, indent=4)
print(json_formatted_str)

# %%
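# we can change the format of the
# json using the 'orient' argument;
# "split" stores the column names, index,
# and data in separate lists
json_data = cars_new.to_json(orient="split")
json_object = json.loads(json_data)
json_formatted_str = json.dumps(json_object, indent=4)
print(json_formatted_str)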

# %%
# by table: a "schema" describing the
# columns, plus the "data" as rows
json_data = cars_new.to_json(orient="table")
json_object = json.loads(json_data)
json_formatted_str = json.dumps(json_object, indent=4)
print(json_formatted_str)

# %%
# by "record" or "row": a list with one
# object per row, the format we want in the report
json_data = cars_new.to_json(orient="records")
json_object = json.loads(json_data)
json_formatted_str = json.dumps(json_object, indent=4)
print(json_formatted_str)
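Question 5 asks for a record that has a missing value, so the last step is to pick such a row. This sketch assumes a small hypothetical `cars_new` in which only one row contains `NaN`. It filters with `isna().any(axis=1)` and prints the first match as a records-style object, where the missing value appears as `null`.

```python
import json

import numpy as np
import pandas as pd

# hypothetical cleaned data; only the second row has a missing value
cars_new = pd.DataFrame({
    "car": ["Mazda RX4", "Datsun 710", "Valiant"],
    "hp": [110.0, np.nan, 105.0],
    "gear": [4, 4, 3],
})

# keep rows with at least one NaN, then take the first of them
# (double brackets keep it a one-row DataFrame)
missing_row = cars_new[cars_new.isna().any(axis=1)].iloc[[0]]

# records orient writes NaN as null; [0] unwraps the one-item list
record = json.loads(missing_row.to_json(orient="records"))[0]
record_str = json.dumps(record, indent=4)
print(record_str)
```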