Summarizing Data Practice

Introduction

In this assignment, you will use several datasets to summarize data numerically and visually.

# Load the libraries and the datasets
library(rio)
library(mosaic)
library(tidyverse)
library(car)

wrong_patient <- import('https://github.com/byuistats/Math221D_Course/raw/main/Data/WrongSiteWrongPatient.xlsx') %>% tibble()

wordle <- read_csv("https://github.com/byuistats/Math221D_Course/raw/main/Data/Wordle.csv")

1 - Whoopsies!

On rare occasions, a medical procedure is performed on the wrong body part of a patient’s body or on the wrong patient. These are called wrong-site and wrong-patient mistakes. Such errors occur hundreds of times each year across the United States. The medical community is trying to eliminate these errors but have had difficulty reducing their frequency. In a small percentage of these cases, the patient files a lawsuit against the hospital. Philip Stahel et al. conducted a study on these mistakes and the lawsuits that follow.

The data in the file WrongSiteWrongPatient.xlsx represent the amount (in US dollars) hospitals have been required to pay in wrong-site and wrong-patient lawsuits sourced from JAMA Surgery. Some of the values equal zero, indicating that the hospital won the legal battle.

Question: Find the mean and median amount hospitals had to pay in Wrong-Site lawsuits. Round your answer to the nearest whole dollar.

Mean Wrong-Site:

Median Wrong-Site:

Create a histogram of the Wrong-Site data and describe its shape:

Shape of Wrong-Site:

Question: Find the mean and median amount hospitals had to pay in Wrong-Patient lawsuits. Round your answer to the nearest whole dollar.

Mean Wrong-Patient:

Median Wrong-Patient:

Create a histogram of the Wrong-Patient data and describe its shape:

Shape of Wrong-Patient:

Question: Which error is the most costly for the medical community?

2 - A la Mode

The mode is mostly used for low frequency count data or for tallying the most frequently occurring category in categorical data analysis.

Question: Give 3 examples of situations where the mode might be useful:

Worldle you like to know!

Brother Cannon’s family plays the New York Times Wordle every day. Brother Cannon and his son, Ben, are the most competitive. The dataset wordle.csv contains the number of guesses for the last few months of Wordles for Brother Cannon and his son.

Using favstats() function, create a table of summary statistics for each player:

Question: Which player has the lowest standard deviation?
Answer:

Using the table() function, create a table of frequencies for each player and score:

Question: What is the mode for each player?
Answer:

Question: Who is the better player, and why?
Answer: