Project 4 WorkBook

Tutoring Lab Info

The data science lab is a resource you can use in person, online, and in Slack.

Data Import

1. `Importing Data`

Importing Data

Importing data is the first step in any data analysis task. It involves bringing datasets into your Python environment for further processing and analysis.

Code Snippet


url = "https://raw.githubusercontent.com/byuidatascience/data4dwellings/master/data-raw/dwellings_ml/dwellings_ml.csv"
dwellings_ml = pd.read_csv(url)

Install Correct Packages

1. `pip3/pip install -m`

Install Packages

NumPy: Essential for numerical computations in Python, NumPy provides powerful array manipulation capabilities, enabling efficient handling of large datasets commonly encountered in machine learning tasks.
Seaborn: Built on top of Matplotlib, Seaborn offers a high-level interface for creating visually appealing statistical plots, facilitating quick exploration and visualization of relationships within datasets.
scikit-learn: As a versatile machine learning library, scikit-learn simplifies the implementation of various algorithms, streamlining the development and deployment of predictive models through its user-friendly API and extensive functionality.

Code Snippet

python/path/ pip -m  install pandas plotly numpy seaborn scikit-learn

Metric Libraries Used

2. `Metrics Machine Learning`

Metric Machine Learning

train_test_split: This function is used for splitting datasets into training and testing sets, essential for evaluating machine learning models’ performance.
GaussianNB: Gaussian Naive Bayes is a classification algorithm based on Bayes’ theorem, commonly used for simple classification tasks with continuous input variables.
RandomForestClassifier: RandomForestClassifier is an ensemble learning method that constructs multiple decision trees during training and outputs the mode of the classes as the prediction for classification tasks.
GradientBoostingClassifier: Gradient Boosting is a boosting ensemble learning technique that builds weak learners sequentially, focusing on the errors made by previous learners, thus improving predictive performance.
DecisionTreeClassifier: Decision trees are a non-parametric supervised learning method used for classification and regression tasks, breaking down a dataset into smaller subsets based on different feature values.
metrics: The metrics module in scikit-learn provides various evaluation metrics for assessing the performance of machine learning models, such as accuracy, precision, recall, F1 score, etc.
tree: The tree module in scikit-learn provides tools for constructing, visualizing, and interpreting decision trees and other tree-based models.

Code Snippet

from sklearn.tree import DecisionTreeClassifier


classifier_DT = DecisionTreeClassifier(max_depth=11)


classifier_DT.fit(X_train, y_train)


y_pred = classifier_DT.predict(X_test)

Project 4 WorkBook

Tutoring Lab Info

Data Import

1. Importing Data

Install Correct Packages

1. pip3/pip install -m

Metric Libraries Used

2. Metrics Machine Learning

1. `Importing Data`

1. `pip3/pip install -m`

2. `Metrics Machine Learning`