Distribution Textbook (Work in Progress)

by John Della Rosa

Logistic Regression


Logistic Regression Visualization Tool: User Manual

Overview:

This tool allows you to explore logistic regression by:

Adding data points manually or via random generation.
Visualizing how the logistic regression model fits the data.
Adjusting the decision threshold dynamically.
Observing real-time updates to the decision boundary, heatmap, and confusion matrix.

Features:

1. Manual Data Input:

You can manually input points with two features and assign them a label (0 or 1).
Each point is plotted on the scatter plot and colored by its true label:

Red for \( y = 1 \).
Blue for \( y = 0 \).

2. Autogenerate Random Points:

Clicking the "Autogenerate Points" button generates 50 random points, clustered around a different center for each label.
Each cluster's center is chosen randomly within the range [-3, 3], and the points are scattered widely around their centers, which makes it more likely that the two clusters overlap.
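A rough sketch of such a generator follows. Only the 50-point total and the [-3, 3] center range come from the manual; the Gaussian noise, the even 25/25 split between labels, the `spread` value, and the function name `autogenerate_points` are assumptions for illustration, since the tool's actual sampling scheme is not documented.

```python
import random

def autogenerate_points(n_points=50, spread=1.5, seed=None):
    """Return n_points (x1, x2, label) tuples: one Gaussian cluster per label.

    Hypothetical sketch: the Gaussian noise and `spread` (standard deviation)
    are assumptions; only the [-3, 3] center range comes from the manual.
    """
    rng = random.Random(seed)
    n_zero = n_points // 2            # label-0 points; label 1 gets the rest
    points = []
    for label, count in ((0, n_zero), (1, n_points - n_zero)):
        # Each cluster center is drawn uniformly from [-3, 3] in both features.
        cx, cy = rng.uniform(-3, 3), rng.uniform(-3, 3)
        for _ in range(count):
            points.append((rng.gauss(cx, spread), rng.gauss(cy, spread), label))
    return points

data = autogenerate_points(seed=0)
print(len(data), sum(label for _, _, label in data))  # 50 25
```

A larger `spread` relative to the distance between the two centers produces more overlap, and therefore more misclassified points for the model to show.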

3. Logistic Regression Fitting:

The logistic regression model is automatically fitted when you input or generate data points.
The heatmap behind the scatter plot shows the model's predicted probability that a point at each location has \( y = 1 \).
The decision boundary is shown, dividing the regions where the model predicts \( y = 1 \) and \( y = 0 \).
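As a sketch, assuming the tool uses the standard two-feature logistic model (the weights \( w_0, w_1, w_2 \) and the threshold symbol \( t \) are notation introduced here, not taken from the tool), the heatmap value at a point \( x = (x_1, x_2) \) and the decision boundary at threshold \( t \) are:

\[
p(x) = P(y = 1 \mid x) = \sigma(w_0 + w_1 x_1 + w_2 x_2), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}
\]

\[
p(x) = t \iff w_0 + w_1 x_1 + w_2 x_2 = \log\frac{t}{1 - t}
\]

At \( t = 0.5 \) the right-hand side is \( \log 1 = 0 \), so the boundary is the straight line \( w_0 + w_1 x_1 + w_2 x_2 = 0 \). Because only the constant on the right depends on \( t \), moving the threshold (if the plotted boundary follows it) shifts this line parallel to itself rather than rotating it.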

4. Adjusting the Threshold:

A slider sets the decision threshold. The default is 0.5; move it to see how different thresholds change the predictions.
The scatter plot will update dynamically:

Circles represent correct predictions (true positive or true negative).
X’s represent incorrect predictions (false positive or false negative).
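A minimal sketch of how a threshold turns predicted probabilities into the circle/X markers. The weights `w`, the helper names `sigmoid` and `classify`, and the rule that a probability exactly equal to the threshold counts as \( y = 1 \) are assumptions for illustration, not the tool's documented behavior.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function: maps a real score to (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def classify(points, w, threshold=0.5):
    """Return (x1, x2, label, predicted, marker) for each (x1, x2, label).

    `w` = (w0, w1, w2) are hypothetical fitted weights. A point is predicted
    as 1 when its probability is >= threshold (tie rule is an assumption).
    """
    results = []
    for x1, x2, label in points:
        p = sigmoid(w[0] + w[1] * x1 + w[2] * x2)
        predicted = 1 if p >= threshold else 0
        # Circle for a correct prediction, X for an incorrect one.
        marker = "circle" if predicted == label else "x"
        results.append((x1, x2, label, predicted, marker))
    return results

points = [(2.0, 3.0, 1), (-1.0, -2.0, 0), (0.5, -0.5, 0)]
w = (0.0, 1.0, 1.0)
print([r[4] for r in classify(points, w)])                 # ['circle', 'circle', 'x']
print([r[4] for r in classify(points, w, threshold=0.6)])  # ['circle', 'circle', 'circle']
```

The third point sits exactly on the 0.5 boundary (probability 0.5), so it flips from an X to a circle once the threshold rises above 0.5.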

5. Confusion Matrix:

The confusion matrix, displayed below the plot, updates dynamically as you adjust the threshold.
It shows:

True Positives (TP): Correctly predicted \( y = 1 \).
False Positives (FP): Predicted \( y = 1 \) but the true label is \( y = 0 \).
True Negatives (TN): Correctly predicted \( y = 0 \).
False Negatives (FN): Predicted \( y = 0 \) but the true label is \( y = 1 \).
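The four counts can be tallied directly from the true labels and the thresholded predictions. This is a hypothetical helper, not the tool's code, and it assumes both lists contain only 0s and 1s:

```python
def confusion_matrix(labels, predictions):
    """Count TP, FP, TN, FN for binary (0/1) true labels and predictions."""
    tp = fp = tn = fn = 0
    for y, y_hat in zip(labels, predictions):
        if y_hat == 1 and y == 1:
            tp += 1          # correctly predicted y = 1
        elif y_hat == 1 and y == 0:
            fp += 1          # predicted 1, true label 0
        elif y_hat == 0 and y == 0:
            tn += 1          # correctly predicted y = 0
        else:
            fn += 1          # predicted 0, true label 1
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

labels      = [1, 0, 1, 0, 1]
predictions = [1, 1, 0, 0, 1]
print(confusion_matrix(labels, predictions))  # {'TP': 2, 'FP': 1, 'TN': 1, 'FN': 1}
```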

How to Use:

1. Manual Data Entry:

Enter values for Feature 1 and Feature 2 in the text boxes.
Choose a Label (either 0 or 1) from the dropdown menu.
Click the Add Point button to plot the point.

2. Autogenerate Random Points:

To quickly generate a dataset, click the Autogenerate Points button.
This will generate 50 random points, divided into two clusters with different labels.

3. Adjust the Threshold:

Use the Threshold Slider to adjust the decision threshold dynamically.
As you move the slider, observe the following changes:

The points will update to circles (correct predictions) and X’s (incorrect predictions).
The Confusion Matrix will update in real time, showing how the number of true positives, false positives, true negatives, and false negatives changes as you adjust the threshold.

4. Confusion Matrix:

The confusion matrix is displayed under the plot.
It updates automatically as you change the threshold, showing the classification performance of the logistic regression model based on the current threshold.

Example Workflow:

Input Points Manually: Enter a point such as Feature 1 = 2, Feature 2 = 3, Label = 1, and add it to the plot. Repeat for other points, including some with Label = 0 so that both classes are present.
Adjust the Threshold: Use the slider to adjust the threshold and see how it affects predictions and the confusion matrix.
Analyze the Confusion Matrix: Observe how the model’s accuracy and classification performance (True Positives, False Positives, etc.) change with the threshold.
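The same kind of analysis can be sketched outside the tool. The probabilities and labels below are made-up values, not output from the tool, and `sweep` is a hypothetical helper that reports true positives, false positives, and accuracy at each threshold (again treating a probability equal to the threshold as \( y = 1 \)):

```python
def sweep(probs, labels, thresholds):
    """Return (threshold, TP, FP, accuracy) for each threshold."""
    rows = []
    for t in thresholds:
        preds = [1 if p >= t else 0 for p in probs]
        pairs = list(zip(labels, preds))
        tp = sum(1 for y, y_hat in pairs if y == 1 and y_hat == 1)
        fp = sum(1 for y, y_hat in pairs if y == 0 and y_hat == 1)
        acc = sum(1 for y, y_hat in pairs if y == y_hat) / len(labels)
        rows.append((t, tp, fp, round(acc, 3)))
    return rows

# Hypothetical predicted probabilities P(y = 1 | x) and true labels.
probs  = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = [1,    1,    0,    1,    0,    0]
print(sweep(probs, labels, (0.2, 0.5, 0.7)))
# [(0.2, 3, 2, 0.667), (0.5, 2, 1, 0.667), (0.7, 2, 0, 0.833)]
```

In this toy data, raising the threshold from 0.2 to 0.5 removes one false positive but also one true positive, so accuracy stays at 0.667 even though the mix of errors changes; this is why the confusion matrix is worth watching alongside accuracy.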

Technical Notes:

The logistic regression model is fitted using gradient descent and refitted in real time as you add or generate data.
The heatmap shows the model's predicted probability that a point has \( y = 1 \), drawn with a custom faded color map so the scatter points stand out against the background.
The points are color-coded based on their true labels, and the shapes (circle or X) indicate whether they were correctly classified.
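The notes say the model is fitted with gradient descent but do not specify the loss, learning rate, initialization, or stopping rule. The sketch below assumes plain batch gradient descent on the mean log-loss (cross-entropy) with zero-initialized weights; `fit_logistic_gd`, `lr=0.1`, and `epochs=2000` are illustrative choices, not the tool's settings.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_gd(points, lr=0.1, epochs=2000):
    """Fit weights (w0, w1, w2) by batch gradient descent on the mean log-loss.

    `points` is a list of (x1, x2, label) tuples with label 0 or 1.
    Learning rate, epoch count, and zero initialization are assumptions.
    """
    w = [0.0, 0.0, 0.0]
    n = len(points)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for x1, x2, y in points:
            err = sigmoid(w[0] + w[1] * x1 + w[2] * x2) - y  # p_i - y_i
            grad[0] += err           # intercept term (feature fixed at 1)
            grad[1] += err * x1
            grad[2] += err * x2
        for j in range(3):
            w[j] -= lr * grad[j] / n  # step against the mean gradient
    return w

points = [(2.0, 3.0, 1), (1.0, 2.0, 1), (2.5, 1.5, 1),
          (-1.0, -2.0, 0), (-2.0, -1.0, 0), (0.0, -1.5, 0)]
w = fit_logistic_gd(points)
preds = [1 if sigmoid(w[0] + w[1] * x1 + w[2] * x2) >= 0.5 else 0
         for x1, x2, _ in points]
print(preds)  # [1, 1, 1, 0, 0, 0]
```

Each update relies on the fact that the gradient of the mean log-loss is the average of \( (p_i - y_i) \) times the feature vector, with a leading 1 for the intercept. Refitting "in real time" would amount to rerunning this loop (or continuing it from the previous weights) whenever a point is added.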