"""

Bayes' Theorem Explained

Bayes' theorem is crucial for interpreting the results of binary classification algorithms, and a must-know for aspiring data scientists. We show how Bayes' theorem can be established using the results from a binary classification machine learning algorithm.
Author: Benjamin O. Tayo Date: 5/7/2020  - regularize matrix: 5/11/2020 Max Kleiner
https://github.com/bot13956/Bayes_theorem/blob/master/Bayes_Theorem.ipynb

https://towardsdatascience.com/6-amateur-mistakes-ive-made-working-with-train-test-splits-916fabb421bb
"""

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
plt.style.use("ggplot")

# 2. Exploratory Data Analysis

df=pd.read_csv(r"C:\maXbox\mX47464\maxbox4\examples\fm_heights.csv")
print(df.head())

plt.figure()
sns.countplot(x="sex", data=df)
plt.show()

df2 = df.copy()  # copy, so the helper column is not added to df itself
df2['count'] = range(df.shape[0])
print(df2.head(n=10))

sns.lmplot( x="count", y="height", data=df2, hue='sex', \
                                   legend=False, fit_reg=False, aspect=1.6)
plt.legend(loc='upper left')
plt.title('Scatter plot of heights')
plt.ylabel('height (inch)')
plt.show()

plt.figure(figsize=(10,6))
sns.histplot(df['height'], bins=20, kde=True, stat='density')  # distplot is deprecated since seaborn 0.11
plt.title('Probability distribution of all heights')
plt.xlabel('height (inch)')
plt.show()

plt.figure(figsize=(10,6))
# kdeplot replaces the deprecated distplot(..., hist=False)
sns.kdeplot(df[df.sex=='Male']['height'], label='Male')
sns.kdeplot(df[df.sex=='Female']['height'], label='Female')
plt.title('Probability distribution of Male and Female heights')
plt.legend()
plt.xlabel('height (inch)')
plt.show()

# 3. Model Building and Evaluation

from sklearn.preprocessing import LabelEncoder
class_le = LabelEncoder()
y = class_le.fit_transform(df['sex'].values)
# LabelEncoder sorts the class names, so 0 = Female and 1 = Male
print(pd.Series(y).value_counts())

X = df['height']
X_train, X_test, y_train, y_test= train_test_split(X, y, test_size=0.3, \
                                     random_state=0, stratify=y)
X_train = X_train.values.reshape(X_train.shape[0],1)
X_test = X_test.values.reshape(X_test.shape[0],1)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

print('score: ',knn.score(X_test, y_test))

print(confusion_matrix(y_test, y_pred))

print(classification_report(y_test, y_pred))

print('bayes classifier detector compute ends...')


"""

dtype: int64
[[ 39  32]
 [ 22 222]]
              precision    recall  f1-score   support

           0       0.64      0.55      0.59        71
           1       0.87      0.91      0.89       244

    accuracy                           0.83       315
   macro avg       0.76      0.73      0.74       315
weighted avg       0.82      0.83      0.82       315

bayes detector compute ends...

Bayes' theorem is crucial for interpreting the results of binary classification algorithms.

We will show that Bayes' theorem is simply the relationship between precision and recall.

Confusion matrix of the test set with row and column totals added
(rows = actual class, columns = predicted class; 0 = Female, 1 = Male):

              Predicted 0   Predicted 1   Total
   Actual 0        39            32         71
   Actual 1        22           222        244
   Total           61           254        315

Let A = "actual class is 1" and B = "predicted class is 1". Then:

   P(A)   = 244/315
   P(B)   = 254/315
   1 precision: P(A|B) = 222/254 = 0.874
   2 recall:    P(B|A) = 222/244 = 0.910
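
The totals and the two ratios above can be recomputed directly from the confusion matrix. The following is a minimal sketch, assuming only the matrix printed in the output; the variable names are illustrative and are not part of the original notebook.

```python
import numpy as np

# Confusion matrix from the output above: rows = actual class, columns = predicted class.
cm = np.array([[39, 32],
               [22, 222]])

actual_totals = cm.sum(axis=1)       # row sums: samples per actual class
predicted_totals = cm.sum(axis=0)    # column sums: samples per predicted class
n_samples = int(cm.sum())            # all test samples

true_positives = cm[1, 1]            # actual class 1 and predicted class 1

# precision = share of predicted class-1 samples that really are class 1
precision = float(true_positives / predicted_totals[1])
# recall = share of actual class-1 samples that were predicted as class 1
recall = float(true_positives / actual_totals[1])

print("row totals:", actual_totals.tolist())
print("column totals:", predicted_totals.tolist())
print("precision:", round(precision, 3))
print("recall:", round(recall, 3))
```

Both values should agree with the class-1 row of the classification report.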

The probability P(B|A) = 222/244 = 0.91 is called the recall. It gives the fraction of the 244 samples that actually belong to class 1 which the classifier predicted correctly. Bayes' theorem is then simply a relationship between recall and precision:

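Bayes' theorem: P(B|A) = P(A|B) * P(B) / P(A)

   (222/254 * 254/315) / (244/315) = 222/244 = 0.9098360655737705

In terms of the classification metrics:

   recall = precision * predicted / actual
   precision / recall = actual / predicted

where "predicted" = 254 is the number of samples predicted as class 1 and "actual" = 244 is the number of samples that actually belong to class 1.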
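
The identity can be checked exactly with rational arithmetic. This is a small sketch, taking A as "actual class is 1" and B as "predicted class is 1", the reading that matches the counts used above; the variable names are assumptions made for this illustration.

```python
from fractions import Fraction

# Counts taken from the confusion matrix in the output above.
n_samples = 315
actual_pos = 244       # samples whose actual class is 1
predicted_pos = 254    # samples whose predicted class is 1
true_pos = 222         # samples that are both

p_a = Fraction(actual_pos, n_samples)             # P(A)
p_b = Fraction(predicted_pos, n_samples)          # P(B)
p_a_given_b = Fraction(true_pos, predicted_pos)   # precision, P(A|B)
p_b_given_a = Fraction(true_pos, actual_pos)      # recall, P(B|A)

# Bayes' theorem: P(B|A) = P(A|B) * P(B) / P(A)
bayes_rhs = p_a_given_b * p_b / p_a

print("Bayes' theorem holds exactly:", bayes_rhs == p_b_given_a)
print("recall as a float:", float(bayes_rhs))
```

Using Fraction avoids floating-point rounding, so the equality test compares the two sides exactly rather than within a tolerance.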

Note: confusion_matrix needs both the true labels and the predictions as single integer class labels, not as one-hot encoded vectors. In this script LabelEncoder already produces integer labels.
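
If labels or predictions do arrive one-hot encoded, they can be converted back to integer class labels before calling confusion_matrix. The sketch below uses small made-up arrays purely for illustration; they are not data from this script.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical one-hot encoded labels and predictions for four samples.
y_true_onehot = np.array([[1, 0], [0, 1], [0, 1], [1, 0]])
y_pred_onehot = np.array([[1, 0], [0, 1], [1, 0], [1, 0]])

# argmax along axis 1 recovers the integer class index of each row.
y_true_labels = y_true_onehot.argmax(axis=1)
y_pred_labels = y_pred_onehot.argmax(axis=1)

cm_small = confusion_matrix(y_true_labels, y_pred_labels)
print(cm_small)
```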
"""