"The question of whether a computer can think is no more interesting than the question of whether a submarine can swim." — Edsger W. Dijkstra
My Machine Learning Projects
Noteworthy Machine Learning Algorithms
Machine Learning ⇒ software able to detect patterns, make decisions, predict outcomes, learn from mistakes & optimize its own performance without being explicitly programmed to do so
Supervised Learning
↳ "learning a function that maps to an output based on the example of input-output pairs"
-
Linear Regression | Predict Real Values
Estimate or predict real values based on continuous variables → establish the relationship between the independent variables (matrix of features) & the dependent variable (output) by fitting a line of best fit
-
Homoscedasticity
"Homoskedastic . . . refers to a condition in which the variance of the residual, or error term, [that is, the “noise” or random disturbance in the relationship between the independent variables and the dependent variable], in a regression model is constant. That is, the error term does not vary much as the value of the predictor variable changes." Investopedia
-
Multicollinearity
"[R]efers to predictors that are correlated [, that is, highly linearly related,] with other predictors. Multicollinearity occurs when your model includes multiple factors that are correlated not just to your response variable, but also to each other. In other words, it results when you have factors that are a bit redundant." Minitab
-
No Free Lunch Theorems (NFL)
"[S]tate that any one algorithm that searches for an optimal cost or fitness solution is not universally superior to any other algorithm. . . . 'If an algorithm performs better than random search on some class of problems then [it] must perform worse than random search on the remaining problems.'" Medium
-
Parsimonious Model
"Parsimonious models are simple models [with the least assumptions & variables but] with great explanatory predictive power. They explain data with a minimum number of parameters, or predictor variables. The idea behind parsimonious models stems from Occam's razor, or 'the law of briefness' (sometimes called lex parsimoniae in Latin)." Statistics How To
-
Derivation of Line of Best Fit | Ordinary Least Squares method | Sum of Squares Residual
$SS_{res} = \sum (y - \hat{y})^2 \rightarrow \min$
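As a minimal sketch of the least-squares idea above, the snippet below fits a single-feature line with the closed-form solution and reports the sum of squared residuals. The function names (`fit_line`, `ss_res`) and the toy data are assumptions made for illustration, not part of the original notes.

```python
def fit_line(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals for y = b0 + b1*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Closed-form ordinary least squares solution for one feature.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


def ss_res(xs, ys, b0, b1):
    """Sum of squared residuals: sum((y - y_hat)^2)."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_line(xs, ys)
print(b0, b1, ss_res(xs, ys, b0, b1))
```

Because the toy points sit exactly on a line, the fitted residual sum is zero; with noisy data it would be the smallest value any straight line can reach.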
-
Simple Linear Regression
Combining one variable in an equation to predict a single outcome
-
Multiple Linear Regression
Combining many variables in an equation to predict a single outcome
-
Polynomial Linear Regression
$y = b_0 + b_1 x_1 + b_2 x_1^2 + \ldots + b_n x_1^n$
-
R Squared | Goodness of Fit Parameter
$R^2 = 1 - \dfrac{SS_{res}}{SS_{tot}}$
↳ where:
↳ $SS_{tot} = \sum (y - y_{avg})^2$
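The goodness-of-fit formula above can be sketched in a few lines. The second function shows the adjusted variant, which penalizes extra regressors (its formula is restated in the docstring). Function names and the toy values are assumptions for illustration only.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_avg = sum(y_true) / len(y_true)
    ss_res = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))
    ss_tot = sum((y - y_avg) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """1 - (1 - R^2) * (n - 1) / (n - p - 1); n = sample size, p = regressors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Made-up observations and predictions.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.0, 2.0, 3.0, 4.0, 4.0]
r2 = r_squared(y_true, y_pred)
print(r2, adjusted_r_squared(r2, n=len(y_true), p=1))
```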
-
Adjusted R Squared
$\text{Adj } R^2 = 1 - (1 - R^2) \cdot \dfrac{n - 1}{n - p - 1}$
↳ where:
↳ p = number of regressors
↳ n = sample size
-
Support Vector Regression | Classification
Used as a regression method, Support Vector Regression (SVR) keeps all the main features that characterize the algorithm (maximal margin). It uses the same principles as the SVM for classification, with only a few minor differences.
-
Epsilon-Insensitive Tube
-
The Gaussian RBF Kernel
-
-
Logistic Regression | Classification
Used to estimate discrete values, binary values (0/1, yes/no, true/false) based on given set of independent variables; predicts probability between 0 & 1 as output values.
Despite its name, logistic regression is a classification method: it models the log-odds of the outcome as a linear function of the independent variables. Its graph is curvilinear; when the dependent variable is binary, the predicted probability follows a sigmoid (S-shaped) curve.
-
Linear Regression | Sigmoid Function | Predicting Probability (p̂)
$y = b_0 + b_1 x_1$
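A minimal sketch of how the linear predictor above turns into a probability: the value of b0 + b1*x1 is passed through the sigmoid function. The coefficient values are hypothetical and chosen only to make the example easy to follow.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function, mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(x1, b0, b1):
    """Predicted probability p_hat = sigmoid(b0 + b1 * x1)."""
    return sigmoid(b0 + b1 * x1)


# Hypothetical coefficients: the decision boundary (p_hat = 0.5) sits at x1 = 2.
b0, b1 = -4.0, 2.0
for x1 in (0.0, 2.0, 4.0):
    print(x1, predict_probability(x1, b0, b1))
```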
-
-
Decision Tree Regression
Supervised learning algorithm used mostly for classification problems but also for regression; works for both categorical & continuous variables
-
Standard Deviation Reduction
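$F(T, X) = \sum_{c \in X} P(c)\, S(c)$
↳ where P(c) = proportion of samples in branch c & S(c) = standard deviation of the target in branch c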
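The weighted standard deviation above is usually compared with the standard deviation of the target before the split; the difference is the reduction a candidate split achieves. The sketch below assumes population standard deviation and uses hypothetical helper names and made-up target values.

```python
import math


def std_dev(values):
    """Population standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def weighted_std(groups):
    """Sum over branches c of P(c) * S(c)."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * std_dev(g) for g in groups)


def std_reduction(target, groups):
    """Standard deviation before the split minus the weighted value after it."""
    return std_dev(target) - weighted_std(groups)


# Made-up target values and one candidate split into two branches.
target = [1.0, 1.0, 5.0, 5.0]
split = [[1.0, 1.0], [5.0, 5.0]]
print(std_reduction(target, split))
```

A tree builder would evaluate this quantity for each candidate split and keep the one with the largest reduction.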
-
-
Support Vector Machines | Discriminative Classifier
Discriminative classifier formally defined by a separating hyperplane
-
Maximum Margin Hyperplane | Support Vectors
$\{x_i, y_i\}$ where $i = 1 \ldots L$, $y_i \in \{-1, 1\}$, $x_i \in \mathbb{R}^D$
-
-
Kernel SVM | Nonlinear
-
Linearly Separable with Hyperplane in 3D
-
The Gaussian or Radial Basis Function (RBF) Kernel
-
Sigmoid Kernel
-
Polynomial Kernel
-
Naive Bayes Classification
Probabilistic classifier based on Bayes Theorem with an assumption of independence between predictors (aka, features or independent variables)
-
Bayes Theorem ⇒ The probability of an event given prior knowledge of related events that occurred earlier
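As a small worked example of the theorem, the sketch below computes P(A|B) = P(B|A) P(A) / P(B), expanding P(B) with the law of total probability. The function name and all numbers are hypothetical.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A|B) = P(B|A) * P(A) / P(B), with P(B) from the law of total probability."""
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: P(A) = 0.01, P(B|A) = 0.9, P(B|not A) = 0.05.
p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(p, 4))
```

Even with a strong likelihood, the small prior keeps the posterior low, which is the kind of update the theorem formalizes.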
-
-
K-Nearest Neighbors
Used for classification & regression; a simple algorithm that stores all available cases & classifies new cases by a "majority vote" of its K-nearest neighbors
-
Euclidean Distance
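A minimal sketch of the majority-vote idea, using Euclidean distance to rank the stored cases. The training points, labels and function names are made up for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_classify(train, query, k=3):
    """train is a list of (features, label); return the majority label of the k nearest."""
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # On a tie, most_common keeps the label seen first among the neighbors.
    return votes.most_common(1)[0][0]


# Made-up 2D points in two well-separated groups.
train = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]
print(knn_classify(train, (0.5, 0.5)), knn_classify(train, (5.5, 5.5)))
```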
-
$y = b_0 + b_1 x_1$
Kernel SVM: Mapping to a higher-dimensional space, applying the support vector algorithm & then projecting back to the lower-dimensional space, resulting in a nonlinear separator
$\varphi(x_1, x_2) \Rightarrow (x_1, x_2, z)$
↳ where:
↳ K = function applied to two vectors
↳ x = point in datasets
↳ l = landmark
Sigmoid kernel: $K(X, Y) = \tanh(\gamma \cdot X^{T} Y + r)$
Polynomial kernel: $K(X, Y) = (\gamma \cdot X^{T} Y + r)^{d}, \ \gamma > 0$
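The three kernels named in this section can be sketched as plain functions. The Gaussian RBF form used here, exp(-||x - l||^2 / (2 sigma^2)), is the commonly used definition and is an assumption, since the notes only list its symbols (K, x, l). The parameter defaults are hypothetical.

```python
import math


def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, l, sigma=1.0):
    """Gaussian RBF kernel between a point x and a landmark l (assumed standard form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, l))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigmoid_kernel(x, y, gamma=1.0, r=0.0):
    """tanh(gamma * x.y + r)."""
    return math.tanh(gamma * dot(x, y) + r)


def polynomial_kernel(x, y, gamma=1.0, r=1.0, d=2):
    """(gamma * x.y + r) ** d, with gamma > 0."""
    return (gamma * dot(x, y) + r) ** d


print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))
print(sigmoid_kernel([0.0, 0.0], [1.0, 1.0]))
print(polynomial_kernel([1.0, 2.0], [3.0, 4.0]))
```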
Unsupervised Learning
↳ "looks for previously undetected patterns in a data set with no pre-existing labels and with a minimum of human supervision"
-
K-Means Clustering
-
Within Cluster Sum of Squares (WCSS) | Quantifiable metric to evaluate how a given number of clusters performs compared to a different number of clusters
-
Apriori Association
-
Support
-
Confidence
-
Lift → measures the relevance of an association rule & the improvement in prediction it provides
K-Means Clustering: Unsupervised algorithm which solves clustering problems; follows a simple way to partition a dataset into a chosen number of clusters
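$WCSS = \sum_{P_i \in \text{cluster } 1} \text{distance}(P_i, C_1)^2 + \sum_{P_i \in \text{cluster } 2} \text{distance}(P_i, C_2)^2 + \sum_{P_i \in \text{cluster } 3} \text{distance}(P_i, C_3)^2$
↳ where:
↳ distance = distance between each point $P_i$ inside a cluster & that cluster's centroid
↳ $C_1, C_2, C_3$ = centroids of clusters 1, 2 & 3, respectively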
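A minimal sketch of the WCSS computation for an already-assigned clustering, taking each centroid as the mean of its cluster's points. The helper names and the sample points are assumptions for illustration.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dims = len(points[0])
    return tuple(sum(p[k] for p in points) / len(points) for k in range(dims))


def wcss(clusters):
    """Sum, over clusters, of squared distances from each point to its centroid."""
    total = 0.0
    for points in clusters:
        c = centroid(points)
        total += sum(sum((pk - ck) ** 2 for pk, ck in zip(p, c)) for p in points)
    return total


# Made-up points: the same data as two clusters versus one big cluster.
two_clusters = [[(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0)]]
one_cluster = [[(0.0, 0.0), (2.0, 0.0), (5.0, 5.0)]]
print(wcss(two_clusters), wcss(one_cluster))
```

Comparing WCSS across different cluster counts, as in the second print argument, is what the elbow-style evaluation relies on.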
Apriori Association: Analyzes associations among specific preferences in customer transactions (movies watched, items purchased in a convenience store; cf. the beer & Pampers urban myth) to discover how items are associated with each other
Movie recommendation example:
↳ where M = specific Movie
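Support, confidence and lift can be sketched on a toy set of watch lists. The definitions below are the usual ones (fraction of transactions containing an itemset, conditional frequency, and confidence divided by the consequent's support); the movie labels and data are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """confidence(antecedent -> consequent) / support(consequent)."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


# Made-up watch lists; M1, M2, M3 stand for specific movies.
watchlists = [{"M1", "M2"}, {"M1", "M2"}, {"M1"}, {"M2"}, {"M3"}]
print(support(watchlists, {"M1"}))
print(confidence(watchlists, {"M1"}, {"M2"}))
print(lift(watchlists, {"M1"}, {"M2"}))
```

A lift above 1 suggests that watching M1 makes M2 more likely than its baseline popularity alone would.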
Reinforcement Learning
↳ "how software agents ought to take actions in an environment in order to maximize the notion of cumulative reward"
-
Upper Confidence Bound Algorithm | Deterministic Model
-
Advertising Model (requires update at every round)
Modern application of Multi-Armed Bandit Problem (reference slot machine distributions)
Step 1: Each round n considers two numbers for each ad i:
↳ $N_i(n)$ → # of times the ad i was selected up to round n
↳ $R_i(n)$ → Σ of rewards of ad i up to round n
Step 2: From these two numbers we compute:
↳ Average reward of ad i up to round n: $\bar{r}_i(n) = R_i(n) / N_i(n)$
↳ Confidence interval $[\bar{r}_i(n) - \Delta_i(n),\ \bar{r}_i(n) + \Delta_i(n)]$ at round n, where $\Delta_i(n)$ is the half-width of the interval
Step 3: Select the ad i that has the maximum UCB $\bar{r}_i(n) + \Delta_i(n)$
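The three steps above can be sketched as follows. The interval half-width used here, sqrt(1.5 * ln(n) / N_i(n)), is one common choice and is an assumption, as is the rule of trying each ad once before applying the bound. The reward table is simulated and made up.

```python
import math


def select_ad(counts, sums, n):
    """Pick the ad with the highest upper confidence bound at round n (1-based)."""
    best_ad, best_ucb = 0, -1.0
    for i, (n_i, r_i) in enumerate(zip(counts, sums)):
        if n_i == 0:
            return i  # assumption: show every ad once before using the bound
        average = r_i / n_i
        delta = math.sqrt(1.5 * math.log(n) / n_i)  # assumed half-width
        if average + delta > best_ucb:
            best_ad, best_ucb = i, average + delta
    return best_ad


def run_ucb(rewards):
    """rewards[n][i] is the reward ad i would give in round n (simulated data)."""
    n_ads = len(rewards[0])
    counts = [0] * n_ads   # N_i(n)
    sums = [0.0] * n_ads   # R_i(n)
    for n, row in enumerate(rewards, start=1):
        ad = select_ad(counts, sums, n)
        counts[ad] += 1
        sums[ad] += row[ad]
    return counts, sums


# Made-up environment: ad 0 never pays, ad 1 always pays.
counts, sums = run_ucb([[0, 1]] * 200)
print(counts, sums)
```

Both counters are updated after every round, which is why this model needs feedback at each step.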
-
Thompson Sampling Algorithm | Probabilistic Model
-
Advertising Model (can accommodate delayed feedback & has better empirical evidence than UCB)
Constructs distributions of where we think the actual expected value might lie; an auxiliary mechanism to solve the problem
Step 1: Each round n considers two numbers for each ad i:
↳ $N_i^1(n)$ → # of times the ad i got reward 1 up to round n
↳ $N_i^0(n)$ → # of times the ad i got reward 0 up to round n
Step 2: For each ad i, we take a random draw from the distribution below:
$\theta_i(n) = \beta(N_i^1(n) + 1,\ N_i^0(n) + 1)$
Step 3: Select the ad i that has the highest $\theta_i(n)$
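A minimal sketch of the steps above, drawing from a Beta distribution with the standard library's `random.betavariate`. The click probabilities, the seed and the function names are hypothetical and used only to simulate rewards.

```python
import random


def select_ad(n_reward_1, n_reward_0, rng):
    """Draw theta_i from Beta(N_i^1 + 1, N_i^0 + 1) per ad and return the best index."""
    draws = [rng.betavariate(a + 1, b + 1) for a, b in zip(n_reward_1, n_reward_0)]
    return max(range(len(draws)), key=draws.__getitem__)


def run_thompson(click_probs, n_rounds, seed=0):
    """Simulate rounds against made-up click probabilities; return both counters."""
    rng = random.Random(seed)
    n_ads = len(click_probs)
    n_reward_1 = [0] * n_ads
    n_reward_0 = [0] * n_ads
    for _ in range(n_rounds):
        ad = select_ad(n_reward_1, n_reward_0, rng)
        if rng.random() < click_probs[ad]:
            n_reward_1[ad] += 1
        else:
            n_reward_0[ad] += 1
    return n_reward_1, n_reward_0


ones, zeros = run_thompson([0.1, 0.8], n_rounds=500)
print(ones, zeros)
```

Because selection depends on random draws rather than a fixed bound, the counters can be updated in batches, which is how delayed feedback is usually accommodated.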
-
Random Forest Regression
-
Dimensionality Reduction
-
Gradient Boosting
Random Forest Regression: an ensemble of decision trees (aka a forest) used to classify a new object based on its attributes; each tree gives a classification & we say the tree "votes" for the class
Dimensionality Reduction: identifies the highly significant variables when you have thousands of them
Gradient Boosting: an ensemble of machine learning algorithms
My Neural Networks Projects
Lovely Deep Learning
Artificial Neural Networks
↳ A computing system that consists of a number of simple but highly interconnected elements or nodes, called ‘neurons’, which are organized in layers & process information using dynamic state responses to external inputs; an extremely useful algorithm for finding patterns too complex to be manually extracted
-
Neuron Definition
-
Neuron Formula
-
Sigmoid Activation Function
-
Threshold Function
-
Rectifier Function
-
Hyperbolic Tangent Function (tanh)
Neuron: a mathematical operation that takes its inputs, multiplies each by its weight, sums the results & then passes the sum through an activation function
$Y_1 = \text{activation}(w_1 x_1 + w_2 x_2 + w_3 x_3 + \ldots + w_m x_m)$
[Activation function plots (sigmoid, threshold, rectifier, tanh); x-axis: $\sum w_i x_i$]
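The neuron formula and the four activation functions listed in this section can be sketched as below. The definitions are the standard ones; the inputs and weights are made up for illustration.

```python
import math


def threshold(z):
    """Step function: 1 if z >= 0, otherwise 0."""
    return 1.0 if z >= 0 else 0.0


def sigmoid(z):
    """Logistic function, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def rectifier(z):
    """ReLU: max(0, z)."""
    return max(0.0, z)


def neuron(inputs, weights, activation):
    """Weighted sum of the inputs passed through an activation function."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return activation(z)


# Made-up inputs and weights; the weighted sum is 0.5 - 2.0 + 1.5 = 0.0.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.5]
for name, fn in [("threshold", threshold), ("sigmoid", sigmoid),
                 ("rectifier", rectifier), ("tanh", math.tanh)]:
    print(name, neuron(x, w, fn))
```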
Convolutional Neural Networks
↳ A class of deep neural networks, most commonly applied to analyzing visual imagery. CNNs are regularized versions of multilayer perceptrons. Multilayer perceptrons usually mean fully connected networks, that is, each neuron in one layer is connected to all neurons in the next layer.
Natural Language Processing
↳ Starts with raw text in whatever format available, processes it, extracts relevant features and builds models to accomplish various NLP tasks
-
NLP Pipeline
-
Document-Term Matrix
Compute the dot product (sum of the products of corresponding elements) to find similarities
$a \cdot b = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + a_3 b_3 + \ldots + a_n b_n$
-
Cosine Similarity
Divide the dot product of two vectors by the product of their magnitudes (Euclidean norms)
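As a minimal sketch, cosine similarity between two rows of a document-term matrix can be computed from the dot product and the Euclidean norms. The term-count vectors are made up for illustration.

```python
import math


def dot(a, b):
    """Sum of the products of corresponding elements."""
    return sum(ai * bi for ai, bi in zip(a, b))


def norm(a):
    """Euclidean norm (magnitude) of a vector."""
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    """Dot product divided by the product of the two magnitudes."""
    return dot(a, b) / (norm(a) * norm(b))


# Made-up term counts for three documents over a three-word vocabulary.
doc_a = [1, 2, 3]
doc_b = [2, 4, 6]   # same direction as doc_a
doc_c = [3, 0, -1]  # orthogonal to doc_a
print(cosine_similarity(doc_a, doc_b), cosine_similarity(doc_a, doc_c))
```

Unlike the raw dot product, the result does not grow with document length, only with how similar the term proportions are.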
-
TF-IDF Transform
Term frequency-inverse document frequency
tfidf(t, d, D) = tf(t, d) * idf(t, D)
↳ where:
tf(t, d) =
idf(t, D) =
-
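TF-IDF has several variants. The sketch below assumes one common choice: tf as the term's count divided by the document length, and idf as the natural log of the number of documents over the number containing the term. These definitions, the function names and the tiny corpus are assumptions, not necessarily the exact formulas intended in the notes.

```python
import math


def tf(term, doc):
    """Assumed variant: count of the term divided by the number of tokens in the doc."""
    return doc.count(term) / len(doc)


def idf(term, docs):
    """Assumed variant: log(number of docs / number of docs containing the term)."""
    n_containing = sum(1 for doc in docs if term in doc)
    if n_containing == 0:
        return 0.0  # term absent from the corpus
    return math.log(len(docs) / n_containing)


def tfidf(term, doc, docs):
    """tfidf(t, d, D) = tf(t, d) * idf(t, D)."""
    return tf(term, doc) * idf(term, docs)


# Made-up tokenized corpus.
docs = [["the", "cat", "sat"], ["the", "dog", "barked"], ["the", "cat", "purred"]]
print(tfidf("the", docs[0], docs), tfidf("dog", docs[1], docs))
```

A word that appears in every document, such as "the" here, scores zero, while a word unique to one document scores highest.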
Stemming
Reduces a word to its root by removing conjugation & other affixes, to simplify the text & capture the gist of its meaning (reducing the final dimension of the feature space)
NLP Pipeline: Text Processing ⇒ Feature Extraction ⇒ Modeling
Lemmatization
Refers to doing things properly with the use of a vocabulary and morphological analysis of words, normally aiming to remove inflectional endings only and to return the base or dictionary form of a word, which is known as the lemma.
About Ryan L Buchanan
I am re-skilling as a Data Analyst & Machine Learning Engineer. I
am currently enrolled in a Master's in Data Analytics. I am also
acquiring certifications as an ML Engineer & Algorithmic Trader from Udacity.
I have an MBA & an MS in Instructional Design.
I have a multidisciplinary background including military intelligence,
psychology, linguistics, economics, virtual reality & educational technology. I have
worked abroad for ten years with military, universities & vocational schools.
I have working knowledge of Arabic, Chinese & French. I am very
mobile, able to relocate quickly, adapt easily to diverse working conditions
& have a current passport.
I have a passion for mathematics, statistics & artificial intelligence.
I am enthusiastic, highly self-motivated & enjoy presenting informative data
to decision makers. I am eager to work with dynamic teams to create
high quality products & services.