Hello! Today we will discuss code snippets and implementations of different machine learning algorithms. The snippets are given in Python as well as in R.

Linear Regression Python Code

#Import Library
#Import other necessary libraries like pandas, numpy...
from sklearn import linear_model
#Load Train and Test datasets
#Identify feature and response variable(s); values must be numeric and numpy arrays
x_train = input_variables_values_training_datasets
y_train = target_variables_values_training_datasets
x_test = input_variables_values_test_datasets
# Create linear regression object
linear = linear_model.LinearRegression()
# Train the model using the training sets and check score
linear.fit(x_train, y_train)
linear.score(x_train, y_train)
#Equation coefficient and Intercept
print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)
#Predict Output
predicted = linear.predict(x_test)
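To see the snippet above in action, here is a minimal runnable sketch. The tiny arrays below are made-up toy data standing in for the placeholder dataset names (`input_variables_values_training_datasets` etc.), chosen so the points lie roughly on the line y = 2x.

```python
# Minimal runnable version of the linear regression snippet,
# with toy data in place of the placeholder dataset names.
import numpy as np
from sklearn import linear_model

x_train = np.array([[1], [2], [3], [4]])   # toy feature values
y_train = np.array([2.0, 4.1, 6.0, 8.1])   # roughly y = 2x
x_test = np.array([[5]])

linear = linear_model.LinearRegression()
linear.fit(x_train, y_train)
r2 = linear.score(x_train, y_train)        # R^2 on the training data
predicted = linear.predict(x_test)

print('Coefficient:', linear.coef_)
print('Intercept:', linear.intercept_)
print('Prediction for x=5:', predicted)
```

Since the toy data is almost perfectly linear, the fitted coefficient comes out close to 2 and the prediction for x = 5 close to 10.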

R Code

#Load Train and Test datasets
#Identify feature and response variable(s); values must be numeric
x_train <- input_variables_values_training_datasets
y_train <- target_variables_values_training_datasets
x_test <- input_variables_values_test_datasets
x <- cbind(x_train, y_train)
# Train the model using the training sets and check score
linear <- lm(y_train ~ ., data = x)
summary(linear)
#Predict Output
predicted <- predict(linear, x_test)

Logistic Regression Python Code

#Import Library
from sklearn.linear_model import LogisticRegression
#Assumed you have X (predictor) and y (target) for the training data set and x_test (predictor) of the test dataset
# Create logistic regression object
model = LogisticRegression()
# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
#Equation coefficient and Intercept
print('Coefficient: \n', model.coef_)
print('Intercept: \n', model.intercept_)
#Predict Output
predicted = model.predict(x_test)
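As before, a minimal runnable sketch of the snippet with made-up toy data in place of `X`, `y`, and `x_test`: six one-dimensional points whose labels switch from 0 to 1 partway along the axis, so the two classes are cleanly separable.

```python
# Minimal runnable version of the logistic regression snippet,
# with separable toy data in place of the placeholder variables.
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])       # labels flip around x = 2.5
x_test = np.array([[0.5], [4.5]])

model = LogisticRegression()
model.fit(X, y)
acc = model.score(X, y)                # mean accuracy on training data
predicted = model.predict(x_test)

print('Coefficient:', model.coef_)
print('Intercept:', model.intercept_)
print('Predictions:', predicted)
```

Because the classes are separable, the model classifies all six training points correctly, and the two test points fall on opposite sides of the decision boundary.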

R Code

x <- cbind(x_train, y_train)
# Train the model using the training sets and check score
logistic <- glm(y_train ~ ., data = x, family = 'binomial')
summary(logistic)
#Predict Output
predicted <- predict(logistic, x_test)

Decision Tree Python Code

#Import Library
#Import other necessary libraries like pandas, numpy...
from sklearn import tree
#Assumed you have X (predictor) and y (target) for the training data set and x_test (predictor) of the test dataset
# Create tree object
model = tree.DecisionTreeClassifier(criterion='gini')
# for classification; you can set the criterion to 'gini' or 'entropy' (information gain), by default it is 'gini'
# model = tree.DecisionTreeRegressor() for regression
# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
#Predict Output
predicted = model.predict(x_test)
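And a minimal runnable sketch of the decision tree snippet, again with made-up toy data in place of `X`, `y`, and `x_test`. Here the label simply copies the first feature, so a depth-one tree splitting on that feature fits the training data exactly.

```python
# Minimal runnable version of the decision tree snippet,
# with toy data where the label follows the first feature.
import numpy as np
from sklearn import tree

X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
y = np.array([0, 1, 0, 1])             # label equals the first feature
x_test = np.array([[1, 1], [0, 0]])

model = tree.DecisionTreeClassifier(criterion='gini')
model.fit(X, y)
acc = model.score(X, y)                # an unpruned tree fits training data exactly
predicted = model.predict(x_test)

print('Training accuracy:', acc)
print('Predictions:', predicted)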

…
