Introducing scikit-learn for Machine Learning in Python

In this tutorial, we will go through a few simple steps to classify whether a voltage is a digital 0 or 1. We keep things straightforward by leaving a large margin between the two classes so that the result is easy to visualize. The idea is just to introduce some of the functions in scikit-learn that we can use for linear classification.

The problem is simple: given a set of voltages and their digital logic levels, we want to determine the logic level of new voltages. This is supervised learning: we label the data to form a training set, and then, given new data, we determine which class each value belongs to. We will use a linear classifier based on a Support Vector Machine (SVM).

First, we need a training data set of voltages, e.g.

>>> import numpy as np
>>> data=[0.05,0.1,0.2,0.15,0,0.3,0.4,0.5,0.35,0.45]
>>> data=np.array(data).reshape(len(data),1)

The first line of the above code imports the numpy module so that we can use its array data structure. The second line creates the voltages as a Python list. The third line converts the Python list to a NumPy array and reshapes it from a one-dimensional array into a column vector with one row per voltage, which is the (samples, features) layout that scikit-learn expects.
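
To see what the reshape step does, the following sketch compares the array shapes before and after. The three sample voltages are illustrative only, and reshape(-1, 1) is an equivalent NumPy idiom that lets NumPy infer the number of rows.

```python
import numpy as np

# Illustrative values only, not the tutorial's training set.
sample = [0.1, 0.4, 0.2]

flat = np.array(sample)                 # 1-D array, shape (3,)
column = flat.reshape(len(sample), 1)   # 2-D column, shape (3, 1)
inferred = flat.reshape(-1, 1)          # same result; -1 lets NumPy infer the row count

print(flat.shape)
print(column.shape)
print(np.array_equal(column, inferred))
```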

We can then label the training set. For simplicity, let’s label any voltage at or below 0.2 as digital logic 0, and any voltage at or above 0.3 as logic 1. To do this, we create a target array that matches our data set, with one label per voltage.

>>> target=np.array([0,0,0,0,0,1,1,1,1,1])
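
The labels can also be generated from the rule instead of being typed by hand. This is a minimal sketch, assuming both limits are inclusive; the helper label_voltage and the sample values are hypothetical and only for illustration. Voltages between the two limits are rejected because the rule does not define a label for them.

```python
def label_voltage(v, low=0.2, high=0.3):
    """Return 0 for v <= low, 1 for v >= high; reject values in between."""
    if v <= low:
        return 0
    if v >= high:
        return 1
    raise ValueError(f"{v} V lies between {low} V and {high} V; no label defined")

sample = [0.0, 0.15, 0.2, 0.3, 0.45]        # illustrative voltages
labels = [label_voltage(v) for v in sample]
print(labels)  # [0, 0, 0, 1, 1]
```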

Now we can create our linear classifier using SVM.

>>> from sklearn import svm
>>> clf = svm.SVC(kernel='linear')

The first line imports the svm module, and the second line creates a linear classifier. Next, we fit the classifier to the data and its targets.

>>> clf.fit(data,target)
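
After fitting, the model can be inspected. The sketch below is self-contained and assumes a training set that follows the labeling rule described earlier. The attributes coef_, intercept_, and support_vectors_ are exposed by a fitted SVC (coef_ only with a linear kernel). With a single feature, the decision boundary is the voltage x where w*x + b = 0, and for this data it would be expected to fall in the gap between 0.2 and 0.3.

```python
import numpy as np
from sklearn import svm

# Training set assumed to follow the rule: <= 0.2 is logic 0, >= 0.3 is logic 1.
data = np.array([0.05, 0.1, 0.2, 0.15, 0, 0.3, 0.4, 0.5, 0.35, 0.45]).reshape(-1, 1)
target = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

clf = svm.SVC(kernel='linear')
clf.fit(data, target)

w = clf.coef_[0][0]        # weight of the single feature (voltage)
b = clf.intercept_[0]      # bias term
threshold = -b / w         # voltage where w * x + b == 0

print("classes:", clf.classes_)
print("number of support vectors:", len(clf.support_vectors_))
print("decision boundary (V):", threshold)
```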

And now we can predict new data. Let’s say we want to determine the digital logic for a voltage of 0.21 V. Because predict expects a two-dimensional array with one row per sample, we wrap the value in a nested list:

>>> clf.predict([[0.21]])
array([0])

We can also pass an array of voltages:

>>> newdata=[0,0.5,0.22, 0.1,0.44]
>>> newdata=np.array(newdata).reshape(len(newdata),1)
>>> clf.predict(newdata)
array([0, 1, 0, 0, 1])

As you can see, voltages around 0.2 and below are classified as logic 0, while those around 0.3 and above are classified as logic 1.
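
To see how close a voltage is to the boundary rather than only its class, decision_function can be used. This is a rough sketch, again assuming a training set that follows the labeling rule: a negative score maps to class 0, a positive score maps to class 1, and the magnitude grows with the distance from the boundary.

```python
import numpy as np
from sklearn import svm

# Training set assumed to follow the rule: <= 0.2 is logic 0, >= 0.3 is logic 1.
data = np.array([0.05, 0.1, 0.2, 0.15, 0, 0.3, 0.4, 0.5, 0.35, 0.45]).reshape(-1, 1)
target = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
clf = svm.SVC(kernel='linear').fit(data, target)

newdata = np.array([0, 0.5, 0.22, 0.1, 0.44]).reshape(-1, 1)
scores = clf.decision_function(newdata)   # one signed score per sample
predicted = clf.predict(newdata)

for v, s, p in zip(newdata.ravel(), scores, predicted):
    print(f"{v:.2f} V  score={s:+.3f}  class={p}")
```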
