Introducing scikit-learn for Machine Learning in Python

In this tutorial, we will go through a few simple steps to classify whether a voltage represents a digital 0 or 1. We will keep things straightforward by using a large margin between the two classes so that the result is easy to visualize. The idea is just to introduce some of the functions in scikit-learn that we can use for linear classification.

The problem is simple: given a set of voltages and their digital logic levels, we want to determine the logic level of a new set of voltages. In this example we use supervised learning, where we label the data to form a training set. Then, given new data, we determine which class each value belongs to. We will use a linear classifier from the Support Vector Machine (SVM) family.

First, we need a training set of voltages, e.g.

>>> import numpy as np
>>> data=[0.05,0.1,0.2,0.15,0,0.3,0.4,0.5,0.35,0.45]
>>> data=np.array(data).reshape(len(data),1)

The first line imports the numpy module so that we can use its array data structure. The second line creates the voltages as a Python list. The third line converts the Python list to a NumPy array and reshapes it from a one-dimensional array into a column vector with one sample per row, which is the layout scikit-learn expects.
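
As a small sketch of what the reshape does (the variable names and values here are illustrative, not taken from the code above), the shapes before and after can be inspected directly. `reshape(-1, 1)` is an equivalent shorthand that lets NumPy infer the number of rows:

```python
import numpy as np

# Illustrative voltages; any 1-D list behaves the same way.
voltages = [0.1, 0.2, 0.4]

row = np.array(voltages)                 # 1-D array, shape (3,)
column = row.reshape(len(voltages), 1)   # 2-D array, shape (3, 1): one sample per row
same = row.reshape(-1, 1)                # -1 lets NumPy infer the row count

print(row.shape, column.shape, same.shape)
```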

We can then label the training set. For simplicity, let’s label any voltage at or below 0.2 as digital logic 0, and any voltage at or above 0.3 as logic 1. To do this we create a target array that matches our data set entry by entry.

>>> target=np.array([0,0,0,0,0,1,1,1,1,1])

Now we can create our linear classifier using SVM.

>>> from sklearn import svm
>>> clf = svm.SVC(kernel='linear')

The first line imports the svm module, and the second line creates a linear classifier. Next, we fit the classifier to the data and its targets.

>>> clf.fit(data,target)

And now we can predict new data. Let’s say we want to determine the digital logic for a voltage of 0.21 V. Since predict expects a two-dimensional array (one row per sample), we can type:

>>> clf.predict([[0.21]])
array([0])

Or we can also give an array:

>>> newdata=[0,0.5,0.22, 0.1,0.44]
>>> newdata=np.array(newdata).reshape(len(newdata),1)
>>> clf.predict(newdata)
array([0, 1, 0, 0, 1])

As you can see, voltages around 0.2 and below are classified as logic 0, while those around 0.3 and above are classified as logic 1.
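
To see where the classifier actually draws the line, one option is to inspect the fitted model. The sketch below assumes a training set that follows the labeling rule described earlier (the exact values are illustrative) and reads the `coef_` and `intercept_` attributes, which scikit-learn exposes for a linear kernel. The boundary is the voltage where w*x + b = 0.

```python
import numpy as np
from sklearn import svm

# Illustrative training set: low voltages are logic 0, high voltages are logic 1.
data = np.array([0.05, 0.1, 0.2, 0.15, 0.0, 0.3, 0.4, 0.5, 0.35, 0.45]).reshape(-1, 1)
target = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

clf = svm.SVC(kernel='linear')
clf.fit(data, target)

# For a linear kernel the decision function is w*x + b.
w = clf.coef_[0][0]
b = clf.intercept_[0]
boundary = -b / w   # voltage at which the predicted class changes

print("decision boundary near", boundary, "V")
print(clf.predict(np.array([[0.0], [0.5]])))
```

With well-separated classes like these, the boundary should fall somewhere between the two groups of voltages.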

Fixing PyInstaller on Mac Yosemite

I tried to install PyInstaller using pip:

<code>$ pip install pyinstaller </code>

and this installed PyInstaller version 2.1, as you can check using

<code>$ pyinstaller --version</code>

When I ran PyInstaller on one of my Python scripts, it gave me an error. I ran PyInstaller this way:

<code>$ pyinstaller minesweeper.py</code>

Then I got the following error:

<code> struct.error: argument for 's' must be a string</code>

The error comes from the file

<code>/User/lib/python2.7/site-packages/PyInstaller/loader/pyi_carchive.py</code>

at lines 84 and 367. To solve this, I replaced the following lines:

Line 84

<code> rslt.append(struct.pack(self.ENTRYSTRUCT + repr(nmlen) + 's', nmlen + entrylen, dpos, dlen, ulen, flag, typcd, nm + pad))</code>

with the following:

<code> rslt.append(struct.pack(self.ENTRYSTRUCT + repr(nmlen) + 's', nmlen + entrylen, dpos, dlen, ulen, flag, typcd, nm.encode('utf8') + pad)) </code>

and Line 367

<code>cookie = struct.pack(self._cookie_format, self.MAGIC, totallen, tocpos, self.toclen, pyvers, self._pylib_name)</code>

with the following:

<code>cookie = struct.pack(self._cookie_format, self.MAGIC, totallen, tocpos, self.toclen, pyvers, self._pylib_name.encode('utf8')) </code>

After that, PyInstaller worked.
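
The underlying issue can be reproduced in isolation. As a minimal sketch (independent of PyInstaller's own code), `struct.pack` with the `s` format accepts only a byte string, so text has to be encoded first. The value below is a made-up example:

```python
import struct

name = u"python_lib"   # hypothetical text value, standing in for nm or _pylib_name

# Passing text directly to the 's' format raises struct.error.
try:
    struct.pack("10s", name)
    failed = False
except struct.error:
    failed = True

# Encoding to bytes first gives struct.pack what it expects.
packed = struct.pack("10s", name.encode("utf8"))
print(failed, packed)
```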

Creating 2D/multidimensional list in Python

I found a discussion on how to create a multi-dimensional list in Python:
http://pyfaq.infogami.com/how-do-i-create-a-multidimensional-list/_comments/4jn5
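
The linked discussion is not reproduced here, but a commonly used approach is a nested list comprehension, which builds a separate inner list for every row. The sketch below also shows the pitfall of multiplying a nested list, where every row ends up being the same object:

```python
rows, cols = 3, 4

# Each iteration creates a new inner list, so the rows are independent.
grid = [[0 for _ in range(cols)] for _ in range(rows)]
grid[0][0] = 1

# Pitfall: this repeats a reference to ONE inner list three times.
shared = [[0] * cols] * rows
shared[0][0] = 1

print(grid)    # only the first row changed
print(shared)  # every row appears changed
```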

Move multiple files using python

mvmult.py moves (renames) multiple files at once. Usage:
$ mvmult.py [common] [list] [targetcommon]

e.g. to move field1.dat, field2.dat, … to fieldref1.dat, fieldref2.dat, …, use the following command:
$ mvmult.py field field*.dat fieldref

The command simply replaces the [common] string in each file name in [list] with [targetcommon].
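
The script itself is not shown here, so the following is only a sketch of how such a tool could work, assuming it renames each listed file by replacing the first occurrence of [common] in the file name with [targetcommon]. The function name `move_multiple` is hypothetical.

```python
import os
import shutil
import sys


def move_multiple(common, files, targetcommon):
    """Rename each file, replacing `common` in its base name with `targetcommon`."""
    moved = []
    for path in files:
        folder, name = os.path.split(path)
        if common not in name:
            continue  # skip files that do not contain the common string
        new_path = os.path.join(folder, name.replace(common, targetcommon, 1))
        shutil.move(path, new_path)
        moved.append(new_path)
    return moved


if __name__ == "__main__" and len(sys.argv) >= 4:
    # Usage: mvmult.py [common] [list of files...] [targetcommon]
    for new_path in move_multiple(sys.argv[1], sys.argv[2:-1], sys.argv[-1]):
        print(new_path)
```

Only the base name is rewritten, so a directory whose name happens to contain the common string is left alone.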

Installing python, numpy, and scipy in non-standard directory

1. Download all the source codes.
2. Untar the Python 2.6.5 source code

$ tar xzvf Python.2.6.5.tar.gz

3. Type

$ ./configure --prefix=
$ make
$ make install

4. Add the non-standard directory to the path

$ export PATH=/bin:$PATH

5. Untar numpyXXX.tar.gz
6. Type

$ python setup.py build --fcompiler=gnu95
$ python setup.py install --prefix=

The compiler setting tells the installer to use gfortran; read INSTALL.txt for more detail.
7. When building scipy, I needed to disable UMFPACK; otherwise swig gave me the error “Unable to find umfpack.h”. To do this, type

$ export UMFPACK="None"

8. Then do the same as with numpy:

$ python setup.py build --fcompiler=gnu95
$ python setup.py install --prefix=

To run the tests, we need to install Nose. I tried running test_umfpack.py, and it worked fine. I had to add “import nose” and fix some indentation, but the result of the test was “OK”.
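
Once everything is installed, a quick sanity check is to confirm that the interpreter on the path is the one from the non-standard directory and that numpy and scipy can be imported from it. This is a sketch and not part of the procedure above; the helper name `describe_install` is made up for illustration.

```python
import sys


def describe_install(packages=("numpy", "scipy")):
    """Return the interpreter prefix and the location of each package, if importable."""
    report = {"prefix": sys.prefix}
    for name in packages:
        try:
            module = __import__(name)
            report[name] = getattr(module, "__file__", "built-in")
        except ImportError:
            report[name] = None  # not importable from this interpreter
    return report


for key, value in sorted(describe_install().items()):
    print("%s: %s" % (key, value))
```

If the printed prefix is not the directory passed to the installer, the path setting from step 4 is probably not in effect in the current shell.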

Error in running F2Py

When I tried the sample code at: http://www.scipy.org/F2py

I got the following error:

/tmp/tmpHNH55t/src.linux-x86_64-2.6/fortranobject.h:7:20: error: Python.h: No such file or directory

It turned out I needed to install python-dev on my Ubuntu system to resolve this problem. After I installed python-dev, f2py worked well.
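
A quick way to check for this situation before running f2py is to see whether the Python headers are present. The sketch below uses the standard `sysconfig` module (Python 2.7 and later) to locate the include directory; the function name is hypothetical.

```python
import os
import sysconfig


def python_header_path():
    """Return the expected location of Python.h for the running interpreter."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.join(include_dir, "Python.h")


header = python_header_path()
if os.path.exists(header):
    print("Found " + header)
else:
    print("Missing " + header + " (install the Python development package)")
```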

Writing VTK file using Python

To write a VTK file using Python, download PyVTK. You can read the examples folder to learn how to use it.

For example, to write structured points with scalar data stored in a variable named Coef, we write:


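from numpy import ravel
from pyvtk import VtkData, StructuredPoints, PointData, Scalars
# Fortran order: the first index of Coef varies fastest
vtk=VtkData(StructuredPoints([Nx,Ny,Nz]),\
PointData(Scalars(ravel(Coef,order='F'),'my scalar')))
vtk.tofile('myfile')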

This saves the data to a file named myfile.vtk.
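
If the library is not available, a structured-points file can also be written by hand, since the legacy VTK format is plain text. The following is a rough sketch of that alternative, not the library's implementation; the function name `write_structured_points`, the unit spacing, and the zero origin are assumptions made for illustration. The values must already be flattened with x varying fastest, which is what a Fortran-order ravel produces for an array indexed as [x, y, z].

```python
import os
import tempfile


def write_structured_points(filename, dims, values, name="my_scalar"):
    """Write scalar point data on a regular grid as a legacy ASCII VTK file."""
    nx, ny, nz = dims
    if len(values) != nx * ny * nz:
        raise ValueError("expected %d values, got %d" % (nx * ny * nz, len(values)))
    lines = [
        "# vtk DataFile Version 2.0",
        "scalar data written by hand",
        "ASCII",
        "DATASET STRUCTURED_POINTS",
        "DIMENSIONS %d %d %d" % (nx, ny, nz),
        "ORIGIN 0 0 0",       # assumed origin
        "SPACING 1 1 1",      # assumed unit grid spacing
        "POINT_DATA %d" % (nx * ny * nz),
        "SCALARS %s float 1" % name,   # data names may not contain spaces
        "LOOKUP_TABLE default",
    ]
    lines.extend("%g" % v for v in values)
    with open(filename, "w") as handle:
        handle.write("\n".join(lines) + "\n")


# 2 x 2 x 1 grid, values listed with x varying fastest
path = os.path.join(tempfile.mkdtemp(), "myfile.vtk")
write_structured_points(path, (2, 2, 1), [0.0, 1.0, 2.0, 3.0])
print("wrote " + path)
```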