Today I want to talk purely about coding itself. I hope this post is helpful for anyone who wants to move from being a pure researcher to being a programmer. I have been a researcher rather than a programmer: I just wanted to execute something and see the result I wanted to see. If the run time was too long or my computer did not have enough memory to run the code, I took it as a sign to buy new hardware. Through my experience in industry, I have come to regard writing code the better way as important.


Clean Code


The book Clean Code by Robert C. Martin is good, and if you do not have time to read the whole book, you can find chapter-by-chapter summaries on many blogs.


There is a lot of useful advice in the book, and the statement that inspired me most was that code should be easily readable. I used to overlook this and thought that if a coworker got confused, I could explain the details in person.


The second important statement, which keeps you from misapplying the first lesson, is that you should remove unnecessary comments written only to explain the details. The code should be understandable, as far as possible, through smart naming and the code itself.
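As a small illustration of letting names replace comments, here is a sketch with two hypothetical functions that compute the same thing. In the first, the comments carry the meaning the names fail to express; in the second, the names say it, so the comments can go.

```python
# Before: short names, so comments have to explain what everything means.
def calc(d, r):
    # d: list of prices, r: discount rate
    t = 0
    for x in d:
        t += x * (1 - r)  # apply the discount
    return t


# After: the names carry the meaning, so no comments are needed.
def total_after_discount(prices, discount_rate):
    return sum(price * (1 - discount_rate) for price in prices)


print(calc([100, 50], 0.1), total_after_discount([100, 50], 0.1))
```

Both return the same value; only the second one can be understood without reading its body.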


A by-product of following clean-code rules is that you learn to recognize good code. In the era of open source, finding good resources for your new projects has become a key to learning quickly and effectively. Do not waste your time trying to understand bad code when there is plenty of alternative code out there.


Python and low-level programming ideas


Python is a popular programming language in machine learning, and many machine learning scientists who do not have a computer science background easily overlook the importance of memory management. If you want to broaden your view of machine learning and give yourself more chances to take on other interesting challenges in the IT industry, spend some time understanding why C/C++, old as they are, are still popular, and why people who are good at them earn respect. The same ideas apply to higher-level languages and to how you choose and manage data structures, and they reduce memory usage.
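A quick way to see why memory layout matters even in Python is to store the same integers three ways. This sketch uses only the standard library (`sys.getsizeof` and the `array` module); the element count is an arbitrary choice, and the exact byte counts will differ between Python versions and platforms.

```python
import sys
from array import array

N = 100_000

# A list stores N pointers, and each pointer refers to a separate int object.
# getsizeof(list) covers only the pointers, so the int objects are added on top
# (roughly: small ints are cached and shared, which this sum ignores).
numbers_list = list(range(N))
list_bytes = sys.getsizeof(numbers_list) + sum(sys.getsizeof(n) for n in numbers_list)

# array('q') stores N raw 8-byte C integers side by side, like a C array.
numbers_array = array("q", range(N))
array_bytes = sys.getsizeof(numbers_array)

# A generator keeps only its current state and produces values on demand.
numbers_gen = (n for n in range(N))
gen_bytes = sys.getsizeof(numbers_gen)

print(f"list + int objects: {list_bytes:>12,} bytes")
print(f"array('q')        : {array_bytes:>12,} bytes")
print(f"generator         : {gen_bytes:>12,} bytes")
```

The list pays for a pointer plus a full Python object per number, the array pays only for the raw C integers, and the generator pays almost nothing because it never holds all values at once.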


Make it faster


Python is a great language and easy for novices to learn, but it is also known to be slow. That is no longer always true. If you are an intermediate Python programmer, consider using PyPy, Cython, or similar tools. They can reduce Python's processing time dramatically, in some cases to the level of C/C++: Cython by compiling your code to C, and PyPy by compiling frequently run code to machine code while the program runs.


Trust only half of what I say from here on. I may be wrong, since I have never gone deeply into it. Still, don't be scared to use PyPy or Cython: using them is much easier than understanding how they work.


Before talking about Cython and PyPy, let us look at the difference between a compiler and an interpreter. A compiled language such as C/C++ is compiled before it runs: the compiler turns the source into an executable file of machine code that the CPU can run directly, and from then on you only run that executable. Interpreted languages such as Python and Ruby have no separate compile step before running. CPython does translate source code into bytecode, but it does so when the program starts (or a module is imported), and then its interpreter loop executes that bytecode one instruction at a time during run time. This extra work makes the running time longer. In exchange, interpreted languages offer the advantage of interactive scripting.
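To see the bytecode step for yourself, the standard-library `dis` module can disassemble what CPython compiled. This is a minimal sketch; `add_squares` is just an example function, and the printed instructions vary between Python versions.

```python
import dis


def add_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


# By the time the function object exists, CPython has already compiled its body to bytecode.
print(len(add_squares.__code__.co_code), "bytes of bytecode")

# List the bytecode instructions that the interpreter loop executes one at a time.
dis.dis(add_squares)

# The same compile step can be triggered explicitly on a source string.
code = compile("x = 2 + 3", "<example>", "exec")
namespace = {}
exec(code, namespace)
print(namespace["x"])  # 5
```

Nothing here produces machine code: every run of `add_squares` still goes through the interpreter loop, which is the overhead Cython and PyPy try to remove.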


Cython’s main idea is to translate a Python script into C, compile it, and take advantage of the resulting machine code. PyPy is a bit more complicated. Above all, read its documentation.


PyPy’s process


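  1. Writing a Python interpreter in Python itself (more precisely RPython, a restricted subset of Python)
  2. Translating that RPython interpreter (not your own script) into lower-level code with the RPython toolchain
  3. Compiling the translated code (e.g. C) into an executable
  4. At run time, detecting which loops of your program run often ("hot" loops)
  5. Compiling those hot loops to machine code with the JIT and reusing that code on later iterations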

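Steps 4 and 5 are the confusing part; you can find more detail in the document linked above. The interpreter written in RPython has a main loop inside that keeps dispatching instructions while the program counter is still within the program, and hints placed in that loop tell the JIT where a loop of your program starts so that it can trace and compile it.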
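To make the "interpreter with a loop inside" idea concrete, here is a minimal sketch in plain Python (not RPython, and with no JIT hints) of a dispatch-loop interpreter for a hypothetical four-instruction toy language. The instruction names and the program are made up for illustration; the backward `jnz` jump is the kind of user-level loop a tracing JIT would notice becoming hot.

```python
def run(program):
    """Execute a list of (op, a, b) tuples and return the final registers."""
    registers = {}
    pc = 0  # program counter: index of the next instruction
    # Main dispatch loop: keep going while pc still points inside the program.
    while pc < len(program):
        op, a, b = program[pc]
        if op == "set":        # registers[a] = constant b
            registers[a] = b
        elif op == "add":      # registers[a] += registers[b]
            registers[a] += registers[b]
        elif op == "dec":      # registers[a] -= 1
            registers[a] -= 1
        elif op == "jnz":      # jump to instruction b if registers[a] != 0
            if registers[a] != 0:
                pc = b  # backward jump: a loop in the user's program
                continue
        else:
            raise ValueError(f"unknown instruction: {op!r}")
        pc += 1
    return registers


# Hypothetical toy program: add 5 + 4 + 3 + 2 + 1 into "acc".
SUM_TO_FIVE = [
    ("set", "acc", 0),
    ("set", "i", 5),
    ("add", "acc", "i"),  # index 2: start of the loop body
    ("dec", "i", None),
    ("jnz", "i", 2),      # jump back to index 2 while i != 0
]

print(run(SUM_TO_FIVE)["acc"])  # 15
```

In PyPy's case the equivalent loop lives inside the Python interpreter itself, and the user-level loops are the `for` and `while` loops in your script.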


JIT compiler


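A JIT (just-in-time) compiler watches the program while it runs, detects which parts of the code are executed most often, and compiles those parts to machine code on the fly, so that later executions of them skip the interpreter. This is why a JIT suits interpreted languages: it keeps their run-it-directly workflow while recovering much of the speed of compiled code.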
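The bookkeeping a JIT does (count executions, compile once something is hot, reuse the compiled version) can be sketched in a few lines. This `ToyJIT` class is hypothetical and only mimics that idea: it "compiles" expression strings to CPython bytecode with the built-in `compile()`, not to machine code, and it is not how PyPy works internally.

```python
class ToyJIT:
    """Hypothetical illustration of JIT bookkeeping, not PyPy's implementation."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # executions before a snippet counts as "hot"
        self.counts = {}            # source string -> number of slow-path executions
        self.cache = {}             # source string -> compiled code object

    def evaluate(self, source, variables):
        # Fast path: the snippet is hot and already compiled, so reuse it.
        if source in self.cache:
            return eval(self.cache[source], {}, variables)
        # Slow path: eval() on a string re-parses and re-compiles it every call.
        self.counts[source] = self.counts.get(source, 0) + 1
        if self.counts[source] >= self.threshold:
            # Hot: compile once and keep the result for later calls.
            self.cache[source] = compile(source, "<hot>", "eval")
        return eval(source, {}, variables)


jit = ToyJIT(threshold=3)
for x in range(5):
    result = jit.evaluate("x * x + 1", {"x": x})
print(result)                    # 17
print("x * x + 1" in jit.cache)  # True
```

The threshold is the design knob: compiling too early wastes effort on code that runs once, and compiling too late leaves the slow path in use for longer.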


Running PyPy


If you are a Mac user, you can install it from the command line with Homebrew:

$ brew install pypy

and then run your script with PyPy:

$ pypy YOUR_PYTHON_SCRIPT_FILE.py

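PyPy then runs the script directly without producing a compiled file: its JIT compiles hot code while the program runs, and that machine code is thrown away when the process exits. So when you run the same script repeatedly, every run pays the warm-up cost again, which is slower than compiling once and then only executing. (The above link explains how to separate the two steps by translating RPython code ahead of time into an executable; for that you need to mark the right loop in your RPython code for the JIT.)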
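To compare the two interpreters on your own machine, a loop-heavy script like the sketch below is a fair test, since pure-Python loops are exactly where a JIT helps. The file name `bench.py`, the prime-counting workload, and the limit are arbitrary choices for illustration; no speedup figure is implied.

```python
import time


def count_primes(limit):
    """Count primes below limit with plain Python loops (deliberately interpreter-heavy)."""
    count = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


if __name__ == "__main__":
    start = time.perf_counter()
    result = count_primes(100_000)
    elapsed = time.perf_counter() - start
    print(f"{result} primes, {elapsed:.2f} s")
```

Save it as bench.py and run it once with `python bench.py` and once with `pypy bench.py`; the output number should match, and only the elapsed time should differ.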


Restrictions


PyPy and Cython are not a universal tool for every kind of Python script. Some Python libraries, especially C extension modules written against CPython's C API, may not work, or may run slower, under PyPy. Check out the link.
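Before switching a project to PyPy, it can help to check which interpreter is running and whether your dependencies can even be found. This sketch uses only `platform.python_implementation()` and `importlib.util.find_spec()` from the standard library; the module names in the loop are examples to replace with your own. Note that "found" only means the module is installed, not that it works correctly or quickly under PyPy.

```python
import importlib.util
import platform


def implementation_name():
    """'CPython' under the standard interpreter, 'PyPy' under PyPy."""
    return platform.python_implementation()


def is_available(module_name):
    """True if the running interpreter can find module_name, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("interpreter:", implementation_name())
    for name in ["json", "numpy"]:  # example names; list your own dependencies here
        print(f"{name}: {'found' if is_available(name) else 'missing'}")
```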


Running Cython


This case is more straightforward than PyPy’s. Consider a simple .pyx file that defines a function with nested loops:

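# target.pyx
def f(int n=1000):
    cdef long long total = 0  # C-typed variables let Cython emit plain C loops
    cdef int i, j
    for i in range(n):
        for j in range(n):
            total += i * j
    return total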

And then, make a setup.py file

# setup.py

from setuptools import setup  # distutils was removed in Python 3.12
from Cython.Build import cythonize

setup(
  name = 'target app',  # package name; the module name comes from the .pyx file
  ext_modules = cythonize("target.pyx"),  # the target file to be compiled
)

Now

$ python setup.py build_ext --inplace

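You will see a target.so file in the directory (the exact name may include a platform tag, such as target.cpython-311-darwin.so, and on Windows it is a .pyd file). This is a shared object: it is not built into an executable but is dynamically linked, i.e. loaded into the Python process at run time when you import it.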


Now import the compiled module to execute the function f defined in target.pyx.

# ctest.py
import target  # imports the compiled extension, not target.pyx

if __name__ == '__main__':
    print(target.f())  # calls the compiled function

Run it from the command line. Try it after removing target.pyx: ctest.py imports the compiled .so file, not the source, so it still runs.

$ rm -f target.pyx

$ python ctest.py
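To measure the speedup rather than take it on faith, you can time a pure-Python twin of f against the compiled one. This is a sketch under the assumption that target.so has been built as above; the file name compare.py and the function f_py are hypothetical, and the script falls back to timing only the pure-Python version if the extension cannot be imported.

```python
import timeit


def f_py(n=1000):
    """Pure-Python twin of f in target.pyx, used as the baseline."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += i * j
    return total


try:
    import target  # the compiled extension, if it has been built
except ImportError:
    target = None

if __name__ == "__main__":
    print("python:", timeit.timeit(f_py, number=3), "s")
    if target is not None:
        assert target.f() == f_py()  # same answer from both versions
        print("cython:", timeit.timeit(target.f, number=3), "s")
```

Checking that both versions return the same value first guards against a fast but wrong compiled function, for example one whose C integer type overflowed.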

For loop-heavy code, Cython and PyPy can cut run time by tens to hundreds of times compared with plain Python, and PyPy is even faster than C in some areas.