Python

Table of Contents

Pseudocode which runs.

The best program to do a job is one which already ships the solution.

Aphorism 13 in the Zen of Python by Tim Peters:

There should be one – and preferably only one – obvious way to do it.

Python is nice, sure. But only until it stats warping your mind in the very late of the game. See https://www.draketo.de/proj/py2guile/ for an insightful reference.

When you start thinking about using code-templates in your editor to comply with the requirements of your language, then it is likely that something is wrong with the language.

1 Random

export PYTHONPATH="$PYTHONPATH:/home/hebi/github/reading/models"

Add some path so that I can import from there:

sys.path.append('/home/hebi/github/reading/InferSent/')
# assume in root of that directory, models.py defines InferSent class
from models import InferSent

Packaging:

setup.py:

from setuptools import setup, find_packages
setup(
    name="InferSent-Mirror",
    version="0.1",
    # packages=find_packages(),
    packages=['p1', 'p2'],
)

Directory structure:

mypackage/
  p1/
    __init__.py
    xxx.py
  p2/
    __init__.py
    yyy.py

Install locally:

python3 setup.py install --user

Install from git repo:

pip install --user git+https://github.com/lihebi/InferSent

Import:

from p1 import xxx
from p2.yyy import foo

1.1 Meta Programming

Basically eval (return value) and exec (no return value), with either string or code object created by compile. They can use the names bound by current namespace.

eval("1+2")
a=2
eval("1+a")
def foo(a):
    return a+3
eval("foo(a)")
# no return
exec("foo(a)")
eval(compile("1+a", '', 'eval'))

1.2 Function def and call

The default value of an argument is evaluated once at the function definition. Thus, the object is shared for all the invoke of the function. This is typically not desired behavior.

def foo(a=[]):
    a.append(3)
    return a
foo()
foo()
# => [3,3] !!!

Python function pass-by-object. If you pass a list, you can modify the list, and the original list is modified.

a = [1,2]
def foo(x):
    x.append(3)
foo(a)
a # => [1,2,3]

1.3 List object model

Lists are mutable. The behavior of slicing is a bit confusing. If the slicing is used directly as the target of an assignment statement, it will modify the object in place. E.g.

a = [1,2,3,4]
a[1:3] = []
a # => [1,4]

That also means all other references to a will be modified:

a = [1,2,3,4]
a[1:3] = []
# although tuple is immutable, it can still contain reference to
# mutable objects.
c=(a,)
# this will also modify a
a.append(5)
c # => ([1,4,5])

However, if the slicing is assigned to another variable (either assignment or pass-by-object function call), it is copied. Modifying this copy will not affect the original list.

a = [1,2,3,4]
b = a[1:3]
b[0] = 9
a # => [1,2,3,4]
def foo(x):
    x[1] = 8

# changing b
foo(b)
b # => [9,8]
a # => [1,2,3,4]

If you convert a list to a tuple, the elements are shallow-copied.

a = [1,2,3]
b = [a]
# this is shallow copied. Still contains reference to the object "a"
c = tuple(b)
# no reference anymore, just a tuple of (1,2,3). Will never change
# whatsoever.
d = tuple(a)

# testing:
a[2] = 8
b # => [[1,2,8]]
c # => [[1,2,8]]
d # => [1,2,3]

String is immutable sequence, thus cannot be assigned. Thus it is fairly safe to use string.

2 Language

The ultimate reference:

For python 3

atom is the most basic expression, while the enclosure is interesting.

atom      ::=  identifier | literal | enclosure
enclosure ::=  parenth_form | list_display | dict_display | set_display
               | generator_expression | yield_atom

The display is a special syntax for python to create lists and dicts, using comprehension.

list_display ::=  "[" [starred_list | comprehension] "]"
set_display ::=  "{" (starred_list | comprehension) "}"

dict_display       ::=  "{" [key_datum_list | dict_comprehension] "}"
key_datum_list     ::=  key_datum ("," key_datum)* [","]
key_datum          ::=  expression ":" expression | "**" or_expr
dict_comprehension ::=  expression ":" expression comp_for

And the comprehension grammar:

comprehension ::=  expression comp_for
comp_for      ::=  [ASYNC] "for" target_list "in" or_test [comp_iter]
comp_iter     ::=  comp_for | comp_if
comp_if       ::=  "if" expression_nocond [comp_iter]

The comp_for non-terminal contains one or more for clause, and zero or more if clause.

The evaluation of these displays will evaluate from left to right, so the last assignment will prevail in the case of set and dict. The comprehension is executed by nesting the for and if clauses from left to right.

So some examples:

((a,b) for a in range(2) for b in range(3))
[(a,b) for a in range(2) for b in range(3)]
{(a,b) for a in range(2) for b in range(3)}

{a:b for a in (1,2) for b in (3,4)}

Note the out-most braces are required, which also creates a scope.

3 Emacs support

Install the elpy package. It provides:

  • C-c C-c runs the shell and send the current buffer
  • C-c C-d runs elpy-doc
  • C-c C-t runs elpy-test, which runs the unittest discover

To enable linter python in emacs, use pylint. It will use pylint executable. And it also needs the configure file. Generate it:

pylint --generate-rcfile > ~/.pylintrc

4 Unit Test

class MyTest(unittest.TestCase):
    def test_me(self):
        self.assertEqual(1,2)
unittest.main()

python unit test can support automatic test discovery. To use that, the file must be named test_xxx.py, and run the python -m unittest discover.

5 Concept

5.1 Scoping

There're four levels:

  • current scope
  • parent scope
  • module scope (global)
  • built-in scope

nonlocal keyword specify this variable should be referenced to the parent scope. But, this will not reach global. Instead, the global keyword declares the listed variables to be in the module level scope.

The nonlocal statement causes the listed identifiers to refer to previously bound variables in the nearest enclosing scope excluding globals.

As an example:

var = 0 # global

def outer():
  var = 1 # parent
  def inner():
    nonlocal var
    var = 2 # local
    global var
    var =3
  inner()
  # var = 2

outer()
# global var = 3

5.2 Collection

5.2.1 String

5.2.1.1 Concatenation
  • concatenate two strings directly by +.
  • need to convert integer to string before concatenate: s + str(35)
  • "".join(lst) works
5.2.1.2 split
str.split(sep=None)
default by white space
str.strip()
strip out white space at both begin and end
str.replace(old, new)
replace all.
str.startswith(s)
str.endswith(s)
5.2.1.3 Slicing

String is an immutable object. It can use slicing. E.g. reversing a string is as easy as "hello"[::-1]!

However, notice that when using a negative step, the slicing should be lst[end:begin:-1]. This is because x = i + n*k:

with a third “step” parameter: a[i:j:k] selects all items of a with index x where x = i + n*k, n >= 0 and i <= x < j.

Also, the negative step does not always work as expect. E.g. the i index is included and j is not; the j can not be negative, then how can I include the first one in the list??

Thus if want to get a reverse of a sub-string, I would get sub-string first and then reverse it.

5.2.2 TODO tuple

5.2.3 List

5.2.3.1 Slicing

The slicing syntax is l[start:end:step]. The slicing will return a new list. Change to that list will not change the original one.

l[4]
l[4:]
l[::2]
l[:-1]

However, assign to the slicing itself will change the original one:

l[1:2] = [4,5,6]

Also, assign to a new variable only assign the reference:

a = [1,2,3]
b = a # only a reference
5.2.3.2 create a list
  • range(stop)
  • range(start, stop[, step])

Creating a matrix:

newmat=[[-1 for x in range(height)] for y in range(width)]
5.2.3.3 Modify a list
  • list.append
  • list.pop

5.2.4 Dictionary

Create:

x = {'a': 1, 'b': 2}

Dictionary is not sorted. Use collections.OrderedDict if you want this feature. Basically it remember the order when the elements are inserted.

import collections
od = collections.OrderedDict(sorted(d.items()))

Merge two dictionary (x and y):

z = x.copy()
z.update(y)

5.2.5 Set

s = set()
s.add(x)
if x in s:
  pass

5.3 Algorithm

5.3.1 TODO sort

sort a dictionary by value:

sorted(dict1, key=dict1.get) # => list
sorted(dict1, key=dict1.get, reverse=True)

5.4 Function

5.4.1 variadic parameter

use *args syntax, and args will be a tuple:

  def foo(*args):
    for a in args:
      print a

use **args to capture all keyword arguments.

def bar(**kwargs):
  for a in kwargs:
    print a, kwargs[a]

Combine them together:

def foobar(kind, *args, **kwargs):
  pass

Also, there's a concept for the reverse thing: unpack argument list from a list, with *list:

def foo(a,b):
  pass

l = [1,2]
foo(*l)

on python3, this syntax can appear on left side

first, *rest = [1,2,3,4]
first,*l,last = [1,2,3,4]

5.5 Exception

To give a quick feel:

try:
  pass
except TypeError as e: # capture the exception into a variable
  pass
except AnotherError: # does not capture
  pass
except: # all exception
  pass
else: # if doesn't raise an exception
  pass
finally:
  pass

5.6 Lambda

lambda x : x+2
lambda x: x%2==0

The usage of lambda is often in map and filter.

  • map(lambda_exp, mylist) will execute the lambda expression on each element of the list, and return a list containing the results.

5.7 Packaging

Exposing API: the following only expose foo but not bar.

__all__ = ['foo']
def foo():
  pass
def bar():
  pass

5.7.1 importing

The local structure directory must contain the __init__.py file to be able to import.

|-- main.py
|-- mypackage
    |-- __init__.py
    |-- a.py
    |-- b.py
    |-- subdir
        |-- __init__.py
        |-- c.py

The import statements should be:

from mypackage import a
from mypackage.b import foo as myfoo
from mypackage.subdir import c

5.8 Thread

from threading import Thread

class MyThread(Thread):
  def __init__(self, arg):
    Thread.__init__(self)
    self.arg = arg
  def run(self):
    pass

t = MyThread(arg)
t.start()

6 Type

6.1 Boolean

  • not True

6.2 Integer

  • i += 1

6.3 conversion

  • string to integer: int('45')
  • integer to string: str(45)
  • ASCII to char: chr(100) returns 'd'
  • char to ASCII: ord('d') returns 100

7 Black Tech

If else or:

var = d.get('key') or 0
# is equal to:
var = d.get('key') if d.get('key') else 0

list comprehension

even_squares = [x**2 for x in l if x%2 == 0]

8 Pep8

Indent:

  • function and class should be separated by 2 lines
  • In a class, function should be separated by 1 line
  • 1 space before and after variable assignment

Naming

  • function, variable, attribute: func_var_attr
  • protected instance attributes: _protected_field
  • private instance attributes: __private_field
  • class and exception: ClassExceptionName
  • module level constants: CONSTANT
  • instance method of class should use self as first parameter, refer to the object
  • class method should use cls as first parameter, refer to the class

Expression

use DONT use
a is not b not a is b
if not list if len(list) == 0

Import

  • always use absolute path
  • if must use relative, use from . import foo instead of import foo

8.1 document

One can use one line or multi-line document. The doc string can be retrieved by func.__doc__.

def func():
  """one line doc"""

def func():
  """The outline

  The above empty line is required.
  Here's the detailed documentation.
  """

9 IO

print('xxx', end='')
  f = open('text.txt')
  f.read() # return all content

  f = open('text.txt')
  for line in f:
      print(line)

  with open('a.txt') as f:
      for line in f:
          print(line)

read from stdin:

for line in sys.stdin:
  print(line)

get command line argument: sys.argv

10 Operating System

10.1 Work filesystem:

import os
for root,dirs,files in os.walk('.'):
  for f in files:
    print f
  • os.path.abspath('relative/path/to/file')
  • os.path.exists("/path/to/file")
  • os.rename('old', 'new')
  • os.path.isfile

10.2 Shell command

os.system
simply run command
os.system("some command")
os.popen
access to input output
stream = os.popen("some command")
stream.read()
  • subprocess.Popen
p = subprocess.Popen("echo Hello World", shell=True, stdout=subprocess.PIPE)
p.stdout.read()
s = subprocess.check_output('wc -l', stdin=p.stdout)
subprocess.call
this is the same as subprocess.Popen except that it waits and gives return code.
return_code = subprocess.call("echo Hello World", shell=True, stdout=subprocess.DEVNULL)

11 Standard Library

11.1 Built-in exceptions

BaseException
 +-- SystemExit
 +-- KeyboardInterrupt
 +-- GeneratorExit
 +-- Exception
      +-- StopIteration
      +-- StandardError
      |    +-- BufferError
      |    +-- ArithmeticError
      |    |    +-- FloatingPointError
      |    |    +-- OverflowError
      |    |    +-- ZeroDivisionError
      |    +-- AssertionError
      |    +-- AttributeError
      |    +-- EnvironmentError
      |    |    +-- IOError
      |    |    +-- OSError
      |    |         +-- WindowsError (Windows)
      |    |         +-- VMSError (VMS)
      |    +-- EOFError
      |    +-- ImportError
      |    +-- LookupError
      |    |    +-- IndexError
      |    |    +-- KeyError
      |    +-- MemoryError
      |    +-- NameError
      |    |    +-- UnboundLocalError
      |    +-- ReferenceError
      |    +-- RuntimeError
      |    |    +-- NotImplementedError
      |    +-- SyntaxError
      |    |    +-- IndentationError
      |    |         +-- TabError
      |    +-- SystemError
      |    +-- TypeError
      |    +-- ValueError
      |         +-- UnicodeError
      |              +-- UnicodeDecodeError
      |              +-- UnicodeEncodeError
      |              +-- UnicodeTranslateError
      +-- Warning
           +-- DeprecationWarning
           +-- PendingDeprecationWarning
           +-- RuntimeWarning
           +-- SyntaxWarning
           +-- UserWarning
           +-- FutureWarning
	   +-- ImportWarning
	   +-- UnicodeWarning
	   +-- BytesWarning

11.2 Built-in

These functions are always available.

Numbers:

  • abs(x): absolute value
  • divmod(a,b): a pair (a // b, a % b)
  • max(arg1, arg2, *args)
  • min(arg1, arg2, *args)
  • pow(x,y): xy
  • round(x, ndigits=0)
  • sum(iterable)

Convertion

  • int(x)
  • float(x)
  • long(x)
  • chr(x): ASCII to char
  • ord(c): char to ASCII
  • bool(x): convert x to bool
  • hex(x): convert integer to lowercase hex string prefix with '0x'
  • oct(x): integer to octal string
  • bin(x): an integer to binary string

Boolean:

  • all(iterable): true if all items are true. empty => True
  • any(iterable): true if any item is true. empty => False
  • cmp(x,y)
    • x<y => negative
    • x=y => 0
    • x>y => positive

Symbol Table

  • locals()
  • globals()
  • dir()

Creation

  • dict
  • list
  • set
  • tuple

Other

  • len(s): length
  • next(iterator)
  • print(*objects, sep='', end='\n', file=sys.stdout)
  • range(stop): [0,stop)
  • range(start, stop, step=1)
  • sorted(iterable, cmp, key, reverse=False)
    • key=lambda x: x[1]
  • type(obj): get the type of obj
  • open(name, mode): return an object of file type.
    • r,w,a,b; + for read and write

11.3 Printing

  • pprint.pprint(object, stream=None): pretty print
  • 'string {0}, {hello}'.format('yes', hello=2)

11.4 File System

11.4.1 os.path

If parameter is not listed, it means a single path.

  • exists: GOOD. check whether a path exists
  • split: return a pair (head, tail). tail is the last component, without slash. If path ends with slash, tail is empty
    • basename: the tail of the split output
    • dirname: head of split output
  • normpath: collapse redundant separators and up level references
  • abspath: from relative to absolute path. normpath(join(os.getcwd(), path))
  • commonprefix(list): return the longest path prefix
  • expanduser: replace the initial component of ~ by the users directory.
  • getsize: in bytes
  • isabs: predicate for absolute
  • isfile:
  • isdir
  • islink
  • join(path, *paths): join intelligently
  • realpath: canonical path by following symbolic links

11.4.2 pathlib

Object-oriented filesystem paths. https://docs.python.org/3/library/pathlib.html

pathlib.Path is the class. pathlib.PosixPath is a subclass for non-windows paths, but seems just for implementation purpose, makes no contribution for user.

Actually not very interesting, this table tells everything:

os and os.path pathlib
os.path.abspath() Path.resolve()
os.chmod() Path.chmod()
os.mkdir() Path.mkdir()
os.rename() Path.rename()
os.replace() Path.replace()
os.rmdir() Path.rmdir()
os.remove() , os.unlink() Path.unlink()
os.getcwd() Path.cwd()
os.path.exists() Path.exists()
os.path.expanduser() Path.expanduser() and Path.home()
os.path.isdir() Path.isdir()
os.path.isfile() Path.isfile()
os.path.islink() Path.issymlink()
os.stat() Path.stat(), Path.owner(), Path.group()
os.path.isabs() PurePath.isabsolute()
os.path.join() PurePath.joinpath()
os.path.basename() PurePath.name
os.path.dirname() PurePath.parent
os.path.samefile() Path.samefile()
os.path.splitext() PurePath.suffix

Some interesting APIs that don't have counterparts:

  • Path.glob(pattern) that returns a list of all files matching the shell pattern, e.g. p.glob('*/*.py')
  • slash operator: you can directly use p / 'foo' / 'bar'
  • Path.iterdir() gives a list of directory items
  • Path.parts gives a list of string

11.4.3 TODO tempfile

11.5 os

11.5.1 Env

  • os.environ['HOME']
  • os.getenv(name)
  • os.putenv(name, value)
  • os.unsetenv(name)

11.5.2 Filesystem

  • os.getcwd(): current working directory
  • os.chdir(path): change cwd
  • os.mkdir(path)
  • os.listdir(path='.'): list all in this dir. E.g. for item in os.listdir('/path'): print (item)
  • os.makedirs(path): GOOD this is the way to go the make directories
  • os.remove(path): remove a file
  • os.rmdir(): remove an empty dir.
  • os.removedirs(path): foo/bar/aaa will try to remove aaa, than bar, then foo. Don't use! To recursively remove all contents, use shutil.rmtree
  • os.rename(src, dst)
  • os.renames(old, new)
  • os.rmdir(path): only work if dir is empty
  • os.tempnam(): a reasonable absolute name for creating temporary file
    • seems to be vulnerable
  • os.walk(top, topdown=True): for each directory including top itself, it yields 3-tuple (dirpath, dirnames, filenames). E.g. for root,dirs,files in os.walk('/path'): for f in files: print (f);

11.5.3 shutil

  • copy(src,dst)
  • copytree(src, dst): recursive
  • rmtree(path): rm -r
  • move(src, dst)

popen family is deprecated. Use subprocess.

11.5.4 Process

  • os.abort()
  • os.execl(path, arg0, arg1, …)
  • os.execle(path, arg0, arg1, …, env)
  • os.execlp(file, arg0, arg1, …)
  • os.execlpe(file, arg0, arg1, …, env)
  • os.execv(path, args)
  • os.execve(path, args, env)
  • os.execvp(file, args)
  • os.execvpe(file, args, env)
  • os.folk
  • os.wait()
  • os.system(cmd): run cmd, return exit code
  • os.times(): 5-tuple
    • user time
    • system time
    • childrens user time
    • childrens system time
    • elapsed real time

11.6 io

  • f = open('file.txt')
  • f = io.StringIO("some string"): in memory text stream
  • f = open('file', 'rb')
  • f = io.BytesIO(b"some binary data \x00\x01")
  • support with statement: with open('file.txt') as file:

11.6.1 IOBase

Methods:

  • close()
  • flush()
  • readline(): return one line
  • readlines(): return a list of lines
  • seek(offset=0)
    • 0 start
    • 1 current
    • 2 end
  • tell(): current position
  • writelines(lines): write a list of lines

11.6.2 RawIOBase : IOBase (should not use directly)

  • read()
  • readall()
  • readinto(b)
  • write(b)

11.6.3 BufferedIOBase

  • read(): read all
  • write(b)

11.6.4 FileIO : RawIOBase

11.6.5 BytesIO : BufferedIOBase

11.6.6 BufferedReader(raw)

  • peek()
  • read()

11.6.7 BufferedWriter(raw)

  • flush()
  • write()

11.6.8 TextIOBase : IOBase

  • read()
  • readline(size=1)
  • seek(offset=0)
  • tell()
  • write(s): finally the string!

11.6.9 TextIOWrapper(buffer) : TextIOBase

11.6.10 StringIO

  • getvalue()

11.7 time

  • time.sleep(secs)
  • time.time(): time in seconds since epoch
  • strptime(string[, format]): parse a string into time object
    • format default: "%a %b %d %H:%M:%S %Y"
    • time.strptime("30 Nov 00", "%d %b %y")
  • strftime(format[, t]): convert from time object to string
    • %a/A: abbr/full weekday name
    • %b/B: abbr/full month name
    • %Y: year
    • %m: month [01,12]
    • %d: day of the month [01,31]
    • %H: 24-hour [00,23]
    • %I: 12-hour [01,12]
    • %p: AM or PM
    • %M: Minute [00,59]
    • %S: second [00,61]
  • gmtime(): in seconds, from epoch
  • localtime(): convert gmtime() to local
  • clock(): processor time as floating number in seconds

class time.structtime: returned by gmtime(), localtime() and strptime()

11.8 argparse

import argparse
parser = argparse.ArgumentParser(descripton='Description here')

parser.add_argument('-q', '--query', help='query github api', require=True)
parser.add_argument('-d', '--download', help='do download', action='store_true')

args = parser.parse_args()

The most interesting method is of course the add_argument. It accepts the name, either a single string, bar, indicating positional argument, or a string starting with -, indicating optional arguments. You can supply parser.add_argument(-f, --foo) for short and full argument. The value is stored as an attribute with the same name (i.e. bar, foo) of the result, but you can change it to anther name via dest argument.

An action defines what to do with the argument. It is a string (!!!). The default is 'store', meaning store the supplied value to the result. If you don't need the value, but just want to know if the option is supplied, use store_true or store_false, which differ only in default value. The action append will collect each occurrence of the argument into a list.

By default, each option consume one argument. You can change this by the argument nargs. If it is an integer, it means how many should be consumed. The result will be a list, thus in case of 1, it is still different from default. It can be a string '*', '+', '?', which conforms to the regular expression meaning of them. * and + produce a list, + will get give error when no arguments are provided, ? will use default if missing.

In case of missing value, the default argument can be used to supply the default value. Otherwise, it is none. You can also use required argument to make sure user supplies something. A value is by default a string, you can convert it to anther data type by the type option, accepting a data type, e.g. int. You might also want to restrict the choices of the argument, so choices is a list of allowed values.

Finally, help option can be used to provide help string, and it can be printed out using parser.print_help(). To test the parser, you can use parser.parse_args(['-f', '1', 'bar']).

11.9 contextlib

from contextlib import redirect_stdout
with open('xxx.txt', 'w') as f:
    redirect_stdout(f)

Or:

sys.stdout = f

The file handle can be:

f = open(os.devnull, 'w')

It can also be a predefined handle, like sys.stderr:

with redirect_stdout(sys.stderr):
    help(dir)

11.10 Regular Expression

construction

import re
pattern = re.compile('\d+.*$')

match

s = 'this is a test string'
pattern.match(s) # return True or False

search

pattern.findall(s)

shorthand

m = re.match("[pattern]", "string")
m.group()
m = re.search("[pattern]", "string")
m.group()
re.search("pattern", "string", re.IGNORECASE)
m = re.findall("[pattern]", "string")

11.11 Concurrent

11.11.1 threading

The package name is threading, the object is Thread.

Functions

  • threading.activecount(): number of Thread object
  • threading.currentthread(): current Thread object
  • threading.enumerate(): return a list of all Thread objects
  • threading.meain(): the main Thread object
  • threading.local(): the instance of local storage. Different for different threads. Typical usage: mydata = threading.local()

Two ways to specify what to run:

  • pass a callable object to the target argument when constructing Thread
  • define a subclass of Thread and override the run method.

Methods:

  • start: start the thread. It will call run method in a separate thread. The thread terminate when run terminate
  • join(timeout=None): the calling thread will block until this thread terminate
    • timeout should be float in seconds
  • is_alive: test whether the thread terminate

11.11.2 Thread Sync

class threading.Lock

  • acquire()
  • release()

class threading.RLock

  • this is recursive lock. The same thread can acquire the lock multiple times. They will be nested and only when the last release is called, the lock can be acquired by another thead
  • acquire()
  • release()

class threading.Condition(lock=None)

  • the lock must be a Lock or RLock. If none, a RLock is created
  • acquire()
  • release()
  • wait(timeout=None): wait until notified
    • release underlying lock
    • block until notify
    • re-acquire the lock and return
    • typical usage: while not item_is_available(): cv.wait()
    • often use with statement: =with cv: cv.waitfor(pred); get();
  • waitfor(predicate, timeout=None)
    • this is same as while not predicate(): cv.wait(), thus more convenient than wait
  • notify(n=1): notify one thread
  • notifyall(): notify all threads waiting on this condition

class threading.Semaphore: this class manage resources with limited capacity.

  • acquire(): decrease capacity
  • release(): increase capacity

class threading.Event

  • isset():
  • set(): set flag to true
  • clear(): set flag to false
  • wait(timeout=None): block until internal flag is true

class threading.Timer(interval, function) : Thread

  • interval is float in seconds, function is callable. use start method to start the thread, and the function will be called after the delay.
  • cancel(): stop the timer and cancel the execution. Only work if the the timer is still waiting.

class threading.Barrier(parties, action=None, timeout=None)

  • parties is integer. Every thread calling wait will block, until parties number of such call is called. Then all players unblock and do things simultaneously.
  • wait(timeout=None)
  • reset(): reset the barrier. The thread waiting for it will receive BrokenBarrierError
  • abort(): all current and future wait call for it will get BrokenBarrierError
  • parties: number of parties
  • nwaiting: number of current waiting
  • broken: True or False
11.11.2.1 Using with statement

Lock, RLock, Condition, Semaphore can be used.

with somelock:
  # do somthing

is equivalent to:

somelock.acquire()
try:
  # do something
finally:
  somelock.release()

11.11.3 multiprocessing

This provide multiprocessing.Process class, having similar API with Thread. It seems to use fork but don't have explicit exec on the document?? Wired and seems just do something thread can do (except the sharing of memory of course).

11.11.4 Process (subprocess module)

  • subprocess.run(args, *, stdin=None, input=None, stdout=None, stderr=None, shell=False, timeout=None, check=False)
    • run the command and wait for it to complete. Return a CompleteProcess instance.
    • if check is True, raise CalledProcessError exception if return code non-zero. This replace the checkcall and checkoutput.

class subprocess.CompletedProcess

  • args
  • returncode
  • stdout: captured if PIPE is passed to stdout
  • stderr: captured if PIPE is passed to stderr
  • checkreturncode(): if returncode is non-zero, raise CalledProcessError

Variables:

  • subprocess.DEVNULL
  • subprocess.PIPE
  • subprocess.STDOUT: this is only used in the place of stderr to redirect it to stdout

class subprocess.CalledProcessError

  • returncode
  • cmd
  • output: same as stdout
  • stdout
  • stderr

The followings are from 2.7, now only use run.

  • subprocess.call(args, *, stdin=None, stdout=None, stderr=None, shell=False)
    • args: a list of argument, including arg0
    • it can also be a string due to that *
    • it will wait, then return returncode
    • do not use stdout=PIPE, use communicate() instead TODO
    • use shell=True is bad, but it can give me
      • shell pipes
      • filename wildcard
      • env variable expansion
      • ~ expansion
  • checkcall(args, *, …): same as call, except it will raise exception if return non-0
  • checkoutput(args, *, stdin=None, stderr=None, shell=False, universalnewlines=False)
    • if return non-0, raise exception. Otherwise return the stdout

Popen object

  • Popen constructor
    • args, bufsize=0, executable=None,
    • stdin=None, stdout=None, stderr=None,
    • preexecfn=None, closefds=False,
    • shell=False, cwd=None, env=None,
    • universalnewlines=False, startupinfo=None, creationflags=0
  • Popen.poll(): check if child process has terminated. Set and return returncode.
  • Popen.wait(): wait for process to terminate. Don't use PIPE with this.
  • Popen.communicate(input=None): to use this, the corresponding stdin, stdout, stderr should be set to PIPE.
    • send data to stdin (string)
    • read data from stdout and stderr (it returns a tuple (out, err))
    • wait for termination
  • Popen.snedsignal(signal)
  • Popen.terminate(): send SIGTERM
  • Popen.kill(): send SIGKILL
  • Popen.pid
  • Popen.returncode
    • set by poll and wait (and indirectly by communicate)
    • None indicate hasn't terminated
    • -N means terminated by signal N

11.12 Internet

11.12.1 urllib.request

package urllib.request

Functions

  • urlopen(url, data=None)
    • url can be a string or Request object
    • for http and https, returns a http.client.HTTPResponse object
    • for FTP, file, data urls, return a urllib.response.addinfourl object
  • pathname2url(path): do quoting
  • url2pathname(path): do unquoting

class Request

  • constructor: (url, data=None, headers={}, method=None)
    • url: a string
    • headers: a dictionary.
    • method: a string. 'GET' is default. Available values: 'HEAD', 'POST'

methods:

  • getmethod()
  • addheader(key, val)
  • hasheader(key)
  • getheader(key)
  • removeheader(key)
  • getfullurl()
  • headeritems(): return a list of tuples (key, value)
  req = request.Request(query)
  req.add_header("Authorization", "token " + token)
  response = request.urlopen(req)
  s = response.read().decode('utf8')
  langj = json.loads(s);
  # deprecated
  urllib.request.urlretrieve(url[, filename])

11.12.2 urllib.parse

  • quote(string)
  • quoteplus(string)
  • unquote(string)
  • unquoteplus(string)
  • urlencode(query)

11.13 Data

11.13.1 Json

import json
json.dumps({"C": 0, "D": 1})
json.loads("a string of json")

json.dump(obj, fp, indent=2)
json.load(fp)

12 Third party libraries

12.1 urllib

from urllib import request
import json

url = 'https://api.github.com'
api = '/search/repositories'
query = 'language:C&stars:>10&per_page='+size
response = request.urlopen(url+api+"?q="+query)

s = response.read().decode('utf8')
j = json.loads(s)
# j will be a mix of list and dict

12.2 XML

import xml.etree.ElementTree as ET
root = ET.fromstring(s)
# XPath
nodes = root.findall('{http://www.sdml.info/srcML/src}function')
for node in nodes:
  # do with node
  pass

APIs

  • node.find(XPath)
  • node.findall(XPath)
  • node.get(Attribute)
  • node.text

12.4 BeautifulSoup

The package is called BeautifulSoup4.

The preface to use the package:

from bs4 import BeautifulSoup
BeautifulSoup('<html>string</html>')
with open('a.html') as fp:
    BeautifulSoup(fp)

Each node can be used as a data structure, with the following fields:

  • name: the tag name
  • string: the (first?) string directly embedded inside the node
  • strings: a list of the strings
  • a-tag: the first child that is of that tag
  • attrs: a list of all attribute names
  • children: going downwards
  • descendants: intuitive
  • parent
  • parents: wow, this should be called ancestor?
  • next_sibling, previous_sibling

It can also be used as a dictionary of its attributes, e.g. s['href']. This should be a string. It is equivalent to using the get method with the class name.

Several methods are of particular interests.

  • get_text(): return all text in the node

You can also execute a query on it. In general, find_all returns a list, while find returns the first one. There are also some methods in this family, namely find_next_siblings, find_parents. E.g.

  • s.find_all('a'): return a list of all 'a' tag nodes

Or it can be a query respecting css id and classes. Although find has some support for id and class, the select is easier to use.

  • s.select("body a"): non-direct
  • s.select("p > a"): direct
  • s.select(p.c#id): class and id
  • s.select(p > #id): mix
  • s.select(a[href^=xxx]): filtering based on attribute values

12.6 matplotlib

import matplotlib.pyplot as plt

# plot some random staff
plt.plot([1,2,3,4])
plt.show()

# plot a mnist digit
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# since the data is just an array (28,28), imshow must have converted
# it to image pixel properly
plt.imshow(x_train[7777], cmap='Greys')
# must call plt.show() to open the figure window. Or, execute
# %matplotlib in the REPL, you can get the image directly after
# imshow().
plt.show()

plot to a file

pylab.ioff()
plot([1, 2, 3])
savefig("/tmp/test.png")

Subplots:

plt.ioff()
figure = plt.figure()
figure.canvas.set_window_title('My Grid Visualization')
for x in range(height):
    for y in range(width):
        # print(x,y)
        figure.add_subplot(height, width, x*width + y + 1)
        plt.axis('off')
        plt.imshow(convert_image_255(images[x*width+y]), cmap='gray')
# plt.show()
plt.savefig(filename)

12.7 Deep Learning Framework

Select visible GPU in a multi-GPU setting:

os.environ['CUDA_VISIBLE_DEVICES'] = '3'

CUDA setup

  1. Install Nvidia driver. This can be done using Ubuntu's software center. But this is the stable version, not newest
  2. Install cuda. To /usr/local/cuda-10.0. I use the "runfile", with the --override option (otherwise throw gcc version not supported error).
  3. Install cudnn by copying header files and library files into /usr/local/cuda-10.0
  4. Configure
CUDA_PATH=/usr/local/cuda-10.0
export LD_LIBRARY_PATH="$CUDA_PATH/lib64:$LD_LIBRARY_PATH"
export PATH="$CUDA_PATH/bin:$PATH"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$CUDA_PATH/extras/CUPTI/lib64"