Python, the language

1. Tools & Reference
- 1.1. Emacs support
- 1.2. pip mirror
2. Language
3. Collections
4. Standard Library
5. Third party libraries

Pseudocode which runs. – Peter Norvig (?)

The best program to do a job is one which already ships the solution.

There should be one – and preferably only one – obvious way to do it.

– Aphorism 13 in the Zen of Python by Tim Peters

Python is nice, sure. But only until it starts warping your mind late in the game. See https://www.draketo.de/proj/py2guile/ for an insightful reference.

When you start thinking about using code-templates in your editor to comply with the requirements of your language, then it is likely that something is wrong with the language.

1 Tools & Reference

For python 3

1.1 Emacs support

Install the elpy package. It provides:

C-c C-c runs the shell and sends the current buffer
C-c C-d runs elpy-doc
C-c C-t runs elpy-test, which runs unittest discovery

To lint Python in Emacs, use pylint. It relies on the pylint executable and also needs a configuration file. Generate it:

pylint --generate-rcfile > ~/.pylintrc

1.2 pip mirror

ustc mirror: http://pypi.mirrors.ustc.edu.cn/simple/

to use one-time, simply:

pip3 install xxx -i https://pypi.mirrors.ustc.edu.cn/simple/

# don't need this when using https
# --trusted-host pypi.mirrors.ustc.edu.cn

global configuration seems to be:

pip3 config set global.index-url https://mirrors.ustc.edu.cn/pypi/web/simple

In China, pytorch is hard to install with pip for two reasons: the package is large, and the CPU-only build is served from a separate URL. Thus I'm using the conda mirror https://mirrors.tuna.tsinghua.edu.cn/help/anaconda/

2 Language

2.1 data type

type(obj): get the type of obj

Numerical functions:

abs(x): absolute value
divmod(a,b): a pair (a // b, a % b)
max(arg1, arg2, *args)
min(arg1, arg2, *args)
pow(x,y): x**y, i.e. x to the power y (in Python, ^ is bitwise XOR)
round(x, ndigits=None)
sum(iterable)

Boolean:

all(iterable): true if all items are true. empty => True
any(iterable): true if any item is true. empty => False
cmp(x,y): Python 2 only; removed in Python 3, where (x > y) - (x < y) gives the same result
- x<y => negative
- x==y => 0
- x>y => positive

2.1.1 conversion

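chr(i): integer code point to character
ord(c): character to integer code point
float(x)
long(x): Python 2 only; in Python 3, int has arbitrary precision
bool(x): convert x to bool
int(x): string (or number) to integer
str(x): object to string, e.g. integer to string
- hex(x): convert integer to lowercase hex string prefixed with '0x'
- oct(x): integer to octal string prefixed with '0o'
- bin(x): integer to binary string prefixed with '0b'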

2.2 Scoping

There are four levels:

current scope
parent scope
module scope (global)
built-in scope

The nonlocal keyword specifies that a name refers to the variable in the parent (enclosing function) scope. It never reaches the global scope. For that, the global keyword declares the listed variables to be in the module-level scope.

The nonlocal statement causes the listed identifiers to refer to previously bound variables in the nearest enclosing scope excluding globals.

As an example:

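var = 0 # global

def outer():
  var = 1 # parent
  def inner():
    nonlocal var
    var = 2 # rebinds the var of outer
  def set_global():
    # nonlocal and global cannot both name var in one function,
    # so the global assignment lives in its own function
    global var
    var = 3
  inner()
  set_global()
  # var = 2

outer()
# global var = 3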

2.3 Conditional

If else or:

var = d.get('key') or 0
# is equal to:
var = d.get('key') if d.get('key') else 0

2.4 Loop

len(s): length
next(iterator)
range(stop): [0,stop)
range(start, stop, step=1)

2.5 Function

2.5.1 Function def and call

The default value of an argument is evaluated once, at function definition. Thus the object is shared by all invocations of the function. This is typically not the desired behavior.

def foo(a=[]):
    a.append(3)
    return a
foo()
foo()
# => [3,3] !!!
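A common way around the shared default is to use None as a sentinel and create the list inside the function body. A minimal sketch; the name foo_fixed is made up for this illustration:

```python
def foo_fixed(a=None):
    # None is also evaluated once, but it is immutable, so sharing it is harmless
    if a is None:
        a = []  # a fresh list on every call
    a.append(3)
    return a

first = foo_fixed()
second = foo_fixed()
print(first, second)  # [3] [3]
```

Each call now gets its own list, so the results no longer accumulate.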

Python passes arguments by object reference. If you pass a list, the function can modify it, and the caller sees the modification.

a = [1,2]
def foo(x):
    x.append(3)
foo(a)
a # => [1,2,3]

2.5.2 Lambda

lambda x : x+2
lambda x: x%2==0

Lambdas are often used with map and filter.

map(lambda_exp, mylist) applies the lambda expression to each element of the list. In Python 3 it returns a lazy iterator; wrap it in list() to get a list containing the results.
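As a small illustration of lambdas with map and filter (mylist, plus_two and evens are names chosen for this sketch):

```python
mylist = [1, 2, 3, 4]

# map and filter are lazy in Python 3; list() materializes the results
plus_two = list(map(lambda x: x + 2, mylist))
evens = list(filter(lambda x: x % 2 == 0, mylist))

print(plus_two)  # [3, 4, 5, 6]
print(evens)     # [2, 4]
```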

2.5.3 variadic parameter

Use the *args syntax, and args will be a tuple:

def foo(*args):
  for a in args:
    print(a)

Use **kwargs to capture all keyword arguments; kwargs will be a dict:

def bar(**kwargs):
  for a in kwargs:
    print(a, kwargs[a])

Combine them together:

def foobar(kind, *args, **kwargs):
  pass

The reverse also exists: unpack a list into positional arguments with *list:

def foo(a,b):
  pass

l = [1,2]
foo(*l)

In Python 3, this syntax can also appear on the left side of an assignment:

first, *rest = [1,2,3,4]
first,*l,last = [1,2,3,4]

2.6 Meta Programming

Basically eval (evaluates an expression and returns its value) and exec (runs statements, no return value), each taking either a string or a code object created by compile. They can use the names bound in the current namespace.

eval("1+2")
a=2
eval("1+a")
def foo(a):
    return a+3
eval("foo(a)")
# no return
exec("foo(a)")
eval(compile("1+a", '', 'eval'))

2.7 Exception

To give a quick feel:

try:
  pass
except TypeError as e: # capture the exception into a variable
  pass
except AnotherError: # does not capture
  pass
except: # all exception
  pass
else: # if doesn't raise an exception
  pass
finally:
  pass

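2.7.1 Built-in exceptions

The tree below is the Python 2 hierarchy. In Python 3, StandardError is gone (its children derive directly from Exception), and EnvironmentError, IOError and WindowsError are aliases of OSError.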

BaseException
 +-- SystemExit
 +-- KeyboardInterrupt
 +-- GeneratorExit
 +-- Exception
      +-- StopIteration
      +-- StandardError
      |    +-- BufferError
      |    +-- ArithmeticError
      |    |    +-- FloatingPointError
      |    |    +-- OverflowError
      |    |    +-- ZeroDivisionError
      |    +-- AssertionError
      |    +-- AttributeError
      |    +-- EnvironmentError
      |    |    +-- IOError
      |    |    +-- OSError
      |    |         +-- WindowsError (Windows)
      |    |         +-- VMSError (VMS)
      |    +-- EOFError
      |    +-- ImportError
      |    +-- LookupError
      |    |    +-- IndexError
      |    |    +-- KeyError
      |    +-- MemoryError
      |    +-- NameError
      |    |    +-- UnboundLocalError
      |    +-- ReferenceError
      |    +-- RuntimeError
      |    |    +-- NotImplementedError
      |    +-- SyntaxError
      |    |    +-- IndentationError
      |    |         +-- TabError
      |    +-- SystemError
      |    +-- TypeError
      |    +-- ValueError
      |         +-- UnicodeError
      |              +-- UnicodeDecodeError
      |              +-- UnicodeEncodeError
      |              +-- UnicodeTranslateError
      +-- Warning
           +-- DeprecationWarning
           +-- PendingDeprecationWarning
           +-- RuntimeWarning
           +-- SyntaxWarning
           +-- UserWarning
           +-- FutureWarning
           +-- ImportWarning
           +-- UnicodeWarning
           +-- BytesWarning

2.8 Module

Exposing API: __all__ lists the names exported by a wildcard import (from module import *). The following exposes only foo, not bar.

__all__ = ['foo']
def foo():
  pass
def bar():
  pass

2.8.1 importing

Each package directory should contain an __init__.py file so that it can be imported as a regular package.

|-- main.py
|-- mypackage
    |-- __init__.py
    |-- a.py
    |-- b.py
    |-- subdir
        |-- __init__.py
        |-- c.py

The import statements should be:

from mypackage import a
from mypackage.b import foo as myfoo
from mypackage.subdir import c

To import from a directory outside the project, add it to the module search path, either in the shell:

export PYTHONPATH="$PYTHONPATH:/home/hebi/github/reading/models"

or at runtime:

import sys
sys.path.append('/home/hebi/github/reading/InferSent/')
# assume in root of that directory, models.py defines InferSent class
from models import InferSent

Packaging:

setup.py:

from setuptools import setup, find_packages
setup(
    name="InferSent-Mirror",
    version="0.1",
    # packages=find_packages(),
    packages=['p1', 'p2'],
)

Directory structure:

mypackage/
  p1/
    __init__.py
    xxx.py
  p2/
    __init__.py
    yyy.py

Install locally:

python3 setup.py install --user

Install from git repo:

pip install --user git+https://github.com/lihebi/InferSent

Import:

from p1 import xxx
from p2.yyy import foo

3 Collections

3.1 List

3.1.1 TODO tuple

3.1.2 TODO sorted

Sort a dictionary by value:

sorted(dict1, key=dict1.get) # => list of keys, ordered by their values
sorted(dict1, key=dict1.get, reverse=True)
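A short sketch of what these calls return, with made-up data in dict1. Sorting dict1.items() is one way to keep the values alongside the keys:

```python
dict1 = {'a': 3, 'b': 1, 'c': 2}

# keys ordered by their values
keys_by_value = sorted(dict1, key=dict1.get)
# (key, value) pairs ordered by value, largest first
pairs_desc = sorted(dict1.items(), key=lambda kv: kv[1], reverse=True)

print(keys_by_value)  # ['b', 'c', 'a']
print(pairs_desc)     # [('a', 3), ('c', 2), ('b', 1)]
```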

3.1.3 Slicing

The slicing syntax is l[start:end:step]. Slicing returns a new list; changes to that list do not change the original one.

l[4]
l[4:]
l[::2]
l[:-1]

However, assigning to the slice itself changes the original list:

l[1:2] = [4,5,6]

Also, assigning a list to a new variable only copies the reference:

a = [1,2,3]
b = a # only a reference

3.1.4 create a list

list(range(stop)): in Python 3, range is lazy, so wrap it in list() to get a list
list(range(start, stop[, step]))

Creating a matrix:

newmat=[[-1 for x in range(height)] for y in range(width)]

list comprehension

even_squares = [x**2 for x in l if x%2 == 0]

3.1.5 Modify a list

list.append
list.pop

3.1.6 List object model

Lists are mutable. The behavior of slicing is a bit confusing. If the slicing is used directly as the target of an assignment statement, it will modify the object in place. E.g.

a = [1,2,3,4]
a[1:3] = []
a # => [1,4]

That also means every other reference to the same list sees the change:

a = [1,2,3,4]
a[1:3] = []
# although tuple is immutable, it can still contain reference to
# mutable objects.
c=(a,)
# this will also modify a
a.append(5)
c # => ([1,4,5],)

However, if the slicing is assigned to another variable (either assignment or pass-by-object function call), it is copied. Modifying this copy will not affect the original list.

a = [1,2,3,4]
b = a[1:3]
b[0] = 9
a # => [1,2,3,4]
def foo(x):
    x[1] = 8

# changing b
foo(b)
b # => [9,8]
a # => [1,2,3,4]

If you convert a list to a tuple, the elements are shallow-copied.

a = [1,2,3]
b = [a]
# this is shallow copied. Still contains reference to the object "a"
c = tuple(b)
# no reference to "a" anymore, just a tuple (1,2,3) of its elements.
# Will never change whatsoever.
d = tuple(a)

# testing:
a[2] = 8
b # => [[1,2,8]]
c # => ([1,2,8],)
d # => (1,2,3)

A string is an immutable sequence, so its items cannot be assigned to. Thus it is fairly safe to pass strings around.

3.2 String

3.2.1 Concatenation

concatenate two strings directly with +.
an integer must be converted to a string before concatenating: s + str(35)
"".join(lst) concatenates a list of strings

3.2.2 split

str.split(sep=None): default by white space
str.strip(): strip out white space at both begin and end
str.replace(old, new): replace all.
str.startswith(s)
str.endswith(s)

3.2.3 Slicing

String is an immutable object. It can use slicing. E.g. reversing a string is as easy as "hello"[::-1]!

However, notice that when using a negative step, the slice should be written lst[end:begin:-1], with the larger index first. This is because the selected indices are still x = i + n*k:

with a third “step” parameter: a[i:j:k] selects all items of a with index x where x = i + n*k, n >= 0 and i <= x < j (for a negative k the bound becomes j < x <= i).

Also, the negative step does not always work as expected. The index i is included and j is not, and a negative j counts from the end, so j = -1 cannot mean "one before index 0". To include the first item of the list, omit j: lst[i::-1].

Thus, to reverse a sub-string, it is usually clearer to take the sub-string first and then reverse it.
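The negative-step cases can be checked on a short string (the variable names are only for this illustration):

```python
s = "hello world"

whole = s[::-1]          # 'dlrow olleh'
# reverse s[0:5]: start at index 4 and omit the end, so index 0 is included
head_rev = s[4::-1]      # 'olleh'
# an end of -1 means the last index, not "before index 0", so this is empty
empty = s[4:-1:-1]       # ''
# slice first, then reverse: easier to read
sub_rev = s[6:11][::-1]  # 'dlrow'
```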

3.3 Dictionary

Create:

x = {'a': 1, 'b': 2}

A dictionary is not sorted by key. Since Python 3.7 a plain dict preserves insertion order; collections.OrderedDict gives the same guarantee on older versions. Basically it remembers the order in which the elements were inserted.

import collections
od = collections.OrderedDict(sorted(d.items()))

Merge two dictionaries (x and y):

z = x.copy()
z.update(y)

3.3.1 Set

s = set()
s.add(x)
if x in s:
  pass

4 Standard Library

4.1 Operating System

4.1.1 Env

os.environ['HOME']
os.getenv(name)
os.putenv(name, value)
os.unsetenv(name)

4.1.2 Shell command

os.system: simply run command

os.system("some command")

os.popen: access to input output

stream = os.popen("some command")
stream.read()

subprocess.Popen

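import subprocess
p = subprocess.Popen("echo Hello World", shell=True, stdout=subprocess.PIPE)
# either read the output directly with p.stdout.read(), or feed it to another command:
s = subprocess.check_output(['wc', '-l'], stdin=p.stdout)
p.wait()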

subprocess.call: the same as subprocess.Popen, except that it waits for the command to finish and returns its return code.

return_code = subprocess.call("echo Hello World", shell=True, stdout=subprocess.DEVNULL)

4.1.3 Process

os.abort()
os.execl(path, arg0, arg1, …)
os.execle(path, arg0, arg1, …, env)
os.execlp(file, arg0, arg1, …)
os.execlpe(file, arg0, arg1, …, env)
os.execv(path, args)
os.execve(path, args, env)
os.execvp(file, args)
os.execvpe(file, args, env)
os.fork()
os.wait()

os.system(cmd): run cmd, return exit code
os.times(): 5-tuple
- user time
- system time
- children's user time
- children's system time
- elapsed real time

4.2 IO

4.2.1 File IO

Reading:

read()
readline(size=-1)
readlines()

Seeking:

seek(offset, whence=0): whence says what the offset is relative to
- 0 start
- 1 current
- 2 end
tell(): current position

Writing:

write(s): write the string s
writelines(lines): write a list of lines
flush()

f = open('text.txt')
f.read() # return all content

f = open('text.txt')
for line in f:
    print(line)

with open('a.txt') as f:
    for line in f:
        print(line)

Other IO:

f = io.StringIO("some string"): in memory text stream
f = io.BytesIO(b"some binary data \x00\x01")

4.2.2 Printing

pprint.pprint(object, stream=None): pretty print
'string {0}, {hello}'.format('yes', hello=2)

print('xxx', end='')

read from stdin:

for line in sys.stdin:
  print(line)

4.2.3 redirect stdout

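from contextlib import redirect_stdout
with open('xxx.txt', 'w') as f:
    # redirect_stdout is a context manager; it only takes effect inside with
    with redirect_stdout(f):
        print('this goes to xxx.txt')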

Or:

sys.stdout = f

The file handle can be:

f = open(os.devnull, 'w')

It can also be a predefined handle, like sys.stderr:

with redirect_stdout(sys.stderr):
    help(dir)

4.3 File System

4.3.1 os.walk

import os
for root,dirs,files in os.walk('.'):
  for f in files:
    print(f)

os.path.abspath('relative/path/to/file')
os.path.exists("/path/to/file")
os.rename('old', 'new')
os.path.isfile

4.3.2 FS Operations

os.getcwd(): current working directory
os.chdir(path): change cwd
os.mkdir(path)
os.listdir(path='.'): list all entries in this dir. E.g. for item in os.listdir('/path'): print(item)
os.makedirs(path): GOOD, this is the way to make directories, since it also creates missing parent directories
os.remove(path): remove a file
os.rmdir(path): remove a dir, only works if the dir is empty
os.removedirs(path): foo/bar/aaa will try to remove aaa, then bar, then foo. Don't use! To recursively remove all contents, use shutil.rmtree
os.rename(src, dst)
os.renames(old, new)
os.tempnam(): a reasonable absolute name for creating a temporary file
- vulnerable to race conditions and removed in Python 3; use the tempfile module instead
os.walk(top, topdown=True): for each directory including top itself, it yields a 3-tuple (dirpath, dirnames, filenames). E.g. for root,dirs,files in os.walk('/path'): for f in files: print(f)

4.3.3 shutil

copy(src,dst)
copytree(src, dst): recursive
rmtree(path): rm -r
move(src, dst)

popen family is deprecated. Use subprocess.

4.3.4 os.path

If parameter is not listed, it means a single path.

exists: GOOD. check whether a path exists
split: return a pair (head, tail). tail is the last component, without slash. If path ends with slash, tail is empty
- basename: the tail of the split output
- dirname: head of split output
normpath: collapse redundant separators and up level references
abspath: from relative to absolute path. normpath(join(os.getcwd(), path))
commonprefix(list): return the longest common prefix, compared character by character, so the result may not be a valid path
expanduser: replace the initial ~ component by the user's home directory.
getsize: in bytes
isabs: predicate for absolute
isfile
isdir
islink
join(path, *paths): join intelligently
realpath: canonical path by following symbolic links
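A few of these functions applied to one path. This is only a sketch: the path is hypothetical, nothing touches the disk, and the results in the comments assume a POSIX system:

```python
import os.path

path = '/home/user/docs/../notes/todo.txt'  # hypothetical path

norm = os.path.normpath(path)               # '/home/user/notes/todo.txt'
head, tail = os.path.split(norm)            # ('/home/user/notes', 'todo.txt')
base = os.path.basename(norm)               # same as tail
parent = os.path.dirname(norm)              # same as head
joined = os.path.join(head, 'done.txt')     # '/home/user/notes/done.txt'
# a trailing slash gives an empty tail
slash_head, slash_tail = os.path.split('/home/user/notes/')
```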

4.3.5 pathlib

Object-oriented filesystem paths. https://docs.python.org/3/library/pathlib.html

pathlib.Path is the class. pathlib.PosixPath is a subclass for non-Windows paths, but it seems to exist only for implementation purposes and adds nothing for the user.

Actually not very interesting, this table tells everything:

os and os.path	pathlib
os.path.abspath()	Path.resolve()
os.chmod()	Path.chmod()
os.mkdir()	Path.mkdir()
os.rename()	Path.rename()
os.replace()	Path.replace()
os.rmdir()	Path.rmdir()
os.remove() , os.unlink()	Path.unlink()
os.getcwd()	Path.cwd()
os.path.exists()	Path.exists()
os.path.expanduser()	Path.expanduser() and Path.home()
os.path.isdir()	Path.is_dir()
os.path.isfile()	Path.is_file()
os.path.islink()	Path.is_symlink()
os.stat()	Path.stat(), Path.owner(), Path.group()
os.path.isabs()	PurePath.is_absolute()
os.path.join()	PurePath.joinpath()
os.path.basename()	PurePath.name
os.path.dirname()	PurePath.parent
os.path.samefile()	Path.samefile()
os.path.splitext()	PurePath.suffix

Some interesting APIs that don't have counterparts:

Path.glob(pattern) yields all files matching the shell pattern, e.g. p.glob('*/*.py')
slash operator: you can directly use p / 'foo' / 'bar'
Path.iterdir() yields the directory items
Path.parts gives a tuple of strings
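These APIs can be tried safely inside a throwaway directory. A sketch under the assumption that a temporary directory is acceptable; the file names are made up:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'pkg').mkdir()                        # slash operator builds paths
    (root / 'pkg' / 'a.py').write_text('x = 1\n')
    (root / 'pkg' / 'b.txt').write_text('hello\n')

    # glob and iterdir are lazy, so collect them before the directory is removed
    py_files = sorted(p.name for p in root.glob('*/*.py'))
    entries = sorted(p.name for p in (root / 'pkg').iterdir())
    last_parts = (root / 'pkg' / 'a.py').parts[-2:]

print(py_files)    # ['a.py']
print(entries)     # ['a.py', 'b.txt']
print(last_parts)  # ('pkg', 'a.py')
```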

4.3.6 TODO tempfile

mkstemp creates a temp file and leaves it open. It returns a pair (fd, path), where fd is the file descriptor (int) of the opened file, the same kind that os.open returns, thus not easy to work with
mkdtemp creates a temp dir and returns its path. I would just use this and create temporary files inside it.

folder = tempfile.mkdtemp()
fd, fname = tempfile.mkstemp()

4.4 unittest

import unittest

class MyTest(unittest.TestCase):
    def test_me(self):
        self.assertEqual(1,2)  # fails on purpose
unittest.main()

Python's unittest supports automatic test discovery. To use it, name the file test_xxx.py and run python -m unittest discover.

4.5 time

Basic functions:

time.sleep(secs)
time.time(): time in seconds since epoch
gmtime([secs]): convert seconds since epoch to a struct_time in UTC (default: now)
localtime([secs]): like gmtime(), but in local time
clock(): processor time as floating number in seconds; removed in Python 3.8, use perf_counter() or process_time()

The time object is of class time.struct_time, returned by gmtime(), localtime() and strptime(). Converting between time objects and strings:

strptime(string[, format]): parse a string into time object
- format default: "%a %b %d %H:%M:%S %Y"
- time.strptime("30 Nov 00", "%d %b %y")
strftime(format[, t]): convert from time object to string
- %a/A: abbr/full weekday name
- %b/B: abbr/full month name
- %Y: year
- %m: month [01,12]
- %d: day of the month [01,31]
- %H: 24-hour [00,23]
- %I: 12-hour [01,12]
- %p: AM or PM
- %M: Minute [00,59]
- %S: second [00,61]
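A round trip between strings and time objects, as a sketch. It reuses the strptime example from the list above and assumes an English (C) locale for the month name:

```python
import time

t = time.strptime("30 Nov 00", "%d %b %y")   # parse into a struct_time
print(t.tm_year, t.tm_mon, t.tm_mday)        # 2000 11 30

iso = time.strftime("%Y-%m-%d %H:%M:%S", t)  # '2000-11-30 00:00:00'
# gmtime(0) is the epoch itself, in UTC
epoch0 = time.strftime("%Y-%m-%d", time.gmtime(0))  # '1970-01-01'
```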

4.5.1 datetime

date has year, month, day
time has hour, minute, second
datetime has both

import datetime

t1 = datetime.date.fromisoformat('2019-12-04')
t2 = datetime.date.fromisoformat('2018-11-24')

delta = t1 - t2
delta.days

t3 = datetime.date.today()
t4 = datetime.date(2019, 12, 20)

import time
t0 = datetime.date.fromtimestamp(time.time())

4.6 csv

import csv
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

import csv
with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(row)
    writer.writerows(someiterable)

4.7 Json

import json
json.dumps({"C": 0, "D": 1})
json.loads("a string of json")

json.dump(obj, fp, indent=2)
json.load(fp)

4.8 argparse

import argparse
parser = argparse.ArgumentParser(description='Description here')

parser.add_argument('-q', '--query', help='query github api', required=True)
parser.add_argument('-d', '--download', help='do download', action='store_true')

args = parser.parse_args()

The most interesting method is of course add_argument. It accepts the name, which is either a single string such as 'bar', indicating a positional argument, or a string starting with -, indicating an optional argument. You can supply parser.add_argument('-f', '--foo') to give both a short and a full name. The value is stored as an attribute with the same name (i.e. bar, foo) of the result, but you can change it to another name via the dest argument.

An action defines what to do with the argument. It is a string (!!!). The default is 'store', meaning store the supplied value in the result. If you don't need the value, but just want to know if the option is supplied, use store_true or store_false, which differ only in the default value. The action append will collect each occurrence of the argument into a list.

By default, each option consumes one argument. You can change this with the nargs argument. If it is an integer, it means how many should be consumed. The result will be a list, thus nargs=1 still differs from the default. It can also be one of the strings '*', '+', '?', which follow their regular expression meaning. * and + produce a list, + gives an error when no arguments are provided, and ? uses the default if the value is missing.

In case of a missing value, the default argument can be used to supply the default value. Otherwise, it is None. You can also use the required argument to make sure the user supplies something. A value is by default a string; you can convert it to another data type with the type option, which accepts a callable such as int. You might also want to restrict the choices of the argument, so choices is a list of allowed values.

Finally, the help option can be used to provide a help string, which can be printed out using parser.print_help(). To test the parser, you can use parser.parse_args(['-f', '1', 'bar']).
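The options described above can be exercised without a real command line by passing a list to parse_args. A sketch; the argument names (bar, --foo, --tag, --pair, --mode) are invented for the illustration:

```python
import argparse

parser = argparse.ArgumentParser(description='Demo parser')
parser.add_argument('bar')                                   # positional
parser.add_argument('-f', '--foo', type=int, default=0)      # converted to int
parser.add_argument('-v', '--verbose', action='store_true')  # flag, default False
parser.add_argument('-t', '--tag', action='append')          # each occurrence is collected
parser.add_argument('--pair', nargs=2, dest='xy')            # two values, stored as args.xy
parser.add_argument('--mode', choices=['fast', 'slow'], default='fast')

args = parser.parse_args(
    ['-f', '1', 'bar', '-t', 'a', '-t', 'b', '--pair', '3', '4'])
print(args.foo, args.bar, args.verbose, args.tag, args.xy, args.mode)
# 1 bar False ['a', 'b'] ['3', '4'] fast
```

Note that the values of --pair stay strings, because no type was given for it.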

4.9 Regular Expression

construction

import re
pattern = re.compile(r'\d+.*$')  # raw string, so \d is not treated as a string escape

match

s = 'this is a test string'
pattern.match(s) # return a match object, or None if the start of s does not match

pattern.findall(s)

shorthand

m = re.match("[pattern]", "string")
m.group()
m = re.search("[pattern]", "string")
m.group()
re.search("pattern", "string", re.IGNORECASE)
m = re.findall("[pattern]", "string")
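The difference between match, search and findall shows up on a string that does not start with a digit. A small sketch with a made-up pattern and input:

```python
import re

pattern = re.compile(r'\d+')
s = 'order 66 shipped in 3 days'

m_match = pattern.match(s)    # None: match only tries at the start of the string
m_search = pattern.search(s)  # first match anywhere in the string
numbers = pattern.findall(s)  # all non-overlapping matches, as strings

print(m_match)           # None
print(m_search.group())  # 66
print(numbers)           # ['66', '3']
```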

4.10 Concurrent programming

4.10.1 threading

from threading import Thread

class MyThread(Thread):
  def __init__(self, arg):
    Thread.__init__(self)
    self.arg = arg
  def run(self):
    pass

t = MyThread(arg)
t.start()

The package name is threading, the object is Thread.

Functions

threading.active_count(): number of Thread objects currently alive
threading.current_thread(): current Thread object
threading.enumerate(): return a list of all Thread objects currently alive
threading.main_thread(): the main Thread object
threading.local(): the instance of local storage. Different for different threads. Typical usage: mydata = threading.local()

Two ways to specify what to run:

pass a callable object to the target argument when constructing Thread
define a subclass of Thread and override the run method.

Methods:

start: start the thread. It will call the run method in a separate thread. The thread terminates when run terminates
join(timeout=None): the calling thread will block until this thread terminates
- timeout should be float in seconds
is_alive: test whether the thread is still running
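The first way of specifying what to run (a callable passed as target) could look like this. A minimal sketch; work and results are names made up for the example:

```python
from threading import Thread

results = []

def work(n):
    # each worker appends one result
    results.append(n * n)

threads = [Thread(target=work, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # block until this worker has finished

print(sorted(results))  # [0, 1, 4, 9]
any_alive = any(t.is_alive() for t in threads)  # False after join
```

The results are sorted before printing because the order in which the threads finish is not guaranteed.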

4.10.2 Thread Sync

class threading.Lock

acquire()
release()

class threading.RLock

this is a recursive lock. The same thread can acquire the lock multiple times. The acquisitions nest, and only when the last release is called can the lock be acquired by another thread
acquire()
release()

class threading.Condition(lock=None)

the lock must be a Lock or RLock. If none, a RLock is created
acquire()
release()
wait(timeout=None): wait until notified
- release underlying lock
- block until notify
- re-acquire the lock and return
- typical usage: while not item_is_available(): cv.wait()
- often used in a with statement: with cv: cv.wait_for(pred); get()
wait_for(predicate, timeout=None)
- this is same as while not predicate(): cv.wait(), thus more convenient than wait
notify(n=1): notify one thread
notify_all(): notify all threads waiting on this condition

class threading.Semaphore: this class manages resources with limited capacity.

acquire(): decrease capacity
release(): increase capacity

class threading.Event

is_set():
set(): set flag to true
clear(): set flag to false
wait(timeout=None): block until internal flag is true

class threading.Timer(interval, function) : Thread

interval is float in seconds, function is callable. use start method to start the thread, and the function will be called after the delay.
cancel(): stop the timer and cancel the execution. Only works if the timer is still waiting.

class threading.Barrier(parties, action=None, timeout=None)

parties is an integer. Every thread calling wait will block until parties threads have called it. Then all of them unblock and proceed simultaneously.
wait(timeout=None)
reset(): reset the barrier. The thread waiting for it will receive BrokenBarrierError
abort(): all current and future wait call for it will get BrokenBarrierError
parties: number of parties
n_waiting: number of current waiting
broken: True or False

4.10.2.1 Using with statement

Lock, RLock, Condition, Semaphore can be used.

with somelock:
  # do something

is equivalent to:

somelock.acquire()
try:
  # do something
finally:
  somelock.release()
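A concrete use of the with form, as a sketch: several threads increment a shared counter, and the lock makes each read-modify-write step exclusive. The names totals and add_many are invented here:

```python
import threading

totals = {'count': 0}
lock = threading.Lock()

def add_many(n):
    for _ in range(n):
        with lock:  # acquire() on entry, release() on exit, even on error
            totals['count'] += 1

threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(totals['count'])  # 40000
```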

4.10.3 multiprocessing

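This provides the multiprocessing.Process class, which has an API similar to Thread. On Unix it can start the child with fork and then run the target function directly, which is why the documentation shows no explicit exec. It looks like it just does what a thread can do, except that processes do not share memory and each process has its own interpreter lock, so CPU-bound work can actually run in parallel.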

4.10.4 subprocess

subprocess.run(args, *, stdin=None, input=None, stdout=None, stderr=None, shell=False, timeout=None, check=False)
- run the command and wait for it to complete. Return a CompletedProcess instance.
- if check is True, raise a CalledProcessError exception if the return code is non-zero. This replaces check_call and check_output.

class subprocess.CompletedProcess

args
returncode
stdout: captured if PIPE is passed to stdout
stderr: captured if PIPE is passed to stderr
check_returncode(): if returncode is non-zero, raise CalledProcessError
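A sketch of run with captured output and with check=True. To avoid assuming any external tool, the child process here is the current Python interpreter (sys.executable):

```python
import subprocess
import sys

done = subprocess.run(
    [sys.executable, '-c', 'print("hello")'],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
)
print(done.returncode)      # 0
print(done.stdout.strip())  # b'hello' (bytes, since text mode was not requested)

try:
    # the child exits with status 3, so check=True raises
    subprocess.run([sys.executable, '-c', 'import sys; sys.exit(3)'], check=True)
except subprocess.CalledProcessError as e:
    failed_code = e.returncode  # 3
```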

Variables:

subprocess.DEVNULL
subprocess.PIPE
subprocess.STDOUT: this is only used in the place of stderr to redirect it to stdout

class subprocess.CalledProcessError

returncode
cmd
output: same as stdout
stdout
stderr

The following are from 2.7; now just use run.

subprocess.call(args, *, stdin=None, stdout=None, stderr=None, shell=False)
- args: a list of argument, including arg0
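- it can also be a string, typically together with shell=True (the * only marks the following parameters as keyword-only)
- it will wait, then return returncode
- do not use stdout=PIPE with call; use Popen with communicate() instead TODO
- using shell=True is risky, but it can give me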
  - shell pipes
  - filename wildcard
  - env variable expansion
  - ~ expansion
check_call(args, *, …): same as call, except it will raise exception if return non-0
check_output(args, *, stdin=None, stderr=None, shell=False, universal_newlines=False)
- if return non-0, raise exception. Otherwise return the stdout

Popen object

Popen constructor
- args, bufsize=0, executable=None,
- stdin=None, stdout=None, stderr=None,
- preexec_fn=None, close_fds=False,
- shell=False, cwd=None, env=None,
- universal_newlines=False, startupinfo=None, creationflags=0
Popen.poll(): check if child process has terminated. Set and return returncode.
Popen.wait(): wait for process to terminate. Don't use PIPE with this.
Popen.communicate(input=None): to use this, the corresponding stdin, stdout, stderr should be set to PIPE.
- send data to stdin (string)
- read data from stdout and stderr (it returns a tuple (out, err))
- wait for termination
Popen.send_signal(signal)
Popen.terminate(): send SIGTERM
Popen.kill(): send SIGKILL
Popen.pid
Popen.returncode
- set by poll and wait (and indirectly by communicate)
- None indicate hasn't terminated
- -N means terminated by signal N

5 Third party libraries

5.1 urllib

from urllib import request
import json

url = 'https://api.github.com'
api = '/search/repositories'
size = '10'  # results per page (example value); must be a string to concatenate
query = 'language:C&stars:>10&per_page=' + size
response = request.urlopen(url+api+"?q="+query)

s = response.read().decode('utf8')
j = json.loads(s)
# j will be a mix of list and dict

5.1.1 urllib.request

package urllib.request

Functions

urlopen(url, data=None)
- url can be a string or Request object
- for http and https, returns a http.client.HTTPResponse object
- for FTP, file, data urls, return a urllib.response.addinfourl object
pathname2url(path): do quoting
url2pathname(path): do unquoting

class Request

constructor: (url, data=None, headers={}, method=None)
- url: a string
- headers: a dictionary.
- method: a string. 'GET' is default. Available values: 'HEAD', 'POST'

methods:

get_method()
add_header(key, val)
has_header(key)
get_header(key)
remove_header(key)
get_full_url()
header_items(): return a list of tuples (key, value)

req = request.Request(query)
req.add_header("Authorization", "token " + token)
response = request.urlopen(req)
s = response.read().decode('utf8')
langj = json.loads(s)
# deprecated:
# urllib.request.urlretrieve(url[, filename])

5.1.2 urllib.parse

quote(string)
quote_plus(string)
unquote(string)
unquote_plus(string)
urlencode(query)
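How the quoting functions differ, as a short sketch with made-up inputs:

```python
from urllib.parse import quote, quote_plus, unquote_plus, urlencode

q1 = quote('a b/c')       # 'a%20b/c'  ('/' is kept by default)
q2 = quote_plus('a b/c')  # 'a+b%2Fc'  (space becomes '+', '/' is escaped)
back = unquote_plus(q2)   # 'a b/c'
# urlencode builds a query string from a dict and quotes the values
qs = urlencode({'q': 'language:C', 'per_page': 10})  # 'q=language%3AC&per_page=10'
```

Building the query with urlencode avoids hand-assembling strings with & and =.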

5.2 XML

import xml.etree.ElementTree as ET
root = ET.fromstring(s)
# XPath
nodes = root.findall('{http://www.sdml.info/srcML/src}function')
for node in nodes:
  # do with node
  pass

APIs

node.find(XPath)
node.findall(XPath)
node.get(Attribute)
node.text

5.3 Requests

http://docs.python-requests.org/en/master/

5.4 BeautifulSoup

The package is called BeautifulSoup4.

The preface to use the package:

from bs4 import BeautifulSoup
BeautifulSoup('<html>string</html>', 'html.parser')
with open('a.html') as fp:
    BeautifulSoup(fp, 'html.parser')

Each node can be used as a data structure, with the following fields:

name: the tag name
string: the (first?) string directly embedded inside the node
strings: a generator over all the strings inside the node
a-tag: the first descendant that is of that tag, e.g. s.a
attrs: a dictionary of the attributes (name to value)
children: going downwards
descendants: intuitive
parent
parents: wow, this should be called ancestor?
next_sibling, previous_sibling

It can also be used as a dictionary of its attributes, e.g. s['href']. This should be a string. It is equivalent to using the get method with the attribute name, except that get returns None when the attribute is missing.

Several methods are of particular interests.

get_text(): return all text in the node

You can also execute a query on it. In general, find_all returns a list, while find returns the first one. There are also some methods in this family, namely find_next_siblings, find_parents. E.g.

s.find_all('a'): return a list of all 'a' tag nodes

Or it can be a query respecting css id and classes. Although find has some support for id and class, the select is easier to use.

s.select("body a"): non-direct
s.select("p > a"): direct
s.select("p.c#id"): class and id
s.select("p > #id"): mix
s.select('a[href^="xxx"]'): filtering based on attribute values

5.5 click

http://click.pocoo.org/5/

5.6 pandas

Looks like it is a dataframe library

5.7 numpy

C-implementation of multi-dimensional arrays

5.8 scipy

scientific computing algorithms, including:

linear algebra
optimization
interpolation
integration and differential equation
clustering algorithms
statistical distributions

5.9 scikit-learn

Learning library.

Supervised learning:

linear models
SVM
Gaussian Processes
Naive Bayes
Decision Trees
KNN

Unsupervised learning:

Gaussian Mixture Models
Manifold learning
clustering
- k-means

5.10 matplotlib

import matplotlib.pyplot as plt

5.10.1 type of figures

plt.bar
plt.scatter
plt.plot: line plot
plt.hist
plt.pie

plt.plot([1,2,3,4])

Image via plt.imshow():

# plot a mnist digit (assumes tensorflow is installed)
import tensorflow as tf
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# since the data is just an array (28,28), imshow must have converted
# it to image pixel properly
plt.imshow(x_train[7777], cmap='Greys')
# must call plt.show() to open the figure window. Or, execute
# %matplotlib in the REPL, you can get the image directly after
# imshow().
plt.show()

5.10.2 TODO plot options

5.10.3 legends, axis, more settings

Texts:

plt.xlabel()
plt.ylabel()
plt.title()
plt.axis()
plt.text()
plt.annotate
plt.grid(True)
plt.table(): attach a table to an axis!

Scale:

plt.xscale('linear')
plt.yscale('log')

5.10.4 Subplots

plt.ioff()
figure = plt.figure()
# newer matplotlib sets the window title through the canvas manager
figure.canvas.manager.set_window_title('My Grid Visualization')
for x in range(height):
    for y in range(width):
        # print(x,y)
        figure.add_subplot(height, width, x*width + y + 1)
        plt.axis('off')
        plt.imshow(convert_image_255(images[x*width+y]), cmap='gray')
# plt.show()
plt.savefig(filename)

Or better, create figure and axis, and plot for each axis:

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(19680801)
data = np.random.randn(2, 100)

fig, axs = plt.subplots(2, 2, figsize=(5, 5))
axs[0, 0].hist(data[0])
axs[1, 0].scatter(data[0], data[1])
axs[0, 1].plot(data[0], data[1])
axs[1, 1].hist2d(data[0], data[1])

plt.show()

5.10.5 export to files

Visualize using OS GUI toolkit:

plt.show()

Plot to a file:

plt.ioff()
plt.plot([1, 2, 3])
plt.savefig("/tmp/test.png")

5.11 imsave

scipy.misc.imsave is deprecated; change from

from scipy.misc import imsave

to

from imageio import imwrite as imsave

5.12 Nvidia GPU setting

Select visible GPU in a multi-GPU setting:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '3'

CUDA setup

Install the Nvidia driver. This can be done using Ubuntu's software center, but that gives the stable version, not the newest
Install cuda to /usr/local/cuda-10.0. I use the "runfile" installer with the --override option (otherwise it throws a "gcc version not supported" error).
Install cudnn by copying header files and library files into /usr/local/cuda-10.0
Configure the environment:

CUDA_PATH=/usr/local/cuda-10.0
export LD_LIBRARY_PATH="$CUDA_PATH/lib64:$LD_LIBRARY_PATH"
export PATH="$CUDA_PATH/bin:$PATH"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$CUDA_PATH/extras/CUPTI/lib64"