UP | HOME

Python, the language

Table of Contents

Pseudocode which runs. – Peter Norvig (?)

The best program to do a job is one which already ships the solution.

There should be one – and preferably only one – obvious way to do it.

– Aphorism 13 in the Zen of Python by Tim Peters:

Python is nice, sure. But only until it stats warping your mind in the very late of the game. See https://www.draketo.de/proj/py2guile/ for an insightful reference.

When you start thinking about using code-templates in your editor to comply with the requirements of your language, then it is likely that something is wrong with the language.

1 Tools & Reference

1.1 Emacs support

Install the elpy package. It provides:

  • C-c C-c runs the shell and send the current buffer
  • C-c C-d runs elpy-doc
  • C-c C-t runs elpy-test, which runs the unittest discover

To enable linter python in emacs, use pylint. It will use pylint executable. And it also needs the configure file. Generate it:

pylint --generate-rcfile > ~/.pylintrc

1.2 pip mirror

to use one-time, simply:

pip3 install xxx -i https://pypi.mirrors.ustc.edu.cn/simple/

# don't need this when using https
# --trusted-host pypi.mirrors.ustc.edu.cn

global configuration seems to be:

pip3 config set global.index-url https://mirrors.ustc.edu.cn/pypi/web/simple

In China, pytorch cannot be installed due to 1. large 2. cpu only has a specific url. Thus I'm using conda mirror https://mirrors.tuna.tsinghua.edu.cn/help/anaconda/

2 Language

2.1 data type

  • type(obj): get the type of obj

Numerical functions:

  • abs(x): absolute value
  • divmod(a,b): a pair (a // b, a % b)
  • max(arg1, arg2, *args)
  • min(arg1, arg2, *args)
  • pow(x,y): xy
  • round(x, ndigits=0)
  • sum(iterable)

Boolean:

  • all(iterable): true if all items are true. empty => True
  • any(iterable): true if any item is true. empty => False
  • cmp(x,y)
    • x<y => negative
    • x=y => 0
    • x>y => positive

2.1.1 conversion

  • chr(): ASCII to char
  • ord(): char to ASCII
  • float(x)
  • long(x)
  • bool(x): convert x to bool
  • int(): string to integer
  • str(): integer to string
    • hex(x): convert integer to lowercase hex string prefix with '0x'
    • oct(x): integer to octal string
    • bin(x): an integer to binary string

2.2 Scoping

There're four levels:

  • current scope
  • parent scope
  • module scope (global)
  • built-in scope

nonlocal keyword specify this variable should be referenced to the parent scope. But, this will not reach global. Instead, the global keyword declares the listed variables to be in the module level scope.

The nonlocal statement causes the listed identifiers to refer to previously bound variables in the nearest enclosing scope excluding globals.

As an example:

var = 0 # global

def outer():
  var = 1 # parent
  def inner():
    nonlocal var
    var = 2 # local
    global var
    var =3
  inner()
  # var = 2

outer()
# global var = 3

2.3 Conditional

If else or:

var = d.get('key') or 0
# is equal to:
var = d.get('key') if d.get('key') else 0

2.4 Loop

  • len(s): length
  • next(iterator)
  • range(stop): [0,stop)
  • range(start, stop, step=1)

2.5 Function

2.5.1 Function def and call

The default value of an argument is evaluated once at the function definition. Thus, the object is shared for all the invoke of the function. This is typically not desired behavior.

def foo(a=[]):
    a.append(3)
    return a
foo()
foo()
# => [3,3] !!!

Python function pass-by-object. If you pass a list, you can modify the list, and the original list is modified.

a = [1,2]
def foo(x):
    x.append(3)
foo(a)
a # => [1,2,3]

2.5.2 Lambda

lambda x : x+2
lambda x: x%2==0

The usage of lambda is often in map and filter.

  • map(lambda_exp, mylist) will execute the lambda expression on each element of the list, and return a list containing the results.

2.5.3 variadic parameter

use *args syntax, and args will be a tuple:

  def foo(*args):
    for a in args:
      print a

use **args to capture all keyword arguments.

def bar(**kwargs):
  for a in kwargs:
    print a, kwargs[a]

Combine them together:

def foobar(kind, *args, **kwargs):
  pass

Also, there's a concept for the reverse thing: unpack argument list from a list, with *list:

def foo(a,b):
  pass

l = [1,2]
foo(*l)

on python3, this syntax can appear on left side

first, *rest = [1,2,3,4]
first,*l,last = [1,2,3,4]

2.6 Meta Programming

Basically eval (return value) and exec (no return value), with either string or code object created by compile. They can use the names bound by current namespace.

eval("1+2")
a=2
eval("1+a")
def foo(a):
    return a+3
eval("foo(a)")
# no return
exec("foo(a)")
eval(compile("1+a", '', 'eval'))

2.7 Exception

To give a quick feel:

try:
  pass
except TypeError as e: # capture the exception into a variable
  pass
except AnotherError: # does not capture
  pass
except: # all exception
  pass
else: # if doesn't raise an exception
  pass
finally:
  pass

2.7.1 Built-in exceptions

BaseException
 +-- SystemExit
 +-- KeyboardInterrupt
 +-- GeneratorExit
 +-- Exception
      +-- StopIteration
      +-- StandardError
      |    +-- BufferError
      |    +-- ArithmeticError
      |    |    +-- FloatingPointError
      |    |    +-- OverflowError
      |    |    +-- ZeroDivisionError
      |    +-- AssertionError
      |    +-- AttributeError
      |    +-- EnvironmentError
      |    |    +-- IOError
      |    |    +-- OSError
      |    |         +-- WindowsError (Windows)
      |    |         +-- VMSError (VMS)
      |    +-- EOFError
      |    +-- ImportError
      |    +-- LookupError
      |    |    +-- IndexError
      |    |    +-- KeyError
      |    +-- MemoryError
      |    +-- NameError
      |    |    +-- UnboundLocalError
      |    +-- ReferenceError
      |    +-- RuntimeError
      |    |    +-- NotImplementedError
      |    +-- SyntaxError
      |    |    +-- IndentationError
      |    |         +-- TabError
      |    +-- SystemError
      |    +-- TypeError
      |    +-- ValueError
      |         +-- UnicodeError
      |              +-- UnicodeDecodeError
      |              +-- UnicodeEncodeError
      |              +-- UnicodeTranslateError
      +-- Warning
           +-- DeprecationWarning
           +-- PendingDeprecationWarning
           +-- RuntimeWarning
           +-- SyntaxWarning
           +-- UserWarning
           +-- FutureWarning
	   +-- ImportWarning
	   +-- UnicodeWarning
	   +-- BytesWarning

2.8 Module

Exposing API: the following only expose foo but not bar.

__all__ = ['foo']
def foo():
  pass
def bar():
  pass

2.8.1 importing

The local structure directory must contain the __init__.py file to be able to import.

|-- main.py
|-- mypackage
    |-- __init__.py
    |-- a.py
    |-- b.py
    |-- subdir
        |-- __init__.py
        |-- c.py

The import statements should be:

from mypackage import a
from mypackage.b import foo as myfoo
from mypackage.subdir import c
export PYTHONPATH="$PYTHONPATH:/home/hebi/github/reading/models"

Add some path so that I can import from there:

sys.path.append('/home/hebi/github/reading/InferSent/')
# assume in root of that directory, models.py defines InferSent class
from models import InferSent

Packaging:

setup.py:

from setuptools import setup, find_packages
setup(
    name="InferSent-Mirror",
    version="0.1",
    # packages=find_packages(),
    packages=['p1', 'p2'],
)

Directory structure:

mypackage/
  p1/
    __init__.py
    xxx.py
  p2/
    __init__.py
    yyy.py

Install locally:

python3 setup.py install --user

Install from git repo:

pip install --user git+https://github.com/lihebi/InferSent

Import:

from p1 import xxx
from p2.yyy import foo

3 Collections

3.1 List

3.1.1 TODO tuple

3.1.2 TODO sorted

sort a dictionary by value:

sorted(dict1, key=dict1.get) # => list
sorted(dict1, key=dict1.get, reverse=True)

3.1.3 Slicing

The slicing syntax is l[start:end:step]. The slicing will return a new list. Change to that list will not change the original one.

l[4]
l[4:]
l[::2]
l[:-1]

However, assign to the slicing itself will change the original one:

l[1:2] = [4,5,6]

Also, assign to a new variable only assign the reference:

a = [1,2,3]
b = a # only a reference

3.1.4 create a list

  • range(stop)
  • range(start, stop[, step])

Creating a matrix:

newmat=[[-1 for x in range(height)] for y in range(width)]

list comprehension

even_squares = [x**2 for x in l if x%2 == 0]

3.1.5 Modify a list

  • list.append
  • list.pop

3.1.6 List object model

Lists are mutable. The behavior of slicing is a bit confusing. If the slicing is used directly as the target of an assignment statement, it will modify the object in place. E.g.

a = [1,2,3,4]
a[1:3] = []
a # => [1,4]

That also means all other references to a will be modified:

a = [1,2,3,4]
a[1:3] = []
# although tuple is immutable, it can still contain reference to
# mutable objects.
c=(a,)
# this will also modify a
a.append(5)
c # => ([1,4,5])

However, if the slicing is assigned to another variable (either assignment or pass-by-object function call), it is copied. Modifying this copy will not affect the original list.

a = [1,2,3,4]
b = a[1:3]
b[0] = 9
a # => [1,2,3,4]
def foo(x):
    x[1] = 8

# changing b
foo(b)
b # => [9,8]
a # => [1,2,3,4]

If you convert a list to a tuple, the elements are shallow-copied.

a = [1,2,3]
b = [a]
# this is shallow copied. Still contains reference to the object "a"
c = tuple(b)
# no reference anymore, just a tuple of (1,2,3). Will never change
# whatsoever.
d = tuple(a)

# testing:
a[2] = 8
b # => [[1,2,8]]
c # => [[1,2,8]]
d # => [1,2,3]

String is immutable sequence, thus cannot be assigned. Thus it is fairly safe to use string.

3.2 String

3.2.1 Concatenation

  • concatenate two strings directly by +.
  • need to convert integer to string before concatenate: s + str(35)
  • "".join(lst) works

3.2.2 split

str.split(sep=None)
default by white space
str.strip()
strip out white space at both begin and end
str.replace(old, new)
replace all.
str.startswith(s)
str.endswith(s)

3.2.3 Slicing

String is an immutable object. It can use slicing. E.g. reversing a string is as easy as "hello"[::-1]!

However, notice that when using a negative step, the slicing should be lst[end:begin:-1]. This is because x = i + n*k:

with a third “step” parameter: a[i:j:k] selects all items of a with index x where x = i + n*k, n >= 0 and i <= x < j.

Also, the negative step does not always work as expect. E.g. the i index is included and j is not; the j can not be negative, then how can I include the first one in the list??

Thus if want to get a reverse of a sub-string, I would get sub-string first and then reverse it.

3.3 Dictionary

Create:

x = {'a': 1, 'b': 2}

Dictionary is not sorted. Use collections.OrderedDict if you want this feature. Basically it remember the order when the elements are inserted.

import collections
od = collections.OrderedDict(sorted(d.items()))

Merge two dictionary (x and y):

z = x.copy()
z.update(y)

3.3.1 Set

s = set()
s.add(x)
if x in s:
  pass

4 Standard Library

4.1 Operating System

4.1.1 Env

  • os.environ['HOME']
  • os.getenv(name)
  • os.putenv(name, value)
  • os.unsetenv(name)

4.1.2 Shell command

os.system
simply run command
os.system("some command")
os.popen
access to input output
stream = os.popen("some command")
stream.read()
  • subprocess.Popen
p = subprocess.Popen("echo Hello World", shell=True, stdout=subprocess.PIPE)
p.stdout.read()
s = subprocess.check_output('wc -l', stdin=p.stdout)
subprocess.call
this is the same as subprocess.Popen except that it waits and gives return code.
return_code = subprocess.call("echo Hello World", shell=True, stdout=subprocess.DEVNULL)

4.1.3 Process

  • os.abort()
  • os.execl(path, arg0, arg1, …)
  • os.execle(path, arg0, arg1, …, env)
  • os.execlp(file, arg0, arg1, …)
  • os.execlpe(file, arg0, arg1, …, env)
  • os.execv(path, args)
  • os.execve(path, args, env)
  • os.execvp(file, args)
  • os.execvpe(file, args, env)
  • os.folk
  • os.wait()
  • os.system(cmd): run cmd, return exit code
  • os.times(): 5-tuple
    • user time
    • system time
    • childrens user time
    • childrens system time
    • elapsed real time

4.2 IO

4.2.1 File IO

Reading:

  • read()
  • readline(size=1)
  • readlines()

Seeking:

  • seek(offset=0)
    • 0 start
    • 1 current
    • 2 end
  • tell(): current position

Writing:

  • write(s): finally the string!
  • writelines(lines): write a list of lines
  • flush()
  f = open('text.txt')
  f.read() # return all content

  f = open('text.txt')
  for line in f:
      print(line)

  with open('a.txt') as f:
      for line in f:
          print(line)

Other IO:

  • f = io.StringIO("some string"): in memory text stream
  • f = io.BytesIO(b"some binary data \x00\x01")

4.2.2 Printing

  • pprint.pprint(object, stream=None): pretty print
  • 'string {0}, {hello}'.format('yes', hello=2)
print('xxx', end='')

read from stdin:

for line in sys.stdin:
  print(line)

4.2.3 redirect stdout

from contextlib import redirect_stdout
with open('xxx.txt', 'w') as f:
    redirect_stdout(f)

Or:

sys.stdout = f

The file handle can be:

f = open(os.devnull, 'w')

It can also be a predefined handle, like sys.stderr:

with redirect_stdout(sys.stderr):
    help(dir)

4.3 File System

4.3.1 os.walk

import os
for root,dirs,files in os.walk('.'):
  for f in files:
    print f
  • os.path.abspath('relative/path/to/file')
  • os.path.exists("/path/to/file")
  • os.rename('old', 'new')
  • os.path.isfile

4.3.2 FS Operations

  • os.getcwd(): current working directory
  • os.chdir(path): change cwd
  • os.mkdir(path)
  • os.listdir(path='.'): list all in this dir. E.g. for item in os.listdir('/path'): print (item)
  • os.makedirs(path): GOOD this is the way to go the make directories
  • os.remove(path): remove a file
  • os.rmdir(): remove an empty dir.
  • os.removedirs(path): foo/bar/aaa will try to remove aaa, than bar, then foo. Don't use! To recursively remove all contents, use shutil.rmtree
  • os.rename(src, dst)
  • os.renames(old, new)
  • os.rmdir(path): only work if dir is empty
  • os.tempnam(): a reasonable absolute name for creating temporary file
    • seems to be vulnerable
  • os.walk(top, topdown=True): for each directory including top itself, it yields 3-tuple (dirpath, dirnames, filenames). E.g. for root,dirs,files in os.walk('/path'): for f in files: print (f);

4.3.3 shutil

  • copy(src,dst)
  • copytree(src, dst): recursive
  • rmtree(path): rm -r
  • move(src, dst)

popen family is deprecated. Use subprocess.

4.3.4 os.path

If parameter is not listed, it means a single path.

  • exists: GOOD. check whether a path exists
  • split: return a pair (head, tail). tail is the last component, without slash. If path ends with slash, tail is empty
    • basename: the tail of the split output
    • dirname: head of split output
  • normpath: collapse redundant separators and up level references
  • abspath: from relative to absolute path. normpath(join(os.getcwd(), path))
  • commonprefix(list): return the longest path prefix
  • expanduser: replace the initial component of ~ by the users directory.
  • getsize: in bytes
  • isabs: predicate for absolute
  • isfile:
  • isdir
  • islink
  • join(path, *paths): join intelligently
  • realpath: canonical path by following symbolic links

4.3.5 pathlib

Object-oriented filesystem paths. https://docs.python.org/3/library/pathlib.html

pathlib.Path is the class. pathlib.PosixPath is a subclass for non-windows paths, but seems just for implementation purpose, makes no contribution for user.

Actually not very interesting, this table tells everything:

os and os.path pathlib
os.path.abspath() Path.resolve()
os.chmod() Path.chmod()
os.mkdir() Path.mkdir()
os.rename() Path.rename()
os.replace() Path.replace()
os.rmdir() Path.rmdir()
os.remove() , os.unlink() Path.unlink()
os.getcwd() Path.cwd()
os.path.exists() Path.exists()
os.path.expanduser() Path.expanduser() and Path.home()
os.path.isdir() Path.isdir()
os.path.isfile() Path.isfile()
os.path.islink() Path.issymlink()
os.stat() Path.stat(), Path.owner(), Path.group()
os.path.isabs() PurePath.isabsolute()
os.path.join() PurePath.joinpath()
os.path.basename() PurePath.name
os.path.dirname() PurePath.parent
os.path.samefile() Path.samefile()
os.path.splitext() PurePath.suffix

Some interesting APIs that don't have counterparts:

  • Path.glob(pattern) that returns a list of all files matching the shell pattern, e.g. p.glob('*/*.py')
  • slash operator: you can directly use p / 'foo' / 'bar'
  • Path.iterdir() gives a list of directory items
  • Path.parts gives a list of string

4.3.6 TODO tempfile

  • mkstemp creates temp file, but this file is opened. The return value is the file descriptor (int) of the opened file, the same as that gets returned by os.open, thus not easy to work with
  • mkdtemp creates temp dir. I would just use this when creating temporary files.
folder = tempfile.mkdtemp()
fd, fname = tempfile.mkstemp()

4.4 unittest

class MyTest(unittest.TestCase):
    def test_me(self):
        self.assertEqual(1,2)
unittest.main()

python unit test can support automatic test discovery. To use that, the file must be named test_xxx.py, and run the python -m unittest discover.

4.5 time

Create time object:

  • time.sleep(secs)
  • time.time(): time in seconds since epoch
  • gmtime(): in seconds, from epoch
  • localtime(): convert gmtime() to local
  • clock(): processor time as floating number in seconds

The returned time object is class time.struct_time: returned by gmtime(), localtime() and strptime(). Time to format string:

  • strptime(string[, format]): parse a string into time object
    • format default: "%a %b %d %H:%M:%S %Y"
    • time.strptime("30 Nov 00", "%d %b %y")
  • strftime(format[, t]): convert from time object to string
    • %a/A: abbr/full weekday name
    • %b/B: abbr/full month name
    • %Y: year
    • %m: month [01,12]
    • %d: day of the month [01,31]
    • %H: 24-hour [00,23]
    • %I: 12-hour [01,12]
    • %p: AM or PM
    • %M: Minute [00,59]
    • %S: second [00,61]

4.5.1 datetime

  • date has year, month, day
  • time has hour, minute, second
  • datetime has both
import datetime

t1 = datetime.date.fromisoformat('2019-12-04')
t2 = datetime.date.fromisoformat('2018-11-24')

delta = t1 - t2
delta.days

t3 = datetime.date.today()
t4 = datetime.date.date(2019, 12, 20)

t0 = datetime.date.fromtimestamp(time.time())

4.6 csv

import csv
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

import csv
with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(row)
    writer.writerows(someiterable)

4.7 Json

import json
json.dumps({"C": 0, "D": 1})
json.loads("a string of json")

json.dump(obj, fp, indent=2)
json.load(fp)

4.8 argparse

import argparse
parser = argparse.ArgumentParser(descripton='Description here')

parser.add_argument('-q', '--query', help='query github api', require=True)
parser.add_argument('-d', '--download', help='do download', action='store_true')

args = parser.parse_args()

The most interesting method is of course the add_argument. It accepts the name, either a single string, bar, indicating positional argument, or a string starting with -, indicating optional arguments. You can supply parser.add_argument(-f, --foo) for short and full argument. The value is stored as an attribute with the same name (i.e. bar, foo) of the result, but you can change it to anther name via dest argument.

An action defines what to do with the argument. It is a string (!!!). The default is 'store', meaning store the supplied value to the result. If you don't need the value, but just want to know if the option is supplied, use store_true or store_false, which differ only in default value. The action append will collect each occurrence of the argument into a list.

By default, each option consume one argument. You can change this by the argument nargs. If it is an integer, it means how many should be consumed. The result will be a list, thus in case of 1, it is still different from default. It can be a string '*', '+', '?', which conforms to the regular expression meaning of them. * and + produce a list, + will get give error when no arguments are provided, ? will use default if missing.

In case of missing value, the default argument can be used to supply the default value. Otherwise, it is none. You can also use required argument to make sure user supplies something. A value is by default a string, you can convert it to anther data type by the type option, accepting a data type, e.g. int. You might also want to restrict the choices of the argument, so choices is a list of allowed values.

Finally, help option can be used to provide help string, and it can be printed out using parser.print_help(). To test the parser, you can use parser.parse_args(['-f', '1', 'bar']).

4.9 Regular Expression

construction

import re
pattern = re.compile('\d+.*$')

match

s = 'this is a test string'
pattern.match(s) # return True or False

search

pattern.findall(s)

shorthand

m = re.match("[pattern]", "string")
m.group()
m = re.search("[pattern]", "string")
m.group()
re.search("pattern", "string", re.IGNORECASE)
m = re.findall("[pattern]", "string")

4.10 Concurrent programming

4.10.1 threading

from threading import Thread

class MyThread(Thread):
  def __init__(self, arg):
    Thread.__init__(self)
    self.arg = arg
  def run(self):
    pass

t = MyThread(arg)
t.start()

The package name is threading, the object is Thread.

Functions

  • threading.activecount(): number of Thread object
  • threading.currentthread(): current Thread object
  • threading.enumerate(): return a list of all Thread objects
  • threading.meain(): the main Thread object
  • threading.local(): the instance of local storage. Different for different threads. Typical usage: mydata = threading.local()

Two ways to specify what to run:

  • pass a callable object to the target argument when constructing Thread
  • define a subclass of Thread and override the run method.

Methods:

  • start: start the thread. It will call run method in a separate thread. The thread terminate when run terminate
  • join(timeout=None): the calling thread will block until this thread terminate
    • timeout should be float in seconds
  • is_alive: test whether the thread terminate

4.10.2 Thread Sync

class threading.Lock

  • acquire()
  • release()

class threading.RLock

  • this is recursive lock. The same thread can acquire the lock multiple times. They will be nested and only when the last release is called, the lock can be acquired by another thead
  • acquire()
  • release()

class threading.Condition(lock=None)

  • the lock must be a Lock or RLock. If none, a RLock is created
  • acquire()
  • release()
  • wait(timeout=None): wait until notified
    • release underlying lock
    • block until notify
    • re-acquire the lock and return
    • typical usage: while not item_is_available(): cv.wait()
    • often use with statement: =with cv: cv.waitfor(pred); get();
  • waitfor(predicate, timeout=None)
    • this is same as while not predicate(): cv.wait(), thus more convenient than wait
  • notify(n=1): notify one thread
  • notifyall(): notify all threads waiting on this condition

class threading.Semaphore: this class manage resources with limited capacity.

  • acquire(): decrease capacity
  • release(): increase capacity

class threading.Event

  • isset():
  • set(): set flag to true
  • clear(): set flag to false
  • wait(timeout=None): block until internal flag is true

class threading.Timer(interval, function) : Thread

  • interval is float in seconds, function is callable. use start method to start the thread, and the function will be called after the delay.
  • cancel(): stop the timer and cancel the execution. Only work if the the timer is still waiting.

class threading.Barrier(parties, action=None, timeout=None)

  • parties is integer. Every thread calling wait will block, until parties number of such call is called. Then all players unblock and do things simultaneously.
  • wait(timeout=None)
  • reset(): reset the barrier. The thread waiting for it will receive BrokenBarrierError
  • abort(): all current and future wait call for it will get BrokenBarrierError
  • parties: number of parties
  • nwaiting: number of current waiting
  • broken: True or False
4.10.2.1 Using with statement

Lock, RLock, Condition, Semaphore can be used.

with somelock:
  # do somthing

is equivalent to:

somelock.acquire()
try:
  # do something
finally:
  somelock.release()

4.10.3 multiprocessing

This provide multiprocessing.Process class, having similar API with Thread. It seems to use fork but don't have explicit exec on the document?? Wired and seems just do something thread can do (except the sharing of memory of course).

4.10.4 subprocess

  • subprocess.run(args, *, stdin=None, input=None, stdout=None, stderr=None, shell=False, timeout=None, check=False)
    • run the command and wait for it to complete. Return a CompleteProcess instance.
    • if check is True, raise CalledProcessError exception if return code non-zero. This replace the checkcall and checkoutput.

class subprocess.CompletedProcess

  • args
  • returncode
  • stdout: captured if PIPE is passed to stdout
  • stderr: captured if PIPE is passed to stderr
  • checkreturncode(): if returncode is non-zero, raise CalledProcessError

Variables:

  • subprocess.DEVNULL
  • subprocess.PIPE
  • subprocess.STDOUT: this is only used in the place of stderr to redirect it to stdout

class subprocess.CalledProcessError

  • returncode
  • cmd
  • output: same as stdout
  • stdout
  • stderr

The followings are from 2.7, now only use run.

  • subprocess.call(args, *, stdin=None, stdout=None, stderr=None, shell=False)
    • args: a list of argument, including arg0
    • it can also be a string due to that *
    • it will wait, then return returncode
    • do not use stdout=PIPE, use communicate() instead TODO
    • use shell=True is bad, but it can give me
      • shell pipes
      • filename wildcard
      • env variable expansion
      • ~ expansion
  • checkcall(args, *, …): same as call, except it will raise exception if return non-0
  • checkoutput(args, *, stdin=None, stderr=None, shell=False, universalnewlines=False)
    • if return non-0, raise exception. Otherwise return the stdout

Popen object

  • Popen constructor
    • args, bufsize=0, executable=None,
    • stdin=None, stdout=None, stderr=None,
    • preexecfn=None, closefds=False,
    • shell=False, cwd=None, env=None,
    • universalnewlines=False, startupinfo=None, creationflags=0
  • Popen.poll(): check if child process has terminated. Set and return returncode.
  • Popen.wait(): wait for process to terminate. Don't use PIPE with this.
  • Popen.communicate(input=None): to use this, the corresponding stdin, stdout, stderr should be set to PIPE.
    • send data to stdin (string)
    • read data from stdout and stderr (it returns a tuple (out, err))
    • wait for termination
  • Popen.snedsignal(signal)
  • Popen.terminate(): send SIGTERM
  • Popen.kill(): send SIGKILL
  • Popen.pid
  • Popen.returncode
    • set by poll and wait (and indirectly by communicate)
    • None indicate hasn't terminated
    • -N means terminated by signal N

5 Third party libraries

5.1 urllib

from urllib import request
import json

url = 'https://api.github.com'
api = '/search/repositories'
query = 'language:C&stars:>10&per_page='+size
response = request.urlopen(url+api+"?q="+query)

s = response.read().decode('utf8')
j = json.loads(s)
# j will be a mix of list and dict

5.1.1 urllib.request

package urllib.request

Functions

  • urlopen(url, data=None)
    • url can be a string or Request object
    • for http and https, returns a http.client.HTTPResponse object
    • for FTP, file, data urls, return a urllib.response.addinfourl object
  • pathname2url(path): do quoting
  • url2pathname(path): do unquoting

class Request

  • constructor: (url, data=None, headers={}, method=None)
    • url: a string
    • headers: a dictionary.
    • method: a string. 'GET' is default. Available values: 'HEAD', 'POST'

methods:

  • getmethod()
  • addheader(key, val)
  • hasheader(key)
  • getheader(key)
  • removeheader(key)
  • getfullurl()
  • headeritems(): return a list of tuples (key, value)
  req = request.Request(query)
  req.add_header("Authorization", "token " + token)
  response = request.urlopen(req)
  s = response.read().decode('utf8')
  langj = json.loads(s);
  # deprecated
  urllib.request.urlretrieve(url[, filename])

5.1.2 urllib.parse

  • quote(string)
  • quoteplus(string)
  • unquote(string)
  • unquoteplus(string)
  • urlencode(query)

5.2 XML

import xml.etree.ElementTree as ET
root = ET.fromstring(s)
# XPath
nodes = root.findall('{http://www.sdml.info/srcML/src}function')
for node in nodes:
  # do with node
  pass

APIs

  • node.find(XPath)
  • node.findall(XPath)
  • node.get(Attribute)
  • node.text

5.4 BeautifulSoup

The package is called BeautifulSoup4.

The preface to use the package:

from bs4 import BeautifulSoup
BeautifulSoup('<html>string</html>')
with open('a.html') as fp:
    BeautifulSoup(fp)

Each node can be used as a data structure, with the following fields:

  • name: the tag name
  • string: the (first?) string directly embedded inside the node
  • strings: a list of the strings
  • a-tag: the first child that is of that tag
  • attrs: a list of all attribute names
  • children: going downwards
  • descendants: intuitive
  • parent
  • parents: wow, this should be called ancestor?
  • next_sibling, previous_sibling

It can also be used as a dictionary of its attributes, e.g. s['href']. This should be a string. It is equivalent to using the get method with the class name.

Several methods are of particular interests.

  • get_text(): return all text in the node

You can also execute a query on it. In general, find_all returns a list, while find returns the first one. There are also some methods in this family, namely find_next_siblings, find_parents. E.g.

  • s.find_all('a'): return a list of all 'a' tag nodes

Or it can be a query respecting css id and classes. Although find has some support for id and class, the select is easier to use.

  • s.select("body a"): non-direct
  • s.select("p > a"): direct
  • s.select(p.c#id): class and id
  • s.select(p > #id): mix
  • s.select(a[href^=xxx]): filtering based on attribute values

5.6 pandas

Looks like it is a dataframe library

5.7 numpy

C-implementation of multi-dimensional arrays

5.8 scipy

scitific computing algorithms, including:

  • linaer algebra
  • optimization
  • interpolation
  • integration and differential equation
  • clustering algorithms
  • statistical distributions

5.9 scikit-learn

Learning library.

Supervised learning:

  • linear models
  • SVM
  • Gaussian Processes
  • Naive Bayes
  • Decision Trees
  • KNN

Unsupervised learning:

  • Gaussian Mixture Models
  • Manifold learning
  • clustering
    • k-means

Other topics

  • Ensemble methods
  • Feature Selection
  • Outlier detection
  • model selection
    • grid search
    • cross validation

5.10 matplotlib

import matplotlib.pyplot as plt

Reference

5.10.1 type of figures

  • plt.bar
  • plt.scatter
  • plt.plot: line plot
  • plt.hist
  • plt.pie
plt.plot([1,2,3,4])

Image via plt.imshow():

# plot a mnist digit
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# since the data is just an array (28,28), imshow must have converted
# it to image pixel properly
plt.imshow(x_train[7777], cmap='Greys')
# must call plt.show() to open the figure window. Or, execute
# %matplotlib in the REPL, you can get the image directly after
# imshow().
plt.show()

5.10.2 TODO plot options

5.10.3 legends, axis, more settings

Texts:

  • plt.xlabel()
  • plt.ylabel()
  • plt.title()
  • plt.axis()
  • plt.text()
  • plt.annotate
  • plt.grid(True)
  • plt.table(): attach a table to an axis!

Scale:

  • plt.xscale('linear')
  • plt.yscale('log')

5.10.4 Subplots

plt.ioff()
figure = plt.figure()
figure.canvas.set_window_title('My Grid Visualization')
for x in range(height):
    for y in range(width):
        # print(x,y)
        figure.add_subplot(height, width, x*width + y + 1)
        plt.axis('off')
        plt.imshow(convert_image_255(images[x*width+y]), cmap='gray')
# plt.show()
plt.savefig(filename)

Or better, create figure and axis, and plot for each axis:

import matplotlib.pyplot as plt
import numpy as np

np.random.seed(19680801)
data = np.random.randn(2, 100)

fig, axs = plt.subplots(2, 2, figsize=(5, 5))
axs[0, 0].hist(data[0])
axs[1, 0].scatter(data[0], data[1])
axs[0, 1].plot(data[0], data[1])
axs[1, 1].hist2d(data[0], data[1])

plt.show()

5.10.5 export to files

Visualize using OS GUI toolkit:

plt.show()

Plot to a file:

pylab.ioff()
plot([1, 2, 3])
savefig("/tmp/test.png")

5.11 imsave

imsave is deprecated, change from

from scipy.misc import imsave

to

from imageio import imwrite as imsave

5.12 Nvidia GPU setting

Select visible GPU in a multi-GPU setting:

os.environ['CUDA_VISIBLE_DEVICES'] = '3'

CUDA setup

  1. Install Nvidia driver. This can be done using Ubuntu's software center. But this is the stable version, not newest
  2. Install cuda. To /usr/local/cuda-10.0. I use the "runfile", with the --override option (otherwise throw gcc version not supported error).
  3. Install cudnn by copying header files and library files into /usr/local/cuda-10.0
  4. Configure
CUDA_PATH=/usr/local/cuda-10.0
export LD_LIBRARY_PATH="$CUDA_PATH/lib64:$LD_LIBRARY_PATH"
export PATH="$CUDA_PATH/bin:$PATH"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$CUDA_PATH/extras/CUPTI/lib64"