Python, the language
Table of Contents
- 1. Tools & Reference
- 2. Language
- 3. Collections
- 4. Standard Library
- 5. Third party libraries
Pseudocode which runs. – Peter Norvig (?)
The best program to do a job is one which already ships the solution.
There should be one – and preferably only one – obvious way to do it.
– Aphorism 13 in the Zen of Python by Tim Peters
Python is nice, sure. But only until it starts warping your mind late in the game. See https://www.draketo.de/proj/py2guile/ for an insightful reference.
When you start thinking about using code-templates in your editor to comply with the requirements of your language, then it is likely that something is wrong with the language.
1 Tools & Reference
1.1 Emacs support
Install the elpy package. It provides:
- C-c C-c: runs the shell and sends the current buffer
- C-c C-d: runs elpy-doc
- C-c C-t: runs elpy-test, which runs unittest discover
To enable Python linting in Emacs, use pylint. This needs the pylint executable and a configuration file. Generate the latter with:
pylint --generate-rcfile > ~/.pylintrc
1.2 pip mirror
- ustc mirror: http://pypi.mirrors.ustc.edu.cn/simple/
To use it for a single install:
pip3 install xxx -i https://pypi.mirrors.ustc.edu.cn/simple/
# not needed when using https:
# --trusted-host pypi.mirrors.ustc.edu.cn
The global configuration seems to be:
pip3 config set global.index-url https://mirrors.ustc.edu.cn/pypi/web/simple
In China, PyTorch is hard to install this way because (1) the package is large and (2) the CPU-only build lives at its own specific URL. Thus I'm using the conda mirror https://mirrors.tuna.tsinghua.edu.cn/help/anaconda/
2 Language
2.1 data type
- type(obj): get the type of obj
Numerical functions:
- abs(x): absolute value
- divmod(a,b): a pair (a // b, a % b)
- max(arg1, arg2, *args)
- min(arg1, arg2, *args)
- pow(x,y): x**y
- round(x, ndigits=0)
- sum(iterable)
Boolean:
- all(iterable): true if all items are true. empty => True
- any(iterable): true if any item is true. empty => False
- cmp(x,y) (Python 2 only; removed in Python 3)
- x<y => negative
- x==y => 0
- x>y => positive
2.1.1 conversion
- chr(): ASCII code to char
- ord(): char to ASCII code
- float(x)
- long(x) (Python 2 only; int covers it in Python 3)
- bool(x): convert x to bool
- int(): string to integer
- str(): integer to string
- hex(x): convert integer to lowercase hex string prefixed with '0x'
- oct(x): integer to octal string
- bin(x): integer to binary string
2.2 Scoping
There're four levels:
- current scope
- parent scope
- module scope (global)
- built-in scope
The nonlocal keyword specifies that the variable refers to the one in the parent scope. But this will not reach global. Instead, the global keyword declares the listed variables to be in the module-level scope.
The nonlocal statement causes the listed identifiers to refer to previously bound variables in the nearest enclosing scope excluding globals.
As an example:
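var = 0 # global
def outer():
    var = 1 # parent
    def inner():
        nonlocal var # refers to outer's var
        var = 2
    inner()
    return var # => 2
def set_global():
    # a name cannot be both nonlocal and global in the same function,
    # so the global rebinding lives in its own function
    global var
    var = 3
outer() # => 2, module-level var is still 0
set_global() # global var = 3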
2.3 Conditional
A conditional fallback can be written with or:
var = d.get('key') or 0
# is equal to:
var = d.get('key') if d.get('key') else 0
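One thing worth keeping in mind about the or idiom above: it falls back on any falsy value, not only on a missing key. A small sketch, with a made-up settings dictionary as the assumption:

```python
# Hypothetical data: "retries" is present but set to 0.
settings = {"retries": 0, "name": ""}

# `or` falls back whenever the left side is falsy (0, "", [], None, ...).
retries_or = settings.get("retries") or 3

# dict.get with a default only falls back when the key is missing.
retries_get = settings.get("retries", 3)

# An explicit None check keeps falsy-but-valid values and still
# supplies a fallback for missing keys.
timeout = settings.get("timeout")
timeout = 30 if timeout is None else timeout

print(retries_or, retries_get, timeout)
```

So or is fine when every falsy value should be replaced; otherwise dict.get(key, default) or an explicit None check is safer.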
2.4 Loop
- len(s): length
- next(iterator)
- range(stop): [0,stop)
- range(start, stop, step=1)
2.5 Function
2.5.1 Function def and call
The default value of an argument is evaluated once, at function definition. The resulting object is therefore shared across all invocations of the function. This is typically not the desired behavior.
def foo(a=[]):
    a.append(3)
    return a
foo()
foo() # => [3,3] !!!
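The usual way around the shared default is a None sentinel. The sketch below contrasts the two; the function names are made up for illustration:

```python
def append_shared(item, bucket=[]):
    # The same list object is reused on every call without `bucket`.
    bucket.append(item)
    return bucket

def append_fresh(item, bucket=None):
    # None is the sentinel; a new list is created on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

shared_first = list(append_shared(1))   # snapshot after the first call
shared_second = list(append_shared(2))  # snapshot after the second call
fresh_first = append_fresh(1)
fresh_second = append_fresh(2)
print(shared_first, shared_second, fresh_first, fresh_second)
```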
Python passes arguments by object reference. If you pass a list, the function can modify it, and the caller's list is modified.
a = [1,2]
def foo(x):
    x.append(3)
foo(a)
a # => [1,2,3]
2.5.2 Lambda
lambda x: x+2
lambda x: x%2==0
Lambdas are often used with map and filter. map(lambda_exp, mylist) applies the lambda expression to each element of the list. In Python 2 it returns a list containing the results; in Python 3 it returns a lazy iterator, so wrap it in list() to get a list.
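A short sketch of lambdas with map and filter under Python 3, where both return lazy iterators. The sample list is arbitrary:

```python
numbers = [1, 2, 3, 4, 5, 6]

# list() forces the lazy iterators into concrete lists.
plus_two = list(map(lambda x: x + 2, numbers))
evens = list(filter(lambda x: x % 2 == 0, numbers))

# A map object is consumed as it is iterated.
lazy = map(lambda x: x * x, numbers)
first_square = next(lazy)
remaining = list(lazy)

# The equivalent list comprehension combines both steps.
evens_plus_two = [x + 2 for x in numbers if x % 2 == 0]
print(plus_two, evens, first_square, remaining, evens_plus_two)
```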
2.5.3 variadic parameter
Use the *args syntax, and args will be a tuple:
def foo(*args):
    for a in args:
        print(a)
Use **kwargs to capture all keyword arguments:
def bar(**kwargs):
    for a in kwargs:
        print(a, kwargs[a])
Combine them together:
def foobar(kind, *args, **kwargs):
    pass
There is also the reverse operation: unpacking an argument list from a list, with *list:
def foo(a,b):
    pass
l = [1,2]
foo(*l)
In Python 3, this syntax can also appear on the left side of an assignment:
first, *rest = [1,2,3,4]
first, *l, last = [1,2,3,4]
2.6 Meta Programming
Basically eval (returns a value) and exec (no return value), with either a string or a code object created by compile. They can use the names bound in the current namespace.
eval("1+2")
a = 2
eval("1+a")
def foo(a):
    return a+3
eval("foo(a)")
# no return
exec("foo(a)")
eval(compile("1+a", '', 'eval'))
2.7 Exception
To give a quick feel:
try:
    pass
except TypeError as e:
    # capture the exception into a variable
    pass
except AnotherError:
    # does not capture
    pass
except:
    # all exceptions
    pass
else:
    # runs if no exception was raised
    pass
finally:
    pass
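To see which branches run in which order, here is a small sketch. ParseError and parse_port are hypothetical names invented for this example, not part of any library:

```python
class ParseError(Exception):
    """Hypothetical application-specific error."""

def parse_port(text):
    events = []          # records which branches were executed
    port = None
    try:
        value = int(text)                    # may raise ValueError
        if not 0 < value < 65536:
            raise ParseError("port out of range: %d" % value)
    except ValueError:
        events.append("not a number")
    except ParseError as e:                  # capture into a variable
        events.append(str(e))
    else:
        port = value                         # only when nothing was raised
        events.append("ok")
    finally:
        events.append("done")                # always runs
    return port, events

print(parse_port("8080"))
print(parse_port("abc"))
print(parse_port("70000"))
```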
2.7.1 Built-in exceptions
This is the Python 2.7 hierarchy. Python 3 removes StandardError and makes EnvironmentError, IOError and WindowsError aliases of OSError.
BaseException
 +-- SystemExit
 +-- KeyboardInterrupt
 +-- GeneratorExit
 +-- Exception
      +-- StopIteration
      +-- StandardError
      |    +-- BufferError
      |    +-- ArithmeticError
      |    |    +-- FloatingPointError
      |    |    +-- OverflowError
      |    |    +-- ZeroDivisionError
      |    +-- AssertionError
      |    +-- AttributeError
      |    +-- EnvironmentError
      |    |    +-- IOError
      |    |    +-- OSError
      |    |         +-- WindowsError (Windows)
      |    |         +-- VMSError (VMS)
      |    +-- EOFError
      |    +-- ImportError
      |    +-- LookupError
      |    |    +-- IndexError
      |    |    +-- KeyError
      |    +-- MemoryError
      |    +-- NameError
      |    |    +-- UnboundLocalError
      |    +-- ReferenceError
      |    +-- RuntimeError
      |    |    +-- NotImplementedError
      |    +-- SyntaxError
      |    |    +-- IndentationError
      |    |         +-- TabError
      |    +-- SystemError
      |    +-- TypeError
      |    +-- ValueError
      |         +-- UnicodeError
      |              +-- UnicodeDecodeError
      |              +-- UnicodeEncodeError
      |              +-- UnicodeTranslateError
      +-- Warning
           +-- DeprecationWarning
           +-- PendingDeprecationWarning
           +-- RuntimeWarning
           +-- SyntaxWarning
           +-- UserWarning
           +-- FutureWarning
           +-- ImportWarning
           +-- UnicodeWarning
           +-- BytesWarning
2.8 Module
Exposing an API: with the following, from module import * exposes only foo but not bar. (bar can still be imported explicitly by name.)
__all__ = ['foo']
def foo():
    pass
def bar():
    pass
2.8.1 importing
The package directory should contain an __init__.py file to be importable as a regular package.
|-- main.py
|-- mypackage
    |-- __init__.py
    |-- a.py
    |-- b.py
    |-- subdir
        |-- __init__.py
        |-- c.py
The import statements should be:
from mypackage import a
from mypackage.b import foo as myfoo
from mypackage.subdir import c
To import from a directory elsewhere, extend PYTHONPATH in the shell:
export PYTHONPATH="$PYTHONPATH:/home/hebi/github/reading/models"
Or add the path at runtime so that I can import from there:
import sys
sys.path.append('/home/hebi/github/reading/InferSent/')
# assume in root of that directory, models.py defines InferSent class
from models import InferSent
Packaging:
setup.py:
from setuptools import setup, find_packages
setup(
    name="InferSent-Mirror",
    version="0.1",
    # packages=find_packages(),
    packages=['p1', 'p2'],
)
Directory structure:
mypackage/
    p1/
        __init__.py
        xxx.py
    p2/
        __init__.py
        yyy.py
Install locally:
python3 setup.py install --user
Install from git repo:
pip install --user git+https://github.com/lihebi/InferSent
Import:
from p1 import xxx
from p2.yyy import foo
3 Collections
3.1 List
3.1.1 TODO tuple
3.1.2 TODO sorted
Sort a dictionary by value:
sorted(dict1, key=dict1.get) # => list of keys, ordered by their values
sorted(dict1, key=dict1.get, reverse=True)
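A runnable sketch of sorting by value, with made-up scores. It shows the key-only form from above and a variant that keeps the (key, value) pairs:

```python
from operator import itemgetter

scores = {"bob": 72, "alice": 91, "carol": 85}  # hypothetical data

# Iterating a dict yields keys, so this returns keys ordered by value.
keys_by_value = sorted(scores, key=scores.get)

# Sorting items() keeps the values; itemgetter(1) selects the value.
pairs_desc = sorted(scores.items(), key=itemgetter(1), reverse=True)

# Rebuilding a dict keeps that order (insertion-ordered dicts, Python 3.7+).
ranked = dict(pairs_desc)
print(keys_by_value, pairs_desc, list(ranked))
```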
3.1.3 Slicing
The slicing syntax is l[start:end:step].
Slicing returns a new list. Changes to that list will not change the original one.
l[4]
l[4:]
l[::2]
l[:-1]
However, assigning to the slice itself will change the original one:
l[1:2] = [4,5,6]
Also, assigning to a new variable only copies the reference:
a = [1,2,3]
b = a # only a reference
3.1.4 create a list
range(stop)
range(start, stop[, step])
Creating a matrix:
newmat=[[-1 for x in range(height)] for y in range(width)]
list comprehension
even_squares = [x**2 for x in l if x%2 == 0]
3.1.5 Modify a list
- list.append
- list.pop
3.1.6 List object model
Lists are mutable. The behavior of slicing is a bit confusing. If the slice is used directly as the target of an assignment statement, it will modify the object in place. E.g.
a = [1,2,3,4]
a[1:3] = []
a # => [1,4]
That also means the change is visible through all other references to a:
a = [1,2,3,4]
a[1:3] = []
# although a tuple is immutable, it can still contain references to
# mutable objects.
c = (a,)
# modifying a is also visible through c
a.append(5)
c # => ([1,4,5],)
However, if the slice is assigned to another variable (either by assignment or by a pass-by-object function call), it is copied. Modifying this copy will not affect the original list.
a = [1,2,3,4]
b = a[1:3]
b[0] = 9
a # => [1,2,3,4]
def foo(x):
    x[1] = 8
# changing b
foo(b)
b # => [9,8]
a # => [1,2,3,4]
If you convert a list to a tuple, the elements are shallow-copied.
a = [1,2,3]
b = [a]
# this is shallow copied. Still contains reference to the object "a"
c = tuple(b)
# no reference anymore, just a tuple of (1,2,3). Will never change
# whatsoever.
d = tuple(a)
# testing:
a[2] = 8
b # => [[1,2,8]]
c # => ([1,2,8],)
d # => (1,2,3)
A string is an immutable sequence, so its elements cannot be assigned. Thus it is fairly safe to pass strings around.
3.2 String
3.2.1 Concatenation
- concatenate two strings directly with +.
- an integer needs to be converted to a string before concatenating: s + str(35)
- "".join(lst) works
3.2.2 split
- str.split(sep=None): splits on whitespace by default
- str.strip(): strips whitespace at both the beginning and the end
- str.replace(old, new): replaces all occurrences
- str.startswith(s)
- str.endswith(s)
3.2.3 Slicing
A string is an immutable object, but it supports slicing. E.g. reversing a string is as easy as "hello"[::-1]!
However, notice that when using a negative step, the slice should be written lst[end:begin:-1], with the larger index first. This is because x = i + n*k:
with a third “step” parameter: a[i:j:k] selects all items of a with index x where x = i + n*k, n >= 0 and i <= x < j.
(For a negative k the bound is reversed: i >= x > j.)
Also, the negative step does not always work as expected. The i index is included and j is not, and a negative j counts from the end of the sequence, so s[i:-1:-1] never reaches index 0. To include the first element, omit j: s[i::-1].
Thus if I want the reverse of a sub-string, I would get the sub-string first and then reverse it.
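The negative-step rules above are easier to trust after running them once. A minimal sketch on an arbitrary string:

```python
s = "abcdefgh"

whole = s[::-1]            # both bounds omitted: full reversal
middle = s[5:1:-1]         # indexes 5, 4, 3, 2 (index 1 is excluded)
to_start = s[3::-1]        # omitted stop: runs down to index 0
with_none = s[3:None:-1]   # None is the explicit form of "omitted"
oops = s[3:-1:-1]          # -1 means the last index, so nothing is selected

# The two-step alternative: take the sub-string, then reverse it.
two_step = s[2:6][::-1]
print(whole, middle, to_start, with_none, repr(oops), two_step)
```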
3.3 Dictionary
Create:
x = {'a': 1, 'b': 2}
A dictionary is not sorted. Since Python 3.7 a plain dict preserves insertion order; on older versions use collections.OrderedDict if you want this feature. Basically it remembers the order in which the elements were inserted.
import collections
od = collections.OrderedDict(sorted(d.items()))
Merge two dictionaries (x and y):
z = x.copy()
z.update(y)
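A runnable version of the merge, next to the unpacking form available since Python 3.5. The sample dictionaries are made up; in both forms values from y win on duplicate keys:

```python
x = {"a": 1, "b": 2}
y = {"b": 20, "c": 30}

# copy-then-update leaves x untouched
z = x.copy()
z.update(y)

# dictionary unpacking builds the merged dict in one expression
z_unpacked = {**x, **y}
print(z, z_unpacked, x)
```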
3.3.1 Set
s = set()
s.add(x)
if x in s:
    pass
4 Standard Library
4.1 Operating System
4.1.1 Env
- os.environ['HOME']
- os.getenv(name)
- os.putenv(name, value)
- os.unsetenv(name)
4.1.2 Shell command
os.system: simply run a command
os.system("some command")
os.popen: access to the command's input or output stream
stream = os.popen("some command")
stream.read()
subprocess.Popen
p = subprocess.Popen("echo Hello World", shell=True, stdout=subprocess.PIPE)
# either read the output directly:
#   p.stdout.read()
# or, instead, feed it to another command (reading it first would drain the pipe):
s = subprocess.check_output(['wc', '-l'], stdin=p.stdout)
subprocess.call: the same as subprocess.Popen, except that it waits and gives the return code.
return_code = subprocess.call("echo Hello World", shell=True, stdout=subprocess.DEVNULL)
4.1.3 Process
- os.abort()
- os.execl(path, arg0, arg1, …)
- os.execle(path, arg0, arg1, …, env)
- os.execlp(file, arg0, arg1, …)
- os.execlpe(file, arg0, arg1, …, env)
- os.execv(path, args)
- os.execve(path, args, env)
- os.execvp(file, args)
- os.execvpe(file, args, env)
- os.fork()
- os.wait()
- os.system(cmd): run cmd, return exit code
- os.times(): 5-tuple
- user time
- system time
- children's user time
- children's system time
- elapsed real time
4.2 IO
4.2.1 File IO
Reading:
- read()
- readline(size=-1)
- readlines()
Seeking:
- seek(offset, whence=0): whence is one of
- 0 start
- 1 current
- 2 end
- tell(): current position
Writing:
- write(s): finally the string!
- writelines(lines): write a list of lines
- flush()
f = open('text.txt')
f.read() # return all content
f = open('text.txt')
for line in f:
    print(line)
with open('a.txt') as f:
    for line in f:
        print(line)
Other IO:
- f = io.StringIO("some string"): in memory text stream
- f = io.BytesIO(b"some binary data \x00\x01")
4.2.2 Printing
- pprint.pprint(object, stream=None): pretty print
- 'string {0}, {hello}'.format('yes', hello=2)
print('xxx', end='')
Read from stdin:
for line in sys.stdin:
    print(line)
4.2.3 redirect stdout
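from contextlib import redirect_stdout
with open('xxx.txt', 'w') as f:
    # redirect_stdout is a context manager; it only takes effect inside with
    with redirect_stdout(f):
        print('this goes to xxx.txt')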
Or:
sys.stdout = f
The file handle can be:
f = open(os.devnull, 'w')
It can also be a predefined handle, like sys.stderr:
with redirect_stdout(sys.stderr):
    help(dir)
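The redirect target can be any file-like object. As a sketch, the same context manager can capture printed output into an in-memory io.StringIO buffer:

```python
import io
from contextlib import redirect_stdout

buffer = io.StringIO()
with redirect_stdout(buffer):
    print("captured line")           # written into buffer, not the terminal

print("back on the real stdout")     # redirection ended with the block
captured = buffer.getvalue()
```

Unlike assigning to sys.stdout by hand, the original stream is restored automatically, even if the block raises.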
4.3 File System
4.3.1 os.walk
import os
for root, dirs, files in os.walk('.'):
    for f in files:
        print(f)
os.path.abspath('relative/path/to/file')
os.path.exists("/path/to/file")
os.rename('old', 'new')
os.path.isfile
4.3.2 FS Operations
- os.getcwd(): current working directory
- os.chdir(path): change cwd
- os.mkdir(path)
- os.listdir(path='.'): list all in this dir. E.g.
for item in os.listdir('/path'):
    print(item)
- os.makedirs(path): GOOD. This is the way to go to make directories
- os.remove(path): remove a file
- os.rmdir(): remove an empty dir.
- os.removedirs(path): foo/bar/aaa will try to remove aaa, then bar, then foo. Don't use! To recursively remove all contents, use shutil.rmtree
- os.rename(src, dst)
- os.renames(old, new)
- os.rmdir(path): only works if dir is empty
- os.tempnam(): a reasonable absolute name for creating a temporary file
- seems to be vulnerable (removed in Python 3; use the tempfile module)
- os.walk(top, topdown=True): for each directory including top itself, it yields a 3-tuple (dirpath, dirnames, filenames). E.g.
for root, dirs, files in os.walk('/path'):
    for f in files:
        print(f)
4.3.3 shutil
- copy(src,dst)
- copytree(src, dst): recursive
- rmtree(path): rm -r
- move(src, dst)
popen family is deprecated. Use subprocess.
4.3.4 os.path
If parameter is not listed, it means a single path.
- exists: GOOD. check whether a path exists
- split: return a pair (head, tail). tail is the last component, without slash. If path ends with slash, tail is empty
- basename: the tail of the split output
- dirname: head of split output
- normpath: collapse redundant separators and up-level references
- abspath: from relative to absolute path. normpath(join(os.getcwd(), path))
- commonprefix(list): return the longest path prefix
- expanduser: replace the initial component of ~ by the user's directory.
- getsize: in bytes
- isabs: predicate for absolute
- isfile
- isdir
- islink
- join(path, *paths): join intelligently
- realpath: canonical path by following symbolic links
4.3.5 pathlib
Object-oriented filesystem paths. https://docs.python.org/3/library/pathlib.html
pathlib.Path is the class. pathlib.PosixPath is a subclass for non-Windows paths, but it seems to exist just for implementation purposes and adds nothing for the user.
Actually not very interesting, this table tells everything:
os and os.path | pathlib |
---|---|
os.path.abspath() | Path.resolve() |
os.chmod() | Path.chmod() |
os.mkdir() | Path.mkdir() |
os.rename() | Path.rename() |
os.replace() | Path.replace() |
os.rmdir() | Path.rmdir() |
os.remove() , os.unlink() | Path.unlink() |
os.getcwd() | Path.cwd() |
os.path.exists() | Path.exists() |
os.path.expanduser() | Path.expanduser() and Path.home() |
os.path.isdir() | Path.is_dir() |
os.path.isfile() | Path.is_file() |
os.path.islink() | Path.is_symlink() |
os.stat() | Path.stat(), Path.owner(), Path.group() |
os.path.isabs() | PurePath.is_absolute() |
os.path.join() | PurePath.joinpath() |
os.path.basename() | PurePath.name |
os.path.dirname() | PurePath.parent |
os.path.samefile() | Path.samefile() |
os.path.splitext() | PurePath.suffix |
Some interesting APIs that don't have counterparts:
- Path.glob(pattern): yields all files matching the shell pattern, e.g. p.glob('*/*.py')
- slash operator: you can directly use p / 'foo' / 'bar'
- Path.iterdir(): yields the directory items
- Path.parts: gives a tuple of strings
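A sketch exercising the pathlib calls mentioned above inside a throwaway directory. The file and directory names are made up for illustration:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    pkg = root / "pkg" / "sub"               # slash operator joins components
    pkg.mkdir(parents=True)
    (pkg / "mod.py").write_text("x = 1\n")
    (root / "notes.txt").write_text("hello\n")

    # glob and iterdir are lazy, so collect them while the directory exists
    py_files = sorted(p.relative_to(root).as_posix()
                      for p in root.glob("*/*/*.py"))
    top_level = sorted(p.name for p in root.iterdir())

    target = pkg / "mod.py"
    name, suffix, parent_name = target.name, target.suffix, target.parent.name
    is_file, is_dir = target.is_file(), pkg.is_dir()
    tail_parts = target.parts[-3:]           # parts is a tuple of strings

print(py_files, top_level, name, suffix, parent_name, tail_parts)
```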
4.3.6 TODO tempfile
- mkstemp: creates a temp file, but this file is opened. The return value includes the file descriptor (int) of the opened file, the same as that returned by os.open, thus not easy to work with
- mkdtemp: creates a temp dir. I would just use this when creating temporary files.
folder = tempfile.mkdtemp()
fd, fname = tempfile.mkstemp()
4.4 unittest
import unittest
class MyTest(unittest.TestCase):
    def test_me(self):
        self.assertEqual(1, 2) # fails on purpose
if __name__ == '__main__':
    unittest.main()
Python unit tests support automatic test discovery. To use that, the file must be named test_xxx.py, and you run python -m unittest discover.
4.5 time
Basic functions:
- time.sleep(secs)
- time.time(): time in seconds since epoch
- gmtime(): convert seconds since epoch to a time object in UTC (defaults to now)
- localtime(): like gmtime(), but in local time
- clock(): processor time as floating number in seconds (removed in Python 3.8; use time.perf_counter() or time.process_time())
The time object is of class time.struct_time, returned by gmtime(), localtime() and strptime(). Converting between time objects and formatted strings:
- strptime(string[, format]): parse a string into time object
- format default: "%a %b %d %H:%M:%S %Y"
- time.strptime("30 Nov 00", "%d %b %y")
- strftime(format[, t]): convert from time object to string
- %a/A: abbr/full weekday name
- %b/B: abbr/full month name
- %Y: year
- %m: month [01,12]
- %d: day of the month [01,31]
- %H: 24-hour [00,23]
- %I: 12-hour [01,12]
- %p: AM or PM
- %M: Minute [00,59]
- %S: second [00,61]
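A round trip through the two functions, as a sketch. It assumes the default C locale, so that %b matches the English month abbreviation "Nov":

```python
import time

# string -> struct_time, using the example format from above
parsed = time.strptime("30 Nov 00", "%d %b %y")

# struct_time -> string in another format
iso = time.strftime("%Y-%m-%d", parsed)

# gmtime(0) is the epoch itself, expressed in UTC
epoch = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(0))
print(parsed.tm_year, iso, epoch)
```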
4.5.1 datetime
- date has year, month, day
- time has hour, minute, second
- datetime has both
import datetime
import time
t1 = datetime.date.fromisoformat('2019-12-04')
t2 = datetime.date.fromisoformat('2018-11-24')
delta = t1 - t2
delta.days # => 375
t3 = datetime.date.today()
t4 = datetime.date(2019, 12, 20)
t0 = datetime.date.fromtimestamp(time.time())
4.6 csv
# reading
import csv
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
# writing
import csv
with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(row)
    writer.writerows(someiterable)
4.7 Json
import json
json.dumps({"C": 0, "D": 1})
json.loads("a string of json")
json.dump(obj, fp, indent=2)
json.load(fp)
4.8 argparse
import argparse
parser = argparse.ArgumentParser(description='Description here')
parser.add_argument('-q', '--query', help='query github api', required=True)
parser.add_argument('-d', '--download', help='do download', action='store_true')
args = parser.parse_args()
The most interesting method is of course add_argument. It accepts the name, either a single string, bar, indicating a positional argument, or a string starting with -, indicating an optional argument. You can supply parser.add_argument('-f', '--foo') for the short and full forms. The value is stored as an attribute with the same name (i.e. bar, foo) of the result, but you can change it to another name via the dest argument.
An action defines what to do with the argument. It is a string (!!!). The default is 'store', meaning store the supplied value in the result. If you don't need the value, but just want to know if the option is supplied, use store_true or store_false, which differ only in the default value. The action append will collect each occurrence of the argument into a list.
By default, each option consumes one argument. You can change this with the nargs argument. If it is an integer, it means how many should be consumed. The result will be a list, thus in the case of 1 it is still different from the default. It can be a string '*', '+', '?', which conform to their regular expression meanings. * and + produce a list, + will give an error when no arguments are provided, ? will use default if missing.
In case of a missing value, the default argument can be used to supply the default value. Otherwise, it is None. You can also use the required argument to make sure the user supplies something. A value is by default a string; you can convert it to another data type with the type option, accepting a data type, e.g. int. You might also want to restrict the choices of the argument, so choices is a list of allowed values.
Finally, the help option can be used to provide a help string, and it can be printed out using parser.print_help(). To test the parser, you can use parser.parse_args(['-f', '1', 'bar']).
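The options described above can be tried without a command line by passing a list to parse_args. A sketch with a hypothetical parser; all argument names are made up:

```python
import argparse

parser = argparse.ArgumentParser(description="hypothetical demo parser")
parser.add_argument("bar")                                   # positional
parser.add_argument("-f", "--foo", type=int, default=0)      # converted to int
parser.add_argument("-v", "--verbose", action="store_true")  # flag, no value
parser.add_argument("-t", "--tag", action="append", dest="tags")
parser.add_argument("-p", "--point", nargs=2, type=float)    # always a list
parser.add_argument("-m", "--mode", choices=["fast", "slow"], default="fast")

args = parser.parse_args(
    ["-f", "1", "bar", "-t", "a", "-t", "b", "-p", "1.5", "2"])
defaults = parser.parse_args(["bar"])

print(args)
print(defaults)
```

Note how dest renames the attribute (tags instead of tag) and how options that were not supplied fall back to their default, or to None.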
4.9 Regular Expression
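construction
import re
pattern = re.compile(r'\d+.*$') # raw string, so \d reaches the regex engine intact
match
s = 'this is a test string'
pattern.match(s) # returns a match object, or None if the start of s does not match
search
pattern.search(s) # like match, but scans the whole string
pattern.findall(s) # list of all matches
shorthand
m = re.match("[pattern]", "string")
m.group()
m = re.search("[pattern]", "string")
m.group()
re.search("pattern", "string", re.IGNORECASE)
m = re.findall("[pattern]", "string")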
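A concrete sketch of the difference between match, search and findall. The sample text and patterns are made up:

```python
import re

pattern = re.compile(r"\d+")
text = "order 66 shipped 3 items"

at_start = pattern.match(text)       # match only looks at the start
first = pattern.search(text)         # search scans the whole string
all_numbers = pattern.findall(text)  # every non-overlapping match

# Flags and groups through the module-level shorthand.
m = re.search(r"(?P<verb>SHIPPED) (\d+)", text, re.IGNORECASE)
verb, count = m.group("verb"), int(m.group(2))

print(at_start, first.group(), first.span(), all_numbers, verb, count)
```

Since match and search return None on failure, check the result before calling group().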
4.10 Concurrent programming
4.10.1 threading
from threading import Thread
class MyThread(Thread):
    def __init__(self, arg):
        Thread.__init__(self)
        self.arg = arg
    def run(self):
        pass
t = MyThread(arg)
t.start()
The package name is threading, the object is Thread.
Functions
- threading.active_count(): number of alive Thread objects
- threading.current_thread(): current Thread object
- threading.enumerate(): return a list of all Thread objects
- threading.main_thread(): the main Thread object
- threading.local(): the instance of local storage. Different for different threads. Typical usage: mydata = threading.local()
Two ways to specify what to run:
- pass a callable object to the target argument when constructing Thread
- define a subclass of Thread and override the run method.
Methods:
- start: start the thread. It will call the run method in a separate thread. The thread terminates when run terminates
- join(timeout=None): the calling thread will block until this thread terminates
- timeout should be a float in seconds
- is_alive: test whether the thread is still running
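The first of the two ways, passing a callable as target, can be sketched as follows. The worker function and the shared dictionary are made up for illustration:

```python
from threading import Thread

results = {}

def worker(n):
    # each thread writes to its own key, so no lock is needed here
    results[n] = n * n

threads = [Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()          # block until each thread has terminated

all_finished = not any(t.is_alive() for t in threads)
print(results, all_finished)
```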
4.10.2 Thread Sync
class threading.Lock
- acquire()
- release()
class threading.RLock
- this is a recursive lock. The same thread can acquire the lock multiple times. The acquisitions are nested, and only after the last release is called can the lock be acquired by another thread
- acquire()
- release()
class threading.Condition(lock=None)
- the lock must be a Lock or RLock. If none, a RLock is created
- acquire()
- release()
- wait(timeout=None): wait until notified
- release underlying lock
- block until notify
- re-acquire the lock and return
- typical usage: while not item_is_available(): cv.wait()
- often used with the with statement: with cv: cv.wait_for(pred); get()
- wait_for(predicate, timeout=None)
- this is the same as while not predicate(): cv.wait(), thus more convenient than wait
- notify(n=1): notify one thread
- notify_all(): notify all threads waiting on this condition
class threading.Semaphore: this class manage resources with limited capacity.
- acquire(): decrease capacity
- release(): increase capacity
class threading.Event
- is_set():
- set(): set flag to true
- clear(): set flag to false
- wait(timeout=None): block until internal flag is true
class threading.Timer(interval, function) : Thread
- interval is a float in seconds, function is a callable. Use the start method to start the thread, and the function will be called after the delay.
- cancel(): stop the timer and cancel the execution. Only works if the timer is still waiting.
class threading.Barrier(parties, action=None, timeout=None)
- parties is an integer. Every thread calling wait will block until parties threads have made that call. Then all of them unblock and proceed simultaneously.
- wait(timeout=None)
- reset(): reset the barrier. The thread waiting for it will receive
BrokenBarrierError
- abort(): all current and future wait call for it will get
BrokenBarrierError
- parties: number of parties
- n_waiting: number of threads currently waiting
- broken: True or False
4.10.2.1 Using with statement
Lock, RLock, Condition, Semaphore can be used.
with somelock:
    # do something
    pass
is equivalent to:
somelock.acquire()
try:
    # do something
    pass
finally:
    somelock.release()
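A sketch of two of the primitives above driven through the with statement: a Lock guarding a shared counter, and a Condition handing items from a producer to a consumer. All names and sizes are made up:

```python
import threading
from collections import deque

# --- Lock: serialize updates to shared state ---
state = {"counter": 0}
lock = threading.Lock()

def add(n):
    for _ in range(n):
        with lock:                      # acquire, then release on exit
            state["counter"] += 1

workers = [threading.Thread(target=add, args=(10000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()

# --- Condition: wait until an item is available ---
queue = deque()
cv = threading.Condition()
consumed = []

def consumer(count):
    for _ in range(count):
        with cv:
            cv.wait_for(lambda: len(queue) > 0)   # releases cv while waiting
            consumed.append(queue.popleft())

def producer(items):
    for item in items:
        with cv:
            queue.append(item)
            cv.notify()                 # wake one waiting thread

c = threading.Thread(target=consumer, args=(3,))
p = threading.Thread(target=producer, args=(["a", "b", "c"],))
c.start()
p.start()
p.join()
c.join()
print(state["counter"], consumed)
```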
4.10.3 multiprocessing
This provides the multiprocessing.Process class, which has an API similar to Thread. It seems to use fork, but the documentation does not mention an explicit exec. Weird; it seems to do just what a thread can do (except for the sharing of memory, of course).
4.10.4 subprocess
- subprocess.run(args, *, stdin=None, input=None, stdout=None, stderr=None, shell=False, timeout=None, check=False)
- run the command and wait for it to complete. Return a CompletedProcess instance.
- if check is True, raise a CalledProcessError exception if the return code is non-zero. This replaces check_call and check_output.
class subprocess.CompletedProcess
- args
- returncode
- stdout: captured if PIPE is passed to stdout
- stderr: captured if PIPE is passed to stderr
- check_returncode(): if returncode is non-zero, raise CalledProcessError
Variables:
- subprocess.DEVNULL
- subprocess.PIPE
- subprocess.STDOUT: this is only used in the place of stderr to redirect it to stdout
class subprocess.CalledProcessError
- returncode
- cmd
- output: same as stdout
- stdout
- stderr
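A sketch of subprocess.run with the pieces listed above. To stay portable it runs the current Python interpreter (sys.executable) as the child command, which is a choice made for this example only:

```python
import subprocess
import sys

# Capture stdout as text by passing PIPE.
ok = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    universal_newlines=True)

# check=True turns a non-zero exit code into CalledProcessError.
failed_code = None
try:
    subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"],
                   check=True)
except subprocess.CalledProcessError as e:
    failed_code = e.returncode

# stderr=STDOUT merges both streams into stdout.
merged = subprocess.run(
    [sys.executable, "-c",
     "import sys; print('out'); print('err', file=sys.stderr)"],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
    universal_newlines=True)

print(ok.returncode, ok.stdout.strip(), failed_code, merged.stdout.split())
```

Passing the command as a list avoids shell=True and the quoting problems that come with it.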
The following are from 2.7; now just use run.
- subprocess.call(args, *, stdin=None, stdout=None, stderr=None, shell=False)
- args: a list of arguments, including arg0
- it can also be a single string (the * only marks the following parameters as keyword-only)
- it will wait, then return returncode
- do not use stdout=PIPE, use communicate() instead TODO
- using shell=True is bad, but it can give me
- shell pipes
- filename wildcard
- env variable expansion
- ~ expansion
- check_call(args, *, …): same as call, except it will raise an exception if the return code is non-0
- check_output(args, *, stdin=None, stderr=None, shell=False, universal_newlines=False)
- if the return code is non-0, raise an exception. Otherwise return the stdout
Popen object
- Popen constructor
- args, bufsize=0, executable=None,
- stdin=None, stdout=None, stderr=None,
- preexec_fn=None, close_fds=False,
- shell=False, cwd=None, env=None,
- universal_newlines=False, startupinfo=None, creationflags=0
- Popen.poll(): check if child process has terminated. Set and return returncode.
- Popen.wait(): wait for process to terminate. Don't use PIPE with this.
- Popen.communicate(input=None): to use this, the corresponding stdin,
stdout, stderr should be set to PIPE.
- send data to stdin (string)
- read data from stdout and stderr (it returns a tuple (out, err))
- wait for termination
- Popen.send_signal(signal)
- Popen.terminate(): send SIGTERM
- Popen.kill(): send SIGKILL
- Popen.pid
- Popen.returncode
- set by poll and wait (and indirectly by communicate)
- None indicate hasn't terminated
- -N means terminated by signal N
5 Third party libraries
5.1 urllib
from urllib import request
import json
url = 'https://api.github.com'
api = '/search/repositories'
# size: results per page, as a string (defined elsewhere)
query = 'language:C&stars:>10&per_page='+size
response = request.urlopen(url+api+"?q="+query)
s = response.read().decode('utf8')
j = json.loads(s)
# j will be a mix of list and dict
5.1.1 urllib.request
package urllib.request
Functions
- urlopen(url, data=None)
- url can be a string or Request object
- for http and https, returns a http.client.HTTPResponse object
- for FTP, file, data urls, return a urllib.response.addinfourl object
- pathname2url(path): do quoting
- url2pathname(path): do unquoting
class Request
- constructor: (url, data=None, headers={}, method=None)
- url: a string
- headers: a dictionary.
- method: a string. 'GET' is default. Available values: 'HEAD', 'POST'
methods:
- get_method()
- add_header(key, val)
- has_header(key)
- get_header(key)
- remove_header(key)
- get_full_url()
- header_items(): return a list of tuples (key, value)
req = request.Request(query)
req.add_header("Authorization", "token " + token)
response = request.urlopen(req)
s = response.read().decode('utf8')
langj = json.loads(s)
# legacy interface
urllib.request.urlretrieve(url[, filename])
5.1.2 urllib.parse
- quote(string)
- quote_plus(string)
- unquote(string)
- unquote_plus(string)
- urlencode(query)
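What the quoting functions actually produce, as a sketch. The query string is an arbitrary example in the style of a search query:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus, urlencode

raw = "language:C stars:>10"

quoted = quote(raw)              # space becomes %20
quoted_plus = quote_plus(raw)    # space becomes +

# urlencode builds a whole query string, quoting values like quote_plus.
query = urlencode({"q": raw, "per_page": 10})

print(quoted)
print(quoted_plus)
print(query)
print(unquote(quoted) == raw, unquote_plus(quoted_plus) == raw)
```

Building the query with urlencode avoids pasting unescaped characters such as & or > into a URL by hand.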
5.2 XML
import xml.etree.ElementTree as ET
root = ET.fromstring(s)
# XPath
nodes = root.findall('{http://www.sdml.info/srcML/src}function')
for node in nodes:
    # do with node
    pass
APIs
node.find(XPath)
node.findall(XPath)
node.get(Attribute)
node.text
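A self-contained sketch of these calls on a tiny namespaced document. The XML content and the namespace URI are made up for illustration:

```python
import xml.etree.ElementTree as ET

doc = (
    '<unit xmlns="http://example.com/ns">'
    '<function name="main"><body>return 0;</body></function>'
    '<function name="helper"/>'
    '</unit>'
)
root = ET.fromstring(doc)

# A prefix map avoids repeating the {uri} form in every path.
ns = {"src": "http://example.com/ns"}
functions = root.findall("src:function", ns)
names = [f.get("name") for f in functions]        # attribute access
body = root.find("src:function/src:body", ns).text

# The explicit {uri}tag form selects the same nodes.
same = root.findall("{http://example.com/ns}function")
print(names, body, len(same))
```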
5.3 Requests
5.4 BeautifulSoup
The package is called BeautifulSoup4.
The preface to use the package:
from bs4 import BeautifulSoup
BeautifulSoup('<html>string</html>')
with open('a.html') as fp:
    BeautifulSoup(fp)
Each node can be used as a data structure, with the following fields:
- name: the tag name
- string: the (first?) string directly embedded inside the node
- strings: the strings inside the node
- a-tag: the first child that is of that tag
- attrs: a dictionary of all attributes
- children: going downwards
- descendants: intuitive
- parent
- parents: wow, this should be called ancestor?
- next_sibling, previous_sibling
It can also be used as a dictionary of its attributes, e.g. s['href']. This should be a string. It is equivalent to using the get method with the attribute name.
Several methods are of particular interest.
- get_text(): return all text in the node
You can also execute a query on it. In general, find_all returns a list, while find returns the first one. There are also some other methods in this family, namely find_next_siblings, find_parents. E.g.
- s.find_all('a'): return a list of all 'a' tag nodes
Or it can be a query respecting CSS ids and classes. Although find has some support for id and class, select is easier to use.
- s.select("body a"): non-direct
- s.select("p > a"): direct
- s.select("p.c#id"): class and id
- s.select("p > #id"): mix
- s.select('a[href^="xxx"]'): filtering based on attribute values
5.5 click http://click.pocoo.org/5/
5.6 pandas
Looks like it is a dataframe library
5.7 numpy
C-implementation of multi-dimensional arrays
5.8 scipy
scientific computing algorithms, including:
- linear algebra
- optimization
- interpolation
- integration and differential equation
- clustering algorithms
- statistical distributions
5.9 scikit-learn
Learning library.
Supervised learning:
- linear models
- SVM
- Gaussian Processes
- Naive Bayes
- Decision Trees
- KNN
Unsupervised learning:
- Gaussian Mixture Models
- Manifold learning
- clustering
- k-means
Other topics
- Ensemble methods
- Feature Selection
- Outlier detection
- model selection
- grid search
- cross validation
5.10 matplotlib
5.10.1 type of figures
- plt.bar
- plt.scatter
- plt.plot: line plot
- plt.hist
- plt.pie
plt.plot([1,2,3,4])
Image via plt.imshow():
import matplotlib.pyplot as plt
import tensorflow as tf
# plot a mnist digit
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# since the data is just an array (28,28), imshow must have converted
# it to image pixel properly
plt.imshow(x_train[7777], cmap='Greys')
# must call plt.show() to open the figure window. Or, execute
# %matplotlib in the REPL, you can get the image directly after
# imshow().
plt.show()
5.10.2 TODO plot options
5.10.3 legends, axis, more settings
Texts:
- plt.xlabel()
- plt.ylabel()
- plt.title()
- plt.axis()
- plt.text()
- plt.annotate
- plt.grid(True)
- plt.table(): attach a table to an axis!
Scale:
- plt.xscale('linear')
- plt.yscale('log')
5.10.4 Subplots
plt.ioff()
figure = plt.figure()
figure.canvas.set_window_title('My Grid Visualization')
for x in range(height):
    for y in range(width):
        # print(x,y)
        figure.add_subplot(height, width, x*width + y + 1)
        plt.axis('off')
        plt.imshow(convert_image_255(images[x*width+y]), cmap='gray')
# plt.show()
plt.savefig(filename)
Or better, create figure and axis, and plot for each axis:
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(19680801)
data = np.random.randn(2, 100)
fig, axs = plt.subplots(2, 2, figsize=(5, 5))
axs[0, 0].hist(data[0])
axs[1, 0].scatter(data[0], data[1])
axs[0, 1].plot(data[0], data[1])
axs[1, 1].hist2d(data[0], data[1])
plt.show()
5.10.5 export to files
Visualize using OS GUI toolkit:
plt.show()
Plot to a file:
pylab.ioff()
plot([1, 2, 3])
savefig("/tmp/test.png")
5.11 imsave
imsave is deprecated; change from
from scipy.misc import imsave
to
from imageio import imwrite as imsave
5.12 Nvidia GPU setting
Select visible GPU in a multi-GPU setting:
os.environ['CUDA_VISIBLE_DEVICES'] = '3'
CUDA setup
- Install the Nvidia driver. This can be done using Ubuntu's software center. But this is the stable version, not the newest
- Install cuda to /usr/local/cuda-10.0. I use the "runfile", with the --override option (otherwise it throws a "gcc version not supported" error).
- Install cudnn by copying header files and library files into /usr/local/cuda-10.0
- Configure
CUDA_PATH=/usr/local/cuda-10.0
export LD_LIBRARY_PATH="$CUDA_PATH/lib64:$LD_LIBRARY_PATH"
export PATH="$CUDA_PATH/bin:$PATH"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$CUDA_PATH/extras/CUPTI/lib64"