R Language
Table of Contents
1 R lang
1.1 References
1.2 Setting up
1.2.1 Packages
library()
see the list of installed packages.library(class)
load the packageclass
.search()
see the list of loaded packages.install.packages()
andupdate.packages()
install and update packages.
If a library is "not available", you probably should add the repository:
setRepositories()
and select something likebioc*
orR-Forge
. If that still doesn't work, you can download and install from source:
install.packages("/path/to/file.tar.gz", repos=NULL)
But seems that you need to install the dependencies manually.
1.2.2 Emacs ESS
To start ESS session, run command S
.
ESS will create a command interface as a buffer.
Execute ?foo
will open the R-doc
for the function foo
.
There's a babel in org mode for R, so just C-c C-c
would work.
This will prompt to create a session.
One special for this babel is you can evaluate by line, using C-<RET>
in the edit buffer.
Keep the session using the header:
#+PROPERTY: session *R*
To export a graph:
:file image.png :results output graphics :exports both
To export ordinary result:
:exports both
To export some summary data:
:exports both :results output org
1.2.3 Interactive command
We need to know what's going on in the current workspace. getwd()
and setwd()
get and set the current dir. ls()
list the objects
currently stored. rm(x, y, z)
remove objects rm(list=ls())
remove
all objects. objects()
create and store current objects.
We can perform some IO by save.image("mydata.Rdata")
and
load("mydata.Rdata")
will save and load workspace in current
directory respectively. source("a.R")
loads a script.
sink("a.lis")
redirects the output to a file, and sink()
restore
that to standard output.
You can view documentation by calling help help(lm)
. ?lm
and
??solve
also shows documentation, while example(topic)
, as its
name indicates, shows the examples. help.start()
opens the html
documentation page.
Finally, q()
quit R.
The print
function can output the value of a variable.
print("hello") cat("world", "!", "\n") print(c(1,2,3))
To enter the debugger, a call to browser
function suffices. This allows you
to browse the values at that point. A more powerful debugger is by a call to
debug
with the function name as argument. Each time that function is called,
you enter the debug and can control the execution. Tracing can be registered by
trace
or untrace
with the name of the function. It might need to be quoted
in some case, so you'd better quote it, with double quotes. Every time the
function is invoked, the return value will be printed as trace.
1.2.4 Commonly used functions
str
show the structure of arbitrary typesummary
print the summary
To see the data, you can use:
- dim()
- length()
- head()
- tail()
summary(dataset)
shows some information like max,min,meanclass(data$col)
get typelevels(data$col)
if it is factor, get the valuesstopifnot()
: the assert
1.3 Scope
The scope is pretty weired, it is lexical scoping, but assign to the variable
has no effect. You have to use <<-
to assign.
f <- function() { var <- 0 list( print = function() { print(var) }, add1 = function() { var <- var + 1 }, sub1 = function() { var <<- var - 1 })} l <- f() l$print() l$add1() l$print() l$sub1() l$print()
The function parameter is always local, changing it have no effect outside:
f <- function (var) { var[1] <- 8 } x <- c(1,2,3) f(x) x
It is call by value, for all objects. You can NOT even change a global variable,
even using <<-
:
x <- 1 f <- function() { x <<- 2 } x
1.4 Types
1.4.1 primitives
Looks like all numbers are double by default:
typeof(1) typeof(1L)
1.4.2 Vector
Create a vector by c()
, which is append:
c(1,2,3)
x <- c(1,2,3)
c(x, 0, x)
Vectors are the array of objects of the same mode (type).
c(1, "hello")
To create a sequence:
1:10 10:1
Colon operator has higher priority:
2*1:10
The more powerful sequence function is seq
:
seq(1,10) seq(-1, 1, by=.2) seq(length=10, from=-5, by=.2)
Repeating something:
x <- c(1,2,3)
rep(x, times=3)
rep(x, each=3)
The length, mode, typeof
x <- c(1,2,3)
length(x)
mode(x)
typeof(x)
1.4.3 Indexing
Vectors can be indexed by integers, starting from 1.
x <- c(1,2,3,4)
x[2]
It can also be indexed by vector of integers
x <- c(1,2,3,4)
x[c(1,3)]
Negative index selects the elements other than those index. The index 0 will return empty (HEBI: ??).
x <- c(1,2,3,4)
x[0]
x[-1]
x[c(-1, -3)]
Indexing by logical vector
x <- c(1,2,3,4) x[c(TRUE, FALSE, TRUE, FALSE)]
Or by its name, where the string is compared partially (HEBI: ??)
x <- c(1,2,3) names(x) <- c("hello", "world", "lala") x x["hel"] x["hello"] x[c("hello", "world")]
1.4.4 cbind & rbind
This is to combine (for matrix, similar to append for vector) arguments in row-wise or column-wise, and creates matrix. This will apply broadcast.
cbind(0, rbind(1, 1:3))
1.4.5 matrix
Matrix can be created by the matrix
function.
x <- matrix(1:8, nrow=2)
x
dim(x)
Specify dimnames:
matrix(1:6, nrow = 2, dimnames = list(c("a", "b"), LETTERS[1:3]))
When indexing, the drop argument (defaulting to TRUE), if TRUE, the result is coerced to the lowest possible dimension:
m <- matrix(1:6, nrow = 2, dimnames = list(c("a", "b"), LETTERS[1:3])) m[1,] m[1, , drop = FALSE]
1.4.6 list
list is a misnomer, it is a dict
l <- list(hello=1, "world"=2) l l$hello l$world
When indexing lists, []
retains names, while [[]]
returns only the element:
c(abc = 123)[1]
c(abc = 123)[[1]]
1.4.7 Data frame
na.omit
- can omit the NA values in data frame
A data frame is a list of equal-length vectors.
When getting the data from read.csv, the result is a data frame.
Use names
to work on data frames will emit the names.
- Since it is a list, using
[]
to index will give also the list, a.k.a. data frame, retaining names. You can use a vector as index. - Using
[[]]
to index will give the value, dropping names. You cannot use a vector as index.
1.4.8 type conversion
you can change a type of a vector by
as.factor(x)
- as.numeric()
1.4.9 TODO data loading
read.csv
write
write.table
write.csv
read.table("filename", header=TRUE, sep=",")
- this ignores blank lines,
- and expect the header to be one field less than the body.
#
as comments
- read.delim
cat
outputs the data, no index, no newline
attach(data)
: make the columns into this namespacedetach(data)
: remove those
1.4.10 TODO missing value
The missing values are NA
, tested by is.na
. Illegal computations produces
NaN
, e.g. 0/0
.
1.4.11 TODO set
subset
1.4.12 TODO string
substr
1.4.13 Evaluation rules (broadcast)
- recycling rules: the shortest list is recycled to the length of longest.
- dimensional attributes: the dimension of matrix must match. No recycle for a matrix.
1.5 Operators
- arithmetic:
+-*/
,^
for exp,%%
for modulus - matrix:
%*%
matrix product,%o%
outer product - logic:
!
,&, |
for vector,&&, ||
for no vector - relative:
>, <, ==, <=, >=
- general:
<-, ->
assignments,$
list subset,:
sequence,~
for model formula
Built-in functions:
log
,exp
,sin
,cos
,tan
,sqrt
min
,max
range
: same asc(min(x),max(x))
length(x)
,sum(x)
,prod(x)
(product)mean(x)
:sum(x)/length(x)
var(x)
:sum((x-mean(x))^2)/(length(x)-1)
sort(x)
: increasing orderorder()
orsort.list()
paste(sep
" ")= function takes an arbitrary number of arguments and concatenates them one by one into character strings. The argument can be numeric.toString(8)
: convert integer to stringround(x, digits=0)
1.6 Control Structure
The compound statements are the same as C, can be a single statement without the braces.
1.6.1 Conditional
- if:
if (STMT) STMT else if (STMT) STMT else STMT
The switch is quite interesting. It is
switch(x, list...)
The semantic is to find x in the list and return the value. If x is not in list, return NULL.
An unnamed list has implicit index as name:
switch(2,1,2,3)
Using name:
switch("hello", hello=1, world=2)
not found, return NULL:
is.null(switch("nothing", hello=1))
Interestingly, even for named list, we can still use the index:
switch(2, hello=1, world=2)
1.6.2 Loop
- for loop:
for (NAME in VECTOR) STMT
- while loop:
while (STMT) STMT
- repeat:
repeat STMT
break
,next
ret <- c() for (i in 1:5) { ret <- c(ret, i) } ret
1.7 Function
function (ARGLIST) BODY
The argument list can be a symbol, a symbol=value
, or a ...
.
The body is a compound expression, surrounded with {}
.
Function can be assigned to a symbol.
The matching of formals and actual are pretty tricky.
- exact matching on tags
- partial matching on tags
- positional matching for
...
Partial matching result must be unique, but the exact matched ones are excluded before this step is entered.
f <- function(a, b, c) { return(c(a, b+c)) } f(1,2,3)
1.8 Quote (!!)
The quote will wrap the expression into an object without evaluating it. The
resulting object has the mode of call
. The eval
is used to evaluate it.
quote
substitute
eval
call
1.9 Models
1.9.1 Linear model
A linear model is created and fitted by lm()
function, with the model
formula and data frame. For example:
df = data.frame(x=c(1,2,3), y=c(2,4,6)) fm = lm(y ~ x, data=df) summary(fm)
The fitted model in the variable fm
can be accessed by:
coef
: extract the coefficientsdeviance
: the Residual Sum of Squareformula
: extract the model formulaplot
: produce four plots: residuals, fitted values, diagnostics.predict(OBJECT, newdata=DATA.FRAME)
: use the model to predictresiduals
: extract the residualssummary()
The models can be updated, if the formula only changes a little bit. In the
following example, the .
means the corresponding part of the original formula.
fs <- lm(y ~ x1 + x2, data=mydf) fs <- update(fs, . ~ . + x3) fs <- update(fs, sqrt(.) ~ .)
1.10 Plot
Process data:
- table
- cut(data, breaks=c(1,3,8))
1.10.1 Decoration
- box
- axis
- las attribute
- legend
- par
- text
- mtext
- points
1.10.2 Plot Types
1.10.2.1 plot
- lines
- abline
1.10.2.2 barplot
1.10.2.3 pie
1.10.2.4 boxplot
- quantile
1.10.2.5 hist
- lines(density(data))
1.10.2.6 TODO stem
1.10.2.7 TODO mosaicplot
1.10.2.8 pairs
1.10.3 Device Driver
When outputting some image, you have to tell R which format you want to use. The default on linux is X11, that's why it opens a image immediately after plotting. To use a device, call the device function, and after that all graphics output will be sent to that device.
- X11
- png
jpeg
When you have finished with a device, terminate it by
dev.off()
.To output to a file TODO to open plot in emacs:
pdf("test1.pdf") dev.control(displaylist = "enable") plot(1:10) dev.copy(pdf, "test2.pdf") dev.off() # should now have a valid test2.pdf dev.off() # finished
1.11 FFI
To call a C function in R, first load the C shared library:
dyn.load() dyn.unload()
Then, you can directly call a C function:
.C("foo", as.double(a), as.integer(b))
Or, if you need to pass a R object to the C function, you use
.Call("foo", a, b) .External("foo", a, b)
R also provides a command to create shared object, instead of using raw gcc:
R CMD SHLIB foo.c
1.12 Packages
1.12.1 ggplot2
qplot(totbill, tip, geom="point", data=tips) # scatter plot qplot(totbill, tip, geom="point", data=tips) + geom_smooth(method="lm") # with linear relationship line qplot(tip, geom="histogram", data=tip) # histogram qplot(tip, geom="histogram", binwidth=1, data=tips) # with custom binwidth # box plots qplot(sex, tipperc, geom="boxplot", data=tips) qplot(smoker, tipperc, geom="boxplot", data=tips) qplot(sex:smoker, tipperc, geom="boxplot", data=tips) # combine! plot the two sets of graph in two one graph qplot(totbill, tip, geom="point", colour=day, data=tips) # scatter plot with colors, in regard to "day" column
1.12.2 plot(x, y, …)
Possible ...
arguments:
type
what type of plot:p
for points,l
for lines,b
for both,h
forhistogram
like (orhigh-density
) vertical lines,
main
an overall title for the plot: seetitle
.xlab
a title for the x axis: seetitle
.ylab
a title for the y axis: seetitle
.
1.12.3 dplyr
A Grammar of Data Manipulation