数据科学中的R和Python: 03/01/2013

星期三, 三月 20, 2013

R语言玩转资产组合计算

以前都是用MATLAB或是EXCEL给学生讲资产组合的计算问题，实际R语言也可以做一样的事情。资产组合要解决的问题并不复杂，即给定一个可选的资产集合，要从中选择出一个最优组合，使其收益率较大，而风险较小。如果可选资产只有两个的话，问题非常简单，可以通过求导最优化的方式得到结果，即有效资产前沿的解析解。如果资产集合超过两个那么问题要复杂一些。首先需要给定一个要求的期望收益率，再求出在此约束下的方差最小组合，这可以通过二次规划解出。这样得到有效前沿上的一个点。然后更改期望收益率，又可以得到另一个最优解，如此反复即可得到多资产组合的有效前沿。

下面我们先用R语言中quadprog包的二次规划求解函数来计算一个例子，再用fPortfolio包中的函数来完成同样的任务。例子中使用的数据来自于fPortfolio包中的SMALLCAP.RET数据集，使用了其中的四个资产来示例。

阅读全文 »

星期六, 三月 16, 2013

灰色模型的R代码

最近帮朋友写了一个灰色模型GM(1,1)的R实现，参考网上现有的matlab代码，比较容易就可以弄出来。下面是具体过程，主函数是GM()，建立的模型是一个S3类，搭配两个自定义的泛型函数print和plot可以得到结果输出和图形。

# 本代码用来计算GM(1,1)灰度模型
# 输入：x 原始数据，adv为外推预测步长
# 输出：actual 原始数据，fit 拟合数据,degree 拟合精度，
#       C 后验差比值，P 小概率误差，predict 外推预测值
 
 
# 主函数
GM <- function(x,adv=0) {
    x0 <- x
    k = length(x0)
    # AGO
    x1 = cumsum(x0)
    # construct matrix B & Y
    B = cbind(-0.5*(x1[-k]+x1[-1]),rep(1,times=k-1))
    Y = x0[-1]
    # compute BtB...
    BtB = t(B)%*%B
    BtB.inv = solve(BtB)
    BtY = t(B)%*%Y
 
    # estimate
    alpha = BtB.inv%*%BtY
 
    # 建立预测模型
 
    predict <- function(k) {
        y = (x0[1] - alpha[2]/alpha[1])*exp(-alpha[1]*k)+alpha[2]/alpha[1]
        return(y)
    }
    pre <- sapply(X=0:(k-1),FUN=predict)
    prediff <- c(pre[1],diff(pre))
    # 计算残差
    error <- round(abs(prediff-x0),digits=6)
    emax <- max(error)
    emin <- min(error)
    # 模型评价
    incidence <- function(x) {
         return((emin + 0.5*emax)/(x+0.5*emax))
    }
    degree <- mean(sapply(error,incidence))
 
    s1 <- sqrt(sum((x0-mean(x0))^2)/5)
    s2 <- sqrt(sum((error-mean(error))^2)/5)
 
    C <- s2/s1
 
    e <- abs(error-mean(error))
    p <- length(e<0.6745)/length(e)
 
    result <- list(actual = x0,
                   fit = prediff,
                   degree = degree,
                   C = C,
                   P = p)
 
    # 外推预测第k+adv项
    if (adv > 0) {
        pre.adv <- predict(k+adv-1)-predict(k+adv-2)
 
        result$predict <- pre.adv
     }
    class(result) <- 'GM1.1'
    return(result)
}
 
# 增加对应gm1.1类的泛型函数 print & plot
print.GM1.1 <- function(mod){
    cat('the result of GM(1,1)\n')
    cat('Actual Data:', '\n',mod$actual ,'\n')
    cat('Fit Data:', '\n',round(mod$fit,2) ,'\n')
    cat('Degree:', round(mod$degree,3) ,'\n')
    cat('C:', round(mod$C,3) ,'\n')
    cat('P:', round(mod$P,3) ,'\n')
    if (!is.null(mod$predict)) {
        cat('Predict Data:', round(mod$predict,2), '\n')
    }
}
 
plot.GM1.1 <- function(mod,adv=5){
    prex <- numeric(adv)
    x <- mod$actual
    for (k in 1:adv){
        prex[k] <- GM(x,k)$predict    
    }
 
    value = c(x,prex)
 
    res <- data.frame(index = 1:length(value),
                      value = value,
                      type = factor(c(rep(1,length(x)),rep(2,length(prex)))))
    library(ggplot2)
    p <- ggplot(res,aes(x=index,y= value))
    p + geom_point(aes(color=type),size=3)+ 
        geom_path(linetype=2) +
        theme_bw()
}
 
 
# 原始数据
x = c(26.7,31.5,32.8,34.1,35.8,37.5)
 
# 预测第7项
res <- GM(x,1)
print(res)
plot(res,3)

星期五, 三月 08, 2013

数据挖掘的三行俳句

最近才看到Tom Khabaza写的一篇很有份量的文章，阐述了数据挖掘的九大法则，在最后他以俳句方式进行了总结，可谓是字字珠玑。原文很长，只将俳句和各法则的纲要翻译放在这里。

First the business goal // Is all in data mining // This defines the field
Second is knowledge // Of business at every stage // This is the centre
Third prepare data // The form permits the question // Shape the problem space
Fourth is no free lunch / /By searching find the model // And the goal may change
Fifth is Watkins law / /There will always be patterns // Traces in data
Sixth is perception // Data mining amplifies // Thus we find insight
Seventh prediction // Generalised from data // New facts locally
Eighth is the value // All value comes from business // Not technical things
Ninth and last is change // No pattern lasts forever // We do not stand still
Miners know this truth / /How can business be better? / /If we see what is
Can the process change? // Technology can change it / /None has done so yet

1、商业目标是数据挖掘的起点
2、业务知识贯穿着数据挖掘的每个阶段
3、数据挖掘中的大部分工作是数据准备
4、对于给定的问题，只有不断尝试才能得到适合的模型
5、模式总会存在
6、数据挖掘能增强对业务的理解
7、通过归纳，预测模型增加了信息量
8、数据挖掘的价值并不取决于模型的准确或稳定
9、所有的模式都会改变

页面