2010-10-02 16 views
6

Chcę uruchomić dwustopniową regresję najmniejszych kwadratów probitowych w R. Czy ktoś wie jak to zrobić? Czy jest tam jakaś paczka? Wiem, że można to zrobić za pomocą Staty, więc wyobrażam sobie, że można to zrobić z R.Dwustopniowy najmniejszy kwadrat w R

Odpowiedz

8

Możesz chcieć być dokładniejszym, gdy powiesz "dwustopniowe probit-najmniej-kwadraty". Ponieważ odwołujesz się do programu Stata, który implementuje to, domyślam się, że mówisz o pakiecie CDSIMEQ, który implementuje procedurę Amemiya (1978) dla modelu Heckit (a.k.a Generalized Tobit, a.k.a. Tobit type II model, itp.). Jak powiedział Grant, systemfit zrobi dla ciebie Tobiasza, ale nie z dwoma równaniami. Pakiet MicEcon miał Heckit (ale pakiet rozdzielił się tak wiele razy, że nie wiem, gdzie jest teraz).

Jeśli chcesz, co robi CDSIMEQ, można łatwo wdrożyć w R. napisałem funkcję, która replikuje CDSIMEQ:

tspls <- function(formula1, formula2, data) { 
    # The Continous model 
    mf1 <- model.frame(formula1, data) 
    y1 <- model.response(mf1) 
    x1 <- model.matrix(attr(mf1, "terms"), mf1) 

    # The dicontionous model 
    mf2 <- model.frame(formula2, data) 
    y2 <- model.response(mf2) 
    x2 <- model.matrix(attr(mf2, "terms"), mf2) 

    # The matrix of all the exogenous variables 
    X <- cbind(x1, x2) 
    X <- X[, unique(colnames(X))] 

    J1 <- matrix(0, nrow = ncol(X), ncol = ncol(x1)) 
    J2 <- matrix(0, nrow = ncol(X), ncol = ncol(x2)) 
    for (i in 1:ncol(x1)) J1[match(colnames(x1)[i], colnames(X)), i] <- 1 
    for (i in 1:ncol(x2)) J2[match(colnames(x2)[i], colnames(X)), i] <- 1 

    # Step 1: 
    cat("\n\tNOW THE FIRST STAGE REGRESSION") 
    m1 <- lm(y1 ~ X - 1) 
    m2 <- glm(y2 ~ X - 1, family = binomial(link = "probit")) 
    print(summary(m1)) 
    print(summary(m2)) 

    yhat1 <- m1$fitted.values 
    yhat2 <- X %*% coef(m2) 

    PI1 <- m1$coefficients 
    PI2 <- m2$coefficients 
    V0 <- vcov(m2) 
    sigma1sq <- sum(m1$residuals^2)/m1$df.residual 
    sigma12 <- 1/length(y2) * sum(y2 * m1$residuals/dnorm(yhat2)) 

    # Step 2: 
    cat("\n\tNOW THE SECOND STAGE REGRESSION WITH INSTRUMENTS") 

    m1 <- lm(y1 ~ yhat2 + x1 - 1) 
    m2 <- glm(y2 ~ yhat1 + x2 - 1, family = binomial(link = "probit")) 
    sm1 <- summary(m1) 
    sm2 <- summary(m2) 
    print(sm1) 
    print(sm2) 

    # Step 3: 
    cat("\tNOW THE SECOND STAGE REGRESSION WITH CORRECTED STANDARD ERRORS\n\n") 
    gamma1 <- m1$coefficients[1] 
    gamma2 <- m2$coefficients[1] 

    cc <- sigma1sq - 2 * gamma1 * sigma12 
    dd <- gamma2^2 * sigma1sq - 2 * gamma2 * sigma12 
    H <- cbind(PI2, J1) 
    G <- cbind(PI1, J2) 

    XX <- crossprod(X)       # X'X 
    HXXH <- solve(t(H) %*% XX %*% H)   # (H'X'XH)^(-1) 
    HXXVXXH <- t(H) %*% XX %*% V0 %*% XX %*% H # H'X'V0X'XH 
    Valpha1 <- cc * HXXH + gamma1^2 * HXXH %*% HXXVXXH %*% HXXH 

    GV <- t(G) %*% solve(V0) # G'V0^(-1) 
    GVG <- solve(GV %*% G)  # (G'V0^(-1)G)^(-1) 
    Valpha2 <- GVG + dd * GVG %*% GV %*% solve(XX) %*% solve(V0) %*% G %*% GVG 

    ans1 <- coef(sm1) 
    ans2 <- coef(sm2) 

    ans1[,2] <- sqrt(diag(Valpha1)) 
    ans2[,2] <- sqrt(diag(Valpha2)) 
    ans1[,3] <- ans1[,1]/ans1[,2] 
    ans2[,3] <- ans2[,1]/ans2[,2] 
    ans1[,4] <- 2 * pt(abs(ans1[,3]), m1$df.residual, lower.tail = FALSE) 
    ans2[,4] <- 2 * pnorm(abs(ans2[,3]), lower.tail = FALSE) 

    cat("Continuous:\n") 
    print(ans1) 
    cat("Dichotomous:\n") 
    print(ans2) 
} 

Dla porównania, możemy powtórzyć próbkę od autora CDSIMEQ w ich article about the package.

> library(foreign) 
> cdsimeq <- read.dta("http://www.stata-journal.com/software/sj3-2/st0038/cdsimeq.dta") 
> tspls(continuous ~ exog3 + exog2 + exog1 + exog4, 
+  dichotomous ~ exog1 + exog2 + exog5 + exog6 + exog7, 
+  data = cdsimeq) 

     NOW THE FIRST STAGE REGRESSION 
Call: 
lm(formula = y1 ~ X - 1) 

Residuals: 
     Min  1Q Median  3Q  Max 
-1.885921 -0.438579 -0.006262 0.432156 2.133738 

Coefficients: 
       Estimate Std. Error t value Pr(>|t|)  
X(Intercept) 0.010752 0.020620 0.521 0.602187  
Xexog3  0.158469 0.021862 7.249 8.46e-13 *** 
Xexog2  -0.009669 0.021666 -0.446 0.655488  
Xexog1  0.159955 0.021260 7.524 1.19e-13 *** 
Xexog4  0.316575 0.022456 14.097 < 2e-16 *** 
Xexog5  0.497207 0.021356 23.282 < 2e-16 *** 
Xexog6  -0.078017 0.021755 -3.586 0.000352 *** 
Xexog7  0.161177 0.022103 7.292 6.23e-13 *** 
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.6488 on 992 degrees of freedom 
Multiple R-squared: 0.5972,  Adjusted R-squared: 0.594 
F-statistic: 183.9 on 8 and 992 DF, p-value: < 2.2e-16 


Call: 
glm(formula = y2 ~ X - 1, family = binomial(link = "probit")) 

Deviance Residuals: 
    Min  1Q Median  3Q  Max 
-2.49531 -0.59244 0.01983 0.59708 2.41810 

Coefficients: 
      Estimate Std. Error z value Pr(>|z|)  
X(Intercept) 0.08352 0.05280 1.582 0.113692  
Xexog3  0.21345 0.05678 3.759 0.000170 *** 
Xexog2  0.21131 0.05471 3.862 0.000112 *** 
Xexog1  0.45591 0.06023 7.570 3.75e-14 *** 
Xexog4  0.39031 0.06173 6.322 2.57e-10 *** 
Xexog5  0.75955 0.06427 11.818 < 2e-16 *** 
Xexog6  0.85461 0.06831 12.510 < 2e-16 *** 
Xexog7  -0.16691 0.05653 -2.953 0.003152 ** 
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1) 

    Null deviance: 1386.29 on 1000 degrees of freedom 
Residual deviance: 754.14 on 992 degrees of freedom 
AIC: 770.14 

Number of Fisher Scoring iterations: 6 


     NOW THE SECOND STAGE REGRESSION WITH INSTRUMENTS 
Call: 
lm(formula = y1 ~ yhat2 + x1 - 1) 

Residuals: 
    Min  1Q Median  3Q  Max 
-2.32152 -0.53160 0.04886 0.53502 2.44818 

Coefficients: 
       Estimate Std. Error t value Pr(>|t|)  
yhat2   0.257592 0.021451 12.009 <2e-16 *** 
x1(Intercept) 0.012185 0.024809 0.491 0.623  
x1exog3  0.042520 0.026735 1.590 0.112  
x1exog2  0.011854 0.026723 0.444 0.657  
x1exog1  0.007773 0.028217 0.275 0.783  
x1exog4  0.318636 0.028311 11.255 <2e-16 *** 
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.7803 on 994 degrees of freedom 
Multiple R-squared: 0.4163,  Adjusted R-squared: 0.4128 
F-statistic: 118.2 on 6 and 994 DF, p-value: < 2.2e-16 


Call: 
glm(formula = y2 ~ yhat1 + x2 - 1, family = binomial(link = "probit")) 

Deviance Residuals: 
    Min  1Q Median  3Q  Max 
-2.49610 -0.58595 0.01969 0.59857 2.41281 

Coefficients: 
       Estimate Std. Error z value Pr(>|z|)  
yhat1   1.26287 0.16061 7.863 3.75e-15 *** 
x2(Intercept) 0.07080 0.05276 1.342 0.179654  
x2exog1  0.25093 0.06466 3.880 0.000104 *** 
x2exog2  0.22604 0.05389 4.194 2.74e-05 *** 
x2exog5  0.12912 0.09510 1.358 0.174544  
x2exog6  0.95609 0.07172 13.331 < 2e-16 *** 
x2exog7  -0.37128 0.06759 -5.493 3.94e-08 *** 
--- 
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1) 

    Null deviance: 1386.29 on 1000 degrees of freedom 
Residual deviance: 754.21 on 993 degrees of freedom 
AIC: 768.21 

Number of Fisher Scoring iterations: 6 

     NOW THE SECOND STAGE REGRESSION WITH CORRECTED STANDARD ERRORS 

Continuous: 
       Estimate Std. Error t value Pr(>|t|) 
yhat2   0.25759209 0.1043073 2.46955009 0.01369540 
x1(Intercept) 0.01218500 0.1198713 0.10165068 0.91905445 
x1exog3  0.04252006 0.1291588 0.32920764 0.74206810 
x1exog2  0.01185438 0.1290754 0.09184073 0.92684309 
x1exog1  0.00777347 0.1363643 0.05700519 0.95455252 
x1exog4  0.31863627 0.1367881 2.32941597 0.02003661 
Dichotomous: 
       Estimate Std. Error z value  Pr(>|z|) 
yhat1   1.26286574 0.7395166 1.7076909 0.0876937093 
x2(Intercept) 0.07079775 0.2666447 0.2655134 0.7906139867 
x2exog1  0.25092561 0.3126763 0.8025092 0.4222584495 
x2exog2  0.22603717 0.2739307 0.8251618 0.4092797527 
x2exog5  0.12911922 0.4822986 0.2677163 0.7889176766 
x2exog6  0.95609385 0.2823662 3.3860070 0.0007091758 
x2exog7  -0.37128221 0.3265478 -1.1369920 0.2555416141 
2

istnieje kilka pakietów dostępnych w R, aby wykonać dwa najmniejsze pola stanu. oto kilka

  1. sem: Two-Stage Least Squares
  2. Zelig: Link usunięty, nie jest już funkcjonalny (28.07.11)

dać mi znać, jeśli służą one swój cel.

+0

Dzięki, spróbuję. Wydaje się pasować do moich potrzeb. –

2

systemfit również załatwi sprawę.

Powiązane problemy