Examples of Semi-Logarithmic Models

The following TSP program estimates two OLS regressions. In both cases the dependent (Y-axis) variable is the natural logarithm of wages.

In TSP, the command
lwage = log(wage);
defines lwage as the natural log of the variable wage. Similarly, the command
age20 = age - 20;
measures age as the number of years beyond age 20.

The TSP Program

options memory = 6; 
options crt; 
 in 'mydat.tlb' ; 
?
? Create 2 new variables
?
age20 = age -20;
lwage = log(wage);
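?
? Regression No. 1: log wage on the female dummy (f), education (edy),
? tenure, age20 and the public sector dummy (pu)
?
olsq lwage c f edy tenure age20 pu;
?
? Create the interaction terms fedy & fpu
?
fedy = edy*f;
fpu = f*pu;
?
? Summary statistics for edy
?
msd edy;
?
? Regression No. 2: adds the interaction terms fedy and fpu
?
olsq lwage c f edy fedy tenure age20 pu fpu;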

TSP Output: OLS Regression No. 1


                                      Equation   1
                                      ============

                       Method of estimation = Ordinary Least Squares


 Dependent variable: LWAGE
 Current sample:  1 to 3114
 Number of observations:  3114

        Mean of dep. var. = 2.72737      LM het. test = .829502E-02 [.927]
   Std. dev. of dep. var. = .499781     Durbin-Watson = 1.92043 [<.019]
 Sum of squared residuals = 568.478  Jarque-Bera test = 194.603 [.000]
    Variance of residuals = .182908   Ramsey's RESET2 = 27.8753 [.000]
 Std. error of regression = .427678   F (zero slopes) = 228.629 [.000]
                R-squared = .268903    Schwarz B.I.C. = 1794.72
       Adjusted R-squared = .267727    Log likelihood = -1770.58

            Estimated    Standard
 Variable  Coefficient     Error       t-statistic   P-value
 C         1.73013       .047948       36.0838       [.000]
 F         -.256329      .015617       -16.4129      [.000]
 EDY       .057169       .310267E-02   18.4257       [.000]
 TENURE    .012459       .104200E-02   11.9568       [.000]
 AGE20     .699638E-02   .886666E-03   7.89066       [.000]
 PU        .109227       .018699       5.84128       [.000]

The regression model above contains two dummy variables (F and PU) and three continuous variables (EDY, TENURE and AGE20).

Continuous Variable Coefficients

According to the online notes on the semilog model, the coefficient on a continuous explanatory variable such as EDY implies that a one-unit change in EDY (one extra year of education) raises the expected wage by approximately 5.72%. More precisely, the ratio of predicted wages for two people who differ by one year of education but are otherwise identical is exp(0.057169) = 1.0588, which implies that the extra year of education increases the expected wage by 5.88%. Reading a coefficient directly as a percent change is therefore subject to some error. In this case the error is about 5.88 - 5.72 = 0.16 percentage points. The error is small for coefficients close to zero, but grows as the coefficient increases in absolute value.
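The size of this approximation error can be seen from the series expansion of the exponential. The display below is a short worked sketch. The symbol b stands for any semilog slope coefficient, and the subscripts 0 and 1 (our notation) label the two otherwise identical people who differ by one unit of the regressor.

```latex
\frac{\widehat{W}_1}{\widehat{W}_0} = e^{b},
\qquad
100\,(e^{b} - 1) = 100\left(b + \frac{b^{2}}{2} + \frac{b^{3}}{6} + \cdots\right)
```

The leading term 100b is the quick interpretation. For b = 0.057169 the second term is 100(0.057169)^2/2, roughly 0.16, which accounts for essentially all of the gap between 5.72% and 5.88%.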

Consider the smaller coefficient on TENURE, which is 0.012459. The quick interpretation is that an extra year in the same job raises the expected wage by 1.246%. However, exp(0.012459) = 1.012537, which implies a 1.254% increase. In this case the difference is just 0.008 percentage points. (Note that when we interpret the coefficient on TENURE, AGE20 is held constant. To increase TENURE by one year we must therefore compare two people of the same age whose time in the current job differs by one year. This is not the same as the effect on the expected wage of one person aging by a year while remaining in the same job.)
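The two comparisons above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name exact_pct_effect is ours, and the coefficients are copied from Regression No. 1.

```python
import math

def exact_pct_effect(coef):
    """Percent change in the predicted wage implied by a semilog coefficient."""
    return 100.0 * (math.exp(coef) - 1.0)

# Coefficients copied from Regression No. 1
coefs = {"EDY": 0.057169, "TENURE": 0.012459}

for name, b in coefs.items():
    approx = 100.0 * b            # quick reading: coefficient times 100
    exact = exact_pct_effect(b)   # reading based on exp(b)
    print(f"{name}: quick {approx:.3f}%, exact {exact:.3f}%, "
          f"difference {exact - approx:.3f} points")
```

The same function applies to dummy variable coefficients, since the ratio of predicted wages is exp(coefficient) in both cases.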

Dummy Variable Coefficients

The dummy variable PU is unity if the person works in the public sector and zero otherwise. The ratio of the predicted wage of a public sector worker to the predicted wage of a comparable private sector worker is exp(0.109). The quick interpretation is a public sector wage premium of about 10.9%. More accurately, exp(0.109) = 1.1152, which implies a premium of 11.52%. The word "comparable" is limited by the list of variables included in the regression. The comparison is between workers of the same sex, with the same level of formal education, who have held their current job for the same period of time and are the same age. Beyond that, the workers in the sample may differ. For example, they may live in different provinces, and if there are regional differences in wages this omitted factor could make the estimated public sector wage premium misleading. This idea is developed in the next paragraph.

Impact of Omitted Variables

Suppose, for example, that the population public sector wage premium is actually the same in all provinces and therefore equal to that for Canada as a whole. Now suppose workers in the Atlantic region earn less than identical workers in, say, Ontario. If our sample includes a disproportionate number of public sector workers from the Atlantic region (too many in relation to the population) and too few public sector workers from Ontario (and by implication too many private sector Ontario workers), this will lower the estimated public sector wage premium. The estimate is lower because the sample's public sector workers are drawn mainly from the Atlantic region, where wages are generally lower. The public sector wage premium "looks" low because we are essentially comparing private sector workers in highly paid Ontario with public sector workers from the lower-paid Atlantic region. Notice that if the model included provincial/regional dummy variables, the regional difference in wages would be taken into account when the public sector premium is estimated. By including regional dummy variables we "correct" for the fact that the survey "oversampled" public sector workers in the Atlantic region and "undersampled" public sector workers in Ontario.
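A small, entirely hypothetical numerical example may make the mechanism concrete. The sketch below assumes noise-free log wages, a true premium of 0.10 in both regions, a regional difference of -0.10 for the Atlantic region, and made-up sample counts in which public sector workers come mostly from the Atlantic region. None of these numbers come from the survey data.

```python
# Hypothetical values, chosen only for illustration
TRUE_PREMIUM = 0.10      # assumed public sector premium (log points), same in both regions
ATLANTIC_EFFECT = -0.10  # assumed log-wage difference, Atlantic relative to Ontario

def log_wage(public, atlantic):
    """Noise-free log wage for a worker with the given sector and region dummies."""
    return 2.5 + TRUE_PREMIUM * public + ATLANTIC_EFFECT * atlantic

# Hypothetical sample counts keyed by (public, atlantic)
sample = {(1, 1): 300, (1, 0): 100, (0, 1): 100, (0, 0): 500}

def mean_log_wage(public):
    """Sample mean of log wages within one sector, ignoring region."""
    cells = [(log_wage(p, a), n) for (p, a), n in sample.items() if p == public]
    total = sum(n for _, n in cells)
    return sum(w * n for w, n in cells) / total

# Ignoring region: difference in mean log wages between the two sectors
naive_premium = mean_log_wage(1) - mean_log_wage(0)

# Holding region fixed (what a regional dummy achieves in this setup)
within_region_premium = log_wage(1, 1) - log_wage(0, 1)

print(f"premium ignoring region: {naive_premium:.4f}")
print(f"premium within a region: {within_region_premium:.4f}")
```

With these made-up counts the premium that ignores region comes out well below the assumed 0.10, while the within-region comparison recovers it.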

The Female-Male Wage Gap

Consider now the coefficient on the female indicator variable F. It implies that "comparable" men and women earn different wages, and specifically that the ratio of women's to men's wages is exp(-0.2563). The "quick" interpretation is that women's wages are on average 25.6% lower than comparable men's wages. More precisely, the estimate is exp(-0.2563) = 0.7739 (not 1 - 0.2563 = 0.7437), which implies women's wages are 77.4% of men's wages, or 22.6% lower than men's wages. Another way to express the difference is that men's wages are on average 1/0.774 = 1.292 times women's wages: men's wages are estimated to be 29.2% higher than women's wages.
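The asymmetry between "22.6% lower" and "29.2% higher" comes only from the choice of base, as the following Python sketch shows. The variable names are ours, and the coefficient is the full-precision value from Regression No. 1.

```python
import math

b_f = -0.256329  # coefficient on F in Regression No. 1

ratio_f_to_m = math.exp(b_f)    # women's predicted wage as a fraction of men's
ratio_m_to_f = math.exp(-b_f)   # men's predicted wage as a multiple of women's

women_lower_pct = 100.0 * (1.0 - ratio_f_to_m)   # base: men's wage
men_higher_pct = 100.0 * (ratio_m_to_f - 1.0)    # base: women's wage

print(f"F/M ratio {ratio_f_to_m:.4f}: women earn {women_lower_pct:.1f}% less than men")
print(f"M/F ratio {ratio_m_to_f:.4f}: men earn {men_higher_pct:.1f}% more than women")
```

The quick reading of 25.6% lies between the two exact figures because the log difference treats both directions symmetrically.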

We now turn to a model in which the male/female wage difference is allowed to vary with (a) the level of education and (b) the sector of employment. In other words, the revised model will reveal whether the wage gap between men and women (in percentage terms) is the same at all levels of education or varies with the level of education. It will also allow the male/female wage gap to differ between the public and private sectors. The empirical results are reported in Regression No. 2.

TSP Output: OLS Regression No. 2


                                      Equation   2
                                      ============

                       Method of estimation = Ordinary Least Squares


 Dependent variable: LWAGE
 Current sample:  1 to 3114
 Number of observations:  3114

        Mean of dep. var. = 2.72737      LM het. test = .322636E-04 [.995]
   Std. dev. of dep. var. = .499781     Durbin-Watson = 1.91106 [<.011]
 Sum of squared residuals = 565.350  Jarque-Bera test = 197.026 [.000]
    Variance of residuals = .182019   Ramsey's RESET2 = 12.5107 [.000]
 Std. error of regression = .426637   F (zero slopes) = 166.560 [.000]
                R-squared = .272926    Schwarz B.I.C. = 1794.17
       Adjusted R-squared = .271288    Log likelihood = -1761.99

            Estimated    Standard
 Variable  Coefficient     Error       t-statistic   P-value
 C         1.85062       .056989       32.4734       [.000]
 F         -.603022      .089581       -6.73158      [.000]
 EDY       .048514       .387020E-02   12.5353       [.000]
 FEDY      .024030       .637462E-02   3.76967       [.000]
 TENURE    .012355       .104088E-02   11.8701       [.000]
 AGE20     .725723E-02   .886815E-03   8.18347       [.000]
 PU        .094792       .024863       3.81263       [.000]
 FPU       .026071       .036390       .716447       [.474]

The new variable is FEDY = F*EDY, and it is the coefficient on FEDY that allows the wage gap between men and women to vary with the level of education. Similarly, the coefficient on FPU = F*PU measures the difference in the female/male wage gap between the two sectors of employment. One way to clarify the model's implications is to consider men's and women's wages separately. The full wage model can be written as follows:

$\widehat{\ln W} = a + b_1\,\mathrm{F} + b_2\,\mathrm{EDY} + b_3\,\mathrm{FEDY} + b_4\,\mathrm{TENURE} + b_5\,\mathrm{AGE20} + b_6\,\mathrm{PU} + b_7\,\mathrm{FPU}$ [1]

For men: F = 0, FEDY = F*EDY = 0 and FPU = F*PU = 0. Consequently, the wage equation for men is:

$\widehat{\ln W} = a + b_2\,\mathrm{EDY} + b_4\,\mathrm{TENURE} + b_5\,\mathrm{AGE20} + b_6\,\mathrm{PU}$ [2] (Men)

For women: F = 1, FEDY = F*EDY = EDY and FPU = F*PU = PU. Consequently, the wage equation for women is:

$\widehat{\ln W} = a + b_1 + b_2\,\mathrm{EDY} + b_3\,\mathrm{EDY} + b_4\,\mathrm{TENURE} + b_5\,\mathrm{AGE20} + b_6\,\mathrm{PU} + b_7\,\mathrm{PU}$ (Women)

which simplifies to:

$\widehat{\ln W} = a + b_1 + (b_2 + b_3)\,\mathrm{EDY} + b_4\,\mathrm{TENURE} + b_5\,\mathrm{AGE20} + (b_6 + b_7)\,\mathrm{PU}$ [3] (Women)

Now compare the wage equations for men and women. For men, the effect of education on log wages is b₂. For women, the effect of education on log wages is (b₂ + b₃). It follows that b₃ measures how much the male/female log-wage gap varies with the level of education. If b₃ is zero, increasing the level of education raises men's and women's expected wages by the same percentage, so the wage gap is unchanged. If b₃ is positive, then as years of education increase, women's expected wages rise faster than men's. In other words, the male/female wage gap is greatest at low levels of education and diminishes as the level of education increases.
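As a rough numerical illustration, the Python sketch below plugs the Regression No. 2 estimates of b₂ and b₃ into the two education slopes. The variable names are ours, and the percent figures use the exp() conversion discussed earlier in these notes.

```python
import math

b_edy = 0.048514    # b2: coefficient on EDY in Regression No. 2
b_fedy = 0.024030   # b3: coefficient on FEDY in Regression No. 2

slope_men = b_edy              # effect of one more year of education on men's log wage
slope_women = b_edy + b_fedy   # same effect for women

pct_men = 100.0 * (math.exp(slope_men) - 1.0)
pct_women = 100.0 * (math.exp(slope_women) - 1.0)

print(f"men:   slope {slope_men:.6f}, about {pct_men:.2f}% per year of education")
print(f"women: slope {slope_women:.6f}, about {pct_women:.2f}% per year of education")
```

Because the estimate of b₃ is positive, the slope for women is the larger of the two, which is the case described in the last sentence of the paragraph above.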

Does the Female-Male Wage Gap Vary with Education?

The OLS results show that b₃ = 0.024 and the associated t-statistic is 3.77. We therefore reject the hypothesis that the F/M wage gap is independent of the level of education. According to the estimated model, the F/M wage gap closes by about 2.4 percentage points for every additional year of education. The coefficient b₃ tells us how the wage gap changes with education, but what is the estimated F/M wage gap itself?

In this model, the coefficient on F is the estimated F/M log-wage gap at EDY = 0 and PU = 0, i.e., for private sector workers with no formal education. There are no such workers in the sample (the minimum value of EDY is 7), which is why we should not be shocked that b₁ = -0.60, which implies a wage gap on the order of 60% at EDY = 0. That figure is an extrapolation well outside the range of the data. The TSP program calculates summary statistics for EDY (the msd command). The results are:

                   Mean       Std Dev       Minimum       Maximum 
 EDY           14.08526       2.58061       7.00000      19.00000

To compare the expected wages of men and women, we simply subtract equation [2] from equation [3].

$\widehat{\ln W_F} - \widehat{\ln W_M} = b_1 + b_3\,\mathrm{EDY} + b_7\,\mathrm{PU}$ [4] (Log-wage gap equation)

Since the difference in logarithms is the logarithm of the ratio, we can rewrite [4] as

$\widehat{\ln(W_F/W_M)} = b_1 + b_3\,\mathrm{EDY} + b_7\,\mathrm{PU}$ [5] (Log of the F/M wage ratio)

Notice that when [2] is subtracted from [3], the term involving TENURE cancels provided TENURE has the same numerical value in equations [2] and [3] (no matter what that value is). The same is true of the term involving AGE20. Hence the predicted wage gap applies to men and women for whom TENURE and AGE20 are the same. In the next paragraph we consider the female/male log-wage gap as a function of education. It is convenient to set PU = 0 in the log-wage gap equation [4], i.e., we consider the F/M wage gap in the private sector. (However, the way the F/M wage gap changes with years of education is the same in the two sectors of employment.)

At the mean level of EDY (say 14 years), the logarithmic wage gap given by equation [4] in the private sector (PU=0) is estimated to be

-0.60 + 14*(0.024) = -0.60 + 0.336 = -0.264

At the mean level of education, women's expected wage in the private sector is about 26.4% less than the expected wage of men (also in the private sector). At 15 years of education (one above the mean) the log-wage gap narrows to -0.60 + 15*(0.024) = -0.240. Thus the wage gap falls from about 26.4% to about 24.0%, a decline of 2.4 percentage points.
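The calculation can be repeated for other education levels with a short Python sketch of equation [4]. The function name log_wage_gap is ours. The code uses the full-precision coefficients from Regression No. 2, so its results differ slightly from the rounded hand calculation above (for example, about -0.267 rather than -0.264 at 14 years).

```python
import math

# Coefficients copied from Regression No. 2
B_F = -0.603022     # b1: coefficient on F
B_FEDY = 0.024030   # b3: coefficient on FEDY
B_FPU = 0.026071    # b7: coefficient on FPU

def log_wage_gap(edy, pu=0):
    """Predicted female minus male log wage, equation [4]."""
    return B_F + B_FEDY * edy + B_FPU * pu

# 7 and 19 are the sample minimum and maximum of EDY; 14 is roughly the mean
for edy in (7, 14, 15, 19):
    gap = log_wage_gap(edy)                 # private sector (PU = 0)
    exact = 100.0 * (math.exp(gap) - 1.0)   # conversion via exp()
    print(f"EDY = {edy:2d}: log gap {gap:.3f}, exact difference {exact:.1f}%")
```

Each extra year of education changes the log gap by exactly b₃, whichever sector is chosen, since PU only shifts the level of the gap.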

The Public Sector Wage Premium

We have already noted that in Regression No. 2 the public sector wage premium is allowed to differ between men and women. From the log-wage equation for men [2], we can see that the public sector wage premium for men is b₆, the coefficient attached to PU. The regression results give b₆ = 0.095, i.e., men in the public sector earn about 9.5% more than comparable men in the private sector.

Equation [3], which applies to women, shows the public sector premium is b₆ + b₇, which is estimated to be 0.095 + 0.026 = 0.121. Women in the public sector earn approximately 12.1% more than women in the private sector.

The fact that the t-statistic on the coefficient b₇ is 0.716 means that we do not reject the hypothesis that the public sector wage premium is the same for men and women. Normally, we would not spend much space discussing estimated differences that are not statistically significantly different from zero (if there is no difference, why discuss it?). We have done so here simply as an exercise in model interpretation.
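The premium arithmetic and the t-ratio on b₇ can be checked with the Python sketch below. The variable names are ours. The inputs are the printed coefficients and standard error from Regression No. 2, so the t-ratio matches the TSP value only up to rounding.

```python
import math

# Values copied from Regression No. 2
b_pu = 0.094792     # b6: coefficient on PU
b_fpu = 0.026071    # b7: coefficient on FPU
se_fpu = 0.036390   # standard error of b7

premium_men = b_pu             # log-point premium for men, equation [2]
premium_women = b_pu + b_fpu   # log-point premium for women, equation [3]

exact_men = 100.0 * (math.exp(premium_men) - 1.0)
exact_women = 100.0 * (math.exp(premium_women) - 1.0)

# t-ratio for the hypothesis that b7 = 0 (same premium for men and women)
t_fpu = b_fpu / se_fpu

print(f"men:   {premium_men:.3f} log points, about {exact_men:.2f}%")
print(f"women: {premium_women:.3f} log points, about {exact_women:.2f}%")
print(f"t-ratio on FPU: {t_fpu:.3f}")
```

A t-ratio this far below the usual 5% critical value of about 1.96 is consistent with the p-value of .474 reported in the output.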

Finally, it is instructive to compare the results of Regressions No. 1 and No. 2. In No. 1 the public sector wage premium applies to all workers. Loosely speaking, it is an average of the individual premia that apply to men and to women. In No. 1, the overall premium is estimated to be 10.9%. It makes sense that this "overall" estimate lies between the male wage premium of 9.5% and the female wage premium of 12.1% estimated by Regression No. 2.