Statisticians love variable transformations. Log-em, square-em, square-root-em, or even use the all-encompassing Box-Cox transformation, and voilà: you get variables that are "better behaved". Good behavior to statistician parents means things like kids with normal behavior (=normally distributed) and stable variance. Transformations are often applied so that popular tools such as linear regression, whose underlying assumptions require "well-behaved" variables, can be used.

Moving into the world of business, one transformation is more than just a "statistical technicality": the log transform. It turns out that taking the log of the input (X) and/or output (Y) variables in linear regression yields meaningful, interpretable relationships. (There seems to be a misconception that linear regression is only useful for modeling a linear input-output relationship, but the truth is that the name "linear" describes the linear relationship between Y and the coefficients... very confusing indeed, and the fault of statisticians, of course!) Log transforms enable modeling a wide range of meaningful, useful, non-linear relationships between inputs and outputs, and they move us from unit-based interpretations to percentage-based interpretations.

So let's see how the log-transform works for linear regression interpretations.

Note: I use "log" to denote "log base e" (also known as "ln", or in Excel the function "=LN"). You can do the same with log base 10, but the interpretations are not as slick.

Let's start with a linear relationship between X and Y of the form (ignoring the noise part for simplicity):

**Y = a + b X**

The interpretation of b is:

*a unit increase in X is associated with an average of b units increase in Y.*

Now, let's assume an exponential relationship of the form:

**Y = a exp(b X)**

If we take logs on both sides we get:

**log(Y) = c + b X**, where c = log(a).

The interpretation of b is:

*a unit increase in X is associated with an average of 100b percent increase in Y.*

This approximate interpretation works well for |b|<0.1. Otherwise, the exact relationship is: a unit increase in X is associated with an average increase of 100(exp(b)-1) percent in Y.

Technical explanation:

Take a derivative of the last equation with respect to X (to denote a small increase in X). You get

(1/Y) dY/dX = b, or equivalently, dY/Y = b dX.

dX means a small increase in X, and dY is the associated increase in Y. The quantity dY/Y is a small proportional increase in Y (so 100 times dY/Y is a small percentage increase in Y). Hence, a small unit increase in X is associated with an average increase of 100b percent in Y.
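To get a feel for how quickly the approximation degrades as b grows, here is a small Python sketch. It is my own illustration rather than anything from a textbook, and the function names `approx_pct_change` and `exact_pct_change` are made up for this example. It compares the approximate 100b percent reading of the exponential model with the exact 100(exp(b)-1) percent change for a few values of b.

```python
import math

def approx_pct_change(b):
    """Approximate percent change in Y per unit increase in X: 100*b."""
    return 100.0 * b

def exact_pct_change(b):
    """Exact percent change in Y per unit increase in X: 100*(exp(b) - 1)."""
    # expm1(b) computes exp(b) - 1 accurately even for very small b
    return 100.0 * math.expm1(b)

# Illustrative coefficient values (assumed, not estimated from any data)
for b in (0.01, 0.05, 0.1, 0.5, 1.0):
    print(f"b={b:<5} approx={approx_pct_change(b):7.2f}%  exact={exact_pct_change(b):7.2f}%")
```

For b = 0.05 the two readings are about 5% and 5.13%, which is close enough for most purposes. For b = 0.5 they are 50% versus roughly 64.9%, so the exact formula is the one to report.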

Another popular non-linear relationship is a log-relationship of the form:

**Y = a + b log(X)**

Here the (approximate) interpretation of b is:

*a 1% increase in X is associated with an average b/100 units increase in Y.*

(Use the same steps as in the previous technical explanation to get this result.) The approximate interpretation is fairly accurate: the exact interpretation is that a 1% increase in X is associated with an average increase of b log(1.01) units in Y, and log(1.01) is practically 0.01.

Another very common relationship in business is completely multiplicative:

**Y = a X^b**

If we take logs here we get

**log(Y) = c + b log(X)**, where c = log(a).

The approximate interpretation of b is:

*a 1% increase in X is associated with an average b% increase in Y.*

Like the exponential model, the approximate interpretation works well for |b|<0.1; otherwise, the exact interpretation is: a 1% increase in X is associated with an average 100(exp(b log(1.01))-1) percent increase in Y.

Finally, note that although I've described a relationship between Y and a single X, all this can be extended to multiple X's. For example, to a multiplicative model such as:

**Y = a X1^b X2^c X3^d**

Although this stuff is extremely useful, it is not easily found in many textbooks. Hence this post. I did find a good description in the book *Regression Methods in Biostatistics: Linear, Logistic, Survival, and Repeated Measures Models* by Vittinghoff et al. (see the relevant pages in Google books).
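As a quick numerical check of the multiple-X multiplicative model, here is a minimal Python sketch. The coefficient values (a = 2, b = 0.5, c = -1.2, d = 0.08), the starting inputs, and the function names are all assumptions picked for illustration. It raises one input at a time by 1%, holds the other two fixed, and reports the resulting percent change in Y, which should land close to the exponent of the input that moved.

```python
def multiplicative_model(x1, x2, x3, a=2.0, b=0.5, c=-1.2, d=0.08):
    """Y = a * X1^b * X2^c * X3^d, with made-up coefficients for illustration."""
    return a * x1**b * x2**c * x3**d

def pct_change_in_y(which, x=(10.0, 4.0, 7.0), bump=0.01):
    """Percent change in Y when input number `which` (0, 1 or 2) rises by
    `bump` (1% by default) while the other two inputs stay fixed."""
    base = multiplicative_model(*x)
    bumped = list(x)
    bumped[which] *= 1.0 + bump
    return 100.0 * (multiplicative_model(*bumped) - base) / base

for i, name in enumerate(("X1 (b=0.5)", "X2 (c=-1.2)", "X3 (d=0.08)")):
    print(f"1% increase in {name}: Y changes by {pct_change_in_y(i):+.4f}%")
```

The three changes come out at roughly +0.50%, -1.19% and +0.08%, matching b, c and d to within the approximation error. The answer does not depend on the starting values of the X's or on a, because the constant and the untouched inputs cancel in the ratio.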