I just want to know the logarithmic calculation for Domain / Page authority. Otherwise I'm left asking, "what does 30 actually mean?" In addition, is the keyword difficulty logarithmic? SEOmoz doesn't say.

To confirm, I don't care how they work out DA / PA.

The result is between 0 and 100, but is it...

log(x) ?

log(x+3) ?

log2(x) ?

etc ?

Hi Adam,

You can see a distribution of DA over the entire Mozscape index here:

http://www.seomoz.org/blog/introducing-seomoz-updated-page-authorit...

(look at the bottom panel in the 4th chart, titled "Domain Authority, distribution full index"). Note that this plots log(Domain Count), so there are many, many more domains with small DA than large ones! The mean is about 10.88 and the median is 8.77.

Keyword difficulty uses the Mozscape metrics and it too takes some logs before computing the score. Since the raw metrics are highly skewed, we apply the log frequently to remove some of the skewness. I'm not sure what the median difficulty score is and this would be really hard to calculate since it would also depend on the distribution of keywords themselves which will have a very long tail. The best estimate I could make would just use the difficulty scores for keywords that have been run in the tool, but we don't save those in an easily usable form.
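To illustrate why a log helps with skewed metrics, here is a minimal sketch in Python. It uses synthetic, Pareto-distributed numbers as a stand-in for a raw link metric; the data, the variable names, and the `skewness` helper are assumptions for illustration only, not Mozscape data or code.

```python
import math
import random


def skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third_moment = sum((v - mean) ** 3 for v in values) / n
    return third_moment / variance ** 1.5


# Synthetic heavy-tailed sample standing in for a raw metric (assumption).
rng = random.Random(0)
raw_metric = [rng.paretovariate(1.2) for _ in range(10_000)]

# paretovariate() returns values >= 1, so the natural log is always defined.
logged_metric = [math.log(v) for v in raw_metric]

raw_skew = skewness(raw_metric)
log_skew = skewness(logged_metric)

print(f"skewness of raw metric:    {raw_skew:.2f}")
print(f"skewness of logged metric: {log_skew:.2f}")
```

The logged values should come out far less skewed than the raw ones, which is the effect described above. Real metrics that can be zero would need something like log(x + 1) instead.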

Well, if your input data is rescaled logarithmically then yes, it would be impossible to tell me what a DA of 30 is in relation to a higher or lower DA.

But can you tell me this? What is the distribution of DA values across all values? It would be nice to know that "the median DA across all sites in our database is x." That would at least put the numbers in some perspective - and it's perspective I'm trying to get.

Can you also confirm if the "keyword difficulty" is also calculated with logarithmic inputs? And what's the median keyword difficulty?

Adam

Hi eatyourveggies,

The scales on PA and DA run from 1-100, with the largest, most important sites on the internet having a PA/DA of 100 (Google, Facebook, etc). Beyond that, we don't attribute any special meaning to a value of "30" or "50" other than as a relative ordering. The keyword difficulty scale is similar, with 100 signifying the most difficult keyword to rank for.

PA and DA are the output from a machine learning model that we then rescale to values between 1-100. The raw output from the model is dimensionless and doesn't have any interesting meaning. The rescaling is linear, but the inputs to the model are rescaled logarithmically before being used in the model. We use the natural log (base e), but the base is pretty arbitrary since one can transform from one base to another by changing coefficients, and the coefficients themselves are set in a regression. The key point is that since the inputs have a log applied to them, it is much harder to increase DA from, say, 70 to 80 than it is from 30 to 40.
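As a rough illustration of both points, here is a toy sketch in Python. It assumes a hypothetical one-input model, score = COEFF * ln(links), which is not the actual PA/DA model; `COEFF`, `toy_score`, `links_needed` and `extra_links` are made-up names for this example.

```python
import math

COEFF = 10.0  # hypothetical regression coefficient, chosen only for illustration


def toy_score(links):
    """Toy score using the natural log of a single raw input."""
    return COEFF * math.log(links)


def toy_score_base2(links):
    """Same score written with log base 2; the coefficient absorbs the change of base."""
    return (COEFF * math.log(2)) * math.log2(links)


def links_needed(score):
    """Invert the toy model: raw input required to reach a given score."""
    return math.exp(score / COEFF)


def extra_links(low_score, high_score):
    """Additional raw input needed to move from low_score to high_score."""
    return links_needed(high_score) - links_needed(low_score)


# Changing the base only rescales the coefficient, so the score is unchanged.
print(toy_score(500), toy_score_base2(500))

# The same 10-point gain costs far more raw input higher up the scale.
print(f"30 -> 40 needs about {extra_links(30, 40):.0f} more links")
print(f"70 -> 80 needs about {extra_links(70, 80):.0f} more links")
```

In this toy model each 10-point step multiplies the required input by the same factor, so the absolute amount needed grows quickly as the score rises. The real model has many inputs and fitted coefficients, so the actual numbers will differ.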

