
10 Standardization of Rates - Member Page

APHEOLIST Discussion

Question re. "Reporting Standardized Rates" - posted 2009/06/24 12:05 PM
     Response 1 to "Reporting Standardized Rates" - posted 2009/06/25 2:22 PM
     Response 2 to "Reporting Standardized Rates" - posted 2009/07/02 3:30 PM
Question re. standardizing CCHS rates - posted 2009/07/28 8:54 AM
     Response 1 to "standardizing CCHS rates" - posted 2009/07/28 9:37 AM
     Response 2 to "standardizing CCHS rates" - posted 2009/07/31 10:09 AM

Question re. "Reporting Standardized Rates" - posted 2009/06/24 12:05 PM

I thought I would throw this out for discussion with the APHEO group.
I am wondering what your thoughts are on the need to calculate and report both age-standardized rates (direct standardization) and standardized morbidity/mortality ratios (indirect standardization) in the same report. That is, if ASRs are calculated for a PHU and Ontario, and the two are shown with confidence intervals, are SMRs also needed?
I've always thought they serve slightly different purposes, and that the ASR is most useful when comparing between PHUs, while the SMR is best to use when comparing to the province. However, we often calculate ASRs to show trends over years, and include the province in the same graphical display. After doing that, is it necessary to also compute the SMR? Essentially, the SMR will be slightly different from the ratio of the two ASRs, but similar enough, I think. Yet I've still always thought both were needed. Perhaps not?

Response 1 to "Reporting Standardized Rates" - posted 2009/06/25 2:22 PM

In working with small populations, I have always found both to be useful. SMRs have the advantage over S-rates that there is no ambiguity about whether a difference between estimates is significant. S-rates with overlapping confidence intervals that do not include their comparator's point estimate always drove me crazy when explaining reports to folks.

S-rates, on the other hand, are more readily interpretable by the non-epi. There is political traction lost in a report if someone living in a certain area can't look to the findings and say 'A-ha! The morbidity rate of disease X is Y in my area.' S-rates serve that purpose while allowing for some standardization around age distribution.

My two cents: Keep them both....especially if you run into that overlapping confidence interval issue. If not, you could arguably ditch the SMRs.

Response 2 to "Reporting Standardized Rates" - posted 2009/07/02 3:30 PM

The interpretation of directly and indirectly standardized rates is different, and they yield different estimates, so I'm going to risk disagreeing with "Response 1": I wouldn't include both. Do you really want to have to explain why the directly and indirectly standardized rates and ratios are different? Rather, choose one method and then provide the crude rate, the standardized rate, and the standardized rate ratio. As noted below, you can convert between rate and ratio quite easily.

A directly standardized rate indicates the expected rate if the study population had the same age structure as the standard population. The indirect method instead asks how many events would be expected if the study population had the same age-specific risks of morbidity/mortality as the standard population (while retaining its own age structure), and compares the observed number of events with that expected number.

The default calculation for the direct method yields a standardized rate, while the indirect method yields a standardized ratio. However, the form of output shouldn't be the main issue, since both the rate and ratio can be converted from one form to the other. In the direct method, the study population standardized rate can be converted to a ratio by dividing it by the rate in the standard population. In the indirect method, the study population SMR can be converted to a rate by multiplying the standard population's crude rate by the SMR. (Kahn & Sempos, p. 97)
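
To make the two calculations and the rate/ratio conversions concrete, here is a minimal sketch in Python (rather than Stata, purely for illustration). All of the counts are invented, and the indirectly standardized rate is taken to be the SMR times the crude rate of the standard population, which is the usual textbook convention and an assumption about what is intended here.

```python
# Hypothetical counts for three age groups (invented for illustration only).
study_events = [2, 10, 40]                    # observed events, study population
study_pop = [5000, 3000, 2000]                # study population by age group
std_events = [300, 1500, 4000]                # events, standard population
std_pop = [500000, 300000, 100000]            # standard population by age group

study_rates = [d / n for d, n in zip(study_events, study_pop)]
std_rates = [d / n for d, n in zip(std_events, std_pop)]
std_weights = [n / sum(std_pop) for n in std_pop]

crude_study = sum(study_events) / sum(study_pop)
crude_std = sum(std_events) / sum(std_pop)

# Direct method: study age-specific rates applied to the standard age structure.
direct_rate = sum(w * r for w, r in zip(std_weights, study_rates))
# Convert the directly standardized rate to a ratio.
direct_ratio = direct_rate / crude_std

# Indirect method: standard age-specific rates applied to the study age structure.
expected = sum(r * n for r, n in zip(std_rates, study_pop))
smr = sum(study_events) / expected
# Convert the SMR to a rate (assumed convention: SMR x standard crude rate).
indirect_rate = smr * crude_std

print(f"crude study rate      : {crude_study:.5f}")
print(f"direct rate and ratio : {direct_rate:.5f}, {direct_ratio:.3f}")
print(f"expected events, SMR  : {expected:.1f}, {smr:.3f}")
print(f"indirect rate         : {indirect_rate:.5f}")
```

With these made-up numbers the two ratios come out close to each other but not identical, which is the point made in the original question.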

The direct method is more often used, presumably because we usually have more information about our local population than we do about the standard population. Also, directly standardized rates from diverse populations all standardized to the same standard population can be directly compared, while SMRs usually shouldn't be compared because they have different local age distributions (though Rothman and Greenland indicate that the bias will usually be small). Because of PHPD, lack of access to data is not an impediment. One reason to prefer indirect standardization is when your age-specific rates are based on small numbers and are therefore unstable. Also, when the age-specific counts in the study population are small or zero, the direct method essentially drops them, thereby wasting information. In the indirect method, on the other hand, the "observed" is the total number of events in the study population.

So, should you choose direct or indirect? Assuming you have the data to use either, I would lean toward the indirectly standardized method. Two reasons: 1) you avoid the problem of small age-specific counts in the study population; 2) it is easier to calculate and interpret the confidence interval on the SMR, which is the default output from the indirect method. You can then also include the indirectly standardized rate by multiplying the standard population's crude rate by the SMR. With the confidence interval on the SMR, you needn't include a confidence interval on the rates, which means you avoid the overlapping confidence interval problem noted by Rob. The SE of the SMR is very easy to calculate: sqrt(SMR/E), where E is the expected number of events. With some relatively mild assumptions, you can calculate the CI as SMR +/- 1.96*(sqrt(SMR/E)). If you are using a computer program like Stata to do this, it will have a built-in Poisson distribution function and so will give you a somewhat more accurate confidence interval, since you don't need the "large" number assumption.
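
As a quick illustration of the large-sample interval described above, here is a small Python sketch. The function name and the observed/expected values are made up for the example; it implements only the normal approximation SMR +/- 1.96*sqrt(SMR/E), not an exact Poisson interval.

```python
import math


def smr_confidence_interval(observed, expected, z=1.96):
    """Normal-approximation CI for an SMR (hypothetical helper).

    SE(SMR) = sqrt(SMR / E), which is the same as sqrt(observed) / E
    when the observed count is treated as Poisson and E as a constant.
    """
    smr = observed / expected
    se = math.sqrt(smr / expected)
    return smr, smr - z * se, smr + z * se


# Made-up example: 52 observed events against 98 expected.
smr, lower, upper = smr_confidence_interval(observed=52, expected=98.0)
print(f"SMR = {smr:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

If the interval excludes 1, the study population differs significantly from the standard at roughly the 5% level. With small observed counts the approximation gets rough, which is where the Poisson-based interval mentioned above is the better choice.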

You could accomplish the same thing with the direct method. The equation that I have for the directly standardized rate ratio from Newman (2001, page 253) is more complicated than for the SMR method. That may be because, for the direct method, Newman assumes that both the standard and study populations are measured with error, while the SMR equation reduces to a Poisson variable (the observed) divided by a constant (the expected). The SMR is reduced this way because it is assumed that the standard population counts have no variance, i.e., that they are a census. Given the different methods and interpretations of direct and indirect standardization, I'm not sure if the direct method could make the same simplifying assumption of a Poisson variable divided by a constant, since both the observed and expected rates are based on the local event counts and only the age distribution is taken from the standard population. I'd be interested in what a biostatistician would say about this. If true, this raises the interesting question, related to our previous discussion of a couple weeks ago on superpopulation theory, of whether we should assume that the standard population counts/rates for the SMR should be considered to have a variance of zero. If not, then you'd have to calculate the variance for both the local population and the standard population and add them together to get the total variance, probably in a similar way to the directly standardized rate ratio equation given in Newman (ibid). Like I said above, I'd be interested in a biostatistician's opinion of this.

It's easy to see why we epidemiologists are always making simplifying assumptions. Otherwise our heads would probably spin off and all of our confidence intervals would be so wide that we'd never find any significant differences anywhere! :) It's too bad that we are forced to hunt for significant differences, either for publication purposes, or to show need. It would probably be healthier for our science if we instead got points for taking into account as many sources of error as possible, with prizes going to those with the WIDEST credible confidence intervals! Then we'd all be learning Bayesian modeling, and measurement error modeling, and random effects modeling, and multiple imputation methods.... And eventually all of our confidence intervals would be infinitely wide and we'd see that all is One, and just about then we'd shuffle off this mortal coil and merge into the Dao....

Question re. standardizing CCHS rates - posted 2009/07/28 8:54 AM

I have a question about standardizing CCHS rates and I'm wondering if anyone can offer any advice. We often do a lot of analysis using CCHS comparing our regional data with Ontario data. In the past, we have never standardized our rates in order to compare to the province, we just put both sets of crude rates in a table under Waterloo Region or Ontario, respectively. I'm beginning to think that maybe we should be standardizing our rates if we are comparing them provincially because the planners that use this information are directly comparing to see if we are different from the province as a whole. I am wondering what your thoughts are on this?

A few of the concerns we have are that it takes longer and more effort to standardize rates and right now we are not fully staffed so are having resource issues. The other question is how do we standardize rates in the context of bootstrapping with CCHS?

I am not sure if anyone has come across this and am wondering what approaches have been used by others? If you do standardize rates, is there a method or template that you can provide to do this (while bootstrapping)?

Response 1 to "standardizing CCHS rates" - posted 2009/07/28 9:37 AM

Good question...

I have been hoping that the Health Indicators project would age-standardize CCHS indicators for some time now. However, they have been providing lots of age-specific rates - we tend to use these now.

I thought there was a macro you can use in SPSS? Did the StatsCan folk develop a macro? With the Ontario Health Survey, we used SUDAAN to get age-standardized rates.

Response 2 to "standardizing CCHS rates" - posted 2009/07/31 10:09 AM

This is a great question.

There is a short answer to it on the APHEO website here:

The web page indicates that you first obtain the age-specific rates/proportions and variances for the populations in question using your usual bootstrapping method. Then, it gives you a formula for standardizing those results.
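
The formula itself isn't reproduced in this message. As a rough sketch of what such a two-step calculation usually looks like, the Python below combines age-specific estimates and their bootstrap variances using standard-population weights. It assumes the age-group estimates are independent, so the variance is the sum of squared weights times the age-specific variances; whether that matches the webpage's formula exactly is an assumption, and all input numbers are invented.

```python
import math


def standardize(estimates, variances, weights, z=1.96):
    """Directly standardize age-specific estimates (hypothetical helper).

    estimates : age-specific rates or proportions
    variances : their variances (e.g. from the bootstrap program)
    weights   : standard population counts or proportions, one per age group
    Assumes the age-specific estimates are independent of one another.
    """
    total = sum(weights)
    w = [x / total for x in weights]          # rescale so the weights sum to 1
    rate = sum(wi * p for wi, p in zip(w, estimates))
    variance = sum(wi ** 2 * v for wi, v in zip(w, variances))
    se = math.sqrt(variance)
    return rate, se, (rate - z * se, rate + z * se)


# Invented example with three age groups.
rate, se, ci = standardize(
    estimates=[0.20, 0.35, 0.60],
    variances=[0.0004, 0.0009, 0.0016],
    weights=[0.5, 0.3, 0.2],
)
print(f"standardized proportion = {rate:.3f}, SE = {se:.4f}")
print(f"95% CI: {ci[0]:.3f} to {ci[1]:.3f}")
```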

From your email, I gather you are trying to make this as quick, easy and routine as possible. So, here is a little more detail on how you could do it in one step, rather than calculating the results first in the bootstrapping program and then applying the age-standardization in a separate step. The method I give below uses Stata.

You may choose to standardize to an external population standard, like the 1991 Canadian population, or you may choose to standardize to the total age distribution of the CCHS file. I'll give both methods.

A) Method using external standard (1991 Canadian population, 18 age groups):

1. Generate the age groups you want to standardize on. I use the following code, which may look a little daunting at first, but is just a for-loop that generates and labels the age groups by counting five-year chunks. I do it this way because I'm lazy and don't like writing out replace and label statements for every five-year increment:

* Specify name of age variable
local agevar age

* Generate age groups
gen agegr = .

local lower = 0
local upper = 4

* Groups 1-17 are the five-year bands 0-4, 5-9, ..., 80-84
forvalues x = 1(1)17 {
    replace agegr = `x' if `agevar'>=`lower' & `agevar'<=`upper'
    label define agegr `x' "`lower'-`upper'", modify
    local lower = `lower' + 5
    local upper = `upper' + 5
}

* Group 18 is open-ended; exclude missing ages, which Stata treats as larger than any number
replace agegr = 18 if `agevar'>=85 & !missing(`agevar')
label define agegr 18 "85+", modify

label values agegr agegr

2. Generate another variable with the 1991 standard population age group proportions. For the 1991 standard population using 18 age groups, these are the proportions:

gen stdwgt = .
replace stdwgt = 0.069464491 if agegr==1
replace stdwgt = 0.069453787 if agegr==2
replace stdwgt = 0.068033804 if agegr==3
replace stdwgt = 0.068495219 if agegr==4
replace stdwgt = 0.075015901 if agegr==5
replace stdwgt = 0.089944280 if agegr==6
replace stdwgt = 0.092399822 if agegr==7
replace stdwgt = 0.083387858 if agegr==8
replace stdwgt = 0.076062804 if agegr==9
replace stdwgt = 0.059535887 if agegr==10
replace stdwgt = 0.047649321 if agegr==11
replace stdwgt = 0.044041186 if agegr==12
replace stdwgt = 0.042326254 if agegr==13
replace stdwgt = 0.038569897 if agegr==14
replace stdwgt = 0.029659391 if agegr==15
replace stdwgt = 0.022127296 if agegr==16
replace stdwgt = 0.013595381 if agegr==17
replace stdwgt = 0.010237423 if agegr==18

3. I have previously set my survey parameters using -svyset- to define the probability weight variable (e.g., wtse_s in CCHS 3.1) and the bootstrap weights (e.g., bsw1-bsw500). Having done that already, and having now generated my age group variable and my standard population weights (above), I simply add standardization options to my usual survey analysis command. Here is an example from CCHS 3.1, calculating the proportion of people who have ever had a flu shot (CCHS variable flue_160) by health unit (geoedhr4), excluding the refused/not stated/DK values. And, yes, it also calculates the confidence intervals.

This would be the command issued for non-standardized bootstrapped estimates:
svy brr: proportion flue_160 if flue_160<3, over(geoedhr4)

And this would be the command issued for age-standardized bootstrapped estimates:
svy brr: proportion flue_160 if flue_160<3, over(geoedhr4) stdize(agegr) stdweight(stdwgt)
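
If you want to sanity-check steps 1 and 2 outside Stata, the following Python sketch mirrors the age grouping (18 groups, with 85+ as the last) and confirms that the standard weights listed in step 2 add up to 1. The function names are made up for the example, and the treatment of missing or negative ages is my own assumption.

```python
# 1991 standard population proportions, copied from step 2 (age groups 1-18).
STD_WEIGHTS = [
    0.069464491, 0.069453787, 0.068033804, 0.068495219, 0.075015901,
    0.089944280, 0.092399822, 0.083387858, 0.076062804, 0.059535887,
    0.047649321, 0.044041186, 0.042326254, 0.038569897, 0.029659391,
    0.022127296, 0.013595381, 0.010237423,
]


def age_group(age):
    """Return the group number 1..18 (1 = 0-4, ..., 17 = 80-84, 18 = 85+).

    Missing or negative ages return None (an assumption for this sketch).
    """
    if age is None or age < 0:
        return None
    return min(int(age) // 5 + 1, 18)


def age_group_label(group):
    """Return the label used in the Stata loop, e.g. '35-39' or '85+'."""
    if group == 18:
        return "85+"
    lower = (group - 1) * 5
    return f"{lower}-{lower + 4}"


for age in (0, 37, 84, 85, 101):
    g = age_group(age)
    print(age, g, age_group_label(g))

# The weights should sum to 1, apart from rounding in the ninth decimal place.
print(round(sum(STD_WEIGHTS), 6))
```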

B) Method using internal standard (distribution of total CCHS file)

One little quirk of combining the internal standard distribution technique with survey data is that the standardization is applied *after* the probability weighting. Therefore, while you might expect that the standardized and unstandardized estimates for the total CCHS file population should be the same, it turns out that they aren't. I don't suppose that this really matters, since the age structure we standardize to is more or less arbitrary, but it may be slightly disconcerting if that's what you're expecting. In any case, the directly standardized rate is synthetic, so its absolute value isn't important.

1. Generate the age groups same as step 1 above.

2. Generate total age distribution to be used as "internal" standard population

* Generate a constant equal to 1 to indicate that each observation is one person
gen dum = 1

* Generate the total dataset population
egen tot = total(dum)

* Generate the age-group specific total populations
egen agegrtot = total(dum), by(agegr)

* Generate the proportion of people in each age group
gen intstdpop = agegrtot/tot

3. Now the variable intstdpop contains your standard population distribution. So you can follow Step 3 above using this internal standard distribution rather than the external standard.
svy brr: proportion flue_160 if flue_160<3, over(geoedhr4) stdize(agegr) stdweight(intstdpop)
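
For anyone who wants to see what step 2 of method B is doing, here is the same calculation as a small Python sketch. It counts respondents per age group without applying survey weights, as the Stata code above does, and divides by the total. The function name and the sample data are invented, and dropping records with a missing age group is my own assumption.

```python
from collections import Counter


def internal_standard_weights(age_groups):
    """Proportion of respondents in each age group (hypothetical helper).

    age_groups : one age-group code per respondent
    Records with a missing group (None) are dropped before counting.
    """
    counts = Counter(g for g in age_groups if g is not None)
    total = sum(counts.values())
    return {g: counts[g] / total for g in sorted(counts)}


# Invented example: eight respondents in three age groups.
weights = internal_standard_weights([1, 1, 2, 3, 3, 3, 3, 2])
for group, w in weights.items():
    print(group, w)
```

The resulting proportions play the same role as stdwgt in method A: one weight per age group, summing to 1.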

As you can see, combining the standardization features with the survey bootstrapping features makes it very easy to get age-standardized survey results in Stata. I imagine that this would be possible in SPSS also. There is no doubt a way to program SPSS to "save" each of the stratum- and age-specific point estimates and standard errors that are produced in the bootstrapping program and then age-standardize them according to the formulae provided on the APHEO webpage at the top of this message.

