answer

  


· QUESTION 1

For a factor to be a confounder, two things must be true. What are they?

· QUESTION 2

Explain the difference between error and bias 


· QUESTION 3

List and explain the two primary sources of bias in epi studies.


Interaction

In this, your last lesson, we’ll talk about a special type of confounding known as interaction.
1

Overview
Overview of Interaction
How to Detect Interaction

2

About Interaction
Secondary exposure obscures truth
E and E* interact
Synergistic
Antagonistic

3
As mentioned, interaction is a special type of confounding that obscures the truth about the primary exposure’s relationship to the disease of interest. Interaction is a biological phenomenon that occurs when the effects of the primary exposure and secondary exposure act together to produce a much greater (synergistic) or much weaker (antagonistic) measure of effect than the mere sum of the two exposures.
Recall that with confounding, we want to control for (or eliminate) the effects of E* so as to get closer to the “truth” between E and D. With interaction, we do not want to do this because it isn’t the secondary exposure (E*) in and of itself that is obscuring the truth.
Because interaction is a biological phenomenon, it should be appreciated and communicated if found.
Let’s look at an example.

Example
To determine the relationship between oral cancer and alcohol use, you conduct a ca-co study using 400 cases and 400 controls. Your data showed that 320 cases and 180 controls were alcohol drinkers. Because many drinkers smoke cigarettes, and there is an independent relationship between smoking and oral cancer, you measure for smoking status as well. Results show that among drinkers, 252 cases and 72 controls smoked. Of those who did not drink, 48 cases and 40 controls smoked.

So, your primary exposure of interest is………..
And your secondary exposure of interest is……..
If you said drinking and smoking, in that order, then give yourself a hearty pat on the back. =)
4

Step 1: Set up Data Table for E and Calculate Crude OR
                    Ca     Co     Total
E+ (Drinkers)       320    180    500
E- (Nondrinkers)     80    220    300
Total               400    400    800

Drinking is your primary E of interest, so calculate the OR for drinking and oral cancer after plugging these data into your 2×2 and calculate a crude OR.
OR crude = ad/bc
= (320 x 220) / (180 x 80)
=70,400/14,400
= 4.9
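The crude OR arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `odds_ratio` and its cell arguments `a`, `b`, `c`, `d` are illustrative labels for the 2×2 cells, not something defined in the lesson.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    return (a * d) / (b * c)


# Drinkers vs. nondrinkers, from the Step 1 table.
crude_or = odds_ratio(a=320, b=180, c=80, d=220)
print(f"Crude OR = {crude_or:.1f}")  # prints "Crude OR = 4.9"
```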
5

Step 2: Stratify E by E*
                     ca     co     total
SMOKERS (E*+)
E+ (drinkers)        252     72    324
E- (nondrinkers)      48     40     88
Totals               300    112    412
NON-SMOKERS (E*-)
E+ (drinkers)         68    108    176
E- (nondrinkers)      32    180    212
Totals               100    288    388
Grand Total          400    400    800

6
If you are unsure how I derived these numbers, please review the lesson on “confounding” where I walk you through the process.

Step 3: Calculate Strata-Specific ORs and compare to Crude OR
OR smoker = 2.9
OR nonsmoker = 3.5
Crude OR = 4.9

7
OR smoker = (252 x 40) / (72 x 48) = 10,080/3456 = 2.9
OR nonsmoker = (68 x 180) / (108 x 32) = 12,240/3456 = 3.5
So when we compare our E* s-s ORs to our crude OR, we notice a few things. First, we see that both s-s ORs differ from the crude. Second, we see that general rule #1 for confounding is met because both s-s ORs are lower than the crude. We also note that the s-s ORs are not that similar to each other; thus, by general rule #2 for confounding, we shouldn’t try to adjust for E*. Instead, we suspect synergistic interaction. And if we suspect it, we must investigate it. Let me show you how. =)
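As a rough check on Steps 2 and 3, the sketch below recomputes the strata-specific ORs and confirms that the two strata add back up to the crude table. The dictionary layout and the names `strata` and `stratum_ors` are assumptions made for illustration.

```python
def odds_ratio(a, b, c, d):
    """OR = ad/bc for cells a, b (exposed) and c, d (unexposed)."""
    return (a * d) / (b * c)


# Each tuple is (a, b, c, d): drinker cases, drinker controls,
# nondrinker cases, nondrinker controls.
crude = (320, 180, 80, 220)
strata = {
    "smokers": (252, 72, 48, 40),
    "nonsmokers": (68, 108, 32, 180),
}

# Sanity check: adding the strata cell by cell should give the crude table.
summed = tuple(sum(cells[i] for cells in strata.values()) for i in range(4))
assert summed == crude

stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}
print(f"Crude OR = {odds_ratio(*crude):.1f}")
for name, value in stratum_ors.items():
    print(f"OR {name} = {value:.1f}")
```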

 

Step 4: Determine Excess Risk

Column 1 Column 2 Column 3 Column 4 Column 5 Column 6
Alcohol Smoking Ca Co OR Excess Risk
Row 1
Row 2
Row 3
Row 4

8
To test for interaction, we create what is called an excess risk table. Yay! More tables!! An excess risk table allows us to determine if there really is a synergistic effect, by comparing each stratum of E,E* to a referent.
I’m going to walk you through how the table is set up and calculated in the next several slides, but will also post a video on how to do this.
First, you set up the table as shown on the slide. It is VERY important to label each column explicitly, to keep yourself organized, as follows:
Column 1 = primary E
Column 2 = secondary E (E*)
Column 3 = D+ (cases)
Column 4 = D- (controls)
Column 5 = Odds Ratio
Column 6 = Excess risk
We will have 4 rows, which I will explain in the next slide

Column 1 Column 2 Column 3 Column 4 Column 5 Column 6
Alcohol Smoking Ca Co OR Excess Risk
Row 1 No No
Row 2 Yes No
Row 3 No Yes
Row 4 Yes Yes

Columns 1 and 2 by Row

9
Here’s how to set up your rows:
Row 1 = E-, E*- (non drinker, non smoker)
Row 2 = E+, E*- (drinker, non smoker)
Row 3 = E-, E*+ (non drinker, smoker)
Row 4 = E+, E*+ (drinker, smoker)
Row 1 must always be the unexposed on both Es and row 4 must always be the exposed on both Es. You can switch the order of rows 2 and 3 with no consequences…
Now, let’s plug in our numbers….

Columns 3 & 4
Column 1 Column 2 Column 3 Column 4 Column 5 Column 6
Alcohol Smoking Ca Co OR Excess Risk
Row 1 No No 32 180
Row 2 Yes No 68 108
Row 3 No Yes 48 40
Row 4 Yes Yes 252 72

                     ca     co     total
SMOKERS (E*+)
E+ (drinkers)        252     72    324
E- (nondrinkers)      48     40     88
Totals               300    112    412
NON-SMOKERS (E*-)
E+ (drinkers)         68    108    176
E- (nondrinkers)      32    180    212
Totals               100    288    388
Grand Total          400    400    800

10
To show you where I’m getting my numbers, I copied the data table from Step 2 and color coded the numbers. =)
Note: if you’ve set your stratified data up with the E*+ 2×2 table on top, then the Row 1 in your Excess Risk Table will be the last row in your data table and Row 4 will be the first.
Ok, now let’s get to the OR and Excess Risk columns….

Column 5: Compute Row ORs
Column 1 Column 2 Column 3 Column 4 Column 5 Column 6
Alcohol Smoking Ca Co OR Excess Risk
Row 1 No No 32 (c) 180 (d) 1 (null)
Row 2 Yes No 68 (a) 108 (b) 3.5
Row 3 No Yes 48 (a) 40 (b) 6.8
Row 4 Yes Yes 252 (a) 72 (b) 19.7

11
When testing for interaction, we use row 1 as our referent. Because these people are “exposure free,” they — in essence – represent the natural course of the D of interest, and thus are the null.
Calculating OR. As I noted above, the first row is the referent and therefore will always have the OR of 1 (because 1 is the null value of OR). The other rows are calculated like a normal OR, using the data from the referent row as cells C and D. The A and B cells come from the data in each specific row. We then calculate the OR as usual: OR = ad/bc. So, in this example we have:
Row 2 = (68 x 180) / (108 x 32) = 12,240/3456 =3.5
Row 3 = (48 x 180) / (40 x 32) = 8640/1280 =6.8
Row 4 = (252 x 180) / (72 x 32) = 45,360/2304 = 19.7
Still with me?

Column 6: Subtract null from OR
Column 1 Column 2 Column 3 Column 4 Column 5 Column 6
Alcohol Smoking Ca Co OR Excess Risk
Row 1 No No 32 (c) 180 (d) 1 (null) —
Row 2 Yes No 68 (a) 108 (b) 3.5 2.5
Row 3 No Yes 48 (a) 40 (b) 6.8 5.8
Row 4 Yes Yes 252 (a) 72 (b) 19.7 18.7

12
Column 6 shows us how much risk there is given the exposure(s) over what occurs naturally. Because Row 1 is the referent, this cell will be blank. For Rows 2 to 4, though, we subtract 1 from the OR values to determine the excess risk. Why? Because the null value of OR = 1; therefore anything above 1 is excess. =) To reiterate, Excess Risk = OR – 1
So:
Row 2 = 3.5 – 1 = 2.5
Row 3 = 6.8 – 1 = 5.8
Row 4 = 19.7 – 1 = 18.7
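The excess risk table can also be built programmatically. The sketch below follows the same recipe (Row 1 as referent, OR = ad/bc with the referent supplying cells c and d, Excess Risk = OR – 1); the list-of-tuples layout and variable names are assumptions for illustration only.

```python
# (alcohol, smoking, cases, controls); Row 1 is the referent.
rows = [
    ("No", "No", 32, 180),
    ("Yes", "No", 68, 108),
    ("No", "Yes", 48, 40),
    ("Yes", "Yes", 252, 72),
]

ref_cases, ref_controls = rows[0][2], rows[0][3]  # cells c and d

table = []
for alcohol, smoking, cases, controls in rows:
    # a and b come from the row itself, c and d from the referent row.
    row_or = (cases * ref_controls) / (controls * ref_cases)
    excess = row_or - 1  # the referent row works out to 0 (left blank on the slide)
    table.append((alcohol, smoking, cases, controls, row_or, excess))

print("Alcohol Smoking  Ca   Co    OR  Excess")
for alcohol, smoking, cases, controls, row_or, excess in table:
    print(f"{alcohol:<7} {smoking:<7} {cases:>3}  {controls:>3}  {row_or:>5.1f}  {excess:>5.1f}")
```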
Now that our Excess Risk table is complete, let’s move on to step 5

Step 5: Make Interaction Determination
Additive Excess Risk = 8.3
Combined Excess Risk = 18.7
Excess Risk Ratio: 2.3

13
Recall, interaction is when variables act together to produce a much greater (synergistic) or much weaker (antagonistic) measure of effect than the mere sum of the two variables. Therefore, we want to compare the observed excess risk of the two individual Es summed (rows 2 and 3) to the observed excess risk of the two Es together (row 4)
Additive Excess Risk = Excess Risk of Row 2 + Excess Risk of Row 3 = 2.5 + 5.8 = 8.3
Combined Excess Risk = Excess Risk of Row 4 = 18.7
Next, compute how much greater (or lesser) the excess risk of the combined exposures (row 4) is compared to the summed (additive) excess risk of the individual exposures (Row 2 + Row 3). We do this by computing an excess risk ratio as such:
Excess Risk Ratio = Combined Excess Risk / Additive Risk = 18.7/8.3 = 2.3
Interpretation: Both variables acting together produce roughly 2.3 times greater excess risk than the variables produce additively.
Here’s the rule: If the combined excess risk (row 4) is more than twice the additive excess risk (row 2 + row 3), then we have interaction.
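Step 5 boils down to one ratio and one comparison. Here is a minimal sketch using the rounded ORs from the excess risk table; the function name `excess_risk_ratio` is made up for illustration, and the cutoff of 2 is simply the rule stated in this lesson.

```python
def excess_risk_ratio(or_e_only, or_estar_only, or_both):
    """Combined excess risk divided by additive excess risk (Excess Risk = OR - 1)."""
    additive = (or_e_only - 1) + (or_estar_only - 1)  # Row 2 + Row 3
    combined = or_both - 1                            # Row 4
    return combined / additive


ratio = excess_risk_ratio(or_e_only=3.5, or_estar_only=6.8, or_both=19.7)
print(f"Excess Risk Ratio = {ratio:.1f}")

# Lesson's rule of thumb: more than twice the additive excess risk means interaction.
interaction_detected = ratio > 2
print("Interaction detected" if interaction_detected else "No interaction detected")
```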

Now what?

So are we finished? Well, it depends… Because we detected interaction, and we never want to control for such a phenomenon, we are finished. If, however, we hadn’t detected interaction, then we would need to go back and control for confounding.
The reason we do this is because our stratified E* ORs differed from our crude E OR and our general rule #1 was met. Because it’s a judgement call on how different the E* ORs are from one another, we sometimes get it right by “eyeballing” and sometimes don’t. But because the math is so much fun to do, we really don’t care. =)
14

Follow-Up Studies

Aka Cohort Studies

1
In this lecture, we begin learning about analytical epi designs. Recall these are TRUE epi designs in that they are observational rather than experimental…

Lesson Overview
Overview of follow-up studies
Advantages
Disadvantages
Types of follow-up studies
Prospective
Retrospective
Exposure status
Analysis

2

F-up Overview
Also called:
Longitudinal Study
Span long periods of time
Cohort Study
Follows a defined group of people over time

Follow-up studies are also sometimes called “longitudinal” studies in that they are conducted over a long period of time. They are also commonly called “cohort” studies because they follow a defined group of people (a “cohort”) over time.
3

Advantages of F-up Studies
Direct measurement of incidence in E+ and E-
Can study multiple outcomes
Different diseases developed
Different causes of death
Knowledge of exposure precedes disease
Good for rare exposures

4
Follow-up studies are the “gold standard” when exploring the association between a risk factor (or in prevention studies, a health-enhancing factor), and health status outcomes or health events. Like the name implies, this type of study “follows up” on people, based on their exposure status, as they progress through time. Because of this, f-up studies can:
assess multiple outcomes (effects) of a single exposure
assess the temporal relationship between exposure and disease
If prospective, minimize bias in the ascertainment of exposure status
Allow direct measurement of incidence of disease in the exposed and non exposed population
F-up studies are good for determining the health outcomes of rare exposures because the “starting point” is the exposure status.

Starts with Exposure

As mentioned on the previous slide, follow-up studies select study subjects based on exposure status. At time zero, the two groups – the exposed and the unexposed – are all disease free. Each group is then followed through time to determine the occurrence of the disease (incidence) or death (mortality) from it.
F-up studies allow us to compare the incidence of the disease in an exposed population to the incidence of disease in an unexposed population. As such, f-up studies estimate the magnitude of an association between exposure and disease and indicate the likelihood of developing the disease in the exposed group relative to the unexposed group. A classic example is cigarette smokers: cigarette smokers are 9-10 times more likely to develop lung cancer than non smokers.
(CCD, recall, is common closing date – or when the study ends).
5

Follow-Up Ends for a person when
She/he gets the disease of interest (or, in mortality studies, dies from it)
She/he dies (from any cause)
She/he becomes “lost to follow-up”
the common closing date (CCD) is reached

6
In a f/up study, the follow up on an individual subject ends when she/he:
1) gets the disease of interest (or, in mortality studies, dies from it)
2) dies (from any cause)
3) becomes “lost to follow-up” (so that information on D+ or D- is no longer available)
Does this look familiar? =)
Otherwise, follow-up for the study ends on the common closing date (CCD) designated by the investigator.
Cases that occur after the CCD do not count as D+. THIS IS IMPORTANT TO REMEMBER!!
Commit the information on this slide to memory!!!
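One way to internalize these rules is to write them as a tiny decision function. The sketch below is only an illustration: the function names, the use of calendar dates, and the example dates are all assumptions, not part of the lesson.

```python
from datetime import date


def end_of_follow_up(ccd, disease=None, death=None, lost=None):
    """Follow-up ends at the earliest of: disease, death, loss to follow-up, or the CCD."""
    events = [d for d in (disease, death, lost) if d is not None]
    return min(events + [ccd])


def counts_as_case(ccd, disease=None, death=None, lost=None):
    """A subject is D+ only if the disease itself is what ended follow-up."""
    return disease is not None and disease == end_of_follow_up(ccd, disease, death, lost)


ccd = date(2030, 12, 31)  # hypothetical common closing date

# Diagnosed before the CCD: follow-up ends at diagnosis and the subject is D+.
print(counts_as_case(ccd, disease=date(2025, 6, 1)))

# Diagnosed after the CCD: follow-up ended at the CCD, so this is not counted as D+.
print(counts_as_case(ccd, disease=date(2031, 3, 1)))
```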

Disadvantages of F-up Studies
Expensive
Long study period
Not good for rare disease
Limited number of exposures
Losses to follow up

7
Difficulties or problems with all f/up studies:
1. For a rare disease, in order to observe any incident (D) cases, you must observe a whole lot of people and/or follow them for a great length of time.
2. Choice of the so-called E- group is not always straightforward (this is an example of selection bias and is a factor that affects validity). E- groups can be:
a) external comparison group — people with characteristics that are different (geographic, racial, etc.) than those of the E+ group. This, potentially, reduces the comparability between the two groups with respect to other factors (known as confounding).
b) internal comparison group — people from within the same group who we believe — or who claim to be — unexposed. The problem here is that there is no way to know whether or not they were, perhaps, exposed at a lower level.
3. Loss to follow up – sometimes more than 30 to 40% (increases with length of study period); this attrition introduces bias and can weaken the validity of the study.
4. Nonparticipation – introduces concerns regarding ability to generalize findings

Follow-Up Studies Can Be…
Prospective (concurrent)
Retrospective (nonconcurrent)

8
F/up studies can be:
1) Prospective (aka “concurrent”): follow-up period begins today and extends into the future; or,
2) Retrospective (aka “nonconcurrent”): follow-up period begins at some point in the past and ends in the past.

Prospective F/Up Study
Epidemiologist starts at time zero
Groups based on exposure status
Everyone is D-
Everyone is followed through time
CCD
Disease status assessed

9
In a prospective f-up study, the epidemiologist starts at time zero, which is when the exposure occurs, and groups people based on their exposure status. For example, let’s say there was a chemical spill in a factory and all of the employees who worked the first shift were exposed, but those who worked the third shift were not. Thus, the first shift workers would be the E+ group and the third shift workers would be the E- group. Both groups are then followed through time, and at a specified point in time (the CCD), everyone is assessed for D status.
The disadvantages of prospective f/up studies are that they typically require the investigator to wait a long time for the disease to occur, if the induction period for the disease is long, and “time is money.” A good cohort study will cost you millions of $$ and often take 10 – 15 years to produce meaningful results. Prospective f-up studies can use person-years (p-years), though, which permits a reduction in the number of study subjects while increasing the number of years of exposure.

Retrospective F/Up Study
Epidemiologist starts at CCD (or present)
Groups still based on exposure status
Looking back to time zero to determine E status
Some are D+, some are D-

10
A retrospective follow-up study still selects subjects based on exposure status, but the difference between it and a prospective f/up study is where the epidemiologist is in the study. Everything that is going to happen has already happened. For example, 10 years ago there was a chemical spill in a factory. The first shift workers were exposed, but the third shift workers were not. I decide to contact all of the first shift workers (E+) and third shift workers (E-) to determine D status.
Retrospective F-ups are the second most common study design. They don’t take as long as prospective cohort studies and can be done quickly. This is an especially good design for diseases with long induction periods and is especially useful in occupational studies where there are good records on exposure status. However, they also have some pretty significant methodological issues that affect validity. Retrospective studies require “historical” measurement (or inferences) about exposure at time zero: Was each individual E+ or E-? Remember, for an exposure to cause a disease, the exposure must occur first. If E+, what was the “dose” of E? (This is an example of information bias and is a factor that affects validity.)
Another major disadvantage of retrospective f/up studies is that they are prone to confounding (which we’ll learn about in a future lesson) for several reasons. First, there is the prospect that what you think caused the death/illness of a cohort is in fact only a marker for some other cause. Second, there is a tendency toward observational bias because the researcher is often working with medical records containing historical information, with little likelihood of confirmation; the decision to exclude or include a medical record is based on observer interpretation, and this may vary from one observer to another. Additionally, provider data may be highly subjective (strong emotions are often involved in recalling the circumstances of illness or death).

                                 Prospective    Retrospective
Costs (time and $)               --             +-
Control of bias & confounding    +++            -
Attrition/Loss to follow up      -              +
Clear E/D temporal sequence      +              -
Rare diseases                    -              -+
Rare exposures                   ++             +-
Multiple D outcomes              +              ++
Allows estimation of AR          +              -

Prospective vs Retrospective F-up Studies

Here’s a handy summary table to help you compare and contrast the pros and cons of prospective and retrospective f-up studies.

11

Determining Exposure Status
Preexisting records
Medical Records
Study Subjects
Surrogate for study subject

12
Another central issue in a follow-up study concerns the basis on which a given individual should be considered exposed.
The problem is that there is often no way to know whether people classified as unexposed were, perhaps, exposed at a lower level. For example, let’s say we want to look at the relationship between smoking and cancer. E+ are those who smoke and E- are those who don’t. Seems fairly clear, right? Well, what about those E- who smoked in the past but are now non-smokers? What about those E- who are exposed regularly to second-hand smoke?
When determining exposure status, epidemiologists can use medical records and/or other preexisting records. Advantages include low cost and availability; disadvantages include potential for bias, no or limited information on confounders, and lack of details. Other sources for determining exposure status include interviewing or surveying study subjects (or their surrogates).

Analysis: F/Up Studies
Incidence rate of disease among the exposed compared to incidence rate of disease among unexposed

13
F-up studies are analyzed by using RR, which you already know how to do: The incidence of D in the exposed group is divided by the incidence of D in the unexposed (i.e., if smokers develop 80 lung cancers per 1000, and nonsmokers only 5 per 1000, then the RR = 80 divided by 5 = 16; smokers were 16 times more likely to develop lung cancer).

Example:
To determine the relationship between oral contraceptive (OC) use and myocardial infarction (MI), a total of 3276 pre-menopausal female nurses were followed for a fixed period of time. Among those who did not use OC (n=2949), 133 had an MI by the CCD as compared to 23 of the 327 who used OC. What is the relative risk of having an MI with OC use?

Let’s walk through an example….
14

1. Set up 2×2
         D+                D-                     total
E+       23                327 - 23 = 304         327
E-       133               2949 - 133 = 2816      2949
total    23 + 133 = 156    304 + 2816 = 3120      3276

15
Step 1: set up your data in the 2×2 table form (epidemiological matrix).

2. Calculate RR
RR = incidence in E+/incidence in E-
Where IE+ = a/(a+b) and
Where IE- = c/(c+d)
RR = [23/(23+304)] / [133/(133+2816)]
= (23/327) / (133/2949)
=.07/.045
= 1.56

16
RR = Incidence in the exposed divided by incidence in the unexposed, where IE+ = a/(a + b) and IE- = c/(c + d)
RR = [23/(23+304)] / [133/(133+2816)]
= (23/327) / (133/2949)
=.07/.045
= 1.56
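The same RR calculation, written as a short Python sketch. The function name `relative_risk` is an illustrative choice; the cell values come from the 2×2 table in Step 1.

```python
def relative_risk(a, b, c, d):
    """RR = incidence in exposed / incidence in unexposed.

    a = exposed D+, b = exposed D-, c = unexposed D+, d = unexposed D-.
    """
    incidence_exposed = a / (a + b)
    incidence_unexposed = c / (c + d)
    return incidence_exposed / incidence_unexposed


# OC users: 23 MI out of 327; non-users: 133 MI out of 2949.
rr = relative_risk(a=23, b=304, c=133, d=2816)
print(f"RR = {rr:.2f}")  # prints "RR = 1.56"
```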

Step 3: Interpret Results
The risk of myocardial infarction among OC users is 1.56 times the risk among those who do not use OCs (or 56% higher).

17
This part should be review! =)

Confidence Intervals

1

Overview
Determining difference
P-values
Confidence intervals
Population and Samples
Calculating CI

In this lesson we are going to talk about determining significance from a statistical perspective. Please keep in mind that the statistical significance of a study and the significance of a study are not the same thing, and the presence or absence of one does not ensure or preclude the other.
2

In analytical epi studies, we always need to answer two questions…
1. What is the point estimate?
2. Is this point estimate significant?

3
In any analytical epi study we want to know two things: the point estimate, and whether or not this point estimate represents a significant difference between our groups on the outcome of interest. So far this semester, we’ve done a bunch of calculations to determine the point estimates. Now it’s time to determine differences. Fun times ahead!
Groups: how people were selected, either by exposure status (f-up) or disease status (ca-co).
Outcomes: what we’re looking to see is different, either disease status (f-up) or exposure status (ca-co)

Determining Difference
1. Use the p value
2. Use confidence intervals

4
When determining if our groups are different, we have two basic options: p-value and confidence intervals. Let’s take a look at each.

p-value
Based on hypothesis testing
Null hypothesis = no difference
Calculated probability of results this extreme if the null is true
Size matters!

But before I yammer on about p-values, let’s first talk about null hypotheses. Remember those from stats class? In an analytical epi study, a null hypothesis basically states that there is no difference between the groups you’re studying on the outcome of interest. So, for example, in a f-up study, the null would say that incidence of a disease would not differ by exposure status.
When testing the null, we calculate the probability of seeing a difference at least as large as the one we observed if the null hypothesis were true. This calculated probability is known as the p-value. When our p-value is high, our data are quite consistent with the null, meaning we have little evidence that our two groups differ on the outcome of interest. When our p-value is very low, data like ours would rarely occur if the null were true, meaning we have strong evidence that our two groups do differ on the outcome of interest.
A decision to reject a null hypothesis is based on how confident we want to be that what we’ve observed is “the truth.” The “level of confidence” (1 minus alpha) is typically set at 95%, but there’s no magic to this number. The confidence level can be set higher or lower depending on what it is you’re investigating.
With a 95% level of confidence, we would reject our null if our p-value was less than 0.05. Essentially, our confidence level tells people how willing we are to live with being wrong about rejecting our null. With a 95% level of confidence, we are willing to be wrong 5% of the time, or 1 in 20 times.
When we reject the null, we are technically stating that our two groups are not the same on the outcome of interest to the level of confidence we have used. What we typically say, though, is that the groups are different, or that we have a statistically significant finding of difference.
5

The problem with p-values
Point Estimate    Study Size    p-value        Statistical Significance?
Far from null     Large         Low            Yes
Close to null     Large         High or low    Yes or no
Far from null     Small         High or low    Yes or no
Close to null     Small         High           No

6
The problem with p-values is that we are not able to interpret whether the presence or absence of statistical significance, as reflected in the p-value, is mostly a function of the effect size (distance from the null) or the study size, as illustrated by the chart on this slide.
So, for RR or OR that is well above or well below the null value of 1, you may not have a p-value that indicates statistical significance if there are few people in your study (row 3). Therefore, we may overlook a truly significant finding just because we were limited by sample size.
Conversely, you could have a RR that is close to 1, meaning the groups are not really different, but because there are a lot of people in the sample (row 2), the p value could be low enough to indicate statistical significance, even though the groups don’t really differ.
Rows 1 and 4 don’t really concern us, because they likely represent the “truth.”
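To see how study size alone can move the p-value, the sketch below uses two made-up case-control tables with exactly the same OR, one 50 times larger than the other. Everything here is an assumption for illustration: the numbers are hypothetical, and the Pearson chi-square test (1 degree of freedom, no continuity correction) is just one common choice, not necessarily the test used in this course.

```python
import math


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


def chi_square_p_value(a, b, c, d):
    """Two-sided p-value from the Pearson chi-square test for a 2x2 table."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


small = (6, 4, 4, 6)                   # hypothetical study, N = 20
large = tuple(50 * x for x in small)   # same proportions, N = 1000

for label, cells in (("small", small), ("large", large)):
    print(f"{label}: OR = {odds_ratio(*cells):.2f}, p = {chi_square_p_value(*cells):.3g}")
```

Both tables give the same point estimate, yet only the larger one falls below 0.05, which is the ambiguity the chart on the slide describes.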

Confidence Interval
Range of values around point estimate

In analytical epi studies, we often use a confidence interval (CI) to determine differences. The CI is a calculated range of values around the point estimate that, at our chosen level of confidence, is expected to include the true effect value of the population. So, what exactly does this mean? Well, to explain, I need to first talk a bit about populations and samples.
7

Population & Sample

A population is all members of a defined group that we are studying. For example, in a f-up study, we are interested in everyone who was exposed to a certain agent, or in a ca-co study, everyone who had a specific disease. Because it is usually impossible (or extremely cost-prohibitive) to include every member of a population in an analytical study, we select a subset of the population to study. This subset is known as a sample. We then use this sample to make inferences about the population.
8

What CI tells us
Statistical significance
Magnitude

9
You can easily tell whether or not statistical significance has been reached using CI, just as you can in a hypothesis test. How? If the confidence interval does not enclose the null value, this represents a difference that is statistically significant. But, if the confidence interval includes the null value, the point estimate is statistically non-significant.
The advantage CI has over p-values, though, is that a confidence interval tells us how precise our point estimate is likely to be. Because samples do not perfectly reflect the “truth” about the entire population, they will be off by a little bit, and possibly by a great deal. The width between the upper and lower bounds of the CI gives us information on the magnitude (how big or small) the true effect might plausibly be given our selected level of confidence (typically 95%, as with p-values).
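Reading a CI comes down to two checks: does it include the null value, and how wide is it? A minimal sketch, assuming a ratio measure (RR or OR) with a null value of 1; the function name and the two example intervals are hypothetical.

```python
def summarize_ci(lower, upper, null_value=1.0):
    """Return the CI width and whether the interval excludes the null value."""
    includes_null = lower <= null_value <= upper
    return {"width": upper - lower, "significant": not includes_null}


narrow = summarize_ci(1.8, 2.4)  # hypothetical CI that excludes 1
wide = summarize_ci(0.7, 6.2)    # hypothetical CI that includes 1

print(narrow)
print(wide)
```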

Width of CI

Each line on the slide represents sample data from the same population. The blue dot represents the point estimate, and the black line represents where the population truth may be. The distance between the left end of the line (lower limit) and right end of the line (upper limit) represents the width of the CI. When the distance between these two points is short, we say the CI is “narrow.” A narrow CI is more precise, and thus pins down the population truth more tightly, than a CI that is longer (called “wide”).
So, you may be wondering how two different samples from the same population could yield such different CIs. Two factors affect the CI: variability and sample size. Variability is how different or similar people in the population are to one another. When members of your population are similar, there is low variability which means the samples that you select will more closely resemble one another than when population members are very different from one another. Sample size is how many people are selected. When you have low variability, sample size is less of an issue than it is when there is great variability.
10

Formula for determining
Confidence Intervals
CI = RR^[1 ± (z/χ)]
Where
RR = point estimate
z = z score for the chosen level of confidence
χ (chi) = {Cell A – [(E+ x D+)/N]} ÷ √{(E+ x E- x D+ x D-) ÷ [N^2 x (N-1)]}

11
The derivation of the formula for confidence limits of the point estimate (upper and lower bounds) is beyond the scope of this class. Through the power of trust, however, I give to you the formula itself so that you, too, can calculate confidence intervals with the pros! This formula, for those of you who are budding epi purists, is known as the “test-based” formula.
 
RR^[1 ± (z/χ)]
 
Where RR is the point estimate
z corresponds to the level of confidence (researcher’s choice, often 95%)
χ (chi) is a very big formula that I am going to try to describe in words to supplement and explain the formula presented on the slide. It is calculated using numbers from the 2 x 2 table.
Chi Numerator: the numerator is the observed cell a value minus the product (E+ row total multiplied by D+ column total) divided by N.
Chi Denominator: the denominator is the square root of all of the following: the product of (E+ row total multiplied by E- row total multiplied by D+ column total multiplied by D- column total) divided by [N squared multiplied by (N minus 1)]

Note: Chi is based on the data layout in the 2 x 2 table  
Note: There is a different formula for calculating confidence limits around the risk difference. I’m not teaching that here because it is rarely used. If you have a burning desire to learn it though, I will be happy to provide you with the information. Just email me.
Yet another note: Recall from statistics that the z score is a measure of the distance in standard deviations of a sample from the mean. Each level of confidence has its own z score. For the purposes of this class, always assume a 95% level of confidence, where z = 1.96.
Z Scores for Commonly Used Confidence Intervals
Desired Confidence Interval Z Score
90% 1.645
95% 1.96
99% 2.576
Note: I’m using RR as an example. It’s the same with OR.
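If you ever need a z score that is not in the table, Python's standard library can produce it. This is a sketch assuming a two-sided interval; `z_for_confidence` is an illustrative name, while `statistics.NormalDist` is part of the standard library (Python 3.8+).

```python
from statistics import NormalDist


def z_for_confidence(level):
    """Two-sided z score for a confidence level such as 0.95."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_for_confidence(level):.3f}")
```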

Determining Width of CI
Distance between:
lower confidence limit: RR^[1 – (z/χ)]
upper confidence limit: RR^[1 + (z/χ)]

12
The “width” of the confidence interval is the distance between the lower confidence limit and the upper confidence limit.

Example
We want to determine how much more common occupational benzene exposure is among people with leukemia. We conducted a case-control study in which 85 of the 125 workers with leukemia were exposed to benzene on the job. Conversely, among the 125 controls, only 40 had been exposed.

Let’s use a 95% level of confidence…
13

Calculating CI
Step 1: Set up data table

Ca Co total
E+ 85 (a) 40 (b) 125 (E+)
E- 40 (c) 85 (d) 125 (E-)
total 125 (D+) 125 (D-) 250(N)

14

Step 2: Calculate Point Estimate
OR = ad/bc
= (85×85)/(40×40)
= 7225/1600
= 4.5
The odds of benzene exposure are 4.5 times as high among workers with leukemia as among controls.

15
In this case, the point estimate is the Odds Ratio
OR = ad/bc
= (85 x 85) ÷ (40 x 40)
= 7225 ÷ 1600
= 4.5

Step 3: Calculate Chi

χ (chi) = {Cell A – [(E+ x D+) ÷ N]} ÷ √{(E+ x E- x D+ x D-) ÷ [N² x (N – 1)]}

16
Let’s start by calculating the numerator for Chi: Cell A minus [(E+ total multiplied by D+ total) divided by N]
= 85 – [(125 x 125) ÷ 250]
= 85 – (15,625 ÷ 250)
= 85 – 62.5
= 22.5
Now, let’s calculate the denominator: the square root of all of the following: (E+ total multiplied by E- total multiplied by D+ total multiplied by D- total) divided by [N² multiplied by (N minus 1)]
= √{(125 x 125 x 125 x 125) ÷ [250² x (250 – 1)]}
= √[244140625 ÷ (62500 x 249)]
= √(244140625 ÷ 15,562,500)
= √15.69
=3.96
Now, let’s calculate Chi from our numerator and denominator = 22.5 ÷ 3.96 = 5.68
Note: There are very simple to use square root and exponent calculators online. One that I use is calculator.net.
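If you would rather check the chi arithmetic with a few lines of code than with an online calculator, here is a minimal Python sketch of the same steps. The variable names are my own choices for this illustration; the numbers are the benzene example's 2 x 2 table.

```python
import math

# Benzene example: a, b = exposed cases/controls; c, d = unexposed cases/controls
a, b, c, d = 85, 40, 40, 85

e_pos, e_neg = a + b, c + d   # row totals: E+ = 125, E- = 125
d_pos, d_neg = a + c, b + d   # column totals: D+ = 125, D- = 125
n = a + b + c + d             # N = 250

# Numerator: observed cell a minus [(E+ total x D+ total) / N]
numerator = a - (e_pos * d_pos) / n

# Denominator: square root of (E+ x E- x D+ x D-) / [N^2 x (N - 1)]
denominator = math.sqrt((e_pos * e_neg * d_pos * d_neg) / (n**2 * (n - 1)))

chi = numerator / denominator
print(numerator, round(denominator, 2), round(chi, 2))  # 22.5 3.96 5.68
```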

Step 4: Calculate Width
Lower Limit = OR^[1 – (z/χ)]
= 4.5^[1 – (1.96/5.68)]
= 4.5^0.655
= 2.68
Upper Limit = OR^[1 + (z/χ)]
= 4.5^[1 + (1.96/5.68)]
= 4.5^1.345
= 7.56

Next we will calculate the width of our CI by first calculating our lower and upper limits.
Lower Limit = OR^[1 – (z/χ)]
= 4.5^[1 – (1.96/5.68)]
= 4.5^0.655
= 2.68
Upper Limit = OR^[1 + (z/χ)]
= 4.5^[1 + (1.96/5.68)]
= 4.5^1.345
= 7.56
The width of the confidence interval is the difference between the upper limit (UL) and lower limit (LL), which we express with this equation:
CI width = UL – LL
= 7.56 – 2.68
= 4.88
17

Step 5: State and Interpret Findings

OR = 4.5 (2.68, 7.56)
CI width = 4.88

18
Because 1 is not included between our lower and upper bounds, we can say that there is a statistically significant difference in the odds of exposure between our workers with leukemia and our controls, and that we are 95% confident that the population measure lies between 2.68 and 7.56.
Because there is great variability between epidemiologists on what constitutes a narrow or wide interval, we shall leave it at that for now.
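The interpretation step can also be written as a tiny check. This is only a sketch; the helper names `excludes_null` and `ci_width` are invented here, and the limits are the ones calculated for the benzene example.

```python
def excludes_null(lower, upper, null_value=1.0):
    """True when the null value (1 for ratio measures) lies outside the interval."""
    return not (lower <= null_value <= upper)


def ci_width(lower, upper):
    """Distance between the upper and lower confidence limits."""
    return upper - lower


lower, upper = 2.68, 7.56
print(excludes_null(lower, upper))        # True
print(round(ci_width(lower, upper), 2))   # 4.88
```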
NOTE: While you’ll have to do calculations for your exercises and your quiz, you won’t have to do the math or memorize this formula on your exam. You’re welcome. =)
 

Confounding

1
NOTE: For those of you who had difficulty with standardization, it is IMPERATIVE that you go back and review and get the concepts solidified. You will need it for the following sections.

OVERVIEW
Explanation of confounding
Assessing confounding
Conditions for confounding
Controlling for confounding
Adjusting for confounders

2

Overview of Confounding
Recall: Purpose of epi studies is to determine the “truth” about the relationship between E and D.
Most Ds have many Es.
Sometimes Es are related to each other

We start this lesson by recalling that the whole purpose of epi studies is to determine the “truth” about the relationship between E and D. So far, we’ve looked at determining point estimates of one exposure’s relationship to the disease. Recall, too, though, that most diseases have more than one exposure (think back to webs of causation). Sometimes these different exposures are also related to one another, something known as confounding.
3

Confounding is…
A distortion in the point estimate due to the presence of a second exposure.
This “other E” is responsible for, or somehow related to, D.
As such, the magnitude of the association between E and D is unclear

4
Confounding is a distortion in the point estimate (measure of association, RR, OR, etc.) due to the presence of a second exposure. When something other than the E of interest is responsible for, or somehow related to D, the magnitude of the association between E and D becomes unclear. That’s because the presence of this “third” variable – the “other E” — gets in the way… In more scientific terms, we say the results are confounded by this other exposure.

Illustrating a point…
A f-up study reveals that people who drink alcohol are 5 times as likely to have lung cancer as non-drinkers (RR=5.1)
Should we thus conclude that there is a strong relationship between drinking alcohol and lung cancer ?

5
Before we move on to the concepts, let’s think about the issue of determining the “truth” between E and D.
I’ll use an example to illustrate:
Let’s say we did a follow-up study of alcohol consumption and lung cancer. We find that people who drink alcohol are 5 times as likely to have lung cancer as non-drinkers (RR=5.1). Should we thus conclude that there is a strong relationship between drinking alcohol and lung cancer and recommend that people should refrain from drinking?
 

Not yet.
Even if we’ve addressed issues of validity (bias) and precision (chance), we still need to rule out confounding before we can say it’s the “truth.”

6
There are four possible explanations for RR results:
1) bias (systematic error)
2) chance (random error)
3) confounding
4) “the truth”
Assuming we’ve addressed precision and validity issues, let’s now look at confounding before we assume this RR is “the truth”.
 

Assessing Confounding
Could the results I found be explained by the presence of another E that is different than the E of interest?

7
When we look at confounding, we ask the question “Could the results I found be explained by the presence of some other E than the primary E of interest?”
 
To answer this question, we must ask ourselves two things:
1)      what are the known and highly suspected risk factors for the D of interest?
2)      Do our groups differ much in their histories of exposure to these other risk factors? That is, do some of the people in the primary E of interest group also have this other suspected E (written as E*) and some do not? If so, do these exposures account for some (or all) of the effect (RR)?
 
If we know #1, we can measure history of exposure to these factors so that we can answer #2.

Class Example:
Lung Cancer and Alcohol
1. What are the other known or highly suspected risk factors for lung cancer?
2. Do our groups differ in their histories of exposure to these other risk factors?

8
Back to the example…
Question 1: What are the other known risk factors for lung cancer? Anyone? =) I think I heard someone say “smoking,” which is a great answer! We are fully aware that cigarette smoking is a known risk factor for lung cancer.
Question 2: Do our alcohol groups (E+ and E-) differ in their history of exposure to smoking?
Well, we can measure history of smoking among our alcohol+ and alcohol- groups in order to assess potential confounding…. Let’s say we do that and find that many or most of our alcohol+ subjects (the drinkers) are also in the smoking+ group (smokers), while very few of our alcohol- subjects (the non-drinkers) are smokers.

Uh – oh
After adjusting:
RR = 1.2

9
So, after adjusting for the presence of our second exposure, E* (smoking), we find that RR=1.2, or that there really isn’t any significant difference in lung cancer between people who drink and those who don’t (remember, null = 1).
Thus, the crude (unadjusted) RR=5.1, which suggested a very strong relationship between alcohol consumption and lung cancer, was quite misleading in that the alcohol itself was not responsible for the results observed. MOST of the effect was due to the fact that many/most of the drinkers smoked and very few of the non-drinkers did. Therefore, it was the smoking, not the drinking, that was responsible for most of the lung cancer incidence among drinkers.
 
As the adjusted RR is quite different from the crude RR, we can conclude quite confidently that smoking is a confounder in the relationship between alcohol and lung cancer.

Still with me?

So let’s go back to the original alcohol-lung cancer scenario and pretend that when I asked the question “What are the other known or highly suspected risk factors for lung cancer?,” one of you answered “popcorn” instead of smoking.
We all know which one of you gave that answer.

10

WHAT?!?!?!

Well, sure enough, the drinkers and non-drinkers differed on their popcorn consumption, so we adjusted for popcorn consumption. Lo and behold, after adjusting for popcorn consumption, our RR fell to 1.3.
Does this mean we conclude that popcorn consumption is a confounder in the relationship between alcohol use and lung cancer….. which then means popcorn consumption is a risk factor for lung cancer???
11

Is eating popcorn really associated with lung cancer?
Well, the crude RR and adjusted RR are very different…..

12
At first glance, you may say yes because of the significant difference in the crude and adjusted RRs. However, in order for an exposure to qualify as a confounder, two conditions must be met. If you think back to the two questions for assessing confounding, you should know what they are. But I will tell them to you again, in a simpler form. And when I do, I would highly recommend memorizing them as I predict that you will be asked about them later. Perhaps during a quiz or a test. Yes, this is a hint.

Two Conditions for Confounding

In order for there to be confounding, these two conditions must exist:
1.   E* must be associated with D independent of its association with E
2.  E* must be associated with E
    Note – the confounding E is written as E*
13

First Example: Alcohol & Cigarettes

This slide shows how in the first example, smoking (our suspected confounding exposure) meets both of the two criteria for confounding:
there is clear evidence that smoking and lung cancer are associated independent of alcohol use AND
most of our drinkers smoked, therefore E* is associated with E
Therefore, we would want to adjust for smoking to determine if it affects the relationship between drinking and lung cancer.
14

Second Example: Alcohol & Popcorn

In our second example, only the second condition has been met.
2. Popcorn is associated with drinking.
However, the first condition is not met insomuch as there is no independent association between eating popcorn and getting lung cancer and no biological plausibility (to date!) that such a relationship exists.
Thus, popcorn cannot be a confounder. The RR that we got when adjusting for popcorn is a fallacy: really, the smokers were just the ones who also ate a bunch of popcorn while drinking.
NOTE: Even though I adjusted for popcorn consumption for this example, in real life we would not do so because both conditions were not met.
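The two conditions work like a simple AND test. Here is a toy Python sketch of that logic; the function name and the True/False inputs are my own shorthand for the smoking and popcorn examples, not something you calculate from data here.

```python
def qualifies_as_confounder(associated_with_disease, associated_with_exposure):
    """Both conditions must hold: E* relates to D (independent of E) AND E* relates to E."""
    return associated_with_disease and associated_with_exposure


# Smoking: independently associated with lung cancer, and associated with drinking
smoking = qualifies_as_confounder(True, True)

# Popcorn: associated with drinking, but no independent association with lung cancer
popcorn = qualifies_as_confounder(False, True)

print(smoking, popcorn)  # True False
```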
15

Controlling Confounding
Good Planning is Necessary!
Design Stage
Subject characteristic restriction
Matching
Analysis Stage
Stratified analysis
Modeling/multivariate analysis

16
Confounding is much more of a problem in observational studies than in experimental studies. Because we do not manipulate exposure in observational studies, it is usually NOT true that our two groups (E+ and E- or D+ and D-) are alike with respect to other known and suspected risk factors of diseases. Therefore, we must be cautious and vigilant about the potential of confounding when we study any E-D relationship. We must always question whether a non-null point estimate is due to differences in an exposure other than the E of interest.
We must always be prepared to control for potential confounders. To control for a confounder you must:
1.      know WHAT to measure
2.      measure it in your study subjects
To limit the effects of confounding, good planning is necessary. In the design stage, we can do two things to avoid confounding
1.   Subject characteristic restriction: you can choose to study only those subjects free of the potential confounder. For instance, in our alcohol-lung cancer study, you could exclude all smokers from the sample. You would have to then locate non-smoking drinkers and non-drinkers for your study. The drawbacks are that you cut down on the pool of potential subjects and it may be more difficult to generalize results.
2.   Matching: select subjects in your two groups (E+ and E- or cases and controls) that have a similar profile with regard to the potential confounders. Matching can be done one-on-one, where you select an appropriate control for each case you use, or it can be done as “frequency matching,” where you make sure your summary statistics are the same in one group as in the other. Regardless of which method you use, if we are successful in matching, then E* will not qualify as a confounder as E* will be equally as common in each group. What we have done here, in essence, is make the groups “the same” with regard to the potential confounder, thus equally distributing the threat of E*. The drawbacks with matching are that you must search very hard for just the right subjects; this requires an enormous amount of time and effort, which becomes cost prohibitive, because many people have to be interviewed or examined to find the right subjects.
We can also limit confounding in the analysis stage in two primary ways:
1.    Stratified analysis: calculate RR estimates in narrow strata of the confounding factor. For example, we could look at the alcohol-lung cancer relationship in 3 groups: light smokers, moderate smokers, and heavy smokers. A RR can be calculated for each of the groups using 2 x 2 tables. You’ll learn how to do this later!! I can feel your excitement!!!!!
2.    Modeling/multivariate analysis: a mathematical model can quantify the effects of the exposure of interest E and the potential confounder E*. “Pieces” of the RR can then be attributed to each E and E*. You will not learn how to do this in this class. I’m so sorry. I know that must disappoint you greatly. =)
 
Note: In the example used earlier, we magically controlled for smoking and popcorn in the analysis stage. You’re about to learn how that magic happens. =)

Example
A case-control study was done to determine the relationship between stomach cancer and h pylori infection. Among the 130 stomach cancer cases, 62 had h pylori infection as compared to 35 of the 130 controls. Of those who had h pylori infection, 44 stomach cancer cases and 15 controls had a high nitrate diet. Among those without h pylori infection, 26 stomach cancer cases and 10 controls had a high nitrate diet.

17
To determine if there’s confounding, we need to look at the crude and adjusted point estimates, as we did in examples earlier. These next several slides will walk you through the process, using the example on this slide. For clarity, I will bold the part of the example that is being illustrated in each slide.
Before we begin, let’s determine what our exposures of interest are for developing stomach cancer by reading through the case together.
A case-control study was done to determine the relationship between stomach cancer and h pylori infection. Among the 130 stomach cancer cases, 62 had h pylori infection as compared to 35 of the 130 controls. Of those who had h pylori infection, 44 stomach cancer cases and 15 controls had a high nitrate diet. Among those without h pylori infection, 26 stomach cancer cases and 10 controls had a high nitrate diet
Typically, the primary exposure will be mentioned first. Thus, h pylori infection is our primary exposure of interest. The second exposure of interest is a diet high in nitrates. Dietary nitrates are independently associated with stomach cancer, and are also associated with h pylori infections.

Ca (D+) Co (D-) Total
H pylori (E+) 62 35 97
No H pylori (E-) 68 95 163
Total 130 130 260

Step 1: Set up Data Table for E and Calculate Crude OR

18
We first construct our 2X2 data table with the primary E of interest and D and then calculate our crude measure of effect – in this case, OR. You’ve done this all semester, so this should be a piece of cake. However, I’ll review…
Example: A case-control study was done to determine the relationship between stomach cancer and h pylori infection. Among the 130 stomach cancer cases, 62 had h pylori infection as compared to 35 of the 130 controls. Of those who had h pylori infection, 44 stomach cancer cases and 15 controls had a high nitrate diet. Among those without h pylori infection, 26 stomach cancer cases and 10 controls had a high nitrate diet
We plug the known numbers into the table and calculate the rest. To review, let’s take the cases (D+) column total to illustrate. We know that we have a total of 130 cases and we know that 62 of them (our “a cell” of E+D+) had exposure to h pylori infection. To find out how many cases did not have exposure to h pylori infection (our “c cell” of E-D+), we subtract our “a cell” from the total cases: 130 – 62 = 68.
OR = (a x d) / (b x c) = (62 x 95) / (35 x 68) = 5890/2380 = 2.5. This is our crude (unadjusted) OR and what we will use for comparisons.
Interpretation: The odds of h pylori infection were 2.5 times as high among stomach cancer cases as among controls.
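Here is a minimal Python sketch of Step 1, in case it helps to see the table filled in by subtraction before the OR is calculated. The variable names are just illustrative; the counts come straight from the example.

```python
# Known from the example
cases_total, controls_total = 130, 130
a = 62   # cases with h pylori (E+D+)
b = 35   # controls with h pylori (E+D-)

# Fill in the unexposed row by subtraction
c = cases_total - a      # cases without h pylori (E-D+)
d = controls_total - b   # controls without h pylori (E-D-)

crude_or = (a * d) / (b * c)
print(c, d, round(crude_or, 1))  # 68 95 2.5
```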

Ca Co Total
High Nitrate Diet (E*+)
H Pylori 44 15 59
No H Pylori 26 10 36
E*+ Total 70 25 95
Low Nitrate Diet (E*-)
H pylori
No H pylori
E*- Total
Grand Total 130 130 260

Step 2: Stratify E by E*

19
Next, we will stratify this data table on the suspected confounding exposure (E*) – a high nitrate diet. To do this, we are going to create two 2x2s, one for each level of E* :
1. High nitrate diet (E*+); and,
2. Low nitrate diet ( E*- )
If you look at the table, you’ll see it’s just two 2×2 tables stacked on top of each other… I’ve plugged the “known” numbers in from original 2×2, and recalculated totals.
Example: A case-control study was done to determine the relationship between stomach cancer and h pylori infection. Among the 130 cases, 62 had h pylori infection as compared to 35 of the 130 controls. Of those who had h pylori infection, 44 cases and 15 controls had a high nitrate diet. Among those without h pylori infection, 26 cases and 10 controls had a high nitrate diet
Now, here’s where our power of epi magic comes into play. Drum roll please!

Ca Co Total
High Nitrate Diet (E*+)
H Pylori 44 15 59
No H Pylori 26 10 36
E*+ Total 70 25 95
Low Nitrate Diet (E*-)
H pylori
No H pylori
E*- Total 130 – 70 = 60 130 – 25 = 105 60+105 = 165
Grand Total 130 130 260

Still Step 2: Stratify E by E*

20
We’re going to go slowly here… Let’s add in the case and control column totals for low nitrate.
As a math check, let’s make sure that the E*+ total (95) and the E*- total (165) add up to the grand total (260). They do. YAY!

Ca Co Total
High Nitrate Diet (E*+)
H Pylori 44 15 59
No H Pylori 26 10 36
E*+ Total 70 25 95
Low Nitrate Diet (E*-)
H pylori 62- 44 = 18 35-15 =20 18+20 =38
No H pylori 68 – 26 =42 95-10 =85 42+85=127
E*- Total 60 105 165
Grand Total 130 130 260

More Step 2

21
To fill in the next part, we have to refer back to our primary E 2×2 in step 1. There we had the following:
Cell a (E+D+) = 62
Cell b (E+D-) = 35
Cell c (E-D+) = 68
Cell d (E-D-) = 95
This is where thinking and math come into play. Let’s look at the Case (D+) column in the low-nitrate 2×2. From Step 1, we know that there are 62 cases of stomach cancer among those with h pylori infections. Therefore, the “a cell” in the low nitrate diet is what’s left over after subtracting the known number of high-nitrate (E*+) cases of h pylori (44) from the total h pylori cases: 62 – 44. Similarly, there were 68 cases of stomach cancer among those who DID NOT have h pylori. Thus to get the c cell value for the low nitrate diet, we subtract the high nitrate c cell from the total: 68 – 26.
You do the same for the control (D-) cells also, and calculate your totals. Make sure to double check that all of your totals give you the grand total.
See how that works!! Just take your time and think it through. This is the hardest part, I promise. =)
Note: a mediasite lecture is posted that walks you through this process in greater detail, so don’t panic if this doesn’t quite make sense yet.
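The subtraction logic for the low-nitrate table can be sketched in a few lines of Python. The dictionary layout and names are assumptions made for this illustration; the cell values are the ones from Step 1 and the high-nitrate table.

```python
# Cells from the crude 2x2 (Step 1) and from the high nitrate (E*+) stratum
crude = {"a": 62, "b": 35, "c": 68, "d": 95}
high_nitrate = {"a": 44, "b": 15, "c": 26, "d": 10}

# Low nitrate (E*-) cells are whatever is left over after removing the E*+ subjects
low_nitrate = {cell: crude[cell] - high_nitrate[cell] for cell in crude}
print(low_nitrate)  # {'a': 18, 'b': 20, 'c': 42, 'd': 85}

# Math check: the two strata should add back up to the grand total (260)
grand_total = sum(high_nitrate.values()) + sum(low_nitrate.values())
print(grand_total)  # 260
```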

Calculate E* Stratum-Specific ORs
OR E*+ = (44 x 10) / (15 x 26)
= 440/390
= 1.1
OR E*- = (18 x 85) / (20 x 42)
= 1530/840
= 1.8

22
All we’re doing here is calculating the OR as normal for each E* 2×2
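As a quick sketch, the same ad/bc calculation can be applied to each stratum in Python. The function name `odds_ratio` is my own; the cells are the high-nitrate and low-nitrate 2×2 values from Step 2.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio, ad/bc, for one 2x2 table."""
    return (a * d) / (b * c)


or_high_nitrate = odds_ratio(44, 15, 26, 10)  # 440 / 390
or_low_nitrate = odds_ratio(18, 20, 42, 85)   # 1530 / 840

print(round(or_high_nitrate, 2), round(or_low_nitrate, 2))
```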

Compare Crude OR to S-S ORs
Crude E OR = 2.5
Stratified E* ORs = 1.1 (high nitrate), 1.8 (low nitrate)

So, let’s take a moment and look at what we know: The crude OR for h pylori and stomach cancer was 2.5, which suggests the odds of exposure are greater among those who have stomach cancer. Both stratum-specific odds ratios (one for the high nitrate diet, one for the low nitrate diet) are lower than the odds ratio based on the entire sample.
Hmm….Wouldn’t you expect to find the crude odds ratio to be a weighted average of the stratified odds ratios? Think about it….
When the stratified E* ORs are similar to the crude E OR, then the presence of E* makes no difference; thus, E* is not confounding the relationship between E and D. However, when the stratified E* ORs are different than the crude E OR, it means that the presence of E* has an effect on the relationship between E and D.
The next several slides go over three general rules on determining if E* is a confounder in the relationship between E and D.
23

General Rule #1
Suspect confounding if:
1. E*+ OR < Crude OR > E*- OR
or
2. E*+ OR > Crude OR < E*- OR

General rule 1: When both of the s-s E* ORs are either higher than or lower than the crude E OR, then we suspect confounding. (This holds true for RR also.)
24

Our Data
OR high nitrate < Crude OR > OR low nitrate
1.1 < 2.5 > 1.8

So, in our example, both of the E* strata-specific ORs (high nitrate diet, low nitrate diet) are less than the crude OR. So, by virtue of general rule 1, we can say confounding is suspected.
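General rule 1 boils down to asking whether the crude estimate sits outside the range of the stratum-specific estimates. Here is a minimal Python sketch of that check; the function name is invented for this illustration.

```python
def suspect_confounding(crude, stratum_estimates):
    """Rule 1: suspect confounding when every stratum estimate is on the same side of the crude."""
    all_below = all(s < crude for s in stratum_estimates)
    all_above = all(s > crude for s in stratum_estimates)
    return all_below or all_above


# Our data: crude OR = 2.5; stratum-specific ORs = 440/390 and 1530/840
print(suspect_confounding(2.5, [440 / 390, 1530 / 840]))  # True
```

If the crude estimate fell between the two stratum estimates, the function would return False.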
25

General Rule #2
Adjust for confounding only if:
E*+ OR ≈ E*- OR

General rule 2: Control (adjust) for confounding if our two stratum-specific E* ORs are similar to one another. However, if the stratum-specific E* estimates are sufficiently different from one another, they should not be combined, as this would obscure useful information. You’ll learn about this in your next lesson.
If the E* stratum-specific ORs are similar to one another, they can be combined to obtain an un-confounded (adjusted) OR so that we can know the truth about the relationship of E and D without the effect of E*. Another way of saying this is that we “control for” the dietary nitrate level.
In our example, the s-s E* ORs are similar (1.1 and 1.8); thus, conditions for general rule #2 are met.
26

Step 3: Calculate Adjusted E* OR
OR Mantel-Haenszel = Σ(ad/n) / Σ(bc/n)

27
The way we adjust for confounding is by using the Cochran Mantel-Haenszel test.
OR Mantel-Haenszel = Σ(ad/n) / Σ(bc/n)

Scary as it looks, all it’s really doing is multiplying the concordant cells (a,d) of each E* stratum, dividing by that stratum’s n (total), and adding the results together. This becomes the numerator. Then, we do the same thing with the E* discordant cells (b,c) and make that our denominator. Thus, it looks like this:
OR Mantel-Haenszel = [(high nitrate ad/n) + (low nitrate ad/n)] / [(high nitrate bc/n) + (low nitrate bc/n)]
= [(44 x 10)/95 + (18 x 85)/165] ÷ [(15 x 26)/95 + (20 x 42)/165]
= (440/95 + 1530/165) ÷ (390/95 + 840/165)
= (4.6 + 9.3) ÷ (4.1 + 5.1)
= 13.9 ÷ 9.2
= 1.5
(Note: Σ is the symbol that means “sum of”)
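For anyone who wants to double check the hand calculation, here is a minimal Python sketch of the Mantel-Haenszel sum. The function name and the tuple layout (a, b, c, d per stratum) are my own assumptions for this illustration.

```python
def mantel_haenszel_or(strata):
    """Adjusted OR: sum of ad/n over strata divided by sum of bc/n over strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


strata = [
    (44, 15, 26, 10),  # high nitrate diet (n = 95)
    (18, 20, 42, 85),  # low nitrate diet (n = 165)
]
or_mh = mantel_haenszel_or(strata)
print(round(or_mh, 1))  # 1.5
```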

Step 4: Compare Crude OR to Adjusted OR
OR crude = 2.5
OR adjusted = 1.5

28
So…. Is there meaningful confounding? Did adjusting for dietary nitrate level “clean up” the data (remove the effects of E*) so we have a better knowledge of the relationship between E and D?

General Rule # 3
Confounding exists when:
1. crude E OR ≥ 110% adjusted OR
2. Crude E OR ≤ 90% adjusted OR

To answer this, we go to general rule 3, which evaluates the magnitude of confounding by observing the degree of discrepancy observed between the crude and adjusted ORs. Generally, when the crude estimate changes by at least 10%, meaningful confounding exists.
So, we look at the difference between the two estimates to determine the degree of discrepancy. If there’s at least a 10% difference in the two estimates, whether it’s higher or lower, we conclude that there’s meaningful confounding. If it’s less than 10%, the confounding is not meaningful, meaning that the presence of E* does not obscure the E-D truth, and we’ve basically done all of this work for nothing. Haha. Actually, it’s not for nothing. It’s for E-D truth!!

29

Step 5: State Conclusion
Dietary nitrate level is a meaningful confounder in the relationship between h pylori infection and stomach cancer.

So in order to state our conclusion, we need to determine if the adjusted OR is at least 10% different than the crude OR. We can do that in a couple of ways. First, we could calculate the upper and lower cut-off points for meaningful confounding using ± 10% of the crude:
Upper Limit: 2.5 x 1.10 = 2.8
Lower Limit: 2.5 x .90 = 2.3
Then, we’d look to see if the adjusted OR is either equal to or more than the UL OR, or equal to or less than the LL.
Or, you could just do it the simple way through division: OR crude / OR adjusted = 2.5/1.5 = 1.7, a 70% difference between crude and adjusted (null = 1, so it’s whatever is above or below the null).
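The “simple way through division” can be sketched in Python like this. The function name and the `threshold` parameter are invented for this illustration; the 10% cut-off is the one from general rule 3.

```python
def meaningful_confounding(crude, adjusted, threshold=0.10):
    """Rule 3: crude/adjusted at least 10% above or below the null ratio of 1."""
    ratio = crude / adjusted
    return abs(ratio - 1) >= threshold


crude_or, adjusted_or = 2.5, 1.5
print(round(crude_or / adjusted_or, 1))               # 1.7
print(meaningful_confounding(crude_or, adjusted_or))  # True
```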
Now, just so you’ll know, there are other ways of controlling (adjusting) for confounding other than M-H. I won’t force you to learn multivariate analysis though. You’re welcome. =)
That’s it for this lesson.
30

Case-Control Studies

1

Overview of Lesson
Odds vs Probability
Overview of Case-Control Studies
Advantages
Disadvantages
Analysis
Calculating Odds Ratio (OR)

2

Probability vs Odds

Probability
Range of values: 0 – 1
Expressed in %

Probability is defined as the fraction of desired outcomes in the context of every possible outcome. Probabilities have a value between 0 and 1, where 0 is no chance of the desired outcome and 1 is guaranteed desired outcome. Probabilities are usually given as percentages.
Example: When flipping a coin, the coin will either land on heads or tails. Let’s say my desired outcome is heads. Each time I flip the coin, I will either get my desired outcome (heads) or an undesired outcome (tails). Because there are only two possible outcomes – heads or tails — the probability of each flip is Heads/(Heads + Tails). Numerically, that would be 1/2, or 50%. Thus, each time I flip a coin, the probability of getting heads is 50%.
Let’s say I had a three-sided coin that has two heads and one tail. Thus, with every flip there are three possible outcomes – heads, heads, or tails. Two of those outcomes are my desired outcome. Thus I have a 2/3 — or 67% probability – of getting heads each time I flip my three-sided coin.
Note: Probability calculation includes the numerator in the denominator of the calculation, because probability considers the context of the entire event.
4

Probability of getting a yellow M&M?
8 red
4 blue
3 orange
4 yellow
2 green
3 brown

If you were to close your eyes and pick one of the M&Ms from this spilled bag, what is the probability that you would get a yellow M&M?
5

Calculating Probability

The probability of getting a yellow m&m is calculated by taking the total number of yellow m&ms (4) and dividing it by the total number of m&ms (24) – which is all possible outcomes
If you said the probability of me getting a yellow m&m was 17% (or 16.7%), you’d be correct!
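Here is the same M&M probability as a minimal Python sketch. The dictionary name is made up; the counts are from the spilled bag on the slide.

```python
counts = {"red": 8, "blue": 4, "orange": 3, "yellow": 4, "green": 2, "brown": 3}

total = sum(counts.values())          # all possible outcomes
p_yellow = counts["yellow"] / total   # desired outcomes / all outcomes

print(total, round(p_yellow * 100, 1))  # 24 16.7
```

Notice that the yellow M&Ms are counted in both the numerator and the denominator.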
6

Odds
Range of values: 0 to ∞
Ratio: desired outcome to undesired outcome

Odds can have any value from zero to infinity and they represent a ratio of desired outcomes versus undesired outcomes. Odds can be expressed in two different ways: ‘odds in favor’ and ‘odds against’. ‘Odds in favor’ are the odds describing if an event will occur, while ‘odds against’ will describe if an event will not occur. With odds, the numerator is NOT a part of the denominator.
So, let’s go back to the coin example. For a regular coin, each flip will either be heads or tails – one desired outcome, one undesired outcome. Thus, the odds in favor of heads for a coin flip is 1:1. The odds against heads is also 1:1.
For my three-sided coin with two heads, the odds for heads is 2:1, whereas the odds against heads are 1:2
7

Odds of getting a yellow M&M?
8 red
4 blue
3 orange
4 yellow
2 green
3 brown

So, back to the m&ms… If I were to close my eyes and pick one, what are the odds that I’d get a yellow one? What are the odds that I wouldn’t?
8

Odds of getting yellow

The odds in favor of getting yellow are 4:20 or 1:5. Notice how the yellow M&Ms are in the numerator with the “in favor of” calculation.
9

Odds against getting yellow

The odds against getting yellow are 20:4 or 5:1. Notice how the yellow M&Ms are in the denominator for the against calculations.
10

M&Mmmmmmmm

Events that have a high probability, also have high odds. And, there is a mathematical relationship between the two, which goes beyond the scope of this mini-lesson and my understanding of math. But the reason for this information is to help you better understand the calculations used in this lesson on case-control studies.
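For the curious, the standard relationship is odds = p / (1 – p), and going the other way, p = odds / (1 + odds). Here is a small Python sketch using the M&M numbers; the function names are my own, and `Fraction` is used only to keep the ratios exact.

```python
from fractions import Fraction

yellow, not_yellow = 4, 20

odds_in_favor = Fraction(yellow, not_yellow)   # 4:20 reduces to 1:5
odds_against = Fraction(not_yellow, yellow)    # 20:4 reduces to 5:1


def probability_to_odds(p):
    """Odds in favor = p / (1 - p)."""
    return p / (1 - p)


def odds_to_probability(odds):
    """Probability = odds / (1 + odds)."""
    return odds / (1 + odds)


p_yellow = Fraction(yellow, yellow + not_yellow)   # 4/24 = 1/6
print(probability_to_odds(p_yellow))               # 1/5
print(odds_to_probability(odds_in_favor))          # 1/6
```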
11

Let’s learn about Case-Control Studies!

12

Starts in Present with D

In case-control studies, people get into the study based on their disease status and THEN information on past exposures is collected. People who are selected because they are D+ are called cases, and those who are selected because they are D- are called controls. Cases can be either incident (newly diagnosed) or prevalent (existing at a point in time) cases of the disease.
A case-control study looks at the odds of being exposed among cases and controls to tell us the likelihood of exposure based on disease status.
One famous epidemiologist calls ca-co studies “trohoc studies” because they are backwards cohort studies. The advantages of case-control studies are the disadvantages of f-up studies and the disadvantages of case-control studies are the advantages of f-up studies. Let’s take a look, shall we?
13

Advantages of Ca-Co
Suitable for rare and common diseases
Economical (Time and $$)
Allow testing of multiple hypotheses
Extremely efficient
Very valuable in controlling confounding and evaluating information

14
Case-control studies offer a solution to the difficulty of studying diseases with very long induction or latency periods, as investigators can identify affected (D+) and unaffected (D-) individuals and then look backward in time to assess their antecedent exposures, rather than having to wait a number of years for the disease to develop as with f-up designs. This also makes them ideally suited for investigating rare diseases, which would otherwise need to follow tremendously large numbers of individuals in order to accumulate a sufficient number of D+s. This process is also what makes case-control studies efficient, in terms of both time and costs, relative to other analytic approaches.
Ca-Co studies also allow for the evaluation of a wide range of potential etiologic exposures that might relate to a specific disease as well as the inter-relationships among these factors. Therefore, case-control studies can be used to test specific hypotheses or, in the absence of an a priori hypothesis, to explore a range of exposures among affected and non-affected individuals.
This study design is especially useful in the early stages of the development of knowledge about a particular disease or outcome of interest.

Disadvantages
Susceptible to bias
Study only one disease
Not good for rare exposures
Temporal relationship between E and D unknown
Cannot measure incidence
Sometimes difficult to find a control group

15
The major disadvantage of case-control studies is that both the exposure and the disease have already occurred at the time the participants enter into the study, which makes this design especially susceptible to bias.
A fundamental problem with ca-co studies is that there is no way to prove to anyone that you have selected a proper control group. As a result, this design is particularly susceptible to bias in the selection of either the cases or the controls into the study on the basis of their exposure status, as well as from differential reporting or recording of exposure information between study groups based on disease status.
Another disadvantage is that you can only study one disease. The design is also not good for rare exposures, and the time sequence between E and D may be uncertain.
Ca-co studies also cannot measure incidence (or mortality) as the investigation begins with the selection of subjects based on D status.

Odds Ratio (OR)

16
A case-control study compares the odds of exposure among cases (D+) with the odds of exposure among controls (D-). Let’s take a look at the 2×2 table to see how each of these odds is calculated.

Odds of E+ among Cases
        Case (D+)   Control (D-)
E+      a           b
E-      c           d
Total   a+c         b+d

So, what are the odds in favor of being exposed if you are a case? I’ll give you two choices:
1. a/(a+c)
2. a/c
Scroll down for the answer.

#2 is correct. If you said #1, think back to the M&M example, where the odds in favor of selecting a yellow M&M were the number of yellow M&Ms divided by the number of all the M&Ms EXCEPT the yellow ones.
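The difference between the two choices can be sketched in a few lines of Python. This is only an illustration: the function names and the counts (10 exposed cases, 40 unexposed cases) are made up for the example and do not come from the lesson.

```python
from fractions import Fraction


def proportion_exposed(a: int, c: int) -> Fraction:
    """Choice 1: exposed cases divided by ALL cases, a/(a+c). This is a proportion, not odds."""
    return Fraction(a, a + c)


def odds_exposed(a: int, c: int) -> Fraction:
    """Choice 2: exposed cases divided by UNEXPOSED cases, a/c. This is the odds."""
    return Fraction(a, c)


# Hypothetical counts, chosen only for illustration.
a, c = 10, 40
print("proportion exposed:", proportion_exposed(a, c))  # 1/5
print("odds of exposure:  ", odds_exposed(a, c))        # 1/4
```

The two numbers differ because the odds leave the exposed cases out of the denominator, just as the yellow M&Ms are left out in the M&M example.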
17

Odds of E+ among Controls
        Case (D+)   Control (D-)
E+      a           b
E-      c           d
Total   a+c         b+d

So, what are the odds in favor of being exposed if you are a control? If you said anything other than b/d, please review the odds information or email me directly for assistance.
18

Calculating OR
OR = odds of E among Ca/odds of E among Co
= (a/c) / (b/d)
= (a x d) / (b x c)
= ad/bc

So to calculate OR, we could calculate the odds of E+ among cases and the odds of E+ among controls, then calculate a ratio of the two odds. Or, we can do the math magic on the slide and boil it down to a very simple equation of ad/bc.
Note: It’s really not math magic. When dividing fractions by fractions, we flip the denominator and multiply the two fractions. Here’s a link to a refresher. https://www.wikihow.com/Divide-Fractions-by-Fractions
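As a quick check that the long route and the shortcut agree, here is a minimal Python sketch. The function names and the cell counts are hypothetical, picked only to make the arithmetic easy to follow; exact fractions are used so no rounding gets in the way.

```python
from fractions import Fraction


def odds_ratio_long(a: int, b: int, c: int, d: int) -> Fraction:
    """Long route: (odds of E+ among cases) / (odds of E+ among controls)."""
    odds_cases = Fraction(a, c)
    odds_controls = Fraction(b, d)
    return odds_cases / odds_controls


def odds_ratio_short(a: int, b: int, c: int, d: int) -> Fraction:
    """Shortcut: cross-product ratio ad/bc."""
    return Fraction(a * d, b * c)


# Hypothetical 2x2 cell counts (not from the lesson).
a, b, c, d = 30, 10, 20, 40
print(odds_ratio_long(a, b, c, d))   # 6
print(odds_ratio_short(a, b, c, d))  # 6
```

Both routes break down if cell b, c, or d is zero, so a real analysis would need to handle empty cells separately.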
19

Concordant Cells
        Case (D+)   Control (D-)
E+      a           b
E-      c           d
Total   a+c         b+d

a and d are called “concordant cells”
20

Discordant Cells
        Case (D+)   Control (D-)
E+      a           b
E-      c           d
Total   a+c         b+d

b and c are called “discordant cells”
21

OR = Concordant/Discordant
        Case (D+)   Control (D-)
E+      a           b
E-      c           d
Total   a+c         b+d

So, OR = the product of the concordant cells (a x d) divided by the product of the discordant cells (b x c). Fun times!
22

Interpretation of OR
When OR = 1.00
Exposure does not affect odds of disease
This is the NULL
When OR < 1.00
Negative association
Odds of exposure lower among cases
When OR > 1.00
Positive association
Odds of exposure higher among cases

23
An odds ratio of
• 1.0 (or close to 1.0) indicates that the odds of exposure among cases are the same as, or similar to, the odds of exposure among controls. Thus, the exposure is not associated with the disease.
• Greater than 1.0 indicates that the odds of exposure among cases are greater than the odds of exposure among controls. The exposure might be a risk factor for the disease.
• Less than 1.0 indicates that the odds of exposure among cases are lower than the odds of exposure among controls. The exposure might be a protective factor against the disease.
As with interpreting RR, the further away from 1, the stronger the association, be it positive or negative.
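The three interpretation rules above can be written as a small Python helper. Treat it as a sketch: the function name and the `tolerance` used to decide what counts as "close to 1.0" are assumptions made for this example, and the helper says nothing about statistical significance.

```python
def interpret_odds_ratio(odds_ratio: float, tolerance: float = 0.05) -> str:
    """Map an odds ratio to the verbal interpretation used in the lesson.

    `tolerance` is a hypothetical cutoff for "close to 1.0"; the lesson
    does not specify one.
    """
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be greater than zero")
    if abs(odds_ratio - 1.0) <= tolerance:
        return "no association (null): odds of exposure are similar in cases and controls"
    if odds_ratio > 1.0:
        return "positive association: exposure might be a risk factor"
    return "negative association: exposure might be a protective factor"


for value in (1.0, 4.5, 0.4):
    print(value, "->", interpret_odds_ratio(value))
```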

Example
We want to determine how much more common occupational benzene exposure is among persons with leukemia. We conducted a case-control study in which 85 of the 125 workers with leukemia were exposed to benzene on the job. Conversely, among the 125 controls, only 40 had been exposed.

24

Step 1: Set up data table
        Case (D+)   Control (D-)   Total
E+      85          40             125
E-      40          85             125
Total   125         125            250

25
Note: among the cases, we knew that 85 had been exposed. That is our cell a. To get our cell c, we subtract that number from the total number of cases, which we were given. We were given the value for cell b (exposed among controls), and determined cell d by subtracting that number from the total number of controls, which we were given.
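The subtraction described in this note can be sketched in Python as follows. The function name and the dictionary layout are just one hypothetical way to organize the cells; the four input numbers are the ones given in the benzene example.

```python
def build_two_by_two(cases_total: int, cases_exposed: int,
                     controls_total: int, controls_exposed: int) -> dict:
    """Fill in cells a-d of the 2x2 table from the counts given in the problem."""
    a = cases_exposed                      # exposed cases (given)
    b = controls_exposed                   # exposed controls (given)
    c = cases_total - cases_exposed        # unexposed cases (by subtraction)
    d = controls_total - controls_exposed  # unexposed controls (by subtraction)
    return {"a": a, "b": b, "c": c, "d": d}


table = build_two_by_two(cases_total=125, cases_exposed=85,
                         controls_total=125, controls_exposed=40)
print(table)  # {'a': 85, 'b': 40, 'c': 40, 'd': 85}
```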

Step 2: Calculate OR (ad/bc)

26
OR = ad/bc
= (85 x 85) / (40 x 40)
= 7225/1600
= 4.5
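The same arithmetic can be checked with a short Python sketch. The variable names are illustrative; the cell values are taken from the benzene data table.

```python
# Cell counts from the benzene example: a and b were given, c and d found by subtraction.
a, b, c, d = 85, 40, 40, 85

odds_cases = a / c        # odds of exposure among cases
odds_controls = b / d     # odds of exposure among controls
odds_ratio = (a * d) / (b * c)  # cross-product shortcut, ad/bc

print(f"odds among cases:    {odds_cases:.3f}")
print(f"odds among controls: {odds_controls:.3f}")
print(f"OR = {a * d}/{b * c} = {round(odds_ratio, 1)}")  # OR = 7225/1600 = 4.5
```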

Step 3: Interpret
The odds of benzene exposure among cases are 4.5 times as great as the odds of exposure among controls.

27

Word of Caution
Very common
Very commonly wrong

Case-Control studies make up about 80 – 85% of all published studies. Unfortunately, many of these studies really are not case-control studies, despite being touted and published as such. This is because the case-control design is difficult to grasp and perform. Most commonly we see data from cross-sectional studies being used as a case-control study.
28
