How to Become Smarter

Author: Charles Spender

Two definitions would be useful here. A response rate of a medical treatment is the percentage of patients who show improvement after receiving this treatment. For instance, if a drug reduces symptoms of a disease in 70% of patients, the drug has a response rate of 70%. A placebo rate of a disease is the percentage of patients in the placebo group who show improvement during a clinical trial.

Even if a placebo control is unfeasible, it is often possible to tell whether the observed benefits of a treatment are due to the placebo effect. A scientist can do this by comparing the response rate to the placebo rate from previous studies of the same disease. In clinical trials involving patients with anxiety disorders, placebo rates can be as high as 50%. Therefore, antianxiety treatments that produce improvement in 50% of patients are likely to be no better than a placebo. Some studies have questioned the validity of the placebo effect itself. They show that much of the effect is the natural course of the disease, that is, spontaneous improvement without any intervention [32, 33, 904, 931]. Comparison of placebo controls to no-treatment controls in anxiety shows that the difference is insignificant. In patients with anxiety (or depression), an authentic placebo effect is either nonexistent or minuscule, being a result of biases and imperfections of measurement [931]. These data suggest that what people commonly mistake for a placebo effect in patients with anxiety or depression is the irregular course of these diseases, which involves numerous spontaneous improvements and relapses.

Statistical significance is the third important criterion to consider when evaluating strength of evidence. In lay language, statistically significant means “a result unlikely to have occurred by chance” or “not a fluke.” Conversely, statistically insignificant means “likely to be a fluke” or “the result may have occurred by chance.” If a clinical trial shows that the effect of some treatment is statistically insignificant, the evidence of its effectiveness is weak. Statistical significance is not a complicated concept, and an example will illustrate how it works. Suppose we have two groups of test subjects, each consisting of 10 depressed patients. One group will serve as the control and the other as the experimental group in a clinical trial of a novel antidepressant drug. Psychiatrists measure severity of depression using special questionnaires that express symptom severity as a numeric value. Let’s assume that the rating of symptoms for most patients in the two groups is a number between 10 (mild depression) and 30 (severe depression). We will also assume that this number is different for each patient in a group. To describe how widely the symptoms vary among the patients, we will use a measure called standard deviation. The greater the standard deviation, the wider the range of symptoms in a group. If the standard deviation were zero, there would be no variability: all patients would have the same rating of symptoms. Without going into detail about the calculation of standard deviation, let us assume that the variability of symptoms (standard deviation) equals five in each group of patients. Another useful measure is the average rating of symptoms of depression in each group. This average (also known as the mean) equals the sum of the ratings of all patients divided by the number of patients in the group. Let’s say the average rating of symptoms is 20 in each group, which means “moderate depression.” A good rule of thumb relating standard deviation to the average is the following: about 95% of items in a group fall within two standard deviations (5 x 2) of the average value (20). In other words, if the average (the mean) is 20 and the standard deviation is 5, then about 95% of people in this group will have symptoms between 10 and 30, and the remaining 5% will be outside this range.
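The mean, standard deviation, and two-standard-deviation rule of thumb can be checked with a few lines of code. This is only an illustration: the ratings below are hypothetical values chosen to give a mean of 20 and a standard deviation of roughly 5.

```python
import statistics

# Hypothetical depression ratings for 10 patients (made-up values,
# chosen so the mean is 20 and the standard deviation is about 5)
ratings = [12, 14, 16, 18, 19, 21, 22, 24, 26, 28]

mean = statistics.mean(ratings)   # 20
sd = statistics.pstdev(ratings)   # about 4.9

# Rule of thumb: roughly 95% of ratings fall within two standard
# deviations of the mean
low, high = mean - 2 * sd, mean + 2 * sd
print(round(low, 1), round(high, 1))  # roughly 10 to 30
```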

After the clinical trial finishes, we find that treatment with the novel antidepressant drug lowered the average rating of symptoms from 20 to 18 in the experimental group (10 patients). The average rating stayed the same (20) in the control group (another 10 patients), who received a placebo pill. For simplicity’s sake, we will assume that the variability of symptoms among patients stayed the same in both groups after treatment (the standard deviation is still 5). We now have everything we need to assess the statistical significance of the observed beneficial effect of the antidepressant drug. We know the number of patients in each group (10) and the average rating of symptoms after treatment in the control group (20) and the experimental group (18). We also know the standard deviation of symptoms in each group after treatment (both are 5). You don’t really need to know the complicated formula used for calculating statistical significance; you can tell whether the effect of treatment is statistically significant by eyeballing the above numbers. (Curious readers can find free calculators on the Internet that will determine the statistical significance of a study based on the above variables; look for a t-test for two samples.) The beneficial effect of the drug is not statistically significant in our example (the calculator will produce a p value greater than 0.05). This means that the change in symptoms may have occurred by chance. In other words, the evidence of effectiveness of this novel antidepressant drug is weak.
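The calculation those online calculators perform can be sketched in a few lines. This is a simplified two-sample t statistic that assumes equal group sizes and equal standard deviations; the cutoff 2.10 is the standard two-tailed critical value at p = 0.05 for 18 degrees of freedom (two groups of 10, minus 2).

```python
import math

def t_statistic(mean_a, mean_b, sd, n):
    # Two-sample t statistic, assuming equal SDs and equal group sizes
    standard_error = sd * math.sqrt(2 / n)
    return (mean_a - mean_b) / standard_error

# Our example: means 20 vs. 18, SD 5, 10 patients per group
t = t_statistic(20, 18, 5, 10)   # about 0.89
T_CRITICAL = 2.10                # two-tailed 0.05 cutoff, 18 deg. of freedom
print(abs(t) > T_CRITICAL)       # False: not statistically significant
```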

There are two simple reasons for this lack of statistical significance. One is the small size of the effect of treatment compared with the variability of depressive symptoms among the patients. The average rating of symptoms differs by a measly 2 points between the treatment group and the placebo group (18 versus 20), whereas the variability of symptoms (standard deviation) is a whopping 5 in both groups. The clinical change produced by the drug is less than one half of the standard deviation, which makes this result unimpressive. The other reason for the lack of statistical significance is the small number of participants in the study. We have fewer than 20 subjects in each group, which is a small number. If, however, exactly the same results were obtained with 100 patients in each group, the result would be statistically significant (p value less than 0.05). In a study that includes a large number of test subjects, random factors are less likely to influence the results; all else being equal, the results will be more statistically significant. Results of a clinical trial are likely to be statistically insignificant when two conditions are true:

  • the study includes a small number of test subjects (under 20);
  • the effect of treatment is small (less than one half of the standard deviation of symptoms).

The effect of treatment will be statistically significant if the study includes a large number of test subjects and the effect of treatment is greater than one half of the standard deviation of symptoms. If a result is not statistically significant, this does not necessarily mean that it’s a fluke. It can be a valid result, but there is uncertainty about whether it is a fluke or not. A more rigorous (statistically significant) study is necessary to either confirm or refute the validity of the result.
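The role of sample size can be seen by recomputing the same simplified t statistic for larger groups; everything else (the means of 20 and 18, the standard deviation of 5) is unchanged. The critical values below are standard two-tailed table values at p = 0.05.

```python
import math

def t_statistic(mean_a, mean_b, sd, n):
    # Two-sample t statistic, assuming equal SDs and equal group sizes
    return (mean_a - mean_b) / (sd * math.sqrt(2 / n))

# Identical effect and variability; only the group size changes
t_small = t_statistic(20, 18, 5, 10)    # about 0.89, below the ~2.10 cutoff
t_large = t_statistic(20, 18, 5, 100)   # about 2.83, above the ~1.97 cutoff
print(round(t_small, 2), round(t_large, 2))
```

With 100 patients per group, the same 2-point difference clears the significance threshold, matching the reasoning above.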

The fourth factor to consider when interpreting evidence from a study on human subjects is the size of the effect of a treatment. Although statistical significance helps us answer the question “How likely is it that the change observed in patients after treatment occurred by chance?” it does not address the question “How much did the treatment improve the symptoms in the patients?” A result can be statistically significant, but the actual change of symptoms can be tiny, to the point of being imperceptible to the patients. For a treatment to be effective, it should produce a change in symptoms that is noticeable to the patients. A measure known as “clinical change” is useful in this regard. In the example of the novel antidepressant, the drug lowered the average rating of symptoms from 20 (moderate depression) to 18 (also around moderate depression). Let’s say that we repeated the clinical trial of this drug with several hundred patients each in the control and the experimental groups, and the results of the trial turned out the same. Now the drop of 2 points in the average symptoms in the experimental group (from 20 to 18) is statistically significant, but it is not clinically significant. Both ratings, 18 and 20, are in the range of moderate depression, and the change will go unnoticed by most patients. If, on the other hand, the treatment had resulted in an average rating of 10 (mild depression), then the effect would have been clinically significant: the treatment would have produced a change noticeable to patients. A useful measure of clinical significance is clinical change, which shows how far a given treatment moved the patient along the path from disease to health. Scientists express this measure in percentage points. One hundred percent clinical change means full remission (disappearance of all signs and symptoms of a given disorder), and 0% clinical change means no change in symptoms.

Let’s say we decided to test an older, widely prescribed antidepressant. When tested under the same conditions as above, treatment with the older drug resulted in an average rating of depressive symptoms that equals 10 (mild depression) in the experimental group. In the placebo group, the rating of symptoms equals 20 (moderate depression, unchanged), and the results of the trial are statistically significant. Let’s also assume that the rating of 5 corresponds to mental health, or the absence of depressive symptoms. There are 15 points between the rating of 20 (moderate depression) and 5 (health). The average rating of 10 (mild depression) in the experimental group means that the older antidepressant moved the patients 10 points closer to health (20 minus 10). We will assume that the maximum possible clinical change is 100%, which is equivalent to a change of 15 points from 20 (moderate depression) to 5 (health). We can calculate the clinical change after treatment with the older antidepressant as 10 divided by 15 and multiplied by 100%, which equals 67%. This is a big clinical change. In comparison, the clinical change produced by the novel antidepressant that we talked about earlier is 2 divided by 15 and multiplied by 100%, which equals 13%. This is a tiny clinical change, which will be almost unnoticeable to the patients. Thus, the evidence of effectiveness is strong for the older antidepressant and weak for the newer one.
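The clinical-change arithmetic above can be captured in a small helper. This is only a sketch: the function name is made up, and the scale endpoints (20 for moderate depression, 5 for health) are the ones assumed in the example.

```python
def clinical_change(baseline, after_treatment, healthy):
    # Percentage of the distance from baseline severity to health
    # covered by the treatment (100% = full remission, 0% = no change)
    return (baseline - after_treatment) / (baseline - healthy) * 100

older = clinical_change(20, 10, 5)   # 10 / 15 -> about 67%
newer = clinical_change(20, 18, 5)   # 2 / 15  -> about 13%
print(round(older), round(newer))    # 67 13
```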

What about healthy human subjects? How do scientists measure the size of the effect of a treatment in studies on healthy volunteers? Usually they use standard deviation. Recall that standard deviation describes how widely a measure (such as IQ) varies within a group of test subjects. Let’s say that the average IQ in a group of 200 volunteers is 100 and the standard deviation is 15 points. By the rule of thumb mentioned earlier, this usually means that 95% of people in this group have an IQ within two standard deviations (15 x 2) of the mean, or average value (100). Put another way, if the average IQ in a group is 100 and the standard deviation is 15, then 95% of people in the group have an IQ between 70 and 130; the remaining 5% are outside this range. If some hypothetical treatment can increase the average IQ by 15 points in this group of 200 volunteers, this is a large effect. A change that equals one standard deviation (or even 80% of a standard deviation) or greater is considered large. A change of one half of a standard deviation is a moderate effect size. One-fourth of a standard deviation or less corresponds to a small effect size. In summary, we can express the size of the effect of treatment using either standard deviation or clinical change. A small effect size means that the evidence of the effectiveness of the treatment in question is weak, even if the results of the study are statistically significant.
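These rules of thumb for effect size in standard-deviation units can be expressed as a simple classifier. A sketch only: the thresholds follow the categories stated above (this quantity is akin to what statisticians call Cohen’s d), and the gap between one-fourth and one-half of a standard deviation, which the text leaves unspecified, is labeled small here as a judgment call.

```python
def effect_size_label(change, sd):
    # Effect size expressed in standard-deviation units
    d = abs(change) / sd
    if d >= 0.8:            # ~80% of an SD or more: large
        return "large"
    if d >= 0.5:            # about half an SD: moderate
        return "moderate"
    return "small"          # one-fourth of an SD or less: small

print(effect_size_label(15, 15))    # "large": a full SD (15 IQ points)
print(effect_size_label(7.5, 15))   # "moderate": half an SD
print(effect_size_label(3, 15))     # "small": one-fifth of an SD
```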

The fifth criterion useful in assessing strength of evidence is publication of the results of the experiment. If researchers publish the results in a peer-reviewed scientific journal, these results are likely to be trustworthy (you can find a good database of scientific biomedical literature at www.PubMed.gov). Each article submitted to a scientific journal undergoes a thorough review by one or more independent experts in the field. Based on this expert opinion, the journal decides either to publish or to reject the manuscript. This process is called peer review. The reviewers find fault with every word in the scientific article. Publication of fraudulent results, if an investigation proves this to be the case, has serious consequences for the authors. The journal will report the fraudulent activities to the authors’ employer and research funding agencies. This may result in the authors’ dismissal, demotion, or loss of research funding. The journal retracts the article containing fraud and has its record deleted from databases of scientific literature (indexing services such as PubMed).

Research results presented in media or publications other than scientific journals are less trustworthy. When researchers present unpublished results at a scientific conference, you have to be skeptical. If a scientist reports biomedical findings in a book or on a personal or commercial website, the data are less trustworthy still. When people publish research data in a book or on a website, nobody performs critical review of the data. Sometimes a publishing house does some review but it is always superficial, and there are no penalties for fraud. There are rare exceptions, such as academic textbooks: numerous experts review them thoroughly and these books do not contain any original unpublished research. Most academic textbooks are a trustworthy source of information. The problem with textbooks is that they become outdated quickly, especially in the biological and health sciences. This is due to the lengthy writing, review, and publication process associated with academic textbooks. Other trustworthy sources of information are health-related government websites that are peer-reviewed (they say so at the bottom of the page).
