This blog entry purports to explain the unmixed waters Ken Smith photographed. Thanks to Jane Shevtsov for pointing this out!

The foam in picture #7 could be newly forming ice, created where the colder salt water meets the just-above-freezing fresh water. While the ice could form below the surface, its lower density would ensure that it all floated up along the *linear interface* where the two water masses meet the air, and any ice that wandered into either body of water would quickly melt: it's a dynamic formation!

But I suppose the same argument can be applied to any reaction that occurs along the interface, and produces a low-density product.

Somehow it seems one hyperlink didn’t appear in the above comment, so I repeat:

Apart from the Marin County example, there could be other locations that would be interesting for a study of correlations between nuclear waste sites and cancer.

If you want, you could for example have a look at the statistics over here (they include some local differentiation) and compare them with the sites mentioned in this article (though I don’t know where to find the list of sites); see also this post on our blog about the issue.

On the Azimuth Forum, Graham Jones wrote:

Roko Mijic said this on Google+:

Now, I wonder how big the variance in death figures over a 13-week period is? Since I don’t have the time to chase up data on deaths over 13-week periods for the past 10 years, I don’t know. But there is a way to get a lower bound on it. Conveniently, they provide a table of deaths in the individual weeks 12–25. For all the weeks in 2011 and 2010 combined, the standard deviation of those weekly figures is 556 deaths. Multiplying by the square root of 13, we get an estimate of the standard deviation of the sum of 13 weeks. And the answer is: 2006 deaths.
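The scaling step in that argument is just the rule that variances of independent weeks add, so the standard deviation of a 13-week total is sqrt(13) times the weekly standard deviation. In R:

```r
weekly_sd <- 556      # sd of the weekly death figures quoted above
sqrt(13) * weekly_sd  # sd of a 13-week total, assuming independent weeks
                      # about 2005; the quoted 2006 presumably used an unrounded sd
```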

I just read this on the blog and I’m replying here.

This is not a good way to estimate the standard deviation of the weekly figures. The paper is about differences between various 14-week periods. If you combine the data for two of them, including the allegedly unusual one, your estimate of the standard deviation will likely be too high, since it will be inflated by differences between periods. Even if you use the periods separately, and ignore the period after Mar 2011, there may be seasonal effects within the periods which boost the estimated standard deviation. (There certainly are such effects around Xmas.)

Like Roko Mijic, I can’t be bothered to download more data. It is not difficult to do, but it is too much like my day job. Using the data from the paper, I think the best way to estimate the standard deviation, allowing for seasonal effects, is to take the two ‘before’ periods (Dec 2009 – Mar 2010 and Dec 2010 – Mar 2011), subtract corresponding weeks, estimate the standard deviation of the differences, and then divide by sqrt(2). I get 332 for the sd of the weekly figures.

Next, I simulated data from a normal with mean 11000 and sd 332, and found out how often the method in the paper produced a result which is more unusual than the value of the statistic they calculated in Appendix Table 4. This happens about 20% of the time for a two-sided result or 10% for a one-sided result.

Here is my R code for you to check.

# Weekly death counts for the four 14-week periods taken from the paper.
x2010spring <- c(11010, 11097, 11075, 10712, 10940, 10549, 10637, 10389,
                 10491, 10352, 9894, 10781, 10178, 10290)
x2011spring <- c(12137, 11739, 12052, 10928, 10743, 10826, 11251, 11300,
                 11132, 10839, 9538, 10770, 10981, 10779)
x2010winter <- c(10323, 7942, 8288, 11557, 11299, 10110, 10832,
                 10524, 9877, 9802, 10198, 10586, 10699, 9969)
x2011winter <- c(10702, 8339, 8194, 11804, 10775, 10689, 10420,
                 10295, 10700, 10952, 10762, 10779, 10639, 10274)

# Plot the two winter-plus-spring series on a common vertical scale.
par(mfrow=c(2,1))
maxd <- max(x2010spring, x2011spring, x2010winter, x2011winter)
mind <- min(x2010spring, x2011spring, x2010winter, x2011winter)
plot(c(x2010winter, x2010spring), ylim=c(mind, maxd))
plot(c(x2011winter, x2011spring), ylim=c(mind, maxd))

# Estimate the sd of the weekly figures from paired differences between
# the two 'before' periods, which cancels seasonal effects.
esd <- sd(x2010winter - x2011winter)/sqrt(2)

# Simulate the paper's before/after comparison under the null hypothesis.
n <- 0
for (i in 1:10000) {
  w0 <- sum(rnorm(14, mean=11000, sd=esd))
  w1 <- sum(rnorm(14, mean=11000, sd=esd))
  s0 <- sum(rnorm(14, mean=11000, sd=esd))
  s1 <- sum(rnorm(14, mean=11000, sd=esd))
  O <- s1/s0
  E <- w1/w0
  mean1 <- sqrt(s1)^-1 * O
  mean2 <- sqrt(s0)^-1 * E
  X <- (O - E)/sqrt(mean1^2 + mean2^2)  # compare with 5.7, the value from Appendix Table 4
  if (abs(X) > 5.7) { n <- n+1 }
  cat(X, "\n")
}
n  # number of simulated runs more unusual than the paper's result
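If I've read the statistic correctly as (O − E)/sqrt(mean1² + mean2²) compared against 5.7, the roughly 20% figure can also be reached analytically, without simulation. A sketch under the same normal model, mean 11000 and sd 332 per week:

```r
# Same normal model as the simulation: 14 independent weeks, mean 11000, sd 332.
mu  <- 14 * 11000               # expected total deaths in a 14-week period
sdp <- 332 * sqrt(14)           # sd of that 14-week total
sd_ratio <- sqrt(2) * sdp / mu  # approximate sd of the ratio O (and of E)
sd_diff  <- sqrt(2) * sd_ratio  # approximate sd of O - E under the null
denom <- sqrt(2) / sqrt(mu)     # approximate value of sqrt(mean1^2 + mean2^2)
z <- 5.7 * denom / sd_diff      # 5.7 expressed in units of sd(O - E)
2 * pnorm(-z)                   # two-sided tail probability, about 0.20
```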


And you may be biased against nuclear power; this Freudian slip is a hint:

I am currently against using nuclear power for commercial power generation, and who knows, eventually that may be part of the reason for the Freudian slip. However, I have rational reasons for this opinion, and I tried to explain them, for example, here. Moreover, a computation can be checked and discussed. In particular, I didn’t say that I would do the calculations.

And who knows: people like, for example, Barry Brook (I might be wrong, but I think I saw him around the Azimuth blog) might eventually like to jump in, do the investigation, and document it thoroughly. Barry Brook did great charts on his blog. And as I understood by looking, for example, at this report, he can’t be called biased against nuclear power.

Nad wrote:

Maybe someone could make a Sage notebook for computing statistical significance.

That would be good to try! Alas, I’m not a sage when it comes to Sage. And you may be biased against nuclear power; this Freudian slip is a hint:

There is a nuclear dumb site …

Actually this reminds me of a study which investigated cancer cases in Marin County. The study looked at childhood leukemia cases in seven counties in the San Francisco Bay Area and found:

no evidence of a non-random spatial pattern of childhood leukemia among six of these counties. The data from San Francisco County, however, produce a moderately small significance probability (0.08)

The study did not convince me. There is a nuclear dumb site in the water near San Francisco, close to Marin County. And in a blog post I described, by boldly looking at the cancer registry (links in the blog post), that:

one can actually observe spatial peculiarities for Marin County: the five-year death counts for cancer in Marin County seem to be increased for skin cancer, breast cancer, and leukemia (with no distinction made between adults and children). (I looked only at death counts in order to avoid errors from over-diagnosis.)

This is of course in no way a statistically sound analysis, but it looked to me as if one should examine this more carefully, if that hasn’t been done in the meantime. Maybe someone could make a Sage notebook for computing statistical significance.
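For what it’s worth, a first pass at such a significance check doesn’t need a whole notebook: R’s exact Poisson test already compares an observed count against an expected one. The counts below are invented placeholders, not values from the registry:

```r
observed <- 60  # hypothetical five-year death count in the county
expected <- 45  # hypothetical count expected from the statewide rate
# Exact two-sided Poisson test of the observed count against the expected rate.
poisson.test(x = observed, T = 1, r = expected)$p.value
```

As far as I know, Sage can call out to R, so the same check would work inside a Sage notebook.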

I never got around to checking this, but here’s a blog entry documenting some other serious mistakes in the paper by Joseph J. Mangano and Janette D. Sherman:

• Michael Moyer, Are babies dying in the Pacific Northwest due to Fukushima? A look at the numbers, Observations blog, *Scientific American*, 21 June 2011.

In particular, they claim a spike in infant mortality in the Pacific Northwest after Fukushima, but Moyer examined the data and got this graph:

Read the blog entry to understand exactly what the graph means. But here’s the main point: Mangano and Sherman compare the 4 weeks right before Fukushima (the green dots) to the weeks right after it, and claim there’s a big increase in deaths!

This is called ‘cherry-picking the data’. The blue line is Moyer’s least-squares linear fit.
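That blue line is nothing exotic: an ordinary least-squares fit, one line of R. The weekly counts below are made up for illustration, not Moyer’s data:

```r
deaths <- c(38, 35, 41, 37, 36, 40, 34, 39, 37, 42)  # invented weekly counts
week   <- seq_along(deaths)
fit <- lm(deaths ~ week)  # least-squares linear fit
coef(fit)                 # intercept and slope of the fitted line
# plot(week, deaths); abline(fit)  # would draw the blue-line analogue
```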
