Periodic Patterns in Peptide Masses

Gheorghe Craciun is a mathematician at the University of Wisconsin who recently proved the Global Attractor Conjecture, which since 1974 was the most famous conjecture in mathematical chemistry. This week he visited U. C. Riverside and gave a talk on this subject. But he also told me about something else—something quite remarkable.

The mystery

A peptide is basically a small protein: a chain of made of fewer than 50 amino acids. If you plot the number of peptides of different masses found in various organisms, you see peculiar oscillations:

These oscillations have a frequency of about 14 daltons, where a ‘dalton’ is roughly the mass of a hydrogen atom—or more precisely, 1/12 the mass of a carbon atom.

Biologists had noticed these oscillations in databases of peptide masses. But they didn’t understand them.

Can you figure out what causes these oscillations?

It’s a math puzzle, actually.

Next I’ll give you the answer, so stop looking if you want to think about it first.

The solution

Almost all peptides are made of 20 different amino acids, which have different masses, which are almost integers. So, to a reasonably good approximation, the puzzle amounts to this: if you have 20 natural numbers $m_1, ... , m_{20},$ how many ways can you write any natural number $N$ as a finite ordered sum of these numbers? Call it $F(N)$ and graph it. It oscillates! Why?

(We count ordered sums because the amino acids are stuck together in a linear way to form a protein.)

There’s a well-known way to write down a formula for $F(N)$. It obeys a linear recurrence:

$F(N) = F(N - m_1) + \cdots + F(N - m_{20})$

and we can solve this using the ansatz

$F(N) = x^N$

Then the recurrence relation will hold if

$x^N = x^{N - m_1} + x^{N - m_2} + \dots + x^{N - m_{20}}$

for all $N.$ But this is fairly easy to achieve! If $m_{20}$ is the biggest mass, we just need this polynomial equation to hold:

$x^{m_{20}} = x^{m_{20} - m_1} + x^{m_{20} - m_2} + \dots + 1$

There will be a bunch of solutions, about $m_{20}$ of them. (If there are repeated roots things get a bit more subtle, but let’s not worry about.) To get the actual formula for $F(N)$ we need to find the right linear combination of functions $x^N$ where $x$ ranges over all the roots. That takes some work. Craciun and his collaborator Shane Hubler did that work.

But we can get a pretty good understanding with a lot less work. In particular, the root $x$ with the largest magnitude will make $x^N$ grow the fastest.

If you haven’t thought about this sort of recurrence relation it’s good to look at the simplest case, where we just have two masses $m_1 = 1, m_2 = 2.$ Then the numbers $F(N)$ are the Fibonacci numbers. I hope you know this: the $N$th Fibonacci number is the number of ways to write $N$ as the sum of an ordered list of 1’s and 2’s!

1

1+1,   2

1+1+1,   1+2,   2+1

1+1+1+1,   1+1+2,   1+2+1,   2+1+1,   2+2

If I drew edges between these sums in the right way, forming a ‘family tree’, you’d see the connection to Fibonacci’s original rabbit puzzle.

In this example the recurrence gives the polynomial equation

$x^2 = x + 1$

and the root with largest magnitude is the golden ratio:

$\Phi = 1.6180339...$

The other root is

$1 - \Phi = -0.6180339...$

With a little more work you get an explicit formula for the Fibonacci numbers in terms of the golden ratio:

$\displaystyle{ F(N) = \frac{1}{\sqrt{5}} \left( \Phi^{N+1} - (1-\Phi)^{N+1} \right) }$

But right now I’m more interested in the qualitative aspects! In this example both roots are real. The example from biology is different.

Puzzle 1. For which lists of natural numbers $m_1 < \cdots < m_k$ are all the roots of

$x^{m_k} = x^{m_k - m_1} + x^{m_k - m_2} + \cdots + 1$

real?

I don’t know the answer. But apparently this kind of polynomial equation always one root with the largest possible magnitude, which is real and has multiplicity one. I think it turns out that $F(N)$ is asymptotically proportional to $x^N$ where $x$ is this root.

But in the case that’s relevant to biology, there’s also a pair of roots with the second largest magnitude, which are not real: they’re complex conjugates of each other. And these give rise to the oscillations!

For the masses of the 20 amino acids most common in life, the roots look like this:

The aqua root at right has the largest magnitude and gives the dominant contribution to the exponential growth of $F(N).$ The red roots have the second largest magnitude. These give the main oscillations in $F(N),$ which have period 14.28.

For the full story, read this:

• Shane Hubler and Gheorghe Craciun, Periodic patterns in distributions of peptide masses, BioSystems 109 (2012), 179–185.

Most of the pictures here are from this paper.

My main question is this:

Puzzle 2. Suppose we take many lists of natural numbers $m_1 < \cdots < m_k$ and draw all the roots of the equations

$x^{m_k} = x^{m_k - m_1} + x^{m_k - m_2} + \cdots + 1$

What pattern do we get in the complex plane?

I suspect that this picture is an approximation to the answer you’d get to Puzzle 2:

If you stare carefully at this picture, you’ll see some patterns, and I’m guessing those are hints of something very beautiful.

Earlier on this blog we looked at roots of polynomials whose coefficients are all 1 or -1:

The pattern is very nice, and it repays deep mathematical study. Here it is, drawn by Sam Derbyshire:

But now we’re looking at polynomials where the leading coefficient is 1 and all the rest are -1 or 0. How does that change things? A lot, it seems!

By the way, the 20 amino acids we commonly see in biology have masses ranging between 57 and 186. It’s not really true that all their masses are different. Here are their masses:

57, 71, 87, 97, 99, 101, 103, 113, 113, 114, 115, 128, 128, 129, 131, 137, 147, 156, 163, 186

I pretended that none of the masses $m_i$ are equal in Puzzle 2, and I left out the fact that only about 1/9th of the coefficients of our polynomial are nonzero. This may affect the picture you get!

128 Responses to Periodic Patterns in Peptide Masses

1. Greg Egan says:

There’s a typo in the formula for $F(N)$ for the Fibonacci numbers; apart from the lower-case $n$ on the RHS, the second root should be $1-\phi$.

• John Baez says:

I meant to say

$\displaystyle{ F(N) = \frac{1}{\sqrt{5}} \left( \Phi^{N+1} - (-\Phi)^{-(N+1)} \right) }$

but perhaps

$\displaystyle{ F(N) = \frac{1}{\sqrt{5}} \left( \Phi^{N+1} - (1-\Phi)^{N+1} \right) }$

is clearer. Anyway, the two roots are $\Phi$ and $-\Phi^{-1} = 1 - \Phi.$

2. Greg Egan says:

I was puzzled as to why there are so many more than 20 roots plotted if there are 20 amino acids, so I looked at the paper. It talks about adding 41 more masses because of post-translational changes to the proteins, but I think there are more than 61 roots in the plot, so there must be something else going on that adds a few more masses.

• Greg Egan says:

Oh, sorry, that was a dumb comment! I got confused between the number of terms in the polynomial and the degree.

• Dan Piponi says:

I came here to make the same “dumb comment”. Confused minds think alike?

• John Baez says:

Yes, I made the mistake too at one point.

I hadn’t noticed about the ‘post-translational changes’. I told Craciun about selenocysteine, an unusual amino acid that contains selenium. It’s only used in certain organisms, and it’s coded for by a codon that’s normally one of the stop codons. It would be really cool if the peptides found in one of these other organisms showed oscillations with a notably different periodicity.

Although it is found in the three domains of life, it is not universal in all organisms. Unlike other amino acids present in biological proteins, selenocysteine is not coded for directly in the genetic code. Instead, it is encoded in a special way by a UGA codon, which is normally a stop codon. Such a mechanism is called translational recoding and its efficiency depends on the selenoprotein being synthesized and on translation initiation factors. When cells are grown in the absence of selenium, translation of selenoproteins terminates at the UGA codon, resulting in a truncated, nonfunctional enzyme. The UGA codon is made to encode selenocysteine by the presence of a selenocysteine insertion sequence (SECIS) in the mRNA. The SECIS element is defined by characteristic nucleotide sequences and secondary structure base-pairing patterns. In bacteria, the SECIS element is typically located immediately following the UGA codon within the reading frame for the selenoprotein. In Archaea and in eukaryotes, the SECIS element is in the 3′ untranslated region (3′ UTR) of the mRNA, and can direct multiple UGA codons to encode selenocysteine residues.

3. John Baez says:

Here’s another cool puzzle.

Greg Egan worked out the roots of 200 polynomials, where in each case he randomly chose 20 masses between 50 and 200, and threw out any repeated masses. They look like this:

Compared to the pictures in The Beauty of Roots, this looks pretty chaotic and boring except for the repulsion of the real axis, the strip of positive real roots with large magnitude, and the ‘bands’ or ‘tendrils’ of complex roots with large positive real part and small imaginary part.

But, these few interesting visible features are connected to the peptide puzzle! The roots with largest magnitude contribute most to the asymptotic growth rate of $F(N),$ and the complex roots with largest magnitude—the tips of the ‘tendrils’—will give the visible oscillations. So, let’s zoom in on those:

Puzzle 3. Suppose we randomly chose 20 natural numbers

$50 < m_1 < \cdots m_{20} < 200$

and work out the roots of the polynomial

$x^{m_{20}} = x^{m_{20} - m_1} + x^{m_{20} - m_2} + \cdots + 1$

What, on average, is the imaginary of the complex roots with largest real part? And what is its standard deviation? I’m hoping that this quantity, say $d,$ has a small standard deviation and mean $\langle d \rangle$ with

$2\pi/\langle d \rangle \; \approx \; 14.28$

If that’s true, the periodicity of the peptide masses is a robust phenomenon, largely independent of the precise masses of the peptides!

Puzzle 4. How do things change as we change the numbers 20, 50 and 200 in the above problem?

• Greg Egan says:

I think you need to pick the complex root with the largest magnitude, rather than the largest real part; if not, the value is a long way off.

Using 10,000 random samples, chosen in the same way as I used for the plot (pick 20 numbers between 50 and 200 inclusive, and discard any repetition – so there might be less than 20 numbers after that step), I get:

$\langle d \rangle \; \approx \; 0.630314$

Standard deviation of $d \; \approx \; 0.32496$

$2\pi/\langle d \rangle \; \approx \; 9.96835$

• Greg Egan says:

If I change the upper limit on the mass from 200 to 500, while keeping the lower limit at 50 and the number of masses at 20, that doesn’t alter the average of the periods very much. For 1,000 random samples, I get:

$\langle d \rangle \; \approx \; 0.651247$

with a standard deviation of 0.305363, and:

$2\pi/\langle d \rangle \; \approx \; 9.64792$

If I go back to the original mass limits of 50 and 200, but double the number of masses from 20 to 40, then for 1,000 random samples, I get:

$\langle d \rangle \; \approx \; 0.566247$

with a standard deviation of 0.354997, and:

$2\pi/\langle d \rangle \; \approx \; 11.0962$

• John Baez says:

Thanks for the calculations!

I think you need to pick the complex root with the largest magnitude, rather than the largest real part; if not, the value is a long way off.

The problem is, if the real part of a root is small, that root gives oscillations that don’t grow very fast, or even decay. That wouldn’t explain the dramatic growing oscillations that we see. What we see is likely to be dominated by the roots with the largest real parts.

More precisely, we have

$F(N) = a_1 x_1^N + \cdots a_k x_k^N$

where $x_1, \dots, x_k$ are the roots and $a_k$ are some coefficients. The asymptotic behavior of $F(N)$ as $N \to \infty$ is dominated by the roots with $a_k \ne 0$ for which $\mathrm{Re}(x_k)$ is the largest. So, roots with $a_k = 0$ would throw off my simplified claim above.

• Greg Egan says:

OK. So maybe I need to calculate something more elaborate.

Just to be clear, for the specific set of roots given in the diagram from the paper, the two red diamonds that Craciun associates with the period are the non-real roots with the largest magnitude. There are other non-real roots with larger real parts (the two small diamonds with very small phases, around plus or minus 3 degrees, and magnitude around 1.014).

• John Baez says:

Hmm. I’m finding this whole phase-magnitude way of plotting a bit confusing. What matters in my mind (and I could be wrong!) are the real and imaginary parts of the roots $x_i,$ since these control the growth rate and frequency of the corresponding functions $x_i^N$.

So, I can imagine asking you to draw a picture in terms of real and imaginary parts. But even more useful might be these: a picture of Craciun’s specific roots overlaid on your picture of roots of randomly chosen polynomials. That would let us see how special the biological example is—or not.

• Greg Egan says:

Surely $x^N = \exp(N \log(x))$ has its growth rate controlled by the real part of $\log(x)$, which depends on the magnitude of $x$, rather than the real part of $x$ itself?

• John Baez says:

Now we’re even on dumb comments for the week. Thanks for straightening me out!

• John Baez says:

Greg drew a phase/magnitude plot of the roots for 1000 random examples (where in each case he randomly chose 20 masses between 50 and 200, and throw out any repeated masses), together with the roots for the polynomial he got from applying the same construction to the 20 amino acid masses (which include some repeated masses), shown in red:

Here’s a closeup of the most important part:

Unfortunately for my dreams, the complex roots that control the dominant oscillations in the peptide mass distribution do not lie on the “first tendril”, so these biological oscillations are not “controlled by math” in a simple way. They seem to lie near the edge of the “fourth tendril”—and the fourth tendril, unlike the first three, is barely visible.

So, the oscillations we see in nature are slower than they you might get with randomly chosen amino acid masses.

(There should be some fairly simple explanation of these tendrils, but I don’t see it.)

• John Baez says:

I’ve corrected my confusion about real part vs. magnitude and imaginary part vs. phase in the main post and some comments. But here I should really issue a separate, corrected, version of Puzzles 3 and 4. If we’re trying to understand the period of the oscillations in the function

$x^N$

we need to look, not at the imaginary part of the root $x,$ but its phase, or argument. If

$x = |x| e^{i \theta}$

then

$x^N = |x|^n e^{i N \theta}$

so the period of the oscillations in this function is $2\pi \theta.$ So:

Puzzle 3′. Suppose we randomly chose 20 natural numbers

$50 < m_1 < \cdots m_{20} < 200$

and work out the roots of the polynomial

$x^{m_{20}} = x^{m_{20} - m_1} + x^{m_{20} - m_2} + \cdots + 1$

What, on average, is the phase of the complex roots with largest magnitude? And what is its standard deviation? I’m hoping that this quantity, say $\theta,$ has a small standard deviation and mean $\langle \theta \rangle$ with

$2\pi/\langle \theta \rangle \; \approx \; 14.28$

If that’s true, the periodicity of the peptide masses is a robust phenomenon, largely independent of the precise masses of the peptides!

(I’m not at all optimistic that this is true, now that Greg has prepared an image comparing the average situation to the situation that arises with actual peptides. But this at least is the correct way of asking the question!)

4. Graham Jones says:

By the way, the 20 amino acids we commonly see in biology have masses ranging between 57 and 186. It’s not really true that all their masses are different. Here are their masses:
57, 71, 87, 97, 99, 101, 103, 113, 113, 114, 115, 128, 128, 129, 131, 137, 147, 156, 163, 186

When amino acids join to make peptides, a water molecule is lost at each bond, so for peptide made from n amino acids, subtract (n-1)*18.

• John Baez says:

Hmm, that’s important! I wonder if Hubler and Craciun discuss that in their paper. It’s a big deal, yet easy to deal with. You can just subtract 18 from the mass of each peptide, then do all the math I described to work out a formula for $F(N),$ then add 18. The last step would not affect the period of $F(N),$ so if you only care about the period, you can skip it.

5. 14 is the rounded atomic mass of a CH2 unit. You’re overthinking this….

• John Baez says:

The paper mentions that. However, they say the fit of 14.28 is noticeably better, and that’s what they get from their more detailed analysis.

Of course, the two explanations are not necessarily mutually exclusive.

And of course, amino acids involve a lot more than CH2’s so it’s not obvious that CH2’s explain the truly dramatic oscillations that we see! One would need to give a mathematical argument to prove this.

• Greg Egan says:

I think the paper’s comments about the structural subunits of the amino acids explain both the fact that the period is close to but not exactly 14, and also the fact that the period is distinctly non-generic (i.e. quite different from what random masses in the same range produce):

We should also point out that a period around 14 Da is reasonable since the amino acid residues can all be constructed (with rearrangement) from five groupings of elements, a kind of elemental basis: C, CH2, NH, O, and S, with nominal masses of 12, 14, 15, 16, and 32, respectively. Since S is only in two of the twenty amino acids, it is reasonable to suppose that the other four constituent components would have an effect in a periodic pattern arising from objects made up of this basis.

• Greg Egan says:

Hmm, I guess I should add that the mere fact that all the amino acid masses can be expressed as sums of elements of the set {12, 14, 15, 16, 32} is actually completely non-informative, because this is true of every integer from 50 to 200!

So my previous comment was a bit of a stretch. There must be something else about the composition of amino acids that makes the masses non-generic … and of course there are probably fairly simple reasons to do with valences why you can’t stick any old numbers of these units (C, CH2, NH, O, and S) together, let alone stick them together and get something that counts as an amino acid.

• John Baez says:

About this magic number 14: if it’s really important that CH2 has mass 14, then this might show up in lots of amino acids having masses whose differences are multiples of 14. If I were a biochemist I’d just know whether lots of amino acids differ by one or more CH2’s, but since I’m a mathematician I imagine taking the amino acid masses

57, 71, 87, 97, 99, 101, 103, 113, 113, 114, 115, 128, 128, 129, 131, 137, 147, 156, 163, 186

computing their differences, and seeing if lots differ by multiples of 14. Let’s see:

71-57 = 14
113-99 = 14
115-101 = 14
129-115 = 14

That seems like a lot. I’m too lazy to look for differences that equal 28 or other multiples of 14.

One thing we could do is take 20 numbers between 50 and 100 with a lot that differ by 14, and generate a polynomial from those numbers in the usual way, and look at its roots. Claim: the complex root with largest magnitude has phase $\theta$ such that

$2 \pi / \theta \approx 14$

• Graham Jones says:

Multiples of 14.28 are:
57.12 71.40 85.68 99.96 114.24 128.52 142.80 157.08 171.36 185.64
That’s a good match for most of the amino acid masses, until you subtract 18 from them.

6. It’s so weird in how subtle ways combinatorics interact with our lives.

7. Philip Gibbs says:

The distribution of the masses of the amino acids might not be well modeled by a flat distribution between 50 and 200. It looks more like a normal distribution centred on about 115 with a small standard deviation (about 20?) Would that make a significant difference?

• John Baez says:

Good question! This might tend to push the largest-magnitude complex root onto the fourth tendril, which is what we see in nature. But I have no intuition for whether that’s true, because I don’t understand those tendrils.

• Philip Gibbs says:

It might be worth solving the case where the amino acid masses are all evenly separated e.g. 48,56,64,…,200. In that case the polynomial takes a simplified form if multiplied by $x^8-1$ with just four terms, and it is a polynomial in $x^8.$ Perhaps that would give a distribution with clear tendrils.

• Philip Gibbs says:

Actually more like mean = 118, stdev = 31, but the mean should not affect the period right?

• Graham Jones says:

The mean can affect the strength of the periodicity, From 48,56,64,…,200 you can only make multiples of 8, but from 51,59,67,… 203, you can make much more.

• John Baez says:

Let’s look at how having many similarly spaced amino acids affects the mass distribution of peptides. Suppose we have numbers

$a , a + b, a + 2b , \dots, a + k b$

How many ways are there to write $N$ as an ordered sum of these?

I’m too busy to dive in and work that out, so let’s start out with an easy example and hope that helps us with harder ones. Suppose we have numbers

$1, 2, \dots, k$

Let $F(N)$ be the number of ordered sums of these that add up to $N$.

Then

$F(N) = F(N-1) + \cdots + F(N-k)$

so if we take as our ansatz

$F(N) = x^N$

we get

$x^N = x^{N-1} + \cdots + x^{N - k}$

or

$x^k = x^{k-1} + \cdots x + 1$

But

$\displaystyle{ x^{k-1} + \cdots + x + 1 = \frac{x^k - 1}{x - 1} }$

so our equation becomes

$\displaystyle{ x^k = \frac{x^k - 1}{x - 1} }$

or

$x^k (x - 1) = x^k - 1$

What are the solutions like, especially those with the largest magnitude?

Hmm, I’m too busy to even tackle this in detail right now! But I’ll leave this here and maybe return to it later. It could be fun to study problems of this sort.

(For $k = 5$, the solution with greatest magnitude seems to be $x \approx 1.96$. For $k = 10,$ it seems to be $x \approx 1.999.$ So for large $k$ this root seems to be approaching 2. But this isn’t the main thing I’m interested in.)

• Scott Hotton says:

Following these calculations where $a$ and $b$ are not 1, I get for the characteristic polynomial

$(x^b)^k (x^b -1 ) - ((x^b)^k - 1)$

The $a$ drops out so unless I am mistaken the loss of water molecules in peptide bond formation does not matter when the masses are equally spaced.

The polynomial is zero if $x^b = 1$. In the $b=14$ case the fourteenth roots of unity are roots of the characteristic polynomial. And one of the fourteenth roots of unity is

$\exp(i 2\pi/14)$

Numerically I find that for $k=19$ the root of

$x^k (x -1 ) - (x^k - 1)$

with the largest magnitude is nearly 2. The positive fourteenth root of 2 is about 1.05. So

$1.05 \exp(i 2\pi/14)$

approximates a root of

$(x^{14})^{19} (x^{14} -1 ) - ((x^{14})^{19} - 1)$

and it ties for having the maximum magnitude of all the roots.

• Greg Egan says:

I think the equation for generic $a,b$ is:

$\displaystyle{ x^{a+(k-1)b} = \frac{x^{k b} - 1}{x^b - 1}}$

or equivalently:

$\displaystyle{ x^{a+(k-1)b} (x^b - 1) - (x^{k b} - 1) = 0}$

Of course if $x$ is a bth root of unity, it will still be a solution regardless of $a$.

I think $a$ does affect the other solutions, though.

• Scott Hotton says:

You’re right, the equation is

$x^{a+(k-1)b} = \frac{x^{k b}-1}{x^b-1}$

Not all of the $a$‘s cancel as I thought.

• Greg Egan says:

Now that I check more carefully, it’s not safe to multiply through by $x^b - 1$ when $x$ is a bth root of unity, and in fact those roots don’t solve the equation.

But my numerical calculations seem to agree with the idea that the largest-magnitude roots are multiples of the bth roots of unity.

• Greg Egan says:

When $a$ is divisible by $b$ (as in the example $a=56, b=14$ that I described in another comment), then you get $b$ solutions that are some constant real multiple of all the bth roots of unity. But when $a$ is not divisible by $b$, things aren’t so symmetrical.

• Scott Hotton says:

Using $a = 56, b=14, k=20$ and setting $y = x^b$ I got

$y^{23} (y - 1) - (y^{20}-1) = 0$

It looks like the root with the largest magnitude is

$y \approx 1.37995$

This would make the magnitude of $x$ about 1.0233 which looks close to the magnitude of the red diamonds in Hubler and Craciuns’ figure. Perhaps too close given that the amino acid masses are not equally spaced.

• Greg Egan says:

I agree that for $a=56,$ $b=14,$ $k=20$ there’s a set of 14 roots that all have magnitudes of about 1.0233.

For the real data, I calculate a magnitude for the largest complex root of 1.02552. I don’t know to what extent that’s just a coincidence, or whether there’s some correlation here that doesn’t depend so much on the details of the mass set as the fact that the periods are very close.

8. Greg Egan says:

If you use masses that lie in a perfect arithmetic sequence, I don’t think you get periodic behaviour in $F(N)$.

The first plot in the image above shows what would be the period from the largest-magnitude complex root for the series of masses with $a=1, b=1$, but the next plot shows $F(N)$ itself for the series with 5 masses, and like the Fibonacci numbers it doesn’t oscillate.

The third and fourth plots show what you get with a perfect arithmetic sequence with $a=56, b=14$, and what happens if you alternately add and subtract 1 from the masses.

So this seems to suggest that exact spacing gives non-generic (and perhaps always non-oscillatory) results, but adding some variation to an exactly spaced set of masses can give an oscillation that comes close to the original spacing.

• Greg Egan says:

For the exact series $a=1, b=1$, the largest magnitude of any non-real root seems to approach 1 asymptotically from below as you add more masses, while as JB noted the real root seems to approach 2 from below. So for large N, the exponential behaviour will always dominate.

For the exact series $a=56, b=14$, it turns out that there are 14 roots with magnitudes exactly equal to the magnitude of the largest real root (which is greater than 1). So that plot where I showed the period associated with the largest complex root seeming to jump back and forth between different values was an artifact of trying to pick just one root, when the “dominant period” in this case is actually multi-valued.

And in fact these roots all lie equally spaced around a circle in the complex plane. So one of them does correspond to a period of exactly 14, but there are equal-magnitude roots with periods 14/2=7, 14/3, 14/4=7/2, etc.

I don’t know what actually happens with $F(N)$ in this case, because I haven’t been able to obtain enough values by combinatorial methods to solve for the coefficients of the terms associated with all the roots.

However, for the case where you perturb the exact series by adding or subtracting 1 from the masses, I can solve for everything, and you do get clear periodic behaviour.

• John Baez says:

Thanks! By the way, my original claim that the Fibonacci numbers show no oscillation was based on my silly confusion about $x^N$ versus $\exp(N x).$ If $x$ is negative then the former oscillates even though the latter does not. The former is what matters for our problem. For the Fibonacci numbers the root

$x = 1 - \Phi \approx -0.6180339$

is negative so we do get an oscillatory effect:

$\displaystyle{ F(N) = \frac{1}{\sqrt{5}} \left( \Phi^{N+1} - (1-\Phi)^{N+1} \right) }$

However, the oscillations are exponentially damped since $|1 - \Phi| < 1.$ The oscillations can only be seen if we subtract them off, leaving numbers

$\displaystyle{\frac{1}{\sqrt{5}} \Phi^{N+1} }$

which go like this:

0.723
1.171
1.894
3.065
4.960
8.025

The Fibonacci numbers are alternately larger and smaller than these, so we can say they oscillate, in a sense!

However, these oscillations are vastly milder than the growing oscillations we see when there’s a negative or complex $x$ with $|x| > 1.$

(I imagine you know all this stuff, but other people may not have thought about it very much, and I need to correct any lingering confusion I sowed.)

9. Philip Gibbs says:

Greg Egan’s analysis above may suggest that smooth distributions don’t capture the right effect, but let’s continue with that for now.

Suppose the amino acid masses $m$ have a smooth distribution $\rho(m-\mu)$ about some mean value $\mu$ that we take to be an integer. Rewrite the equation for the roots as

$x^{\mu} = x^{\mu - m_1} + ... +x^{\mu - m_A}$

where $A = 20$ is the number of amino acids.

If $x$ is near one the smooth distribution can be used to write this as an integral.

$x^{\mu} = A \int \rho(m-\mu) x^{\mu-m} dm$

$x^{\mu} = A \int \rho(m') x^{-m'} dm'$

Take $x = \exp{\Delta}$ with $\Delta$ small

$\exp{\Delta \mu} = A \int \rho(m') \exp{-\Delta m'} dm'$

So you get an equation to solve involving a Laplace transform.

• Philip Gibbs says:

I want to try this for a normal distribution rather than a flat distribution because the result is more tractable and because I think it models the actual numbers better.

$\rho(m) = \frac{1}{\sqrt{2\pi} \sigma} \exp(-\frac{m^2}{2\sigma} )$

$\exp(\mu \Delta) = \frac{A}{\sqrt{2\pi} \sigma} \int \exp(-\frac{m^2}{2\sigma} - m \Delta) dm$

The integral can be done giving

$\exp(\mu \Delta) = A \exp(-\frac{\sigma^2 \Delta^2}{2})$

This reduces to a series of quadratic equations to solve for $\Delta:$

$\frac{\sigma^2 \Delta^2}{2} + \mu \Delta + 2\pi n i - \ln(A) = 0$

for integers $n$

In the real case $A = 20$, $\mu = 118$ and $\sigma = 31$

• Philip Gibbs says:

The real roots ($n = 0$) are
$\Delta = -0.26877$ or $\Delta = 0.023196$

Taking the second dominant root the growth factor is

$x = \exp(\Delta) = 1.0234671$

I think that is quite close.

• Philip Gibbs says:

Taking the dominant root for $n = 1$

$\Delta = 0.029388 - 0.042964 i$

This has a larger real part than the real root and an imaginary part that is far too small. So some random element might be needed here too, and that is too hard to do analytically.

10. Greg Egan says:

Suppose you start with 20 masses in an arithmetic series, and then randomly add or subtract 1, or do nothing, to each mass. The periods implied by the largest complex root, for 200 random trials of that process, are shown in the histograms above. In each case, the arithmetic series starts at $a=150$, and the trials were done with $b=14, 22, 34$.

The results for the first two cases are tightly clustered a bit below the original spacing of the masses. But for reasons I don’t understand, things go weird for the third case, where $b=34$. I’m using very high precision arithmetic (1000 digits), so it’s unlikely that the numerical approximation is misidentifying the correct root, and if there are multiple complex roots that share the same magnitude, I pick the one with the smallest argument, which ought to yield the highest possible period in each case. But for $b=34$, none of the mass sets yield a period greater than 16.85.

Of course, these periods are all based on the assumption that these particular roots will have non-zero coefficients in the final formula for $F(N)$. But generically, you’d expect that to be true.

Anyway, it does seem clear that there are cases where the dominant period in $F(N)$ will be set by the spacing between the masses.

These examples don’t readily translate to the amino acid masses, though, as the spacings between consecutive masses for them is: 14,16,10,2,2,2,10,0,1,1,13,0,1,2,6,10,9,7,23.

11. Greg Egan says:

If you start out with masses in a perfect arithmetic sequence, and if the starting value of the sequence, $a$, is divisible by the increment, $b$, then the characteristic polynomial will have $b$ roots that are all of equal magnitude, and are all multiples of bth roots of unity. That’s what I see numerically, and I think Scott Hotton has shown that analytically.

If you perturb the sequence by adding or subtracting 1 to the masses at random, you break that perfect symmetry. What seems to happen at first is that the positive real root becomes the largest, and the second-largest roots are a conjugate pair with phases of approximately $\pm 2\pi / b$.

But as you keep perturbing the masses, the $b$ roots that were originally arranged symmetrically on a perfect circle move around, and if you make enough changes, eventually you can “dethrone” the conjugate pair with the smallest phase, and one of the other pairs of roots becomes larger.

In the root plots below, the largest root(s) are shown in red, and the second-largest are shown in blue. Eventually, for both $b=14$ and $b=34$, the lowest-phase roots get beaten by some other pair of roots, and the phase is abruptly multiplied by some (roughly integer) factor.

I’m still not entirely sure why the randomly perturbed cases for $b=34$ that I plotted in the histogram in an earlier comment never had the lowest-phase roots being the largest, but it might be due to the relationship between $a$ and $b$; you’re only starting from a perfect circle when $a$ is divisible by $b$, and how much things are distorted by non-divisibility must depend on the particular numbers.

12. John Baez says:

On the basis of what Greg has done, I’m willing to guess that if we have a list of integer masses $0 < m_1 \le \cdots \le m_k$ and a large fraction of them have differences equal to some number $\Delta,$ and we plot a histogram of how many ways each natural number $N$ can be written as an ordered sum of these masses, we’ll usually get oscillations with period close to $\Delta.$ But sometimes funny things happen!

It might be fun, Greg, to take one of the test cases you examined and actually work out $F(N),$ the number of ways $N$ can be written as an ordered sum of the masses on one of your lists.

$F(N) = \sum_i a_i x_i^N$

and trying to figure out the coefficient $a_i$ for each root $x_i$ of the relevant polynomial, a less tiring approach might be to actually generate all the ordered sums up to some number $N_{\textrm{max}}$ and then count how many of these sums give $N$ for each $N \le N_{\mathrm{max}}.$ (This could be tiring for your computer, but I hope not for you.)

I’m wondering, for example, if the second-largest roots always manifest themselves as a noticeable oscillatory pattern in $F(N).$ Conceivably they might not, if the coefficient $a_i$ of $x_i^N$ for those roots $x_i$ was very small. Unless that coefficient were zero, as $N \to \infty$ those terms $a_i x_i^N$ would eventually beat the terms coming from other complex roots… but by then it might be completely overshadowed by the term coming from the real root.

A case where another pair of complex roots dethrones the lowest-phase pair would be fun to see. I don’t get what’s really happening there.

13. John Baez says:

By coincidence yesterday I ran into 3 undergrad math majors at UCR who did a project on numerical semigroups. A numerical semigroup is just a set of natural numbers that’s closed under addition and has finite complement. (If it doesn’t have finite complement, every number in it must be a multiple of some fixed natural number $k > 1,$ so we can divide everything in it by $k$ and simplify things.)

This is relevant because if we take any set of natural numbers

$m_1, \dots, m_k$

and keep adding them to our hearts content, the numbers we get form a numerical semigroup, at least if all the $m_i$ aren’t divisible by some number $k > 1.$

The Frobenius number of a numerical semigroup is the largest integer that’s not in it.

A numerical semigroup is called irreducible if it’s not the intersection of two numerical semigroups that properly contain it.

We can ask: how many numerical semigroups are there with Frobenius number $n$? And this turns out to grow in an interesting way, which has bumps with period 6:

(Click to enlarge.) This picture is from the students’ paper, where they prove the lower and upper bound shown:

• Clarisse Bonnand, Reid Booth, Carina Kaainoa, and Ethan Rooke, Bounds on the number of irreducible semigroups of fixed Frobenius number.

Needless to say, this caught my eye both because the peptide problem is related and because it shows periodic bumps!

14. Greg Egan says:

Here’s a simple example of period-doubling that I was able to solve completely:

• Greg Egan says:

To add a bit more detail about what happens at the level of the roots: the blue curve has a dominant complex root of magnitude 1.10551 and period 5.50192, while the orange curve has a dominant complex root of magnitude 1.08076 and period 2.2945; it also has a root of magnitude 1.05422 and period 5.4668. Both have non-zero coefficients in the solution, and in fact the larger-magnitude root (with the shorter period) also has a larger coefficient. That surprised me, because it seems to take a while to clearly dominate the oscillations, but I guess at low values there are always going to be complicated things going on with all the other roots.

• John Baez says:

Wow, great! I was confused about ‘period-doubling’ versus ‘frequency-doubling’. but I think I’ve got it straightened out now. Period-doubling is of course a famous thing in dynamical systems, but frequency-doubling, or in other words period-halving, also shows up, e.g. when you blow hard on a flute and the note jumps up an octave.

Of course you might argue it’s all a matter of convention, but the question is whether the ‘unexpected’ situation has period twice or half the ‘expected’ period. And here it’s half… which I find quite remarkable from one point of view.

You take the numbers 6, 11, 16, 21, 26, each of which is 5 more than the last, and you count how many ways you can get $N$ as an ordered sum of these. Unsurprisingly, perhaps, the count oscillates with period 5… well, okay, actually 5.50192.

Then you change the number 6 to 7, and repeat the count, and now the count oscillates with period 5/2! Well, okay, actually 2.2945. But the period has halved, roughly.

If you’d asked me before you started doing all these calculations, I might have guessed that the period could double, because it might take more time for something to come into alignment. (Not a very clear reason.)

But of course once you showed that the complex root with the smallest phase could be ‘dethroned’ by one with roughly twice that phase, it became clear that this funny phenomenon should actually cause ‘frequency-doubling’, hence ‘period-halving’.

But I had trouble believing that until I saw an example! I still have no intuition for why the counts start wiggling twice as fast when you change the 6 to a 7.

Anyway, it’s great.

I also find it cute that while the orange curve goes negative, it has the good sense not to do so at integer values.

And I also find it interesting how the “period 5… well, okay, actually 5.50192” mimics the story in the biological case, where an easy argument suggests that the count should oscillate with period 14, but in fact the period is 14.28.

• Greg Egan says:

I knew as soon as I’d posted that I should have said “period-halving” rather than “period-doubling”, but by then it was too late.

• John Baez says:

It’s okay, I’d been pondering period-doubling versus period-halving in this problem so you gave me an excuse to vent!

I’m still trying to figure out how to pose the question “what’s really going on in this period-halving effect?” in a useful way. I think it’s a wonderful and curious thing.

• John Baez says:

Here’s one idea. One can imagine wondering not just how many ways there are to write $N$ as an ordered sum of numbers $m_1, \dots, m_k,$ but also more about how we can write it as a sum. Could it be that then the period halves, we get pulses of two different kinds, in which some average characteristic of the ordered sums flips and flops?

For example, does the average fraction of $m_i$s appearing in the sum stay roughly constant, or does it vary significantly with $n$? This is just the simplest characteristic I can think of.

(Even if it stays constant, what’s the constant? If I take a number $N$ and write it as an ordered sum of numbers $m_1, \dots, m_k$ in all possible ways, do each of these numbers tend on average to show up in equal proportion, or do smaller ones show up more? If we were talking about unordered sums, I should be able to answer this using standard techniques of statistical mechanics, sort of like how we figure out the energy distribution of atoms in a gas.)

Guesses aside, ultimately this question is about how the expression

$F(N) = \sum_j a_j x_j^N$

‘knows’ how many ways we can write $N$ as an ordered sum of numbers $m_1, \dots, m_k,$ or in other words, how the roots $x_j$ and coefficients $a_j$ encode what is going on. Somehow they encode everything about what’s going on. But the meaning of the individual oscillating functions $x_j^N$ is obscure, since each one is affected by all the numbers $m_i.$ It’s a bit like a Fourier transform or holograph.

More concretely, suppose we’re trying to write $N$ as an ordered sum of the numbers 7, 11, 16, 21, 26. Then we can write $N+5$ as an ordered sum in a bunch of ways. Many of these ways come from taking an 11 in a sum that gives $N$ and replacing it with a 16, or taking a 16 and replacing it with a 21, or taking a 21 and replacing it with a 26. So if $N$ tended to have a lot of 11’s in it, maybe $N + 5$ will tend to have a lot of 16’s in it, and so on.

I guess it’s again wise to start with the very simple cases, where the $m_i$ are in an arithmetic progression.

15. Graham Jones says:

I had a look at Hubler and Craciun’s paper, and at a few other things. I learned:

The masses 57, 71, 87, 97, 99, 101, 103, 113, 113, 114, 115, 128, 128, 129, 131, 137, 147, 156, 163, 186 are for residues, ie, without a H2O. So the 18s have already been subtracted.

The first graph in the blog post is not “the number of peptides of different masses found in various organisms”. Instead, they are the result of splitting up (digesting) a single protein (Lys-C or Trypsin) into peptides. Both Lys-C and Trypsin are proteases, i.e., proteins which digest proteins. They split at certain residues, which means that each peptide they make will begin (or end) with one of a few residues.

I think (from the shape of the graph) that these semi-digested proteins contain few very short peptides, and this is important for periodicity. If short ones (say <6 residues) are very rare, peptides with masses in the range 300-500 will be dominated by the residues with small mass like 57, 71, 87, 97, 99, 101, 103.

I am dubious about F(N) being used as a model. There is no process that I know of in the cell, or in the in vitro digestion, that cares about mass. On the other hand, I reckon that if you get a lot of Trypsin and wait for it to self-destruct, there will be a particular distribution of peptide lengths. And so I think a better approach (if anyone cares about the biochemistry) would look at “what masses can you make with r residues?” for various r. Convolutions, Fourier transforms, etc.

• John Baez says:

Thanks!

The masses 57, 71, 87, 97, 99, 101, 103, 113, 113, 114, 115, 128, 128, 129, 131, 137, 147, 156, 163, 186 are for residues, i.e, without a H2O. So the 18s have already been subtracted.

Yes, Craciun mentioned this to me in email, saying:

This is exactly what we do (and is what everybody does — it’s the standard approach in mass spectrometry).

The amino acid masses after you subtract 18 from their “molecular mass” are called “(mono-isotopic) residue masses”. For example, when you look at the mass of the smallest amino acid, Glycine, in various databases you see 57Da (which is its residue mass), and not 75Da (which is its molecular mass).

For example, at http://www.matrixscience.com/help/aa_help.html they list the masses of amino acid residues, i.e., without the H at the N-terminus and OH at the C-terminus, since these are not part of the “building block” structure of the amino acid.

When we add together such amino acid masses we obtain a total residue mass for each peptide, but, as you pointed out, if you want you can just add 18 to it to obtain the molecular mass of that peptide (like they do here http://www.peptidesynthetics.co.uk/tools/ ).

I am dubious about $F(N)$ being used as a model. There is no process that I know of in the cell, or in the in vitro digestion, that cares about mass. On the other hand, I reckon that if you get a lot of Trypsin and wait for it to self-destruct, there will be a particular distribution of peptide lengths. And so I think a better approach (if anyone cares about the biochemistry) would look at “what masses can you make with r residues?” for various r. Convolutions, Fourier transforms, etc.

Interesting! But I don’t quite see what mathematical model you might be proposing here.

While the biology is exciting, unfortunately the math problems raised by $F(N)$ are so fundamental and so gripping that I’m perfectly happy thinking about those, for a while at least. Of course there’s nothing mathematically profound about how many ways you can write $N$ as an ordered sum of the numbers 57, 71, 87, 97, 99, 101, 103, 113, 113, 114, 115, 128, 128, 129, 131, 137, 147, 156, 163, and 186. But the generalized question, where we consider arbitrary collections of numbers, feels important, and fun. So it’s probably been studied already. So I should poke into the literature.

By the way, these problems are also basically problems of convolutions and Fourier transforms, even though we’re not using those words a lot. $F(N)$ is obtained by iterated convolution—that’s what the recurrence relation says—and that lets express it as a linear combination of exponentials, which is a kind of Fourier transform adapted to exponentially growing functions.

• Graham Jones says:

Interesting! But I don’t quite see what mathematical model you might be proposing here.

If there is a distribution p(r) on the lengths of the peptides, and w is the distribution of residue masses, and $w^{*r}$ is the r-fold convolution of w, then I mean to replace F with $\sum_r p(r) w^{*r}$.

• John Baez says:

Okay, now I get your idea. Just to compare using the same notation, our $F$ is given by

$\displaystyle{ F(n) = \sum_{r = 1}^\infty w^{* r}(n) }$

We didn’t write it that way, but that’s what it is. So, you’re suggesting modifying it by weighting the sum by a probability distribution $p(r)$:

$\displaystyle{ F_{\textrm{Jones}}(n) = \sum_{r = 1}^\infty p(r) w^{* r}(n) }$

As long as $p(r)$ is constant for $r \le R,$ $F_{\textrm{Jones}}(n)$ will be proportional to $F(n)$ for $n \le 57R,$ since 57 is the residue mass of the lightest amino acid.

Thus, I don’t feel very bad about using our original model—at least until someone proposes a specific choice of $p.$

Also, unless $p$ oscillates, I have trouble seeing how it could create dramatic oscillations in $F_{\textrm{Jones}}(n)$ if there weren’t already such oscillations in the original $F(n).$ Since the original puzzle was to explain the observed oscillations in empirical data, I think it’s good that we can see them, and understand their appearance, in the original $F(n).$

In short, to the extent to which your model is the ‘real thing’, I think the original simplified model is a useful first step toward the real thing.

• Graham Jones says:

I think your $F$ is $F(n) = \sum_{r=0}^\infty (20^r) w^{*r} (n)$, assuming we are using 20 masses.

Perhaps I wasn’t clear. I meant by “w is the distribution of residue masses” to imply that $\sum_{r=0}^\infty w(n) = 1$, and similarly for p. $F_{Jones}$ is supposed to be a probability distribution.

• John Baez says:

Okay, in my comment I used $w$ to mean the function on the integers given by the sum of Kronecker deltas supported at the residue masses of the amino acids. So, it was 20 times your $w.$

16. arch1 says:

I’m just skimming so this might have been answered in earlier comments: Is it clear yet what underlies the apparent structure in John’s initial phase/magnitude scatterplot?

I think I see in that plot some right-facing circular or elliptical arcs, and maybe some left-facing parabolic or hyperbolic ones.

Of course one possible answer is ‘the viewer’s brain’ but I guess there are statistical tests that could help w/ that.

• Philip Gibbs says:

I think it is interesting to see what happens when all the masses are divided by 14.28

57/14.28 = 3.99159
71/14.28 = 4.97198
87/14.28 = 6.09243
97/14.28 = 6.79271
99/14.28 = 6.93277
101/14.28 = 7.07282
103/14.28 = 7.21288
113/14.28 = 7.91316
114/14.28 = 7.98319
115/14.28 = 8.05322
128/14.28 = 8.96358
129/14.28 = 9.03361
131/14.28 = 9.17366
137/14,28 = 9.59383
147/14.28 = 10.29411
156/14.28 = 10.92436
163/14.28 = 11.41456
186/14.28 = 13.02531

What you notice is that most of them are close to integers with a few exceptions at the heavier end. In particular the lightest two are very close to integers. This is what is responsible for the oscillations. You can see from the recurrence relation why that would be the case. Greg Egan’s experiments with equally spaced masses also shows it. In a growing sequence the lightest masses control the dominant terms in the recurrence relation.

Is there a biochemical reason why the masses should be close to multiples of 14.28? I can see that CH_2 and N both have masses of 14 and may dominate the chemistry, O is 16 while P and S are 31 and 32 which are a little over twice the period, pushing the average a little higher. An extra H on the end of a chain has a similar effect.

I wish these oscillations would turn out to have some significance for life. That would be very profound. However as someone said, masses don’t have a big effect on chemistry.

• John Baez says:

Do you mean this picture by Hubler and Craciun?

I don’t think we know explanations of the patterns you’re alluding to, or even whether those patterns are ‘real things’ (i.e., have interesting explanations).

• arch1 says:

Yes, that is the diagram I asked about. I agree that the explanation may well be “it’s all in your head”. Also that (as Philip observes) they are less impressive without the reflection symmetry.

• Philip Gibbs says:

I think there is still some possibility that the arcs are a real effect. If the masses were all exact multiples of some integer then we would get some straight vertical lines. So if they are mostly close to multiples of some number there is a possibility that the lines could be distorted into arcs. I just can’t see that idea quite matching up with what we see. Perhaps it is more complicated.

• Philip Gibbs says:

To give a little better idea of what I mean, here is an analysis of how you could get some arcs under the right conditions

Suppose we have a case where all the masses are near multiples of some number $P$ I.e. $m_i = n_i P + r_i$ where the multipliers $n_i$ are positive integers and the remainders $r_i$ are small, e.g. between -0.5 and 0.5. $P$ should also be quite large, e.g. greater than $2\pi$, but it does not have to be an integer.

Suppose $x_0$ a root of the polynomial equation $1 = \sum_i x^{-m_i}$ (not necessarily a real root.) Are there other routes near $x_0 e^{\frac{2\pi k i}{P}}$ for integers $k = ... -1, 0, 1, ...$ ? Try $x_1 = x_0 e^{\frac{2\pi k i}{P} + \delta_1 k + \delta_2 k^2 + ...}$ for a small complex correction $\delta_1, \delta_2, ...$ Substituting into the polynomial gives

$\sum_i x_0^{-m_i}e^{-\frac{2\pi k r_i i}{P} - \delta_1 k m_i - \delta_2 k^2 m_i - ...} = 1$

Some multiples of $2\pi i$ came out of the exponent making it small, so the exponential can now be expanded as a power series. Coefficients of powers $k$ can be matched up to find a solution. The first power gives

$\sum_i x_0^{-m_i} (\frac{2\pi r_i i}{P} + \delta_1 m_i) = 0$

$\delta_1 = -\frac{2\pi i}{P} \frac{\sum_i x_0^{-m_i} r_i}{\sum_i x_0^{-m_i} m_i}$

I wont write the rest out but you can expand the exponential to second order and set the coefficient of $k^2$ to zero to get an expression for $\delta_2$ and so on. Provided $P$ is big and $r_i$ are small this series will converge rapidly, so the result describes a series of roots following a smooth curve in the complex plane. I don’t know if this fully accounts for what is being seen. It might be useful to setup a cleaner case where the remainders $r_i$ are made smaller to see how it looks.

• Philip Gibbs says:

What does the plot look like if the masses are adjusted to

57, 71, 86, 100, 100, 100, 100, 114,114, 114, 114, 128, 128, 128, 128, 143, 143, 157, 157, 186 ?

These numbers are nearer multiples of 14.28. Does this make the arcs more visible?

• arch1 says:

I hope Greg (whose ‘red dot’ code could answer your Q without modifiation) sees your post. I am starting to see linear features as well and want to put this to rest before I start seeing horsies and doggies.

• John Baez says:

Philip wrote:

What does the plot look like if the masses are adjusted to 57, 71, 86, 100, 100, 100, 100, 114, 114, 114, 114, 128, 128, 128, 128, 143, 143, 157, 157, 186 ?

That sounds like an interesting question! I’d gotten busy with some funeral arrangements in the last week, but I haven’t lost interest in these puzzles. I hope someone with better computational skills than I can draw this plot. Maybe I’ll try to drum up interest on G+.

• Greg Egan says:

I’m working on an interactive page that will let people plot both F(N) and the roots for any set of integers they like, but it’ll be a couple of days before it’s up.

• John Baez says:

Wow! I should have guessed you were up to something, not slacking off like me. That sounds fun.

• Philip Gibbs says:

I see those arcs too. They are very convincing, but if I cover up the bottom half I am not sure they are so convincing. Is it the reflection symmetry that creates the illusion?

• Philip Gibbs says:

To continue with my other point which I somehow got mixed up in this thread, the chemistry seems to be based on chains of CH_2 (mass 14) NH (mass 15) CO (mass 28) CNH_3 (mass 29) S (mass 32) N_2C_3H_2 (mass 66) benzene (mass 76). What oscillations do you get if you use these 7 masses as your starting point?

17. Gheorghe Craciun says:

In general, for the sake of some possible applications, I always thought it may be more interesting if the the period of the oscillations does somehow depend on the peptide masses — i.e., if the period captures some property of the peptide masses, which may allow us to say something about the probability that the masses present in some sample set are (or are not) compatible with a given set of “building block” masses.

18. Gheorghe Craciun says:

In case anybody may be interested, in a different paper Shane Hubler and I have also looked at some general mathematical properties of this type of recurrence relations: “Mass distributions of linear chain polymers”, Journal of Mathematical Chemistry, 2012 http://www.math.wisc.edu/~craciun/PAPERS_NEW/Hubler_Craciun_JMathChem_2012.pdf

19. Gheorghe Craciun says:

Shane Hubler and I have pointed out in our paper that 70% of the amino acid masses are within 1.4Da of an integer multiple of 14.28 — see Table 2 and the second paragraph in section 5 “Why14.28Da?” in the paper at

I think it would be interesting to check numerically the following: if we make many random selections of 20 integers (all between 50 and 200) such that most of them are very close to an integer multiple of 14.28, is it true that for a large percentage of those choices the oscillation period is close to 14.28? (For an idea of how large the “noise” could be, you can have a look at the leftmost column labelled “Remainder” in Table 2 in our paper.)

• Philip Gibbs says:

Yes, this is the correct explanation. Furthermore it is the low mass components that most control the oscillation and more of these are within your limits for the remainder. The contribution from each mass in the recurrence relation is suppressed according to the growth factor. E.g. with a growth factor of 1.02 per dalton the mass at 57 makes a larger contribution than the mass at 103 by a factor of $1.02^{103-57} = 2.5$ The fact that the two lowest mass components are particularly close to multiples of 14.28 is most significant.

I don’t know much about biochemistry but from the Wikipedia page on amino acids it looks like the lightest components in the chains that they are formed from are CH_2 (mass 14) NH (mass 15) CO (mass 28) CNH_3 (mass 29) S (mass 32) . More complex components such as the benzene ring are much heavier so their influence is suppressed by the growth factor. Being close to twice the period (i.e. 28.56) is just as good as being close to the period itself. The frequency with which these occur as components in the amino acid chains will also be relevant.

This explanation is a little handwaving but it might be possible to do a more detailed analysis of how this works out.

• Philip Gibbs says:

Here is a more detailed analysis.

The overall growth factor $x_0$ is the largest real number root of

$\displaystyle{ x^{-m_{20}} + \cdots + x^{m_1} = 1 }$

This can be computed as

$x_0 = 1.0282107$

Next try to find a second complex root assuming that the masses are all close to integer multiples of some fixed size i.e. define

$\displaystyle{ T_i = \frac{m_i}{n_i}}$

for integers $n_i$ such that the values $T_i$ are all close to 14. Look for a second root of the form $x_1 = r e^{\frac{2\pi i}{P}}$ where $P$ will be the period of the oscillation and $r$ is real.

So solve

$\displaystyle{ \sum_i r^{-m_i} e^{-\frac{2\pi i m_i}{P}} = 1 }$

Subtract a multiple of $2\pi i$ in the exponents to make them small and simplify.

$\displaystyle{ \sum_i r^{-m_i} e^{-2\pi n_i i (\frac{T_i}{P} - 1) } = 1 }$

Since the exponents are now small use the first order approximation to the exponential

$\displaystyle{ \sum_i r^{-m_i} (1 - 2\pi n_i i (\frac{T_i}{P} - 1) = 1 }$

The real part of the equation gives $r = x_0$ (first order approximation) and the imaginary part gives

$\displaystyle{ \sum_i x_0^{-m_i} (n_i T_i) = P \sum_i x_0^{-m_i} n_i }$

In other words the period $P$ is just the weighted mean of the values $T_i$ with weights $x_0^{-m_i} n_i$

The approximation for the growth rate of the oscillations $r$ could be improved by using a second order approximation to the exponential to show that it grows just slightly slower than the growth factor $x_0$ by an amount determined by the size of the remainders.

• Greg Egan says:

Nice analysis! It gets the period fairly close for 6,11,16,21,26; it gives 5.73981 vs 5.50192.

For 7, 11, 16, 21, 26 it’s not as good; it gives 6.16646 vs the (lowest-phase) period of 5.4668.

But for 57, 71, 87, 97, 99, 101, 103, 113, 113, 114, 115, 128, 128, 129, 131, 137, 147, 156, 163, 186, it’s extremely close: 14.2533 vs 14.2767.

• Philip Gibbs says:

I calculated this weighted sum on a spreadsheet and got 14.2775

• Philip Gibbs says:

What values for $n_i$ did you use Greg? Mine were
4 5 6 7 7 7 7 8 8 8 8 9 9 9 10 10 11 11 13
For a few it was not obvious. I could have chosen a different number and the result would be different

• Greg Egan says:

I used the same numbers except for 12 rather than 11 for the second-last (mass 163). I rounded the masses divided by 14 to get the $n_i$; dividing by 14.28 instead would change that 12 to the 11 you used and leave everything else the same.

I guess there are all sorts of refinements possible in making these choices, but that’s not really the point — we can compute the roots exactly if we really want to know them, exactly. But your analysis gives a nice intuitive explanation of why the period is roughly what it is in relation to the $T_i$.

• Philip Gibbs says:

You can double all the values of $n_i$ giving a new root at half the period, but the growth rate will go down. For the case 7, 11, 16, 21, 26 you could then change $n_1$ from 2 to 3 to give a better match.

• John Baez says:

Great stuff!

• Philip Gibbs says:

Sorry ignore this one, half the comment was lost.

20. Graham Jones says:

I made a plot of the roots of many random polynomials with coefficients 1, 0 or 1, 0 or 1,…., 0 or 1, -1. In the complex plane they look quite like the Littlewood ones, but a C not an O. Here’s a link to an image in Mod/Arg space, and the R code. (Goodness knows what WordPress will make of this.)

plot(NULL, NULL, xlim=c(.5,2), ylim=c(-3.4,3.4), xlab="Mod", ylab="Arg")
for(i in 1:100000) {
n <- sample(1:40, 1, prob=1.2^(1:40))
rs <- polyroot(c(1, sample(0:1, n, TRUE), -1))
points(Mod(rs), Arg(rs), pch='.', col="#00000010")
}

• John Baez says:

Cool stuff, Graham! But I’m confused by this:

I made a plot of the roots of many random polynomials with coefficients 1, 0 or 1, 0 or 1,…., 0 or 1, -1.

Maybe there’s a typo there, but even turning on my automatic typo-correction facilities I can’t tell what class of polynomials you’re plotting roots of.

• Graham Jones says:

No typo that I can see, but I suppose it is a bit terse and cryptic – and influenced by my R code. I mean polynomials of the type described in Puzzle 2. I rewrote them like $\sum_{i=0}^{n} c_i x^i$, where $c_0=1, c_n=-1$ and the one between are either 0 or 1.

First n is sampled randomly between 3 and 42 (with a bias towards big n), then the coefficients filled in at random.

• John Baez says:

I see! You meant

coefficients 1, (0 or 1), (0 or 1),…., (0 or 1), -1.

but for some reason I read it as

coefficients (1, 0) or (1, 0) or (1,…., 0) or (1, -1).

which makes no sense at all.

21. Philip Gibbs says:

Nobody said anything about puzzle 1 yet. Here is one approach that might work.

If a polynomial has all real roots then all its derivatives have all real roots too. Differentiate the polynomial until you are left with a low degree polynomial, e.g. a cubic. Its coefficients are non-negative. In which cases can this have all real roots?

• Graham Jones says:

I have a sketchy proof that the polynomials in the puzzle only have all roots real when m=1 or 2.

Put $h(x) = \sum_{i=0}^{m-1} c_i x^i$, where each $c_i$ is 0 or 1, and $c_0=1$. Put $g(x) = x^m$.
The polynomial is then f(x) = g(x) – h(x). I get bounds on h(x) via various geometric sums.

First, f has exactly one positive root in [1,2]. h(x) is bigger than g(x) for $0<=x=2$. So f has at least one root in [1,2], and no positive ones outside. If y is the smallest root in [1,2], show all derivatives of f at y are positive. So there aren’t any more.

Next look at negative roots. f(0)=-1, and the quickest possibility for f(x) to get up to 0 as -x gets bigger is for $c_i=1$ when i is odd, otherwise 0. It follows that all negative roots lie in [-2,-.618]. Suppose f has m real roots. Then it has m-1 roots in this range, so f’ has at least m-2 roots in this range, f” has at least m-3, and so on, until you get to the (m-2)’th derivative, which has one root in the range. But this is a quadratic, and both its roots lie too close to 0 for $m >= 4$. m=3 can be checked manually.

• Graham Jones says:

Let’s try the third paragraph again.

First, f has exactly one positive root in [1,2]. h(x) is bigger than g(x) for $0 \leq x=1$, and h(x) is smaller than g(x) for $x \geq 2$. So f has at least one root in [1,2], and no positive ones outside. If y is the smallest root in [1,2], show all derivatives of f at y are positive. So there aren’t any more.

• Gheorghe Craciun says:

If we do not assume that masses are distinct, i.e., if we look at polynomial equations of the form $x^m - \sum_{i=0}^{m-1} c_i x^i = 0$ where each $c_i$ is nonnegative (instead of being restricted to 0 or 1), then there are many possible choices for the numbers $c_i$ for which all the roots are real. For example, if we consider the polynomial with m-1 roots being exactly -1, and the mth root being a positive number M, then if M is large enough we obtain a (monic) polynomial of the type considered above.
(This is easy to check: we look at the product $(x-M) * \prod_{i=0}^{m-1}(x+1)$ and note that this is product between $(x-M)$ and a polynomial with positive coefficients, so it will have the desired coefficient signs for large enough M.)

• Gheorghe Craciun says:

On the other hand, note that the polynomials we obtain this way (i.e., by using the product $(x-M) * \prod_{i=0}^{m-1}(x+1)$ for large M) correspond to a case where all positive integers less than M are masses of some “amino acids”, which is a very restrictive case.

More generally, consider the equation $x^m - \sum_{i=0}^{m-1} c_i x^i = 0$ with nonnegative coefficients $c_i$ (and with $c_0>0$). If we assume that we have two or more “missing masses”, i.e., two or more coefficients $c_i$ are zero (but still $c_0>0$), then I think we can use Descartes’s rule of signs https://en.wikipedia.org/wiki/Descartes%27_rule_of_signs to argue that not all roots can be real (because in this case Descartes’s rule of signs implies that there is exactly one positive root, and also implies there are at most m-2 negative roots, so all together there are at most m-1 real roots, since zero is not a root).

If there is exactly one “missing mass”, i.e., exactly one coefficient $c_i$ is zero (and still $c_0>0$), then again I think we can use Descartes’s rule of signs to argue that, if all roots are real, then the zero coefficient must be $c_{m-1}$ (because, like before, there is exactly one positive root, so we have to have m-1 negative roots; but this means that, according to Descartes’s rule of signs criterion for negative roots, we must have a sign change from $(-x)^m$ to the next monomial after it. But, if $c_{m-1} > 0$, we fail to get this sign change; therefore $c_{m-1} = 0$).

By the way, there do exist polynomials of this type for which all the roots are real. For example, all roots of $x^4 - 4x^2 - 4x -1$ are real (another example is $x^2 - 1$). But, this is not true for all such polynomials, for example not all roots of $x^4 - x^2 - x -1$ are real.

To summarize, I think that if all roots of $x^m - \sum_{i=0}^{m-1} c_i x^i = 0$ are real (and all $c_i$ are nonnegative, and $c_0>0$) then $c_1, c_2, ..., c_{m-2}$ must also be strictly positive.

• Gheorghe Craciun says:

Also, if we now restrict the question only to the equations of the form $x^m - \sum_{i=0}^{m-1} c_i x^i = 0$ with $c_i \in \{0,1\}$ and $c_0 = 1$ (i.e., the way Puzzle 1 was originally formulated by John), then, unless I made some mistake in the comment above, it follows that the only equations of this form that have only real roots must be of the type $x^m - \sum_{i=0}^{m-1} x^i = 0$, or of the type $x^m - \sum_{i=0}^{m-2} x^i = 0$.
But, if we multiply these equations by $(x-1)$, we have a lot of cancellations, and we are left with an equation of the form fewnomial = 0, where “fewnomial” is a sparse polynomial, i.e., a polynomial with few nonzero coefficients (in this case, after multiplying by $(x-1)$ and many cancellations, we are left with at most 4 nonzero coefficients, for any m).
Then, after using Descartes’s rule of signs for the “fewnomial”, (and maybe also after checking a couple cases of degree 3 and 4 by hand) I think we can conclude that the only polynomials of the two types mentioned above for which all the roots are real are $x-1$, $x^2-x-1$, and $x^2-1$.

22. domenico says:

I try to obtain a demonstration that some polynomial have only complex solution, but I obtain nothing.
I try to evaluate the gradient of the f(N) in the real field in the form:
$f(x) = 1-\sum_k x^{-m_k}$
$f(x_1)=f(x_0+\epsilon) = f(x_0)+\epsilon f(x_0)+\frac{\epsilon^2}{2}f(x_0)+\cdots \simeq 0$
so that if the infinitesimal displacement is in the complex field, then all the solution are complex.
The two solution for the displacements are:
$\epsilon=\frac{-\partial_x f_0 \pm \sqrt{(\partial_x f_0)^2-2 f_0 \partial^2_x f_0} }{\partial^2_x f_0}$
so that if the discriminant is ever negative, then the solutions are all in the complex plane (only oscillations):
$\Delta = (\partial_x f_0)^2-2 f_0 \partial^2_x f_0 = -\sum_{ks} m_k m_s [1+ 2 m_s] x^{-m_k-1} x^{-m_s-1}$
and this seem a dead end.

23. lwbut says:

Forgive my ignorance, most of the math here goes way over my head but i am fairly good with the simpler stuff, not to mention basic chemistry.

I have noticed a couple of points that may open new areas of thought.
Point 1. In the first graph of peptide mass numbers the peaks are approx 14 daltons apart ( which i take means there are a greater occurrence of peptides with the peak mass numbers?) 14 daltons is the equivalent mass of a carbon atom link in a peptide chain as most such single links have 2 hydrogen atoms connected to the carbon: total mass = 12 + 1 + 1 = 14. adding a single such link to a peptide to form a new stable peptide would most likely ocur when one such CH2 connection is added or subtracted.

Point 2 14.28(57) recurring is exactly 100 divided by 7.

The reason i visited the site was to look for information concerning the Fibonacci sequence. Specifically why the number 5 (and 11) is so important to the sequence? In the above post you give a fibonacci formula involving the inverse square root of this number and i believe i have discovered (and not read somewhere yet) a puzzling 5 connection, namely that for all fibonacci numbers above number 6 in the sequence (5) multiplying the number by 11 and adding the number 5 places previously results in the fibonacci number 5 places subsequent to the original! Starting with 5( x 11) + 0 = 55 (place numbers 6, 1 and 11 respectively), then 8 (x 11) + 1 = 89, 13 (x 11) + 1 = 144, ….

love.

24. Philip Gibbs says:

It’s interesting to think about what would happen if hydrogen came mostly in the form of deuterium so it had a mass number of 2 instead of 1. List the mass and valency numbers for the elements that make up most proteins

Deuterium: m = 2, v = 1
Carbon: m = 12, v = 4
Nitrogen: m = 14, v = 3
Oxygen: m = 16, v = 2
Sulphur: m = 32, v = 2

In every case it happens that m + 2v = 4 (mod 16)
What can be said about the mass of any molecule that has a tree structure, i.e. no circuits or double bonds? The number of atoms A is then equal to the number of bonds B plus one. A = B + 1

Adding up the congruence over all atoms gives M + 2V = 4A (mod16)
where M is the total mass, V is the total valency, but V = 2B
So M = 4A – 4B = 4 (mod 16)

In other words, if all hydrogen were deuterium then all tree-like molecules made from these elements would have a mass number that is 4 more than a multiple of 16.

In the real world hydrogen is not mostly deuterium and amino acids do have loops and double bonds so the situation is more complex. The difference between masses are reduced to being often approximately multiples of 14.28 instead of always exactly 16.

• John Baez says:

Wow, that’s an amazing alternate reality there!

25. Reid Booth says:

I took a look at the properties of the generators to see if I could find any reasons for the periodicity. I wrote software which calculates the remainders of dividing out by a number, i.e. 14.28, then sorts the lists, and plots them at integer intervals. Around 14.28, I see this:

We don’t see much of interest yet, because Craciun and Hubler noted this in the paper. I thought it was interesting, though, that when the data is plotted in this way there’s an obvious symmetry going on, in fact, if the generators divided by 14.28 have remainders close to 14.28, you would expect sums of them to as well, giving us periodicity. So then, the step function be in that sense the “most periodic”, as it’s generators would be multiples of all one number. So then I wrote more code to give us a sense of how far one of these plots is from the step function by calculating the area between each plot of the remainders of diving by n and the function that’s 0 on [0,10) and n on [10,20]. I then normalized the areas and plotted them, and got this plot:

Excusing the lack of a vertical axis, we see an obviously pronounced minimum around 14.28. As well, we get a few other minimums, which suggest at a few other values the remainders line up more around the top and bottom symmetrically than at others. Interestingly, Craciun and Hubler mention other periodicities, one of which is 18.8. The plot shows another local minimum there.

It may be that there’s a relationship between the equivalence classes of the generators and the periodicity of the semigroup.

• Philip Gibbs says:

Nice work. It is hard to see where all the dips lie but these could correspond to other roots. Perhaps you could list them. Harmonics are to be expected because if the generators lie close to multiples of 14.28 they will also lie close to multiples of 14.28/n, but it is interesting to see other dips.

• Reid Booth says:

Hi Philip, the other minimums are at:

10.38
11.41
12.58
14.28
16.35
18.76

I could check for dips around 14.28/n, however for low values the plot becomes a bit chaotic.

26. Greg Egan says:

Here’s an interactive page for doing calculations with ordered sums:

http://www.gregegan.net/SCIENCE/OrderedSums/OrderedSums.html

The numerical methods I’ve used aren’t as robust as in a package like Mathematica, so sometimes, especially for high-order polynomials, the program will fail to find the roots, or fail to find them with sufficient accuracy to determine the interpolation function. But I’ve tried to include enough internal consistency checks that when it fails like this it just admits it, rather than displaying anything wildly inaccurate.

• Philip Gibbs says:

Brilliant! I am finding that it fails with numbers over 140 but if for example I use

57,71,86,100,100,100,100,114,114,114,114,128,128,128,128

I get a plot with very clear arcs following the 14.26 period. So it now looks like the arcs in the original plot are probably real, if less clear. Some of them may be following the resonances that Reid Booth found, rather than the main one and there may be all sorts of other effects complicating the picture

• arch1 says:

Philip, the behavior might be device specific but I found that w/ your numbers, adding a final n which is even fails for n>168 except n=172, 180, 190. Adding any odd n > 128 seems to fail.

• arch1 says:

Thanks Greg (also for the explanatory material which pushed my understanding along some more, and the mention of Cuisenaire rods which triggered some long-dormant memories)! Phil, if I trust my eyes your numbers yield a magnitude-phase plot with 9 arcs in each (top/bottom) half, some concave left and some concave right.

Any idea what is going on at N~143 in the plot of F(N) for Phil’s numbers? Large maxima occur every seventh maximum, but this seventh large maximum is smaller than expected. It’s hard to tell what’s going on between N= 200 and 350, but I see no recurrences after that.

• Philip Gibbs says:

If you join up the dots very carefully you find that the nine arcs form one continuous curve that winds round nine times.

• arch1 says:

Interesting! Phil, do you mean that the continuous curve winds round the cylinder that I get when I join the phase=180 and phase = -180 lines of the plot?

• Philip Gibbs says:

Yes, or alternatively in the ordinary argand plot of the complex plane it is a curve going 9 times round the origin near the circle of radius one, but it is only possible to see it clearly in the expanded phase/magnitude plot.

• arch1 says:

Thanks Philip; very careful is right, but I do indeed see a graceful continuous curve with an interesting interweaving pattern. Is it true that the period is 128/14.2572 ~ 8.978 windings?

• Scott Hotton says:

Philip Gibb’s idea about using deuterium led me to a thought experiment. If the periodicity of about 14 in peptide masses is due to the CH_2 moiety having a mass of about 14 then it seems that increasing the mass of the hydrogen atoms by 1 should change the periodicity to about 16.

If the periodicity in peptide masses is due to ordered sums with the amino acid masses then perhaps we would get a result different from 16. Towards this end I computed the masses of the 20 amino acid residues (rounded to the nearest integer) assuming all of the hydrogen atoms had a single neutron.

$62, 78, 94, 106, 110, 110, 110, 122, 122,$
$126, 126, 138, 138, 142, 142, 146, 158, 170,$
$174, 198$

and the characteristic polynomial is

$x^{198} - x^{136} - x^{120} - x^{104} - x^{92} - x^{88}$
$- x^{88} - x^{88} - x^{76} - x^{76} - x^{72} - x^{72}$
$- x^{60} - x^{60} - x^{56} - x^{56} - x^{52} - x^{40}$
$- x^{28} - x^{24} - 1$

If the non-real root with the largest magnitude produced a periodicity much below 16 then we could distinguish between whether the periodicity is due to the mass of CH_2 or to ordered sums.

I tried using Greg Egan’s web page to find the roots but failed. Although it looks like there are some well defined peaks and valleys in the plot of $F(N)$ for $N$ between 800 and 900.

In actual practice the organisms would die if all the hydrogen atoms had a single neutron so this is just a thought experiment anyway.

• Philip Gibbs says:

If you remove the 198 the program finds the roots as a very close approximation. Alternatively you can divide every number by 2 because they are all even to find the polynomial in $x^2$ and multiply the period that gives by 2. The answer is 15.5734. The main thing that makes it smaller than 16 is the double bond in the carboxyl group that appears in every amino acid next to the amine group (see how Wikipedia can make me sound like an expert)

I am very curious about why organisms would die if the hydrogen was replaced by deuterium. Is there a specific reason that is easily understood?

• Scott Hotton says:

It is my understanding that the doubling of the nucleus’ mass alters the electron orbitals and chemistry of the atom enough that some biomolecules become much less stable. I don’t recall which biomolecules in particular although it does occur to me that the two strands of a DNA double helix are held together by hydrogen bonds.

Of course there is always some deuterium present and increasing it a little should be okay. It seems to me to be much more difficult to calculate the effects of a hydrogen/deuterium mixture than for either isotope alone. 15.57 and 16 seem too close to hope that a slight enhancement of deuterium would produce a noticeable effect.

• Philip Gibbs says:

It’s an interesting observation if you think that chemistry may be fine-tuned for life based on the familiar biochemistry. The problem with that idea is the chemistry is mostly determined by valency, and the fine structure constant. You would need a few more parameters to do much tuning. If the mass of the isotopes can have some influence on biochemical stability it provides more scope.

• John Baez says:

Philip wrote:

I am very curious about why organisms would die if the hydrogen was replaced by deuterium. Is there a specific reason that is easily understood?

Rats die of heavy water poisoning when about half the hydrogen in their body is replaced by deuterium: it seems the main problem is that it inhibits the mitotic spindle formation required for cell division.

But what’s really interesting is that prokaryotes (like bacteria) can survive if the only water they’re given is D2O, while eukaryotes (like plants and animals) cannot. I believe this shows that as organisms evolve and become more sophisticated, they become more fine-tuned to the precise properties of chemical reactions. (I will let other people argue about whether eukaryotes are truly more ‘sophisticated’.)

I have always wanted to write a murder mystery in which someone was poisoned by heavy water. It would be rather hard to detect unless one knew to look for it.

Effect on biological systems

Different isotopes of chemical elements have slightly different chemical behaviors, but for most elements the differences are far too small to use, or even detect. For hydrogen, however, this is not true. The larger chemical isotope-effects seen between protium (light hydrogen) versus deuterium and tritium manifest because bond energies in chemistry are determined in quantum mechanics by equations in which the quantity of reduced mass of the nucleus and electrons appears. This quantity is altered in heavy-hydrogen compounds (of which deuterium oxide is the most common) more than for heavy-isotope substitution in other chemical elements. This isotope effect of heavy hydrogen is magnified further in biological systems, which are very sensitive to small changes in the solvent properties of water.

Heavy water is the only known chemical substance that affects the period of circadian oscillations, consistently increasing the length of each cycle. The effect is seen in unicellular organisms, green plants, isopods, insects, birds, mice, and hamsters. The mechanism is unknown.[22]

To perform their tasks, enzymes rely on their finely tuned networks of hydrogen bonds, both in the active center with their substrates, and outside the active center, to stabilize their tertiary structures. As a hydrogen bond with deuterium is slightly stronger[23] than one involving ordinary hydrogen, in a highly deuterated environment, some normal reactions in cells are disrupted.

Particularly hard-hit by heavy water are the delicate assemblies of mitotic spindle formation necessary for cell division in eukaryotes. Plants stop growing and seeds do not germinate when given only heavy water, because heavy water stops eukaryotic cell division.[24][25] The deuterium cell is larger and it is a modification of the direction of division.[26][27] The cell membrane also changes, and it reacts first to the impact of heavy water. In 1972 it was demonstrated that an increase in the percentage content of deuterium in water reduces plant growth.[28] Research conducted on the growth of prokaryote microorganisms in artificial conditions of a heavy hydrogen environment showed that in this environment, all the hydrogen atoms of water could be replaced with deuterium.[29][30][31] Experiments showed that bacteria can live in 98% heavy water.[32] However, all concentrations over 50% of deuterium in the water molecules were found to kill plants.

Effect on animals

Experiments in mice, rats, and dogs[33] have shown that a degree of 25% deuteration causes (sometimes irreversible) sterility, because neither gametes nor zygotes can develop. High concentrations of heavy water (90%) rapidly kill fish, tadpoles, flatworms, and Drosophila. Mammals (for example, rats) given heavy water to drink die after a week, at a time when their body water approaches about 50% deuteration.[34] The mode of death appears to be the same as that in cytotoxic poisoning (such as chemotherapy) or in acute radiation syndrome (though deuterium is not radioactive), and is due to deuterium’s action in generally inhibiting cell division. It is more toxic to malignant cells than normal cells but the concentrations needed are too high for regular use.[33] As in chemotherapy, deuterium-poisoned mammals die of a failure of bone marrow (bleeding and infection) and intestinal-barrier functions (diarrhea and fluid loss).

Despite the problems of plants and animals in living with too much deuterium, prokaryotic organisms such as bacteria, which do not have the mitotic problems induced by deuterium, may be grown and propagated in fully deuterated conditions, resulting in replacement of all hydrogen atoms in the bacterial proteins and DNA with the deuterium isotope.[33][35]

Full replacement with heavy atom isotopes can be accomplished in higher organisms with other non-radioactive heavy isotopes (such as carbon-13, nitrogen-15, and oxygen-18), but this cannot be done for the stable heavy isotope of hydrogen.

Deuterium oxide is used to enhance boron neutron capture therapy, but this effect does not rely on the biological effects of deuterium per se, but instead on deuterium’s ability to moderate (slow) neutrons without capturing them.[33]

Toxicity in humans

Because it would take a very large amount of heavy water to replace 25% to 50% of a human being’s body water (water being in turn 50–75% of body weight[36]) with heavy water, accidental or intentional poisoning with heavy water is unlikely to the point of practical disregard. Poisoning would require that the victim ingest large amounts of heavy water without significant normal water intake for many days to produce any noticeable toxic effects.

Oral doses of heavy water in the range of several grams, as well as heavy oxygen 18O, are routinely used in human metabolic experiments. See doubly labeled water testing. Since one in about every 6,400 hydrogen atoms is deuterium, a 50 kg human containing 32 kg of body water would normally contain enough deuterium (about 1.1 g) to make 5.5 g of pure heavy water, so roughly this dose is required to double the amount of deuterium in the body.

The American patent U.S. Patent 5,223,269 is for the use of heavy water to treat hypertension (high blood pressure). A loss of blood pressure may partially explain the reported incidence of dizziness upon ingestion of heavy water. However, it is more likely that this symptom can be attributed to altered vestibular function.[37]

• arch1 says:

Thanks John. That a couple of effective-mass adjustments being 1/3600 instead of the usual 1/1800 can kill off complex creatures is really striking to me.

• Philip Gibbs says:

Thanks for the comprehensive answer. A followup question (possibly not answerable) is what would happen if life evolved in an environment of D2O instead of H2O? It would not stop the prokaryotes, but would the eukaryotes have been inhibited from evolving, or could they simply have adjusted their biochemistry slightly to adapt to living in D2O? The latter sounds more likely but the former would be more interesting for fine-tuning arguments.

• John Baez says:

I strongly suspect, but cannot prove, that eukaryotes could still have evolved with hydrogen atoms that are twice as heavy. I think they’re just too finely adapted to the chemistry of our world to survive a sudden change to this alternative world.

• Greg Egan says:

FWIW, I’ve tweaked the program so it now recognises any non-trivial GCD in the exponents of the characteristic polynomial, and uses that to solve the lower-degree polynomial whose roots can be used to construct those of the original. So the mass sequence 62,78,94,106,110,110,110,122,122,126,126,138,138,
142,142,146,158,170,174,198 now (usually) gets a direct answer.

• Greg Egan says:

After a few more tweaks to the Ordered Sums software, it can now (usually) solve for the sequence:
57, 71, 86, 100, 100, 100, 100, 114,114, 114, 114, 128, 128, 128, 128, 143, 143, 157, 157, 186.

There are still no guarantees of finding the roots every time, for this sequence or any other, but by allowing the program to fall back progressively to less stringent criteria for accepting a candidate root, it seems to be able to solve a wider range of polynomials.

• John Baez says:

Is Mathematica better at finding roots of high-degree polynomials? (It should be, because they’re paying a lot of people to solve such problems.) Do you have any idea what tricks they use? If you use something like Newton’s method, it’s an interesting problem in dynamical systems. Smale posed the question of which polynomials can be solved by ‘generally convergent iterative systems’ (which are a class of methods more sure-fire than yours), and this paper discusses how they work for quintics:

• Peter Doyle and Curtis McMullen, Solving the quintic by iteration, Acta Mathematica, 163 (1989), 151–180.

They show a polynomial equation is solvable by a ‘tower’ of generally convergent iterative systems iff its Galois group is nearly solvable: it’s built from abelian groups and A5 by iterated extensions. This includes any quintic but not, say, a typical sextic.

• Greg Egan says:

The Ordered Sums page uses the Weierstrass–Durand–Kerner method, which is an application of Newton’s method. I tried Aberth’s method, but that did much worse.

Mathematica has no problem with any of the polynomials I get from any sequences of the kind we’ve been discussing, but I suspect they have tens of thousands of lines of code, which switches between dozens of possible approaches depending on a detailed analysis of the polynomial. Of course on top of all the issues of convergence, there is the problem of how to implement these various algorithms with floating point numbers.

• John Baez says:

This is great, Greg! I’ll do some experiments with your software. It’s really cool to see this interplay between the discrete and continuous: F(Z) is continuous and often has easily understood oscillations, but if you examine its integral values only when Z is an integer, these patterns are obscured and other patterns become visible.

27. jessemckeown says:

Typographers Everywhere now want to know what the approximate periodicity of English is. Or at least those who read xkcd and azimuth do https://xkcd.com/1676/

28. I’m not a mathematician, but a chemist, so my apologies if this has already been pointed out. I just did a really simple analysis of the data. I weighted the amino acids by abundance and summed their molecular weights (minus H2O) to get a “typical” (natural abundance) amino acid in the peptide. Then I did a similar weighted sum of the number of heavy (non-hydrogen) atoms per amino acid (again weighting by abundance)–this corresponds to the number of vertices in the skeletal formula of the “typical” amino acid. Then I divided the two ($A = MW_{weighted}/v_{weighted}$) to give the “typical” molecular weight of a single vertex in the skeletal formula the peptides. The answer comes out to ~14.2 regardless of what database you use. I used Table 6 here (A = 14.2095) and the “Eukaryota” and “All” rows from Table 2 <a href=”https://academic.oup.com/nar/article/45/D1/D1112/2333931>here (A = 14.1811 and 14.1705, respectively). Interestingly, if you don’t weight by abundance, but simply take averages, you still get A = 14.1531. So that’s my incredibly simple-minded guess: the 14.28 number has to do with the “typical” vertex molecular weight of the dataset used.

This site uses Akismet to reduce spam. Learn how your comment data is processed.