We use three types of diagrams to display project results graphically: relationship diagrams, phylogeny diagrams and network diagrams. These can make relationships among the haplotypes of men in the project easier to see, but reading these diagrams can be tricky, and it is worth spending a little time to make sure you know how they are constructed, what information they contain, and how to interpret them.
- Relationship diagrams are like conventional family descendants charts, except that they show only the male line; these are discussed at About Relationship Diagrams.
- Phylogeny Diagrams look something like relationship diagrams, but they are based on a hypothetical order in which STR changes might have taken place, and they do not coincide exactly with facts we know from conventional genealogies. Some of this material is quite technical; beginners may prefer to skip to the Network Diagrams discussion below.
- Network Diagrams look nothing like the other two kinds of diagram. They display genetic distances in a way that makes it especially easy to see clusters of what are or may be closely related men.
Our phylogeny diagrams were prepared using Microsoft Excel. So far, we have prepared a phylogeny diagram only for those individuals in the M222+ (or Ui Niall) haplogroup, which includes only the men in Ewing Groups 1, 2 and 3. Individuals in the diagram are shaded to correspond with their Group membership.
You can read about how Group membership is assigned in the Results Introduction document.
The phylogeny diagram contains all of the information that is displayed in the Results Tables, but it is displayed in a very different way. At the very top of the chart, the box labeled R1b represents the R1b1b1c modal haplotype. The line below that is labeled with the eleven mutations that distinguish this from the M222+ modal haplotype. If you will have a look at the results table on the chart, you can see that each place that the R1b and M222+ modals differ shows up on the line separating these two boxes. Similarly, the line between the M222+ box and the Ewing modal is labeled with the 7 mutations that distinguish the M222+ modal haplotype from the Ewing modal haplotype. Take a look at EL in Group 3b (off to the left of the M222+ box). What this diagram says is that he exactly matches the M222+ haplotype except he has CDYa = 37 and CDYb = 38 (or CDYa/b = 37/38 for short—the two mutations on the line just below the M222+ box, before the line takes off to the left), DYS 456 = 16 (which he shares with all of the other men in Group 3 so far), DYS 447 = 24 and DYS 576 = 17 (which he shares with RA2, the only other man in Group 3b so far), and DYS 464c = 17 (which only he has — this one is shown in red because it is a back mutation and matches the R1b1b1c modal rather than the M222+ modal, see below.) For any participant on the chart, we can see where his haplotype differs from any of those above him on the chart by just working back up the lines connecting him to the men or modals above him.
Back Mutations: Notice that on the line from M222+ to the Ewing modal haplotype, the mutations CDYa/b = 37/38 are shown in red. This signifies that they are 'back mutations,' which means that R1b had these values, they mutated to CDYa/b = 38/39 in the M222+ haplotype, and then they mutated 'back' to CDYa/b = 37/38 in the Ewing modal. Notice also that there are five more mutations below CDYa/b leading to the Ewing modal, but that a line takes off laterally to the branch containing Group 3 before these mutations are shown.
This is because all of the men in Group 3 except HM do share CDYa/b = 37/38 but do not share the other five mutations with the rest of the Ewings. HM has CDYa = 38, which we are interpreting here as a back mutation to the M222+ value.
It is rather interesting that all of the M222+ Ewings not in the closely related group of Ewings (which is to say, all of the Ewings in Group 3) share the M222+ off-modal marker DYS 456 = 16, but we do not know what to make of this.
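For readers who find a rule easier to follow when it is written out, here is a minimal sketch in Python of what we mean by a back mutation. It is only an illustration and not how the diagram was prepared. The function name `is_back_mutation` is our own invention; the CDYa and CDYb values are the ones given above for R1b, M222+ and the Ewing modal, in that order.

```python
def is_back_mutation(lineage_values):
    """Return True if the last value differs from the one just before it
    and matches a value seen earlier in the line of descent."""
    *earlier, previous, current = lineage_values
    return current != previous and current in earlier


# Values from the text, ordered R1b -> M222+ -> Ewing modal.
cdya = [37, 38, 37]
cdyb = [38, 39, 38]

print(is_back_mutation(cdya))  # True
print(is_back_mutation(cdyb))  # True
```

Both markers return to the value they had in R1b, which is why both are shown in red on the chart.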
Parallel Mutations: Notice now that in the vertical line leading down to TD, the mutation DYS 576 = 19 is shown in blue. This signifies that this is a 'parallel mutation.' This means that there are other men in this chart who also have DYS 576 = 19, but they appear in the chart in a way that shows (or more strictly speaking, makes the claim) that they did not inherit it from a common ancestor with TD; the same mutation occurred twice on 'parallel' branches, if you will, by coincidence rather than because of common descent. Indeed, this mutation appears in four different places on the chart, a couple of which have driven us crazy. Look in the first row under the Ewing modal haplotype and just to the left of it. DN in Group 1b differs from the Ewing modal only at DYS 576 = 19, and RB and GW in Group 1a also differ from the Ewing modal only at DYS 576 = 19. Since DN, RB and GW have identical haplotypes, they appear in the same node in network diagrams, which are constructed using only the Y-DNA results. Here, we have put them in separate boxes because conventional genealogy shows that RB and GW are descended from John Ewing of Carnashannagh. He cannot have had DYS 576 = 19, because most of his descendants do not have it; rather, RB and GW must have inherited it from John Ewing (born 1754).
See the Group 1a Relationship Diagram to see why this must be so.
But DN is not descended from John Ewing (born 1754), so he must have gotten DYS 576 = 19 from somewhere else; that is, there must have been a parallel mutation by coincidence in the line leading from his ancestor James Ewing of Inch Island.
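The idea of a parallel mutation can be sketched in Python as a mutation that labels more than one line on the chart. This is a simplified illustration under our own assumptions: the `tree` below holds only the two boxes just discussed plus one made-up box (`P1`, hypothetical), and the function name `parallel_mutations` is ours.

```python
from collections import defaultdict

# Each entry: box -> (box above it, mutations written on the connecting line).
tree = {
    "DN": ("Ewing modal", ["DYS 576 = 19"]),
    "RB/GW": ("Ewing modal", ["DYS 576 = 19"]),
    "P1": ("Ewing modal", ["DYS 000 = 1"]),  # hypothetical box and marker
}


def parallel_mutations(tree):
    """Return the mutations that label more than one line, with the boxes below them."""
    boxes_by_mutation = defaultdict(list)
    for box, (_parent, mutations) in tree.items():
        for mutation in mutations:
            boxes_by_mutation[mutation].append(box)
    return {m: boxes for m, boxes in boxes_by_mutation.items() if len(boxes) > 1}


print(parallel_mutations(tree))  # {'DYS 576 = 19': ['DN', 'RB/GW']}
```

Any mutation this reports is one the chart would show in blue, because the chart claims it arose independently on each of the listed lines.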
Two more men on the diagram have DYS 576 = 19: RA and AL. What we know about the conventional genealogy of RA does not allow us to connect him with any of the known kinship groups in the chart, but he does not have DYS 391 = 10 (which would put him in Group 2), so we have put him in Group 1*. RA is genetic distance two from the Ewing modal haplotype, so we could have just put him on his own branch below the Ewing modal haplotype showing both mutations. Instead, we have put him below DN and RB/GW with dotted lines going to each, signifying that we do not have any evidence preferring one choice or the other, but the fact that he is only genetic distance one from each of these suggests that he might be related to either.
Notice that RA’s other mutation, DYS 390 = 24, also appears in blue, signifying that it is also a parallel mutation, and is shared by JC in Group 1e and TG in Group 2*. We could move RA’s box to show that he inherited this mutation from a common ancestor with either of these men, but this would require adducing some back mutations and other shenanigans that I will leave to you to figure out.
The exception is GR in Group 1b, who has DYS 391 = 10 but whose conventional genealogy shows him to be descended from James Ewing of Inch. Since the other descendants of James of Inch do not have DYS 391 = 10, we have concluded that he must have had a parallel mutation at this marker. The alternative is to argue that his conventional genealogy is mistaken.
so we have put this mutation first in the line leading to all of the men in Group 5. There is no good reason for the order in which the other two mutations leading to AL are shown, DYS 448 = 19 and DYS 576 = 19. We could as easily have put DYS 448 = 19 first, and then stuck in a branch point with one branch labeled DYS 448 = 19 going to AL, and another going to RA. Can you see what labels would have to be on that branch? It would have to have RA’s other mutation, DYS 390 = 24, and also a back mutation at DYS 391, from 10 back to 11, the Ewing modal at that marker. DYS 391 is a rather slowly mutating marker, and we would like not to have to make claims about frequent mutations at that marker, especially not back mutations, because the probability that DYS 391 would mutate forward and then back within the number of generations that we are speaking about here is rather low.
Difference Between Phylogeny Diagrams and Relationship Diagrams: Though phylogeny diagrams are a little more like family trees than network diagrams are, there are important differences. One is that we do not have conventional genealogic evidence linking most of the individuals on the diagram, but we show all of them on the same tree anyway. Another is that the vertical distances between individuals on the tree have nothing to do with how many generations separate them, but rather with how many mutations separate them. Indeed, all of the individuals shown on these diagrams are roughly contemporaneous. There are no ancestors shown. Those individuals near the top of the chart that you might think represent ancestors have haplotypes that are closer to what we think the ancestral haplotype was, but this does not mean that they lived closer in time to the ancestors, but rather only that there have been fewer mutations in the line leading from the ancestor to the individuals at the top of the chart than to those at the bottom.
These phylogeny diagrams are not maximum parsimony trees: In biology, phylogeny diagrams are usually constructed by using algorithms designed to make 'maximum parsimony trees.' That is, individual haplotypes are placed on the tree so as to minimize the total number of mutations required to explain the differences among the haplotypes. These diagrams are not like that, because in cases where we have conventional genealogical evidence of a family relationship between two or more men, we have generally forced them to appear on the same branch of the tree even if this requires us to assume more mutations.
We can force an individual haplotype to appear on any branch of the chart by using a suitable combination of parallel and back mutations: If this does not bother you, you are not paying close enough attention. What I am saying here is that we can force the data into virtually any tree structure we like. Take a look at individual GR, who is the rightmost yellow-shaded individual in the chart. Notice that he differs from the Ewing modal at DYS 391 = 10. This mutation is shown on the chart in blue to signify that it is a parallel mutation, or rather that it is our hypothesis that it is a parallel mutation. You may recall that DYS 391 = 10 is what we have used to define Group 2 — the green-shaded individuals on the chart. If we did not have conventional genealogy linking GR to Group 1b, we would have put him in Group 2. Perhaps you can see that if we did that, he would show up in the second row on a new branch, one mutation (DYS 460 = 9) below RC and JM2. That is a more parsimonious solution (because it requires only one DYS 391 = 10 mutation rather than two), but choosing it is the same as arguing that GR is mistaken about his conventional genealogy. Maybe he is. Indeed, the more shenanigans of this kind we have to use to put an individual on the chart where we think they ought to fit, the more likely it is that we are mistaken.
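The trade-off for GR can be made concrete by counting mutations under each placement. The following is a minimal sketch in Python under our own assumptions: it shows only GR and the RC/JM2 box, it assumes GR's only other difference is DYS 460 = 9 as the alternative placement described above implies, and the function name `total_mutations` is ours.

```python
def total_mutations(tree):
    """Count every mutation written on every line of the tree."""
    return sum(len(mutations) for _parent, mutations in tree.values())


# Each entry: box -> (box above it, mutations on the connecting line).
# As drawn: GR stays with Group 1b, so DYS 391 = 10 must occur twice.
as_drawn = {
    "RC/JM2": ("Ewing modal", ["DYS 391 = 10"]),
    "GR": ("Ewing modal", ["DYS 391 = 10", "DYS 460 = 9"]),
}

# Alternative: GR hangs below RC/JM2 and inherits DYS 391 = 10 from that line.
alternative = {
    "RC/JM2": ("Ewing modal", ["DYS 391 = 10"]),
    "GR": ("RC/JM2", ["DYS 460 = 9"]),
}

print(total_mutations(as_drawn), total_mutations(alternative))  # 3 2
```

A maximum parsimony method would prefer the alternative because its count is lower; we have kept the first arrangement because of the conventional genealogy.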
Here is another example. Take a look at Group 2a, which consists of TW2 and all the men below him on the Phylogeny Diagram, and see what we had to do to keep them together. First trace the line from the Ewing modal to TW2. There is nothing unusual in the steps leading to TW2. First, we have DYS 391 mutating from the Ewing modal of 11 to 10, and then CDYa down from 37 to 36, CDYb down from 38 to 37, and then CDYa down another step to 35. TW2 is genetic distance 4 from the Ewing modal, with two steps at CDYa. Now to keep WR and TNS in this group, we had to adduce a back mutation at CDYb from 37 back to 38, and then to show another couple of mutations for TNS, one of them a unique back mutation to the M222+ and R1b modal value of DYS 442 = 12.
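The genetic distance quoted for TW2 can be checked with a short Python sketch. We assume here that distance is counted stepwise, one for each single-repeat change, which is what "two steps at CDYa" implies. Only the three markers at which TW2 differs from the Ewing modal are listed, and the function name `genetic_distance` is our own.

```python
def genetic_distance(a, b):
    """Sum the step differences over the markers present in both haplotypes."""
    return sum(abs(a[marker] - b[marker]) for marker in a.keys() & b.keys())


# Values from the text; all other markers are taken to be identical.
ewing_modal = {"DYS 391": 11, "CDYa": 37, "CDYb": 38}
tw2 = {"DYS 391": 10, "CDYa": 35, "CDYb": 37}

print(genetic_distance(ewing_modal, tw2))  # 4
```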
It is also really interesting to see that JN and DG also both have back mutations at different markers to the M222+ modal — this makes one wonder if we could construct an alternative tree that had Group 2a branching off before the Ewing modal somewhere — like Group 3. But DYS 19 = 15 unifies the whole closely related Ewing group.
We could also root Group 2a on JW and get rid of the back mutation CDYb = 36, but this does violence to the conventional genealogy. Start with William?, put his mutations in first, then work on down.
Our Network Diagrams were prepared using Network, a shareware program from Fluxus Engineering, which is available for free download from their web site. These diagrams include only 37-marker data: project participants who have not been tested for 37 markers do not appear in these diagrams, and only 37 markers are considered for those who have had additional markers tested. Network Diagrams are not family trees and are not intended to show kinship relationships; rather, they show relationships among haplotypes. There is considerable overlap between kinship relationships and relationships between haplotypes, but the two are by no means identical.
Circles and Colors: In these diagrams, haplotypes are represented by circles. The size of a circle is proportional to the number of participants who have that exact haplotype. As most participants in our project have unique haplotypes, most of the circles are small and represent just one individual. The largest circle is the one representing the Ewing modal haplotype, because five project participants match the Ewing modal haplotype exactly, so this circle represents five individuals. There are also a couple of circles representing three individuals and some circles representing two individuals. The circles are color-coded to identify which of the Ewing groups each of the project participants in the diagrams belongs to.
You can read about how Group membership is assigned in the Results Introduction document.
In the circles that represent more than one individual, the colors are applied to 'pie slices,' but this is only evident when participants with identical haplotypes are in different groups, as with the Ewing modal haplotype.
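The way circle sizes and pie slices arise can be sketched in Python by grouping participants who share an identical haplotype. This is our own illustration, not the Network program's method. Haplotypes are abbreviated to their differences from the Ewing modal; DN, RB and GW and their groups come from the text, while participant `P1` and his marker are hypothetical.

```python
from collections import Counter, defaultdict

# (participant, group, haplotype written as differences from the Ewing modal)
participants = [
    ("DN", "1b", ("DYS 576 = 19",)),
    ("RB", "1a", ("DYS 576 = 19",)),
    ("GW", "1a", ("DYS 576 = 19",)),
    ("P1", "2", ("DYS 000 = 1",)),  # hypothetical participant and marker
]

# One circle per distinct haplotype; each circle counts its members by group.
circles = defaultdict(Counter)
for _name, group, haplotype in participants:
    circles[haplotype][group] += 1

for haplotype, slices in circles.items():
    size = sum(slices.values())  # circle size: number of matching participants
    print(haplotype, size, dict(slices))
```

The first circle has size three with two slices, one for Group 1b and a larger one for Group 1a; the second has size one and a single color.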
Lines: The lengths of the lines connecting these circles are proportional to the genetic distance between the haplotypes represented by the circles. The Network program allows us the option of showing the actual mutations along the lines, but this makes the diagram almost impossibly busy and difficult to read. Please be careful to notice that genetic distance is not represented 'as the crow flies,' but only by the paths along the lines. The orientation of circles and their absolute proximity on the page has no meaning. The only thing that 'counts' is the distance along the lines connecting circles. In many cases, there are alternative pathways connecting two circles, though these are always the same length. The significance of alternative pathways is that they represent alternative orders in which mutations might have occurred.
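The point that only distance along the lines counts can be illustrated with a small shortest-path sketch in Python. The network below is entirely hypothetical (circles `A` through `E` with made-up line lengths), and the function name `path_distance` is ours. It uses a standard shortest-path search and is not a description of how the Network program itself works.

```python
import heapq


def path_distance(lines, start, goal):
    """Return the shortest distance from start to goal measured along the lines."""
    graph = {}
    for a, b, steps in lines:
        graph.setdefault(a, []).append((b, steps))
        graph.setdefault(b, []).append((a, steps))
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        dist, circle = heapq.heappop(queue)
        if circle == goal:
            return dist
        if dist > best.get(circle, float("inf")):
            continue  # a shorter route to this circle was already found
        for neighbor, steps in graph.get(circle, []):
            new_dist = dist + steps
            if new_dist < best.get(neighbor, float("inf")):
                best[neighbor] = new_dist
                heapq.heappush(queue, (new_dist, neighbor))
    return None  # the two circles are not connected


# Hypothetical network: A reaches D by way of B or by way of C, both of length 2.
lines = [("A", "B", 1), ("B", "D", 1), ("A", "C", 1), ("C", "D", 1), ("D", "E", 2)]

print(path_distance(lines, "A", "D"))  # 2
print(path_distance(lines, "A", "E"))  # 4
```

However the circles happen to be arranged on the page, A and E are genetic distance four apart in this example, and the two routes from A to D stand for the two orders in which the same pair of mutations might have occurred.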
Details: This is not the place to discuss details of how the Network program makes decisions, but suffice it to say that the program allows users to make many changes in the way Network calculates and displays networks. Anyone interested in the details of this is encouraged to have a look at the Network Users Manual. I have also prepared a simplified, step-by-step set of directions for how I have made the program work for me, which is available here.
Mistakes and Corrections