We use three types of diagrams to display project results graphically: relationship diagrams, phylogeny diagrams and network diagrams. These can make relationships among the haplotypes of men in the project easier to see, but reading these diagrams can be tricky, and it is worth spending a little time to make sure you know how they are constructed, what information they contain, and how to interpret them.
- Relationship diagrams are like conventional family descendants charts, except that they show only the male line; these are discussed in Relationship Diagrams.
- Phylogeny Diagrams look something like relationship diagrams, but they are based on a hypothetical order in which STR changes might have taken place, and they do not coincide exactly with facts we know from conventional genealogies. Some of this material is quite technical; beginners may prefer to skip to the Network Diagrams discussion below.
- Network Diagrams look nothing like the other two kinds of diagram. They display genetic distances in a way that makes it especially easy to see clusters of what are or may be closely related men.
Our phylogeny diagrams were prepared using Microsoft Excel. So far, we have prepared a phylogeny diagram only for those individuals in the M222+ (or Ui Niall) haplogroup, which includes only the men in Ewing Groups 1, 2 and 3. Individuals in the diagram are shaded to correspond with their Group membership.
You can read about how Group membership is assigned in the Results Introduction document.
The phylogeny diagram contains all of the information that is displayed in the Results Tables, but it is displayed in a very different way. At the very top of the chart, the box labeled R1b represents the R1b1b1c modal haplotype. The line below that is labeled with the eleven mutations that distinguish this from the M222+ modal haplotype. If you look at the results table on the chart, you can see that each place where the R1b and M222+ modals differ shows up on the line separating these two boxes. Similarly, the line between the M222+ box and the Ewing modal is labeled with the seven mutations that distinguish the M222+ modal haplotype from the Ewing modal haplotype.
Take a look at EL in Group 3b (off to the left of the M222+ box). What this diagram says is that he exactly matches the M222+ haplotype except that he has CDYa = 37 and CDYb = 38 (or CDYa/b = 37/38 for short; these are the two mutations on the line just below the M222+ box, before the line takes off to the left), DYS 456 = 16 (which he shares with all of the other men in Group 3 so far), DYS 447 = 24 and DYS 576 = 17 (which he shares with RA2, the only other man in Group 3b so far), and DYS 464c = 17 (which only he has; this one is shown in red because it is a back mutation and matches the R1b1b1c modal rather than the M222+ modal, as explained below). For any participant on the chart, we can see where his haplotype differs from any of those above him on the chart by just working back up the lines connecting him to the men or modals above him.
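Working back up the lines can also be written out as a small procedure. The Python sketch below is only an illustration: it records the mutations the text lists for EL, and the branch-point names (`branch_37_38`, `group_3`, `group_3b`) are labels invented here, not names used on the chart.

```python
# Each box maps to (box above it, mutations on the line between the two).
# Branch-point names are hypothetical; the mutations are those listed for EL.
TREE = {
    "M222+": (None, {}),
    "branch_37_38": ("M222+", {"CDYa": 37, "CDYb": 38}),
    "group_3": ("branch_37_38", {"DYS456": 16}),
    "group_3b": ("group_3", {"DYS447": 24, "DYS576": 17}),
    "EL": ("group_3b", {"DYS464c": 17}),
}

def differences_from(tree, node, ancestor):
    """Collect the mutations on the lines between `node` and `ancestor`."""
    steps = []
    while node != ancestor:
        parent, mutations = tree[node]
        if parent is None:
            raise ValueError(f"{ancestor!r} is not above the starting box")
        steps.append(mutations)
        node = parent
    merged = {}
    for mutations in reversed(steps):  # top-down, so lower lines override
        merged.update(mutations)
    return merged

print(differences_from(TREE, "EL", "M222+"))
```

Calling `differences_from(TREE, "EL", "group_3")` instead would list only the three mutations that separate EL from the rest of Group 3.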
Back Mutations: Notice that on the line from M222+ to the Ewing modal haplotype, the mutations CDYa/b = 37/38 are shown in red. This signifies that they are 'back mutations,' which means that R1b had these values, they mutated to CDYa/b = 38/39 in the M222+ haplotype, and then they mutated 'back' to CDYa/b = 37/38 in the Ewing modal. Notice also that there are five more mutations below CDYa/b leading to the Ewing modal, but that a line takes off laterally to the branch containing Group 3 before these mutations are shown.
This is because all of the men in Group 3 except HM do share CDYa/b = 37/38 but do not share the other five mutations with the rest of the Ewings. HM has CDYa = 38, which we are interpreting here as a back mutation to the M222+ value.
It is rather interesting that all of the M222+ Ewings not in the closely related group of Ewings (which is to say, all of the Ewings in Group 3) share the M222+ off-modal marker DYS 456 = 16, but we do not know what to make of this.
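One way to make the idea of a back mutation concrete is to track the values a marker has taken on the way down a line. This is a minimal Python sketch, assuming only the CDYa and CDYb values quoted above (37/38 for R1b, 38/39 for M222+); the function name `classify_step` is made up for this example.

```python
def classify_step(history, new_value):
    """Classify a change at one marker given the values seen higher up the line.

    history   -- values of the marker from the top of the chart down to the
                 box just above the line being labeled (oldest first)
    new_value -- value of the marker below that line
    """
    if not history:
        raise ValueError("history must contain at least one value")
    if new_value == history[-1]:
        return "no change"
    if new_value in history[:-1]:
        return "back mutation"
    return "forward mutation"

# CDYa on the way to the Ewing modal: R1b = 37, M222+ = 38, then 37 again.
print(classify_step([37, 38], 37))
# CDYb on the same line: R1b = 38, M222+ = 39, then 38 again.
print(classify_step([38, 39], 38))
# HM in Group 3: his branch carries CDYa = 37, but he has the M222+ value 38.
print(classify_step([37, 38, 37], 38))
```

All three calls report a back mutation, which is why these labels appear in red on the chart.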
Parallel Mutations: Notice now that in the vertical line leading down to TD, the mutation DYS 576 = 19 is shown in blue. This signifies that this is a 'parallel mutation.' This means that there are other men in this chart who also have DYS 576 = 19, but they appear in the chart in a way that shows (or more strictly speaking, makes the claim) that they did not inherit it from a common ancestor with TD. The same mutation occurred twice on 'parallel' branches, if you will, by coincidence rather than because of common descent. Indeed, this mutation appears in four different places on the chart, a couple of which have driven us crazy.
Look in the first row under the Ewing modal haplotype and just to the left of it. DN in Group 1b differs from the Ewing modal only at DYS 576 = 19, and RB and GW in Group 1a also differ from the Ewing modal only at DYS 576 = 19. Since DN, RB and GW have identical haplotypes, they appear in the same node in network diagrams, which are constructed using only the Y-DNA results. Here, we have put them in separate boxes because conventional genealogy shows that RB and GW are descended from John Ewing of Carnashannagh. He cannot have had DYS 576 = 19, because most of his descendants do not have it; RB and GW must instead have inherited it from John Ewing (born 1754).
See the Group 1a Relationship Diagram to see why this must be so.
But DN is not descended from John Ewing (born 1754), so he must have gotten DYS 576 = 19 from somewhere else; that is, there must have been a parallel mutation by coincidence in the line leading from his ancestor James Ewing of Inch Island.
Two more men on the diagram have DYS 576 = 19: RA and AL. What we know about the conventional genealogy of RA does not allow us to connect him with any of the known kinship groups in the chart, but he does not have DYS 391 = 10 (which would put him in Group 2), so we have put him in Group 1*. RA is genetic distance two from the Ewing modal haplotype, so we could have just put him on his own branch below the Ewing modal haplotype showing both mutations. Instead, we have put him below DN and RB/GW with dotted lines going to each, signifying that we do not have any evidence preferring one choice or the other, but the fact that he is only genetic distance one from each of these suggests that he might be related to either.
Notice that RA’s other mutation, DYS 390 = 24, also appears in blue, signifying that it is also a parallel mutation, and is shared by JC in Group 1e and TG in Group 2*. We could move RA’s box to show that he inherited this mutation from a common ancestor with either of these men, but this would require adducing some back mutations and other shenanigans that I will leave it to you to figure out.
On the other hand, AL does have DYS 391 = 10, so he is in Group 2. We have assumed that everyone in the project with DYS 391 = 10 has inherited this from a common ancestor, so we have put this mutation first in the line leading to all of the men in Group 2. The one exception is GR in Group 1b, whose conventional genealogy shows him to be descended from James Ewing of Inch. Since the other descendants of James of Inch do not have DYS 391 = 10, we have concluded that he must have had a parallel mutation at this marker. The alternative is to argue that his conventional genealogy is mistaken.
There is no good reason for the order in which the other two mutations leading to AL are shown, DYS 448 = 19 and DYS 576 = 19. We could as easily have put DYS 576 = 19 first, and then stuck in a branch point with one branch labeled DYS 448 = 19 going to AL, and another going to RA. Can you see what labels would have to be on that branch? It would have to have RA’s other mutation, DYS 390 = 24, and also a back mutation at DYS 391, from 10 back to 11, the Ewing modal at that marker. DYS 391 is a rather slowly mutating marker, and we would like not to have to make claims about frequent mutations at that marker, especially not back mutations, because the probability that DYS 391 would mutate forward and then back within the number of generations that we are speaking about here is rather low.
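A parallel mutation is, in effect, a label that shows up on more than one line of the chart. The Python sketch below finds such labels; it lists only the labels mentioned in this discussion (the real chart carries many more), and the dictionary keys are informal names for the lines, chosen here for illustration.

```python
from collections import defaultdict

# Labels on the line leading down to each box or group, as described in the text.
# This is a partial, illustrative list, not a transcription of the chart.
LINE_LABELS = {
    "TD": ["DYS576=19"],
    "DN": ["DYS576=19"],
    "John Ewing (born 1754)": ["DYS576=19"],
    "Group 2": ["DYS391=10"],
    "AL": ["DYS448=19", "DYS576=19"],
    "GR": ["DYS391=10"],
}

def parallel_mutations(line_labels):
    """Return {mutation: sorted lines} for labels used on more than one line."""
    seen = defaultdict(list)
    for line, labels in line_labels.items():
        for label in labels:
            seen[label].append(line)
    return {label: sorted(lines) for label, lines in seen.items() if len(lines) > 1}

for mutation, lines in parallel_mutations(LINE_LABELS).items():
    print(mutation, "appears on the lines to:", ", ".join(lines))
```

With this input, DYS 576 = 19 is reported on four lines and DYS 391 = 10 on two, while DYS 448 = 19 appears once and so is not flagged.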
Difference Between Phylogeny Diagrams and Relationship Diagrams: Though phylogeny diagrams are a little more like family trees than network diagrams are, there are important differences. One is that we do not have conventional genealogic evidence linking most of the individuals on the diagram, but we show all of them on the same tree anyway. Another is that the vertical distances between individuals on the tree have nothing to do with how many generations separate them, but rather with how many mutations separate them. Indeed, all of the individuals shown on these diagrams are roughly contemporaneous. There are no ancestors shown. Those individuals near the top of the chart that you might think represent ancestors have haplotypes that are closer to what we think the ancestral haplotype was, but this does not mean that they lived closer in time to the ancestors, but rather only that there have been fewer mutations in the line leading from the ancestor to the individuals at the top of the chart than to those at the bottom.
These phylogeny diagrams are not maximum parsimony trees: In biology, phylogeny diagrams are usually constructed by using algorithms designed to make 'maximum parsimony trees.' That is, individual haplotypes are placed on the tree so as to minimize the total number of mutations required to explain the differences among the haplotypes. These diagrams are not like that, because in cases where we have conventional genealogical evidence of a family relationship between two or more men, we have generally forced them to appear on the same branch of the tree even if this requires us to assume more mutations.
We can force an individual haplotype to appear on any branch of the chart by using a suitable combination of parallel and back mutations: If this does not bother you, you are not paying close enough attention. What I am saying here is that we can force the data into virtually any tree structure we like. Take a look at individual GR, who is the rightmost yellow-shaded individual in the chart. Notice that he differs from the Ewing modal at DYS 391 = 10. This mutation is shown on the chart in blue to signify that it is a parallel mutation, or rather that it is our hypothesis that it is a parallel mutation. You may recall that DYS 391 = 10 is what we have used to define Group 2 — the green-shaded individuals on the chart. If we did not have conventional genealogy linking GR to Group 1b, we would have put him in Group 2. Perhaps you can see that if we did that, he would show up in the second row on a new branch, one mutation (DYS 460 = 9) below RC and JM2. That is a more parsimonious solution (because it requires only one DYS 391 = 10 mutation rather than two), but choosing it is the same as arguing that GR is mistaken about his conventional genealogy. Maybe he is. Indeed, the more shenanigans of this kind we have to use to put an individual on the chart where we think they ought to fit, the more likely it is that we are mistaken.
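The trade-off in the GR example comes down to counting mutation events under each placement. The Python sketch below does that count, assuming (as the paragraph implies) that GR carries both DYS 391 = 10 and DYS 460 = 9; the dictionary keys are descriptive labels invented for this illustration.

```python
def count_mutations(line_labels):
    """Total number of mutation events a placement asks us to believe in."""
    return sum(len(labels) for labels in line_labels.values())

# Placement used on the chart: GR stays with Group 1b, so DYS 391 = 10
# has to arise twice (once for Group 2, once in parallel for GR).
GENEALOGY_FIRST = {
    "line to Group 2": ["DYS391=10"],
    "line to GR (under Group 1b)": ["DYS391=10", "DYS460=9"],
}

# Alternative placement: GR hangs below RC and JM2 in Group 2.
PARSIMONY_FIRST = {
    "line to Group 2": ["DYS391=10"],
    "line to GR (under RC and JM2)": ["DYS460=9"],
}

print(count_mutations(GENEALOGY_FIRST), count_mutations(PARSIMONY_FIRST))
```

The genealogy-first placement costs one extra mutation, which is the price paid for trusting GR's conventional genealogy.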
Here is another example. Take a look at Group 2a, which consists of TW2 and all the men below him on the Phylogeny Diagram, and see what we had to do to keep them together. First trace the line from the Ewing modal to TW2. There is nothing unusual in the steps leading to TW2. First, we have DYS 391 mutating from the Ewing modal of 11 to 10, and then CDYa down from 37 to 36, CDYb down from 38 to 37, and then CDYa down another step to 35. TW2 is genetic distance 4 from the Ewing modal, with two steps at CDYa. Now to keep WR and TNS in this group, we had to adduce a back mutation at CDYb from 37 back to 38, and then to show another couple of mutations for TNS, one of them a unique back mutation to the M222+ and R1b modal value of DYS 442 = 12.
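The genetic distance quoted for TW2 can be checked with a few lines of Python. This sketch assumes simple stepwise counting (each one-step change counts as one) and includes only the three markers at which the text says TW2 differs from the Ewing modal; all other markers are taken to match and are left out. Other tools may count multi-copy markers such as CDY differently.

```python
def genetic_distance(a, b):
    """Sum of absolute step differences over the markers both haplotypes report."""
    shared = a.keys() & b.keys()
    return sum(abs(a[marker] - b[marker]) for marker in shared)

# Values taken from the description above; matching markers are omitted.
EWING_MODAL = {"DYS391": 11, "CDYa": 37, "CDYb": 38}
TW2 = {"DYS391": 10, "CDYa": 35, "CDYb": 37}

print(genetic_distance(EWING_MODAL, TW2))  # 1 + 2 + 1 = 4
```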
It is also really interesting to see that JN and DG also both have back mutations at different markers to the M222+ modal — this makes one wonder if we could construct an alternative tree that had Group 2a branching off before the Ewing modal somewhere — like Group 3. But DYS 19 = 15 unifies the whole closely related Ewing group.
We could also root Group 2a on JW and get rid of the back mutation CDYb = 36, but this does violence to the conventional genealogy. Start with William?, put his mutations in first, then work on down.
Our Network Diagrams were prepared using Network, a shareware program from Fluxus Engineering, which is available for free download from their web site. These diagrams include only 37-marker data; project participants who have not been tested for 37 markers do not appear in these diagrams, and only 37 markers are considered for those who have had additional markers tested. Network Diagrams are not family trees and they are not intended to show kinship relationships, but rather show relationships among haplotypes. There is considerable overlap between kinship relationships and relationships between haplotypes, but the two are by no means identical.
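The 37-marker rule amounts to a simple filter on the input data. The Python sketch below shows one way to apply it, assuming each participant's markers are stored as a list in FTDNA order so that the first 37 values are the 37-marker panel; the participant IDs and values are placeholders, not project data.

```python
def restrict_to_panel(haplotypes, panel_size=37):
    """Keep participants with at least `panel_size` markers, truncated to that many."""
    return {
        pid: tuple(values[:panel_size])
        for pid, values in haplotypes.items()
        if len(values) >= panel_size
    }

# Made-up participants with placeholder values.
sample = {
    "P25": [13] * 25,  # tested on 25 markers only, so left out
    "P37": [13] * 37,  # exactly 37 markers, kept as is
    "P67": [13] * 67,  # 67 markers, only the first 37 are used
}

kept = restrict_to_panel(sample)
print(sorted(kept), [len(values) for values in kept.values()])
```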
Circles and Colors: In these diagrams, haplotypes are represented by circles. The size of a circle is proportional to the number of participants who have that exact haplotype. As most participants in our project have unique haplotypes, most of the circles are small and represent just one individual. The largest circle is the one representing the Ewing modal haplotype, because five project participants match the Ewing modal haplotype exactly, so this circle represents five individuals. There are also a couple of circles representing three individuals and some circles representing two individuals. The circles are color-coded to identify which of the Ewing groups each of the project participants in the diagrams belongs to.
In the circles that represent more than one individual, the colors are applied to 'pie slices,' but this is only evident when participants with identical haplotypes are in different groups, as with the Ewing modal haplotype.
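The way circles and pie slices arise can be sketched in Python by grouping participants on their exact haplotype. This is not how the Network program is implemented; it is only an illustration of the idea. The three-value haplotypes stand in for full 37-marker results, and participant `XX` is hypothetical.

```python
from collections import Counter, defaultdict

def build_nodes(participants):
    """Group participants by exact haplotype.

    participants -- iterable of (participant_id, group, haplotype) tuples
    Returns one dict per circle with its members, size and pie slices.
    """
    members = defaultdict(list)
    for pid, group, haplotype in participants:
        members[tuple(haplotype)].append((pid, group))
    nodes = []
    for people in members.values():
        nodes.append({
            "members": [pid for pid, _ in people],
            "size": len(people),
            "slices": dict(Counter(group for _, group in people)),
        })
    return nodes

# DN (Group 1b), RB and GW (Group 1a) have identical haplotypes, so they
# share one circle with two pie-slice colors. Values are placeholders.
people = [
    ("DN", "1b", (11, 37, 19)),
    ("RB", "1a", (11, 37, 19)),
    ("GW", "1a", (11, 37, 19)),
    ("XX", "2", (10, 37, 18)),  # hypothetical participant, unique haplotype
]

nodes = build_nodes(people)
for node in nodes:
    print(node)
```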
Lines: The lengths of the lines connecting these circles are proportional to the genetic distance between the haplotypes represented by the circles. The Network program allows us the option of showing the actual mutations along the lines, but this makes the diagram almost impossibly busy and difficult to read. Please be careful to notice that genetic distance is not represented 'as the crow flies,' but only by the paths along the lines. The orientation of circles and their absolute proximity on the page has no meaning. The only thing that 'counts' is the distance along the lines connecting circles. In many cases, there are alternative pathways connecting two circles, though these are always the same length. The significance of alternative pathways is that they represent alternative orders in which mutations might have occurred.
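Measuring distance "along the lines" rather than "as the crow flies" is a shortest-path calculation on the network. The Python sketch below uses a made-up four-circle network with two alternative routes between circles A and D; the circle names and link lengths are invented for illustration.

```python
import heapq

def path_distance(links, start, goal):
    """Shortest distance from start to goal, measured only along links.

    links -- dict mapping (circle_a, circle_b) to the mutations on that link
    """
    neighbors = {}
    for (a, b), steps in links.items():
        neighbors.setdefault(a, []).append((b, steps))
        neighbors.setdefault(b, []).append((a, steps))
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, steps in neighbors.get(node, []):
            candidate = dist + steps
            if candidate < best.get(nxt, float("inf")):
                best[nxt] = candidate
                heapq.heappush(queue, (candidate, nxt))
    return None  # the two circles are not connected

# Two alternative pathways from A to D (via B or via C), both of length 2.
LINKS = {("A", "B"): 1, ("B", "D"): 1, ("A", "C"): 1, ("C", "D"): 1}
print(path_distance(LINKS, "A", "D"))
```

Where circles A and D happen to be drawn on the page plays no part in the result; only the link lengths do.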
Details: This is not the place to discuss details of how the Network program makes decisions, but suffice it to say that the program allows users to make many changes in the way Network calculates and displays networks. Anyone interested in the details of this is encouraged to have a look at the Network Users Manual. I have also prepared a simplified, step-by-step set of directions for how I have made the program work for me, which is available below.
I use Fluxus Engineering's free shareware program, Network, infrequently enough that I have to spend a little time each time I do it figuring out how it works all over again. Today, I made myself a set of directions for displaying Y-DNA STR data in median joining network diagrams, and I thought maybe some of you would find these useful--especially if you haven't used this program before. You can get the program from www.fluxus-engineering.com.
A user guide is available at http://www.fluxus-engineering.com/Network4510_user_guide.pdf.
If you are going to make this work, you need to pay careful attention to file extensions--when you try to open the files you make, the default file type is almost always wrong, which makes it so that you can't find your file unless you change the file type in the "Open" window. Anyhow, maybe these instructions will help you get started with a useful but not very intuitive program. If not, they didn't cost you much.
- Prepare an Excel file with participant IDs in the first column and markers in FTDNA order (or use another standard order, but be sure to tell the McGee program what order to use in the third step below).
- Copy the IDs and all the markers, but if you have the marker names in the first row, do not copy those.
- Go to http://www.mymcgee.com/tools/yutility.html?mode=ftdna_mode.
- Under "Generate Tables" check the box "Generate Fluxus phylogenetic network.ych data" and uncheck all the other boxes except "FTDNA order haplotype comparison."
- Under "General Setup" uncheck "Show Status." Depending on whether you want a modal haplotype calculated from your data to appear in your completed network diagram, you can check modal haplotype or not, as you wish. The Network program will treat the modal as if it were another individual.
- Paste your Excel data into the "Paste haplotype rows here" field and click "Execute."
- A new window will open. Scroll to the bottom of the new window, labeled "Fluxus data..." Select everything in that window (click in the window somewhere and press Ctrl+A), and then copy and paste it into a text file. I use my Notepad utility for this.
- With your text file open, click "Save As." Under "Save as type" choose "All files." Name the file what you wish, followed by .ych. It is crucially important that the file extension be .ych. Save it on your desktop.
- Alternatively, if you have this data in a file ending with .txt, you can just change the .txt to .ych but it HAS to be .ych.
- Open the Network program. (This is available as a free download from http://www.fluxus-engineering.com/.)
- Click on "Calculate Network" then "Network Calculations" then "Median Joining".
- In the new window, click on "File," then "Open," and then in the "File Type" pull-down menu, choose "Y-chromosomal data file (*.ych)."
- Navigate to the .ych file you made with your plain text program and click open.
- It won't look as though anything has happened, but click on "Calculate network" again and you will see some action.
- A "Save As" window will appear with a default file name the same as your .ych file except with a .out extension. This is a good name to use. Just click "Save."
- You will get a message, "File saved successfully. You may proceed to the Draw Network menu to draw your *.out file." Click OK.
- A new window will open. Click on "Draw network." Another window will open.
- Click on "File," then "Open," and then change the "Files of type" pull-down menu to "MJ or RM out files (*.out)." Now select the .out file you saved above and click "Open." You will get the message "Diagram is not adapted to the screen. It will be redrawn." Click OK.
- You will get the message "The torso has been completed. Do you wish to modify the torso?" Click NO.
- A diagram will appear. At this point, you can do a tremendous amount of fooling around. If you click on a line, it will become highlighted and you can click and drag either end of it without changing the length of the line. If you click on a node without first selecting a line, you can move the node but this will change the length of the line. As a rule, you don't want to do this, because the line length is proportional to the genetic distance. If you do something you don't like, click "Undo." I move things around so they are not stacked on top of one another too badly. If you can't get a hold of something, zoom in so that you can see what you are doing better.
- When you are done with this preliminary fooling around, click "Finalize" and you can fool around some more.
- Now a diagram with node names, "median vectors," and mutated positions appears. Uncheck "Display mutated position" or you won't be able to see anything.
- I usually find that changing font size to 10 works best for me. Then right click on a node name (not the node, the name), change font style to "Bold," check "Apply to all taxa" then OK.
- Right click on a line, then change the link color to light gray and check both "Apply to all links outside torso" and "Apply to all links within torso."
- Double click on a node and it will show you how many individuals are in the node (these will be individuals with identical haplotypes) and will tell you their IDs. In the diagram, each node is labeled only with the ID of the first individual of that haplotype.
- Right click on a node and you can change the color of the node or add pie slices and color them how you like.
- When you are done fooling around, uncheck "Show median vectors." For my purposes these just clutter up the diagram, but you can't move the nodes around while preserving the line lengths without having these show, so get rid of them last.
- If you don't have too many individuals, or if you aren't using too many markers, you might want to click "Display mutated positions" again. This will put the mutations along each link. You can change the color, size and style of type by right clicking on one of the mutation names.
- When you are all done, you can print the result from the File menu, but the result is pretty rough to my taste. I have had better luck taking a screenshot with Alt+PrtScn and then pasting it with Ctrl+V into another document for cropping and resizing (I use PowerPoint, but I suppose lots of different programs would work for this).
- When you close, Network will ask if you want to save the diagram. It will save your diagram as a .fdi file, but opening it is a little tricky. The first time you open it, your machine won't know which program to use. Tell it to always use Network to open this type of file. Then Network will open to its start screen. Click on "Draw network," then "File," then "Open," then change the "Files of Type" pull-down menu to "Formatted diagrams (*.fdi)" and then you can open your .fdi file.
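The file-extension step in the list above can also be done with a few lines of Python instead of the "Save As" dialog. This is a minimal sketch using only the standard library; the function name `as_ych` and the example file name are made up, and the file is assumed to already contain the Fluxus data copied from the McGee utility.

```python
from pathlib import Path

def as_ych(path):
    """Rename a text file so that its extension is .ych and return the new path."""
    source = Path(path)
    target = source.with_suffix(".ych")
    if source != target:
        source.rename(target)  # e.g. ewing.txt becomes ewing.ych
    return target

# Example (hypothetical file name):
# as_ych("ewing.txt")
```

The contents of the file are untouched; only the name changes, which is all Network needs in order to list the file under "Y-chromosomal data file (*.ych)."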
Whew! No wonder it took so long to figure out how to do this. Good luck.