I'll try to answer most of the issues together.
The primary point is that this change in the fold only exposed otherwise protected heme, causing a major malfunction. These areas that are now exposed existed already. They are not new.
Hm... So what is a "new" binding site? I would say that a non-exposed region is not a binding site, as nothing can bind to it and it is not under selection pressure for any kind of binding. If a newly exposed region has something to bind to it, it is a new, previously non-existent binding site, subject to a new, previously non-existent selection pressure.
Can you provide a specific example of a new binding site or are you only speaking of changes to existing binding sites resulting in changed binding affinities as you indicated?
In species as closely related as human and chimp, most changes are in affinity (there are a lot of examples in the "fine tuning" of gene expression), simply because the time since divergence has not been long enough for a big leap. Given the time available, there is no way that an insertion of about 150 bp (which would code for a section 50 amino acids long, a reasonable length for a new binding domain; "aa" in the rest of the text) would accumulate through mutations in one of these two. As you mentioned, the mutation rate is about 1 per 100,000 base pairs per generation; estimates for eukaryotes are somewhere in that order, or even 1 per 1,000,000. But consider that an insertion is a less likely event than a nucleotide substitution, that less than 2% of the genome is protein coding, that a third of mutations in coding regions are silent because of the redundancy of the genetic code, and that the majority of mutations never get fixed in the population. That a new section of one gene coding for a totally new binding site (not a duplication, not a change, not a duplication followed by change) would appear in such a short time... no, I can't find a specific example of that.
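To make the arithmetic in that paragraph concrete, here is a rough sketch in Python. It only plugs in the figures quoted above (150 bp, the two per-base rates, 2% coding, one third silent); the function names are made up for illustration, and it says nothing about insertions or fixation, which would shrink the numbers further.

```python
# Back-of-envelope numbers from the paragraph above; all names are illustrative.
REGION_BP = 150            # length of the hypothetical new section
RATES = (1e-5, 1e-6)       # per-base, per-generation rates quoted in the post
CODING_FRACTION = 0.02     # "less than 2% of the genome is protein coding"
SILENT_FRACTION = 1 / 3    # "a third of mutations in coding regions are silent"


def codons(region_bp):
    """Number of amino acids a stretch of DNA can code for (3 bases each)."""
    return region_bp // 3


def expected_hits(region_bp, rate_per_bp):
    """Expected number of point mutations in the region per generation."""
    return region_bp * rate_per_bp


def nonsilent_coding_fraction(coding_fraction, silent_fraction):
    """Share of all mutations that land in coding DNA and change an amino acid."""
    return coding_fraction * (1 - silent_fraction)


if __name__ == "__main__":
    print(f"{REGION_BP} bp code for {codons(REGION_BP)} aa")
    for rate in RATES:
        hits = expected_hits(REGION_BP, rate)
        print(f"rate {rate:.0e}: {hits:.2e} expected hits per generation")
    share = nonsilent_coding_fraction(CODING_FRACTION, SILENT_FRACTION)
    print(f"non-silent coding share of all mutations: {share:.4f}")
```

Even before asking how many of those changes get fixed, only a bit more than 1% of all mutations are non-silent changes in coding DNA, and those are substitutions, not the insertion of a whole new section.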
For genes involved in basic metabolic pathways, any new domain is unlikely. Example: human myoglobin (154 amino acids) differs from sperm whale myoglobin in 25 aa, and from shark myoglobin in 88. Now, 500 million years separate humans from sharks, and still our myoglobin has no new binding domain in spite of the aa changes. That is selection pressure: myoglobin has a job to do. There is a twist in myoglobin evolution which I will explain later.
Now, the genes that are subject to quicker evolution are those involved in "fine tuning", that is, in regulation of DNA expression, in the immune system and in cell-to-cell communication. Typical genes from this group usually code for proteins with names something like:
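Expressed as percent identity, the myoglobin figures look like this. It is a minimal sketch that uses only the numbers quoted above (154 aa, 25 and 88 differences) and treats every difference as a simple substitution, which is an assumption; the function name is just for illustration.

```python
MYOGLOBIN_LENGTH = 154  # human myoglobin length as quoted in the post


def percent_identity(length, differences):
    """Percent of positions that are identical between two aligned chains."""
    return 100.0 * (length - differences) / length


if __name__ == "__main__":
    for species, diffs in (("sperm whale", 25), ("shark", 88)):
        identity = percent_identity(MYOGLOBIN_LENGTH, diffs)
        print(f"human vs {species}: {identity:.1f}% identical")
```

So more than half of the positions differ between human and shark, and the protein still does the same single job with no new binding domain.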
- [insert name of protein]-like protein
- hypothetical protein, similar to [insert name of protein]
- cell surface receptor/antigen
- [a name for a short DNA sequence]box binding protein
and so on. In this type of gene, a minor change in aa sequence can change the pattern of protein-protein and protein-DNA interactions. Protein-DNA interactions can be followed with several variants of microarray experiments, where changes in expression are monitored across thousands of genes spotted on a small piece of glass. Small aa changes can result in, for example, 200 genes being upregulated and 200 genes being downregulated. Protein-protein interactions and how they change are not studied beyond individual cases, because there are no cheap systems available to monitor them. A total protein-protein interaction map was done for S. cerevisiae (yeast), but it has only 3000 or so proteins. The 20,000 or so human genes code for several times that many proteins, which are further modified into their mature forms. So one would have to check the binding of each protein against at least 1,000,000 other proteins. That is not possible with current technology.
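To show how fast the number of pairwise tests grows, here is a small sketch that counts unordered protein pairs for the two protein counts mentioned above (3000 for the yeast map, 1,000,000 as the rough human figure). It assumes each pair is tested once and ignores self-interactions and larger complexes; the names are illustrative.

```python
import math

YEAST_PROTEINS = 3_000        # figure quoted for the yeast interaction map
HUMAN_PROTEINS = 1_000_000    # rough count of mature human protein forms from the post


def pairwise_tests(n_proteins):
    """Number of distinct protein pairs, i.e. n choose 2."""
    return math.comb(n_proteins, 2)


if __name__ == "__main__":
    yeast = pairwise_tests(YEAST_PROTEINS)
    human = pairwise_tests(HUMAN_PROTEINS)
    print(f"yeast: {yeast:,} pairs")
    print(f"human: {human:,} pairs")
    print(f"human needs about {human // yeast:,} times as many tests")
```

The number of pairs grows with the square of the number of proteins, which is why a map that was feasible for yeast does not simply scale up to human.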
New protein-protein interactions via new binding sites (totally new, not just changed affinity) are seen only through comparing some of the better-researched enzymes of distant organisms. A classic example is DNA polymerase. In prokaryotes it is very simple; in eukaryotes it binds with a bunch of other proteins (activators, inhibitors, helicases, gyrases...). So here you are: totally new binding sites on the DNA polymerase core protein that evolved to interact with other proteins.
Of course, such a distant comparison is problematic, as it can only deal with proteins present in both organisms, and because of the great difference in generation time between them. Currently, there are not a lot of organisms between mammals and prokaryotes with finished genome sequences (I can think of chicken, 2 fishes and fruit fly - hm, is it still the only invertebrate?). Comparisons between them and human have only recently begun.
To avoid distant species and different generation times, a detailed globin family study was done to monitor "evolution in progress", as my Biochemistry book by Mathews & van Holde says. Now, eukaryotes have a much lower mutation rate than prokaryotes and longer generation times (E. coli - about half an hour, human - 20 years or so). So most evolution in eukaryotes was done by small changes with big consequences in gene expression (see the microarray experiments) or by duplications of individual domains (like transmembrane chains, kinase domains...) or duplications of whole genes. These duplications have the advantage that one copy of the pair can maintain the previous function, while the other can evolve with no fear of strong negative selection pressure. The exon-intron organisation of eukaryotic genes seems to enable such domain duplications, while mobile elements like transposons can copy and paste sections of genes or whole genes. When monitoring the evolution of such a gene in a single species, generation time and mutation rate are constant.
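To put the generation-time gap in numbers, here is a rough sketch using the two figures above (half an hour for E. coli, 20 years for human). It assumes E. coli keeps dividing every half hour without interruption, which is an idealisation that real populations do not meet; the constant and function names are only illustrative.

```python
HOURS_PER_YEAR = 365.25 * 24

ECOLI_GENERATION_HOURS = 0.5    # "about half an hour"
HUMAN_GENERATION_YEARS = 20     # "20 years or so"


def generations_in(span_years, generation_hours):
    """How many generations fit into a time span, given the generation time."""
    return span_years * HOURS_PER_YEAR / generation_hours


if __name__ == "__main__":
    ratio = generations_in(HUMAN_GENERATION_YEARS, ECOLI_GENERATION_HOURS)
    print(f"E. coli generations per human generation: {ratio:,.0f}")
```

Under these assumptions a single human generation corresponds to several hundred thousand bacterial generations, which is why comparing the two directly is so awkward.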
So, the gene coding for the ancestral globin diverged through gene duplication 800 million years ago (estimates obtained by comparing organisms that diverged previously) into myoglobin and hemoglobin. Myoglobin is a single-chain protein holding one heme group. Hemoglobin evolved further 500 million years ago, when it was duplicated into alpha and beta chains. These developed new binding domains that enabled them to form a protein of 2 alpha chains, 2 beta chains and 4 heme groups. These binding domains are not present in myoglobin, which is incapable of binding with other globins. But most of the differences in globin genes/proteins in human (at least 9 functioning globin variants and a bunch of fast-evolving pseudogenes) are in affinities, time of expression of the globin and the tissue where it is expressed. This is where the diversity of higher organisms, like mammals, comes from.