Getting Personal: Omics of the Heart: Ep 18 Khetarpal

Jul 23, 2018

Jane: Hi, everyone. Welcome to Episode 18 of Getting Personal: Omics of the Heart. I'm Jane Ferguson, and this podcast is brought to you by the Circulation: Genomic and Precision Medicine Journal and the American Heart Association Council on Genomic and Precision Medicine. It is July 2018, which means that the best possible place to be listening to this episode is at the beach, but failing that I can also recommend listening on planes, during your commute, while exercising or while drinking a nice cup of tea.

So before I get into the papers we published this month, I want to ask for your help. If you're listening to this right now, hi, that means you. We're a year and a half into podcasting, and I would love to know what content you like and where we could improve things. We have a poll up on Twitter this week, and I would really appreciate your input. If you're listening to this a little bit later and missed the active voting part of the poll, you can still leave suggestions.

Okay, so what I would like you to do right now is to go to Twitter. You can find us as Circ_Gen and locate the poll. If you don't already follow us on Twitter, go do that now too. We want you to let us know what content we should focus on and what is most useful to you, so go ahead and pick your favorites from the options and also please reply or tweet at us with other thoughts and suggestions.

Options include giving summaries of the recent articles, like I'm about to do later this episode; conducting interviews with authors of recently published papers; and interviews with people working in cardiovascular genomics on broader topics, for example to get their insight on career paths and lessons learned along the way.

And something we have not done yet on the podcast but are considering, would be to record podcasts that focus on particular topics in genomics and precision medicine. These could give some background on an emerging field or technology and we could talk to experts who are leading particular innovations in the field. So, if that sounds good to you, let me know! If you're not on Twitter, I don't want to exclude you, so you can email me at jane.f.ferguson@vanderbilt.edu and give me your thoughts that way. I'm looking forward to hearing from you.

Okay, so on to the July 2018 issue of Circ.: Genomic and Precision Medicine. First up is a PheWAS from Abhiram Rao, Eric Ingelsson, and colleagues from Stanford. The discovery of the PCSK9 gene as a regulator of cholesterol levels has led to a new avenue of LDL-lowering therapies through PCSK9 inhibition. However, some studies suggest that long-term use of PCSK9 inhibitors could have adverse consequences. Because of the long follow-up time required, it will take many more years to address this question through clinical studies. However, genetic approaches offer a fast and convenient alternative to address the issue.

In this paper, entitled: "Large Scale Phenome-Wide Association Study of PCSK9 Variants Demonstrates Protection Against Ischemic Stroke," the authors use genetic and phenotype data from over 300,000 individuals in the UK Biobank to address whether genetic loss-of-function variants in PCSK9 are associated with phenotypes including coronary heart disease, stroke, type 2 diabetes, cataracts, heart failure, atrial fibrillation, epilepsy, and cognitive function.

The missense variant rs11591147 was associated with protection against coronary heart disease and ischemic stroke. This SNP was also associated with type 2 diabetes after adjustment for lipid medication status. Overall, this study recapitulated the associations between PCSK9 and coronary disease, and revealed an association with stroke.

Previous studies suggested that use of LDL-lowering therapies may increase risk of cataracts, epilepsy, and cognitive dysfunction, but there was no evidence of association in this study. Overall, this study provides some reassurance that the primary effect of PCSK9 is on lipids and lipid-related diseases, and that any effects on other phenotypes appear to be modest at best. While a PheWAS can't recapitulate a clinical trial, what this study indicates is that PCSK9 inhibition is an effective strategy for CVD prevention, which may confer protection against ischemic stroke and does not appear to convey increased risk for cognitive side effects.

Next up we have a manuscript from Jason Cowan, Ray Hershberger, and colleagues from Ohio State University College of Medicine. Their paper, "Multigenic Disease and Bilineal Inheritance in Dilated Cardiomyopathy Is Illustrated in Non-segregating LMNA Pedigrees," explored pedigrees of apparent LMNA-related cardiomyopathy, identifying family members who manifested disease despite not carrying the purported causal LMNA variant. Of 19 pedigrees studied, six had family members with dilated cardiomyopathy who did not carry the family's LMNA mutation. In five of those six pedigrees, the authors identified at least one additional rare variant in a known DCM gene that was a plausible candidate for disease causation.

Presence of additional variants was associated with more severe disease phenotype in those individuals. Overall, what this study tells us is that in DCM, there is evidence for multi-gene causality and bilineal inheritance may be more common than previously suspected. Future larger studies should consider multi-genic causes and will be required to fully understand the genetic architecture of DCM.

Yukiko Nakano, Yasuki Kihara, and colleagues from Hiroshima University published a manuscript detailing how HCN4 gene polymorphisms are associated with tachycardia-induced cardiomyopathy in patients with atrial fibrillation. Tachycardia-induced cardiomyopathy is common in subjects with atrial fibrillation, but the pathophysiology is poorly understood. Recent studies have implicated the cardiac hyperpolarization-activated cyclic nucleotide-gated channel gene, or HCN4, in atrial fibrillation and ventricular function.

In this paper, the authors enrolled almost 3,000 Japanese subjects with atrial fibrillation, both with and without tachycardia-induced cardiomyopathy, as well as non-AF controls. They compared frequency of variants in HCN4 in AF subjects with or without tachycardia-induced cardiomyopathy, and found a SNP, rs7164883, that may be a novel marker of tachycardia-induced cardiomyopathy in atrial fibrillation.

Xinyu Yang, Fuli Yu, and coauthors from Tianjin University were interested in finding causal genes for intracranial aneurysms, and report their results in a manuscript entitled, "Rho Guanine Nucleotide Exchange Factor ARHGEF17 Is a Risk Gene for Intracranial Aneurysms." They sequenced the genomes of 20 Chinese intracranial aneurysm patients to search for potentially deleterious, rare, and low-frequency variants. They found a coding variant in the ARHGEF17 gene which was associated with increased risk in the discovery sample, and which they replicated in a sample of Japanese IA patients and in a larger Chinese sample.

They expanded this to other published studies, including individuals of European-American and French-Canadian origin, and found a significantly increased mutation burden in ARHGEF17 in IA patients across all samples. They were interested in further functional characterization of this gene and found that zebrafish ARHGEF17 was highly expressed in blood vessels in the brain. They used morpholinos to knock down ARHGEF17 in zebrafish, and found that ARHGEF17-deficient zebrafish developed endothelial lesions on cerebral blood vessels, and showed evidence of bleeding consistent with defects in the vessel. This study implicates ARHGEF17 as a cerebrovascular disease gene which may impact disease risk through effects on endothelial function and blood vessel stability.

Sumeet Khetarpal, Paul Babb, Dan Rader, Ben Voight, and colleagues from the University of Pennsylvania used targeted resequencing to look at determinants of extreme HDL cholesterol in their aptly titled manuscript, "Multiplexed Targeted Resequencing Identifies Coding and Regulatory Variation Underlying Phenotypic Extremes of HDL Cholesterol in Humans." Stay tuned because we're gonna hear more about this paper from the first author Dr. Sumeet Khetarpal later this episode.

Rounding out this issue we have a Perspective article from Chris Haggerty, Cynthia James, and coauthors from Geisinger and Johns Hopkins Medical Center entitled, "Managing Secondary Genomic Findings Associated With Arrhythmogenic Right Ventricular Cardiomyopathy: Case Studies and Proposal for Clinical Surveillance." In this paper the authors discuss the challenges for returning findings from clinical sequencing for arrhythmogenic right ventricular cardiomyopathy, presenting case studies exemplifying these challenges. They also propose a management approach for returning clinical genomic findings, and discuss new innovations in the light of precision medicine.

We also published a review article by Pradeep Natarajan, Siddhartha Jaiswal, and Sekar Kathiresan from MGH on "Clonal Hematopoiesis: Somatic Mutations in Blood Cells and Atherosclerosis", which discusses recent advances in our knowledge of the role of somatic mutations in cardiovascular disease risk.

Finally, we have an update on some pharmacogenomics research into CYP2C19 Genotype-Guided Antiplatelet Therapy by Craig Lee and colleagues which we published a few months ago. Dr. Lee was also featured on Podcast episode 15 in April of this year.

Jernice Aw and colleagues from Khoo Teck Puat Hospital, Singapore shared complementary data from their sample of 247 Asian subjects, which found that the risk for major adverse cardiovascular events was over 30-fold greater for poor metabolizers on clopidogrel, as defined by CYP2C19 genotype, compared to those with no loss-of-function allele.

You can read that letter and the response from Dr. Lee and colleagues online now. And, as usual, all of the original research articles come with an editorial to help give some more background and perspective to each paper. Go to circgenetics.ahajournals.org to find all the papers and to access video summaries and more.

Our interview is with Dr. Sumeet Khetarpal, who recently completed his MD-PhD training at the University of Pennsylvania, and is currently a resident in Internal Medicine at Massachusetts General Hospital. Sumeet kindly took some time out from his busy residency schedule to talk to me about his recently published paper, and to explain how molecular inversion probe target capture actually works.

So I am here with Dr. Sumeet Khetarpal who is co-first author on a manuscript entitled, "Multiplexed Targeted Resequencing Identifies Coding and Regulatory Variation Underlying Phenotypic Extremes of High-Density Lipoprotein Cholesterol in Humans."

Welcome Sumeet, thanks for taking the time to talk to me.

Dr. Khetarpal: Thank you so much Dr. Ferguson, it's really a pleasure to talk to you today.

Jane: Before we get started, maybe you could give a brief introduction on yourself and then how you started working on this paper.

Dr. Khetarpal: Sure, so this work actually was a collaboration that came out at the University of Pennsylvania that I was involved with through my PhD thesis lab, my mentor was Dan Rader, and also a lab that is a somewhat newer lab at Penn, Benjamin Voight's lab which is a strong sort of computational genomic lab.

This work actually highlights the fun of collaborating within your institution. We had, for some time, been interested in developing a way to sequence candidate genes, both known genes and also new genes that have come out of genome-wide association studies, that underlie the extremes of HDL cholesterol, namely very high HDL cholesterol versus low HDL cholesterol. We've been looking for a cost-effective and scalable way to do this.

Independently, Ben, who is very interested in capturing the non-coding genome, was interested in developing a method to better understand the non-coding variation, both common and rare variation that may be present at all of these new loci that have come out for complex traits such as HDL.

We, at some Penn event several years ago, were talking about our common interest, and Ben had actually identified this work that had come out of Jay Shendure's lab at the University of Washington: a paper by the first author, Brian O'Roak, in Science in 2012, in which they had developed an approach that involved molecular inversion probes, or MIPs, to capture regions of the genome and target the genes that they were interested in studying for autism spectrum disorders.

They had applied this largely to coding regions of, I think, almost 50 genes and almost 2,500 patients with the feedback to do deep, targeted sequencing. So our thought was, well, we could try to apply this approach and adapt it to capture non-coding regions, and also see if we can expand the utility of this approach to study the phenotypic extremes of a complex trait such as HDL cholesterol.

Jane: Yeah, that's really cool. I love how you saw this method in a totally different application and then realized that there was expertise at Penn that you could bring together to apply this in a different way.

I'd love to hear more about this MIP, the molecular inversion probe. How does it work? How difficult is it to actually do? Is it very different from normal library preparation for sequencing or is it something that's actually relatively easy to apply?

Dr. Khetarpal: These MIP probes are oligonucleotide probes that capture your region of interest by flanking it and capturing it by gap filling. They're a method to capture parts of the genome in a library-free way. They do ultimately involve barcoding, the way traditional library-based target capture does, and then deep sequencing.

But the most impressive feature about them is just that they're very scalable. I think in the original paper by O'Roak and colleagues they were able to sequence their set of genes and their set of samples at a sample preparation cost of about $1 per sample, and we were actually able to do about the same for our study.

The main utility of the approach is just the economic scalability, and the ability to customize your panel to capture several regions of the genome that are adjacent to each other.

Jane: Right, so how many genes or regions can you multiplex at the same time? Is it just one prep, like you just design all of your oligos, you put them all together in one reaction, or are you doing separate reactions for each region?

Dr. Khetarpal: We're actually doing all of our oligos together. In our case, I think it ended up being on the order of almost 600 oligos together to capture ultimately the 50 kb of genomic territory that we wanted to capture. Really, our study was kind of a pilot experiment where we picked a few genes or regions of high interest to us, both known genes that affect HDL and also those that have been implicated in genome-wide association studies that were of high interest to our labs.

I think that this approach could actually be expanded to capture much more genomic territory in a single capture reaction. We sort of touched the surface probably of what we could do.

Jane: Wow, that's cool! And then for sequencing it, I guess it's really just a function of how many samples you wanna multiplex and how much you want to sequence from each region. So I suppose the way you did it, you had about 50 kb and then you had over 1,500 participants and you were able to do those on a single HiSeq run, right?

Dr. Khetarpal: Right.

Jane: So I suppose if you'd done more genetic regions, you would've had fewer people and vice versa so you can balance that out depending on if you're having more samples or more genomic regions to sequence.

Dr. Khetarpal: Exactly. In certain ways, with the design of our experiment, we had a limited sample size, and that did afford us some luxury in terms of knowing that we would have deep coverage of the regions that we were targeting. I think that's always a critical question in targeted sequencing, or just sequencing in general. The balance between the number of regions that you want to sequence and the number of samples you want to sequence is going to dictate what your sequencing depth will be.
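As a rough illustration of the trade-off described above, mean per-base depth can be approximated as the usable sequenced bases divided by the number of samples times the target size. The sketch below is only a back-of-the-envelope estimate: the 50 kb target and roughly 1,500 samples come from the conversation, while the read count, read length, and on-target fraction are hypothetical values chosen for illustration and are not figures from the study.

```python
def mean_depth(total_reads, read_length_bp, n_samples, target_bp,
               on_target_fraction=1.0):
    """Approximate mean per-base coverage for each sample.

    Assumes reads are split evenly across samples and spread uniformly
    over the target, which real capture data only approximates.
    """
    if n_samples <= 0 or target_bp <= 0:
        raise ValueError("n_samples and target_bp must be positive")
    usable_bases = total_reads * read_length_bp * on_target_fraction
    return usable_bases / (n_samples * target_bp)


# Hypothetical run: 300 million reads of 100 bp, half of them on target.
# These three numbers are assumptions, not values reported by the authors.
RUN_READS = 300_000_000
READ_LENGTH = 100
ON_TARGET = 0.5

# About 1,500 samples and a 50 kb target, as mentioned in the interview.
baseline = mean_depth(RUN_READS, READ_LENGTH, 1_500, 50_000, ON_TARGET)

# Doubling the target to 100 kb with the same run halves the depth.
bigger_target = mean_depth(RUN_READS, READ_LENGTH, 1_500, 100_000, ON_TARGET)

# Halving the sample count restores the original depth.
fewer_samples = mean_depth(RUN_READS, READ_LENGTH, 750, 100_000, ON_TARGET)

print(f"50 kb, 1,500 samples: {baseline:.0f}x")
print(f"100 kb, 1,500 samples: {bigger_target:.0f}x")
print(f"100 kb, 750 samples: {fewer_samples:.0f}x")
```

Under these assumed inputs, depth scales inversely with both the number of samples and the size of the target, which is why adding regions means either fewer samples per run or shallower coverage.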

Jane: Right, okay so I guess if we go on to what you actually found, how'd you pick this? You picked seven regions which encompass eight candidate genes for HDL, so how did you select those?

Dr. Khetarpal: The population that we were studying, the samples we were looking to sequence, were largely individuals who fall into two bins, if you will. One was extremely high HDL cholesterol, which we're defining as greater than the 95th percentile, but really there was a range within that population that spanned individuals with probably greater than the 99th percentile of HDL.

We were hoping, as a proof-of-principle effort, to identify variation in genes that were known causes of high HDL cholesterol from prior studies of Mendelian genes for HDL. So genes such as LIPG, which encodes endothelial lipase, or CETP or SCARB1. These three genes are, at this point, well-known genes in which loss-of-function mutations are associated with extremely high HDL. We thought that capturing some of those genes would potentially provide a level of validation for the approach, hypothesizing that individuals with high HDL would be enriched for variants in these genes, but might also allow us to find new variants in these genes, or non-coding variants, which have not previously been studied.

Some of the genes came out from that line of thinking, then some of the other genes happened to be genes that in the Rader laboratory we had a vested interest in understanding the genetic variation that might link the genes to HDL, which may not have necessarily come out before.

For example, the gene GALNT2 is one of the first GWAS-implicated novel genes for HDL, novel in that the earliest GWAS for plasma lipids had identified that gene as associated with HDL, but it had never come out before as being so. Our laboratory was very interested in better understanding the genetic relationship between genes such as GALNT2 and several of the others, such as CCDC92 and ZNF664, with HDL.

It ended up being a hodge-podge or a sampling of genes that had at some level been implicated with HDL, but really it's just a proof of principle that this method could work for both identifying variation in known genes and also less studied ones.

Jane: You validated the MIP genotyping by exome genotyping, and then saw concordance of over 90%, is that lower than you were expecting? Was it about what you were expecting based on these two different methods of genotyping?

Dr. Khetarpal: Yes, I think we were expecting somewhere on the order of 90 plus percent. It's hard to know why we just hit that. We likely would've benefited from being able to genotype, by the exome chip, all of the individuals that we had sequenced, whereas we were able to validate in about two-thirds of those individuals.

It's hard to know exactly what the cause of the about 10% discordance rate might be, whether it's just in certain samples the genotyping quality was perhaps on the border of being valid or the sequencing quality.
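For readers who want to see what a concordance figure like this measures, here is a minimal sketch, assuming genotypes from each platform are stored as alternate-allele counts (0, 1, or 2) keyed by sample and variant. The function name, the sample and variant labels, and the toy genotypes are all hypothetical and are not taken from the study; the authors' actual comparison pipeline is not described in this episode.

```python
def genotype_concordance(calls_a, calls_b):
    """Fraction of shared, non-missing genotype calls that agree.

    calls_a and calls_b map (sample_id, variant_id) to an alternate-allele
    count (0, 1, or 2), or to None when the platform made no call.
    """
    shared = [
        key for key in calls_a
        if key in calls_b
        and calls_a[key] is not None
        and calls_b[key] is not None
    ]
    if not shared:
        raise ValueError("no overlapping genotype calls to compare")
    matches = sum(1 for key in shared if calls_a[key] == calls_b[key])
    return matches / len(shared)


# Toy data: five hypothetical samples at two hypothetical variants.
samples = ["s1", "s2", "s3", "s4", "s5"]
mip_calls = {}
chip_calls = {}
for sample in samples:
    mip_calls[(sample, "var_A")] = 0
    chip_calls[(sample, "var_A")] = 0
    mip_calls[(sample, "var_B")] = 1
    chip_calls[(sample, "var_B")] = 1

# Introduce one discordant call and one call missing from the chip.
chip_calls[("s5", "var_B")] = 2
mip_calls[("s6", "var_A")] = 1  # s6 was sequenced but not genotyped on the chip

rate = genotype_concordance(mip_calls, chip_calls)
print(f"concordance: {rate:.0%}")
```

In this toy example only the ten sample and variant pairs present on both platforms are compared, so the one sample without chip data does not count for or against concordance. That mirrors the situation described above, where validation was possible in only a subset of the sequenced individuals.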

Jane: Right, I'm wondering sort of with the MIP, what's the gold standard? Is the exome chip genotyping still the gold standard and the MIP maybe is more error-prone, or perhaps the other way around? Or is it that you can't tell at this point which is the true genotype and which is potentially an error for those discordant ones?

Dr. Khetarpal: Certainly whenever there's a new sequencing methodology that is proposed, I think it's critical to have some sort of validation. We happened to cover regions that would span the genome enough that we had exome chip genotyping in a large subset, so that might be the best approach. But if you had a limited number of regions or variants that you were interested in, one could imagine also doing Sanger sequencing as the tried and tested validation approach. Of course it becomes not so scalable at a certain point.

Certainly we would say that the MIPs, while the method has been developed and expanded by the Shendure lab, our hope is that through our studies maybe it will be applied further. It's still very much a new approach and so validation is key.

Jane: Very important. What do you think was the most exciting finding that came out of this, after you analyzed the data, what were you most excited about seeing?

Dr. Khetarpal: The critical finding for us, which I think implies the utility of the approach, was just the validation of four of the loci that we had studied. Validation in our cohort of known genome-wide significant associations for HDL that had been published previously in almost 200,000 individuals in terms of sample size, in our experiment involving just about 1,500 people we were able to find consistent associations of those same variants that segregated with low versus high HDL. Directionally consistent with the large genome-wide association studies.

I think the value of this finding is really just to emphasize the utility of the case control design in these phenotypic extremes, in addition to the overarching goal of our study, which was in a way that perhaps provides the most validation of the approach in terms of concordance with prior known studies.

Jane: So if somebody was listening to this and was trying to decide should they use MIP for a study they have in mind, should they use another technique? Based on your experience, what would you recommend?

Dr. Khetarpal: I think in our current stage it's a very exciting time because we're just seeing whole genome sequencing really take off and being used at scale to ask critical questions about non-coding variation as it relates to both disease and complex traits. I don't think we're quite there yet with being able to apply that approach in a cost-effective manner. The ability to annotate and analyze that data is still in its infancy. The utility of the MIPs is that they provide a very cheap alternative.

I can say from my experiences actually doing the capture and preparation from sample to sequencer stage that it's a very easy to use methodology that is very fast and cheap. That if one is really interested in a handful, or more than a handful, of candidate genes and their non-coding regions as it relates to a trait or disease of interest, it may not be the era for going full on with whole genome sequencing, especially at the current cost. That's where I think the MIPs really come in to be very useful.

Jane: It sounds great, is there anything else that you'd like to mention?

Dr. Khetarpal: Just to say that we recognize it's a relatively small study as our pioneer approach with this method, but that the Rader and Voight labs are actively pursuing larger applications of this to study not only HDL, but other complex traits, such as diabetes, in much larger populations. I can't overemphasize how easy of a method it is to apply, but I also think a bigger take-home of this study for me, as a very recent graduate student working in a very collaborative institution, is the ability of two laboratories to come together with different sets of expertise to try to tackle a problem that I think goes beyond the individual science. For any human geneticist, how to find the variation you're interested in and not break the bank is kind of at the core of what we do, and so I think it was very fun to be part of this collaboration, and our hope is that the outcome of it is a method that can be useful for many people, both in our field and beyond.

Jane: I think it's great and I'm hoping this will inspire a lot of other people to try this method and see if it can work for them. So, congratulations on the study, it's really nice work.

Dr. Khetarpal: Thank you so much!

Jane: That's all I have for you for July, thanks for listening. Send me your thoughts on the podcast via Twitter or email, or leave us a review in iTunes. I look forward to talking to you next month.