The role of computational biology in drug discovery: an inside perspective

Dr. David De Graaf

We spoke to Dr. David De Graaf, CEO of the biotech company Abcuro, about the issue of data flow for pharma companies, as well as the different ways he can see computational tools making a big difference in drug discovery.

Let’s start by talking about what you’re doing at Abcuro.

We’re focused on developing two clinical leads – one targeting an autoimmune muscle-wasting disease known as inclusion body myositis, and the other targeting cancer cells. In both cases, the targets are cytotoxic T cells that express a receptor known as KLRG1. For inclusion body myositis, we’re developing an anti-KLRG1 antibody that can selectively deplete cytotoxic T cells present in muscle tissue, effectively removing the source of immune attack. For cancer, by contrast, we want to turn these cytotoxic T cells on and direct them towards the tumor – which we’re doing using a different antibody. It’s challenging to extract value from cancer programs because, in many cases, there is no clear proof of efficacy until you reach phase 2.

And how did you identify that KLRG1 target?

That’s a great question because it relates right back to bioinformatics. The founder of Abcuro is Dr. Steven Greenberg, who is a neurologist. He was being referred patients with inclusion body myositis, but he was becoming very frustrated, because there was absolutely nothing that could be done for them. So, he decided to change that. After taking time out to study bioinformatics at MIT, he came back and set up a small lab. He took muscle biopsies from these patients, cut out the invading T cells that were eating away at the muscle fiber, and used bioinformatics to find a marker that was selective for those T cells and spared the rest of the immune system. The target he found was KLRG1, which is the one we’re using today to deplete these cytotoxic T cells.

Are you still doing computational biology today?

No, but we’d like to! The thing is, for a small biotech company like ours, drug discovery is not a big driver, because that’s not where the business value is. The reality is that drug pipelines don’t pay the bills; successful drugs do. And so companies like ours must be selective: we have to focus on some very specific questions, and it doesn’t matter quite so much how long it takes to get answers. The important thing is that you work out exactly what’s going on. There’s no grand hypothesis in our discovery efforts that would be amenable to large-scale computational studies.

What do you think about companies offering specialist computational biology tools? Do they have a viable business model?

It’s very difficult. The problem with selling computational tools is that your business often becomes about the tool, but that’s not really what the customer cares about, is it? Instead, it should be about what you do with it and how you make a difference.

When I was at Selventa, I remember a trip that we made to a big pharma company, and they looked at the computational tool we’d created, and they said: “Hmm, it’s a great tool, it’s a very efficient and interesting way of analyzing data. But, to validate it, we want you to go through the data set that we’ve already analyzed”. 

Now this sort of situation nearly always ends in failure, because there are essentially two outcomes. The first is that you find exactly what they already found, and they say: “that’s very impressive, you did it in three days rather than three months but, in reality, that difference in timescale doesn’t matter much to us – so no thanks”. Even worse is when you find something new, because then they go on the defensive and say: “we’ve got great people here, are you telling us they’ve been doing it all wrong?”. So, either way, it becomes really hard to show the added value of what you’re providing.

In your opinion, what are the benefits of commercializing computational biology tools?

There are plenty. Top of the list is that, thanks to computational biology, we’ve got a lot better at defining diseases in terms of what’s going on at a molecular level. And that’s important because patients don’t come into the clinic and complain that their PI3 kinase hurts. We need to continue to do that, and make connections through the whole value chain, from identifying symptoms, to understanding the molecular basis of disease, through to drug discovery. Now, that’s a grand aim, so you’d need to carve out something narrower there, but there are plenty of opportunities. For example, we don’t know why patients don’t always benefit as expected from gene therapy. And we don’t really know why people get autoimmune diseases, with a case in point being long Covid. There’s a lot to be gained if you’re prepared to focus on such questions.

Are there any other ways you see computational tools making a big difference?

When I was in big pharma, we often did the exact same assay 15 or 20 times, because it was always much easier to regenerate the data than it was to find the old data and assess the conditions used. That remains a huge inefficiency. I think that computational tools that annotate relevant data, and allow you to search across it, could really pay off.

Another opportunity is being able to take externally curated content and understand it in the context of your own experiments. Then there’s the concept of making a link between patients with a similar molecular phenotype, rather than talking about them purely as a sort of observational phenotype.

And finally, being able to extract value from studies that haven’t worked has massive potential. People don’t think about the fact that 99% of the money in the pharma industry is spent on things that end up going wrong. But I don’t believe it’s right to simply forget about that work – we should bring it together and get some insights from it. I think all these things have the potential to be huge drivers in accelerating drug discovery and development.

You implied an issue with data flow in your previous role, can you explain more about that and what it means for those wanting to commercialize computational biology tools?

Data flow is absolutely an issue, and I’ll give you another example from my time at Selventa. We’d worked with a big pharma company to analyze gene expression data from a couple of their clinical trials to help them decide whether to move particular candidates forward. At the time, they had a poorly defined mechanism of action, and didn’t understand why certain patients responded and others didn’t. Anyway, we crunched the numbers, we figured it out, it was a really nice, productive collaboration, and the CEO invited us over for a meeting, and we thought: “great, we’re in there, these guys want to do a deal with us”.

But it didn’t turn out like that. They took us out the night before, then the next day we talked business, and it all fell to pieces. We had this excruciating 45-minute conversation with the CEO, who spent the entire time apologizing, explaining that because we’d analyzed all their genetic and genomic data, they were going to be busy for the next two or three years dealing with it all. Although they wanted to retain a relationship, they weren’t going to have any new data for us. 

My conclusion is that it’s very difficult to sell a specialist computational tool to a pharma company unless they’ve got a continuous flow of data to feed into it – and not many do. To make it worth their while to construct that information infrastructure, they need to be able to make full use of it in the long-term.

It’s interesting you say that, especially as Paradigm4 has focused on translational medicine, where the data flow is huge and continuous. Proteomics is also an emerging gold mine of data! How would you say those challenges you mention can be tackled?

One approach is to refine your core pitch and make it something that pharma companies can’t do. The problem is that often, the one thing that they can’t do is to use these specialized tools. You end up building a service organization on top of your software offering, and those don’t scale particularly well. It’s very, very hard to extract value from them.

Another way is to provide your own content – for example, to curate and annotate all the data that’s in the public domain and sell it. Now, there may be a continuous need to have that data but, to be frank, the value of that is very small, because it’s not unique to a particular company.

In summary, I think it’s very hard to develop a good business model for these companies. In the research setting, customers will say: “this tool is fantastic, but we haven’t got the data flow to give us good return on investment”, and in the drug development setting, they say: “this tool is still fantastic, but we’re only going to use a little bit of it, so it’s not going to give us return on investment”.

So, it seems you’re saying that these software companies need to have a different business model to be viable in the long term?

Yes, and you can state it in three words. Analyze clinical data. That’s where the money is, and that’s where important decisions are made. It’s true that supporting research using your software platform can be a great way to build confidence. But you’re not going to make money until you’re in the clinic.

Fair enough, but what about the regulatory side?

That’s a good point, and it comes down to the scope of your analysis. Again, thinking about things from the pharma perspective, one of their worries is that if you look at more and more parameters, you’re going to find something that looks wrong. And that has a huge impact on your path through clinical regulation. For example, in a preclinical toxicology model I worked on, we saw induction of a single cytokine out of about 30. That cytokine then needed to be tracked through phase one, phase two and phase three – even though it was completely irrelevant. It even ended up on the label, despite there being absolutely zero evidence of any issue with it. Now, the way I’d approach that is not by saying we shouldn’t generate the data, but by saying upfront: “here are some very specific things we’ll be looking for”. That way, you eliminate the risk of automatically flagging up differences that ultimately don’t matter.

David, thank you for your time and for giving us some fascinating perspectives on computational biology. 

An interesting take on many aspects of the industry and one important point we’ve taken away is that many scientists struggle to use computational tools in their current form, often needing support from software developers. At Paradigm4, we work closely with our customers so that scientists can use the computational tools to their advantage, asking complex questions and assessing key biological hypotheses more efficiently and independently. Our platform is what we like to call ‘science-ready.’ We want to help scientists make a difference with their data. For more information on how our technology can help to transform your single-cell data analysis, contact


Dr. David De Graaf is CEO of Abcuro, a clinical-stage biotechnology company that he joined in late 2020, after having held a string of positions in the biotech sector, including CEO of Comet Therapeutics and Syntimmune, and leadership roles in systems and computational biology at Pfizer, AstraZeneca, Boehringer-Ingelheim, Selventa and Apple Tree Partners. He holds a Ph.D. in genetics from the University of Illinois at Chicago.

Abcuro is developing antibody treatments for autoimmune diseases and cancers modulated by cytotoxic T and NK cells, including ABC008 for treatment of the degenerative muscle condition known as inclusion body myositis, and ABC015 for reactivating bodily defenses against tumors. 

Dr. Matt Brauer - How to grow a biomed startup

Dr. Matt Brauer

Biobanks offer a rich source of data for biomed startups, but the route to deriving maximum benefit from them is not always straightforward. We talk to Matt Brauer, Vice President of Data Science at Maze Therapeutics, about his experience with the UK Biobank, Finngen and other consortia, and the role that making data-management processes open-source has had in delivering business success.

Dr. Lygia Pereira - Thirty years in genetics

Dr. Lygia Pereira

The course of academic research rarely runs smoothly, and through her 30-year career in genetics, Professor Lygia V. Pereira has certainly seen plenty of challenges – but also lots of successes. We talk to her about the importance of genetic diversity to better serve the health needs of countries like Brazil but also to improve our understanding of disease and health across all populations, and why we need to make bioinformatics tools truly user-friendly.
Dr. Ahmed Hamed - Network science

Dr. Ahmed Hamed

In the latest interview in our cutting-edge series, Ahmed Abdeen Hamed, Assistant Professor of Data Science and Artificial Intelligence at Norwich University, explains the power of network sciences to solve intractable data analysis problems and why this approach is proving valuable in the medical sciences.