Diffblue Spryte AI Spotlight Company to Watch Award

Diffblue received the Spryte Spotlight Company to Watch award for its groundbreaking product: the only fully autonomous AI-powered Java and Kotlin unit test writing solution, which generates reliable unit tests at scale, both locally and in CI environments.

About Spryte Spotlight

Spryte Spotlight awards and highlights the very best CTOs, tech leads, and technical executives in various industries. Spotlight leaders are at the forefront of digital transformation for their organizations and society at large.

sprytelabs.com/spotlight


Highlights

"What we wanted to do is to use technology to make testing much easier, in particular for developers that spend a lot of time writing tests."

Peter Schrammel

About

Get to know Spotlight leaders. In our interviews we delve deep into how they think and what they’re building.

How Diffblue makes unit testing easier for developers

Testability is the result of good architectural practices

Diffblue's secret sauce: reinforcement learning



Spryte AI Spotlight: Diffblue CTO Peter Schrammel

Okay. All right. Well, welcome, Peter. Peter's CTO at Diffblue. CTO and founder, is that right?

I'm the co-founder as well, yeah. Co-founder.

Okay. So welcome, Peter. And let me start off with a very broad AI question. I think a lot of people are on the fence about AI right now. A lot of people are scared. Do you think AI is the end of humanity, or is it the dawn of a new age for us, in a good way? Where are you on the spectrum?

Yeah, for me, AI is a technological tool. Humanity has developed many of these tools since the beginning: knives, steam engines, and so on. The use of these tools inevitably leads to some problems, and they can be misused, but they also bring about many good things.

So if you're thinking of it as just a technological tool, I would tend to agree with you. Is it on the scale of knives and guns, or is it on the nuclear bomb scale?

I would rather say it's on the nuclear energy scale. It is good, but of course you can also misuse it or it can lead to fatal catastrophes.

Got it. Okay. So it's a new tool, but it's still a pretty big deal. What are you feeling right now? What's the emotion that you get when you're dealing with AI? Is it excitement? Is it fear? Is it both? What are you feeling?

It's both. As I said, every tool can be used for good as well as for bad things.

So when did you actually get into this? At Diffblue you're dealing directly with code, and you're using AI to help people write or maintain code. When did you get started? Give me the origin story of Diffblue.

So we started almost eight years ago. And we started from my background in software verification, with the goal of making sure that software does what it is supposed to do. Testing is obviously one method for doing that. So what we wanted to do is to use technology to make testing much easier, in particular for developers that spend a lot of time writing tests. And this is something that can be automated by putting some intelligence into tools. This is how we started out. Our company tagline soon became "AI for Code," because this is what we provide as a company.

So can you remember the first time? I think most people over the last year were shocked into having to learn about AI through ChatGPT. You've been in this a lot longer than that. When was the first moment AI really surprised you, when you were shocked by an interaction you had with an AI system?

I think what the public perceived as pretty much a revolution was, in reality, a development that has been much more incremental over many decades. What was really revolutionary, in my opinion, was the new interface that OpenAI came up with, which made it easy for everyone to interact with the technology. That has dramatically increased the visibility of the state of the art, of where we actually are in AI research. So I would say the power of the user interface, that was really revolutionary.

Got it. And in your particular field, did you get a similar feeling from GitHub's Copilot, or not really?

Much less, because as a developer you have certain expectations. You see it can sometimes do great things, but sometimes it just spits out garbage, and you discard it and move on. So it's a mixed bag. Sometimes it helps, sometimes it doesn't.

Yes, I think validation is a big problem. Hallucination, and the quality of the output, is a big problem. Maybe not so much for the general public. When I talk to people about ChatGPT, my personal experience is that if you want something general, it seems to be okay. But as soon as you dig into the details, you're bound to find something wrong pretty much everywhere. So in your particular scenario, that's a big issue, right? How do you deal with that challenge?

Yeah, so our system has functionality to make sure that we always produce a correct result. Software is a bit special, because we have software artifacts that we can execute. So we have actual access to the ground truth, and we can make sure that the outputs our AI produces are actually correct. This is very different from many other applications, where everything is fluffy and you can't easily tell what is right or wrong. With software it's much easier: you can execute it and actually see what is going on, what the behavior is. And this is what we exploit in our system to make sure that we always give correct results.
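Because tests are executable, a tool can check its own output mechanically rather than trusting a model. As a minimal sketch of that idea, the snippet below runs a hypothetical generated test class through the standard JUnit Platform launcher and keeps it only if it passes; the GeneratedTest class is an assumption, and this is not Diffblue's actual pipeline:

```java
import static org.junit.platform.engine.discovery.DiscoverySelectors.selectClass;
import static org.junit.platform.launcher.core.LauncherDiscoveryRequestBuilder.request;

import org.junit.platform.launcher.Launcher;
import org.junit.platform.launcher.LauncherDiscoveryRequest;
import org.junit.platform.launcher.core.LauncherFactory;
import org.junit.platform.launcher.listeners.SummaryGeneratingListener;
import org.junit.platform.launcher.listeners.TestExecutionSummary;

public class GroundTruthCheck {
    public static void main(String[] args) {
        // Discover the candidate test class (GeneratedTest is hypothetical).
        LauncherDiscoveryRequest discovery =
                request().selectors(selectClass(GeneratedTest.class)).build();

        // Execute it with the standard JUnit Platform launcher.
        Launcher launcher = LauncherFactory.create();
        SummaryGeneratingListener listener = new SummaryGeneratingListener();
        launcher.registerTestExecutionListeners(listener);
        launcher.execute(discovery);

        // The pass/fail outcome is the ground truth: keep the generated
        // test only if it ran and nothing failed.
        TestExecutionSummary summary = listener.getSummary();
        boolean keep = summary.getTotalFailureCount() == 0
                && summary.getTestsSucceededCount() > 0;
        System.out.println(keep ? "keep generated test" : "discard generated test");
    }
}
```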

Okay. Can you take us a little bit through everything that your system does, for someone who doesn't know anything about Diffblue?

Sorry, I didn't get the question.

Could you take us through, for someone who doesn't know much about unit testing or Diffblue in general, a little bit of what your system does?

Yes, yeah. So we provide software developers with a tool that automatically writes unit tests. And we do this on two levels. First of all, it's integrated into the CI system of a project, so it runs autonomously on the code base and provides the unit tests. This has many advantages. First of all, it removes from developers the burden of actually doing anything. Secondly, for managers, it's great for achieving or enforcing a certain standard of unit testing across the many code bases their teams might be involved with, which might be at very different levels. They can use our tool to get a certain amount of unit test coverage across their entire estate. And this is, of course, a big advantage, in particular when they have projects at very different maturity stages: legacy projects that have no testing at all, and other projects where the developers maybe have a culture of not writing tests, or are too lazy, and so on. So this is really a tool that enables managers to get a grip on their code bases and estates. On the other hand, we also have a plugin that developers can use from their IDE to write tests for the code they're currently working on.
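To make that concrete, here is an illustration of the kind of output such a tool aims for. The PriceCalculator class and the test are hypothetical examples written for this article, not actual Diffblue output:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertThrows;

import org.junit.jupiter.api.Test;

// Hypothetical class under test.
class PriceCalculator {
    double applyDiscount(double price, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent out of range");
        }
        return price * (100 - percent) / 100.0;
    }
}

// The kind of test an autonomous tool could generate: it exercises
// both the normal path and the exception path, with assertions based
// on the observed behavior of the executed code.
class PriceCalculatorTest {
    @Test
    void applyDiscountReturnsReducedPrice() {
        PriceCalculator calculator = new PriceCalculator();
        assertEquals(90.0, calculator.applyDiscount(100.0, 10), 0.0001);
    }

    @Test
    void applyDiscountThrowsOnNegativePercent() {
        PriceCalculator calculator = new PriceCalculator();
        assertThrows(IllegalArgumentException.class,
                () -> calculator.applyDiscount(100.0, -5));
    }
}
```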

Got it. Okay. So, correct me if I'm wrong, but my experience is that most developers have a love-hate relationship with tests, right? And it's not that each person has a love-hate relationship with tests in general; it's that some developers really love them and some developers really hate them. Would I be correct, in your experience, in qualifying it that way?

Yeah, certainly, yes.

Yeah, yeah. So I think in your case, you're covering the code bases where a group of developers isn't handling unit testing very well, and the tool covers that automatically. Can you share some insight into how that could even work? What's the secret sauce within the AI that would enable that to be possible?

So the key ingredient is that it needs to work autonomously. It's not like when you use Copilot, where you get some suggestion that doesn't quite work and you have to fix it up, make it compile, and so on. To really produce code autonomously, it must work out of the box without any human intervention. And the secret sauce behind that is that we use reinforcement learning. So we don't purely rely on some pre-trained models to produce something; we actually learn, as we proceed, what the correct code will be. We use that to optimize simultaneously for the right code coverage, for code that is idiomatic and looks like a human wrote it, and at the same time for code that is correct. We do these things all at once, and the result is that we can produce code that actually compiles and runs and behaves as expected.
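Diffblue has not published the internals of this loop, but the shape Peter describes, propose a test, execute it against the real code, score it, and feed the score back, can be sketched at a very high level. Everything below (the Candidate type and the generateCandidate and runAndMeasure stubs) is hypothetical scaffolding for illustration, not Diffblue's API:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Minimal sketch of a propose-execute-score loop for test generation.
public class TestSearchLoop {

    // A candidate test plus the signals measured by actually running it.
    record Candidate(String testSource, double coverage, boolean compilesAndPasses) {}

    public static void main(String[] args) {
        List<Candidate> kept = new ArrayList<>();
        double bestCoverage = 0.0;

        for (int iteration = 0; iteration < 100; iteration++) {
            // 1. Propose a candidate test (in a real system, from a model).
            String source = generateCandidate(kept);

            // 2. Execute it against the code under test: the ground truth.
            Candidate result = runAndMeasure(source);

            // 3. Reward only candidates that compile, pass, and add coverage;
            //    kept candidates steer the next round of proposals.
            if (result.compilesAndPasses() && result.coverage() > bestCoverage) {
                bestCoverage = result.coverage();
                kept.add(result);
            }
        }

        kept.stream()
                .max(Comparator.comparingDouble(Candidate::coverage))
                .ifPresent(best -> System.out.println(best.testSource()));
    }

    // Hypothetical stubs standing in for the model and the test runner.
    static String generateCandidate(List<Candidate> keptSoFar) { return "..."; }
    static Candidate runAndMeasure(String source) {
        return new Candidate(source, 0.0, false);
    }
}
```

In a real system the reward would combine coverage, correctness, and idiomatic style, as Peter describes; this sketch only shows the execute-and-score feedback structure.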

So developers can just submit their code, and your system, as part of the DevOps pipeline, automatically looks at that code, creates the unit tests, and runs them, and that just gets included as part of the build process.

Exactly, yes. Got it.

Does that create 100% coverage? How do you set coverage?

No, this doesn't create 100% coverage. It's always best effort. It often depends on how the code base is actually written. It's quite easy to write code that is not really testable, not by a human, let alone by a machine. So you won't get 100% coverage; it depends a bit on how well your code is written. But what we see with our customers is that on really bad projects we get 30% coverage, and on really well-written projects we easily get 80% and so on, without any human intervention.

Wow, okay, that's interesting. So how would you define well-written versus poorly written? What's the standard developers need to get up to in order to enable optimal automatic coverage?

Yeah, so testability is usually a question of good architectural practices: the usual story of high cohesion and low coupling. If code is written that well, then it's also easy to test. And the same is true for a machine. If it has to write tests for code that is a plate of spaghetti, then that's really hard, or almost impossible.
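As a generic illustration of that point (our example, not one from the interview): code that hard-wires a dependency is hard for any test writer, human or machine, while injecting the dependency makes the same logic straightforward to test. The Greeter classes below are hypothetical:

```java
import java.time.Clock;
import java.time.LocalTime;

// Hard to test: the dependency on the system clock is hard-wired,
// so no test, human- or machine-written, can control the time.
class GreeterUntestable {
    String greeting() {
        return LocalTime.now().getHour() < 12 ? "Good morning" : "Good afternoon";
    }
}

// Easy to test: the clock is injected, so a test can pass a fixed
// Clock and assert deterministic behavior.
class Greeter {
    private final Clock clock;

    Greeter(Clock clock) {
        this.clock = clock;
    }

    String greeting() {
        return LocalTime.now(clock).getHour() < 12 ? "Good morning" : "Good afternoon";
    }
}
```

A test can construct new Greeter(Clock.fixed(instant, zone)) and get the same answer on every run; that injection seam is exactly the kind of structure an autonomous test writer can exploit.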

Got it. OK, so it's not really about the syntax or the commenting or the way that the code is written. It's more about the architecture and the fundamental understanding from a computer science perspective.

Exactly, yes. Got it. Okay. Well, that's good.

I think if it pushes people towards applying or thinking more about the architecture from the beginning and you get all of those tests for free, that's a positive outcome, I would say.

Exactly. And approaches like TDD aim at the same thing. You use TDD because it enables you to produce better code: code that is better architected, has less coupling, and so on.

Yeah, except I think TDD works really well for a certain type of developer, and somehow it doesn't stick for others. I don't know if it's a personality issue or a training and diligence issue. Actually, what do you think about that? What were the main challenges, I guess before you made the question irrelevant with automated coverage, in building a great TDD team? Is it internal training, or is it more than that?

It depends pretty much on the culture and how you approach a project. To use TDD, you really need to know all the requirements upfront, which is a bit at odds with certain agile practices where you instead have a feedback process to discover the requirements incrementally and build the product incrementally. So TDD has certain problems in this respect. What we generally recommend is that you write the tests for the basic requirements you know upfront using a TDD approach. That is a very good practice, because these tests guide you during the implementation. And then there are many, many corner cases that you still need to write tests for, and for those cases an automated test writing solution is perfectly suited.
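In practice, Peter's recommendation might look like the following hypothetical JUnit 5 sketch (not an example from the interview): encode the known upfront requirement as a test first, and leave the long tail of corner cases to an automated tool:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// TDD-style: this test encodes a known upfront requirement and is
// written before the implementation. Corner cases (negative numbers,
// overflow, unusual formats) are left for an automated tool to cover.
class InvoiceNumberTest {
    @Test
    void invoiceNumbersAreZeroPaddedToEightDigits() {
        assertEquals("INV-00000042", InvoiceNumber.format(42));
    }
}

// Minimal implementation written to satisfy the test above.
class InvoiceNumber {
    static String format(int n) {
        return String.format("INV-%08d", n);
    }
}
```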

Got it. Where do you think this technology is going as far as test automation? Are you planning on going towards integration testing, or more complex test cases? What else is a possible direction for this technology?

Yeah. So we're focusing on unit testing, because this is where most people are lacking and where there is the most to be gained. Other types of tests are certainly an option. People are usually quite good at writing end-to-end tests, because this is what they're most familiar with, although there is a lot to be automated there too; there's quite some potential. For us at Diffblue, the current strategy is to stay with unit testing, because there is still a lot to be done in helping companies adopt unit testing, which is also necessary to benefit from continuous integration. Because if you have no unit tests to run, then your continuous integration is not very useful: you don't get fast feedback.

Yeah, got it. So are there specific industries where you've found you're getting better results or more interest?

I'd guess finance, certainly, since you guys are focusing on Java; those guys need a lot of help maintaining older code bases. Any other industries besides finance that are responding well to your product?

Yeah, finance is certainly one of our sweet spots. There is also interest from telco and pharma and from other companies that have quite substantial Java code bases.

Got it. What was the strategy? Was it a business strategy or a technical strategy to start with Java?

It was a business strategy, because Java is by far the largest language for backend systems, and the business decision was to focus on backend systems. So we said, okay, backend systems means Java, so let's start with that.

Got it. But there's no inherent difference in any language you could think of that would make the approach inapplicable?

There are certainly differences between the languages from a low-level technical perspective.

But as far as applying your technology to different languages goes, it's just a matter of brute force, adding them one by one? Or are there any particular challenges?

In principle, yes. There are no fundamental challenges. It's essentially just engineering.

Got it. Great. So a lot of people are really concerned with AI, right? In the field of robotics, with the use of data, with IP in art, with taking jobs. There are all sorts of controversial topics surrounding the use of AI. Is there one that's particular to your field, testing-wise? Is there a controversial topic there that needs to be addressed?

There's certainly the controversial topic around copyright of the training data, which still causes quite some anxiety, I would say. We at Diffblue don't have this problem, because all our training data is very carefully curated. But other companies that use models whose training data is less carefully curated have to argue that there are no IP implications if you use their tool and generate code with it. So customers are very interested in these questions and want to know that they're on the safe side.

So I guess the main question you're hinting at is customers asking whether you're going to use their code base for training and then apply it to other clients' code bases. Is that the right assumption?

That is one concern, and of course for many customers it's a no-go, because they don't want any of their data to leave the premises. But it also goes the other way around: if you train on some GPL code and that gets infiltrated into your code base, then of course that is undesirable.

So how do you resolve that? Do you start from scratch with each customer, with a model trained on their code base to begin with, or does it start with a base model?

So we start with a base model that is fully under our control, and then we use reinforcement learning on the customer's code to learn from the customer.

So what is that onboarding procedure like? How long does it take to learn? I guess it would depend on the size of the code base, but in your experience, from the case studies you have with customers, what's the typical onboarding time for a product like yours into a new customer's code base?

It's usually a matter of a few weeks, I would say. It depends, of course, on the number of projects they want to onboard, but full integration into CI and training of the developers can usually be done in two to three weeks. This is mostly DevOps setup time and training; the compute itself doesn't take weeks.

Yeah, considering what you get out of it versus having to put a QA team on it, or training your devs to do it, that's a no-brainer in my mind. Other than the data and IP pushback, what are some of the questions most people have about how it works or what the limitations are?

Of course, the level of coverage is always a question. But most customers understand the enormous benefit they get from the tool. Any increase of 20 or 30% is massive, because it would take the developers years or decades to do this work manually. So it's relatively easy to understand that you won't get 100% coverage, but even if you just get a few percent, you have saved a lot of money and time. The questions around data security and IP protection are very important: no code should leave the premises, and all the code generated by the tool needs to be owned by the customer, obviously.

So I guess because of the industries that you're working with, finance, health care, insurance, all of these highly regulated guys, those are the questions that they usually have. So your solution is on-prem?

You guys are installing in their cloud systems or on their hardware?

Yes. So our solution is on-prem, so you can install it yourself. Usually it is installed in the CI system. It's extremely lightweight, because we use a small model and reinforcement learning, so it runs within a CI worker. It doesn't need a lot of compute, and it doesn't call out to any cloud systems. It's quite easy to deploy.

Got it. And any plans for, I mean, I can totally see this, and I definitely understand why it would make sense for a large enterprise. But as a developer myself, is there any hope of getting a single-user license in a SaaS-type system, for developers who want to use it on their own code? Is that in the plans for you guys?

Yeah. So we have a Community Edition, which provides only the plugin that you can use in your IDE right now. You can just download it today if you want, and it runs locally on your laptop; it's lightweight enough for that. We currently don't have an individual license. We are working on that, because we have lots of users of the Community Edition who are not part of large enterprises that can afford an enterprise license, but who still want to use it for their projects on a continuous basis without the restrictions of the Community Edition. So we are working on that, yeah.

And it seems to me like there are, I don't know, maybe 100,000 companies in the US that could really use this software right now. It seems like there are plenty of companies out there that could just plug this in right away and get an immediate benefit. What strategy are you using to reach these people? Are you focusing right now on certain industries, finance, health care? Or is it really open for anybody who wants to come in and use it?

So the Community Edition is open for anybody to use, and this is one of our main entry points for new customers. Usually some developers download the tool, try it out in the IDE, and then approach their managers to buy a license. This is one of the ways, and it works quite successfully. Of course, we also have people who approach us directly from higher management levels and say, yeah, we're interested in the solution, we want to do a proof of concept, and so on. Then we start with one team, and they expand to more teams within those accounts. This is our usual approach. But we are not really restricting it to particular verticals. It's more that there are certain problems people want to solve, like wanting to re-architect code bases and realizing they don't have enough tests, and then they're looking for these kinds of solutions. These are the kinds of customers we're looking for.

Got it. Okay. So, a shout out: I think anybody who's led a tech team and has experience trying to get tests out of their developers should head over to your page and try it out.

Yeah. Thank you so much for your time. I really appreciate it. It's great to learn about this. I think you've got an application and a strategy, the strategy of making it hands-off, right? I think that's a unique thing we've seen about Diffblue. Out of all the AI companies working right now, most are still at the stage where they require human input and checking. And I think the application of having an AI system that does things for you automatically is a pure benefit; it's something very different from many AI systems out there. So I think it's super interesting. Any upcoming announcements, or what are you looking forward to in 2024? Anything people should know about Diffblue? What are you working on that's next level?

So, we're exploring integration with more systems, such as CI providers like GitHub Actions, GitLab, or Jenkins, which are quite popular. Our aim is to simplify the deployment of our solution within these systems. Additionally, we're looking into expanding compatibility with more IDEs and similar tools. Naturally, we're continuously striving to offer our users more comprehensive and effective tests.
