213: Repeating Tests

Brian:

Let's talk about repeating tests. I said, let's talk about repeating tests. I couldn't help myself. So today we're going to talk about repeating tests: why we might want to do that, and how to use plugins to do it. If a test fails in the test suite, I'm going to want to rerun it.

Brian:

Why am I going to want to rerun it? Because I'm actually hoping that the test fails right away when I rerun it just by itself. If it does fail by itself, then I have a test I can verify the fix with, or a test that might have a problem in the test itself, and I can fix it right there. It's isolated.

Brian:

If it doesn't fail in isolation, I might have a couple of possibilities. I might have a flaky test: the test itself might fail sometimes, possibly because of something in the test. I'm not sure. Or I might have a flaky feature that I'm testing. Either way, I'm going to have to debug it.

Brian:

It also might be that I have a dependency between my tests. Hopefully that's not the case, because I'm using randomization to help with that; see episode 211 about removing test dependencies. So let's say it's not a test dependency. It's either a flaky test or a repeatably failing test.

Brian:

If it's a repeatably failing test, I'm done. But if it sometimes passes and sometimes fails, then I might want to repeat it for a while. So there are a few things I might want to do to repeat a test, or a section of a test suite, for debugging. First of all, it would be neat if the test suite would automatically rerun a failed test. When the test suite runs and hits a failure, maybe just retry it to see if it passes the second time.
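The idea can be sketched in plain Python. This is just an illustration of what a rerun plugin does at the runner level, not actual plugin code, and the flaky test here is made up:

```python
def run_with_reruns(test_fn, max_reruns=3):
    """Run test_fn; on failure, rerun it up to max_reruns more times.

    Returns one boolean per attempt, so a caller can see whether a
    late pass masked earlier failures (a sign of possible flakiness).
    """
    results = []
    for _attempt in range(1 + max_reruns):
        try:
            test_fn()
            results.append(True)
            break  # stop as soon as the test passes
        except AssertionError:
            results.append(False)
    return results

# A deliberately flaky "test": fails on the first call, passes afterward.
calls = {"n": 0}
def test_flaky():
    calls["n"] += 1
    assert calls["n"] > 1

outcomes = run_with_reruns(test_flaky, max_reruns=3)
print(outcomes)  # [False, True]: failed once, then passed on the rerun
```

The important part is that the early failure is still recorded, which mirrors why the plugin's summary output matters: a pass on a rerun is still evidence of flakiness.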

Brian:

But what do I do if it does pass the second time, or if it fails twice in a row and then passes on the third time, or whatever? Well, I still want to know about it. It's still a test failure. But it would help me: I wouldn't have to rerun it manually to see if it passes. I would already know that there's possibly flakiness.

Brian:

If it doesn't fail every time, and that flakiness is there, I might want to take that test in isolation and run it a bunch of times: maybe five times in a row, maybe for half an hour, and see how many times it fails. So there's a reason to repeat there. I also might have a dependency somewhere in my test suite that I've isolated to a small section of the suite. I might want to repeat just that section for a while and see if the failure is repeatable.

Brian:

In any case, repeating is an important thing to be able to do. We're going to take a look at a handful of plugins, four to be exact, that help with this. Three of them repeat tests, and one helps with looking at output. So what are we going to look at? We're going to look at pytest-repeat, pytest-rerunfailures, and pytest-flakefinder.

Brian:

And then, associated with a lot of this, there's a plugin I find helpful that has nothing to do with repeating, called pytest-instafail. Let's take a look at each of these in turn, starting with pytest-rerunfailures. With pytest-rerunfailures, we can say, hey, let's do a reruns value of 3, which means any time it hits a failure, it'll rerun that test. And if it passes, it's fine.

Brian:

But if it fails the second time, it'll rerun it again. So the reruns value is the maximum number of reruns. And, actually, I always forget: if I say reruns 3, does that mean the first failure plus 3 reruns, so 4 runs altogether? I don't remember.

Brian:

I should have looked it up before recording this, but in any case, you can repeat a test if it fails, and it stops after it passes. That's really interesting and helpful information. With the flag, though, it's going to do that on any failure, and it might be just one particular test that's exhibiting the flakiness. In that case, instead of rerunning on every failure, you can put a marker on it.

Brian:

There's a marker included with the plugin: pytest.mark.flaky. You can attach that right to a test to say, hey, this is the one I want to rerun if it fails. It's super handy.
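With pytest-rerunfailures installed, using the marker looks roughly like this; the test name and body here are invented placeholders:

```python
import pytest

def fetch_status():
    # Stand-in for an operation that occasionally misbehaves.
    return "ok"

# Rerun only this test, up to 3 times, if it fails;
# the rest of the suite is unaffected.
@pytest.mark.flaky(reruns=3)
def test_network_fetch():
    assert fetch_status() == "ok"
```

The same rerun behavior can be applied suite-wide from the command line instead, but the marker keeps the scope narrowed to the one suspicious test.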

Brian:

And it's something you can turn on either for everything or for an individual test, and leave running in your test suite, so you can watch it over the course of maybe a couple of days and see how it's behaving. Super helpful. But you have to make sure to take a look at the output at the end. Let's say we have a reruns value of 10 or something like that.

Brian:

And eventually the test passes. That doesn't mean it's really passing, so you need to be looking at that output. Luckily, the plugin does add to the summary how many reruns happened on which tests.

Brian:

So it's very helpful output, but you have to look at it. The next plugin I want to take a look at is pytest-repeat, which is actually one that I've started to help with the maintenance of. I'm not the sole maintainer, but I'm helping with it. So, pytest-repeat. What does it do?

Brian:

Well, there are a few flags you can pass in, but let's say you pass a count of 5. What it's going to do is repeat every single test 5 times, just as if you had parametrized it.

Brian:

It's as if you parametrized each test over a range of 5. It adds a bracketed suffix to the test node name, like 1-5, 2-5, and so on, and it runs each and every test that many times, whether the test passed or failed. It just repeats it. So, back to my scenario.
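That repeat-as-parametrize idea is roughly equivalent to writing the parametrization yourself. This is a sketch of the effect, not pytest-repeat's actual implementation, and its generated IDs look like `[1-5]` rather than a bare index:

```python
import pytest

# Roughly what a count of 5 does to one test: parametrize it over a
# range, so pytest generates five node IDs, test_addition[0] ... [4],
# and runs each one regardless of earlier passes or failures.
@pytest.mark.parametrize("repetition", range(5))
def test_addition(repetition):
    assert 1 + 1 == 2
```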

Brian:

I have my test failure. I might want to rerun that one test 10 times or 50 times and see how many times it passes and fails, zoomed in on one test or maybe a module. Super helpful to be able to just run it a bunch of times. It also might be that I want to beat up on a section of my software for a while.

Brian:

It's a quick way to repeat something. One of the things I love about pytest-repeat is that it's got a scoping system. By default, it's per function: every test function repeats, say, 5 times. But you can set the scope to be wider.

Brian:

You can have it be class, module, or session scope. With session scope, it goes through the entire suite once, then repeats the whole thing again, and then again. You still get the parametrization counts in the individual test node names, but it repeats the whole run in order.

Brian:

I don't think I've used it much at the session level, but I often use it at the module level. I can say, hey, for each module, run all the tests there and then repeat them. I often do that because I've got fixtures set up on a module basis, and it just makes sense that way. Now let's say I've discovered the flaky test or the flaky implementation, and I think I've got it fixed. I just want to beat it up for a really long time to make sure the problem doesn't come back.

Brian:

I might want to run it for an hour, depending on the software; maybe half an hour, maybe ten minutes. If you've got fast software and fast tests, maybe a minute is long enough. But I want to be able to just run it for a while. That's where I pull in pytest-flakefinder.

Brian:

One of the cool things about FlakeFinder is that it does the repeat thing too. Actually, the default is 50: by default, it runs every single test function 50 times. You can change that with a flag or with a setting.

Brian:

You can even change it on a per-test basis by putting a marker on individual tests, which is very handy. I usually use it to zoom in on one flaky test, or one module of tests that are exhibiting the flaky behavior, and just run it for a long time, maybe an hour, maybe a couple of hours.

Brian:

I'm working with hardware systems, so I don't have sub-second test suites; I might even run it overnight. But I want to make sure that it doesn't run too long. I might pick a number like 1,000 test runs, but I don't know exactly how long that's going to take, and when I come back to look at it, I want it to have stopped.

Brian:

So the cool thing about FlakeFinder is that it has a timeout feature. You give it a maximum number of minutes, and the run automatically stops, even if it's not done with all the repetitions. If you give it 10 minutes, it stops after 10 minutes. Super handy to be able to really abuse something and see how flaky it is. With one test, I might see that it ran 30 times and 10 of those runs were failures.
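The repeat-with-a-time-cap idea can be sketched in plain Python. This is purely illustrative, since FlakeFinder does this inside pytest, and the deterministic "flaky" test below is invented so the counts come out exact:

```python
import time

def repeat_until(test_fn, max_runs=1000, max_seconds=600):
    """Run test_fn repeatedly, stopping at max_runs or max_seconds,
    whichever comes first. Returns (runs, failures)."""
    deadline = time.monotonic() + max_seconds
    runs = failures = 0
    while runs < max_runs and time.monotonic() < deadline:
        runs += 1
        try:
            test_fn()
        except AssertionError:
            failures += 1
    return runs, failures

# A deterministic stand-in for a flaky test: fails every third call.
state = {"n": 0}
def test_sometimes():
    state["n"] += 1
    assert state["n"] % 3 != 0

runs, failures = repeat_until(test_sometimes, max_runs=30, max_seconds=5)
print(runs, failures)  # 30 runs, 10 failures: roughly 1 in 3 is flaky
```

The time cap means you can aim for a big run count but still find the run finished when you come back to look at it.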

Brian:

So roughly every third run is a failure. It tells me how flaky things are. I don't entirely know what to do with that metric, though. If it's, say, 1 out of 100, is that good? Is that bad? I don't want any failures, and we don't want flakiness.

Brian:

But FlakeFinder is really cool for working on this. Okay, those are my three repeating plugins. Now I want to talk about instafail. I've got these test runs that are going for a really long time.

Brian:

If I'm watching it... well, I shouldn't be watching it. But maybe I have it in a window while I'm working on something else. I've got some tests running on a separate screen or monitor, or just in another window.

Brian:

And I want to watch it out of the corner of my eye, checking on it once in a while. If I'm doing that, instafail is really great, because what instafail does is let us see the traceback output as failures happen. Normally, pytest runs all your tests, collects all the output, tracebacks, and failures, and then prints them at the end of the run after it's done with everything. Especially for short suites, it's really nice to have those all clumped at the bottom where you can see them.
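The difference between default reporting and instafail-style reporting can be sketched in plain Python; this is an illustration of the ordering only, not pytest internals:

```python
def run_suite(tests, instafail=False):
    """Return the report lines a toy runner would print, in order."""
    lines, deferred = [], []
    for name, fn in tests:
        try:
            fn()
            lines.append(f"{name} passed")
        except AssertionError:
            if instafail:
                lines.append(f"{name} FAILED")     # report as it happens
            else:
                deferred.append(f"{name} FAILED")  # default: report at the end
    return lines + deferred

def ok():
    assert True

def bad():
    assert False

tests = [("test_a", ok), ("test_b", bad), ("test_c", ok)]
print(run_suite(tests))                  # failure shows up last
print(run_suite(tests, instafail=True))  # failure shows up in place
```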

Brian:

But if we're running one test a whole bunch of times, like for an hour, it's really helpful to see the failures as they happen. pytest-instafail lets you do that. Super cool. And now that I'm thinking about this timeout feature of FlakeFinder: I was just looking at the implementation, and it's not that terrible.

Brian:

I think it might be cool to add that to pytest-repeat. pytest-repeat doesn't have a timeout, but that might be nice to add. That'd be cool. So those are the plugins I wanted to let you know about. Make sure to take a look at them if you're trying to reproduce a test failure or some flakiness, or just want to run something for a really long time. So what are those again?

Brian:

There's pytest-repeat, pytest-rerunfailures, pytest-flakefinder, and pytest-instafail. I'll have links in the show notes, so you can check those out at podcast.pythontest.com. Thanks so much for listening.

Creators and Guests

Brian Okken
Host
Software Engineer, also on Python Bytes and Python People podcasts