What is A/B testing, and how can you use it to grow your MARKETING and SALES conversions?
SPEAKERS
Nils Koppelmann (76%), Dave Erickson (16%), Botond Seres (8%)
Dave Erickson 0:03
So many choices. Can't choose which one will 10x your company? Time to see if AB testing can help you grow your business. The key to successful marketing and sales initiatives is to use an iterative approach and test everything you do to find out what will get the best results. We're going to talk about whether A or B will work best on this ScreamingBox Podcast. Please like our podcast and subscribe to our channel to get notified when the next podcast is released.
Making the wrong decision can cost you thousands. But can AB testing make you millions? Welcome to the ScreamingBox technology and business rundown podcast. In this month's podcast, my co-host Botond Seres and I, Dave Erickson, are going to see if A is better than B by talking to Nils Koppelmann, founder of 3Tech. Nils has a background in software engineering and design and studied computer science in Berlin. He focuses on helping mission-driven companies grow online and enables them to make better decisions. In this podcast, we are going to test the limits of AB testing and find out if it can help your business grow. So Nils, anything you'd like to add?
Nils Koppelmann 1:37
It's just perfect. I'm surprised how well stitched together it is.
Dave Erickson 1:42
Oh thanks. So everyone kind of has their own understanding of what they think AB testing is. Maybe you can kind of give us your definition or your concept of what you think AB testing is.
Nils Koppelmann 1:56
Yeah, sure. So I mean, depending on the background people come from, AB testing is oftentimes named in the same sentence as CRO, conversion rate optimization. This is also where I had my first touch points with AB testing. But if we look at AB testing, the essence is really a tool to test two variations of something, that could be a website or an email, and see which one of those performs better on a given metric. That is the essence of AB testing, and there are so many things you can go into with it to make all sorts of decisions; that could be, as I said, a change in an email headline or body. What we do primarily is for ecommerce and SaaS, where we help them optimize their website, their landing pages, and all these kinds of things. The essence is reducing the risk of just rolling out changes without knowing the outcome. That's probably my first take on it; let's see where we go in this podcast.
Botond Seres 3:09
Well, Nils, AB testing has a lot to do with statistics, and with anything statistical we can't help but wonder: in your opinion, what is the ideal sample size? Or what is the minimum viable sample size for these tests?
Nils Koppelmann 3:28
So I wish I could give the simplest answer and tell you three or 10 or 100,000. But the reality is, it really depends on the kind of change you want to measure. What I usually recommend is actually going into a calculator like the ones you can find on CXL; there are AB testing calculators, or sample size calculators, that we also use as part of what we call a pre-test analysis, to basically find out how long a test has to run, because that's the other thing that's usually relevant. The sample size is basically how many people have to be in each variant, the control and the variation: the version that has no change and the version that has the change. Now, a website usually doesn't have just a sample size; it has traffic. So let's say we have 100,000 visitors a month on one page, and let's see how many people we actually need per variant. If we take an AB test where we split it in half, 50% see one and 50% see the other, then we only have 50,000 visitors per variant. These calculators basically give you an understanding of after what time you can measure what impact on the site, and thereby you find out the sample size. If I were to go by the definitions of Ronny Kohavi, for example, only very high-traffic websites are able to properly test changes. Unfortunately, I can't give you the perfect number, but usually it's in the high tens and hundreds of thousands.
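To make the pre-test analysis concrete, here is a minimal sketch of the arithmetic those sample size calculators perform, assuming a two-sided two-proportion z-test at the usual 5% significance and 80% power; the baseline conversion rate and target lift are illustrative numbers, not figures from the episode.

```python
# Sketch of a pre-test sample-size calculation for an AB test,
# assuming a two-sided two-proportion z-test (illustrative numbers).
from scipy.stats import norm

def sample_size_per_variant(baseline, rel_lift, alpha=0.05, power=0.80):
    """Visitors needed in EACH variant to detect a relative lift
    of `rel_lift` over a baseline conversion rate `baseline`."""
    p1 = baseline
    p2 = baseline * (1 + rel_lift)         # expected rate in the variation
    p_bar = (p1 + p2) / 2                  # pooled rate under the null
    z_a = norm.ppf(1 - alpha / 2)          # critical value for significance
    z_b = norm.ppf(power)                  # critical value for power
    numerator = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return int(numerator / (p2 - p1) ** 2) + 1

# Example: 3% baseline conversion, trying to detect a 10% relative lift.
print(sample_size_per_variant(0.03, 0.10))  # ~53,000 visitors per variant
```

At roughly 53,000 visitors per variant, a site with 100,000 monthly visitors split 50/50 would need about a month of runtime, which lines up with the numbers Nils mentions.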
Dave Erickson 5:22
But if you had a website and it was getting, you know, 1,000 visitors a month, or 10,000 visitors a month, and the objective of the business is: I want the people hitting my website to convert better. So they may take a look at the calls to action, or the things that are used to convert traffic. If you are doing AB testing, you would take a component, I guess the first call to action, and you would try one call to action, and then you try another call to action. How is that actually done? Are you doing it on the website, or do you set up a separate landing page and test it that way? What are the mechanics of setting up an AB test on some kind of component of a website?
Nils Koppelmann 6:17
Sure. I'll answer that independent of the sample size or the traffic. The basic idea of AB testing is that we somehow split people into two buckets; the process itself is actually called bucketing. There are various ways of doing this. One is the server-side way, where the computation of dropping people into these buckets, based on some criteria, is done on the server side. Then there is probably the more popular way, and the one that's more attainable for most companies, especially those starting out with AB testing in marketing: the client side. What basically happens there is that there's a snippet you embed on your site; that snippet sets a cookie based on some randomization that happens, and says: if this cookie exists, then show people this variant (I'll explain in a second how showing this variant works), and if it doesn't, or if the cookie says you're in control, then it just doesn't show any change.

Showing this change can also work in various ways. Most client-side tools work with JavaScript: they manipulate, in real time, the site you're seeing. So if we take the very generic and probably butchered example of AB testing button color changes, then normally you have a blue button, and you want to find out if a red button performs better (I already know everybody's gonna hate me for this example). You would inject some JavaScript for 50% of the people, and that would, for example, just make a CSS change to that button, changing the color. That's one way of doing it. With more complex tests, where it's not just a color change or adding a word, another way would be to implement that change on the site already, hide it, and then, if people are in the variation, just do the switch for them: hiding the control and showing the variation. Sometimes we call this a hybrid approach, but it's basically still showing the variation on the client.

Something you have to remember, though, is that on the client side you might run into something called flickering effects. Some sites take a bit to load, and then the JavaScript snippet also has to load, so there is a timing gap in between. Especially if those changes happen above the fold, so the thing you directly see when opening the page, there might be some flicker: you might first see the control and then, a millisecond later, the variation. Across all the tests where this might have happened, it can impact the results in a negative way. So if you do a lot of these things, or want to test more complex changes, what I usually recommend, if the client's infrastructure allows it, is going for server-side testing. This is where the dicing, or the bucketing, is not done on the client side, and where the changes are also rolled out server-side. This also allows for a far wider variety of tests that can be run.
For example, you could test various different kinds of checkouts, or have people see different shipping options and stuff like that; stuff that's way more complex than maybe just showing them some more information on a given product or category page, or whatever it is.
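As a rough illustration of the server-side bucketing Nils describes, here is one common approach: hashing a stable user ID together with an experiment name, so each user lands in the same bucket on every request with no stored assignment table. The function and names are hypothetical, not from any particular testing tool.

```python
# Sketch of deterministic server-side bucketing, assuming a stable
# user ID is available (e.g. from a first-party cookie).
import hashlib

def bucket(user_id: str, experiment: str, split: float = 0.5) -> str:
    """Assign a user to 'control' or 'variation' for one experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    # Map the first 8 hex chars to a number in [0, 1).
    position = int(digest[:8], 16) / 0x100000000
    return "variation" if position < split else "control"

print(bucket("user-1234", "red-button-test"))  # same answer on every request
```

Because the experiment name is part of the hash, the same user can land in different buckets across different experiments, which keeps experiments independent of one another.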
Botond Seres 10:37
Perhaps showing one group free shipping but with an increased price, and the other group a lower price but with shipping excluded. I really like your example.
Nils Koppelmann 10:52
Listen, the shipping thing is actually very interesting, because depending on where you are, you also have to be a bit cautious of legal issues there. In some regions you have to be very careful about price testing, because this is a form of price testing in a way. But it's very interesting to actually gauge: are people willing to pay for shipping, or do they kind of expect it to be free? For example, in the Netherlands, free shipping is the norm. If you have to pay for shipping in the Netherlands, people will oftentimes just not purchase from you. So these kinds of things, and maybe we can also go into this: just because you get a result in one country doesn't necessarily mean you will get the same result everywhere. And if you think about it, the same applies more broadly. Just because we learned something with one company on one site doesn't necessarily mean it will work for every other shop, or even for shops in a similar niche. I'm pushing for this a bit in this conversation, but it's super crucial, because this is basically why we're testing, right? We just can't take best practices that worked for someone else and assume they work for the current project as well. This is where we go from opinionated ideas, or from opinions about whether or not something worked, or even from empirical studies, to actually using tests to verify whether or not something works. If I had to reiterate my initial take on AB testing, this is what it's really about: making sure that we can actually verify our assumptions.
Botond Seres 12:54
Bucketing, I mean, let's just say it's not as easy as just doing one test. What I wonder is: what is a good strategy to actually decide which user goes into which bucket? Because country, as you said, is a pretty good grouping. We can say that, okay, clearly in the Netherlands they prefer the product with free shipping, but let's say in Hungary they prefer it without free shipping. I do actually feel that's the case in some countries, because there are certain cultures where free shipping is basically regarded as a scam, right? But I'm sure there are many more metrics, like age, perhaps ethnicity, or…
Nils Koppelmann 13:47
I would frame it a slightly different way, because what we're talking about is not so much bucketing; bucketing is just the process of equally distributing 100%, or a certain amount, of traffic across the variations. What you're describing with those examples is basically segmentation, and segmentation before the test is run, or at least segmenting the traffic. You might have an international website but only want to run a test for, say, in my case, the German audience, which is super small, at least compared to the US. And yeah, for sure, geolocation would be a good way of doing that. But I'm a bit torn on using segmentation too much if the data you're segmenting on is only halfway reliable. If we say age, how do we actually know that people are of a certain age? It's super hard to segment that way. What you can do, though, if you have actual user data…
Botond Seres 15:08
We know for a fact that everyone's over eighteen on the internet.
Nils Koppelmann 15:13
Is that so? Okay. Okay, but like…
Botond Seres 15:16
Isn't that what everyone puts in when there's a form asking when we were born: a date that's actually more than eighteen years ago?
Nils Koppelmann 15:23
For sure. I mean, this proves a point: not all the data you get is reliable data, right? People will tell you different things than what's actually true. So…
Dave Erickson 15:39
I never put my real age; as far as everyone's concerned, I'm always 18.
Botond Seres 15:45
I mean, as far as everyone is concerned, I was born in 1969, in the fourth month, on the 20th day.
Nils Koppelmann 15:55
Actually, I'm always putting in my real age; maybe we should change that. But no, I mean, there's the age where you want to be older, and then at some point you always put, preferably, a younger age there. But getting back to the point of segmentation: I think segmentation is not that easy. If you're basically just getting traffic from everywhere, what's the data you're segmenting on? Geolocation, for one, is a good one. What you could also do is segment based on traffic source; for example, if you want to test behavior only for visitors that come through search, or that come from a certain campaign that you might identify via UTM parameters, that's for sure a way to target those specific people. Usually, especially in Germany, the brands that we work with don't have the hugest amount of traffic, so segmenting before the fact is oftentimes hard, because the traffic that's left over is not that big. But yeah, geolocation, especially for international brands, is definitely a way to segment, and also something that's sometimes necessary, because you might not be able to test things all across the globe on international sites.
Dave Erickson 17:32
It sounds like for segmentation, the broader you get, the better it is, in a sense. If you make your segmentation too focused or narrow, you don't always get real data, data that can be verified very easily. One example: it's really easy to tell if the traffic is coming from a desktop browser or a mobile browser. So segmenting desktop versus mobile could be one, and you can show mobile browsers a different option for purchase than you do desktop, because mobile indicates something. So that's one form of segmentation. Age might be a little difficult, because that usually relies on some kind of form or input, but country, browser, and a few other things come with the traffic, and you can probably segment with those. That's one of the things I guess a company really needs to look at, or ask itself, when it's going to do AB testing: how do they want to segment, if at all? They may just say, I don't care, all traffic is what's coming into the site, let's just work with all traffic rather than segmenting. Correct?
Nils Koppelmann 18:53
No, for sure. Actually, the one you mentioned just slipped my mind for some reason: targeting only mobile traffic, for example, might for some experiments be exactly the thing you want, because maybe that change is only relevant for mobile users. So yeah, for sure, there's a lot of information that's already sent with the traffic that you can inspect, or that you can use to target people, especially in the client-side way of testing. It gets a bit different if we look at server-side testing, because sometimes you might not have all the information available on the server side, sometimes also due to GDPR limits. But yeah, segmenting results is also something that's possible, though usually only advisable for very large or high-traffic sites. What you always need to take into account is that if you segment results at the end, which means, for example, you have an experiment for mobile and desktop and later want to look only at mobile users, you still need to make sure that you have an equal distribution of mobile traffic in control and variation, and don't have an SRM error in there. An SRM is basically a sample ratio mismatch, and this is when the traffic is not equally distributed in that segment. Oftentimes people don't pay attention to it and say, oh, this one won on mobile, and I'm like, okay, sure, it might have, but the data is not trustworthy anymore, because there is an SRM, for example.
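Here is a minimal sketch of the SRM check Nils warns about, assuming an intended 50/50 split. A chi-square goodness-of-fit test is a common way to flag a lopsided split; it is not necessarily the exact check his team uses, and the counts are invented.

```python
# Sketch of a sample ratio mismatch (SRM) check for an intended
# 50/50 split; observed counts are illustrative.
from scipy.stats import chisquare

control, variation = 50_312, 48_091        # observed users per arm
total = control + variation
expected = [total / 2, total / 2]          # what a true 50/50 split implies

stat, p_value = chisquare([control, variation], f_exp=expected)
if p_value < 0.001:                        # a common SRM alarm threshold
    print(f"SRM detected (p = {p_value:.2e}): don't trust this test")
else:
    print(f"no SRM detected (p = {p_value:.3f})")
```

With these counts the p-value is tiny, so the bucketing itself is broken, and any per-segment "winner" read from such a test should be discarded.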
Botond Seres 20:46
AB testing sounds like a wonderful tool, especially if done correctly. But I'm sure there are some limitations that you've experienced in your adventures in this field. I cannot even begin to imagine, but I'm pretty sure that targeting, for example, can go very wrong. Like, we may not know that there's a certain subgroup of users that just hates big buttons, and we try to test our size-increased call to action on them, and all of them hate it. Is that a real thing to worry about, or basically just something that…
Nils Koppelmann 21:34
It's not really a thing to worry about, because that's the essence of what you want to find out. If you have an idea for a change, and you think, okay, big yellow buttons are the way to go, that's good; but if it doesn't do anything for your customers, the people in your target audience (not sure if this is the proper example), then yeah, it's a lost test. But it's an opportunity, because you're not just deciding to make all the buttons one color now; you actually know that the change you wanted to roll out did in fact not work, or didn't move the needle in the direction you wanted. And this is where the point of decision making really comes in, because you're not leaving things up to chance, assuming you're doing it properly and applying a certain degree of statistical rigor, so that you actually know that what you're measuring is correctly measured.

But let's go a bit away from this example of a button and make it a bit more tangible, something that might actually matter to customers, because with buttons, if you don't have the traffic of Google, it's hard to test stuff there and make it matter. Say we use images on a product detail page for an ecom brand, and we want to test a very simple hypothesis: do annotated images, for example showing USPs in an image, help with the conversion rate? Do they help people go from looking at the product to putting it into the cart and then progressing to checkout? That's a hypothesis we can easily test by taking, for example, a very high-traffic product, or maybe a set of products, creating these images for those products, and just switching them out in the variation. Thereby we see: does this actually work for our set of products? Do people care about the USPs mentioned in these product images? And thereby we can actually learn something, because we know people care not just about the product, or maybe the price; they might actually care about what the product does for them. This is where we start to go from, okay, we just want to test a couple of button colors, to actually understanding more about user behavior, understanding what's relevant for users. And then of course you can go deeper. Say that was a winner: you can go deeper and understand what kind of features, what kind of benefits, actually click with them. So that's one way of doing it. The idea should always be learning about your users and making decisions not based on your own opinion, but based on what the data shows works. Sometimes the data is ambiguous, because you might have made too many changes and might not really know; then you have to just dig deeper and keep going. And this is where, at least for me, the fun begins: when not everything is obvious. Because then you can really test and go into the data.
Dave Erickson 25:25
Well, it seems like in AB testing, you don't want to make tests that incorporate large subsets; you want to test individual items, because you don't know what actually is going to change an emotion or an effect. Let's just say you have a website with a landing page, and you want to figure out whether all the components of the landing page are optimal for conversion, say for a call to action. You might do one AB test on just the heading, one AB test on just the pictures, one AB test on just the call button. So it's kind of a sequence of stuff. If somebody were to say, okay, I've got a website, I really don't know if it's optimal: where do they start when it comes to AB testing? Is there something you look at first, or does it not really matter?
Nils Koppelmann 26:21
What I always recommend doing comes before the test itself. I mean, it's easy to want to pull the trigger, right? You want to push that button and start that test. But in order to come up with good test ideas, you need to do good research. I always say: the better the research, the better the test ideas, the better the win rate. And win rate is basically, if you take 10 tests, the share of those tests that win. So what I suggest is doing research, and there are many forms of research you can do. I think one of the best is actually talking to users, understanding what they really care about, and then checking: does the site or the landing page align with those needs? Does it align with the problems you're solving for them, or that they feel you're solving for them? Another is actually looking into data. If we're talking about a landing page, something to consider is: do people actually see all the relevant information? Do people interact with various parts of the landing page? For example, if they don't interact with, or never see, the USPs because they're all the way down at the bottom, one idea could be moving those USPs all the way up and confronting people with them, right? Landing pages are vastly different, so there is not one blueprint, but the idea is basically finding out what's keeping people from making a decision, and this decision could be going one step further into checkout. It's a bit of a hard answer, because people don't like doing research most of the time, but it's the thing that will really propel you forward: spending a couple of hours going into the data, looking at reviews, even looking at competitor product reviews on Amazon, for example, understanding why people love their product, why people hate their product, and using that to make a point. Then you can create a backlog of test ideas and use a prioritization scheme that's readily available out there; there's ICE, there's RICE, there's PXL, there are many others. You can use one of those to prioritize these test ideas, sometimes based on what you think, but some criteria are more objective, like: is the change above the fold, is it on a high-traffic page, and all these kinds of things. Then pick the first idea, test it, and see what the outcome is. And don't forget one thing: a good hypothesis is critical.
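To illustrate the prioritization schemes Nils names, here is a minimal ICE-style scoring pass over a hypothetical backlog. The ideas and scores are invented, and the score is computed as a product of the three factors; some teams average them instead.

```python
# Sketch of ICE prioritization (Impact, Confidence, Ease, each 1-10)
# over an invented backlog of test ideas.
backlog = [
    {"idea": "Move USPs above the fold",  "impact": 7, "confidence": 6, "ease": 8},
    {"idea": "Annotated product images",  "impact": 8, "confidence": 5, "ease": 5},
    {"idea": "New headline messaging",    "impact": 6, "confidence": 7, "ease": 9},
    {"idea": "Alternative checkout flow", "impact": 9, "confidence": 4, "ease": 2},
]

for item in backlog:
    item["ice"] = item["impact"] * item["confidence"] * item["ease"]

# Highest-scoring ideas get tested first.
for item in sorted(backlog, key=lambda i: i["ice"], reverse=True):
    print(f'{item["ice"]:4d}  {item["idea"]}')
```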
Botond Seres 29:20
We've talked a lot about making small and granular changes: moving content around, adding annotations to pictures, or changing the color of buttons, which is everyone's favorite example, I know. But some people make a living doing that, so who am I to judge? What I'm wondering about is: is it normal, or is it even ever a thing, to do a full rebrand and AB test that? Let's say I have a very reserved, business-oriented page, and then one day, for some reason, someone comes to me with the theory that we should actually be marketing to teenagers, with lots of photoshopped pictures that are kind of nonsensical but look fun. And a lot of brands do this, to be honest; their marketing makes zero sense, but it's just fun. So I guess the question is: is it normal to test for the fun factor?
Nils Koppelmann 30:31
I mean, you just said it: people do this for a living, so there is a point to it. Though I wouldn't even say it's testing for the fun factor, but for testing's sake. But if we're talking about big changes, and we were just talking about very granular changes, there is a case for that, and there's also a case for doing very big changes, seeing what happens, and then trickling down and refining. Even in the case of a redesign: I used to be, and more or less still am, in the group of people who say, never do a redesign. We did redesigns for a long time before we actually started with CRO. As an agency, we did redesigns for a living; we revamped and relaunched so many sites. The point is, oftentimes it doesn't solve the underlying problem. One thing I learned was to ask why people came to us with the wish to relaunch their site. At some point I started asking why. Well, it's been two years. And I keep asking: why do you want to relaunch again? Well, the design is not modern enough. Okay, but why? Why do we relaunch now? Well, we want a more modern design. And if you ask this question annoyingly many times, they will tell you: oh, because we want to make more sales, or we want to increase the amount people order, or whatever it is. And this is the essence. So the point is not: do we test, or do we do a relaunch? The question really is: what do you want to achieve?

If what you want to achieve isn't served by a relaunch, because it doesn't really solve a problem, then there can still be reasons for a relaunch, like replatforming, or a rebranding, as you just mentioned with the total rebrand; there is a case for that. But what I would always do is, again, research. When we first started doing CRO, we didn't start with AB testing; we started with what we now call conversion redesigns. This is basically doing research with the current customers, understanding why they buy and what's keeping them from buying, similar to the research we do for AB testing, or for CRO in general, and basically understanding how we can optimize a site. And if the company is very, let's say, stubborn, or for other reasons needs to do a redesign or rebrand, then yeah, sure; but at least let's do it in the best-informed way, and that means doing research. Even now: last year, I think in September or October, we had a client who had to do a redesign. The reason was that something like 50 developers had worked on that Shopify theme, and it just took more time to maintain that theme than to develop a new one. What we did, though, was work with one of their designers and their idea of how the new brand should look, and we started testing individual features against the current version of the site, taking away the risk, or at least some of the risk, that's associated with doing the full, big relaunch, and slowly but surely incorporating these things into the already existing theme. What this does is reduce risk, but it also already accommodates users to the new brand language.
We tested things like a new menu structure. We tested things like a new cart, with very different contents. I think I have a case study on LinkedIn about that, if people want to see visuals. But we tested a variety of things to reduce that risk, because every change introduces risk. And not changing something is also risky, because you don't know if the current version is the best one.
Dave Erickson 35:23
The business goals should always be at the forefront of people's minds. But many business owners instinctively know: of course I want to do this to grow my business. They're just not able to communicate that, so for them it's, well, I think I need to change my brand. Anytime a business owner comes to us with some kind of development request, this-is-what-I-need-to-do, the assumption is always that they want to grow their business. But the question, and I wish I had an AB test for this one, is: what is the place to start? What is it that they actually need to fix, versus redesign? So I guess part of the goal, or part of the use case for AB testing, is to start going through what they currently have as a business platform and figuring out the low-hanging fruit. If it's a marketing website, it's going to be calls to action; that's what they need, because that starts the business process. But on an e-commerce site, it could be anything from the photos that are used to display a product, to the order process, to whether shipping is free or not. So there are a lot of different opportunities to use AB testing to figure out what they want. And a lot of people use it for simple stuff like UX, which we've discussed: big buttons, small buttons, that kind of thing. But it sounds to me, Nils, that the strategy of AB testing really is that of testing what gets the money. Is that correct?
Nils Koppelmann 37:16
I would say yes and no. The question is, again: why do you want to AB test? We work as an agency and consultancy, and somebody pays us, so I can't just work on the things that I like; the things that we test, or hopefully the outcomes of those tests, have to ultimately make sense at some point. And yeah, the motivator there is money, right? But someone creating the strategy for AB testing, or for conversion rate optimization (usually that's the kind of program context we're working in), has to ask: how do we optimize our own process? And there, the priority should not always be on just what directly makes more money. Yes, that's the motivator, but you should also look at what helps customers most, because that's the other side of the coin. I was recently doing a webinar about this, and really, the customer doesn't care about your conversion rate. Heck, they don't even know what your conversion rate is. They just want their problem solved. And I'm talking about the end customer, the person coming to your shop or your site and eventually clicking that button, buying the product. So you have to really ask yourself: how can I solve their needs? How can I make it work for them? Because, you guessed it, people will buy the product if their needs are met, if you make it as easy as possible for them to go through the funnel or whatever it is. So the prioritization doesn't always rely on what we think is best, but really on how we can best present that to the customer. What I wanted to add from earlier: AB testing, or the concepts behind it, don't have to be super complex. One of the simplest things you can test, for example, is various angles of messaging, and on landing pages this would oftentimes be headline tests. That's something that's often skipped because it seems so simple, but the headline is the first thing people see, maybe apart from imagery. It's super simple to test, and it makes such a big difference, because people still read text, especially the big bulky headlines. But having a strategy behind that is, again, super valuable and necessary if you want AB testing to make sense. That said, having a good copywriter at hand definitely pays off.
Botond Seres 40:21
So we talked a lot about statistical significance and sample sizes, which generally tend to be anywhere from at least a couple of thousand to tens or hundreds of thousands of visits, I suppose. Not sure if that's per hour, per day, or per month.
Nils Koppelmann 40:40
No, I mean, we would usually be looking at visitors, or more specifically users, in the context of the tool on the site, because visitors, depending on the tool, might even be calculated on the basis of sessions; it really depends. There are also cases for looking at sessions, but usually we would go for looking at users.
Botond Seres 41:09
If a customer comes to you who has a tiny amount of traffic, like a couple hundred visitors per month, is it even viable to do an AB test?
Nils Koppelmann 41:20
I mean, you can do an AB test; it just won't tell you anything. The point is, if traffic is too low, there is no real point in testing, because you will never reach that level of, let's just call it significance here, where you can be certain at the end that what you just tested (even though you might have a result) would replicate in a second test. This is where error rates come in. There's the false positive, the typical one, where you say, okay, this is a winner, and you declare it way too early, or you just don't have enough traffic to declare it, but you declare it anyway. If you ran this test, say, 100 times more, you would realize it was a false positive, because it wouldn't replicate. AB testing is a game of big numbers, so for very low-traffic sites, and this even extends to sites with only a couple of thousand or a couple of tens of thousands of visitors per month, it's sometimes more feasible to do other things.

What I oftentimes advise is actually doing user testing. I mean, there is a limit to that too; it's not as trustworthy as an AB test, if we look at what's called the hierarchy of evidence, where AB testing and meta-analyses are way up at the top. But doing research and talking with users or customers, or doing user tests, is also a way to validate things, to validate changes that you can't validate in an AB test with the amount of traffic you have. The point, though, and what I would always do as a company, is try to understand why, get behind the why, and keep this testing mentality in the back of my head even if I can't AB test. This allows you to not be arrogant, as bad as that sounds, about thinking that you know what's best for the user or for your customer. Most people really don't, because oftentimes they're not the user themselves; they're biased by loving their own product, and they might not see the forest for the trees, if that's an expression that works in English, I'm not sure. So actually challenge yourself, challenge the assumptions you have, and test things against the real world, even if it's not an AB test. One thing that I actually like to recommend, even if people don't have the traffic to support on-site AB testing, is doing creative AB testing on the side. For example, if you're running ads on Facebook or other channels, there you can also test: you can test one landing page versus the other, you can test your creatives and look at your click-through rates. You might not be able to infer that it leads to more purchases at the end, but you can get one step closer to your end goal. So yeah, for very small-traffic sites, AB testing definitely doesn't make sense, at least not on-site, but there are ways to work around that.
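As a sketch of why low traffic produces untrustworthy winners, here is a similar observed lift evaluated with a two-proportion z-test at two traffic levels; all counts are invented for illustration.

```python
# Sketch: a similar observed lift is noise at 200 visitors per arm
# and clearly detectable at 50,000 per arm (illustrative numbers).
from scipy.stats import norm

def ab_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    return 2 * (1 - norm.cdf(abs(z)))

print(ab_p_value(6, 200, 7, 200))              # ~0.78: could easily be chance
print(ab_p_value(1500, 50_000, 1800, 50_000))  # far below 0.05: a real signal
```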
Dave Erickson 45:00
You know, we've talked a lot about AB testing for websites and e-commerce sites. You mentioned messaging, and probably one of the most common uses of messaging for sales, or to grow sales, is that of emails and email campaigns. Can you talk a little bit about AB testing for emails? What does that involve, and what do people test for?
Nils Koppelmann 45:26
In email, you basically have one or two goals. First, people opening your email, and this presents you with the opportunity to test your subject lines: what subject line performs better, and maybe even performs consistently better over time. And then, what variation of content in your email gets people to actually take an action. This could be a call to action in the email, and later also a sale, for example, on your site; it's very important to link those, too. With email subject lines, I wish I could give a one-size-fits-all solution here, but again, that's up to be tested. There are so many things you can do. For example, I run a newsletter, and initially I tested various headlines against one another. In the end, I just defaulted to writing the name of the newsletter and then, I think, one sentence on what's actually included in it. For the content in the email, I guess it's about making sure that what you want people to do is very clear, and that you're offering value. Playing around with various settings there again opens opportunities for AB testing. You could, for example, think about: do I directly show the big buy-here button, or do I first tell a bit of a story about the new product launch? Do I show testimonials in the newsletter, or something like that? And if you do have the data, personalize the emails: for example, if you know this person has shopped with you before, show them relevant products in your email. Again, I'm not too deep into the CRM and email topic, but there are always ways of trying to make the asset, in this case the email, more valuable to the customers, and testing around that is super interesting.
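A minimal sketch of reading out a subject-line test like the one Nils describes, using a chi-square test on a 2x2 table of opens versus non-opens; the send and open counts are invented.

```python
# Sketch of analyzing an email subject-line test (illustrative counts).
from scipy.stats import chi2_contingency

sent_a, opens_a = 5_000, 1_100   # subject line A
sent_b, opens_b = 5_000, 1_240   # subject line B

table = [
    [opens_a, sent_a - opens_a],
    [opens_b, sent_b - opens_b],
]
stat, p_value, dof, expected = chi2_contingency(table)
print(f"open rates {opens_a/sent_a:.1%} vs {opens_b/sent_b:.1%}, p = {p_value:.4f}")
# p is well below 0.05 here, so subject line B's higher open rate
# is unlikely to be chance at this volume.
```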
Botond Seres 47:48
I'd like to get into some specifics on how results can be analyzed. I suppose click-through rates are one measure, conversion is another. But I wonder if there are some more holistic measurements, I think that would be the correct term, like, say, time spent on the site, or perhaps time spent assembling the perfect set of items for my cart and then leaving.
Nils Koppelmann 48:23
There is this model called a KPI tree. As a company, I think you have to very much understand what you want to optimize for. What is driving the business for you? Usually it's just a couple of very simple metrics, like revenue, retention, and, say, conversion rate. There are very nice trees online that you can find, and like any tree, they have branches that fork off in a couple of directions. So look at what's really important to you. Sure, it might be conversion rate. It might be, as you said, time spent on page. But the question is, and we've had this a lot of times where people say, this experiment performed so much better, we had a bounce rate drop of 20%: does that really affect our revenue, what's really important to us? And I think this is a question every company has to answer for itself.

Look at YouTube, for example. YouTube optimizes for, I think, watch time, or watch minutes. For them it's super relevant that people spend as much time as possible on the page, that people go from video to video to video. On an e-commerce site, if you optimize only for time on page, you can make it as complicated as possible for people to put things into the cart. They might spend more and more time just getting to their end goal, and they might be there for 15 minutes and be so frustrated at the end that they just cancel; but you have a very long time on page. So the question is: does it really help? And also, is there a limit to it? Maybe three to five minutes is the maximum you can actually allow people to be on the site, because after that they might get bored, or they're overwhelmed by all the other things. Or another example, if we stay with this: if you want people to explore too much on your site, maybe they keep on exploring but never actually get around to choosing a product. So again, the question is what's relevant in your specific case. What I really recommend, and most companies do have this, especially the more mature ones, is creating such a KPI tree and understanding how various metrics impact, or pay into, other, higher-level metrics, and then going from there. What I would also recommend is creating a customer journey, maybe on a piece of paper, writing down the measurable KPIs at every step of the way, and then going for the one KPI that's relevant at the step you're currently optimizing for.
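A minimal sketch of such a KPI tree for an e-commerce site; the hierarchy and metric names are invented for illustration, not a standard taxonomy.

```python
# Sketch of a KPI tree: one top-level business metric with the
# branch metrics that feed it (illustrative structure).
kpi_tree = {
    "revenue": {
        "conversion_rate": {
            "add_to_cart_rate": {},
            "cart_to_checkout_rate": {},
            "checkout_completion_rate": {},
        },
        "average_order_value": {
            "items_per_order": {},
            "price_per_item": {},
        },
        "retention": {
            "repeat_purchase_rate": {},
            "email_open_rate": {},
        },
    }
}

def print_tree(node, depth=0):
    """Walk the tree to see which low-level metric pays into which."""
    for metric, children in node.items():
        print("  " * depth + metric)
        print_tree(children, depth + 1)

print_tree(kpi_tree)
```

Picking the experiment metric from the branch that actually feeds the business metric you care about (add-to-cart rate feeding conversion rate feeding revenue, for example) is what keeps a "winner" on a proxy metric from being a loss on revenue.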
Dave Erickson 51:49
It sounds like a strategy is almost: lay out what your KPIs are, so you can see them; do some research to figure out what's working and what may not be working on your current site, and fix that stuff. Then look again at your KPIs and say, okay, are there really some questions here? Let's first look at, say, messaging and marketing; we do a little testing on messaging and marketing, and that fixes maybe one KPI. Okay, well, let's take a look at the buying process or the selection process, and test those areas to fix another KPI. And that may give you an AB testing strategy out of it. Is that correct?
Nils Koppelmann 52:41
I would say so. I mean, with everything, you have to start somewhere, and especially with AB testing, because it is somewhat complex, I would not make it too complex in the beginning. So starting with something like messaging, which is super easy, to get the ball rolling really helps. Then refine that over time, and look at what we call test spread: basically looking at where we are testing at the moment, and what other assets we can start testing on in the future.
Dave Erickson 53:15
Can you give, kind of, an example of what is your traditional, or what is the most requested, kind of AB testing that you do?
Nils Koppelmann 53:26
So the most requested thing, or what people come to us with, is: hey, I want to optimize. If it's e-com, people come to us wanting to optimize their conversion rate, and usually they come with very abstruse ideas, like, oh, I want to double my conversion rate in, say, two months, where I usually have to start putting on the brakes a bit and put their feet back on the ground, because that is usually not feasible. Usually the request is for us to optimize the conversion rate, because this is the most pressing issue: they're not making enough sales, especially now with the rising costs of ad spend and acquisition. From a methodological standpoint, people don't necessarily come to us with a request to AB test; it's usually just, help us optimize our conversion rate. One way we do that is either giving them an audit, or really building up an experimentation program to continuously optimize the site. And "optimizing" might be a bit misleading here, because sometimes it's really just helping them understand what's really relevant for their users, and then putting that into the site step by step. For me, maybe, just removing a lot of stuff is also very fun.
Dave Erickson 55:03
And then concerning 3Tech, can you tell us a little bit about the company and how you usually engage with people?
Nils Koppelmann 55:12
Sure. So we're an agency slash consultancy; I think I mentioned that at the beginning a bit. We basically help companies build up an experimentation program, start with AB testing, and use CRO to drive growth. I kind of like what we do now, because I try to not only function as an agency but really, at some point, start transitioning our service into helping them do it themselves, which is especially important for bigger companies. We help companies build up this process and initially drive it ourselves, sometimes fully autonomously, but over time our goal was always to make this a core business capability: to be able to test as a company and make it work for them. And sometimes we stay on as consultants, or help them with problems along the way. But yeah, that's usually how we do it.
Botond Seres 56:18
Yes, last question. In your opinion, what is the future of AB?
Nils Koppelmann 56:25
The future of AB testing. I love that question and hate it at the same time, I've got to be honest, because for me, as passionate as I am about it, AB testing is a tool, and I think a tool that won't go away. I think, though, that its position will change at some point, especially as the importance, or even the maturity, of AI continues to grow. We were talking a lot about brands that don't have a lot of traffic, brands for whom AB testing today is not very feasible. In some communities, people are already talking about something like synthetic users, where AI, based on previous data, helps inform what outcome a variation might have. To date, not very reliable. But I'm very much interested to see and work with these kinds of tools in the future, and to see how they stack up against current AB testing technology. For me, the most important thing is: how can we use these tools to help companies make the right decisions? So I think AB testing is here to stay, and I'm just looking forward to how we can enhance the process in and of itself.
Dave Erickson 58:04
Nils, thank you so much for helping us see the difference between the A and B of AB testing. Well, that's about all the time this episode has today. But before you go, our audience might want to consider this important question.
Botond Seres 58:16
Will you take what you learned here and make a positive impact in your business with it?
Dave Erickson 58:21
For our listeners, please subscribe and click the notification to join us on our next ScreamingBox technology and business rundown podcast. Until then, choose wisely and use AB testing to grow.
Dave Erickson 58:26
Thank you very much for taking this journey with us. Join us for our next exciting exploration of technology and business in the first week of every month. Please help us by subscribing, liking, and following us on whichever platform you're listening to or watching us on. We hope you enjoyed this podcast, and please let us know any subjects or topics you'd like us to discuss in our next podcast by leaving a message for us in the comment section or sending us a Twitter DM. Till next month, please stay happy and healthy.