5 Best Practices for A/B Testing

As the digital marketing landscape continues to shift to accommodate new privacy policies, voice search assistants, and other trends, one capability that remains constant is A/B testing. In this data-driven industry, A/B testing gives marketers insight into which type of content performs better. This knowledge plays a vital role in the optimization process, allowing digital media analysts to integrate the highest-performing content into campaigns and improve KPIs. Although split testing can be beneficial when optimizing campaigns, there are best practices that should be followed to ensure the data is analyzed and represented accurately.

1. Develop a hypothesis

Like all properly conducted experiments, A/B testing should start with a hypothesis. The hypothesis guides your experimental process, forcing you to consider what you need to test, why you need to test that element, and how you can use the new knowledge to achieve a desired outcome. Without this framework, the test will not produce a complete understanding. Instead, it will simply surface raw data and findings that do not necessarily add up to cohesive or insightful results.

Running tests without a hypothesis in mind can also put you at risk of wasting time, capital, and other resources.  

The end goal here is to learn something. Going into a split test without a clear idea of the “what” and “why” will ultimately hinder your ability to draw insights that build on one another and that can be used alongside additional strategies to optimize your content.

2. Determine what to test and do not divert

As an overarching goal, A/B testing is meant to determine what works and what does not. With that in mind, it is important to decide which variable to evaluate. This variable should reflect a point of friction that customers experience as they move through the company’s sales cycle. Identifying these barriers will allow you to pinpoint key areas of opportunity and test variations that have the potential to generate stronger performance.

It is also important to restrict the test to one specific element at a time, keeping all other factors constant. This allows you to attribute any response or change in performance to that one element.

Multiple variations can be run simultaneously to allow the data to be compared, but it is recommended that variations be limited to two to four at a time to keep the test efficient and reliable.

When conducting these tests, it is imperative that the experiment not be altered mid-process. It may be tempting to shift focus and incorporate a new dimension into the existing test, but it is strongly recommended that you refrain from doing so. Without consistency, the reliability of the end results will be diminished, and it will be unclear which element was responsible for the outcome.

3. Test a representative sample size 

Because marketers rely heavily on data, it is necessary to protect data integrity throughout the entire experimental process, which includes testing a representative sample size. Without an adequate sample size, conclusions drawn from the tests are not well supported and are of little use.

The standard for a representative sample size will depend on the type of content you intend to test, as well as your website’s overall visits and conversions. For example, if you are testing an element that is delivered to a finite audience, like an email, it is best to split your audience list according to the number of variations while ensuring that the sample size for each variation is adequate. As a general best practice, it is suggested that each test have at least 1,000 subjects.
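As a rough illustration of the email case, the following minimal Python sketch randomly splits a recipient list into equal-sized groups. The function name `split_audience` and its parameters are hypothetical, and treating the 1,000-subject guideline as a floor for each variation (rather than for the test as a whole) is an assumption made for this example.

```python
import random

# Assumption: the 1,000-subject guideline is applied per variation.
MIN_PER_VARIATION = 1000


def split_audience(recipients, n_variations, min_per_group=MIN_PER_VARIATION, seed=0):
    """Randomly split a recipient list into n_variations near-equal groups."""
    if n_variations < 2:
        raise ValueError("an A/B test needs at least two variations")
    shuffled = list(recipients)
    # A seeded shuffle keeps the assignment random but reproducible.
    random.Random(seed).shuffle(shuffled)
    # Deal recipients round-robin so group sizes differ by at most one.
    groups = [shuffled[i::n_variations] for i in range(n_variations)]
    smallest = min(len(group) for group in groups)
    if smallest < min_per_group:
        raise ValueError(
            f"smallest group has {smallest} recipients; need at least {min_per_group}"
        )
    return groups


if __name__ == "__main__":
    audience = [f"user{i}@example.com" for i in range(4000)]
    group_a, group_b = split_audience(audience, n_variations=2)
    print(len(group_a), len(group_b))
```

Shuffling before splitting matters here: a list sorted by signup date or engagement would otherwise put systematically different customers in each group.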

However, if the element tested is being delivered to an open-ended audience, like a landing page, the sample size will depend largely on how long the test runs. To ensure that the sample is representative of the general population and that there is no overlapping data, it is also recommended that the split test be run specifically for certain devices and browsers.

Without separate tests that are specific to different devices and browsers, the possibility of receiving overlapping data from customers who access a website through multiple devices or browsers is heightened. This creates sample pollution, compromising the validity of the test results as a whole.

4. Determine the right timing for your tests

Timing carries great weight when running experiments. Proper implementation involves accounting for situations in which data could be inflated or skewed, obscuring the true cause of the outcome.

Before launching an A/B test, it is important that you thoroughly consider any external factors, such as seasonal shifts or company-specific changes, that could distort the data. Launching tests during stable periods will help deliver actionable results that are representative of non-peak periods.

To ensure quality results, it is also recommended that you run your test for a minimum of one to two weeks and a maximum of four weeks. A key point to note here is that your test should run for full weeks, regardless of when you reach your required sample size. Running for full weeks captures every day of the week, giving you a closer look at daily performance. For many businesses, this means receiving data for non-peak days, as well as weekends, which tend to be associated with higher website traffic. Ending tests early, or mid-week, can pollute the data and give an inaccurate picture of performance.
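The full-week rule can be sketched as a small calculation. The function `full_weeks_to_run` and its inputs (`required_sample` and `daily_visitors`, both counted across all variations) are hypothetical names for illustration. It also assumes steady daily traffic, which real sites rarely have, so treat the result as a planning estimate only.

```python
import math

MIN_WEEKS = 1  # lower bound of the guideline above
MAX_WEEKS = 4  # upper bound of the guideline above


def full_weeks_to_run(required_sample, daily_visitors):
    """Return the number of full weeks to run the test.

    Returns None when the required sample cannot be reached within
    MAX_WEEKS at the given traffic level.
    """
    if required_sample <= 0 or daily_visitors <= 0:
        raise ValueError("required_sample and daily_visitors must be positive")
    days_needed = math.ceil(required_sample / daily_visitors)
    # Round up to whole weeks so every weekday and weekend is covered,
    # even if the sample size is reached mid-week.
    weeks = max(MIN_WEEKS, math.ceil(days_needed / 7))
    return weeks if weeks <= MAX_WEEKS else None


if __name__ == "__main__":
    print(full_weeks_to_run(required_sample=2000, daily_visitors=200))
```

A result of `None` suggests the test as designed is too ambitious for the available traffic, and that fewer variations or a higher-traffic placement may be needed.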

5. Test statistical significance

Although it may sound complex, calculating your test’s statistical significance is crucial to understanding whether your results are sound and not merely the product of random chance.

VWO’s A/B Split Test Significance Calculator can be a great resource for determining your test’s significance level. To receive the significance value, simply enter the number of visitors to the control and variation pages, as well as the number of conversions for each. The calculator will return a p-value, along with whether the test was statistically significant. It also provides a visual representation that offers a more detailed view of the results. With this information, you can either proceed to make the necessary changes to further optimize your marketing campaigns or adjust certain elements and rerun the test in search of a statistically significant result.
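For readers curious about what such a calculation involves, here is a minimal sketch of a standard pooled two-proportion z-test in Python. This is a common textbook approach and is not necessarily the method VWO's calculator uses. The function name and the 0.05 threshold in the usage example are assumptions for illustration.

```python
import math


def two_proportion_p_value(visitors_a, conversions_a, visitors_b, conversions_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled conversion rate under the assumption that both pages perform equally.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        # Zero (or total) conversions in both groups: no evidence of a difference.
        return 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical counts: control (A) versus variation (B).
    p = two_proportion_p_value(1000, 100, 1000, 150)
    # 0.05 is a conventional threshold, assumed here for illustration.
    print("significant" if p < 0.05 else "not significant")
```

A small p-value only indicates that the observed difference is unlikely under pure chance. It says nothing about whether the test was run for full weeks or on a clean sample, so the earlier practices still apply.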

In the digital marketing world, data is the fuel for devising strategies and driving performance. Without it, a marketer lacks the understanding and knowledge needed to achieve their goals. Despite the ever-changing landscape that continues to affect how marketers collect data, A/B testing has remained a consistent mechanism for delivering evidence-backed insight that can be used to properly execute optimizations. Without the ability to run split tests, marketers would merely be operating on hunches, rather than on a clear and proven idea of what works and what does not.
