Page impressions (PIs), also known as traffic, or as inventory to digital sales teams, is a term thrown about frequently and carelessly by publishers, agencies, and marketers. I’m here to explain the issues that surround this term and what you’re actually trying to measure.
It’s not just quibbling, though. I have seen millions of dollars wasted by marketers who don’t really understand what a page impression is, and the media continues to misuse the term.
The first thing to know is what a PI is. Does it get its name from a page in a book? Not really.
A page impression is a variable metric produced by a particular methodology and reported by software. There is a set of rules and definitions as to what counts as a PI. It isn’t just when content appears in your browser for you to read. For web content to count as a PI, something must let the analytic software know that the page has loaded.
This is generally something like a small bit of code that activates tracking, including placing cookies on the browser or IP tracking of some variety. It can also be a web log based on server requests.
The information collected can be extremely rich and detailed, but that is the subject of another topic.
Now that we know a PI is dependent upon some code on a web page, it becomes relevant where the code actually is on the page. In some cases the code might be at the top of the page and report a visit from a browser as a PI before most of the page has even loaded.
This is an issue if you’re paying for some PIs that don’t finish loading (including the ad or content you wanted the user to see). The code might also be at the bottom, requiring the page to fully load all content (images, ads, etc.) before counting the visit as a PI.
On slow-loading pages (or with a slow internet connection), people might see the content or ad they came for and then click away before the analytic code has activated, so the visit won’t be counted as a PI. There are also asynchronous tags that aren’t as affected by this issue. So keep in mind that where the code activating the PI sits on the page can itself distort the count, on top of the other issues I’ll be discussing.
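To make the effect of tag placement concrete, here is a rough Python sketch. The visit lengths and the two firing times are invented for illustration; real firing times depend on the page and the connection.

```python
# Hypothetical visits: seconds each visitor stayed before leaving the page.
stay_seconds = [0.5, 2.0, 3.5, 8.0, 30.0]

TOP_TAG_FIRES_AT = 0.2     # assumed: a tag near the top runs almost immediately
BOTTOM_TAG_FIRES_AT = 4.0  # assumed: a tag at the bottom waits for a full load


def counted_pis(stays, tag_fires_at):
    """A visit is counted only if the visitor is still there when the tag fires."""
    return sum(1 for stay in stays if stay >= tag_fires_at)


print(counted_pis(stay_seconds, TOP_TAG_FIRES_AT))     # 5
print(counted_pis(stay_seconds, BOTTOM_TAG_FIRES_AT))  # 2
```

The same five visits produce five PIs with the tag at the top and only two with the tag at the bottom, which is the gap a buyer or seller would want to know about.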
Since large PI numbers are often used by marketers and salespeople to represent the popularity of a website (I would consider this not completely truthful), there are many tricks publishers use to inflate those numbers. In some cases this can also increase the amount of lower value inventory for ad sales.
A simple and obvious way is when publishers split large articles into multiple pages. They’re not doing this because they’ve run out of page space; a web page can be infinitely long. They’re doing it to create additional PIs to increase their ad inventory and enlarge their ‘traffic’ numbers, even though the same content could easily be on just one page.
It is even more of an issue with content presentation such as photo galleries or quizzes. In many cases you remain on the same page and scroll quickly through lots of picture frames. Depending on how you have technically implemented the code for this you can fire off a new PI with each image, whether or not other page elements such as ads also change. This is one way some sites can rack up really high PI numbers despite low user bases.
Yet another way, fortunately much less common now, is the auto-refresh. This is where a site automatically reloads the page (creating a new PI) after a set time. So with a one-minute refresh you may load a page, wander off for a 10-minute cup of tea, and find out that you have actually created 10 PIs (and probably loaded ads that you didn’t see).
One way some publishers use to drive additional PIs is gamification, like offering badges or rewards for interacting with content. If done right this can be very engaging, but done poorly, users are just creating extra PIs and engaging with the game, not with the content or advertising. Depending on the analytic software (and rules of use by the provider, especially if they are trying to be an impartial third party providing comparative metrics) there may be limits or rules on implementation based on how much or in what way the page changes.
Google and other search engines (and analytics/research companies) are constantly sending spiders out to crawl the web to record web pages and images. Most good quality analytic software can identify and remove this type of traffic from its reporting, but cheap and free software often doesn’t manage this well, if at all. Even the good ones all use slightly different rules for what counts as a spider or bot and should be excluded. They have blacklists, IP lists, and even track the behaviour of the bots to determine whether to count a PI or not. This is just one of the many reasons why, if you have more than one analytic software reporting on your pages, they will never exactly match up.
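A toy version of this kind of rule-based filtering might look like the Python sketch below. The user-agent markers, the blocked IP address, and the hit format are all hypothetical; real products maintain much larger lists and add behavioural rules on top.

```python
# Hypothetical blocklists; real tools use far larger, regularly updated lists.
BOT_UA_MARKERS = ("bot", "spider", "crawler")
BLOCKED_IPS = {"203.0.113.7"}  # documentation-range address, purely illustrative


def is_bot(hit: dict) -> bool:
    """Return True if a hit looks like crawler traffic under these toy rules."""
    user_agent = hit.get("user_agent", "").lower()
    if any(marker in user_agent for marker in BOT_UA_MARKERS):
        return True
    return hit.get("ip") in BLOCKED_IPS


def count_page_impressions(hits: list[dict]) -> int:
    """Count only the hits that survive the bot filter."""
    return sum(1 for hit in hits if not is_bot(hit))


hits = [
    {"ip": "198.51.100.1", "user_agent": "Mozilla/5.0 (Windows NT 10.0)"},
    {"ip": "198.51.100.2", "user_agent": "Googlebot/2.1"},
    {"ip": "203.0.113.7", "user_agent": "Mozilla/5.0 (X11; Linux)"},
]
print(count_page_impressions(hits))  # 1
```

Change either list and the PI count changes, which is why two tools with different rules will disagree about the same traffic.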
It’s also important to note that the accuracy of this little bit of code counting your PIs is completely dependent on you actually making sure it is correctly coded on each page you want counted. Some sites might forget to install the code on every page and actually be getting a lot more traffic than the software is reporting. It could also be installed in such a way as to inflate the counts.
Some software that utilises server requests might not appropriately count PIs if some of the pages delivered were cached.

Settings

You might also have set your software to not count specific IP addresses (such as if you don’t want to include your own company’s visits), and other software might not have that functionality or might execute it differently, so again the numbers won’t match up.
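As an illustration of how an IP exclusion setting changes the count, here is a short Python sketch using the standard `ipaddress` module. The excluded network and the visitor addresses are made up; they come from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical office network to leave out of reporting.
EXCLUDED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_internal(ip: str) -> bool:
    """True if the address falls inside any excluded network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in EXCLUDED_NETWORKS)


visitor_ips = ["192.0.2.15", "198.51.100.4", "192.0.2.200", "203.0.113.9"]

count_all = len(visitor_ips)
count_filtered = sum(1 for ip in visitor_ips if not is_internal(ip))
print(count_all, count_filtered)  # 4 2
```

A tool without the exclusion reports four PIs and a tool with it reports two, from exactly the same visits.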
It’s rare that oversight of any kind can guarantee proper technical implementation of PI tracking. If you want as much accuracy as possible, then working with professional companies that support implementation is probably your best bet. You get what you pay for when it comes to analytic software. It also means that your data reports are never going to match those of other people also measuring your site, so don’t get too upset if there isn’t a close match.
So now that you have an understanding of what constitutes a PI, the real question is whether that PI is any good. Does it have any value?
As in the print world, not all pages are created equal; otherwise the primary driver for the cost of a book would be how many pages it has, not who it is by, how many were printed, whether it is just text, whether it is hardcover, and so on.
The actual page count has very little impact on the price or value of a book. In the same way, PIs have very little correlation with the value of a website. If we go right down to the page level of a book, we can compare a page from a large, colourful coffee-table book with a text page from a mass-market paperback.
Clearly the two pages are of different value, but how do you measure such things online? Well, things like the brand value of the author and the quality of the content are hugely important, but outside the scope of PI measurement, so we’ll stick to some things our analytic software can report on.
My go-to metric for PI value is page duration. This is simply the amount of time a page was open in the browser, and it is usually determined by taking the difference between the time stamp of the page you are on and the time stamp of the next PI your software recognises. Like any metric, it comes with lots of assumptions and potential biases that you need to keep in mind and take into account. The most obvious one is that if someone clicks off your site, your software doesn’t get a time stamp to use, so it has no way of knowing what the page duration of that last PI was.
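The calculation can be sketched in a few lines of Python. The visit log below is invented, and real analytic software applies its own additional rules (session timeouts, active-window checks, and so on).

```python
from datetime import datetime

# Hypothetical PI log for one visit: (time stamp, page), in time order.
visit = [
    (datetime(2024, 1, 1, 9, 0, 0), "/home"),
    (datetime(2024, 1, 1, 9, 0, 20), "/article"),
    (datetime(2024, 1, 1, 9, 3, 20), "/gallery"),
]


def page_durations(pis):
    """Seconds between each PI and the next one; None for the final PI."""
    durations = []
    for (ts, page), (next_ts, _) in zip(pis, pis[1:]):
        durations.append((page, (next_ts - ts).total_seconds()))
    if pis:
        durations.append((pis[-1][1], None))  # no later time stamp to compare
    return durations


print(page_durations(visit))
# [('/home', 20.0), ('/article', 180.0), ('/gallery', None)]
```

The `None` on the last page is the exit-page problem: the visitor may have spent five seconds or five minutes there, and the log cannot tell.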
This is why most publishers will use and report on average page duration (APD). That brings in another bias to be watchful for. If you’re buying advertising on just a subsection or page of a website, make sure you know the APD of that subset, not just the APD of the whole site; they may be radically different to your detriment.
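Here is a small Python sketch of why the site-wide figure can mislead. The section names and durations are invented, and dropping unknown (exit-page) durations is just one possible convention.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (section, duration in seconds) records; None = unknown exit page.
records = [
    ("news", 90.0), ("news", 150.0), ("news", None),
    ("galleries", 5.0), ("galleries", 7.0), ("galleries", 6.0),
]


def apd_by_section(rows):
    """Average page duration per section, ignoring unknown durations."""
    by_section = defaultdict(list)
    for section, duration in rows:
        if duration is not None:
            by_section[section].append(duration)
    return {section: mean(values) for section, values in by_section.items()}


site_apd = mean(d for _, d in records if d is not None)
section_apd = apd_by_section(records)
print(site_apd)     # 51.6
print(section_apd)  # {'news': 120.0, 'galleries': 6.0}
```

A site-wide APD of about 52 seconds hides the fact that an ad bought in the galleries section would sit on pages that are open for about six seconds.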
Make sure you get what you paid for. Because page duration correlates with content consumption/feature use, it makes a good engagement metric as well. But also remember that it isn’t exact. Just because a page is open (much like all other media) doesn’t mean it is being viewed.
So there are different ways to try to adjust for that, and again, what counts as page duration is defined by your analytic software, and the definitions vary widely. For instance, does the page have to be in the active window or not? Another pitfall is that longer page duration isn’t always a good thing. If you have navigational or informational content, the faster the user can locate what they need and click away, the better the user experience.
So it serves as a good reminder that if you’re buying ads you need to be aware of the context they’ll be sitting in. The metrics that describe the value of that PI are dependent on the type of content. Despite those limitations, APD is pretty useful for valuation and comparison of the core unit of ad sales, PIs, because it is a good substitution metric for engagement, which is really what advertisers are after in the end.
In practice

What can the APD tell us about value? First, if the APD is very low, it implies the content and/or audience isn’t very engaged, and there is a lack of fit somewhere in the user experience. This can be a bad landing page (design, content, fit, speed, usability) that people bounce right off, but it can also be just very short content. It’s easy to fill blogs or websites with pages full of small bits of content.
That’s another great way to inflate PIs, but it probably isn’t very engaging, and more importantly for advertisers, their ads probably aren’t being seen for long enough, or at all. Just imagine a page/site with an APD of 20 seconds. That’s not a lot of time for a user to consume the content they came for, let alone any additional content such as display ads. I’d argue that below a certain point, low-duration pages have no value to advertisers.
This is especially true depending on how the ads are executed on the page (they’re usually script-activated too). Do they load last? You might get charged for a PI that your ad didn’t even appear on. Ad impressions have all the same issues PIs do. Your ad probably doesn’t fill the entire page either, so its position and the relative length of the page are very important to the value of ad spots.
How long is your ad going to actually be seen? If you don’t know this, I guarantee you’re wasting money. It’s as simple as the fact that 30 second TV ads are worth more than 15 second TV ads. Display isn’t any different, so why buy PIs like they’re all the same? This is especially true when considering ad placements below the fold or out of view. A page impression isn’t an ad impression.
Even if you’re being given ad impression numbers, that doesn’t mean they were seen. An ad can be delivered and never viewed. The basic rule is you get what you pay for, and if it’s really cheap there’s probably a reason. Some publishers might charge twice as much, but on the cheap site only a quarter of the actual ads delivered are ever seen.
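A quick worked example in Python shows how a cheap price can end up more expensive. The prices and the premium site’s viewed share are assumed for illustration; only the "twice as much" and "a quarter seen" figures come from the comparison above.

```python
def cost_per_thousand_viewed(cpm: float, viewed_share: float) -> float:
    """Price per 1,000 delivered ads divided by the share actually seen."""
    if not 0 < viewed_share <= 1:
        raise ValueError("viewed_share must be in (0, 1]")
    return cpm / viewed_share


# Assumed: the premium site charges 10 per thousand and 75% of ads are seen;
# the cheap site charges half that, but only a quarter of ads are seen.
premium = cost_per_thousand_viewed(cpm=10.0, viewed_share=0.75)
cheap = cost_per_thousand_viewed(cpm=5.0, viewed_share=0.25)
print(round(premium, 2), round(cheap, 2))  # 13.33 20.0
```

Under these assumptions the cheap site costs about 50% more per thousand ads that anyone actually sees.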
Price is a great indicator of quality, but it still pays to check what’s actually being delivered. Also, what kind of ad is it? How much time does it need to get its message across? How does that flow with the content and context of the page? There are lots of things to keep in mind for media buying, and let’s be honest, it’s usually not looked at closely enough. That’s why if I do have to take a shortcut and just rely on one metric to determine page value, then I’m going with APD.
True value is always determined case by case, but you have to understand your metrics to get value. There is a lot more to PIs than meets the eye. I don’t claim that this is comprehensive, as new technologies for tracking users online are being invented all the time (fingerprinting, packet sniffing, etc.).
But hopefully you’ve found this useful, and if anyone has any suggestions to improve/correct this explanation I’d really appreciate the input.
Eric Rowe is a senior digital analyst at Tourism New Zealand.