Google started providing encrypted search back in 2010. While the connection between the user and Google was encrypted, Google were still passing the user's search query through to websites. In October 2011, Google made a change whereby users logged into their Google Account on google.com would be automatically switched over to HTTPS, and in March 2012 Google announced that they were rolling that same change out globally through all of their regional Google portals, such as www.google.com.au.
Importantly, the encrypted search product released in 2010 still passed the user's search query through to the destination website. As of the changes rolled out in 2011 and subsequently in 2012, Google are no longer passing the user's search query through to websites.
(not provided) Keyword
Because the keyword information is no longer passed through to the destination website, web statistics products like Google Analytics show a pseudo-search term known as (not provided) in its place.
To provide a high level example of what is happening: if a website received 5000 visits from 5000 different users, each with a unique search phrase and all using Google secure search, a product like Google Analytics will report all 5000 visits against a single (not provided) keyword and aggregate all of the individual user metrics against that one keyword.
In more specific terms, below are some of the issues faced when search query data isn't available:
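The aggregation described above can be sketched in a few lines of Python. This is only an illustration with made-up visit data: the `keyword_report` helper and the visit records are hypothetical and do not reflect how Google Analytics is actually implemented.

```python
from collections import Counter

NOT_PROVIDED = "(not provided)"


def keyword_report(visits):
    """Count visits per keyword, grouping missing keywords together."""
    return Counter(visit["keyword"] or NOT_PROVIDED for visit in visits)


# 5000 users, each with a unique search phrase (made-up data).
actual_queries = [f"search phrase {n}" for n in range(5000)]

# Query passed through in the referrer: every visit keeps its keyword.
with_queries = [{"keyword": query} for query in actual_queries]

# Secure search: the query is withheld, so the keyword is missing.
without_queries = [{"keyword": None} for _ in actual_queries]

print(len(keyword_report(with_queries)))   # number of distinct keywords
print(keyword_report(without_queries))     # every visit under one keyword
```

In the second report every visit, and therefore every metric attached to those visits, collapses into the one (not provided) row.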
- you won’t know how many unique search queries and their respective volumes are entering a site
- you can’t analyse keyword level metrics like pages/visit, bounce rate, conversion rate
- you can’t find pages competing with one another inside a site and providing a poor user experience
- you can’t optimise a landing page based on the user’s keyword
- you won’t be able to understand user search behaviour in terms of their research/buy cycle
- you’ll lose the ability to understand how your brand, product and generic phrases are related to one another
- you’ll lose the ability to understand how different devices play a role in your marketing efforts, for example whether the research/buy cycle differs from one device to another
- you can’t report on goal completions or goal funnel completion by keyword
- you can’t report on transactions, average order value or revenue by keyword
- attribution for a major percentage of a site’s traffic is greatly impacted
Hashed Keywords
I wondered long ago whether Google might consider taking a small step back from their current stance: instead of sending no value for the query through to the destination website in the HTTP REFERER header, they might provide a unique hash for every keyword.
For those unaware, hashing algorithms take variable length inputs and produce a fixed length output that is, for practical purposes, unique to each input. There are a variety of hashing functions available, but as an example of their use, SHA-1 has been used in cryptography, including as part of the security for HTTPS web traffic.
The important thing to understand about this idea, whether it is done through a hashing function or another mechanism, is that the goal would be to replace the user’s actual query with another unique value that doesn’t disclose or leak that query, for privacy reasons.
An approach like this isn’t going to address all of the issues raised in the bullet point list above, or the longer list of issues the (not provided) keyword introduces. However, it improves a business’s understanding of their website and their visitors’ behaviour without compromising a user’s right to privacy.
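To make the idea concrete, here is a minimal sketch of what a hashed keyword could look like. Everything in it is my own assumption rather than anything Google has proposed: the `hash_keyword` helper is hypothetical, SHA-256 stands in for whichever hash function might be chosen, and the lowercase/whitespace normalisation and 16 character truncation are arbitrary choices for readability.

```python
import hashlib
from collections import Counter


def hash_keyword(query: str) -> str:
    """Return a fixed length token standing in for a search query."""
    # Assumed normalisation: lowercase and collapse repeated whitespace,
    # so trivially different spellings of a query share one token.
    normalised = " ".join(query.lower().split())
    digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
    return digest[:16]  # truncated only to keep reports readable


# Made-up queries arriving at a site.
queries = ["Red Shoes", "red  shoes", "blue shoes", "red shoes"]

# The site owner only ever sees the tokens, yet can still count
# visits per distinct keyword.
visits_per_token = Counter(hash_keyword(q) for q in queries)
for token, visits in visits_per_token.items():
    print(token, visits)
```

The same query always produces the same token, so volumes and behavioural metrics can be grouped per keyword even though the keyword itself is never shown.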
Unintended Side Effects
History will show that as we make advances in one area, often with only the best of intentions, those intentions are ultimately twisted, bent and adapted to drive some less than ideal outcomes.
The same can be seen with user privacy. The HTTP REFERER header was designed to help a website owner understand how users move through the internet at large and through an individual website. When the HTTP specification was first developed, I’m sure the inventors didn’t imagine that this simple concept would ultimately become a tool to attack a user’s privacy.
Now the question to ask is this: if Google were to take a couple of steps back from where they are currently and provide a hashed representation of the user’s query instead of no query data at all, could a website owner, opportunistic marketer or nefarious hacker misuse the hashed query against the user in some way? Could the hashed keyword value be reverse engineered to ascertain what the user’s original query was?
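One way to explore that question is a guessing (dictionary) attack: hash a list of likely queries and look for a match. The sketch below is an illustration under my own assumptions, not a description of anything Google does. The `plain_hash`, `keyed_hash` and `dictionary_attack` helpers, the candidate list and the secret key are all hypothetical. It compares a plain SHA-256 hash with a keyed hash (HMAC) whose key the attacker doesn't know.

```python
import hashlib
import hmac


def plain_hash(query: str) -> str:
    """Unkeyed hash: anyone can recompute it for any guessed query."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()


def keyed_hash(query: str, key: bytes) -> str:
    """Keyed hash (HMAC): recomputing it requires the secret key."""
    return hmac.new(key, query.encode("utf-8"), hashlib.sha256).hexdigest()


def dictionary_attack(token, candidates, hash_fn):
    """Return the first candidate whose hash equals the token, else None."""
    for candidate in candidates:
        if hash_fn(candidate) == token:
            return candidate
    return None


# Queries an attacker might guess, e.g. taken from a keyword research tool.
candidates = ["cheap flights", "cheap flights sydney", "hotel deals"]
secret_key = b"known-only-to-the-search-engine"  # hypothetical key

unkeyed_token = plain_hash("cheap flights sydney")
keyed_token = keyed_hash("cheap flights sydney", secret_key)

# A plain hash of a guessable query is recovered by simple guessing.
print(dictionary_attack(unkeyed_token, candidates, plain_hash))

# Without the key, the same guesses no longer match the keyed token.
print(dictionary_attack(keyed_token, candidates, plain_hash))
```

The takeaway from this sketch is that a plain hash of a common query offers little protection because search queries are easy to guess, whereas a keyed value can only be matched by someone holding the key.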
Is there hope for the future?
It is interesting that Google still sends the HTTP Referer for Google AdWords campaigns, regardless of the user privacy it is so concerned about!
That has been a pet peeve of mine and many others for a long time now as well, Vladimir. It is easy to see why Google didn’t want to go down that road: it would put AdWords at risk, and when that single product delivers Google ~95% of their revenue, they were never going to go near it.
You mention that, “Using an approach like this isn’t going to address all of the issues raised in the bullet point list above or the longer list of issues the (not provided) keyword introduces, however it improves a businesses understanding of their website and their visitors behaviour without compromising a users right to privacy.”
Besides the first bullet, how does it address any issues? In other words, how does it help users improve understanding of their website? (By comparing the data to GWT and guessing which keywords are being used?)
Also, in what way does hiding referrers protect a user’s “right to privacy?” I clicked on your link from a Google+ post. Does your knowing that I came to your site from Google+ violate my right to privacy?
Personally, I think that the right to privacy thing regarding search referrers is pretty flimsy.
Interesting idea! Is there any way to pull something similar from existing data, without waiting for Google?
Thanks for an excellent article. It has helped me understand the data in my Google Analytics account. I’ve been wondering why so many keywords aren’t provided although the number of new visitors is so high. I see that encrypted search significantly limits the data which can be analysed.
Colin,
Unfortunately not, mate. Currently Google don’t provide any information whatsoever about the user’s query if they are signed into their Google Account or if they simply choose to use the HTTPS version of Google.
Al.
Yehoshua,
Having a unique hashed keyword instead of a single (not provided) keyword will help in a lot of different ways.
Consider that you’re about to re-develop the content for your website. Having a unique hashed keyword helps you understand if your new content is better or worse from a keyword diversity standpoint. Getting more traffic isn’t necessarily enough, as your rankings could have improved for reasons unrelated to the content development, but having a unique hashed keyword at least tells you that you’re getting more diversity.
If you were running an ecommerce site, instead of having all of your revenue going through a single keyword – it’ll now be split over each unique hashed keyword instead.
If you were looking at the performance of a landing page from organic search and you’ve got a single (not provided) keyword, all of the behavioural metrics for the visitors are grouped into that single keyword, making it entirely useless. With a unique hashed value per keyword, you’d be able to see which keywords, by volume, are performing well or poorly. If some of your highest volume keywords were performing poorly, you’d be able to work out what those keywords might be using common sense, Google Webmaster Tools or perhaps complementary query data from other search engines, and take steps to improve the relevancy of the landing page for those keywords.
The ways in which you could do a better job having some keyword data, albeit encrypted, over no keyword data are vast. Turn the tables on this discussion and imagine for a moment that we’d never had any keyword data and tomorrow Google decided that you could have the encrypted/hashed version of the keyword. The list of creative ways you’d find to make use of it would be long.
Don’t for a moment think I’m suggesting this is a replacement for getting the actual keyword data the user entered, it isn’t – however I’m going to take more data over less data every single time – especially when I know that the data is accurate and relates to user behaviour/experience.
Al.
Great stuff, Alistair!
Unfortunately, I have my doubts that it’ll ever get serious consideration at Google. I suspect that the additional resource load would be more than they’re willing to accept. Perhaps there’s an innovative way to handle the additional load without substantial impact. On the surface, though, it seems prohibitive.
A brilliant idea, though.
Excellent post Al, fingers crossed but I’ll not hold my breath!