Google started providing encrypted search back in 2010. While the connection between the user and Google was encrypted, Google were still passing the user’s search query through to websites. In October 2011, Google made a change whereby users logged into their Google Account on google.com would be automatically switched over to HTTPS, and in March 2012 Google announced that they were rolling that same change out globally through all of their regional Google portals, such as www.google.com.au.
Importantly, unlike with the encrypted search product released in 2010, which still passed the user’s search query through to the destination website, Google are not passing the user’s search query through to websites as of the changes rolled out in 2011 and subsequently in 2012.
(not provided) Keyword
Because the keyword information is no longer passed through to the destination website, web statistics products like Google Analytics report a pseudo-search term known as (not provided) in its place.
To provide a high-level example of what is happening: if a website received 5000 visits from 5000 different users, each with a unique search phrase and all using Google secure search, a product like Google Analytics would report all 5000 visits against a single (not provided) keyword and aggregate all of the individual user metrics against that one keyword.
In more specific terms, below are some of the issues faced when search query data is not available:
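The aggregation described above can be sketched in a few lines of Python. This is only an illustration of the effect, not how Google Analytics is implemented; the visit records, the `secure` flag and the `report_keywords` function are all made up for the example.

```python
from collections import Counter

NOT_PROVIDED = "(not provided)"


def report_keywords(visits):
    """Count visits per keyword as a (hypothetical) analytics report would."""
    counts = Counter()
    for visit in visits:
        # With secure search the query never reaches the website, so every
        # such visit is filed under the same placeholder keyword.
        keyword = NOT_PROVIDED if visit["secure"] else visit["query"]
        counts[keyword] += 1
    return counts


# 5000 visits, each with a unique search phrase, all via secure search.
visits = [{"query": f"search phrase {i}", "secure": True} for i in range(5000)]
report = report_keywords(visits)
print(report)  # Counter({'(not provided)': 5000})
```

Flipping `secure` to `False` in the sample data would produce 5000 separate keywords with one visit each, which is the detail that is lost.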
- you won’t know how many unique search queries and their respective volumes are entering a site
- you can’t analyse keyword level metrics like pages/visit, bounce rate, conversion rate
- you can’t find pages competing with one another inside a site and providing a poor user experience
- you can’t optimise a landing page based on the user’s keyword
- you won’t be able to understand user search behaviour in terms of their research/buy cycle
- you’ll lose the ability to understand how your brand, product and generic phrases are related to one another
- you’ll lose the ability to understand how different devices play a role in your marketing efforts, for example whether the research/buy cycle differs by device
- you can’t report on goal completions or goal funnel completion by keyword
- you can’t report on transactions, average order value or revenue by keyword
- attribution for a major percentage of a site’s traffic is greatly impacted
Hashed Keywords
I wondered long ago whether Google might consider taking a small step back from their current stance: instead of sending no value for the query through to the destination website in the HTTP REFERER header, they might provide a unique hash for every keyword.
For those unaware, hashing algorithms take variable-length inputs and produce a fixed-length output. The same input always yields the same output, and with a well-designed hash function two different inputs are extremely unlikely to produce the same output, so in practice the output can be treated as unique. There are a variety of hashing functions available; as an example of their use, SHA-1 is a cryptographic hash function that has been used in the certificates that help secure HTTPS web traffic.
The important thing to understand about this idea, whether it is done through a hashing function or another mechanism, is that the goal would be to replace the user’s actual query with another unique value that doesn’t disclose or leak the user’s actual query, for privacy reasons.
Using an approach like this isn’t going to address all of the issues raised in the bullet point list above, or the longer list of issues the (not provided) keyword introduces. However, it improves a business’s understanding of their website and their visitors’ behaviour without compromising a user’s right to privacy.
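As a rough sketch of the idea, the snippet below replaces a query with a fixed-length token using Python's `hashlib`. The `hash_keyword` function, the normalisation step and the choice of SHA-256 are assumptions made for illustration; nothing here reflects what Google does or has proposed.

```python
import hashlib


def hash_keyword(query: str) -> str:
    """Return a fixed-length token standing in for the query (hypothetical)."""
    # Assumption: normalise case and whitespace so trivially different
    # spellings of the same query map to the same token.
    normalised = " ".join(query.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


token_a = hash_keyword("Red Shoes")
token_b = hash_keyword("red   shoes")
token_c = hash_keyword("blue shoes")

print(token_a == token_b)  # True: same query, same token
print(token_a == token_c)  # False: different queries, different tokens
print(len(token_a))        # 64 hex characters, regardless of query length
```

Because the same query always produces the same token, a website owner could still group visits, bounce rates and conversions by token without ever seeing the words that were typed.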
Unintended Side Effects
History shows that the advances we make in one area, often with only the best of intentions, are ultimately twisted, bent and adapted to drive some less than ideal outcomes.
The same can be seen with user privacy. The HTTP REFERER header was designed to help a website owner understand how users move through the internet at large and through an individual website. When the HTTP specification was first developed, I’m sure its inventors didn’t imagine that such a simple concept would ultimately become a tool to attack a user’s privacy.
Now the question to ask is: if Google were to take a couple of steps back from where they are currently and provide a hashed representation of the user’s query instead of no query data at all, could a website owner, opportunistic marketer or nefarious hacker misuse the hashed query against the user in some way? Could the hashed keyword value be reverse engineered to ascertain what the user’s original query was?
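One way to make that last question concrete is a dictionary lookup: if the hashing scheme were public and unkeyed, anyone could hash a list of likely queries in advance and compare. The sketch below assumes exactly that; `hash_keyword`, `build_lookup`, `recover_query` and the candidate list are hypothetical, and a scheme that used a secret key or salt would behave differently.

```python
import hashlib


def hash_keyword(query: str) -> str:
    """Assumed public, unkeyed hashing scheme (hypothetical)."""
    normalised = " ".join(query.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def build_lookup(candidates):
    """Precompute token -> query for a list of guessed queries."""
    return {hash_keyword(query): query for query in candidates}


def recover_query(token, lookup):
    """Return the guessed query behind a token, or None if it was not guessed."""
    return lookup.get(token)


candidates = ["cheap flights", "red shoes", "car insurance quotes"]
lookup = build_lookup(candidates)

common = recover_query(hash_keyword("Red Shoes"), lookup)
rare = recover_query(hash_keyword("a rare, very specific query"), lookup)
print(common)  # red shoes
print(rare)    # None
```

Under these assumptions, popular queries are easy to recover because they are easy to guess, while unusual queries stay hidden only as long as nobody thinks to add them to the candidate list.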
Is there hope for the future?