I’ve been reading Bill Franks’ Taming the Big Data Tidal Wave: Finding Opportunities in Huge Data Streams with Advanced Analytics. It’s a worthwhile read, though it’s written for novices in data science, so if you’re an expert in the space, you might find it dull. Generally, Franks provides a good overview of what the big data buzz is all about and outlines various opportunities to leverage this big data stuff.
Anyway, flying home yesterday from Philadelphia, I was reading the chapter on web log data. There’s just a ton of it, but as Franks notes, a lot of it remains unleveraged. The easier-said-than-done secret in using big data to do interesting things is extracting, structuring, analyzing and then visualizing data in ways that provide meaningful insight, the kind of insight you can use to take future action. Web analytics, of course, is a fairly advanced discipline, and we can mine clickstream data to extract all kinds of useful information, such as web site performance, user shopping preferences, user segmentation and so on. But there are so many more possibilities, especially as we begin to link data on user behavior across the web (and not just on one web site).
So I had this idea, which I thought was cool. Maybe it’s been done, I don’t know. I’ll throw it out anyway in a numbered list:
1. An interesting unit of web data reporting might be the “Web Day.” If you could connect the data on every user on the web and slice it into unique Web Days, one per user per day, you would be able to capture the details of how people live their digital lives, individually, in segments and in the overall aggregate.
2. The data that you could bring together would be location data, time data, clickstream data and text entry data, among others. It would simply be a matter of stringing together a 24-hour narrative: “user_200431” got on at 6:00 am PST, visited these sites, clicked these links, watched this video, wrote a blog post here, did a specific keyword search on Google, commented on Facebook here, sent emails, logged off at this time, logged back on at this time, clicked here, bought this item from this site, etc.
3. The optics here could potentially be beautiful: imagine an interface where you could click on a single user ID, on a specific day, and see a visual timeline of activity, including site thumbnails, product thumbnails, text snippets, etc. You could drill down into a single activity or time period.
4. The analytics could also potentially correlate and synthesize data. You could query the database and build aggregate models of user segments and multi-day time periods. You could say to the database: give me every user who purchased kids’ toys online on Black Friday, then give me an aggregate common profile of their daily web activities for the prior week. In such a scenario, you could determine how people prepare to shop, what they are searching for, what influences their purchasing activity on that day, and then do something about it.
5. Privacy issues would have to be adequately handled, so there would have to be a way to mask a user’s personal identity. You could use a non-personally identifiable ID for each user. You’d also have to mask or withhold information that would be personally identifiable such as Facebook IDs and some posted content, etc. I think it could be done in a way that still retains all the really useful information.
6. I imagine such an analytics tool would be huge for brand marketers, sociologists, and researchers of human behavior of all kinds. There are many organizations that devote considerable energy to segmentation, trends and consumer insight. You could build this service and offer it to marketers on a software-as-a-service model. Sell licenses with monthly or annual fees.
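To make items 2, 4 and 5 a bit more concrete, here’s a rough Python sketch of how Web Days might be assembled from raw events. Everything in it is hypothetical: the `Event` fields, the function names, the secret key and the sample data are all made up for illustration, and it assumes timestamps are already in each user’s local time. I picked a keyed hash (HMAC) as one possible way to swap real identities for stable non-personal IDs; that alone is nowhere near a complete privacy solution.

```python
import hashlib
import hmac
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date, datetime, timedelta


@dataclass(frozen=True)
class Event:
    user_id: str         # raw identifier; never appears in the output
    timestamp: datetime  # assumed to be in the user's local time
    kind: str            # e.g. "visit", "search", "purchase"
    detail: str          # e.g. a site name, search phrase or product category


def pseudonymize(user_id: str, secret: bytes) -> str:
    """Map a raw ID to a stable, non-personal ID using a keyed hash."""
    digest = hmac.new(secret, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:12]


def build_web_days(events, secret):
    """Group events into Web Days: one time-ordered timeline per user per day."""
    days = defaultdict(list)
    for e in events:
        key = (pseudonymize(e.user_id, secret), e.timestamp.date())
        days[key].append((e.timestamp, e.kind, e.detail))
    for timeline in days.values():
        timeline.sort()  # chronological order within each Web Day
    return dict(days)


def prior_week_profile(web_days, purchase_day, purchase_detail):
    """Find users who bought `purchase_detail` on `purchase_day`, then count
    what those users did during the seven days before it."""
    buyers = {
        uid
        for (uid, day), timeline in web_days.items()
        if day == purchase_day
        and any(k == "purchase" and d == purchase_detail for _, k, d in timeline)
    }
    week_start = purchase_day - timedelta(days=7)
    profile = Counter()
    for (uid, day), timeline in web_days.items():
        if uid in buyers and week_start <= day < purchase_day:
            profile.update((k, d) for _, k, d in timeline)
    return buyers, profile


# Made-up sample data; the key and the events are purely illustrative.
SECRET = b"demo-key-not-for-real-use"
events = [
    Event("alice@example.com", datetime(2012, 11, 20, 6, 0), "search", "toy deals"),
    Event("alice@example.com", datetime(2012, 11, 23, 9, 30), "purchase", "kids toys"),
    Event("alice@example.com", datetime(2012, 11, 23, 9, 0), "visit", "toystore.example"),
    Event("bob@example.com", datetime(2012, 11, 20, 8, 0), "search", "laptops"),
]

web_days = build_web_days(events, SECRET)
buyers, profile = prior_week_profile(web_days, date(2012, 11, 23), "kids toys")
print(len(web_days), "Web Days;", len(buyers), "buyer(s);", dict(profile))
```

In a real system the grouping and the prior-week query would run in a database or a distributed job rather than in memory, but the shape of the idea is the same: key everything by (masked user, day), then slice by segment and time window.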
I think you’d want mobile and other non-web data in there too eventually, to make it even richer, and as it gets richer, the privacy issues become more acute. Of course, there are huge technical issues to tackle in terms of tying all the data together in a way that provides valuable output. But there are a lot of smart people out there, and all problems that are worth solving can be solved.
So that’s Web Day analytics. Cool idea? Creepy? What do you think?