Big Data in Retailing

Big Data opens up opportunities and possibilities in retailing, particularly along five major data dimensions: data pertaining to customers, products, time, location and channel. Much of the increase in data quality and application possibilities comes from a mix of new data sources, a smart application of statistical tools and domain knowledge combined with theoretical insights.

Integrated online and offline experiments

According to some estimates, Walmart collects around 2.5 petabytes (1 petabyte = 1,000,000 gigabytes) of information every hour about transactions, customer behavior, location and devices. The IT analyst firm Gartner estimates that there will be 20 billion devices (13.5 billion of them in the consumer sector) connected in the “Internet of Things”. Imagine the amount of data that will be generated by these devices. Imagine a day where online and offline retailing data provide a complete view of customer buying behavior and, better still, where the data is linked at the level of the individual customer to enable “true” customer lifetime value calculations.

Imagine a day where data thought to exist only in online retailing, e.g. consumer path data, exists inside the store thanks to RFID, GPS and other tracking technologies. Imagine a day where integrated online/offline experiments are run to provide the variation that enables causal inference about important marketing/retailing topics such as the efficacy of email, coupons, advertising, etc. Imagine a day where eye-tracking data isn’t just collected in the laboratory from monitors but is collected in the field by retinal scanning devices embedded within shelves.

As futuristic as those data sources sound, all of them exist today (albeit not ubiquitously) and will soon be part of the information that marketing scientists (within and outside of retail) use for customer-level understanding and firm-level optimization. Simply and heuristically put, these data sources will add “columns” to our databases (and a lot of columns!) that provide an increased ability to predict customer behavior and the effect of marketing on it. Now, add to that the technology (e.g. IP address tracking, cookie tracking, registered-user log-in, loyalty card usage, to name just a few) that enables firms to collect this data from millions of customers, for each and every moment, linked to each and every transaction, linked to each and every firm-level touchpoint, and linked across distribution platforms, and we have the big data that pervades the popular press today.

While the lure of big data is tempting, the big data revolution really is a “better data” revolution, and especially so in retailing. This framework mirrors the definition of business analytics which includes descriptive analytics, predictive analytics and prescriptive analytics. 

Sources of big data in retailing

There is potential to exploit the vast flows of information in a five-dimensional space, across customers, products, time, geo-spatial location, and channel.

1. Customers

When most people think of big data, they think of data sets with a lot of rows, and they should. Tracking technologies have enabled firms to move from the aggregate data analyses that dominated marketing effectiveness studies when data was limited to individual-level data analyses that allow for much more granular targeting. In fact, one could argue that one of the big missions of a firm is to grow the number of rows, both by acquiring customers (i.e. more unique IDs) and by generating more transactions per customer with greater monetary value. In retailing, the ability to track new customers and to link transactions over time is key. Loyalty cards, widespread today, are the most common way that such tracking is done; however, credit cards, IP addresses, and registered-user log-ins are also commonplace. Besides more rows, firms also have much better measures about each row. In retailing, these typically include customer transaction data from a CRM system, demographic data from credit card or loyalty card information, survey data linked via email address, and in-store visitation information that can be tracked in a variety of ways. If one includes social media data and, more broadly, user-generated content that can be traced to individual-level behavior, then customer-level data becomes extremely rich and nuanced.
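To make the “rows and columns” picture concrete, here is a minimal sketch of linking two sources on a shared customer ID. The records, field names and the `build_customer_rows` helper are all hypothetical and chosen only for illustration; a real retailer would be joining far larger tables, probably in a database or data-frame library.

```python
from collections import defaultdict

# Hypothetical records; every field name and value is an illustrative assumption.
transactions = [
    {"customer_id": "C1", "amount": 40.0},
    {"customer_id": "C2", "amount": 15.5},
    {"customer_id": "C1", "amount": 22.0},
]
demographics = {
    "C1": {"age_band": "25-34"},
    "C2": {"age_band": "45-54"},
}


def build_customer_rows(transactions, demographics):
    """Return one row per customer ID with transaction count, spend and demographics."""
    rows = defaultdict(lambda: {"n_transactions": 0, "total_spend": 0.0})
    # Each unique customer ID becomes a row; transactions are linked to it over time.
    for t in transactions:
        row = rows[t["customer_id"]]
        row["n_transactions"] += 1
        row["total_spend"] += t["amount"]
    # A second source adds extra "columns" wherever the ID matches.
    for customer_id, row in rows.items():
        row.update(demographics.get(customer_id, {}))
    return dict(rows)


customer_rows = build_customer_rows(transactions, demographics)
```

Each additional linked source (survey responses, store visits, social media activity) would widen the row in the same way, provided it carries an identifier that can be matched.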

2. Products

Product information in marketing has been, and likely always will be, defined by a set of attributes and the levels of those attributes. However, in today’s data-rich environment we see an expansion of product information along two dimensions.

  • First, this information may now be available for hundreds of thousands of SKUs in the store, so the product data set has a lot of rows.
  • Second, the amount of information about each product need no longer be limited to a small set of attributes, thus increasing the column-width, if you will, of the product information matrix.

Product information along these two dimensions alone (at the store level) can enable a host of downstream analyses, such as estimating brand premiums, or measuring product similarities and from them inferring grouping structures and subcategory boundaries. Thus, retailers will have product information matrices that are both dynamic and much more descriptive, allowing for a greater variety of products that are micro-targeted towards consumers. Furthermore, since more attributes and levels can be collected about each product, retailers will be able to gain an understanding of products that were never modeled before (e.g. experiential goods) because they consisted of too many attributes, or hard-to-measure attributes, to allow for a parsimonious representation.
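As one illustration of a product-similarity analysis, the sketch below treats each SKU as a set of attribute levels and compares products with Jaccard similarity. This is just one simple measure picked for the example, not a method the discussion above prescribes, and the SKUs and attributes are made up.

```python
def jaccard(a, b):
    """Jaccard similarity of two attribute sets: shared attributes / all attributes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical product information matrix: one row (SKU) per product,
# with as many attribute "columns" as the retailer chooses to record.
products = {
    "SKU-001": {"brand:A", "organic", "size:500g", "category:cereal"},
    "SKU-002": {"brand:B", "organic", "size:500g", "category:cereal"},
    "SKU-003": {"brand:A", "size:1l", "category:juice"},
}


def most_similar(sku, products):
    """Return the other SKU whose attribute set is closest to the given one."""
    others = (s for s in products if s != sku)
    return max(others, key=lambda s: jaccard(products[sku], products[s]))


closest = most_similar("SKU-001", products)
```

Pairwise similarities like these could feed a clustering step to suggest subcategory boundaries; with hundreds of thousands of SKUs, an all-pairs comparison would need a more scalable approach than this loop.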

3. Time

While the data sets described in the “customer” and “product” pieces above may seem large, imagine the third dimension, “time”, which literally multiplies the size of this data. That is, while historical analyses in retailing have looked at data aggregated to the monthly or possibly weekly level, data in retailing today comes with a time stamp that allows for continuous measurement of customer behavior, product assortment, stock-outs, and in-store displays and environments, such that assuming anything is static is at best an approximation. For example, imagine a retailer trying to understand how providing a discount or changing a product’s location changes the flow of customers in the store, how long customers spend at a given store location, and what they subsequently put into their shopping basket and in what order. A database that contains consumers’ in-store movements connected to their purchases can now answer these questions because of the time dimension that has been added. In addition, because information now flows continuously to a retailer, the traditional daily decision making about inventory levels, re-stocking, orders, etc. isn’t granular enough, and real-time solutions tied directly to the POS systems and the CRM database are now more accessible.
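A minimal sketch of what the time stamp buys: given a shopper’s time-stamped path through the store, the time spent in each zone can be computed directly. The event format, zone names and the `dwell_seconds` helper are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical in-store path for one shopper: (ISO timestamp, zone entered).
events = [
    ("2024-01-06T10:00:00", "entrance"),
    ("2024-01-06T10:01:30", "dairy"),
    ("2024-01-06T10:05:30", "bakery"),
    ("2024-01-06T10:07:00", "checkout"),
]


def dwell_seconds(events):
    """Sum the seconds spent in each zone, taking the next event as the exit time."""
    parsed = sorted((datetime.fromisoformat(ts), zone) for ts, zone in events)
    totals = defaultdict(float)
    # Pair each event with its successor; the last zone has no known exit time
    # and is therefore left out of the totals.
    for (start, zone), (end, _next_zone) in zip(parsed, parsed[1:]):
        totals[zone] += (end - start).total_seconds()
    return dict(totals)


dwell = dwell_seconds(events)
```

Comparing such dwell times before and after a discount or a shelf move, and joining them to basket contents, is the kind of question that weekly aggregates cannot answer.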

4. Location

The famous quote about “delivering the right message to the right customer at the right time” has never been truer than in the era of big data. In particular, the first two components (“the right message” and “the right customer”) have been a large part of the copy testing, experimental design and customized marketing literature for at least the past 40 years. However, the ability to use the spatial location of the customer at any given point in time has opened up a whole new avenue for retailers, where a customer’s geo-spatial location could affect the effectiveness of marketing, change what offer to make, or determine at what marketing depth to make an offer, to name just a few. When the customer’s geo-spatial location is also tied to the firm’s CRM database, retailers can unlock tremendous value: a customer’s purchase history can be combined with the products they are physically near to allow for hyper-targeting at the most granular level. However, while this hyper-targeting is certainly appealing and maximizes short-term revenue, retailers will need to consider both the ethical implications and the potential boomerang effects that arise when many customers react negatively to hyper-localized offers.
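The combination of location and purchase history can be sketched as a simple eligibility rule: make an offer only if the customer is within some radius of the store and has bought from the category before. The haversine formula for great-circle distance is standard; the 0.5 km radius, the record layout and the `eligible_for_offer` helper are illustrative assumptions, and the sketch presumes the customer has consented to location tracking.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def eligible_for_offer(customer, store, category, radius_km=0.5):
    """True if the customer is near the store and has bought the category before."""
    distance = haversine_km(customer["lat"], customer["lon"], store["lat"], store["lon"])
    return distance <= radius_km and category in customer["purchased_categories"]


# Hypothetical store and customers (coordinates are arbitrary).
store = {"lat": 40.7500, "lon": -73.9900}
nearby = {"lat": 40.7510, "lon": -73.9900, "purchased_categories": ["coffee", "snacks"]}
far_away = {"lat": 40.8000, "lon": -73.9900, "purchased_categories": ["coffee"]}
```

A production rule would likely also cap how often a customer is contacted, which is one practical guard against the boomerang effects mentioned above.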

5. Channel

This century has seen a definite increase in the number of channels through which consumers access product, experience, purchase and post-purchase information. Consequently, consumers are displaying a tendency to engage in “research shopping”, i.e. accessing information from one channel while purchasing from another. This has led to efforts to collect data from multiple touch points (i.e. from different channels). The collection, integration and analysis of such omni-channel data is likely to help retailers in several ways:

  • understanding, tracking and mapping the customer journey across touch-points,
  • evaluating profit impact,
  • and better allocating marketing budgets across channels, among others.

Realizing that information gathering and actual purchase may happen at different points in time, and that consumers often require assistance in making purchase decisions, firms have now started experimenting with relatively new ideas like showrooming, wherein the customer searches in offline channels and buys online, and webrooming, where the customer’s behavior is the opposite.
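Once search and purchase touch points are linked for the same customer, these two patterns are straightforward to label. The sketch below assumes each journey has already been reduced to a (search channel, purchase channel) pair with the values "online" or "offline"; that simplification, and the sample journeys, are assumptions made for the example.

```python
from collections import Counter


def classify_journey(search_channel, purchase_channel):
    """Label a journey from where the customer searched and where they bought."""
    if search_channel == purchase_channel:
        return "single-channel"
    if search_channel == "offline" and purchase_channel == "online":
        return "showrooming"  # searched in store, bought online
    if search_channel == "online" and purchase_channel == "offline":
        return "webrooming"  # searched online, bought in store
    return "other"  # any channel value outside the two assumed here


# Hypothetical linked journeys: (search channel, purchase channel).
journeys = [
    ("offline", "online"),
    ("online", "offline"),
    ("online", "online"),
    ("offline", "online"),
]

journey_counts = Counter(classify_journey(s, p) for s, p in journeys)
```

Real journeys usually involve several touch points per purchase, so a fuller treatment would need a rule for which search touch point counts.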