Using stylistic creative to increase conversions

One of the major challenges of marketing is to convert responders to buyers.

Let us say you have worked hard to improve the response rates of your marketing initiatives. Your catalogs and emails receive a standing ovation of sorts, and your campaigns have reached an optimal response rate. Then your manager looks at the conversions for each campaign and comments that they show only marginal improvement. You are reminded that the goal of marketing is to convert prospects into buyers, reactivate occasional buyers, and increase the purchases of regular buyers. You come back to your desk feeling deflated.

The question facing you is: what can I do differently to increase conversions?

Perhaps you need to understand the underlying latent desire of the customer. Or simply call Express Analytics.

Try to understand not only what the customer wants to buy but also why they buy what they buy.

So your engagement with the customer needs to be continuous and ongoing.

Obviously, you won’t learn why they buy from a single campaign. In fact, recent research from Forrester indicates that campaigns are dead.

Buying decisions are usually either emotional or reasoned. Emotional decisions are quick and spontaneous, while reasoned decisions are more thoughtful and drawn out. One can introduce emotional factors into a reasoned decision to shorten the time to purchase. More on this topic in a future blog.

Recently we ran some A/B tests to establish whether adjusting the style of emailed content improves conversions. The first part of the exercise was to segment customers by buying behavior into style preferences such as contemporary, modern and traditional. To do this, we segmented the products bought by customers into these categories. We then tested the impact of style preference by sending emails specifically designed to reflect the style of each customer’s past purchases. So while most people were sent our emails without any particular style, a set of customers were sent emails that matched their style preference; for example, traditionally styled creative went to customers whose past purchases were traditional products. The modern customers were sent creative that was modern, contemporary or mixed.

The conversion rates we found were quite encouraging. Looking at each combination of email creative style and customer style segment (modern, traditional, none, etc.), the response was in the 20-25% range when the email artwork was correctly matched to the browsing style of the traditional customer, and in the 12-13% range for a modern customer when the style was mixed. Modern customers receiving emails with no style were in the 7-8% range, and traditional customers receiving emails with no style were in the 9-10% range. So just by adjusting the layout of the text and product images in the email template, we were able to improve conversion rates by about 12 percentage points for traditional buyers and 5 percentage points for modern buyers.
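As a quick sanity check on these figures, here is a minimal Python sketch that computes the smallest and largest percentage-point lift implied by the quoted ranges. The ranges come from the text; the dictionary layout and the function name `lift_bounds` are only illustrative.

```python
# Conversion-rate ranges in percent, as quoted in the text.
RANGES = {
    ("traditional", "matched"): (20, 25),
    ("traditional", "no style"): (9, 10),
    ("modern", "mixed"): (12, 13),
    ("modern", "no style"): (7, 8),
}


def lift_bounds(segment, styled_label):
    """Return the (smallest, largest) percentage-point lift over unstyled email."""
    styled_low, styled_high = RANGES[(segment, styled_label)]
    plain_low, plain_high = RANGES[(segment, "no style")]
    return styled_low - plain_high, styled_high - plain_low


print(lift_bounds("traditional", "matched"))  # (10, 16)
print(lift_bounds("modern", "mixed"))         # (4, 6)
```

The improvements quoted for traditional and modern buyers fall inside these two intervals.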

[Figures: Creative Style Outliers (chart) and Creative Style Outliers data]

Improving the response rates of email marketing #2

By Hemant Warudkar.

Everything in marketing is about being relevant to the context of the customer. Having said that, when we set out to achieve this lofty goal we run into a number of challenges. One of the major considerations is defining the size of the opportunity.

Let us assume that we email every one of the 10 million members on our email list.

Let us define response rate in marketing parlance.

Response rate = number of responders/number of impressions in the performance window.

The performance window can be defined as the period after the email reaches the inbox of the target audience. This can be as short as 24 hours or as long as two weeks, depending on the frequency of your waves of emails.

So let us say that

  • we sent 10 million emails,
  • 500,000 people opened their emails,
  • and of these, 5,000 clicked at least one link in the email within the two weeks after we sent it.

Then our response rate would be 5,000/10,000,000 = 0.05%.

We could also look at another important ratio here.

Click-to-open ratio = 5,000/500,000 = 1%. These two rates are typical of email marketing numbers. Let us see why.

The general notion is to send more emails to increase the number of responders. However, by definition, as the number of impressions increases, the denominator of our formula increases, which reduces the response rate unless the number of responders grows at least as fast.
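The arithmetic above can be sketched in a few lines of Python. The first three numbers are the ones used in the example; the "send twice as many emails" scenario at the end is purely hypothetical and is there only to show how a larger denominator dilutes the rate.

```python
def rate(numerator, denominator):
    """Plain ratio; the caller decides what counts as a responder."""
    return numerator / denominator


emails_sent = 10_000_000
opens = 500_000
clicks = 5_000

response_rate = rate(clicks, emails_sent)
click_to_open = rate(clicks, opens)
print(f"Response rate: {response_rate:.2%}")
print(f"Click-to-open ratio: {click_to_open:.2%}")

# Hypothetical scenario (not from the campaign data): we double the sends,
# but responders only grow from 5,000 to 7,000.
diluted = rate(7_000, 20_000_000)
print(f"Response rate after doubling sends: {diluted:.3%}")
```

More responders in absolute terms, yet a lower response rate: that is the trap of simply sending more.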

Our executive management reviews the response rates and mandates that we do better. Or perhaps we don’t share these rates with management because they are too low, and proactively decide to do something about them.

Let us rethink our approach

If we take the same “spray and pray” approach of the newspaper and billboard era, then we have not made any progress. The best part about the digital world is that technology allows us to sense and measure more signals than was possible in the offline world. If we don’t take advantage of this measurement, we haven’t made progress.

So let us think about narrowcasting rather than broadcasting. I want to make an offer only to those people who may be interested. That way I can keep my number of impressions to a minimum and improve the number of responders. However, wishful thinking alone doesn’t make this happen. We decide to think different.

One of the team members has a bright idea: why don’t we separate our buyers from non-buyers? Surely they behave differently. Then the discussion drifts toward demographic and geographic segmentation. These dimensions are easy to use, so we dive headlong into this approach. But we soon discover that over the next couple of campaigns the response rates don’t budge. If you have been there, don’t worry; we have all been there.

A Better Approach 

Let us start by looking at how we can understand the context of the buyer. We can start by segmenting the customers into a number of segments by the following attributes based on their past buying behavior. Some of them are:

  1. Buying stage
  2. Behavior
  3. Style preference
  4. Price Sensitivity
  5. Social acceptance
  6. Location
  7. Quality consciousness
  8. Recency
  9. Frequency
  10. Monetary
  11. Return behavior
  12. Demographics
  13. Gender
  14. Channel preference
  15. Opt In status

With today’s technology, we can observe the potential customer unobtrusively over a long period of time and collect data about them. Once we have done that for a while, we can start to group them into various clusters. Some clusters can be as follows:

Cluster 1

Just browsing/prefers to browse online/brand conscious/likes contemporary styling/wants to buy first/local < 10 miles/discerning of quality/last bought a year ago/last purchase was $300/Return behavior unknown/Professional/ male/retail buyer/Opted-In

Cluster 2

Actively buying for self/Brand Neutral/Modern style lover/Balances price to performance/seeks recommendations/local <5 miles from store/compromises on quality/last bought 90 days ago/last purchase was $100/rarely returns/upper class/female/online buyer/opt-In

Such clusters, once defined, allow us to predict the future behavior of new members of the cluster. As we gather important clues about individual customers, we can start to plan separate campaigns to address each cluster.

Each of these listed variables influences buyer behavior to a varying degree. So, perhaps we can study the last two years of behavior of this customer segment and rank each variable by its influence on the buying behavior of the members.

We can also find the strength of the influence of each variable on the outcome. This is the correlation between an individual variable and the outcome. So let us try to find an equation that explains, in mathematical terms, the influence of these variables on the final outcome: will the receiver respond or not? We then classify receivers based on their likelihood of response.
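To make the idea of ranking variables by strength of influence concrete, here is a minimal Python sketch that computes the Pearson correlation between each variable and the response flag, then ranks the variables by absolute correlation. The six history rows and the three variable names are made up for illustration, and a simple correlation is only a first look, not the full model equation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical history, one row per receiver:
# (recency_days, frequency, price_sensitivity, responded)
history = [
    (10, 8, 0.2, 1),
    (200, 1, 0.9, 0),
    (30, 5, 0.4, 1),
    (365, 1, 0.7, 0),
    (15, 6, 0.8, 1),
    (120, 2, 0.3, 0),
]
names = ["recency_days", "frequency", "price_sensitivity"]
responded = [row[-1] for row in history]

strength = {
    name: pearson([row[i] for row in history], responded)
    for i, name in enumerate(names)
}

# Rank by absolute value: a strong negative correlation matters as much as a positive one.
ranked = sorted(strength, key=lambda name: abs(strength[name]), reverse=True)
for name in ranked:
    print(f"{name}: {strength[name]:+.2f}")
```

In this toy data, frequency ranks first and price sensitivity last; with real data the ranking would of course come from your own customer history.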

Once we have done this classification, we can calculate the probability of response. Even among likely responders, the probability of response differs from receiver to receiver. So we need to rank the likely responders by their probability of response (e.g. 0.85). Then we can set a cutoff threshold (let us say 0.75). Anyone with a probability above this threshold should be emailed; the rest can be safely ignored.
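The ranking and cutoff step can be sketched as follows. The probabilities are assumed to come from whatever model you have built; the receiver ids and scores below are invented for illustration, and only the 0.75 threshold comes from the text.

```python
CUTOFF = 0.75  # threshold used in the text

# Hypothetical model output: receiver id -> probability of response.
scores = {"r1": 0.85, "r2": 0.40, "r3": 0.78, "r4": 0.75, "r5": 0.10}


def select_recipients(scores, cutoff):
    """Rank receivers by probability and keep those strictly above the cutoff."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [receiver for receiver, probability in ranked if probability > cutoff]


to_email = select_recipients(scores, CUTOFF)
print(to_email)  # ['r1', 'r3']
```

Note that a receiver sitting exactly on the threshold (r4 here) is left out, because the rule is "above" the cutoff; whether to include such borderline cases is a choice you make once.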

This is one part of the equation. We have identified the segments of customers we want to communicate with. THE WHO of our story is defined.

However, we still need to establish the context of the receiver at the moment we email them, so we need to explore the receiver’s recent activities. Our ability to track the receiver’s search terms, the websites and blogs they have visited, the products they have reviewed or pinned, and the stores they have visited tells us where they are in their journey to purchase. This also gives us a clue about what may be of interest to the receiver. It allows us to run controlled experiments with the types of emails we send them. It also answers the question of when is the best time to send them emails.

This approach of using data to guide us through the unknown territory of marketing to new receivers is called Data Science. Using historical data to analyze the habits of customers needs a scientific approach. It is methodical and time-consuming, but it is the surest path to marketing success.

In a future blog, I will explore the world of subject line testing, the composition of the email, the contrast colors, the sizes of the images, the positioning of the call to action links. These activities make our emails more effective as marketing messages.

Improving the response rates of email marketing

If you are like most marketing managers, the topmost thing on your mind is generating more revenue with your marketing spend. Perhaps your performance is measured on it. In effect you are expected to invent the perpetual motion machine, which takes no input but generates infinite output, or so it seems.

Most companies today use email as a main way to communicate their marketing messages to their customers. Yet response rates are so low that the gut reaction to a poorly performing email marketing program is to increase the email frequency to get more responders.

However, this leads customers to perceive your messages as an irritant, if not spam. Even if you are not flagged as a spammer or an irritant, there is a strong possibility that the value of your message is diluted, creating a long-term loss of brand value. Is there anything that an organization with a modest budget can do to improve response rates without barraging its customers with unwanted emails?

Fortunately, there is a way to engage your customers without bombarding them with emails and yet improve the response rates of your email marketing.

Let us look at how you can segment your customers with whom you want to communicate.

Broadly speaking, there are four categories of customers:

  • The persuadables
  • The sure bets
  • The lost causes
  • The “Do not Disturbs”

The persuadables are those who are likely to be seeking a product or service, are familiar with your brand and are aware of your offerings. These customers are likely to welcome your email because it addresses a problem they are trying to solve. Perhaps they are interested in buying a product you are offering, so your email is well-timed. Here the need is met just in time, or there is an untapped desire or unspent disposable income that you can access by sending the right message at the right time to the right person. The persuadables are also the customers who will spend more if targeted.

The sure bets are those customers who are very familiar with your brand and offering. These customers may buy irrespective of receiving an email, catalog, SMS or coupon. You may potentially waste your money by sending them emails or, worse still, reduce the profitable revenue generated by your marketing program by offering them coupons. This is preaching to the choir.

The lost causes are those who are never likely to respond to marketing messages as they are either not interested in buying, or they have been won over by your competition. Sending them emails may be fruitless and you are better off trying your message somewhere else.

The Do Not Disturbs: The fourth category consists of customers who are likely to be loyal but who don’t want to be disturbed by frequent emails. Sending them emails is likely to turn them off. You can lose a good customer due to poor marketing. Generally these customers feel slighted that you don’t know them and are put off by your marketing emails. This is a risk you can’t afford to take, as it would mean losing a good but infrequent customer who buys a lot whenever they get to your store or website.
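One way to picture the four categories is as a two-by-two grid: would the customer buy if emailed, and would they buy if left alone? The Python sketch below assumes you already have two purchase probabilities per customer (for example from comparing emailed and held-out groups). The function name, the 0.5 cut point and the sample values are all hypothetical; this is an illustration of the categories, not the segmentation process itself.

```python
def categorize(p_buy_if_emailed, p_buy_if_not_emailed, high=0.5):
    """Map two (assumed) purchase probabilities to one of the four categories."""
    buys_when_emailed = p_buy_if_emailed >= high
    buys_when_left_alone = p_buy_if_not_emailed >= high
    if buys_when_emailed and buys_when_left_alone:
        return "sure bet"        # buys either way
    if buys_when_emailed:
        return "persuadable"     # buys only when emailed
    if buys_when_left_alone:
        return "do not disturb"  # the email turns them off
    return "lost cause"          # does not buy either way


# Hypothetical customers: (probability if emailed, probability if not emailed)
examples = {
    "anna": (0.80, 0.10),
    "ben": (0.90, 0.85),
    "chris": (0.05, 0.02),
    "dana": (0.20, 0.70),
}
for name, (p_emailed, p_alone) in examples.items():
    print(name, "->", categorize(p_emailed, p_alone))
```

Only the persuadables gain from being emailed in this picture, which is why the other three groups are candidates for fewer emails, not more.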

By now you must be asking: all this is good, but how do I segment my customers into these four categories? I will get to the process of effective segmentation later. First, let us look at the historical and current situation.

Historically, experienced marketing managers have developed an intuition based on observing the behavior of their customers:

  1. When did they last buy?
  2. How frequently do they buy?
  3. How much do they buy?

The trade term for this approach is RFM (recency, frequency and monetary value). For years it has been used to segment customers along these three dimensions and target them with marketing messages. But this technique has been overused. Meanwhile, the avenues for buying have increased significantly. Besides retail stores, there are now e-commerce sites and mobile apps where the buyer can exercise the right to buy. They can buy in their bedroom late at night, in their pajamas, or while riding in a car during their daily commute. So the customer is empowered to buy anything, anywhere, anytime.
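For readers who have not computed RFM before, here is a minimal Python sketch that derives the three values from a list of transactions. The customers, dates and amounts are invented, and the choice of a fixed "as of" date is an assumption made so that recency is reproducible.

```python
from datetime import date

# Hypothetical transactions: (customer, purchase date, amount in dollars)
transactions = [
    ("alice", date(2014, 5, 1), 120.0),
    ("alice", date(2014, 3, 15), 80.0),
    ("bob", date(2013, 6, 10), 300.0),
    ("carol", date(2014, 5, 20), 40.0),
    ("carol", date(2014, 4, 2), 35.0),
    ("carol", date(2014, 2, 11), 25.0),
]
as_of = date(2014, 6, 1)  # the day on which recency is measured


def rfm(transactions, as_of):
    """Return recency (days), frequency (count) and monetary (total) per customer."""
    summary = {}
    for customer, day, amount in transactions:
        entry = summary.setdefault(
            customer, {"last_purchase": day, "frequency": 0, "monetary": 0.0}
        )
        entry["last_purchase"] = max(entry["last_purchase"], day)
        entry["frequency"] += 1
        entry["monetary"] += amount
    return {
        customer: {
            "recency_days": (as_of - entry["last_purchase"]).days,
            "frequency": entry["frequency"],
            "monetary": entry["monetary"],
        }
        for customer, entry in summary.items()
    }


table = rfm(transactions, as_of)
for customer, values in table.items():
    print(customer, values)
```

These three numbers answer the three questions above: when they last bought, how often, and how much.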

The advertising influences on a customer are multiplying. Google search, ratings and reviews, and friends bragging on social media about what a great deal they got are routine. So what in your marketing really worked? What can you attribute the sale to? This is the holy grail of marketing today. My point is that a three-dimensional analysis of customers alone doesn’t give enough insight into their buying behavior. Clearly, a better way to analyze customer behavior is needed.

Over the years, direct marketing companies have used predictive modeling to create multiple customer segments based on a large number of variables that are likely to influence buying behavior. Obviously you couldn’t mail the catalog to everyone in the country, as it costs real money to get a catalog into the hands of a customer. Even if it costs only $0.50 to send a 50-page catalog to a customer, the numbers quickly add up when you mail the whole population multiple times a year. Hence the need to improve targeting. Marketing managers have developed very deep expertise in increasing the return on investment of marketing dollars. In direct marketing, predictive modeling is used to calculate a purchase propensity score (the probability of purchase multiplied by the amount of money the customer is likely to spend) for each customer. This gives a sense of the success of the campaign before any mailing is done. This technique has not been applied to email marketing, mostly because of the cost of modeling and scoring the customers. There is also a notion that it costs very little (at least relatively!) to send an email blast, so I might as well send it to every one of my customers.
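The purchase propensity score described above is simple to compute once the two inputs exist. In the Python sketch below, the probabilities and spend estimates are invented, and comparing the score directly with the $0.50 catalog cost is a simplifying assumption (it ignores margin, for one).

```python
COST_PER_CATALOG = 0.50  # the per-catalog cost quoted in the text

# Hypothetical model output: (customer id, probability of purchase, expected spend in dollars)
customers = [
    ("c1", 0.10, 200.0),
    ("c2", 0.02, 50.0),
    ("c3", 0.004, 100.0),
    ("c4", 0.30, 20.0),
]


def propensity_score(probability, expected_spend):
    """Probability of purchase multiplied by the likely spend."""
    return probability * expected_spend


scored = sorted(
    ((cid, propensity_score(p, spend)) for cid, p, spend in customers),
    key=lambda item: item[1],
    reverse=True,
)

# Assumed rule: mail only customers whose score exceeds the cost of mailing them.
worth_mailing = [cid for cid, score in scored if score > COST_PER_CATALOG]
print(scored)
print(worth_mailing)
```

Summing the scores of the customers you decide to mail gives a rough forecast of campaign revenue before anything is sent.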

Both the cost of modeling and the almost negligible cost of emailing have kept this approach from being used for email marketing.

However, our experience over the last few years has been quite the opposite. Typically, most companies are happy if they get a 1%-2.5% response rate to their email marketing. But using the approach I am about to outline, we have experienced response rates in the 12-15% range. Initially, when we reviewed the numbers we didn’t believe them, but when the rates kept coming up again and again, we became convinced that we were on to something.

In the next few posts I will attempt to articulate this approach and look forward to your feedback. What we will attempt to learn together are the issues involved and how to overcome these to attain the marketing nirvana of “sending the right message to the right customer at the right time based on their moods, likings, buying stage and buying behavior”. Stay tuned.

Power of Sankey Diagram in Data Visualization


Data visualization is one of the greatest ways to simplify the complexity of understanding relationships among data. The Sankey diagram is one such powerful technique for visualizing the association of data elements. Sankey diagrams are named after the Irishman Matthew Henry Phineas Riall Sankey, who first used them in a publication on the energy efficiency of a steam engine in 1898. They can be difficult, time-consuming and very tedious to draw by hand, but nowadays Business Intelligence technologies such as Tableau, Google Visualization and D3.js can generate these diagrams automatically.

So what are Sankey diagrams and how can they be useful?

Sankey diagrams are basically flow diagrams in which the width of the line connecting two nodes is proportional to the value of a metric or key performance indicator. We can also present this kind of information using neural networks and association analysis diagrams. Sankey diagrams provide

  • flexibility,
  • interactivity, so that business users can gain insight into the data quickly.

They are a better way to illustrate which departments have a strong association. With that knowledge we can improve our promotion mix, for example by launching loyalty schemes with a sales kit that contains products from two strongly associated departments at a competitive price, or we can take steps to improve the association between departments where we don’t have much penetration.

The diagram below depicts strong associations among different departments in a retail organization. We drew it using the Google visualization library; such diagrams can also be created using S.Draw, Visguy, Fineo, Parallel Sets, Sankey Helper, the D3 Sankey plug-in, etc. Sankey diagrams are widely used in the energy sector to analyze transmission flows, and are also used to illustrate anomalies in money and material flows in business organizations.


Sankey Diagram
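A diagram like this starts from a table of links. The Python sketch below shows one way such a table could be prepared, by counting how often two departments appear in the same basket. The baskets and department names are invented, and the [from, to, weight] row layout is assumed to match what Sankey charting libraries such as Google Charts typically expect.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets: the set of departments bought from in one transaction.
baskets = [
    {"Furniture", "Lighting"},
    {"Furniture", "Lighting", "Rugs"},
    {"Kitchen", "Dining"},
    {"Furniture", "Rugs"},
    {"Kitchen", "Dining", "Lighting"},
]


def department_links(baskets):
    """Count co-occurring department pairs and return [from, to, weight] rows."""
    counts = Counter()
    for basket in baskets:
        # Sorting makes every link point from the alphabetically earlier
        # department to the later one, so the links can never form a cycle.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return [[source, target, weight] for (source, target), weight in counts.most_common()]


rows = department_links(baskets)
for row in rows:
    print(row)
```

The weight column is what drives the width of each band in the diagram: the more often two departments are bought together, the thicker the link.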


Putting this tool into the hands of the people who have the knowledge, rather than having a graphic artist in the process, gives users the opportunity to visualize a wide range of processes, such as

  • production cost optimization, by making process flows easy to understand,
  • energy losses of a particular machine,
  • material flows within specific economic sectors,
  • operational efficiency improvements that support a more sustainable business operation,
  • effective cash flow analysis in business organizations.

Adding your own visual graphics to the Sankey Diagram gives rich interactive visualization, resulting in attractive graphics for information materials and effective visual data exploration practices!


Analytical Platform Evaluation

I recently had an opportunity to work on a technology consulting project for one of our customers, a services company that provides business analytics solutions. The scope of the project included, among other things, recommending an “analytical platform” so that the customer could move from their existing custom application, which had limited analytical flexibility, to a more generic platform that would let them deliver a full spectrum of analytic solutions.

The Need:

We started by trying to understand the state of business analytics now and how it might shape up in the near future, meaning roughly the next 2-4 years. This was important because the platform is supposed not only to address current needs but also to take care of near-future requirements. After looking at various leading recent research publications on this subject, it was clear that more and more companies are adopting analytics not only to outperform the competition, but also to avoid risk and to make decisions based on data rather than intuition.

As companies become analytics driven, they want to use every bit of data that might be available to them. This data can come in all shapes, sizes and speeds. That is, the data might be structured, semi-structured or unstructured. It might arrive as database dumps, text files, audio or video files, sensor data or web server logs, all of varying sizes. A lot of it could become available quickly, because virtually every business is now done digitally, i.e. it involves digital computing in some form or other. Thus, the recommended platform should

  • Enable storing of any and all sorts of data
  • Enable easy conversion of the raw data into right insight to aid in better business decisions.

The latter need, i.e. transforming raw data into the right insights by applying various analytical techniques, can only be met by the right talent, which is in very limited supply at present but is critical to the success of any analytics project. So the key takeaway from this exercise, a sort of “analytics outlook” for us, was that the platform should be an enabler in producing the right data from raw data, and the right talent, which is currently hard to find, from an existing pool of talent.

Traditional Systems:

With this understanding, we started by looking at the options readily available to us and could not help but think of the platforms we are familiar with, such as Oracle, SQL Server and Teradata, which have become ubiquitous and for which skills are easy to find. We quickly figured out that these traditional relational database systems cannot deliver the kind of analytics that businesses currently demand. The old systems were not built for that purpose. They are “stone age” systems when it comes to “new age” analytic requirements, including but not limited to prescriptive, predictive and good old descriptive analytics. Add to that list text analytics, exploratory analytics, machine learning and other techniques that have become the norm of late.

Looking at how traditional database management systems are built, most of them are SMP (Symmetric Multi-Processing) systems, designed to share resources among the various processes. These kinds of designs have practical limitations when it comes to scaling for both “data and performance”, as

  • They cannot work with tons of data requiring serious processing horse power.
  • Neither can they support ad-hoc, on demand, interactive analytics requiring rapid iterations.

We concluded that there is a need for systems designed with a radically different approach from traditional systems.

New Age Systems:

The technological breakthroughs of the last decade have produced systems that are built to address the shortcomings of traditional systems. Specifically, MPP (Massively Parallel Processing) systems with a shared-nothing architecture do a great job of leveraging the hardware and can scale up and scale out for both data and performance. A few notable solutions built on this idea are Apache Hadoop, IBM Netezza, Teradata Aster, EMC Greenplum, Actian Matrix and SAP HANA, among others.
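The shared-nothing idea can be illustrated with a toy Python sketch: rows are distributed across nodes by hashing a key, each node aggregates only its own rows, and the partial results are then combined. The node count, the sample rows and the use of `zlib.crc32` as the distribution hash are illustrative assumptions, and the loop here runs sequentially where a real MPP system would run the nodes in parallel.

```python
import zlib
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size

# Hypothetical fact rows: (customer, order amount in dollars)
rows = [("alice", 120), ("bob", 300), ("alice", 80), ("carol", 40), ("bob", 25)]


def partition(rows, num_nodes):
    """Send each row to exactly one node based on a stable hash of its key."""
    nodes = [[] for _ in range(num_nodes)]
    for customer, amount in rows:
        node_id = zlib.crc32(customer.encode("utf-8")) % num_nodes
        nodes[node_id].append((customer, amount))
    return nodes


def local_totals(node_rows):
    """Work done independently on one node, using only that node's data."""
    totals = defaultdict(int)
    for customer, amount in node_rows:
        totals[customer] += amount
    return dict(totals)


def merge(partials):
    """Combine per-node results; keys never overlap because of the partitioning."""
    combined = {}
    for partial in partials:
        combined.update(partial)
    return combined


nodes = partition(rows, NUM_NODES)
totals = merge(local_totals(node_rows) for node_rows in nodes)
print(totals)
```

Because all of a customer's rows land on the same node, no node needs another node's memory or disk to finish its share of the work, which is what lets this design scale out by adding nodes.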

Of these leading solutions, some are built on a proprietary combination of custom hardware and software and shipped as a plug-and-play appliance. Examples in this category are IBM Netezza and, to some extent, Teradata, though it is not strictly categorized as an appliance. Others are purely software solutions that run on commodity hardware but can deliver equally good performance, or in some instances fare better. The solutions offered as appliances are typically priced higher, and maintenance could be an issue in the long run. That left us to evaluate software-only solutions that leverage commodity hardware. As hardware prices continue to fall and the same dollar buys more storage and processing power, this model was a clear winner. Among the purely software solutions, the choice was between a system such as Hadoop and one such as Actian Matrix or SAP HANA.

Narrowing Down:

Evaluating Hadoop yielded some great insights, especially because there has been a lot of development in this ecosystem: many projects are being promoted to top-level projects, and Hadoop continues to grow stronger and to firm up its place in the analytics world. We understand that, as of now, Hadoop can’t meet requirements such as low-latency, high-speed, ad-hoc querying of data or advanced analytical functions with rapid iterations. It is still a niche system requiring hardcore technical talent to write map-reduce programs in languages such as Java or Python, and such talent is very difficult to find in the market, although projects such as Spark and Shark have made advances on these requirements by leveraging in-memory capabilities. We have to agree that Hadoop has a place in the current analytics ecosystem, because it works best for batch processes and is needed to handle a variety of data arriving at great velocity and in greater volume every day.
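To give a feel for what a map-reduce program involves, here is a toy, single-machine Python sketch of the three phases (map, shuffle, reduce) counting page hits in web server logs. It does not use Hadoop or any Hadoop API; the log format and function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical web server log lines: "<ip> <method> <path>"
log_lines = [
    "10.0.0.1 GET /home",
    "10.0.0.2 GET /products",
    "10.0.0.1 GET /products",
    "10.0.0.3 GET /home",
    "10.0.0.2 GET /home",
]


def map_phase(line):
    """Emit one (key, value) pair per log line: the page path and a count of 1."""
    _ip, _method, path = line.split()
    yield path, 1


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse the values for one key into a single result."""
    return key, sum(values)


mapped = [pair for line in log_lines for pair in map_phase(line)]
hits = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(hits)  # {'/home': 3, '/products': 2}
```

Even this simple count has to be expressed as separate map and reduce steps, which hints at why ad-hoc, interactive analysis is awkward in this model.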

The other software-only systems, such as Actian Matrix or SAP HANA, work well when it comes to scaling up for performance or data volumes, as they

  • are built to address the requirements of ad-hoc querying,
  • support low-latency query results,
  • can do rapid iterations through the analytics model,
  • run advanced analytics such as predictive analytics, prescriptive analytics and text analytics, and
  • offer features such as in-memory capability with in-database analytics, which truly deliver on the promise of real-time analytical needs where time is the currency.

A key feature of these systems is that they store data in columnar fashion, which greatly aids retrieval and querying; storing data this way also allows various compression techniques to be used to reduce disk and network I/O. These systems, like Hadoop, are also built for high availability and disaster recovery.
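Why columnar storage helps can be seen in a small Python sketch: once rows are pivoted into columns, a query that needs one column reads only that column, and a column with repeated values compresses well. Run-length encoding is used here as one simple example of a compression technique; the sample rows and column names are invented, and this is not how any particular product stores its data.

```python
from itertools import groupby

# Hypothetical sales rows: (day, state, amount in dollars)
rows = [
    ("2014-05-01", "CA", 120),
    ("2014-05-01", "CA", 80),
    ("2014-05-01", "NY", 300),
    ("2014-05-02", "NY", 40),
    ("2014-05-02", "NY", 35),
]


def to_columns(rows, names):
    """Pivot row-oriented records into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Replace each run of repeated values with a (value, run length) pair."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows, ["day", "state", "amount"])

# A query such as "total sales" touches only the amount column.
total = sum(columns["amount"])

# Columns with long runs of repeated values shrink considerably.
encoded_state = run_length_encode(columns["state"])
print(total)          # 575
print(encoded_state)  # [('CA', 2), ('NY', 3)]
```

Fewer bytes to read for each query is where the reduction in disk and network I/O comes from.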


Our evaluation yielded an interesting outcome: Hadoop is a great MPP system built in the modern distributed computing era, but it can’t address all the requirements of new age analytics. Thus systems like Actian Matrix or SAP HANA have their own place and are indispensable for businesses of our time that wish to benefit from being analytics-driven organizations. So both are required: Hadoop as well as Matrix/HANA. This means there should be a mechanism for the two systems to interact seamlessly; otherwise they could end up as silos, causing great pain. That is, the two systems should integrate so well that they seem to be one single system.

Having worked on Actian Matrix ourselves and seen its integration not only with Hadoop but also with other external systems, we made it our final choice!

Role of Social Media, Data Analytics in (Loksabha) General Elections 2014 in India

By Pavan Sarathi

(Our objective through this article is to provide insights through the use of data analytics and social media for informed decision making.)

Recently, the 16th (Loksabha) General Elections were conducted in India, in 9 phases across the country. Even before the elections were announced in mid-March 2014, people in India were very active on social media about the elections and their candidates. These elections attracted media from around the world; reports say they have been among the most observed elections so far in the largest democratic country in the world, with 814.5 million voters (81.45 crores), including 100 million (10 crores) first-time voters.

Use of Data Analytics and Sentiment analysis by political parties

With national general elections under way, some parties appeared to have an edge, as technology, social media and big data played a key role in connecting with voters. Data mined from social media such as Facebook and Twitter included voter sentiments, emotions and concerns in different constituencies and states. India’s political parties used this data to prepare manifestos, find ways to reach voters effectively, drive donations, enroll volunteers and organize resources on the ground, improving the effectiveness of everything from door knocks and phone calls to micro-messaging and social media.

Today, data is analyzed not just for research; it is also used for real-time monitoring of people’s reactions to politics and policies, and for rapid responses to crisis situations. Data analytics plays a significant role in changing political outcomes. Studies revealed that in more than 160 out of 543 constituencies, the impact of social media was high enough to determine the results.

“Modi is perhaps one of the most tech-savvy politicians in the world and certainly the most active in India,” says Amit Sheth, a professor at Wright State University’s Knowledge Computing Center in Ohio.

BJP, the winning party in these elections, had developed its own customized digital tools, based on both commissioned and open source data, that put it in direct touch with voters. Narendra Modi, the newly elected Prime Minister of India, had 3.67 million followers on Twitter and 15 million likes on Facebook, and the party had 68 million page views on Google+. This was revealed by Aravind Gupta, Head of the BJP IT cell and social media center. A few key metrics decided the results of the candidate and party in a given constituency. Most of these metrics were provided by data analytics and data mining from social media.

How to position your campaign.

campaign plan

As a part of the BJP Karnataka IT cell, I would like to describe our game plan and how we operated during the elections. The plan was carried out at the level of each Assembly constituency. It operated at 2 levels: the general public level and the (karyakarta) volunteer level.


At the general public level:

  1. Social media (Facebook, Twitter, Google+ hangouts): we shared the achievements of Narendra Modi, the failures of the UPA and the individual achievements of the candidates with the relevant groups.
  2. We connected with the administrators of groups that already existed on social media for each (Vidhansabha) Assembly constituency and posted relevant, inspirational information each day.
  3. Achievements of the candidates in their own constituencies were published on their websites. The websites were updated with activities, tour plans, etc., and the details were shared frequently on social media.
  4. Bulk SMS messages were sent to as many people as possible on a regular basis.
  5. 2-3 emails were sent out every day.
  6. Groups were created on WhatsApp Messenger, a cross-platform mobile messaging app, and group administrators used them to pass effective messages and pictures on to the different existing WhatsApp groups.
  7. Messages and pictures were posted across channels (email, SMS, WhatsApp, social media, etc.) that motivated the public, especially youngsters, to vote and explained the need of the hour.


  1. Volunteer (karyakarta) groups were created for each general (Loksabha) constituency over email, mobile, WhatsApp and other social media. In effect, each group is a distribution list.
  2. Relevant and required information was sent to the volunteers (karyakartas) working in the field. It educated them with facts that were available online, on Facebook and in other media. Every day they were sent short SMS messages, emails, or presentations and pictures through WhatsApp.
  3. The electoral details and previous election results of each general (Loksabha) constituency were shared, in the form of presentations, with the campaigners working in the field.
  4. Unique experiments carried out in any one constituency were shared with campaigners in other constituencies through short SMS messages, emails etc.
  5. Campaigners were given FAQ-style lists of questions, with answers, that they might come across while working in the field.

By Electronic Media

Electronic media made extensive use of data analytics while delivering news during the elections. This helped them understand viewers’ concerns and the trends for different political parties and politicians across the country, and breaking news was often telecast on that basis. In these elections electronic media used data analytics more than ever before. In fact, one of the leading channels, CNN-IBN, tied up with IT giant Microsoft and set up an Analytical Center especially for these elections. Many IT companies provided data analytics services to different channels.

In fact, Reuters reported that U.S. social networking company Twitter is planning to replicate parts of its India election strategy across countries that go to polls this year, after it emerged as a key tool for politicians and media companies during the world’s largest democratic exercise.

In India, Twitter Inc worked closely with politicians including the victor Narendra Modi who used the platform for election campaigning, and also partnered with mobile and media firms to distribute tweets online and offline.

Now, with polling due in other nations later this year, the San Francisco-based company plans to take its India lessons abroad to expand its foothold in the political arena and increase its user base.

“The election more than any other moment provides a nice microcosm of the value Twitter can add … we are sharing widely the lessons of this Indian election around the world,” said Rishi Jaitly, India market director at Twitter.

Revenue generated from the elections by social media and data analytics companies has increased substantially, perhaps to two or three times what it has ever been before.

The Pitch Madison Media Advertising Outlook 2014 estimates, released on 19 February, forecast the digital medium to contribute Rs. 3,950 crore (Rs. 39.5 billion, or $658.33 million at an exchange rate of 1 USD = INR 60) in 2014, most of it coming during the elections.
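For readers not used to the crore as a unit, the conversion in the estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name is made up for illustration, and the exchange rate is the one quoted in the text.

```python
CRORE = 10_000_000  # 1 crore = 10 million


def crore_to_usd(amount_crore, inr_per_usd):
    """Convert an amount in crore rupees to US dollars (illustrative helper)."""
    return amount_crore * CRORE / inr_per_usd


rupees = 3950 * CRORE
usd = crore_to_usd(3950, 60)  # rate quoted in the text: 1 USD = INR 60

print(f"Rs. {rupees / 1e9:.1f} billion")  # Rs. 39.5 billion
print(f"${usd / 1e6:.2f} million")        # $658.33 million
```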

“Online as a medium to reach voters has become a really powerful one for the parties, so spending on digital will only continue to grow,” said Asheesh Raina, principal research analyst at Gartner Inc.

The elections are just over and IT companies are yet to declare their revenue generated due to elections.

Facts about India and use of Social Media

  • There are 93 million Facebook users and 33 million Twitter users.
  • There are 600 million mobile users, and most of them are moving to smartphones.

Some of the data collected from Social media about key Political parties and politicians.



[Image: social media data on key political parties and politicians]
What is Digital Marketing?

Some of you must be aware of the recently held general elections in India. The results are out for all to see, and there is much to learn from them. While I don’t intend to talk about the political lessons, my focus is on the use of social media and online marketing for the campaign, and I will use it to talk about digital marketing in general.

Social media has been used effectively for election campaigns in other geographies, mainly developed countries. However, this is probably the first time in India that the medium was used for campaigning, and to motivate and create an army of volunteers to campaign for the parties. (Even Twitter is talking about taking the same innovations to the global market.) So can we say that digital marketing has finally arrived, with one more strong success story? The answer is obvious: yes! In fact, if someone answers that the medium arrived long ago and we are late to recognize its power, I won’t be surprised.

Just to illustrate the power of social media in the elections: Narendra Modi, the winning candidate, had 15.5 million likes on Facebook, whereas the Times of India, the leading English daily, has a readership of 7.4 million. This makes it clear that Modi didn’t have to depend on TOI to publish his views. He was free to share his views and agenda the way he wanted and at the time he wanted, and we saw the results. The other important aspect of this campaign is that Narendra Modi, as the early mover on social media, gathered good support from people; his numbers of Facebook likes and Twitter followers will tell you this. The early mover certainly has an advantage here too (as in many other situations).

So what’s in it for me as a marketer or a business owner? There is only one action point. If your marketing strategy doesn’t include digital marketing, you need to include it NOW. Several reports describe how the marketing world is moving towards digital, and more and more people are spending more on digital media than on conventional media. If you are already spending on digital marketing, that’s good, but you need to check whether your strategy is right.

So what exactly is Digital Marketing and why should one go for it?

In simple words, digital marketing is marketing done on digital mediums, i.e. mobile, the internet, social media (part of the internet) and email. Before we go into the details of digital marketing, let’s talk about the advantages. For a long time, it has been a challenge for marketers to know the return on investment (ROI) on their marketing spend. They knew that half of the money they were spending was not yielding any results, but the problem was knowing which half was not working. With digital marketing, you can easily calculate ROI and manage your budgets properly. The other big advantage is target marketing: you can segment your customers and decide which customers you want to target. With target marketing, you are not only able to target specific customers, you can also deliver the content when the context is favorable or the prospect is in the right frame of mind.
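To make the ROI point concrete, here is a minimal sketch in Python. It assumes the common definition ROI = (revenue - cost) / cost, and the channel names and figures are invented purely for illustration; real numbers would come from your own campaign tracking.

```python
def roi(revenue, cost):
    """Return on investment as a fraction: (revenue - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (revenue - cost) / cost


# Hypothetical spend and attributed revenue per channel (illustrative only).
campaigns = {
    "email": {"cost": 1_000, "revenue": 5_000},
    "paid_search": {"cost": 4_000, "revenue": 6_000},
    "social": {"cost": 2_000, "revenue": 1_500},
}

roi_by_channel = {name: roi(c["revenue"], c["cost"]) for name, c in campaigns.items()}

# Rank channels from best to worst ROI to see which "half" is not working.
ranked = sorted(roi_by_channel, key=roi_by_channel.get, reverse=True)
for name in ranked:
    print(f"{name}: {roi_by_channel[name]:.0%}")
```

With numbers like these, the channel with a negative ROI is the obvious candidate for a budget review.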

To sum up the advantages of Digital Marketing:

  • Target marketing
  • Easy calculation of ROI
  • High ROI
  • Budgeting and forecasting
  • Efficient and effective campaigns
  • Attribution modeling etc.

I guess these benefits are enough to make you feel there is more to know about digital marketing. In digital marketing, the medium of content delivery is a computer, laptop, mobile or tablet. We can decide to publish advertisements on any website, on social media platforms, on YouTube or simply in an email.

Contrary to the buzz around social media and people saying ‘Email is dead’, email marketing is still one of the most effective mediums of marketing, and certainly more effective than social media. According to a few studies, the return on investment is as high as $40 for every $1 spent on email.

The next popular medium is the search engine. We all use Google to search for products and services. The key here is that if your website shows up among the top 3-4 links, there is a high chance that people will visit your website and a lead will be generated. This is called organic search. To ensure that your website shows up in a search, you have to optimize your website. This process is known as Search Engine Optimization (SEO). If you depend on your website to generate leads, then you have to pay attention to SEO.

If you pay to show your results when a user searches for specific keywords, these results are shown as sponsored results. This is called paid search. You can use Google AdWords or contact websites directly for such advertisements. This is a great medium for targeting customers based on location and context, and it is good for brand building too!

All the mediums listed above are associated with the website. Once users click on the links made available through these mediums, they are directed to your website, so it’s very important that the website is neatly designed. Whether these users are converted from leads into customers depends largely on the website. One needs to analyze what users do once they visit your company website. This analysis typically covers the pages visited by the user, time spent, saved carts etc. It is called web analytics, and it is very important if you depend on your website to generate leads.

Then, obviously, there is social media. You can advertise on social networking sites like Facebook, LinkedIn, Twitter etc. Considering the time we spend on social networking sites, this is gaining more and more popularity among marketers.

Last but not least, there is mobile marketing. This is a very powerful medium, as you can deliver content based on specific location via SMS, mobile internet or mobile apps, and today most people access their email and social media accounts on their smartphones most of the time. One of the biggest strengths of this medium, unlike any other digital medium, is that people always carry their mobiles with them and are ‘connected’.

This is just the tip of the iceberg as far as digital marketing is concerned. As expected, there are certain challenges associated with it. The first challenge is the IT infrastructure required, but this infrastructure can be outsourced or built in-house. The next big challenge is to decide the right medium, or the right budget mix across these mediums. All these media have their own strengths and can be used very effectively in their areas of strength. It’s also important that we continuously monitor the ROI of each medium and stay flexible with our strategy and spending.

All this requires a good understanding of digital marketing and a very robust analytics platform. This platform should be designed to measure the ROI of various campaigns, compare campaigns and enable real-time monitoring so that corrective action can be taken. The analytics platform will also help you with customer segmentation so that you can target the right customers. Earlier, marketers had limited media like TV, magazines, newspapers or billboards to spend on, and calculating ROI was a challenge. Now you can calculate the ROI for digital marketing, but the challenge is to decide the right medium.

Therefore, to be competitive, one needs to invest in digital marketing and marketing analytics to maximize ROI. Don’t fall far behind in this game, otherwise it might be too difficult to catch up. As I said earlier, this is just the tip of the iceberg and just an overview of this interesting field. I will cover these topics in detail in subsequent blogs.

Dimensionally speaking….

Dimensional Modeling


Profile Dimension (aka Junk Dimension aka Mystery Dimension)

Hello readers! I hope you are all enjoying the colorful spring! This time I am going to tell you a story about a new friend of mine, who lives on the Main Street of Dimensional Modeling town and goes by a whole bunch of different names! I am not sure how and from where he got all these names, but It Is What It Is … As Data Warehouse Architects, we usually know the two kinds, namely Dimensions & Facts. Very basic, right? But then, every once in a while, we have our “encounters of the third kind”: the attributes that are neither dimensions nor facts! This new friend of mine helps us deal with these!

Scenario: Last month, we kicked off a project to build a new DataMart for the supply chain group here. Everybody’s calendar filled up with meetings: meetings to discuss the project requirements, meetings to chalk out the estimates & the plan. Life was cool & I was doing my favorite stuff, data modeling for the new project. I came up with a beautiful design, right out of the pages of a Ralph Kimball book, with 5 dimensions and 7 measures, and I was on my way to star schema heaven. Suddenly I stumbled upon the users’ awkward questions: Where is such and such flag? Where’s the blah-blah type? Why can’t I see that xyz code? I wondered whether we really needed all of those junk attributes for BI. Well, it turned out that those standalone codes & flags & types really meant a lot to the business.

Traditional Approaches & the issues: I had 2 options in mind at that time to deal with this situation:

  1. Create new tiny dimensions for all of these attributes. But then, I was worried that my good-looking, picture-perfect model would get crowded with all these new dimensions that have nothing much to them other than those codes, flags & types.
  2. Add all these transactional attributes to my fact table. But then, my fact table, at 100 million rows, just wouldn’t stand for all these sparsely populated attributes.

I didn’t like either of the options. And then, like Captain America, this new friend of mine, the “PROFILE” Dimension, came in to save the world, or at least my model!

Concept: A profile dimension is a dimension that holds all the unique or valid combinations of a set of columns and assigns a unique key to each combination. This key is then hooked up to the fact table. The columns that this dimension encompasses are usually low-cardinality attributes like flags, types, codes & statuses. These attributes do not necessarily have a direct correlation or relationship with each other; they are simply housed in a single dimension table.

Example: Let me tell you more about the exact situation I was trying to resolve. We had a Purchasing transactional table from the ERP system that needed to be fed into the DataMart. Apart from the dimensions & measures that I talked about earlier, this transactional table had several codes, statuses & flags. All these standalone codes, statuses & flags were moved to a profile dimension. Here is how it looks in the model. Please take a look at the profile dimension, which is marked in GREEN. Of course, this is a much simplified version of the actual star schema, just to explain the concept.

[Image: star schema with the profile (junk) dimension marked in green]
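For those who prefer code to pictures, the idea can be sketched in a few lines of Python. This is not our actual ETL: the column names (rush_flag, po_type, status), the sample rows and the helper function are all hypothetical, and plain dictionaries stand in for database tables. Each distinct combination of the profile columns gets one surrogate key, and the fact rows keep only that key.

```python
def build_profile_dimension(transactions, profile_columns):
    """Return (dimension, facts): one dimension row per distinct combination of
    profile_columns, and fact rows that carry only a profile_key instead."""
    key_by_combo = {}
    facts = []
    for row in transactions:
        combo = tuple(row[col] for col in profile_columns)
        # Assign the next surrogate key the first time a combination is seen.
        key = key_by_combo.setdefault(combo, len(key_by_combo) + 1)
        fact = {k: v for k, v in row.items() if k not in profile_columns}
        fact["profile_key"] = key
        facts.append(fact)
    dimension = [
        {"profile_key": key, **dict(zip(profile_columns, combo))}
        for combo, key in key_by_combo.items()
    ]
    return dimension, facts


# Hypothetical purchasing transactions with standalone flags, types and statuses.
purchases = [
    {"po_number": 1001, "amount": 250.0, "rush_flag": "Y", "po_type": "STD", "status": "OPEN"},
    {"po_number": 1002, "amount": 90.0, "rush_flag": "N", "po_type": "STD", "status": "CLOSED"},
    {"po_number": 1003, "amount": 410.0, "rush_flag": "Y", "po_type": "STD", "status": "OPEN"},
]

profile_dim, fact_rows = build_profile_dimension(
    purchases, ["rush_flag", "po_type", "status"]
)
for row in profile_dim:
    print(row)
for row in fact_rows:
    print(row)
```

Three transactions collapse into two profile rows here, and the fact rows gain a single profile_key column in place of three attribute columns.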

Advantages: A profile dimension allows all the columns to be queryable while adding only one column to the fact table, providing a much more efficient solution than either creating multiple dimensions or leaving all the data in the fact table. By moving such transactional attributes to a profile dimension, you also have fewer indexes on the fact table, which might be important depending on its size.

Data Population: A key consideration when populating the profile dimension is how many combinations are technically possible versus how many actually exist in the data set. If the number of all possible combinations is too high, the profile dimension’s size may be unmanageable. Also, many of the technically possible combinations may not even make sense in reality. In such cases, populating only the combinations that exist in the data set makes the design more efficient. On the other hand, it is safe to create all possible combinations if the attributes have a fixed set of values, like Booleans or codes from a known finite set, because you can be sure you’ve created a dimension row for every combination that the data set might have. If there are free-form string columns, then you need to make sure your ETL is able to generate new dimension rows and surrogate keys as new combinations are created in the source system.

As I mentioned earlier, this dimension is also called a “Junk Dimension” or a “Mystery Dimension”, because it houses all those junk / mystery transactional fields that are neither dimensions nor facts. I hope you find this new dimension table concept helpful in taking care of the outlier attributes in your data model… Happy Modeling!
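A quick postscript on the data population question: comparing "technically possible" against "actually existing" combinations takes only a few lines of Python. The attribute domains and sample rows below are hypothetical, and the last row deliberately carries a status outside the known code set to show why free-form values need extra care in the ETL.

```python
from itertools import product

# Hypothetical attribute domains; real ones would come from the source system.
domains = {
    "rush_flag": ["Y", "N"],
    "po_type": ["STD", "BLANKET", "CONTRACT"],
    "status": ["OPEN", "APPROVED", "CLOSED", "CANCELLED"],
}

# Every technically possible combination (the cross product of the domains).
possible = set(product(*domains.values()))

# Hypothetical rows observed in the data set.
observed_rows = [
    {"rush_flag": "Y", "po_type": "STD", "status": "OPEN"},
    {"rush_flag": "N", "po_type": "STD", "status": "CLOSED"},
    {"rush_flag": "Y", "po_type": "STD", "status": "OPEN"},
    {"rush_flag": "N", "po_type": "BLANKET", "status": "ON HOLD"},
]
existing = {tuple(row[col] for col in domains) for row in observed_rows}

# Combinations that fall outside the known domains, e.g. free-form values.
unexpected = existing - possible

print(len(possible), len(existing), len(unexpected))  # 24 3 1
```

If the first number is small and the third is always zero, pre-populating every combination is safe; otherwise populating from the data and creating keys on the fly is the more practical route.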

What’s in a name? A lot if you are talking databases….

Database Design – Naming Conventions


What's in a Name, by Shakespeare

Well, that’s a pretty popular line; however, most Data Architects would not agree with Shakespeare on that!

Talking about “Name”, here is an incident that I remember. One day, a new guy on my team created a table and called it “I_WIN_YOU_LOSE”. Seriously, he did that! And the names of the fields in that table were in the same spirit. At first sight it was funny, as all the queries running against that table were pretty creative. You can imagine some innovative queries on such a table.

Although all of that earned him the reputation of a “Cool Dude”, it was not possible to follow his style in our Data Warehouse. So, I had to be the geek & explain to the team the way we name database objects here…

An observation that I made based on this experience was that when we lay out standards for everyone, progress is rapid. The computer industry has long experienced the value of standardization. When we standardized the formatting of disk drives, we got drives that could be read by both Unix and Windows systems; CDs, DVDs and flash drives all have a standard format for information exchange. When we standardized on the TCP/IP protocol, we were able to connect all the computers to each other, and the internet was born! When we standardized on HTML, the World Wide Web was born. I could give you many more examples, but you get the point. So why can’t we humans follow simple naming conventions? Particularly when it comes to naming tables and columns, we suddenly get creative. It’s almost as if we take for granted an authority to be different.

When it comes to data modeling, especially in a multi-tier, team-based, fast-growing environment, the “Name” of an object really becomes crucial, as it defines the object. Naming objects becomes much more than just tagging a word to a face. Naming becomes complex when different people have different meanings for the same name, or different names for the same meaning. Everyone has their own style that comes with personality, personal preference and past experiences, like the one in my earlier story.

Let’s try to understand the principles behind naming conventions with specific industry standards & examples. I am going to keep this generic & not specific to our environment.

Principles of Naming Conventions:

By combining the words of names in a specific way, standardized data component names are created.  The rules will vary for each organization, but the basic principles for developing rule sets are constant.

There are three kinds of rules that form a complete naming convention:

  • Semantic rules are based on the meaning of the words used to describe data components
  • Syntax rules prescribe the arrangement of words within a name
  • Lexical rules concern the language-related aspects of names

 I.     Semantic Rules:

These are rules based on the meaning of the words used to name data components.

  • Subjects: entity or subject terms are based on the names of data objects or subjects that are found in data models (entities) or object models (object classes).
  • Modifiers: can be that subject’s properties or qualifiers that are used interchangeably when naming data objects.
  • Class Words: describe the type of data that a column or attribute contains.  This is a classification of the type of data, or domain.

II.     Syntax Rules:

These rules specify the arrangement of name components. Examples of Syntax Rules are:

  • The subject or object term occupies the leftmost position in the name, unless it is used as a modifier to another subject.
  • Modifier terms follow the subject. The order of the qualifiers in a name is used to make the name complete and clear to the intended audience. Use subject, property and/or qualifier terms as needed.
  • For columns and attributes, the last term should be the class word at the rightmost position.

III.     Lexical Rules

These rules determine the standard look of names.

Examples of Lexical Rules are:

  • Nouns are used in singular form
  • Verbs are always in the present tense
  • No special characters are allowed
  • All words are separated by underscores
  • All words are in upper case
  • Only listed / approved abbreviations and acronyms are used
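The mechanical part of these lexical rules (upper case, underscores between words, no special characters) is easy to check automatically. Here is a minimal sketch in Python; the pattern and function name are my own, and the assumption that a name must start with a letter is mine rather than part of the rules above. Singular nouns, verb tense and approved abbreviations need a dictionary or a human reviewer, so they are not checked here.

```python
import re

# Upper-case letters and digits only, words joined by single underscores.
# Assumption (not from the rules above): the name must start with a letter.
NAME_PATTERN = re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*")


def is_valid_name(name):
    """Check only the mechanical lexical rules, nothing semantic."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in [
    "PURCHASE_ORDER_STATUS_CODE",
    "I_WIN_YOU_LOSE",
    "employee_number",
    "EMPLOYEE-NUMBER",
    "EMPLOYEE__NUMBER",
]:
    print(candidate, is_valid_name(candidate))
```

Note that I_WIN_YOU_LOSE passes: it is lexically fine and fails only the semantic rules, which is exactly why all three kinds of rules are needed.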

Industry Standards:

 I.     Definitions & Common Rules

Entity or Table

An entity is the representation of a distinguishable person, place, thing, concept, event or state that has characteristics, properties and relationships.  A table is a physical collection of data about a person, place, thing, concept, event or state.  A table may correspond with an entity.

Attribute or Column

A column or attribute contains a specific detail about an entity or table.  A column or attribute should not contain multiple values such as arrays or concatenated values.

Common Rules

  • The name of an entity / table or attribute / column should enable its audience to identify and locate it within its context. Therefore, each entity / table or attribute / column name must be unique within its context (an entity within its model, a table within a database schema, an attribute within an entity or a column within a table).
  • The name of an entity / table or attribute / column should be a declaration of the classification of the data it contains or will contain and therefore it should be a noun or noun phrase in the singular form and should follow a classification declarative format.
  • The name of an entity / table or attribute / column should enable designers, developers and business personnel to effectively know what is in it or what to place in it.  It should describe its content (what it is), rather than how it is used, processed, populated or decoded.

II.     Formats & Examples of Naming Conventions

Entity or table name should be a noun phrase constructed with the following format:

Subject Modifier

Attribute or column names should be a noun phrase constructed following the format: 

Subject Modifier Class

Subject indicates the class of information that the entity or table describes; it provides the proper naming context for the modifier.  Subjects are nouns that name things.

Subjects may be composed of several terms or words.

Examples: Employee, Purchase Order, Item etc.

Modifier is an optional component of the entity or table name that further qualifies the name.  The modifier consists of one or more properties and/or qualifiers.

Examples: Project Installment, Employee Contact etc.

Class or class word classifies the type of information being represented by the column or attribute.

Examples: Employee Number, GL Account Number, Purchase Order Status Code
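Putting the Subject Modifier Class format together with the lexical rules (upper case, underscores), a name can be assembled mechanically. The helper below is a hypothetical sketch in Python, and the way I split each example into subject, modifier and class word is my own reading of the examples above.

```python
def build_name(subject, modifier="", class_word=""):
    """Join Subject, optional Modifier and optional Class word into one
    upper-case, underscore-separated name (illustrative helper)."""
    if not subject.strip():
        raise ValueError("a subject is required")
    words = []
    for term in (subject, modifier, class_word):
        words.extend(term.split())  # multi-word terms become several words
    return "_".join(word.upper() for word in words)


# Table name: Subject Modifier
table_name = build_name("Employee", "Contact")

# Column names: Subject Modifier Class (the modifier is optional)
column_names = [
    build_name("Employee", class_word="Number"),
    build_name("GL Account", class_word="Number"),
    build_name("Purchase Order", "Status", "Code"),
]

print(table_name)    # EMPLOYEE_CONTACT
print(column_names)  # ['EMPLOYEE_NUMBER', 'GL_ACCOUNT_NUMBER', 'PURCHASE_ORDER_STATUS_CODE']
```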

All of this might sound boring to all the Shakespeare types out there, but I know for sure that all the Data Architects would agree & appreciate these naming conventions!

After all, this makes our lives easier & more organized, and our data interchangeable. I will follow this up with a post on data types and data quality rules that change data into information: how to populate missing values, how to check ranges of values for each field, and how to test data quality on the dimensions of completeness, accuracy, consistency, timeliness, etc.



Segmentation: An Important Marketing Strategy

In the current hyper-competitive business environment, understanding customers’ changing tastes and purchasing behavior is extremely important! Companies that do not track changing consumer wants and needs have clearly seen a decline in their fortunes. KODAK is just one example, but a very good one, to underline this point. Its failure to understand and predict consumers’ shift towards more technologically advanced digital cameras led to the bankruptcy and demise of the company.

Most companies today make a tremendous effort to track consumer behavior. Meticulous recording and utilization of consumers’ digital body language serves as the keystone of the process. The advent of the internet and mobile technologies has changed consumer behavior: consumers search for things on the net, educate themselves on the pros and cons of a product, compare prices and evaluate vendors for timeliness and customer service. They actively seek coupons and collect feedback from friends on social networks.

In doing so, the consumer leaves a trail of breadcrumbs all over the internet. Astute businesses collect, analyze and utilize the digital body language of the customer to improve their marketing activities. There are several ways of collecting and analyzing customer data: web crawling, browser-based JavaScript, server log mining, and geolocation tracking in malls and stores. Most companies use a data warehouse to store the data and analyze it to find customers’ purchasing behavior patterns. Effective utilization of this information can lead to revenue-generating promotional activities for the company.

All promotional marketing activities are expensive since the response rates are very low. You cannot send a letter or catalog to every customer in the country, as it will be cost prohibitive. To make these promotional activities cost effective, it is very important to segment the customers, so that the promotional plans can be directed towards selected and potentially profitable customers.

Segmentation is a process by which a large customer base is divided into small subsets of customers, having common needs and priorities. There are several conventional methods of creating segments such as: Geographical segmentation, Demographic Segmentation, Behavioral segmentation etc. However, more sophisticated ways, currently utilized in the market include various statistical techniques such as K-means clustering, hierarchical clustering etc. An interesting and upcoming technique of segmentation is “Micro segmentation”. It is utilized to understand individual customer behavior and to make personalized marketing offers.
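To give a feel for what K-means clustering does, here is a bare-bones sketch in plain Python. It is only an illustration under simplifying assumptions: the customer data is made up, the starting centroids are picked by hand so the result is repeatable, and the features are not rescaled. In practice you would normally use a statistics package and scale the features first.

```python
def kmeans(points, centroids, max_iterations=100):
    """Plain k-means on tuples of numbers, starting from the given centroids.
    Returns (assignments, centroids)."""
    centroids = [tuple(c) for c in centroids]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(
                range(len(centroids)),
                key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
            )
            for point in points
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the clustering has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its points.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids


# Hypothetical customers: (orders per year, average order value)
customers = [(1, 20.0), (2, 25.0), (1, 30.0), (10, 200.0), (12, 220.0), (11, 210.0)]

assignments, centroids = kmeans(customers, centroids=[customers[0], customers[3]])
print(assignments)
print(centroids)
```

The occasional low-value buyers and the frequent high-value buyers end up in separate segments, which is the kind of grouping a marketer would then name and target.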

When using these methods, companies should consider the ideal characteristics of the segments such as:

  • Each segment should be measurable and profitable.
  • Each segment should be stable over time.
  • Every consumer in the segment should be easily reachable.

We will focus our discussion on the segmentation method commonly utilized by most B2C companies. In the retail industry, companies focus primarily on customers’ transactional behavior to create segments. Recency, Frequency and Monetary value (RFM) are the main indicators used for creating consumer segments. Some marketers use these indicators individually, while others combine various segmentation techniques to implement their marketing plan.

Traditionally, more recently active customers are considered the more important customers. Some marketers divide customers according to the recency of their purchase, e.g. customers who made a purchase during the last 12 months, 13-24 months ago and 25+ months ago. Marketers also use predictive modeling techniques such as linear regression and logistic regression to score their consumer base. Finally, consumers are segmented by geography as well, such as by zip code or household. These groups are then combined to make several mini segments. The resulting segments are considered potential consumers and are targeted by the company’s promotional efforts.
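The recency split described above is simple to express in code. The sketch below uses Python; the customer IDs, dates and scoring date are invented, and the exact bucket boundaries (counting whole months elapsed since the last purchase) are one reasonable reading of "last 12 months, 13-24 months, 25+ months", not a standard.

```python
from datetime import date


def months_since(last_purchase, today):
    """Whole calendar months elapsed between last_purchase and today."""
    months = (today.year - last_purchase.year) * 12 + (today.month - last_purchase.month)
    if today.day < last_purchase.day:
        months -= 1  # the current month is not complete yet
    return months


def recency_segment(last_purchase, today):
    """Assign one of the three recency buckets (boundaries are an assumption)."""
    months = months_since(last_purchase, today)
    if months < 12:
        return "0-12 months"
    if months < 24:
        return "13-24 months"
    return "25+ months"


today = date(2014, 6, 1)  # hypothetical scoring date
last_purchases = {
    "C001": date(2014, 1, 15),
    "C002": date(2013, 3, 10),
    "C003": date(2011, 12, 25),
}

segments = {cid: recency_segment(d, today) for cid, d in last_purchases.items()}
print(segments)
```

The same pattern extends to frequency and monetary value: compute the indicator per customer, bucket it, and combine the buckets into mini segments.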

The performance of these segmentation and promotional strategies is measured for each segment against metrics such as:

  • total revenue generated,
  • average revenue per customer,
  • cost of the promotional activity.
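A per-segment report on these three metrics can be sketched as follows in Python. The orders, segment labels and promotion costs are made up for illustration, and "average revenue per customer" is taken here to mean revenue divided by the number of distinct buying customers in the segment, which is an assumption on my part.

```python
from collections import defaultdict


def segment_report(orders, promo_cost_by_segment):
    """Total revenue, average revenue per buying customer and promotion cost
    for each segment (illustrative helper)."""
    revenue = defaultdict(float)
    customers = defaultdict(set)
    for order in orders:
        revenue[order["segment"]] += order["amount"]
        customers[order["segment"]].add(order["customer_id"])

    report = {}
    for segment, cost in promo_cost_by_segment.items():
        buyers = len(customers[segment])
        total = revenue[segment]
        report[segment] = {
            "total_revenue": total,
            "avg_revenue_per_customer": total / buyers if buyers else 0.0,
            "promo_cost": cost,
        }
    return report


# Hypothetical orders that followed a promotion, tagged with the customer's segment.
orders = [
    {"customer_id": "C001", "segment": "0-12 months", "amount": 100.0},
    {"customer_id": "C001", "segment": "0-12 months", "amount": 50.0},
    {"customer_id": "C004", "segment": "0-12 months", "amount": 150.0},
    {"customer_id": "C002", "segment": "13-24 months", "amount": 80.0},
]
promo_costs = {"0-12 months": 40.0, "13-24 months": 40.0, "25+ months": 40.0}

report = segment_report(orders, promo_costs)
for segment, metrics in report.items():
    print(segment, metrics)
```

A segment that costs money to promote but returns no revenue, like the third one here, is the kind of signal that feeds back into the next round of segmentation.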

This report is then used as feedback to continually improve the marketing strategy. Segments evolve over time depending on performance feedback, more recent trends, and experimental testing (A/B testing).

It is worth considering those consumer segments that are not current and active buyers; they may represent potential defectors to competitors. Knowledge of their purchasing behavior and transactional data can be very helpful to target and attract them through selective marketing strategies.

Hence, marketing strategies built on selective customer segmentation can be a win-win for both the company and the consumer: the company can promote the right offer to the right consumer at the right time, and the consumer may eventually respond to the offer by making a purchase, without getting bombarded with junk mail.

Express Analytics
8001 Irvine Center Drive Suite 400
Irvine, CA 92618