Traditional Marketing Research

Traditional marketing research relies primarily on small surveys of individual consumers sampled from a national universe, and its use is generally restricted to more or less strategic tasks such as:

identifying the characteristics of consumers most likely to purchase the goods or services offered,

describing the most important consumer needs which the product or service must satisfy, and

in the light of the above information, specifying the most effective of alternate messages which communicate product benefits to these consumers and encourage them to buy.

However, it is impossible to obtain, for each individual consumer in the nation, information equivalent to the responses to product-oriented and attitudinal questions on a traditional marketing sample survey.  Yet such information would be necessary to apply survey findings directly in tactical marketing actions and, consequently, to evaluate the results of such actions.

While traditional research can tell the marketer who the highest potential buyer is in general, it cannot directly specify where to find such buyers out of the many millions scattered over the nation.

Geodemographic Marketing Research

Geodemographic marketing research reaches beyond traditional research to adapt and utilize the extensive and detailed data resources which describe thousands of characteristics of consumers aggregated into summary statistics for all of the neighborhoods or small communities in which they reside across the United States.

Geodemographic context areas include small postal delivery zones such as five-digit and plus-four ZIP Codes or Carrier Routes as well as small census geounits such as tracts, block-groups or blocks.  The United States is "tiled" by literally millions of these neighborhood-sized geounits.

The geodemographic approach not only supplies a good deal of information not collected by traditional market surveys, but also supports tactical efforts to sell products and services to consumers by:

sizing the segments of consumers with a high likelihood of buying in some or all small areas across the nation in order to determine share of total market and to identify growth opportunities by contrasting actual to potential sales,

locating concentrations of high potential prospects or "qualified" consumers geographically to optimize placement of retail outlets and to focus print, broadcast and other advertising efforts on the most responsive audiences, and

selecting high potential prospective consumers for direct promotion by mailing or contacting only those who reside in the most promising small area geographic contexts.

Supplementing Traditional with Geodemographic Marketing Research

Traditional market surveys do, however, gather data about consumers' attitudes, values, temperaments, media exposure, and product purchases and preferences which are not directly observed in the "secondary" data describing small area contexts (such as the U. S. Census of Population and Housing).  Such a sample may nevertheless be linked to the geodemographic database by simply using the address of each survey respondent to add a code to that respondent's record specifying the small geographic area containing his/her address ("geocoding").  Data qualifying or describing the respondent's neighborhood ("proxy variables") can then be appended to his/her record, and models built which estimate the incidence of specific survey items through their statistical relationship to small area context items.
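The geocode-and-append step can be sketched in a few lines. In this minimal illustration, all record contents, block-group codes and context values are invented; each survey record is assumed to carry the census block-group code already resolved from the respondent's address:

```python
# Sketch of linking geocoded survey respondents to a geodemographic
# database. All identifiers, codes and values below are hypothetical.

# Survey respondents, already "geocoded": each address has been
# resolved to the census block-group containing it.
respondents = [
    {"id": 1, "buys_product": 1, "block_group": "360610101001"},
    {"id": 2, "buys_product": 0, "block_group": "360610102002"},
]

# Small-area context data ("proxy variables") keyed by block-group.
context = {
    "360610101001": {"pct_age_65_plus": 22.5, "median_income": 61000},
    "360610102002": {"pct_age_65_plus": 8.1,  "median_income": 43000},
}

# Append each respondent's neighborhood description to the survey
# record, producing the merged file on which a model is estimated.
for r in respondents:
    r.update(context[r["block_group"]])

print(respondents[0]["median_income"])
```

The merged records then carry both the survey responses (potential dependent variables) and the small-area context measures (independent variables).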

Thus, promotions can avail themselves of the best of both worlds, using the guidance of survey data to create the best selling messages for given segments of consumers and using the results of geodemographic models to locate those segments geographically in order to concentrate marketing efforts, save money and increase sales overall.

In the absence of survey data, however, geodemographic research can still directly solve the problem of targeting high potential prospects by aggregating into geounit totals sales data from geocoded in-house customer files or records of mail promotions and responses.  After linking such summaries to the geodemographic database, models are easily built which can estimate potential sales or response for every small geounit in the United States and rank order them all on their value as market targets.

Geodemographic Targeting Models

Geodemographic marketing research is performed by building targeting models which use census and other aggregate data at small area levels to help estimate local sales potential and rank order selling opportunities.

There are two classes of geodemographic targeting models:

1)  generic models - models which observe the differences in sales or survey data when tabulated by a system of area classifications based on the primary socioeconomic variables underlying human settlement patterns. Such "cluster" models are strongly applicable to a wide variety of product and service categories whose consumption is predictable by socioeconomic factors.

2)  custom models - models which solve the problem of estimating sales as a function of an optimum mixture of demographic, economic and cultural small area context measures determined in a dedicated manner, that is, by building a formulation especially for a particular category of goods or services or for specific segments of the consuming universe which are not well described by standard data sources.  They are necessary when the simple socioeconomic variables determining the structure of an available generic model do not adequately predict the sales variable in question.

Generic Geodemographic Models

A generic geodemographic targeting model like Claritas Corporation's PRIZM (first developed in 1974 by Claritas' founder, Jonathan Robbin) classifies neighborhoods into clusters or types which contain places most alike over a large number of independent residential geodemographic characteristics, such as area measures of socioeconomic status, urbanization or predominant family life cycles and ethnic groups.  PRIZM is described in a book by Michael J. Weiss, The Clustering of America, published by Harper and Row in 1988.

When sales or survey data with address information on individual records are profiled (tabulated) by such clusters, significantly different levels of purchase or propensity to purchase are inevitably observed in specific clusters.  A sales potential index reflecting these differences can be developed for each cluster and selling efforts then concentrated by targeting prospects in geographic areas identified as belonging to the highest ranking clusters.
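A common way to express these cluster differences is an index comparing each cluster's penetration to the overall base rate, with 100 representing average propensity. A minimal sketch, with invented cluster names and counts:

```python
# Sketch of a cluster sales potential index. Cluster names, buyer
# counts and prospect universes below are hypothetical.
profile = {
    "Urban Elite":    {"buyers": 900, "prospects": 10000},
    "Rural Families": {"buyers": 300, "prospects": 12000},
    "Young Singles":  {"buyers": 500, "prospects": 5000},
}

total_buyers = sum(c["buyers"] for c in profile.values())
total_prospects = sum(c["prospects"] for c in profile.values())
base_rate = total_buyers / total_prospects  # overall penetration

# Index of 100 = average propensity; 150 = 50% above average, etc.
index = {name: round(100 * (c["buyers"] / c["prospects"]) / base_rate)
         for name, c in profile.items()}

# Rank clusters for targeting, best first.
ranked = sorted(index, key=index.get, reverse=True)
print(ranked[0], index[ranked[0]])
```

Selling efforts would then be concentrated on geounits belonging to the clusters at the top of the ranking.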

The generality of the system also allows linkage to other cluster-profiled data sources such as surveys of readers of newspapers and magazines or audiences of broadcast media.  This linkage allows targeting of the most appropriate advertising vehicles.  In addition, selections by cluster may be made from many lists of mail order purchasers or other promotional resources which contain postal codes or have been "geocoded" (census geographic area codes added to the address field).  Inexpensive large compiled lists with historical returns below margin are often made to perform acceptably through cluster selection.

Custom Geodemographic Models

The purpose of a custom geodemographic marketing model is to formulate a function of observed geounit data (made up of "independent variables") which estimates aggregate consumer behavior (called the "dependent variable") more accurately than generic models for a given consuming or sales target.

Examples of individual consumers are:  single mail order households, retail customers, or specific business establishments.

Examples of aggregate consuming targets are summary characteristics of individual consumers collected into small area units such as: census tracts and block-groups, five and nine digit ZIP Codes, carrier routes, retail store or shopping center trading areas, or business to business sales territories.

The dependent variables observed for individual consuming units are derived from customer accounts, store records, mail campaign promotion and response databases and other unit record files.  Address information on these records is used to summarize them into small area totals for the purpose of expressing sales data as proportions of their small area universes (percent penetration).  Such data usually show the percent of sales prospects in the small area who have purchased goods or services by the amount, frequency and recency of purchase, often qualified by the advertising source or "key" (mailing list or "select", date of issue of print media, broadcast date and day part, specific offer and "package" or ad format used).
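The roll-up from geocoded unit records to small-area percent penetration is straightforward to sketch. In this hypothetical example, customer records carry only a ZIP Code, and the prospect universe per ZIP Code is assumed known:

```python
# Sketch of summarizing geocoded customer records into small-area
# percent penetration. Records and universe counts are hypothetical.
from collections import Counter

# One record per customer, already geocoded to a ZIP Code.
customers = ["10001", "10001", "10002", "10001", "10003"]

# Universe of sales prospects (e.g., households) per ZIP Code.
universe = {"10001": 400, "10002": 250, "10003": 500}

counts = Counter(customers)
penetration = {zip_code: 100 * counts.get(zip_code, 0) / n
               for zip_code, n in universe.items()}
print(penetration)
```

The resulting penetration figures become the dependent variable observations, one per geounit, in an aggregate custom model.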

The independent variables in custom models consist of "context" data, or data which show aggregate characteristics of small areas, such as: population and housing census data, business patterns, or other economic, cultural demographic or geophysical summaries describing the area containing a unit record.

Examples of independent variables:

% of persons over age 65 and not in the labor force with unearned incomes > $75,000,

% persons employed in bituminous coal mining establishments,

Average annual sunshine,

% unmarried males under age 25 who are residents of multiple unit dwellings,

Cost of living relative to other places,

% of married females in labor force who are unemployed,

% of households that are two-earner families without children with combined income over $150,000,

% registered voters who voted for the Republican Presidential Candidate,

Infant Mortality Rate,

% of households subscribing to the Atlantic Monthly,

Pounds of nitrates deposited per acre,

Average consumer expenditures on children's shoes,

% of furniture and home furnishings stores with sales over $1,000,000 per year.

Custom models may require special tabulations of census public use microdata samples or Current Population Survey respondents in order to create highly specific independent variables such as some of those illustrated above.  Generic models traditionally have relied completely on census summary files that supply standard cross-tabulations of greatly limited detail.  To escape these restrictions, however, special tabulations may be performed by systems such as PDQ Explore (Public Data Queries, Inc.) for relatively large areas such as PUMAs (Public Use Microdata Areas).  Then, estimates may be made of these counts for small areas such as block-groups or ZIP Codes using statistical techniques such as iterative proportional fitting which preserve both latent correlations and marginal totals.  This methodology can be employed to provide updates to the current period as well as the additional predictive power of highly specified variables.
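Iterative proportional fitting itself is simple to state: alternately rescale the rows and columns of a seed cross-tabulation (for example, one observed for a whole PUMA) until both sets of margins match the known small-area totals, preserving the seed's interaction structure. A toy sketch with invented counts:

```python
# Sketch of iterative proportional fitting (IPF): adjust a seed
# cross-tabulation to match known row and column margins while
# preserving the seed's internal correlation structure.
def ipf(seed, row_targets, col_targets, iters=100):
    table = [row[:] for row in seed]
    for _ in range(iters):
        # Scale each row to match its target total.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            table[i] = [x * target / s for x in table[i]]
        # Scale each column to match its target total.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / s
    return table

# Seed: an age-by-income table observed for a large area (invented).
seed = [[40.0, 30.0], [20.0, 60.0]]
# Known small-area margins: 40/30 by age group, 55/15 by income.
fitted = ipf(seed, row_targets=[40, 30], col_targets=[55, 15])

print([[round(x, 1) for x in row] for row in fitted])
```

The fitted table reproduces the small-area margins exactly while keeping the seed's odds-ratio structure, which is why the technique "preserves both latent correlations and marginal totals."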

Custom models strive to "specify" or include as many of these independent variables as possible to provide a full and accurate prediction of the dependent variable without "overfitting."  Factor analysis may be used to isolate a few independent and meaningful dimensions latent in thousands of redundant data items.

Sample survey or poll questionnaires with respondent address information attached provide excellent sets of data for identifying benefit segments which more accurately qualify the dependent variable (propensity to buy) than unqualified aggregate sales statistics.  Surveys can also specify non-standard independent variables which can be estimated as described above for small areas.  Both fixed-content and non-standard independent context or aggregate variables can be appended to survey records as custom model inputs serving to predict segmented consumers' predisposition to buy.  If unbiased sampling procedures were used in collecting the surveys, such custom models can be accurately evaluated at small area levels, and directly used to make targeting decisions.  This approach is particularly valuable for use in launching new or previously unpromoted products and services.

In selecting the most powerful predictors of the dependent variable, statistical tests are run which qualify each independent variable available in terms of its explanatory power and relative usefulness, together with all other independent variables, in estimating the dependent variable.
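A first-pass version of such screening is simply to rank candidate predictors by the strength of their correlation with the dependent variable (fuller procedures also test each variable's marginal contribution alongside the others). A sketch with invented small-area figures:

```python
# Sketch of screening candidate predictors: rank each independent
# variable by the absolute strength of its correlation with the
# dependent variable. All observations below are hypothetical.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

penetration = [1.2, 3.4, 2.1, 5.6, 4.3]          # dependent variable
candidates = {
    "pct_two_earner_hh":  [10, 30, 18, 52, 40],
    "avg_sunshine_hours": [2600, 2500, 2700, 2550, 2650],
}

ranked = sorted(candidates,
                key=lambda v: abs(pearson(candidates[v], penetration)),
                reverse=True)
print(ranked[0])
```

In practice the screening also guards against redundancy among predictors, since two highly correlated candidates add little explanatory power together.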

These data are further examined by techniques such as hierarchical cluster analysis to identify subsets which may have different combinations of effective predictors.  If such are found, then sub-models are computed to accommodate them.  Often this approach is used to account for important local or regional attributes not seen by generic models.

Analyses are also performed which isolate exceptional values in both dependent and independent variables ("outliers") and these are trimmed or adjusted.  Missing values of potentially important variables, if there are not too many of them, are filled in or "imputed".

The "model" is literally an equation which describes the functional relationship of the dependent to the independent variables. In statistical terms, the model is "fitted" to the observed data.  The independent variables are weighted by a variety of techniques such that they produce a predicted or estimated dependent variable that is minimally different from the actual data.  These techniques are chosen to accommodate different types of observations (interval, nominal, ordinal). The weights (coefficients, constants or other terms) can be applied in different ways to various forms of the independent variables such as sums (simple linear regression), products, quotients, powers, log odds (negative logistic), or other transcendental and nonlinear functions. These weights are calculated such that the model error, or some function of the difference between the actual and the predicted dependent variable, variously termed the "loss" or residual value, is somehow minimized over the whole range of observations. The "loss function" can be the squared value of the residual (least squares) or its absolute value, a non-linear residual function or can maximize some other useful value such as a log-likelihood or maximum likelihood statistic.
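In the simplest case (one independent variable, a linear form, and a squared-error loss) the fit has a closed-form least-squares solution. The sketch below, with invented figures, makes the "loss" concrete as the sum of squared residuals:

```python
# Sketch of "fitting" a model: one independent variable, a linear
# form y = a + b*x, and weights chosen by the closed-form
# least-squares solution. All data below are hypothetical.
def fit_least_squares(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b                       # intercept and slope weights

xs = [1.0, 2.0, 3.0, 4.0]             # context measure per geounit
ys = [2.1, 3.9, 6.1, 7.9]             # observed penetration per geounit
a, b = fit_least_squares(xs, ys)

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
loss = sum(r * r for r in residuals)  # the squared-error "loss"
print(round(b, 2), round(loss, 3))
```

Richer model forms (products, powers, log odds) change the function being fitted and the loss being minimized, but the logic is the same: choose the weights that make the predicted dependent variable as close as possible to the actual data.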

The art of the custom modeler lies in formulating the model appropriately, specifying the independent variables meaningfully and completely and guarding against spurious results which can arise from correlated independent variables (multi-collinearity), sparse or abnormally distributed data which are not adequately fitted by the form of model chosen, underspecification (too few or irrelevant independent variables), overspecification (too many or idiosyncratic independent variables), prior sampling biases, over-aggregation and ecological fallacy. The process of modeling always includes a final check on the reproducibility of the results, estimating the dependent variable with data which were not used in developing the model (withheld) or with random halves of the original universe ("split-half" techniques).
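The reproducibility check can be sketched the same way: fit on a random half of the observations and measure the error on the withheld half. The data below are invented, a linear trend plus small bounded noise:

```python
# Sketch of a "split-half" reproducibility check: fit a simple linear
# model on a random half of the geounit observations and evaluate the
# mean squared error on the withheld half. Data are hypothetical.
import random

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b             # intercept, slope

# Invented observations: y = 0.5 + 2x plus small alternating noise.
data = [(x, 0.5 + 2.0 * x + 0.1 * ((-1) ** x)) for x in range(20)]
random.seed(0)
random.shuffle(data)
train, holdout = data[:10], data[10:]

a, b = fit_line([x for x, _ in train], [y for _, y in train])
mse = sum((y - (a + b * x)) ** 2 for x, y in holdout) / len(holdout)
print(round(mse, 3))
```

A model whose holdout error is close to its training error "holds up" in the sense described above; a large gap signals overspecification or spurious fit.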

The benefit of custom modeling lies in its high accuracy for a special application, segmentation criterion, vertical industry, new product, or ill-conditioned dependent variable.  Fundamentally, custom geodemographic modeling is a refinement of generic modeling without which difficulties or subtleties in the data would render the results weak or meaningless.

Well designed and executed custom models have proved in practice to produce results which conform well to the three essential requirements of good observational social science: they are valid (they predict the phenomenon they purport to predict); they are reliable (they are free from distortion by technical problems and biases); and they are reproducible (they hold up in repeated application over time, with any sample taken of the universe and with the same independent variables drawn from new or different universes).

A drawback in custom modeling is that the outcome may require additional expense to be linked to other data sources for media planning or similar applications.  A custom model also requires a high level of expertise to build and may be a more time consuming and expensive project than a "canned" generic model, although in some instances, the custom approach is "the only game in town."  In general, custom models explain 20% to 50% more of the variance in sales potential than generic models, i.e., are 20% to 50% more accurate predictors of sales and, hence, 20% to 50% more efficient and  cost-effective optimizers of marketing actions.
