Ancient Analytics

Our Chapter, along with our partners the Virginia State Police and national ACFE, will be hosting a two-day seminar starting April 8th entitled ‘Hands on Analytics – Using Data Analytics to Identify Fraud’ at the VASP Training Academy here in Richmond, Virginia.  Our presenter will be one of the ACFE’s best, the renowned fraud examiner Bethmara Kessler, Chief Audit Officer of the Campbell Soup Company.  The science of analytics has come a long way in its evolution into the effective tool we all know and make such good use of today.  I can remember being hired in the early 1970’s, fresh out of graduate school at the University of Chicago, by a Virginia bank (long since vanished into the mists of time) to do market and operations research.

The bank had just started accumulating operational and branch-related data for use with a fairly primitive IBM mainframe relational database; simple as that first application was, it was like a new day had dawned!  The bank’s holding company was expanding rapidly, buying up correspondent banks all over the state so, as you can imagine, we were hungry for all sorts of actionable market and financial information.  In those early days, in the almost total absence of any real information, when data about the holding company was first being accumulated and some initial reports run, it felt like lighting a candle in a dark room!  At first blush, the information seemed very useful, and numbers-savvy people pored over the reports, identifying how some of the quantities (variables) in the reports varied in relation to others.  As we all know now, based on wider and more informed experience, sometimes there’s a direct correlation between fields and sometimes the correlation is only implied.  When our marketing and financial analysts began to see these correlations, relating the numbers to their own experience in branch bank location and in lending risk management, for example, it was natural for them to write up some rules to manage vulnerable areas like branch operations and fraud risk.  With regard to fraud control, the data-based rules worked great for a while, but since they were only rules, fraudsters quickly proved surprisingly effective at figuring out exactly what sort of fraud each rule was designed to stop.  If the rule cutoff was $300 for a cash withdrawal, we found that fraudsters soon experimented with various amounts and determined that withdrawing $280 was a safe option.  The bank’s experts saw this and started designing rules to prevent a whole range of specific scenarios, but it quickly became a losing game for the bank since the fraudsters only got craftier and craftier.
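For readers who like to see the idea in code, here is a minimal sketch of why a fixed cutoff rule is so easy to sidestep.  The $300 cutoff and the $280 withdrawals come from the story above; the function name, the list of withdrawals and the choice to flag amounts at or above the cutoff are purely illustrative assumptions, not the bank’s actual rule.

```python
# Hypothetical fixed-threshold rule; only the $300 figure comes from the text.
CASH_WITHDRAWAL_CUTOFF = 300.00


def rule_flags(amount, cutoff=CASH_WITHDRAWAL_CUTOFF):
    """Return True when a cash withdrawal meets or exceeds the rule cutoff."""
    return amount >= cutoff


# Illustrative withdrawals: two at or over the cutoff, three placed just under it.
withdrawals = [320.00, 300.00, 280.00, 280.00, 280.00]

# The rule only ever sees the transactions that cross the line.
flagged = [amount for amount in withdrawals if rule_flags(amount)]

# Everything kept just below the cutoff passes untouched.
missed_total = sum(amount for amount in withdrawals if not rule_flags(amount))

print("Flagged withdrawals:", flagged)
print("Total passing under the cutoff:", missed_total)
```

Lowering the cutoff only moves the line the fraudster has to stay under, which is the losing game described above.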

As database software became more adept at handling larger amounts of data, enough data could be analyzed to begin to identify persistent patterns, and linear regression models were put forward to address this incessant back-and-forth of rule definition and fraudster response.  The linear regression model assumes that the relationships between the predictors used in the model and the fraud target are linear, and so the algorithm tries to fit a linear model and detect fraud by identifying outliers from the basic fit of the regression line.  The regression models proved better than the rule-based approach since they could systematically look at all the bank’s credit card data, for instance, and so could draw more effective conclusions about what was actually going on than the rules ever could.
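The outlier idea can be sketched in a few lines.  This is only an illustration under stated assumptions: it fits an ordinary least-squares line with a single predictor and flags any point whose residual is more than two standard deviations from the line.  The account figures, the function names and the two-standard-deviation cutoff are all made up for the example and are not drawn from the bank’s models.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def flag_outliers(xs, ys, z_cutoff=2.0):
    """Return indexes of points whose residual exceeds z_cutoff standard deviations."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    if spread == 0:
        return []  # every point sits exactly on the line
    return [i for i, r in enumerate(residuals) if abs(r) > z_cutoff * spread]


# Illustrative data: monthly card transactions and monthly spend for ten accounts.
# Nine accounts spend about $50 per transaction; one spends far more than the line predicts.
counts = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
spend = [500, 600, 700, 800, 900, 1000, 2100, 1200, 1300, 1400]

suspicious = flag_outliers(counts, spend)
print("Accounts to review (by position in the list):", suspicious)
```

Unlike a fixed cutoff, every account is scored against the same fitted line, so there is no single published number to stay under.  A flagged account is only a candidate for review, not proof of fraud.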

As we at the bank found in the early days of attempted analytics-based fraud detection, when operating managers get together and devise fraud identification rules, they generally do only slightly better than random chance in identifying cases of actual fraud; this is because, no matter how good and well formulated the rules are, they can’t cover the entire universe of possible transactions.  We can only give anti-fraud coverage to the portion of transactions addressed by the rules.  When the bank built a linear model that compared past experience with present experience, the analysis had the advantage of covering the entire set of transactions and classifying each of them as either fraudulent or good.  Fraud identification improved to considerably better than chance.

It’s emerged over the years that a big drawback of using linear regression models to identify fraud is that, although there are many cases in which the underlying risk is truly linear, there are more in which it’s non-linear, and in which both the target (fraud) and the independent variables are non-continuous.  While there are many problems where a 90% solution is good enough, fraud is not one of them.  This is where non-linear techniques, such as the neural networks Bethmara Kessler will be discussing, come in.  Neural networks were originally developed to model the functioning of the brain; their statistical properties also make them an excellent fit for addressing many risk-related problems.

As our April seminar will demonstrate, there are generally two lines of thought regarding the building of models to perform fraud analytics.  One is that techniques don’t matter that much; what matters is the data itself, and how much of it, in how much variety, the fraud analyst can get; the more data, the better the analysis.  The other line of thought holds that, while more data is always good, techniques do matter.  There are many well-documented fraud investigation situations in which improving the sophistication of the techniques has yielded truly amazing results.

All of these issues and more will be covered in our Chapter’s April seminar.  I hope all of you can join us!
