Tag Archives: analytical techniques

Sock Puppets

The issue of falsely claimed identity in all its myriad forms has shadowed the Internet since the beginning of the medium.  Anyone who has used an on-line dating or auction site is all too familiar with the problem; anyone can claim to be anyone.  Likewise, confidence games, on or off-line, involve a range of fraudulent conduct committed by professional con artists against unsuspecting victims. The victims can be organizations, but more commonly are individuals. Con artists have classically acted alone, but now, especially on the Internet, they usually group together in criminal organizations for increasingly complex criminal endeavors. Con artists are skilled marketers who can develop effective marketing strategies, which include a target audience and an appropriate marketing plan: crafting promotions, product, price, and place to lure their victims. Victimization is achieved when this marketing strategy is successful. And falsely claimed identities are always an integral component of such schemes, especially those carried out on-line.

Such marketing strategies generally involve a specific target market, which is usually made up of affinity groups consisting of individuals grouped around an objective, bond, or association like Facebook or LinkedIn Group users. Affinity groups may, therefore, include those associated through age, gender, religion, social status, geographic location, business or industry, hobbies or activities, or professional status. Perpetrators gain their victims’ trust by affiliating themselves with these groups.  Historically, various mediums of communication have been initially used to lure the victim. In most cases, today’s fraudulent schemes begin with an offer or invitation to connect through the Internet or social network, but the invitation can come by mail, telephone, newspapers and magazines, television, radio, or door-to-door channels.

Once the mark receives and accepts the offer to connect, some sort of response or acceptance is requested. The response will typically include (in the case of Facebook or LinkedIn) clicking on a link included in a fraudulent follow-up post to visit a specified web site or to call a toll-free number.

According to one of Facebook’s own annual reports, up to 11.2 percent of its accounts are fake. Considering the world’s largest social media company has 1.3 billion users, that means up to 140 million Facebook accounts are fraudulent; these users simply don’t exist. With 140 million inhabitants, the fake population of Facebook would be the tenth-largest country in the world. Just as Nielsen ratings on television sets determine different advertising rates for one television program versus another, on-line ad sales are determined by how many eyeballs a Web site or social media service can command.

Let’s say a shyster wants 3,000 followers on Twitter to boost the credibility of her scheme. They can be hers for $5. Let’s say she wants 10,000 satisfied customers on Facebook for the same reason. No problem; she can buy them on several websites for around $1,500. A million new friends on Instagram can be had for only $3,700. Whether the con artist wants favorites, likes, retweets, up votes, or page views, all are for sale on Web sites like Swenzy, Fiverr, and Craigslist. These fraudulent social media accounts can then be freely used to falsely endorse a product, service, or company, all for just a small fee. Most of the work of fake account setup is carried out in the developing world, in places such as India and Bangladesh, where actual humans may control the accounts. In other locales, such as Russia, Ukraine, and Romania, the entire process has been scripted by computer bots, programs that will carry out pre-encoded automated instructions, such as “click the Like button,” repeatedly, each time using a different fake persona.

Just as horror movie shape-shifters can physically transform themselves from one being into another, these modern screen shifters have their own magical powers, and organizations of men are eager to employ them, studying their techniques and deploying them against easy marks for massive profit. In fact, many of these clicks are done for the purposes of “click fraud.” Businesses pay companies such as Facebook and Google every time a potential customer clicks on one of the ubiquitous banner ads or links online, but organized crime groups have figured out how to game the system to drive profits their way via so-called ad networks, which capitalize on all those extra clicks.

Painfully aware of this, social media companies have attempted to cut back on the number of fake profiles. As a result, thousands and thousands of identities have disappeared overnight from among the followers of many well-known celebrities and popular websites. If Facebook has 140 million fake profiles, there is no way they could have been created manually one by one. The process of creation is called sock puppetry, a reference to the children’s toy puppet created when a hand is inserted into a sock to bring the sock to life. In the online world, organized crime groups create sock puppets by combining computer scripting, web automation, and social networks to create legions of online personas. This can be done easily and cheaply enough to allow those with deceptive intentions to create hundreds of thousands of fake online citizens. One only needs to consult a readily available on-line directory of the most common names in any country or region. Have a scripted bot merely pick a first name and a last name, then choose a date of birth and let the bot sign up for a free e-mail account. Next, scrape on-line photo sites such as Picasa, Instagram, Facebook, Google, and Flickr to choose an age-appropriate image to represent your new sock puppet.

Armed with an e-mail address, name, date of birth, and photograph, you sign up your fake persona for an account on Facebook, LinkedIn, Twitter, or Instagram. As a last step, you teach your puppets how to talk by scripting them to reach out and send friend requests, repost other people’s tweets, and randomly like things they see online. Your bots can even communicate and cross-post with one another. Before the fraudster knows it, s/he has thousands of sock puppets at his or her disposal for use as s/he sees fit. It is these armies of sock puppets that criminals use as key constituents in their phishing attacks, to fake on-line reviews, to trick users into downloading spyware, and to commit a wide variety of financial frauds, all based on misplaced and falsely claimed identity.

The fraudster’s environment has changed and continues to change, from a face-to-face physical encounter to an anonymous on-line encounter in the comfort of the victim’s own home. While some consumers are unaware that a weapon is virtually right in front of them, others are victims who struggle to balance the many wonderful benefits offered by advanced technology against its painful consequences. The goal of law enforcement has not changed over the years: to block the roads and close the loopholes used by perpetrators, even as perpetrators continue to seek yet another avenue to commit fraud in an environment in which they can thrive. Today, the challenge for CFEs, law enforcement and government officials is to stay on the cutting edge of technology, which requires access to constantly updated resources and communication between organizations; the ability to gather information; and the capacity to identify and analyze trends, institute effective policies, and detect and deter fraud through restitution and prevention measures.

Now is the time for CFEs and other assurance professionals to continuously reevaluate all we take for granted in the modern technical world and to increasingly question our ever-growing dependence on the whole range of ubiquitous machines whose potential to facilitate fraud so few of our clients and the general public understand.

Where the Money Is

One of the followers of our Central Virginia Chapter’s group on LinkedIn is a bank auditor heavily engaged in his organization’s analytics-based fraud control program. He was kind enough to share some of his thoughts regarding his organization’s sophisticated anti-fraud data modelling program as material for this blog post.

Our LinkedIn connection reports that, in his opinion, getting fraud data accurately captured, categorized, and stored is the first, vitally important challenge to using data-driven technology to combat fraud losses. This might seem relatively easy to those not directly involved in the process, but experience quickly reveals that having fraud-related data stored reliably over a long period of time and in a readily accessible format represents a significant challenge requiring a systematic approach at all levels of any organization serious about the effective application of analytically supported fraud management. The idea of any single piece of data being of potential importance to addressing a problem is a relatively new concept in the history of banking and of most other types of financial enterprises.

Accumulating accurate data starts with an overall vision of how the multiple steps in the process connect to affect the outcome. It’s important for every member of the fraud control team to understand how important each pre-defined process step is in capturing the information correctly, from the person who is responsible for risk management in the organization to the people who run the fraud analytics program to the person who designs the data layout to the person who enters the data. Even a customer service analyst or a fraud analyst not marking a certain type of transaction correctly as fraud can have an on-going impact on developing an accurate fraud control system. It really helps to establish rigorous processes of data entry on the front end and to explain to all players exactly why those specific processes are in place. Process without communication and communication without process are both unlikely to produce desirable results. For staff to understand the importance of recording fraud information correctly, management should communicate to everyone a general understanding of how a data-driven detection system (whether it’s based on simple rules or on sophisticated models) is developed.

Our connection goes on to say that even after an organization has implemented a fraud detection system that is based on sophisticated techniques and that can execute effectively in real time, it’s important for the operational staff to use the output recommendations of the system effectively. There are three ways that fraud management can improve results within even a highly sophisticated system like that of our LinkedIn connection.

The first strategy is never to allow operational staff to second-guess a sophisticated model at will. Very often, a model score of 900 (let’s say this is an indicator of very high fraud risk), when combined with some decision keys and sometimes on its own, can perform extremely well as a fraud predictor. It’s good practice to use the scores at this high risk range generated by a tested model as is and not allow individual analysts to adjust it further. This policy will have to be completely understood and controlled at the operational level. Using a well-developed fraud score as is without watering it down is one of the most important operational strategies for the long term success of any model. Application of this rule also makes it simpler to identify instances of model scoring failure by rendering them free of any subsequent analyst adjustments.

Second, fraud analysts will have to be trained to use the scores and the reason codes (reason codes explain why the score is indicative of risk) effectively in operations. Typically, this is done by writing some rules in operations that incorporate the scores and reason codes as decision keys. In the fraud management world, these rules are generally referred to as strategies. It’s extremely important to ensure strategies are applied uniformly by all fraud analysts. It’s also essential to closely monitor how the fraud analysts are operating using the scores and strategies.
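A strategy of this kind can be sketched in a few lines. The score of 900 comes from the example above; everything else here (the 700 cutoff, the reason codes, the field names, and the three actions) is a hypothetical illustration and not our connection's actual rule set.

```python
from dataclasses import dataclass

# Hypothetical reason codes; a real model would publish its own list.
HIGH_RISK_REASONS = frozenset({"NEW_DEVICE", "UNUSUAL_LOCATION"})


@dataclass(frozen=True)
class ScoredTransaction:
    account_id: str
    score: int                # model output; assumed here to run from 0 to 999
    reason_codes: frozenset   # codes explaining why the score indicates risk


def apply_strategy(txn: ScoredTransaction) -> str:
    """Return the action for a scored transaction, applied the same way for every analyst."""
    if txn.score >= 900:
        # Very high scores are used as is, with no analyst adjustment.
        return "DECLINE"
    if txn.score >= 700 and txn.reason_codes & HIGH_RISK_REASONS:
        # Mid-range score plus a risky reason code: route to an analyst queue.
        return "REFER"
    return "APPROVE"


action = apply_strategy(ScoredTransaction("acct-1", 950, frozenset()))
```

Because the rule lives in one function rather than in each analyst's head, it is applied uniformly, and any scoring failure can be traced to the model rather than to an individual adjustment.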

Third, it’s very important to train the analysts to mark transactions that are confirmed or reported to be fraudulent by the organization’s customers accurately in their data store.

All three of these strategies may seem very straightforward to accomplish, but in practical terms, they are not that easy without a lot of planning, time, and energy. A superior fraud detection system can be rendered almost useless if it is not used correctly. It is extremely important to allow the right level of employee to exercise the right level of judgment. Again, individual fraud analysts should not be allowed to second-guess the efficacy of a fraud score that is the result of a sophisticated model. Similarly, planners of operations should take into account all practical limitations while coming up with fraud strategies (fraud scenarios). Ensuring that all of this gets done the right way with the right emphasis ultimately leads the organization to good, effective fraud management.

At the heart of any fraud detection system is a rule or a model that attempts to detect a behavior that has been observed repeatedly in various frequencies in the past and classifies it as fraud or non-fraud with a certain rank ordering. We would like to figure out this behavior scenario in advance and stop it in its tracks. What we observe from historical data and our experience needs to be converted to some sort of a rule that can be systematically applied to the data in real time in the future. We expect that these rules or models will improve our chance of detecting aberrations in behavior and help us distinguish between genuine customers and fraudsters in a timely manner. The goal is to stop the bleeding of cash from the account and to accomplish that as close to the start of the fraud episode as we can. If banks can accurately identify early indicators of on-going fraud, significant losses can be avoided.

In statistical terms, what we define as a fraud scenario would be the dependent variable or the variable we are trying to predict (or detect) using a model. We would try to use a few independent variables (so called even though many of the variables used in the model tend to have some dependency on each other in real life) to detect fraud. Fundamentally, at this stage we are trying to model the fraud scenario using these independent variables. Typically, a model attempts to detect fraud as opposed to predict fraud. We are not trying to say that fraud is likely to happen on this entity in the future; rather, we are trying to determine whether fraud is likely happening at the present moment, and the goal of the fraud model is to identify this as close to the time that the fraud starts as possible.

In credit risk management, we try to predict if there will likely be serious delinquency or default risk in the future, based on the behavior exhibited in the entity today. With respect to detecting fraud, during the model-building process, not having accurate fraud data is akin to not knowing what the target is in a shooting range. If a model or rule is built on data that is only 75 percent accurate, it is going to cause the model’s accuracy and effectiveness to be suspect as well. There are two sides to this problem. Suppose we mark 25 percent of the fraudulent transactions inaccurately as non-fraud or good transactions. Not only are we missing out on learning from a significant portion of fraudulent behavior; by misclassifying it as non-fraud, we also lead the model to assume the behavior is actually good behavior. Hence, misclassification of data affects both sides of the equation. Accurate fraud data is fundamental to addressing the fraud problem effectively.
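The two-sided effect can be made concrete with some simple counting. The sketch below assumes a hypothetical portfolio of 1,000 truly fraudulent and 99,000 truly good transactions; only the 25 percent mislabeling rate comes from the discussion above.

```python
def apply_label_errors(true_fraud: int, true_good: int, miss_rate: float) -> dict:
    """Show what a model sees when a share of fraud is recorded as good."""
    mislabeled = round(true_fraud * miss_rate)
    return {
        # Side one: fewer fraud examples to learn from.
        "labeled_fraud": true_fraud - mislabeled,
        "share_of_fraud_learned": (true_fraud - mislabeled) / true_fraud,
        # Side two: fraudulent behavior presented to the model as good behavior.
        "labeled_good": true_good + mislabeled,
        "fraud_hidden_in_good": mislabeled,
    }


# Hypothetical portfolio sizes, chosen only for illustration.
counts = apply_label_errors(true_fraud=1_000, true_good=99_000, miss_rate=0.25)
```

In this example the model trains on 750 fraud cases instead of 1,000, while 250 fraudulent transactions sit in the good population teaching it the wrong lesson.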

So, in summary, collecting accurate fraud data is not the responsibility of just one set of people in any organization. The entire mind-set of the organization should be geared around collecting, preserving, and using this valuable resource effectively. Interestingly, our LinkedIn connection concludes, the fraud data challenges faced by a number of other industries are very similar to those faced by financial institutions such as his own. Banks are probably further along in fraud management and can provide a number of pointers to other industries, but fundamentally, the problem is the same everywhere. Hence, a number of techniques he details in this post are applicable to a number of industries, even though most of his experience is bank based. As fraud examiners and forensic accountants, we will no doubt witness the impact of analytically based fraud risk management in an ever-growing number of client industries.

Fraud Reports as Road Maps to Future Fraud & Loss Prevention

There are a number of good reasons why fraud examiners should work hard at including comprehensive, well written descriptions of fraud scenarios in their reports; some of these reasons are obvious and some less so. A well written fraud report, like little else, can put dry controls in the context of real life situations that client managers can comprehend no matter what their level of actual experience with fraud. It’s been my experience that well written reports, in plain business language, free from descriptions of arcane control structures, and supported by hard hitting scenario analysis can help spark anti-fraud conversations throughout the whole of a firm’s upper management. A well written report can be a vital tool in transforming that discussion from, for example, relatively abstract talk about the need for an identity management system to a more concrete and useful one dealing with the report’s description of how the theft of vital business data has actually proven to benefit a competitor.

Well written, comprehensive fraud reports can make fraud scenarios real by concretely demonstrating the actual value of the fraud prevention effort to enterprise management and the Board. They can also graphically help set the boundaries of what management will expect the prevention function to do in the future if this, or a similar scenario, actually recurs. The written presentation of the principal fraud or loss scenario treated in the report necessarily involves consideration of the vital controls in place to prevent its recurrence, which then allows for the related presentation of a qualitative assessment of the present effectiveness of the controls themselves. A well written report thus helps everyone understand how all the control failures related to the fraud interacted and reinforced each other; it’s, therefore, only natural that the fraud examiner or analyst recommend that the report’s intelligence be channeled for use in the enterprise’s fraud and loss prevention program.

Strong fraud report writing has much in common with good story telling. A narrative is shaped explaining a sequence of events that, in this case, has led to an adverse outcome. Although sometimes industry or organization specific, the details of the specific fraud’s unfolding always contain elements of the unique and can sometimes be quite challenging for the examiner even to narrate. The narrator/examiner should especially strive to clearly identify the negative outcomes of the fraud for the organization, for those outcomes can be many and related. Each outcome should be explicitly described and its impact clearly enumerated in non-technical language.

But to be most useful as a future fraud prevention tool, the examiner’s report needs to make it clear that controls work as separate lines of defense, at times in a sequential way, and at other times interacting with each other to help prevent the occurrence of the adverse event. The report should attempt to demonstrate in plain language how this structure broke down in the current instance and demonstrate the implications for the enterprise’s future fraud prevention efforts. Often, the report might explain how the correct operation of just one control may provide adequate protection or mitigation. If the controls operate independently of each other, as they often do, the combined probability of all of them failing simultaneously tends to be significantly lower than the probability of failure of any one of them. These are the kinds of realities with the power to significantly improve the fraud prevention program and, hence, should never be buried in individual reports but used collectively, across reports, to form a true combined resource for the management of the prevention program.
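The point about independent controls follows from the multiplication rule for independent events. Here is a minimal sketch; the three controls and their individual failure probabilities are hypothetical figures chosen for illustration, and the calculation holds only if the controls truly fail independently of one another.

```python
from math import prod


def joint_failure_probability(failure_probs) -> float:
    """Probability that every control fails at once, assuming independence."""
    return prod(failure_probs)


# Hypothetical controls with hypothetical individual failure probabilities.
controls = {
    "segregation_of_duties": 0.10,
    "supervisor_review": 0.20,
    "monthly_reconciliation": 0.05,
}

p_all_fail = joint_failure_probability(controls.values())
weakest_single = max(controls.values())
```

With these figures the chance of all three failing together is roughly one in a thousand, far below the 5 to 20 percent chance of any single control failing. Where one failure makes another more likely, the true combined figure will be higher than this product.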

The final report should talk about the likelihood of the principal scenario being repeated given the present state of preventative controls; this is often best estimated during discussions with client management, if appropriate. What client management will truly be interested in is the probability of recurrence, but the question is actually better framed in terms of the likelihood over a long (extended) period of time. This question is best answered by the managers involved, in particular the loss prevention manager. If the answer is that this particular fraud risk might materialize again once every 10 years, the probability of its occurrence in any given year is a sobering 10 percent.
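The conversion from a "once every N years" estimate to an annual probability, and from there to the likelihood over a longer horizon, can be sketched as follows. The function names are our own, and the horizon calculation assumes that each year is independent of the others.

```python
def annual_probability(years_between_events: float) -> float:
    """Convert 'once every N years' into a probability for any single year."""
    return 1.0 / years_between_events


def probability_within(horizon_years: int, annual_p: float) -> float:
    """Probability of at least one occurrence over the horizon.

    Assumes each year is independent, so the chance of no occurrence
    in n years is (1 - p) ** n.
    """
    return 1.0 - (1.0 - annual_p) ** horizon_years


p = annual_probability(10)                 # the once-in-10-years example above
over_ten_years = probability_within(10, p)
```

Under these assumptions, a 10 percent annual probability works out to about a 65 percent chance of at least one recurrence over ten years, which is often the more persuasive figure for management.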

As with frequency estimation, to be of most on-going help in guiding the fraud prevention program, individual fraud reports should attempt to estimate the severity of each scenario’s occurrence.  Is it the worst case loss, or the most likely or median loss?  In some cases, the absolute worst case may not be knowable, or may mean something as disastrous as the end-of-game for the organization.  Any descriptive fraud scenario presented in a fraud report should cover the range of identified losses associated with the case at hand (including any collateral losses the business is likely to face).  Documented control failures should always be clearly associated with the losses.  Under broad categories, such as process and workflow errors, information leakage events, business continuity events and external attacks, there might have to be a number of developed, narrative scenarios to address the full complexity of the individual case.

Fraud reports, especially for large organizations for which the risk of fraud must always remain a constant preoccupation, can be used to extend and refine their fraud prevention programs. Using the documented results of the fraud reporting process, report data can be converted to estimates of losses at different confidence levels and fed to the fraud prevention program’s estimated distributions for frequency and severity. The bottom line is that organizations of all sizes shouldn’t just shelve their fraud reports but use them as vital inputs to build and maintain the ongoing fraud risk assessment process for ultimate inclusion in the enterprise’s loss prevention and fraud prevention programs.
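One way to picture how frequency and severity estimates combine is a small simulation. The sketch below is an illustration under stated assumptions and not a prescribed method: it assumes the number of events per year follows a Poisson distribution, that each loss follows a lognormal distribution, and that the parameters (a rate of 0.1 events per year, a typical loss around $50,000) are hypothetical values of the kind a set of fraud reports might suggest.

```python
import math
import random
import statistics


def simulate_annual_losses(rate, mu, sigma, years=10_000, seed=1):
    """Simulate total fraud loss per year from frequency and severity assumptions."""
    rng = random.Random(seed)
    limit = math.exp(-rate)
    totals = []
    for _ in range(years):
        # Knuth's method for drawing a Poisson-distributed event count.
        count, product = 0, rng.random()
        while product > limit:
            count += 1
            product *= rng.random()
        # Each event's severity is drawn from a lognormal distribution.
        totals.append(sum(rng.lognormvariate(mu, sigma) for _ in range(count)))
    return totals


# Hypothetical parameters: one event every ten years, median loss of $50,000.
losses = simulate_annual_losses(rate=0.1, mu=math.log(50_000), sigma=1.0)
cut_points = statistics.quantiles(losses, n=100)   # 99 percentile cut points
p50, p95, p99 = cut_points[49], cut_points[94], cut_points[98]
```

The median year shows no loss at all, while the 95th and 99th percentiles describe the bad years that the prevention program is meant to guard against. Each new fraud report gives another data point for revising the assumed rate and severity.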

Ancient Analytics


Our Chapter, along with our partners the Virginia State Police and national ACFE, will be hosting a two-day seminar starting April 8th entitled ‘Hands on Analytics – Using Data Analytics to Identify Fraud’ at the VASP Training Academy here in Richmond, Virginia. Our presenter will be one of the ACFE’s best, the renowned fraud examiner Bethmara Kessler, Chief Audit Officer of the Campbell Soup Company. The science of analytics has come a long way in its evolution into the effective tool we all know and make such good use of today. I can remember being hired fresh out of graduate school at the University of Chicago by a Virginia bank (long since vanished into the mists of time) to do market and operations research in the early 1970s.

The bank had just started accumulating operational and branch related data for use with a fairly primitive IBM mainframe relational database; simple as that first application was, it was like a new day had dawned! The bank’s holding company was expanding rapidly, buying up correspondent banks all over the state so, as you can imagine, we were hungry for all sorts of actionable market and financial information. In those early days, in the almost total absence of any real information, when data about the holding company was first being accumulated and some initial reports run, it felt like lighting a candle in a dark room! At first blush, the information seemed very useful and numbers-savvy people pored over the reports identifying how some of the quantities (variables) in the reports varied in relation to others. As we all know now, based on a wider and more informed experience, there’s sometimes a direct correlation between fields and sometimes there’s an implied correlation. When our marketing and financial analysts began to see these correlations, relating the numbers to their own experiences in branch bank location and in lending risk management for example, it was natural for them to write up some rules to manage vulnerable areas like branch operations and fraud risk. With regard to fraud control, the data-based rules worked great for a while but, since they were only rules, fraudsters quickly proved surprisingly effective at figuring out exactly what sort of fraud the rules were designed to stop. If the rule cutoff was $300 for a cash withdrawal, we found that fraudsters soon experimented with various amounts and determined that withdrawing $280 was a safe option. The bank’s experts saw this and started designing rules to prevent a whole range of specific scenarios, but it quickly became a losing game for the bank since fraudsters only got craftier and craftier.
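The weakness of a fixed cutoff is easy to see once the rule is written down. This sketch uses the $300 cutoff and the $280 withdrawals from the story above; the exact form of the rule (flag at or above the cutoff) and the sample amounts are assumptions for illustration.

```python
CASH_WITHDRAWAL_CUTOFF = 300.00  # the rule cutoff from the anecdote above


def flag_by_rule(amount: float) -> bool:
    """Flag a cash withdrawal only when it reaches the fixed cutoff."""
    return amount >= CASH_WITHDRAWAL_CUTOFF


# A fraudster who has learned the cutoff simply stays beneath it.
withdrawals = [280.00, 280.00, 280.00, 320.00]
flags = [flag_by_rule(amount) for amount in withdrawals]

# Total taken without ever triggering the rule.
undetected_total = sum(a for a, flagged in zip(withdrawals, flags) if not flagged)
```

Three withdrawals of $280 remove $840 from the account without a single flag, which is why each new rule only invited a new round of experimentation.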

Linear regression models were first put forward to address this incessant back and forth of rule definition and fraudster response as database software became more adept at handling larger amounts of data effectively, so that enough data could be analyzed to begin to identify persistent patterns. The linear regression model assumed that the relationships between the predictors used in the model and the fraud target were linear, and so the algorithm tried to fit a linear model and to detect fraud by identifying outliers from the basic fit of the regression line. The regression models proved better than the rule based approach since they could systematically look at all the bank’s credit card data, for instance, and so could draw more effective conclusions about what was actually going on than the rules ever could.
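The idea of flagging outliers from a fitted regression line can be sketched with a single predictor. This is a simplified illustration and not the bank's actual model: the data are invented, the single predictor stands in for the many a real model would use, and the cutoff of two standard deviations is an assumption.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residual_outliers(xs, ys, threshold=2.0):
    """Return indexes of points far from the fitted line, measured in residual standard deviations."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * spread]


# Invented data: monthly card spend rises steadily, except for one month.
months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
spend = [100, 200, 300, 400, 500, 600, 1500, 800, 900, 1000]
suspicious = residual_outliers(months, spend)
```

Unlike a fixed cutoff, the line is fitted to all of the data, so what counts as unusual depends on the account's own pattern and not on a number a fraudster can discover by trial and error.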

As we at the bank found in the early days of attempted analytics based fraud detection, when operating managers get together and devise fraud identification rules, they generally do slightly better than random chance in identifying cases of actual fraud; this is because, no matter how good and well formulated the rules are, they can’t cover the entire universe of possible transactions. We can only give anti-fraud coverage to the portion of transactions addressed by the rules. When the bank built a linear model employing algorithms that compared past experience with present experience, the analysis gained the advantage of covering the entire set of transactions and classifying them as either fraudulent or good. Fraud identification improved considerably above chance.

It’s emerged over the years that a big drawback with using linear regression models to identify fraud is that, although there are many cases in which the underlying risk is truly linear, there are more where it’s non-linear, where both the target (fraud) and independent variables are non-continuous. While there are many problems where a 90% solution is good enough, fraud is not one of them. This is where non-linear techniques, such as the neural networks Bethmara Kessler will be discussing, come in. Neural networks were originally developed to model the functioning of the brain; their statistical properties also make them an excellent fit for addressing many risk related problems.

As our April seminar will demonstrate, there are generally two lines of thought regarding the building of models to perform fraud analytics. One is that techniques don’t matter that much; what matters is the data itself and how much of it, in what variety, the fraud analyst can get; the more data, the better the analysis. The other line of thought holds that, whereas more data is always good, techniques do matter. There are many well documented fraud investigation situations in which improving the sophistication of the techniques has yielded truly amazing results.

All of these issues and more will be covered in our Chapter’s April seminar.  I hope all of you can join us!

Did the Auditors Fail? – Analytical Techniques to Determine Due Diligence

Fraud examiners are increasingly finding themselves in situations where they are asked to investigate financial frauds identified subsequent to the performance of single or even multiple external and internal audits. In such situations fraud examiners and forensic accountants do well to consider the application of a number of analytical techniques to identify exactly which organizational control assertions (and/or the auditor’s examination of them) broke down: such assertions as existence, rights and obligations, valuation, completeness, occurrence, measurement and presentation, either alone or in combination. One particularly powerful analytical technique, among several, available to the fraud examiner in analyzing the performance of the parties to such a situation is Benford’s Law.

Benford’s Law is named after Frank Benford, a research physicist at the General Electric Research Laboratories in Schenectady, New York who, in 1938 and the years immediately following, performed a detailed study of numbers and found that certain numbers and number combinations appeared more frequently than others; Benford’s law predicts the digit patterns in naturally occurring sets of data. He then tested the assumption that numbers ordered from smallest to largest would form a geometric sequence by using integral calculus to formulate the expected digit frequencies in lists of such numbers. The significance of this for fraud examiners is that, after analyzing more than 20,000 pieces of data, he found that the chance of the first digit in the data (say, a list of the dollar amounts of invoiced transactions) being “1” is not one in nine, but rather about 30 percent, or close to one in three. The chance of the first digit being “2” is only about 18 percent, and the probabilities of successive digits being the first digit decline until reaching “9”, which has only about a five percent chance of appearance. Benford found that in arrays of numbers, the digit “1” occurred as the first digit more often than any other digit did. The same type of predictable pattern was found to hold for the predicted frequencies of digits in the second, third and fourth positions, given the occurrence of all the differing initial digits (1 through 9). Over the intervening years, further investigation and extension of Benford’s original work has made it possible to detect potentially fraudulent numbers in large data files by comparing the frequency of occurrence of initial digits in a list of financially related numbers to those anticipated by Benford’s findings. For example, when a fraudster invents numbers in connection with the perpetration of a fraudulent scheme, s/he will tend to fake data containing too many instances of the initial digits 7, 8, and 9 and too few of 1, 2, and 3.
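The expected first-digit frequencies follow the formula P(d) = log10(1 + 1/d), and comparing them with observed frequencies is a short exercise. The sketch below is a minimal illustration: the function names are our own, and the chi-square statistic is one common way of measuring the gap between observed and expected counts, not the only one. With nine digits there are eight degrees of freedom, for which the 5 percent critical value is 15.507.

```python
import math
from collections import Counter


def benford_expected(digit: int) -> float:
    """Expected share of numbers whose first digit is `digit` (1 through 9)."""
    return math.log10(1 + 1 / digit)


def first_digit(amount: float) -> int:
    """Leading significant digit; scientific notation places it first.

    Formatting rounds to seven significant figures, which is adequate
    for ordinary dollar amounts.
    """
    return int(f"{abs(amount):e}"[0])


def first_digit_chi_square(amounts) -> float:
    """Chi-square statistic comparing observed first digits with Benford's Law."""
    digits = [first_digit(a) for a in amounts if a != 0]
    observed = Counter(digits)
    n = len(digits)
    return sum(
        (observed.get(d, 0) - n * benford_expected(d)) ** 2 / (n * benford_expected(d))
        for d in range(1, 10)
    )


CRITICAL_VALUE_5_PERCENT = 15.507  # chi-square, 8 degrees of freedom

# An obviously invented list in which every amount begins with 9.
statistic = first_digit_chi_square([900.00] * 100)
deviates = statistic > CRITICAL_VALUE_5_PERCENT
```

A statistic above the critical value says only that the digits depart from the expected pattern; it marks where the examiner should look more closely and does not by itself establish fraud.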

As a case in point, let’s say you have been engaged to determine whether or not external auditors performed due diligence in evaluating the client’s control assertions in a case where a fraud, contemporaneous with the auditor’s examination, was clearly not identified. If the auditor was lacking diligence, damages are due to the plaintiff; on the other side, it may be the case that the auditor actually performed due diligence. In the center portion of the spectrum is a grey area where due diligence is questionable. Clearly, in this case, the plaintiff organization has an obvious monetary incentive to claim a lack of due diligence, while the defendants (the auditors) have a clear motive to claim the opposite.

The advantage the fraud examiner has in applying analytical tests to the facts of this particular hypothetical case is that both the auditor’s work papers and the client’s financial records as they were at the time of the audit are available for her analysis. Analytical manipulations like those represented by the most successful tests that accompany a Benford’s Law analysis (like the coincidence of same invoice number, same dollar amount and different vendor numbers) can identify clearly abnormal patterns in the defendant’s analysis of the plaintiff’s assertions that can either confirm or disprove the plaintiff’s claim for damages as a consequence of the defendant’s non-performance of due diligence. This type of testing by the fraud examiner of the plaintiff’s data and auditor’s related work papers tends to reduce the grey area of doubt in either the plaintiff’s or defendant’s favor by applying techniques that juries will find relatively easy to understand while representing a credible source of evidence based on the relevant actual data available from both litigants. Use of even the most basic digital tests of these available data can impress a jury and influence an appropriate due diligence decision one way or the other. Thus, if the engaged fraud examiner acting for the plaintiffs can demonstrate that tests exist that, if applied, could have detected the fraud but that the tests were not applied, the auditors can expect to lose the case and face damage awards. The point is that fraud examiners can run digital tests on all sorts of permutations, combinations, and subsets of the relevant data until the fraudulent transactions stand out as clearly significant deviations and in this way either support or refute the plaintiff’s claim of a lack of auditor due diligence.
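The same invoice number, same dollar amount, different vendor test mentioned above is a duplicate search and not a digit test, and it can be sketched as a simple grouping. The field names and sample invoices below are hypothetical.

```python
from collections import defaultdict


def same_same_different(invoices):
    """Find invoice number and amount pairs that appear under more than one vendor."""
    vendors_by_key = defaultdict(set)
    for inv in invoices:
        vendors_by_key[(inv["invoice_no"], inv["amount"])].add(inv["vendor_id"])
    # Keep only the pairs shared by two or more different vendors.
    return {key: sorted(v) for key, v in vendors_by_key.items() if len(v) > 1}


# Hypothetical invoice records for illustration.
invoices = [
    {"invoice_no": "INV-1001", "amount": 4500.00, "vendor_id": "V-17"},
    {"invoice_no": "INV-1001", "amount": 4500.00, "vendor_id": "V-42"},
    {"invoice_no": "INV-1002", "amount": 1200.00, "vendor_id": "V-17"},
    {"invoice_no": "INV-1002", "amount": 1200.00, "vendor_id": "V-17"},
]

suspects = same_same_different(invoices)
```

Here only INV-1001 is returned, because the same invoice and amount were paid to two different vendors; the repeated INV-1002 entry under a single vendor would call for an ordinary duplicate-payment test. If a test this simple would have surfaced the fraud and does not appear in the work papers, that is precisely the kind of concrete example a jury can follow.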

To summarize, a jury might be more likely to accept the fraud examiner’s arguments for or refutation of the claims for due diligence, if those arguments are supported by concrete examples drawn from the actual data at play by her application of Benford and other related analytical tools to analyze the relevant data and organize her findings.