Category Archives: Data Warehouse

Analytics Confronts the Normal

The Information Systems Audit and Control Association (ISACA) tells us that we now produce and store more data in a single day than mankind did altogether in the last 2,000 years. Daily data production is estimated at one exabyte, the computer storage equivalent of one quintillion bytes, or one million terabytes. Not too long ago, about 15 years, a terabyte was considered a huge amount of data; today the latest Swiss Army knife comes with a 1-terabyte flash drive.

When an interaction with a business is complete, the information from that interaction is only as good as the pieces of data captured during it. A customer walks into a bank and withdraws cash. The transaction that just happened gets stored as a monetary withdrawal transaction with certain characteristics in the form of associated data. There might be information on the date and time when the withdrawal happened; there may be information on which customer made the withdrawal (if multiple customers operate the same account). The amount of cash withdrawn, the account from which the money was extracted, the teller or ATM that facilitated the withdrawal, the balance on the account after the withdrawal, and so forth, are all typically recorded. But these are just a few of the data elements that can get captured in any withdrawal transaction. Just imagine all the different interactions possible on all the assorted products that a bank has to offer: checking accounts, savings accounts, credit cards, debit cards, mortgage loans, home equity lines of credit, brokerage, and so on. The data captured during all these interactions goes through data-checking processes and gets stored somewhere internally or in the cloud. The data stored this way has been growing steadily over the past few decades, and, most importantly for fraud examiners, most of it carries tons of information about the nuances of individual customers’ normal behavior.
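As a rough illustration, the data elements described above might be captured in a record like the following sketch; the field names are hypothetical, not any particular bank’s schema:

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical sketch of the data elements a withdrawal might capture.
@dataclass
class WithdrawalTransaction:
    timestamp: datetime    # date and time of the withdrawal
    customer_id: str       # which customer on the account acted
    account_id: str        # account the money was extracted from
    amount: float          # cash withdrawn
    channel_id: str        # teller or ATM that facilitated it
    balance_after: float   # account balance after the withdrawal

txn = WithdrawalTransaction(
    timestamp=datetime(2015, 6, 1, 14, 30),
    customer_id="CUST-42",
    account_id="ACCT-1001",
    amount=200.00,
    channel_id="ATM-17",
    balance_after=1800.00,
)
```

Each such record is one small contribution to the growing picture of what is normal for that customer, that account, and that ATM.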

In addition to what the customer does, by looking at a different dimension of the same data, examiners can also understand what is normal for certain other related entities. For example, by looking at all the customer withdrawals at a single ATM, CFEs can gain a good understanding of what is normal for that particular ATM terminal. Understanding the normal behavior of customers is very useful in detecting fraud, since deviation from normal behavior is such a primary indicator of fraud. Understanding non-fraud or normal behavior is important not only at the main account holder level but also at all the entity levels associated with that individual account. The same data presents completely different information when observed in the context of one entity versus another. In this sense, having all the data saved and then analyzed and understood is a key element in tackling the fraud threat to any organization.
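A minimal sketch of this idea: the very same withdrawal records, grouped along different entity dimensions, yield different "normal" profiles. The data and helper below are invented for illustration:

```python
from collections import defaultdict
from statistics import mean

# Illustrative withdrawal records: (customer_id, atm_id, amount).
withdrawals = [
    ("CUST-1", "ATM-17", 100.0),
    ("CUST-1", "ATM-17", 120.0),
    ("CUST-2", "ATM-17", 500.0),
    ("CUST-2", "ATM-99", 450.0),
]

def profile_by(entity_index, rows):
    """Average withdrawal amount per entity along one dimension of the data."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[entity_index]].append(row[2])
    return {entity: mean(amounts) for entity, amounts in groups.items()}

by_customer = profile_by(0, withdrawals)  # what is normal for each customer
by_atm = profile_by(1, withdrawals)       # what is normal for each ATM terminal
```

The same four rows produce one baseline per customer and a quite different baseline per ATM, which is exactly the "different dimension, different information" point above.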

Any systematic, numbers-based understanding of fraud as a past-occurring event depends on an accurate description of exactly what happened through the data stream that accumulated before, during, and after the fraud scenario occurred. Allowing the data to speak is the key to the success of any model-based system. This data needs to be saved and interpreted very precisely for the examiner’s models to make sense. The first crucial step in building a model is to define, understand, and interpret fraud scenarios correctly. At first glance, this seems like a very easy problem to solve. In practice, it is a far more complicated process than it appears.

The level of understanding of the fraud episode or scenario itself varies greatly among the different business processes involved with handling the various products and functions within an organization. Typically, fraud can have a significant impact on the bottom line of any organization. Looking at the level of specific information that is systematically stored and analyzed about fraud in financial institutions, for example, one would conclude that such storage needs to be far more systematic and rigorous than it typically is today. There are several factors influencing this. Unlike some of the other types of risk involved in client organizations, fraud risk is a censored problem. For example, if we are looking at serious delinquency, bankruptcy, or charge-off risk in credit card portfolios, the actual dollars-at-risk quantity is very well understood. Based on past data, it is relatively straightforward to quantify precise credit dollars at risk by looking at how many customers defaulted on a loan, didn’t pay their monthly bill for three or more cycles, or declared bankruptcy. Based on this, it is easy to quantify the amount at risk as far as credit risk goes. In fraud, however, it is virtually impossible to quantify the actual amount that would have gone out the door, since the fraud is stopped immediately after detection. The problem is censored as soon as some intervention takes place, making it difficult to precisely quantify the potential risk.

Another challenge in quantifying fraud is how well the fraud episode itself gets recorded. Consider the case of a credit card number being stolen without the physical card being stolen. During a certain period, both the legitimate cardholder and the fraudster are charging on the card. If the fraud detection system in the issuing institution doesn’t identify the fraudulent transactions as they happen in real time, fraud is typically identified when the cardholder gets the monthly statement, figures out that some of the charges were not made by him or her, and calls the issuer to report the fraud. In the not too distant past, all that used to get recorded by the bank was the cardholder’s estimate of when the fraud episode began, even though the cardholder likely shared additional details about the fraudulent transactions. If all that gets recorded is the cardholder’s estimate of when the episode began, ambiguity is introduced regarding the granularity of the actual fraud episode, and the initial estimate of the fraud amount becomes a rough estimate at best. In cases where the bank’s fraud detection system caught the fraud during the actual episode, the fraudulent transactions tended to be recorded by a fraud analyst, and sometimes not too accurately. If a transaction was marked as fraud or non-fraud incorrectly, the error was typically not corrected even after the correct information flowed in. When the transactions that were actually fraudulent were eventually identified from the actual postings, relating them back to the authorization transactions was often not straightforward. Sometimes the amounts varied slightly. For example, the authorization transaction for a restaurant charge typically does not include the tip the customer added to the bill, so the posted amount, when the transaction is reconciled, looks slightly different from the authorized amount. All of this poses an interesting challenge when designing a data-driven analytical system to combat fraud.
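The reconciliation problem just described, matching a posted amount back to an authorization that may differ by a tip, can be sketched as follows; the 25% tolerance and the record layout are assumptions for illustration only:

```python
def match_posting_to_auth(posting, authorizations, tolerance=0.25):
    """Find the authorization most plausibly behind a posted transaction.

    A restaurant posting can exceed its authorization by a tip, so we
    accept postings up to `tolerance` (25% here, an assumed figure) above
    the authorized amount at the same merchant.
    """
    candidates = [
        auth for auth in authorizations
        if auth["merchant"] == posting["merchant"]
        and auth["amount"] <= posting["amount"] <= auth["amount"] * (1 + tolerance)
    ]
    # Prefer the closest amount when several authorizations qualify.
    return min(candidates, key=lambda a: posting["amount"] - a["amount"], default=None)

auths = [
    {"auth_id": "A1", "merchant": "BISTRO", "amount": 40.00},
    {"auth_id": "A2", "merchant": "BISTRO", "amount": 80.00},
]
posted = {"merchant": "BISTRO", "amount": 47.50}  # the $40 charge plus a tip
match = match_posting_to_auth(posted, auths)
```

Real matching logic also weighs dates, merchant identifiers, and card-present indicators; the point is simply that exact-amount joins fail and tolerant matching is needed.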

The level of accuracy associated with recording fraud data also tends to be dependent on whether the fraud loss is a liability for the customer or to the financial institution. To a significant extent, the answer to the question, “Whose loss is it?” really drives how well past fraud data is recorded. In the case of unsecured lending such as credit cards, most of the liability lies with the banks, and the banks tend to care a lot more about this type of loss. Hence systems are put in place to capture this data on a historical basis reasonably accurately.

In the case of secured lending, ID theft, and so on, a significant portion of the liability is really on the customer, and it is up to the customer to prove to the bank that he or she has been defrauded. Interestingly, this shift of liability also tends to have an impact on the quality of the fraud data captured. In the case of fraud associated with automated clearing house (ACH) batches and domestic and international wires, the problem is twofold: The fraud instances are very infrequent, making it impossible for the banks to have a uniform method of recording frauds; and the liability shifts are dependent on the geography.  Most international locations put the onus on the customer, while in the United States there is legislation requiring banks to have fraud detection systems in place.  The extent to which our client organizations take responsibility also tends to depend on how much they care about the customer who has been defrauded. When a very valuable customer complains about fraud on her account, a bank is likely to pay attention.  Given that most such frauds are not large scale, there is less need to establish elaborate systems to focus on and collect the data and keep track of past irregularities. The past fraud information is also influenced heavily by whether the fraud is third-party or first-party fraud. Third-party fraud is where the fraud is committed clearly by a third party, not the two parties involved in a transaction. In first-party fraud, the perpetrator of the fraud is the one who has the relationship with the bank. The fraudster in this case goes to great lengths to prevent the banks from knowing that fraud is happening. In this case, there is no reporting of the fraud by the customer. Until the bank figures out that fraud is going on, there is no data that can be collected. Also, such fraud could go on for quite a while and some of it might never be identified. This poses some interesting problems. 
Internal fraud where the employee of the institution is committing fraud could also take significantly longer to find. Hence the data on this tends to be scarce as well.

In summary, one of the most significant challenges in fraud analytics is to build a sufficient database of normal client transactions. The normal transactions of any organization constitute the baseline against which abnormal, fraudulent, or irregular transactions can be identified and analyzed. Pinpointing the irregular is thus foundational to developing the transaction-processing edits that prevent irregular transactions embodying fraud from even being processed and paid on the front end; it furnishes the key to modern, analytically based fraud prevention.
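As a toy sketch of how such a baseline supports flagging the irregular, here is a simple z-score check against a customer’s own history; real detection systems use far richer behavioral profiles than a single amount statistic:

```python
from statistics import mean, stdev

def flag_abnormal(history, new_amount, z_threshold=3.0):
    """Flag a transaction whose amount deviates sharply from the
    customer's own baseline of normal transactions. The 3-sigma
    threshold is an assumed, illustrative cutoff."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return new_amount != mu
    return abs(new_amount - mu) / sigma > z_threshold

# A customer's recent withdrawal history (illustrative figures).
normal_history = [100, 120, 90, 110, 105, 95]
```

A $2,000 withdrawal against this history is flagged; a $115 withdrawal is not. The baseline is what makes the distinction possible at all.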

Dr. Fraudster & the Billing Anomaly Continuum

This month’s member’s lecture on Medicare and Medicaid Fraud triggered a couple of Chapter member requests for more specifics about how health care fraud detection analytics work in actual practice.

It’s a truism within the specialty of health care billing data analytics that the harder you work on the front end, the more successful you’ll be in surfacing information that generates productive results on the back end. Indeed, in the output of health care analytics applications, fraud examiners and health care auditors now have a new set of increasingly powerful tools to use in the audit and investigation of all types of fraud generally and of health care fraud specifically; I’m referring, of course, to analytically supported analysis of what’s called the billing anomaly continuum.

The use of the anomaly continuum in the general investigative process starts with the initial process of detection, proceeds to investigation and mitigation and then (depending on the severity of the case) can lead to the follow-on phases of prevention, response and recovery.   We’ll only discuss the first three phases here as most relevant for the fraud examination process and leave the prevention, response and recovery phases for a later post.

Detection is the discovery of clues within the data.  The process involves taking individual data segments related to the whole health care process (from the initial provision of care by the health care provider all the way to the billing and payment for that care by the insurance provider) and blending them into one data source for seamless analysis.  Any anomalies in the data can then be noted.  The output is then evaluated for either response or for follow-up investigation.  It is these identified anomalies that will go on at the end of the present investigative process to feed the detection database for future analysis.

As an example of an actual Medicare case, let’s say we have a health care provider whom we’ll call Dr. Fraudster, some of whose billing data reveals a higher than average percentage of complicated (and costly) patient visits. It also seems that Dr. Fraudster apparently generated some of these billings while travelling outside the country. There were also referred patient visits to chiropractors, acupuncturists, massage therapists, nutritionists and personal trainers at a local gym whose services were billed under Dr. Fraudster’s tax ID number as well as under standard MD Current Procedural Terminology (CPT) visit codes. In addition, a Dr. Outlander, an unlicensed doctor serving as a staff physician, was on Dr. Fraudster’s staff and billed at $5 an hour. Besides Outlander, a Dr. Absent was noted as billing out of Dr. Fraudster’s clinic even though he was no longer associated with the clinic.

First off, in the initial detection phase, it seems Dr. Fraudster’s high-volume activity flagged an edit function that tracks an above-average practice growth rate without the addition of new staff on the claim form. Another anomalous activity picked up was the appearance of wellness services presented as illness-based services. The billed provision of services while travelling is also certainly anomalous.
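An edit function of the kind that flagged Dr. Fraudster’s practice might look, in greatly simplified form, like the following; the 50% growth limit is an invented threshold, not a Medicare rule:

```python
def growth_edit(monthly_billings, staff_added, growth_limit=0.5):
    """Detection-phase edit sketch: flag a practice whose billings grew
    faster than `growth_limit` over the period without any new staff
    appearing on the claim forms. Threshold is illustrative only."""
    growth = (monthly_billings[-1] - monthly_billings[0]) / monthly_billings[0]
    return growth > growth_limit and not staff_added

# A Dr. Fraudster-style pattern: billings triple with no staff added.
flagged = growth_edit([100_000, 150_000, 300_000], staff_added=False)
```

A practice whose growth is matched by new hires, or whose growth stays modest, passes the edit; only the anomalous combination is surfaced for review.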

The following investigation phase involves ascertaining whether various activities or statements are true. In Dr. Fraudster’s case, evidence to collect regarding his on-staff associate, Dr. Outlander, may include confirmation of license status, if any; educational training; clinic marketing materials; and payroll records. The high percentage of complicated visits and the foreign travel issues need to be broken down, and each activity analyzed separately in full detail. If Dr. Fraudster truly has a high-complication patient population, most likely these patients would be receiving some type of prescription regime. The lack of a diagnosis requirement with associated prescriptions in this case limited the scope of the real-life investigation. Was Dr. Fraudster prescribing medications with no basis? If he uses an unlicensed doctor on his staff, presents wellness services as illness-related services, and sees himself (perhaps) as a caring doctor getting reluctant insurance companies to pay for alternative health treatments, what other alternative treatments might he be providing with prescribed medications? Also, Dr. Fraudster had to know that the bills submitted during his foreign travels were false. Statistical analysis, in addition to clinical analysis of the medical records by actual provider and travel records, would provide a strong argument that the doctor had intent to misrepresent his claims.

The mitigation phase typically builds on issues noted within the detection and investigation phases. Mitigation is the process of reducing or making a certain set of circumstances less severe. In the case of Dr. Fraudster, mitigation occurred in the form of prosecution. Dr. Fraudster was convicted of false claims and removed from the Medicare network as a licensed physician, thereby preventing further harm and loss. Other applicable issues that came forward at trial were evidence of substandard care and medical unbelievability patterns (CPT codes billed that made no sense except to inflate the billing). What made this case even more complicated was tracking down Dr. Fraudster’s assets. Ultimately, the real-life Dr. Fraudster did receive a criminal conviction; civil lawsuits were initiated, and he ultimately lost his license.

From an analytics point of view, mitigation does not stop at the point of conviction of the perpetrator.  The findings regarding all individual anomalies identified in the case should be followed up with adjustment of the insurance company’s administrative adjudication and edit procedures (Medicare was the third party claims payer in this case).  What this means is that feedback from every fraud case should be fed back into the analytics system.  Incorporating the patterns of Dr. Fraudster’s fraud into the Medicare Fraud Prevention Model will help to prevent or minimize future similar occurrences, help find currently on-going similar schemes elsewhere with other providers and reduce the time it takes to discover these other schemes.  A complete mitigation process also feeds detection by reducing the amount of investigative time required to make the existence of a fraud known.

As practicing fraud examiners, we are provided by the ACFE with an examination methodology quite powerful in its ability to extend and support all three phases of the health care fraud anomaly identification process presented above.  There are essentially three tools available to the fraud examiner in every health care fraud examination, all of which can significantly extend the value of the overall analytics based health care fraud investigative process.  The first is interviewing – the process of obtaining relevant information about the matter from those with knowledge of it.  The second is supporting documents – the examiner is skilled at examining financial statements, books and records.   The examiner also knows the legal ramifications of the evidence and how to maintain the chain of custody over documents.  The third is observation – the examiner is often placed in a position where s/he can observe behavior, search for displays of wealth and, in some instances, even observe specific offenses.

Dovetailing the work of the fraud examiner with that of the healthcare analytics team is a win for both parties to any healthcare fraud investigation and represents a considerable strengthening of the entire long term healthcare fraud mitigation process.

Concurrent Fraud Auditing (CFA) as a Tool for Fraud Prevention

One of our CFE chapter members left us a contact comment asking whether concurrent fraud auditing might not be a good anti-fraud tool for use by a retailer client of hers that receives hundreds of credit card payments for services each day. The foundational concepts behind concurrent fraud auditing owe much to the idea of continuous assurance auditing (CAA) that internal auditors have applied for years. Basically, at the heart of a system of concurrent fraud auditing (CFA), like that of CAA, is the process of embedding control-based software monitors in real-time, automated financial or payment systems to alert reviewers of transactional anomalies as close to their occurrence as possible. Today’s networked processing environments have made the implementation and support of such real-time review approaches operationally feasible in ways that the older, batch-processing-based environments couldn’t.

Our member’s client uses several online, cloud-based services to process its customer payments; these services provide the client with a large database full of payment history, tantamount to a data warehouse, all available on SQL Server to in-house client IT applications like Oracle and Microsoft Access. In such a data-rich environment, CFEs and other assurance professionals can readily test for the presence of transactional patterns characteristic of defined, common payment fraud scenarios such as those associated with identity theft and money laundering. The objective of the CFA program is not necessarily to recover the dollars associated with online frauds but to continuously (in as close to real time as possible) adjust the edits in the payment collection and processing system so that certain fraudulent transactions (those associated with known fraud scenarios) stand a greater chance of not even getting processed in the first place. Over time, the CFA process should get better and better at editing out or flagging the anomalies associated with the defined scenarios.

The central process of any CFA system is that of an independent application monitoring for suspected fraud related activity through, for example (as with our Chapter member), periodic (or even real time) reviews of the cloud based files of an automated payment system. Depending upon the degree of criticality of the results of its observations, activity summaries of unusual items can be generated with any specified frequency and/or highlighted to an exception report folder and communicated to auditors via “red flag” e-mail notices.  At the heart of the system lies a set of measurable, operational metrics or tags associated with defined fraud scenarios.  The fraud prevention team would establish the metrics it wishes to monitor as well as supporting standards for those metrics.   As a simple example, the U.S. has established anti-money-laundering banking rules specifying that all transactions over $10,000 must be reported to regulators.  By experience, the $10,000 threshold is a fraud related metric investigators have found to be generic in the identification of many money-laundering fraud scenarios.  Anti-fraud metric tags could be built into the cloud based financial system of our Chapter member’s client to monitor in real time all accounts payable and other cash transfer transactions with a rule that any over $10,000 would be flagged and reviewed by a member of the audit staff.  This same process could have multiple levels of metrics and standards with exceptions fed up to a first level assurance process that could monitor the outliers and, in some instances,  send back a correcting  feedback transaction to the financial system itself (an adjusting or corrective edit or transaction flag).  The warning notes that our e-mail systems send us that our mailboxes are full are another example of this type of real time flagging and editing.
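A bare-bones sketch of the first-level metric tag just described, flagging cash transfers over the $10,000 reporting threshold; the alerting plumbing (exception folders, "red flag" e-mail notices) is omitted, and the record layout is invented:

```python
def monitor(transactions, threshold=10_000):
    """First-level CFA monitor sketch: flag any cash-transfer transaction
    over the $10,000 anti-money-laundering threshold for auditor review."""
    flagged = []
    for txn in transactions:
        if txn["amount"] > threshold:
            # Tag the transaction for the exception report folder.
            flagged.append({**txn, "flag": "REVIEW"})
    return flagged

batch = [
    {"txn_id": "T1", "amount": 9_500.00},
    {"txn_id": "T2", "amount": 12_400.00},
]
alerts = monitor(batch)
```

In a production CFA the threshold would be one of many metric tags, each tied to a defined fraud scenario, and the output would feed the multi-level escalation process described below.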

Yet other types of discrepancies would flow up to a second level fraud monitoring or audit process.  This level would produce pre-formatted reports to management or constitute emergency exception notices.  Beyond just reports, this level could produce more significant anti-fraud or assurance actions like the referral of a transaction or group of transactions to an enterprise fraud management committee for consideration as documentation of the need for an actual future financial system fraud prevention edit. To continue the e-mail example, this is where the system would initiate a transaction to prevent future mailbox accesses by an offending e-mail user.

There is additionally yet a third level for our system which is to use the CFA to monitor the concurrent fraud auditing process itself.  Control procedures can be built to report monitoring results to external auditors, governmental regulators, the audit committee and to corporate council as documented evidence of management’s performance of due diligence in its fight against fraud.

So it’s no surprise that I would certainly encourage our member to discuss the CFA approach with the management of her client. It isn’t the right tool for everyone, since such systems can vary greatly in cost depending upon the existing processing environment and level of IT sophistication of the developing organization, but the discussion is worth the candle. CFAs are particularly useful for monitoring purchase and payment cycle applications with an emphasis on controls over customer and vendor related fraud. CFA is an especially useful tool for any financial application where large amounts of cash are either coming in or going out the door, like banking applications, and especially for controlling all aspects of the processing of insurance claims.

E-discovery Challenges for Fraud Examiners

I returned from the beach last Friday to find a question in my in-box from one of our Chapter members relating to several E-discovery issues (electronically stored information) she’s currently encountering on one of her cases. The rules involving E-discovery are laid out in the US Federal Rules of Civil Procedure and affect not only parties to federal lawsuits but also any related business (like the client of our member). Many fraud professionals who don’t routinely work with matters involving the discovery of electronically stored information are surprised to learn just how complex the process can be; unfortunately, like our member’s client company, they sometimes have to learn the hard way, during the heat of litigation.

All parties to a Federal lawsuit have a legal responsibility, under the Rules of Civil Procedure and numerous State mirror statutes, to preserve relevant electronic information. What is often not understood by folks like our member’s client is that, when a party finds itself under a duty to preserve information because of pending or reasonably anticipated litigation, an adjustment in the normal pattern of its information systems processing is very often required and can be hard to implement. For example, under the impact of litigation, our member’s client needs to stop deleting certain e-mails and refrain from recycling system backup media as it has routinely done for years. The series of steps her client needs to take to stop the alteration or destruction of information relevant to the case is known as a ‘litigation hold’.

What our clients need to clearly understand regarding E-discovery is that the process is a serious matter and that, accordingly, courts can impose significant sanctions if a party to litigation does not take proper steps to preserve electronic information.  The good news is, however, that if a party is found to have performed due diligence and implemented reasonable procedures to preserve relevant electronic data, the Rules provide that sanctions will not be imposed due to the loss of information during the ‘normal routine’ and ‘good faith’ operations of automated systems; this protection provided by due diligence is called the ‘safe harbor’.

To ensure that our clients enjoy the protections afforded them through confirmation of due diligence, my recommendation is that both parties to the litigation meet to attempt to identify issues, avoid misunderstandings, expedite proper resolution of problems and reduce the overall litigation costs (which can quickly get out of hand) associated with E-discovery.  The plaintiff’s and defendant’s lawyers need some sort of venue where they can become thoroughly familiar with the information systems and electronic information of their own client and those of the opposing party.  Fraud examiners can be of invaluable assistance to both parties in achieving this objective since they typically know most about the details of the investigation which is often the occasion of the litigation.  Both sides need to obtain information about the electronic records in play prior to the initial discovery planning conference, perhaps at a special session, to determine:

–the information systems infrastructure of both parties to the litigation;
–location and sources of relevant digitized information;
–scope of the electronic information requirements of both litigants;
–time period during which the required information must be available;
–the accessibility of the information;
–information retrieval formats;
–costs and effort to retrieve the required information;
–preservation and chain of custody of discoverable information;
–assertions of privilege and protection of materials related to the litigation.

Technical difficulties and verbal misunderstandings can arise at any point in the E-discovery process. It often happens that one of the litigants may need to provide technical support so that digital information can even be used by the opposing party; this can mean that metadata (details about the electronic data) must be provided for the data to be understandable. This makes it a standard good practice for all parties to test a sample of the information requested to determine how usable it is as well as how burdensome it is for the requested party to retrieve and provide.

It just makes good sense to get the client’s information management professionals involved as soon as possible in the E-discovery process.  A business will have to disclose all digitally stored information that it plans to use to support its claims or defenses.  When faced with specific requests from the opposing side, your client will need to determine whether it can retrieve information in its original format that is usable by the opposition; a question that often only skilled information professionals can definitively answer.

Since fraud examination clients face E-discovery obligations not only for active Federal litigation but also for foreseeable litigation, businesses can be affected that merely receive a Federal subpoena seeking digital information.   Our questioner’s client received such a subpoena regarding an on-going fraud investigation and was not ready to effectively respond to it, leaving the company potentially vulnerable to fines and adverse judgments.

Data Warehouses, OLAP & the Fraud Examiner


Since two of the topics to be discussed in our April 2014 Introduction to Fraud Examination seminar in Richmond next week will be client data warehousing and online analytical processing (OLAP), I thought I’d write a short post briefly introducing both concepts as practice tools for fraud examiners.

A fraud examiner laying out an investigation might well ask, “What’s a data warehouse and how can it be of use to me in building the case at hand?”  The short answer is that a client’s data warehouse represents formatted, managed client data stored outside of the client’s operational information systems. The originating idea behind the warehouse concept is that data stored to answer all sorts of analytical questions about the business can be accessed most effectively by separating that data from the enterprise’s operational systems. What’s the point of separating operational data from analytical data? Most audit practitioners today can remember that not too long ago our clients archived inactive data onto tapes and ran whatever analytical reports they chose against those tapes, primarily to lessen the day-to-day performance impact on their important operational systems.

In today’s far more complex data handling environment, the reason for the separation is that there’s simply far more data of different types to be analyzed, all available at once and requiring processing at different frequencies and levels, for a seemingly ever-expanding roster of purposes. The last decade has demonstrated that the data warehouse concept operates most successfully for those organizations which can combine data from multiple business processes, such as marketing, sales and production systems, into an easily updated and maintained location (such as the cloud), accessible by all authorized user stakeholders, both internal and external to the organization. Source applications feed the warehouse incrementally and map the transfer trace, allowing fraud examiners and other control assurance professionals to perform transaction cross-referencing and data filtering; the fraud examiner profiling the data flows related to a fraud scenario can generate case-related queries for a given week, month, quarter or year and (this is most important) compare financial transaction data flows based on the ongoing historical status (old and updated) of the same and related applications.
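As a toy illustration of the cross-referencing just described, the sketch below builds a tiny in-memory "warehouse" with Python’s sqlite3 module and joins two business-process feeds; the table names, columns, and the refunds-equal-sales pattern are all invented for illustration:

```python
import sqlite3

# Toy in-memory "warehouse": two operational feeds landed as tables.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE sales   (txn_id TEXT, customer TEXT, amount REAL, period TEXT);
CREATE TABLE refunds (txn_id TEXT, customer TEXT, amount REAL, period TEXT);
INSERT INTO sales   VALUES ('S1','CUST-1',500,'2014-Q1'),('S2','CUST-2',300,'2014-Q1');
INSERT INTO refunds VALUES ('R1','CUST-1',500,'2014-Q1');
""")

# Cross-reference two business processes: customers whose refunds in a
# period equal or exceed their sales, a pattern worth a closer look.
rows = db.execute("""
    SELECT s.customer, SUM(s.amount) AS sold, SUM(r.amount) AS refunded
    FROM sales s JOIN refunds r ON r.customer = s.customer
                              AND r.period   = s.period
    GROUP BY s.customer
    HAVING refunded >= sold
""").fetchall()
```

In a real warehouse the same query runs across millions of rows and many periods, which is precisely the historical-status comparison the examiner needs.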

An important point for fraud examiners to be aware of is that, often, access to just low-end data analysis tools, such as simple query capabilities, may be all that’s required to assist in the construction of even relatively complex fraud cases. For examiners who choose to broaden their practice capabilities to handle even more complex investigations, access to powerful, multi-dimensional tools is now increasingly available. One of these tools is online analytical processing (OLAP), based on the concept of the relational database embodied by many, if not most, information systems database applications today.

Think of a cloud-based data warehouse overlaid by a complex, very large spreadsheet (the OLAP application) allowing the examiner to perform queries, searches, pivots, calculations and a vast array of other types of data manipulation over multiple dimensional pages. Imagine being able to flip a complex database on its side and examine all of the data from that different perspective, or being able to highlight an individual data element and then drill down to examine and trace the basic foundational data that went into creating that item of interest.

Today’s increasingly cloud based OLAP systems support multi-dimensional conceptual (what-if) views of the underlying data allowing for calculations and modeling to be applied across multiple dimensions, through hierarchies, and across database elements.  Amazingly, advanced tools are presently available to allow OLAP-based analysis across eight to ten different dimensions.  Of special interest to investigators is the OLAP’s ability to perform detailed analysis of trends and fluctuations in transactional data while laying bare the supporting information rolling up to the trend or fluctuation.
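The multi-dimensional pivoting described above can be mimicked in miniature with a plain dictionary roll-up; the fact table and its dimensions are invented for illustration:

```python
from collections import defaultdict

# Toy fact table: (region, product, quarter, amount).
facts = [
    ("East", "Cards", "Q1", 100.0),
    ("East", "Loans", "Q1", 250.0),
    ("West", "Cards", "Q1", 400.0),
    ("West", "Cards", "Q2", 150.0),
]

def pivot(rows, dim_a, dim_b):
    """Roll the fact table up along any two of its dimensions, the way an
    OLAP pivot lets you 'flip the database on its side'.
    Dimension indices: 0=region, 1=product, 2=quarter."""
    cube = defaultdict(float)
    for row in rows:
        cube[(row[dim_a], row[dim_b])] += row[3]
    return dict(cube)

by_region_product = pivot(facts, 0, 1)   # one view of the data
by_product_quarter = pivot(facts, 1, 2)  # same facts, a different perspective
```

Drill-down is the reverse operation: starting from a rolled-up cell, filter the underlying fact rows back out to trace the foundational data behind a trend or fluctuation.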

Fraud examiners and other assurance professionals need to be generally aware that there are various software products on the market today that can be used to perform OLAP functions. As we’ll cover in the seminar, client organizations usually implement OLAP in a multi-user client/server mode with the aim of offering authorized users rapid responses to queries regardless of the size and complexity of the internal or cloud-based underlying warehouse. OLAP can help fraud examiners and other users summarize targeted client information for investigation through comparative, personalized viewing as well as through analysis of historical and projected data in various what-if data model scenarios.