On the Simulation of Financial Transactions for Fraud Detection Research

Edgar Lopez

on 24 October 2016

Transcript of On the Simulation of Financial Transactions for Fraud Detection Research

Is it possible to do reliable fraud detection using
a Synthetic Data set?

A literature review led to an analysis of the use of synthetic data, and we concluded that if we can build a realistic simulator that generates such data, the fraud detection techniques we apply to it will be useful and applicable to the original data.
When using synthetic data for research there are several benefits and threats:

The data that represent realistic scenarios are readily available.
The privacy of the customer is not impacted.
The disclosure of results is not affected by policies or legal issues.
The data set is available for other researchers to reproduce experiments.
Different scenarios can be modeled with parameters controlled by the researcher.
Enough abnormal data can be injected to address the class imbalance problem.
Simulating abnormal behavior prevents the problem of mislabeled classes.
Much more data, and much more varied types of data can be produced at will, than can be collected in the field.
Money Laundering Detection using Synthetic Data
RetSim: A Shoe Store Agent-Based Simulation for Fraud Detection
Using the RetSim Simulator for Fraud Detection Research
Is the generated data set properly anonymized with respect to the original data set?
Since we do not keep any record of who purchases which items in the store, we can ensure that no real customers are exposed. There is, of course, some leakage from a business perspective, but the data owners consider the data old enough that it poses no risk to their business today, while it is still good enough for us to build our model from.
Is threshold detection sufficient to keep the losses from fraud at a manageable level?
Multi Agent Based Simulation (MABS) of Financial Transactions for
Anti Money Laundering (AML)
How could we generate a realistic synthetic data set for financial transactions for the purpose of fraud detection?
Edgar Alonso Lopez-Rojas - edgar.lopez@bth.se
On the Simulation of Financial Transactions for Fraud Detection Research
Fraud is an important problem in a number of different fields and domains.
In order to develop, test and compare techniques it is necessary to have access to data.
Some time ago we started to address the problem of developing novel methods for fraud detection, more specifically Anti-Money Laundering for Mobile Money Payments.
But the first issue we addressed was the lack of available data for our research in this field.
The work presented in this thesis is an effort to address the lack of publicly available financial data.
Our aim is to find suitable methods and techniques to simulate realistic financial scenarios and generate synthetic data sets.
If we cannot get access to public financial records due to the many restrictions imposed by their owners, then one possible alternative is to generate such data.
Money Laundering
Source: United Nations Office on Drugs and Crime (https://www.unodc.org/unodc/en/money-laundering/laundrycycle.html)
Two case studies that implement a Multi-Agent Based Simulation model to address the problem of simulating financial transactions for fraud detection research.
Our agent model, with its programmed micro-behavior, produces a similar type of overall interaction network to the one we observe in the original data, and furthermore, this interaction network gives rise to the same emerging macro-behavior as found in the real data set.
We can implement and study different malicious behaviors and test fraud detection techniques, from simple threshold control to more advanced ones such as machine learning.
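As a rough illustration of what "simple threshold control" could look like on simulated data, here is a minimal sketch. The `Transaction` fields, the function names and the threshold of 500 are assumptions made for this example; they are not taken from RetSim or PaySim.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    customer_id: int
    amount: float
    is_fraud: bool  # ground-truth label, known only because the data is simulated


def flag_by_threshold(transactions, threshold):
    """Return the transactions whose amount exceeds the threshold."""
    return [t for t in transactions if t.amount > threshold]


def fraud_loss_missed(transactions, threshold):
    """Sum the fraudulent amounts at or below the threshold, which go undetected."""
    return sum(t.amount for t in transactions if t.is_fraud and t.amount <= threshold)


# Hypothetical sample data, invented for this example.
sample = [
    Transaction(1, 40.0, False),
    Transaction(2, 950.0, True),
    Transaction(3, 120.0, True),
    Transaction(4, 1500.0, False),
]
flagged = flag_by_threshold(sample, threshold=500.0)
missed = fraud_loss_missed(sample, threshold=500.0)
```

Because the simulator labels every fraudulent transaction, both the false alarms (customer 4 here) and the fraud that slips under the threshold (customer 3) can be measured directly, which is rarely possible with field data.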
RETSIM: A Retail Store Simulator for Fraud Detection
PaySim: A Mobile Money Payment Simulator
PaySim is based on a company that developed a mobile money platform to transfer money between users using the phone as a sort of electronic wallet.
Our task was to develop an approach that detects suspicious activities that are indicative of money laundering.
This service was only running in a demo mode.
This prevented us from collecting any data that could be used to analyze possible detection methods.
We modeled and implemented a MABS that uses the schema of the real mobile money service and generates synthetic data following scenarios based on predictions of what could be possible when the real system starts operating.
Unfortunately other issues arise that are important to consider when using synthetic data. Some of these issues or disadvantages are:

The data generated might be neither representative nor realistic.
Data can be biased.
Data can be unconsciously designed to fit a specific model.
It is difficult to build a realistic model due to the complexity of the variables and parameters.
The simulated suspicious data cannot be investigated further. In a real scenario these results could be used for improving the accuracy of the existing classification algorithms.
It is unknown whether we can transfer the learning from a simulated data set to a real-world data set.
How could we generate a realistic synthetic data set for financial transactions for the purpose of fraud detection?

A case study based on a Mobile Money Payment system suggested an approach based on Multi-Agent Based Simulation (MABS).
How could we model and simulate a retail shoe store and obtain a
realistic synthetic data set for the purpose of fraud detection?
Simulators and
Agent Based Simulation
Simulation uses a model to infer conclusions about the behavior of real-world phenomena.
Computer simulation seeks to attain the same goal but requires the model to be implemented on a computer.
Simulations with the aid of computers became very popular because certain complex phenomena cannot be replicated or simulated using other techniques.
The amount of processing needed for complex simulations is making computer simulation a hot topic nowadays.
In this thesis we make use of an Agent-Based Simulation approach.
MABS makes use of the knowledge of the individual behavior of the components or agents.
By programming the micro-behavior of the agents, a macro-behavior emerges that can be observed in the system.
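The micro-to-macro idea can be sketched in a few lines. This is a toy model under stated assumptions, not the PaySim or RetSim implementation: the `Customer` class, the 0.5 transfer probability, the 10% cap and the starting balance of 1000 are all hypothetical choices made for illustration.

```python
import random


class Customer:
    """Hypothetical agent: at each step it may send money to another agent."""

    def __init__(self, agent_id, balance, rng):
        self.agent_id = agent_id
        self.balance = balance
        self.rng = rng

    def step(self, population, log, step_no):
        # Micro-behavior: with probability 0.5, transfer up to 10% of the balance.
        if self.balance <= 0 or self.rng.random() >= 0.5:
            return
        receiver = self.rng.choice([a for a in population if a is not self])
        amount = round(self.rng.uniform(0.0, 0.1 * self.balance), 2)
        self.balance -= amount
        receiver.balance += amount
        # Each log row is one synthetic transaction record.
        log.append((step_no, self.agent_id, receiver.agent_id, amount))


def simulate(n_agents=20, n_steps=50, seed=42):
    """Run the toy simulation and return the agents and the transaction log."""
    rng = random.Random(seed)  # a fixed seed makes the synthetic data reproducible
    population = [Customer(i, 1000.0, rng) for i in range(n_agents)]
    log = []
    for step_no in range(n_steps):
        for agent in population:
            agent.step(population, log, step_no)
    return population, log


population, log = simulate()
total = sum(a.balance for a in population)
```

Only the individual rule in `step` is programmed. The transaction log, the resulting network of who pays whom and the distribution of balances are macro-level outcomes that emerge from running it, and the log is the kind of synthetic data set the simulators are meant to produce.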
Money-laundering is the process that disguises illegal profits without compromising the criminals who wish to benefit from the proceeds.
We studied possible algorithms for our detection research. The algorithms analyzed here are based on supervised learning with Decision Tree learning and Decision Rules techniques.
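The transcript does not show which algorithms or tools were used, so the following is only a sketch of the simplest possible case: a one-level decision tree (a single decision rule on the transaction amount) learned from labeled examples. The function name `learn_stump` and the sample data are invented for this example.

```python
def learn_stump(amounts, labels):
    """Pick the amount threshold that misclassifies the fewest labeled examples.

    The learned rule is: predict fraud when amount > threshold.
    """
    values = sorted(set(amounts))
    # Candidate thresholds: below the minimum, midpoints between neighbours,
    # and the maximum itself (which predicts "not fraud" for everything).
    candidates = [values[0] - 1.0]
    candidates += [(a + b) / 2.0 for a, b in zip(values, values[1:])]
    candidates.append(values[-1])
    best_threshold, best_errors = None, None
    for threshold in candidates:
        errors = sum((x > threshold) != y for x, y in zip(amounts, labels))
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


# Hypothetical labeled data; in a simulation the labels come for free.
amounts = [20.0, 35.0, 50.0, 80.0, 400.0, 650.0, 900.0]
labels = [False, False, False, False, True, True, True]
threshold, errors = learn_stump(amounts, labels)
```

A full decision tree repeats this kind of split search recursively over several features. The difference from a hand-set threshold is that the cut point here is learned from the labeled data.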
What is next?
Our initial goal of addressing complex types of fraud such as money laundering is still present.
We aim to build three different kinds of simulators related to financial transactions.
PaySim needs real data to calibrate the parameters of the simulator and evaluate the performance.
RetSim will be used to investigate further fraud cases and to dig deeper into how effective threshold detection is in comparison with other methods.
BankSim is under development. It uses aggregated data from credit card payments that were made publicly available by a bank in Spain.
One of the biggest challenges of this development phase is to integrate all three simulators into one single Multi-Simulator that shares a common reference to customers and can keep track of the transactions of a single agent across all simulators.
Licentiate Seminar - 5th May 2014
Blekinge Institute of Technology
Department of Computer Science and Engineering