An Efficient Classification Model for Analyzing Skewed Data to Detect Frauds in the Financial Sector ; Un modèle de classification efficace pour l'analyse des données déséquilibrées pour détecter les fraudes dans le secteur financier

Author(s): Makki, Sara
Source:
https://theses.hal.science/tel-02457134 ; Data Structures and Algorithms [cs.DS]. Université de Lyon; Université Libanaise, 2019. English. ⟨NNT : 2019LYSE1339⟩.
Subject Terms:
Financial fraud; Class imbalance; F1 score; Cost Sensitive Classification; Cosine similarity; K-Nearest Neighbors; Apprentissage ensembliste; K-modes; Fraude financière; Déséquilibre de classe; Score F1; Classification sensible aux coûts; Mesure de cosinus; K plus proche voisins; [INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]
Document Type:
doctoral or postdoctoral thesis
Language:
English

Additional Information
- Contributors:
  Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS); Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL); Université de Lyon-Université de Lyon-Université Claude Bernard Lyon 1 (UCBL); Université de Lyon-Institut National des Sciences Appliquées de Lyon (INSA Lyon); Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS); Base de Données (BD); Université de Lyon-Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Lumière - Lyon 2 (UL2)-École Centrale de Lyon (ECL); Université de Lyon; Université Libanaise; Mohand Saïd Hacid; Hassan Zeineddine
- Publication Information:
  CCSD
- Publication Date:
  2019
- Collection:
  HAL Lyon 1 (University Claude Bernard Lyon 1)
- Abstract:
  The financial domain faces several types of risk, such as terrorist financing, money laundering, credit card fraud, and insurance fraud, that may have catastrophic consequences for entities such as banks or insurance companies. These financial risks are usually detected using classification algorithms. In classification problems, the skewed distribution of classes, also known as class imbalance, is a very common challenge in financial fraud detection, where special data mining approaches are used along with traditional classification algorithms to tackle the issue. The class imbalance problem occurs when one class has far more instances than another. The problem becomes more acute in a big data context. The datasets used to build and train the models contain an extremely small proportion of the minority group, also known as positives, compared with the majority class, known as negatives. In most cases, such as fraud detection or disease diagnosis, it is more delicate and crucial to correctly classify the minority group than the other group. In these examples, the fraud and the disease are the minority groups, and because of its dangerous consequences it is more important to detect a fraud record than a normal one. These class proportions make it very difficult for a machine learning classifier to learn the characteristics and patterns of the minority group. Such classifiers are biased towards the majority group because of its many examples in the dataset, and they learn to classify it much faster than the other group. After conducting a thorough study of the challenges faced in class imbalance cases, we found that we still cannot reach an acceptable sensitivity (i.e. good classification of the minority group) without a significant decrease in accuracy. This leads to another challenge, which is the choice of performance measures used to evaluate models. In these cases, this choice is not straightforward: the accuracy or sensitivity ...
- Relation:
  NNT: 2019LYSE1339
- Online Access:
  https://theses.hal.science/tel-02457134
  https://theses.hal.science/tel-02457134v1/document
  https://theses.hal.science/tel-02457134v1/file/TH2019MAKKISARA.pdf
- Rights:
  info:eu-repo/semantics/OpenAccess
- Accession Number:
  edsbas.326948A9
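
The abstract's remark that accuracy and sensitivity can pull in different directions on skewed data can be illustrated with a small sketch. It is not taken from the thesis: the dataset size (1,000 records, 10 of them fraudulent), the two hypothetical classifiers, and the helper names `confusion_counts` and `metrics` are all assumptions chosen for illustration. F1 score is used because it appears among the subject terms.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives (1 = fraud, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Return accuracy, sensitivity (recall on the fraud class) and F1."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity, "f1": f1}


# Hypothetical skewed dataset: 10 frauds followed by 990 normal records.
y_true = [1] * 10 + [0] * 990

# Classifier A (assumed): labels every record as normal.
always_negative = [0] * 1000

# Classifier B (assumed): catches 8 of the 10 frauds but raises 30 false alarms.
detector = [1] * 8 + [0] * 2 + [1] * 30 + [0] * 960

for name, y_pred in [("always negative", always_negative), ("detector", detector)]:
    print(name, metrics(y_true, y_pred))
```

In this toy setting the classifier that never flags fraud has the higher accuracy yet zero sensitivity, while the detector trades a little accuracy for most of the frauds. This is the kind of tension that makes the choice of evaluation measure non-trivial.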
