4.7 Article

High-Cardinality Categorical Attributes and Credit Card Fraud Detection

Journal

MATHEMATICS
Volume 10, Issue 20

Publisher

MDPI
DOI: 10.3390/math10203808

Keywords

credit card fraud; fraud-detection system; high-cardinality attribute; pattern recognition; clustering; deep learning

Funding

  1. Brazilian Aeronautics Institute of Technology (ITA)
  2. Casimiro Montenegro Filho Foundation (FCMF)
  3. 2RP Net Enterprise
  4. Brazilian Ministry of Education (Ministério da Educação, MEC)

This study reports the positive impacts of using high-cardinality attributes on credit card fraud detection. A new algorithm for domain reduction is proposed, which preserves the fraud-detection capabilities while reducing attribute cardinality and improving training times.
Credit card transactions may contain categorical attributes with large domains of up to hundreds of possible values, known as high-cardinality attributes. Including such attributes makes analysis harder, because results generalize more poorly and resource usage is higher. A common practice is therefore to remove such attributes, wasting the information they provide. In contrast, this paper reports our findings on the positive impacts of using high-cardinality attributes in credit card fraud detection. We also present a new algorithm for domain reduction that preserves the fraud-detection capabilities. Experiments applying a deep feedforward neural network to real datasets from a major Brazilian financial institution have shown that, when measured by the F1 metric, the inclusion of such attributes does improve fraud-detection quality. As a main contribution, the proposed algorithm was able to reduce attribute cardinality, improving the training times of a model while preserving its predictive capabilities.
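
The abstract does not describe how the proposed domain-reduction algorithm works, so the following is only a generic illustration of the idea of shrinking a categorical domain, not the authors' method. It is a minimal sketch that assumes a simple heuristic: values seen fewer than `min_count` times are merged into a single "rare" group, and the remaining values are grouped into `n_bins` buckets by their observed fraud rate. The function names, parameters, and toy data are all hypothetical.

```python
from collections import defaultdict


def reduce_domain(values, labels, n_bins=4, min_count=2):
    """Map each category value to a coarser group (illustrative heuristic only).

    values: iterable of category values (e.g. merchant identifiers)
    labels: iterable of 0/1 fraud labels aligned with values
    """
    counts = defaultdict(lambda: [0, 0])  # value -> [total, frauds]
    for value, label in zip(values, labels):
        counts[value][0] += 1
        counts[value][1] += label

    mapping = {}
    for value, (total, frauds) in counts.items():
        if total < min_count:
            # Too few observations to estimate a fraud rate.
            mapping[value] = "rare"
        else:
            rate = frauds / total
            # Clamp so that a rate of exactly 1.0 falls in the last bin.
            mapping[value] = f"bin_{min(int(rate * n_bins), n_bins - 1)}"
    return mapping


def apply_mapping(values, mapping, default="rare"):
    """Replace each value by its group; unseen values fall back to default."""
    return [mapping.get(value, default) for value in values]


# Toy data (hypothetical): six distinct merchants collapse into four groups.
merchants = ["m1", "m1", "m2", "m2", "m3", "m4", "m4", "m4", "m5", "m5", "m6"]
is_fraud = [0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1]

mapping = reduce_domain(merchants, is_fraud)
reduced = apply_mapping(merchants, mapping)
print(len(set(merchants)), "->", len(set(reduced)))
```

In practice the mapping would be fitted on training data only and then applied to validation and test data, so that label information from held-out transactions does not leak into the reduced attribute.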
