System for discovering hidden correlation relationships for risk analysis using graph-based machine learning

Publication Date:
March 07, 2023

Additional Information
- Patent Number:
  11,599,840
- Appl. No:
  16/285157
- Application Filed:
  February 25, 2019
- Abstract:
  A system, method, and computer readable device that detects hidden correlation relationships among entities, such as companies and/or individuals is presented. A dataset that corresponds to a predefined set of correlation relationships of these companies and/or individuals may be collected. The dataset may be stored in a graph database and a machine learning system may be built using features computed from the graph database. At least a new pair of companies or a new pair of an individual and a company may be evaluated. The system, method, and/or computer readable device may determine whether a hidden correlation relationship exists between them.
- Inventors:
  Graphen, Inc. (New York, NY, US)
- Assignees:
  Graphen, Inc. (New York, NY, US)
- Claim:
  1. A system, comprising: a memory that stores instructions; and a processor that executes the instructions to perform operations, the operations comprising: collecting data associated with known correlation relationships that exist among a plurality of entities; accessing the data associated with the known correlation relationships that exist among a plurality of entities, wherein the data comprises information associated with the known correlation relationships and an indication of a strength of the known correlation relationships; generating a graph based on the known correlation relationships and the strength of the known correlation relationships, wherein vertexes of the graph correspond to the plurality of entities and edges of the graph correspond to the known correlation relationships; computing, based on the graph, a set of features and corresponding labels for the plurality of entities, wherein the set of features and the corresponding labels are computed based on converting the graph to a reduced graph by removing an edge of the edges of the graph representing a type of relationship between a pair of vertexes of the vertexes that is sought to be revealed from being hidden, wherein at least one feature of the set of features is computed by utilizing a proximity measure indicating a measure of how close each vertex of the pair of vertexes are to each other, wherein the proximity measure indicates a number of paths between the pair of vertexes and a sum of a weighted path length; training a machine learning model using the computed set of features and the corresponding labels; and determining a hidden correlation relationship between a first entity of the plurality of entities and at least one additional entity of the plurality of entities by utilizing the machine learning model trained with the computed set of features, wherein the hidden correlation relationship is displayed to a user on a first user device to assist the user in making a decision regarding the first entity.
- Claim:
  2. The system of claim 1, wherein the graph comprises a simple graph, a multi-graph, or a combination thereof.
- Claim:
  3. The system of claim 1, wherein the vertexes of the graph comprise a company, an important person of the company, or a combination thereof.
- Claim:
  4. The system of claim 1, wherein the edges represent an investment relationship, a shareholder relationship, a funding relationship, a transactions relationship, a guarantee relationship, a collateral relationship, a trading relationship, a very important person (VIP) relationship, any type of relationship, any type of business relationship, or a combination thereof.
- Claim:
  5. The system of claim 1, wherein the features of the set of features include network topology features and business-related features.
- Claim:
  6. The system of claim 1, wherein the features of the set of features are based on neighborhood and comprise node-pair wise metrics comprising common neighbors, Jaccard's coefficient, Adar Index, Salton Index, Leicht Index, Sorensen Index, Hub Index, Resource Allocation Index, or a combination thereof.
- Claim:
  7. The system of claim 1, wherein features of the set of features comprise node-pair wise metrics based on paths including a shortest path distance, a Katz distance, a hitting time, a number of total paths, or a combination thereof.
- Claim:
  8. The system of claim 1, wherein the features of the set of features are computed from one or more of: a subgraph containing two nodes of interest, including the total vertexes and edges of the subgraph and the ratio of the number of edges over the nodes; circles that contain two vertexes of the vertexes and are determined by an overlap of the circles; or properties of the vertexes, wherein the properties comprise a money transaction amount, a number of guarantees provided for each other, a number of collaterals provided for each other, a total amount of the guarantees, a total amount of the collaterals, an overlap in names, emails, addresses, or a combination thereof.
- Claim:
  9. The system of claim 1, wherein the features of the set of features are computed for a single edge of the edges and multiple edges of the edges.
- Claim:
  10. The system of claim 1, wherein the operations further comprise: providing positive labels to node-pairs where two nodes have a certain correlation relationship of interest; and providing negative labels to node-pairs where the two nodes do not have the correlation relationship of interest.
- Claim:
  11. The system of claim 10, wherein the operations further comprise: generating a series of graphs for different time points; and computing additional features for the graph at each of the different time points.
- Claim:
  12. The system of claim 1, wherein the third-party data sources include information provided by government agencies, news media, social networks, third-party agencies, public announcements made by the plurality of entities, or a combination thereof.
- Claim:
  13. The system of claim 1, wherein the data includes loan histories, credit histories, financial information, trading and economic information, shareholder information, and transactional information.
- Claim:
  14. A method, comprising: accessing data associated with known correlation relationships that exist among a plurality of entities, wherein the data comprises information associated with the known correlation relationships and an indication of a strength of the known correlation relationships; generating a graph based on the known correlation relationships and the strength of the known correlation relationships, wherein vertexes of the graph correspond to the plurality of entities and edges of the graph correspond to the known correlation relationships; computing, based on the graph, a set of features and corresponding labels for the plurality of entities, wherein the set of features and the corresponding labels are computed based on converting the graph to a reduced graph by removing an edge of the edges of the graph representing a type of relationship between a pair of vertexes of the vertexes that is sought to be revealed from being hidden, wherein at least one feature of the set of features is computed by utilizing a proximity measure indicating a measure of how close each vertex of the pair of vertexes are to each other, wherein the proximity measure indicates a number of paths between the pair of vertexes and a sum of a weighted path length; training a machine learning model using the computed set of features and the corresponding labels; determining, by utilizing instructions from a memory that are executed by a processor, a hidden correlation relationship between a first entity of the plurality of entities and at least one additional entity of the plurality of entities by utilizing the machine learning model trained with the computed set of features; and displaying, on a user device, the hidden relationship to a user using the user device to assist the user in making a decision regarding the first entity.
- Claim:
  15. The method of claim 14, further comprising training the machine learning model by utilizing support vector machines, deep neural networks, gradient boosting, decision trees, random forests, logistic regression, or a combination thereof.
- Claim:
  16. The method of claim 14, further comprising training the machine learning model in a supervised, semi-supervised, or unsupervised manner.
- Claim:
  17. The method of claim 14, further comprising determining an additional hidden correlation relationship among two or more entities of the plurality of entities for anti-money laundering, anti-terrorist, or other law enforcement investigations.
- Claim:
  18. The method of claim 14, further comprising: providing positive labels to node-pairs where two nodes have a certain correlation relationship of interest; and providing negative labels to node-pairs where the two nodes do not have the correlation relationship of interest.
- Claim:
  19. The method of claim 14, further comprising: generating a series of graphs for different time points; and computing additional features for the graph at each of the different time points.
- Claim:
  20. A non-transitory computer-readable device comprising instructions, which when loaded and executed by a processor, cause the processor to perform operations comprising: receiving data associated with known correlation relationships that exist among a plurality of entities, wherein the data comprises information associated with the known correlation relationships and an indication of a strength of the known correlation relationships; generating a graph based on the known correlation relationships and the strength of the known correlation relationships, wherein vertexes of the graph correspond to the plurality of entities and edges of the graph correspond to the known correlation relationships; computing, based on the graph, a set of features and corresponding labels for the plurality of entities, wherein the set of features and the corresponding labels are computed based on converting the graph to a reduced graph by removing an edge of the edges of the graph representing a type of relationship between a pair of vertexes of the vertexes that is sought to be revealed from being hidden, wherein at least one feature of the set of features is computed by utilizing a proximity measure indicating a measure of how close each vertex of the pair of vertexes are to each other, wherein the proximity measure indicates a number of paths between the pair of vertexes and a sum of a weighted path length; training a machine learning model using the computed set of features and the corresponding labels; determining a hidden correlation relationship between a first entity of the plurality of entities and at least one additional entity of the plurality of entities by utilizing the machine learning model trained with the computed set of features; and displaying, on a user device, the hidden correlation relationship to a user using the user device to assist the user in making a decision regarding the first entity.
- Patent References Cited:
  7512612 March 2009 Akella et al.
  10127511 November 2018 Epstein
  10210470 February 2019 Datta Ray
  20050222929 October 2005 Steier
  20060167784 July 2006 Hoffberg
  20070087756 April 2007 Hoffberg
  20100317420 December 2010 Hoffberg
  20110208681 August 2011 Kuecuekyan
  20150310195 October 2015 Bailor et al.
  20160078356 March 2016 Dang et al.
  20170220964 August 2017 Datta Ray
  20180197128 July 2018 Carstens
  20200226512 July 2020 Epstein
- Other References:
  Hamilton, William L., Rex Ying, and Jure Leskovec. “Representation learning on graphs: Methods and applications.” arXiv preprint arXiv:1709.05584 (2017). (Year: 2017). cited by examiner
  Feng, Nan, Harry Jiannan Wang, and Minqiang Li. “A security risk analysis model for information systems: Causal relationships of risk factors and vulnerability propagation analysis.” Information sciences 256 (2014): 57-73. (Year: 2014). cited by examiner
  Galindo, Jorge, and Pablo Tamayo. “Credit risk assessment using statistical and machine learning: basic methodology and risk modeling applications.” Computational Economics 15.1 (2000): 107-143. (Year: 2000). cited by examiner
  Kose, Ilker, Mehmet Gokturk, and Kemal Kilic. “An interactive machine-learning-based electronic fraud and abuse detection system in healthcare insurance.” Applied Soft Computing 36 (2015): 283-299. (Year: 2015). cited by examiner
  Phua, Clifton, et al. “A comprehensive survey of data mining-based fraud detection research.” arXiv preprint arXiv:1009.6119 (2010). (Year: 2010). cited by examiner
  Prado, Adriana, et al. “Mining graph topological patterns: Finding covariations among vertex descriptors.” IEEE Transactions on Knowledge and Data Engineering 25.9 (2012): 2090-2104. (Year: 2012). cited by examiner
  Ren, Xuguang, and Junhu Wang. “Exploiting vertex relationships in speeding up subgraph isomorphism over large graphs.” Proceedings of the VLDB Endowment 8.5 (2015): 617-628. (Year: 2015). cited by examiner
  Budur et al., “Structural analysis of criminal network and predicting hidden links using machine learning,” In: arXiv preprint arXiv, Sep. 21, 2015, https://arxiv.org/pd1/1507.05739.pdf. cited by applicant
  Leskovec et al., “Learning to discover social circles in ego networks,” In: Advances in neural information processing systems, 2012, https://cs.stanford.edu/people/jure/pubs/circles-nips12.pdf. cited by applicant
  World Intellectual Property Organization, “International Search Report and Written Opinion,” issued in PCT/US2019/019466, dated May 13, 2019. cited by applicant
  Liben-Nowell et al., “The link-prediction problem for social networks,” J Am. Soc. Inf. Sci., 58: 1019-1031, May 2007. cited by applicant
  Wilson et al., “Graph-based Proximity Measure”, Book Chapter 6 from Practical Graph Mining with R, CRC Press, 2013. cited by applicant
- Primary Examiner:
  Singh, Gurkanwaljit
- Attorney, Agent or Firm:
  Akerman LLP
  Harding, Ryan L.
- Accession Number:
  edspgr.11599840
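
The feature-computation step recited in claims 1, 14, and 20 (build a weighted graph, remove the edge under test to get a reduced graph, then measure how close the two vertexes remain) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The function names (build_graph, remove_edge, simple_paths, pair_features, label), the max_hops cutoff, the sample entities, and the choice to treat each edge's length as 1/strength are all hypothetical; the claims state only that the proximity measure indicates a number of paths and a sum of a weighted path length. Common neighbors and Jaccard's coefficient are included as two of the neighborhood metrics named in claim 6.

```python
from collections import defaultdict


def build_graph(relationships):
    """Build an undirected weighted graph from (entity_a, entity_b, strength) triples."""
    graph = defaultdict(dict)
    for a, b, strength in relationships:
        graph[a][b] = strength
        graph[b][a] = strength
    return graph


def remove_edge(graph, u, v):
    """Return a copy of the graph without the u-v edge (the 'reduced graph')."""
    reduced = {node: dict(nbrs) for node, nbrs in graph.items()}
    reduced.get(u, {}).pop(v, None)
    reduced.get(v, {}).pop(u, None)
    return reduced


def simple_paths(graph, source, target, max_hops):
    """Yield simple paths from source to target that use at most max_hops edges."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        if len(path) - 1 >= max_hops:
            continue
        for nbr in graph.get(node, {}):
            if nbr not in path:
                stack.append((nbr, path + [nbr]))


def pair_features(graph, u, v, max_hops=3):
    """Compute node-pair features on the reduced graph (u-v edge hidden)."""
    reduced = remove_edge(graph, u, v)
    nbrs_u, nbrs_v = set(reduced.get(u, {})), set(reduced.get(v, {}))
    common, union = nbrs_u & nbrs_v, nbrs_u | nbrs_v
    paths = list(simple_paths(reduced, u, v, max_hops))
    # Assumption: edge length = 1 / strength, so stronger ties count as shorter.
    weighted_sum = sum(
        sum(1.0 / reduced[a][b] for a, b in zip(p, p[1:])) for p in paths
    )
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        "num_paths": len(paths),
        "weighted_path_length_sum": weighted_sum,
    }


def label(graph, u, v):
    """Positive label (1) if the relationship of interest exists, else negative (0)."""
    return 1 if v in graph.get(u, {}) else 0


# Hypothetical sample data: four entities and relationship strengths.
sample = build_graph([
    ("A", "B", 2.0),
    ("B", "C", 1.0),
    ("A", "C", 4.0),
    ("C", "D", 1.0),
])
features_ac = pair_features(sample, "A", "C")
features_ad = pair_features(sample, "A", "D")
```

Feature dictionaries such as features_ac, paired with label(sample, "A", "C"), would form one training row for any of the model families listed in claim 15. Hiding the edge before computing features keeps the label from leaking into the inputs.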
