跳转至

Vertica 集成 FICO

概述

FICO Blaze Advisor 决策规则管理系统允许企业快速将业务规则和分析洞察转化为可操作决策。当决策规则需要处理大量数据时,将决策引擎靠近数据源可获得更好的性能。将 Blaze Advisor 与 Vertica 集成,可以在数据源处执行分析,充分利用 Vertica 的 MPP 能力。

Vertica 提供创建 Java 库的方式,可作为 JAR 文件部署并关联函数。Vertica 函数是使用 Vertica SDK 实现的 Java 类。可以配置 Vertica 函数调用存储在 Vertica 中的 Blaze Advisor 决策规则应用程序,减少数据检索和决策之间的延迟。

集成示例

本文示例计算银行在客户违约时对给定贷款的敞口 (Exposure at Default, EAD)。

工作流程

  1. 使用 Blaze Advisor Quick Deployer 将决策规则模型转换为 Java 独立部署。
  2. 创建 Eclipse 项目实现 Vertica 函数,嵌入 FICO 模型及相关文件。
  3. 创建可在 Vertica 集群上部署的集成 JAR 文件。
  4. 在 Vertica 中创建库和函数,将构件部署到整个集群。
  5. 通过 SELECT 语句调用 Vertica 函数,在 Vertica 集群中的数据上执行决策规则计算。

截图

违约敞口 (EAD)

示例使用贷款数据确定银行每笔贷款的敞口程度:

  • Account number:账户唯一标识
  • Recent index valuation:购买的最新指数估值
  • Purchase price:购买价格
  • Balance outstanding:贷款未偿还余额

FICO 决策规则模型计算 Outst_Bal_Perc_Rec_Idx_Val(未偿还余额占指数估值的百分比),该值越低,贷款违约概率越小。

两种评分计算方式

  1. 将所有数据从 Vertica 读取出来,传入 FICO 引擎计算评分。
  2. 推荐方式:将 FICO 模型部署到 Vertica 上,在 Vertica 内计算评分。无需将数据移出 Vertica,对大量数据具有更好的扩展性。

创建 Vertica+Blaze 集成 JAR 文件

所需文件

  • Vertica 函数 Java 类
  • Blaze Advisor 调用 Java 类
  • POJO (Plain Old Java Object) 部署
  • 服务器配置文件 (.server)
  • .adb 文件(编译后的 Blaze Advisor 项目)

截图

编写 Vertica 函数

Vertica 标量函数,执行时从 Vertica 数据库获取数据,调用 Blaze SRL 函数:

public class score extends ScalarFunction {
    public void processBlock(ServerInterface srvInterface, BlockReader argReader, BlockWriter resWriter)
        throws UdfException, DestroyInvocation {
        do {
            double a = argReader.getDouble(0);
            double b = argReader.getDouble(1);
            double c = argReader.getDouble(2);
            double result = -9999.0D;
            EADModelInExternal req = new EADModelInExternal();
            try {
                req.setOutst_Bal_Perc_Rec_Idx_Val(a);
                req.setPurchase_Price(b);
                req.setRecent_Indexed_Valuation(c);
                result = new VerticaEadPdModelExample().invokeBlazeFunction(
                    req, "ModelExecutionService",
                    "ModelExecutionService_ser.server", "getEADScore_1");
            } catch (Exception e) {
                throw new UdfException(0, e);
            }
            resWriter.setDouble(result);
            resWriter.next();
        } while (argReader.next());
    }
}

POJO 定义

public class EADModelInExternal {
    private double Outst_Bal_Perc_Rec_Idx_Val;
    private double Purchase_Price;
    private double Recent_Indexed_Valuation;
    private double score;
    // getters and setters
}

创建 JAR 文件

下载包中包含 InstallJars.bat 脚本,可创建集成 JAR 文件 VerticaLoanApplication-trunk.jar。Eclipse 项目针对 Java 1.7 和 Vertica SDK v7.2.2-1 编译。

部署 JAR 文件和创建 Vertica 函数

部署 JAR 文件作为 Java 库

=> CREATE LIBRARY ficoEADPD AS
'/home/dbadmin/Builds/scripts/jar/VerticaBlazeApplication-1.0.jar'
LANGUAGE 'Java';

创建函数

=> CREATE FUNCTION scoreEAD AS LANGUAGE 'Java'
NAME 'com.blaze.vertica.EADModelFactory' LIBRARY ficoEADPD;

测试函数

=> SELECT account_number, purchase_price, Recent_Indexed_Valuation,
       Outst_Bal_Perc_Rec_Idx_Val,
       scoreEAD(Outst_Bal_Perc_Rec_Idx_Val, Purchase_Price, Recent_Indexed_Valuation) AS EAD_ESTIMATE
FROM EAD_DATASET;

  account_number | purchase_price | Recent_Indexed_Valuation | Outst_Bal_Perc_Rec_Idx_Val |   EAD_ESTIMATE
 ----------------+----------------+--------------------------+----------------------------+-------------------
               1 |         300000 |                   334912 |          4.961693774000000 | -37645.2852262864
               2 |         140000 |                   154571 |         22.924013590000000 | -78958.9842684706
               3 |         165125 |                   161023 |         17.864888020000000 | -65349.9408090219
               4 |         325000 |                   345408 |         31.875252620000000 | 20110.4389424053
               5 |         350000 |                   376018 |         14.990185220000000 | -8507.77831017182
               6 |         260000 |                   257841 |         62.017901640000000 |   74906.823918807

示例脚本

完整的示例数据库脚本 (dbscript.sh) 执行以下任务:

1. 创建库和函数

DROP LIBRARY IF EXISTS ficoEADPD CASCADE;
CREATE LIBRARY ficoEADPD AS '/home/dbadmin/Builds/scripts/jar/VerticaBlazeApplication-1.0.jar' LANGUAGE 'Java';
CREATE FUNCTION scoreEAD AS LANGUAGE 'Java' NAME 'com.blaze.vertica.EADModelFactory' LIBRARY ficoEADPD;

2. 创建表

=> DROP TABLE IF EXISTS scoringRunMetrics;
DROP TABLE
 
=> DROP TABLE IF EXISTS scoringInput;
DROP TABLE
 
=> DROP TABLE IF EXISTS scoringResults;
DROP TABLE
 
=> DROP TABLE IF EXISTS EAD_DATASET;
DROP TABLE
 
=> DROP TABLE IF EXISTS FICO_EAD_RESULTS;
DROP TABLE
 
=> CREATE TABLE EAD_DATASET (
    RDAR_ID auto_increment,
    Account_Number int,
    Recent_Indexed_Valuation int,
    Balance_Outstanding int,
    Purchase_Price int,
    Purchase_Valuation_Date int,
    Outst_Bal_Perc_Rec_Idx_Val numeric(31,15),
    Property_Type varchar(50),
    Region varchar(50),
    Current_Months_In_Arrears int,
    PD numeric(31,15),
    PD_LRA numeric(31,15),
    EAD numeric(31,15),
    EAD_LRA numeric(31,15),
    LGD numeric(31,15),
    LGD_LRA numeric(31,15),
    EL numeric(31,15),
    EL_LRA numeric(31,15),
    RWA_LRA numeric(31,15),
    RW_LRA numeric(31,15),
    PD_LRA_Floored numeric(31,15),
    LGD_LRA_Floored numeric(31,15),
    EL_Floored numeric(31,15),
    RWA_Floored numeric(31,15),
    RW_Floored numeric(31,15),
    setID int,
    EAD_estimate int,
    RDAR_JOB_ID int
);
CREATE TABLE
 
=> CREATE TABLE FICO_EAD_RESULTS (
    RDAR_JOB_ID int,
    RDAR_ID int,
    EAD_ESTIMATE numeric(31,16),
    SCORE_TS VARCHAR(44) NOT NULL DEFAULT TIMEOFDAY()--TIMESTAMP 'NOW' --CURRENT_TIMESTAMP
);
CREATE TABLE
 
=> CREATE TABLE scoringRunMetrics (
        id AUTO_INCREMENT
        , runType VARCHAR(20)
        , beginDT DATETIME
        , endDt DATETIME
        , totalRunTime INTERVAL
        , numberOfRecords INT
);
CREATE TABLE
 
=> CREATE TABLE scoringInput (
        id AUTO_INCREMENT
        , percentMaxBalance NUMERIC
        , percentDelinquent NUMERIC
        , expectedScore NUMERIC
);
CREATE TABLE
 
=> CREATE TABLE scoringResults (
        id AUTO_INCREMENT
        , runKey VARCHAR(40)
        , sourceRowID INT
        , percentMaxBalance FLOAT
        , percentDelinquent FLOAT
        , expectedScore NUMERIC
        , actualScore FLOAT
        , score_ts VARCHAR(44) NOT NULL DEFAULT TIMEOFDAY()--TIMESTAMP 'NOW' --CURRENT_TIMESTAMP
);
CREATE TABLE

3. 加载测试数据

=> COPY EAD_DATASET (
        Account_Number,
        Recent_Indexed_Valuation,
        Balance_Outstanding,
        Purchase_Price,
        Purchase_Valuation_Date,
        Outst_Bal_Perc_Rec_Idx_Val,
        Property_Type,
        Region,
        Current_Months_In_Arrears,
        PD,
        PD_LRA,
        EAD,
        EAD_LRA,
        LGD,
        LGD_LRA,
        EL,
        EL_LRA,
        RWA_LRA,
        RW_LRA,
        PD_LRA_Floored,
        LGD_LRA_Floored,
        EL_Floored,
        RWA_Floored,
        RW_Floored,
        setID,
        EAD_estimate
) FROM LOCAL '/home/dbadmin/Builds/scripts/data/EAD_dataset.csv'
        WITH DELIMITER AS ',' SKIP 1;-- skips header row
 Rows Loaded
-------------
       10000
(1 row)

4. 执行评分

\set scoreRunInputRecordCount ''''`vsql -U dbadmin -w fic0@123 -t -c 'SELECT COUNT(1) FROM EAD_DATASET;'`''''
--\echo records to score :scoreRunInputRecordCount
 
-- get number of existing records in results table.
\set scoreRunBeginRecordCount ''''`vsql -U dbadmin -w fic0@123 -t -c 'SELECT COUNT(1) FROM FICO_EAD_RESULTS;'`''''
--\echo get count of existing records in the results table
 
-- get BEGIN time of day.
\set scoreRunBeginTime ''''`vsql -U dbadmin -w fic0@123 -t -c 'SELECT TIMEOFDAY();'`''''
SELECT :scoreRunBeginTime AS beginTime;
               beginTime
---------------------------------------
  Fri Jun 03 15:24:18.387493 2016 EDT
(1 row)
 
-- execute EAD scoring SQL
=> INSERT INTO FICO_EAD_RESULTS (RDAR_ID, RDAR_JOB_ID, EAD_ESTIMATE) SELECT RDAR_ID, RDAR_JOB_ID, scoreEAD (Outst_Bal_Perc_Rec_Idx_Val, Purchase_Price, Recent_Indexed_Valuation) AS EAD_ESTIMATE FROM EAD_DATASET;
 OUTPUT
--------
  10000
(1 row)
 
commit;
COMMIT
 
-- get END time of day.
\set scoreRunEndTime ''''`vsql -U dbadmin -w fic0@123 -t -c 'SELECT TIMEOFDAY();'`''''
=> SELECT :scoreRunEndTime AS endTime;
                endTime
---------------------------------------
  Fri Jun 03 15:24:25.867694 2016 EDT
(1 row)

性能

在示例中,Vertica 处理 10000 条记录约需 7 秒

\set scoreRunEndRecordCount ''''`vsql -U dbadmin -w fic0@123 -t -c 'SELECT COUNT(1) FROM FICO_EAD_RESULTS;'`''''
 
=> INSERT INTO scoringRunMetrics (runType, beginDT, endDt, totalRunTime, numberOfRecords) VALUES ('EAD', :scoreRunBeginTime::DATETIME, :scoreRunEndTime::DATETIME, (:scoreRunEndTime::DATETIME - :scoreRunBeginTime::DATETIME), ((:scoreRunEndRecordCount::INT) - (:scoreRunBeginRecordCount::INT)));
 OUTPUT
--------
      1
(1 row)
 
COMMIT;
COMMIT
... running results on EAD using /home/dbadmin/Builds/scripts/sql-scripts/run-results-EAD.sql
.......................
=> SELECT beginDT, totalRunTime, numberOfRecords FROM scoringRunMetrics WHERE runType='EAD' ORDER BY beginDT DESC;
           beginDT           |  totalRunTime   | numberOfRecords
-----------------------------+-----------------+-----------------
  2016-06-03 15:24:18.387493 | 00:00:07.480201 |           10000
(1 row)
 
=> SELECT numberOfRecords, AVG(totalRunTime) FROM scoringRunMetrics WHERE runType='EAD' GROUP BY numberOfRecords ORDER BY numberOfRecords DESC;
 numberOfRecords |       AVG
-----------------+-----------------
           10000 | 00:00:07.480201
(1 row)

参考


原文来源:https://www.vertica.com/kb/Integrating-FICO-Blaze-Advisor-with-HPE-Vertica/Content/Partner/Integrating-FICO-Blaze-Advisor-with-HPE-Vertica.htm