WePay: An Automated Fraud Detection System Based on Machine Learning
WePay is a third-party payment platform: https://go.wepay.com/about-wepay
WePay on Wikipedia: https://en.wikipedia.org/wiki/WePay
WePay uses machine learning for fraud detection to reduce financial losses.
you have to be able to spot fraud with a high degree of accuracy so that you can shut it down before it results in a loss.
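Combining human expertise with machine learning makes automation possible, which reduces labor costs and improves performance and efficiency.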
At WePay, it increasingly also means machine learning models which can spot complicated fraud patterns faster with less human intervention.
Current challenges of machine-learning-based fraud detection
(1) Fraud is not static
Whenever defenses improve, fraudsters adapt: fraud is constantly changing.
Machine learning models are great for spotting fraud, but they aren’t psychic — they rely on past data to make predictions about the transactions they’re currently looking at. Since the patterns aren’t constant, that means they go out of date quickly. As a result, model performance degrades fast.
In WePay's experience, beyond the first month a model's accuracy may drop by 50%, and it will continue to slowly decrease after that.
(2) Updating models is difficult
Retraining a model by running the full machine learning pipeline can take hours. This includes extraction and transformation (ETL) of incremental new data, feature creation and engineering, model training, performance evaluation, and model deployment.
To reduce this complexity, some companies fall back on simpler models such as logistic regression, but that only treats the symptom, not the cause. A further complication: the newest data might not be the most useful for model training purposes because new fraud can take time to mature — it can often take two or more months for a cardholder to see and report fraud. This means new data can be labeled good before it’s seen as bad, and training models with the latest data can actually hurt model accuracy.
Automating fraud detection at WePay
WePay's automated approach:
+ Pull new, incremental retraining data daily (incremental computation)
+ Refresh the model by running it again with combined new and existing fraud data
+ Test the new models, evaluating each on Area Under Curve (AUC), precision and recall
+ Transfer models that meet initial test criteria into a pseudo-production environment for additional assessment against test cases
+ Deploy upon satisfactory completion of all performance and test case validation
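The refresh-and-evaluate steps in the list above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not WePay's pipeline: the feature names, the 0.5 decision threshold, the promotion thresholds in `THRESHOLDS`, the synthetic data helper, and the choice of a scaled logistic regression as the model are all hypothetical stand-ins.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical schema and promotion thresholds (not taken from the WePay post).
FEATURES = ["amount", "account_age_days"]
LABEL = "is_fraud"
THRESHOLDS = {"auc": 0.80, "precision": 0.50, "recall": 0.50}


def refresh_model(existing: pd.DataFrame, new_batch: pd.DataFrame, seed: int = 0):
    """Retrain on existing + newly pulled data; return the model and hold-out metrics."""
    data = pd.concat([existing, new_batch], ignore_index=True)
    X_train, X_test, y_train, y_test = train_test_split(
        data[FEATURES], data[LABEL],
        test_size=0.3, random_state=seed, stratify=data[LABEL],
    )
    # Stand-in model: scaled logistic regression (the post does not name WePay's model type).
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]  # predicted fraud probability
    preds = (scores >= 0.5).astype(int)          # hypothetical decision threshold
    metrics = {
        "auc": roc_auc_score(y_test, scores),
        "precision": precision_score(y_test, preds, zero_division=0),
        "recall": recall_score(y_test, preds, zero_division=0),
    }
    return model, metrics


def passes_initial_tests(metrics: dict) -> bool:
    """Initial gate: only models clearing every threshold move on to pseudo-production."""
    return all(metrics[name] >= floor for name, floor in THRESHOLDS.items())


def make_synthetic(n: int, seed: int) -> pd.DataFrame:
    """Hypothetical toy transactions: fraud has larger amounts on younger accounts."""
    rng = np.random.default_rng(seed)
    is_fraud = rng.random(n) < 0.2
    amount = np.where(is_fraud, rng.normal(500, 50, n), rng.normal(100, 50, n))
    age = np.where(is_fraud, rng.normal(5, 3, n), rng.normal(300, 50, n))
    return pd.DataFrame({"amount": amount, "account_age_days": age, LABEL: is_fraud.astype(int)})


if __name__ == "__main__":
    existing = make_synthetic(1000, seed=1)   # previously accumulated training data
    new_batch = make_synthetic(200, seed=2)   # stands in for today's incremental pull
    model, metrics = refresh_model(existing, new_batch)
    print(metrics, "promote" if passes_initial_tests(metrics) else "reject")
```

Keeping the gate as a separate function means the pseudo-production test cases and the final deployment decision can be added as further checks without touching the training code.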
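Automating machine learning with Python
WePay uses Python both for model prototyping and in production.
Web services are built in Python (Flask, Django).
Machine learning models are built with scikit-learn, pandas, and NumPy, which is fast, convenient, and concise.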
Just copy the model files to production instance and import the same libraries in production as in development, and you are almost good to go!
Because everything is developed in Python, models move from development to deployment with full compatibility.
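The "copy the model files to production" step can be sketched with `joblib`, which is installed alongside scikit-learn. This is an assumption about the mechanism (the post does not say which serialization format WePay uses), and `export_model` / `load_for_serving` are hypothetical helper names.

```python
from pathlib import Path

import joblib
from sklearn.base import BaseEstimator


def export_model(model: BaseEstimator, path: Path) -> None:
    """Development side: serialize the fitted estimator to a model file."""
    joblib.dump(model, path)


def load_for_serving(path: Path) -> BaseEstimator:
    """Production side: with the same Python and library versions, the file loads as-is."""
    return joblib.load(path)
```

For example, a model fitted in development is written with `export_model(model, Path("fraud_model.joblib"))`, the file is copied to the production instance, and the web service calls `load_for_serving` once at startup. Pickle-based files are only reliably loadable with the same scikit-learn version that wrote them, which is why importing the same libraries in both environments matters.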
Putting it all together
Models are updated daily. When we’re training our models, we simply exclude transactions flagged as good in the most recent time period while including every transaction flagged as fraud that we can. This lets us train on data that includes the most recent fraud patterns while also not contaminating our model with bad data.
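The training-set rule above (keep every fraud label, drop recent "good" labels) can be sketched in pandas. The column names `created_at` and `is_fraud` are hypothetical, and the 60-day maturity window is an assumed value loosely based on the "two or more months" reporting delay mentioned earlier; the post does not give WePay's actual cutoff.

```python
import pandas as pd


def build_training_set(txns: pd.DataFrame, as_of: pd.Timestamp,
                       maturity_days: int = 60) -> pd.DataFrame:
    """Keep all fraud rows; keep 'good' rows only once their labels have had time to mature."""
    cutoff = as_of - pd.Timedelta(days=maturity_days)
    is_fraud = txns["is_fraud"] == 1          # fraud labels are trusted immediately
    is_mature = txns["created_at"] <= cutoff  # old enough that a chargeback would likely have arrived
    return txns[is_fraud | is_mature]
```

A recent transaction labeled good is excluded because it may still turn out to be fraud, while a recent fraud label is kept because it carries the newest attack patterns.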
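Summary
Automating data science improves performance, reduces costs, and increases efficiency.
Keep learning new techniques and refining methods to improve fraud prevention.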
fraud doesn’t stand still. If we’re to be successful in fighting crime and protecting our customers’ money, we must constantly be working to improve our approach, explore new techniques, and create new systems that let us tackle newer and more sophisticated attacks.
For example: deep learning algorithms, ensemble techniques, and so on.
Source:
http://blog.wepay.com/automating-machine-learning-for-platform-fraud-detection/
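Takeaways:
A business-driven machine learning platform, automated and platformized systems, incremental computation, and daily model updates are all ideas worth borrowing and applying in real work.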