you have to be able to spot fraud with a high degree of accuracy so that you can shut it down before it results in a loss.


At WePay, it increasingly also means machine learning models which can spot complicated fraud patterns faster with less human intervention.



Fraud is constantly changing: as defenses improve, fraudsters adapt even faster

Machine learning models are great for spotting fraud, but they aren’t psychic: they rely on past data to make predictions about the transactions they’re currently looking at. Because fraud patterns keep shifting, models go out of date quickly and their performance decays fast.

In WePay’s experience, after a month a model’s accuracy may drop by 50%, and it continues to decrease slowly after that.


Retraining a model by running the full machine learning pipeline can take hours. This includes extraction and transformation (ETL) of incremental new data, feature creation and engineering, model training, performance evaluation, and model deployment.

To reduce complexity, some companies use simple models such as logistic regression, but that treats the symptom rather than the cause. The newest data might not be the most useful for model training purposes, because new fraud can take time to mature: it can often take two or more months for a cardholder to see and report fraud. This means new data can be labeled good before it’s seen as bad, and training models with the latest data can actually hurt model accuracy.


WePay’s automated approach:

+ Pull new, incremental retraining data daily (incremental computation)

+ Refresh the model by running it again with combined new and existing fraud data

+ Test the new models, evaluating each on Area Under Curve (AUC), precision and recall

+ Transfer models that meet initial test criteria into a pseudo-production environment for additional assessment against test cases

+ Deploy upon satisfactory completion of all performance and test case validation
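The evaluation step in the list above can be sketched roughly as follows. This is only an illustration, not WePay’s actual code: the threshold values, the `evaluate` and `passes_gate` helpers, the random forest, and the synthetic data are all assumptions standing in for the real pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical acceptance thresholds; real values would be set by the fraud team.
THRESHOLDS = {"auc": 0.80, "precision": 0.50, "recall": 0.50}


def evaluate(model, X_test, y_test, cutoff=0.5):
    """Return AUC, precision, and recall for a fitted binary classifier."""
    scores = model.predict_proba(X_test)[:, 1]  # probability of the fraud class
    preds = (scores >= cutoff).astype(int)
    return {
        "auc": roc_auc_score(y_test, scores),
        "precision": precision_score(y_test, preds, zero_division=0),
        "recall": recall_score(y_test, preds, zero_division=0),
    }


def passes_gate(metrics, thresholds=THRESHOLDS):
    """A candidate moves on only if every metric meets its threshold."""
    return all(metrics[name] >= floor for name, floor in thresholds.items())


# Synthetic stand-in for the combined new and existing transaction data.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

candidate = RandomForestClassifier(n_estimators=50, random_state=0)
candidate.fit(X_train, y_train)

metrics = evaluate(candidate, X_test, y_test)
print(metrics, "-> promote" if passes_gate(metrics) else "-> reject")
```

A candidate that clears this gate would then go to the pseudo-production environment for the test-case assessment; one that fails is discarded and the current production model stays in place.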




The machine learning models are built in Python with scikit-learn, pandas, and NumPy, which makes the work fast, convenient, and concise.

Just copy the model files to production instance and import the same libraries in production as in development, and you are almost good to go!
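A minimal sketch of what "copying the model files" could look like with scikit-learn. The notes do not say which serialization format WePay uses; `joblib` is assumed here because it is commonly used to persist scikit-learn estimators, and the file name and toy model are hypothetical.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy model standing in for a trained fraud model.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "fraud_model.joblib"  # hypothetical file name

    # Development side: write the fitted model to disk.
    joblib.dump(model, model_path)

    # Production side: after copying the file over, load it back.
    restored = joblib.load(model_path)

# The restored model should reproduce the original predictions.
same = bool((restored.predict(X) == model.predict(X)).all())
print(same)
```

The "almost" in "almost good to go" matters: a persisted estimator is generally only safe to load with the same library versions that produced it, so development and production need matching scikit-learn and NumPy versions.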


Putting it all together

Models are refreshed daily. When we’re training our models, we simply exclude transactions flagged as good in the most recent time period while including every transaction flagged as fraud that we can. This lets us train on data that includes the most recent fraud patterns while also not contaminating our model with bad data.
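The filtering rule described above can be sketched in pandas. The column names (`created_at`, `is_fraud`), the `select_training_rows` helper, and the 60-day maturity window are assumptions for illustration; the window is loosely based on the "two or more months" reporting delay mentioned earlier.

```python
import pandas as pd


def select_training_rows(transactions, as_of, maturity_days=60):
    """Keep all fraud rows, plus good rows old enough for their label to be trusted."""
    cutoff = pd.Timestamp(as_of) - pd.Timedelta(days=maturity_days)
    is_fraud = transactions["is_fraud"] == 1
    is_mature = transactions["created_at"] <= cutoff
    return transactions[is_fraud | is_mature]


# Hypothetical sample data.
transactions = pd.DataFrame(
    {
        "txn_id": [1, 2, 3, 4],
        "created_at": pd.to_datetime(
            ["2024-01-05", "2024-01-20", "2024-03-25", "2024-03-28"]
        ),
        "is_fraud": [0, 1, 0, 1],
    }
)

training = select_training_rows(transactions, as_of="2024-04-01")
print(training["txn_id"].tolist())  # [1, 2, 4]
```

Transaction 3 is dropped because it is recent and still labeled good, so its label may yet change; transaction 4 is just as recent but already confirmed as fraud, so it is kept.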




Fraud doesn’t stand still. If we’re to be successful in fighting crime and protecting our customers’ money, we must constantly be working to improve our approach, explore new techniques, and create new systems that let us tackle newer and more sophisticated attacks.

Examples include deep learning algorithms and ensemble techniques.






