ML —

Matrix Factorization

December 19, 2022•620 words

Matrix Factorization: set unknown/missing numbers to zero. Matrix factorization proceeds as follows: Initialize two random matrices a and b with dimensions m by j and j by n, so that their product has the same dimensions as the original matrix z (m by n). Multiply a by b to obtain an estimate y of z. Subtract z from y on the known values of z, or use some other loss function, to evaluate how far the estimate is from the real matrix. Use gradient descent formulas to adj...
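The steps in this excerpt can be sketched in plain Python. This is only an illustration under assumptions: the names factorize and predict, the squared-error loss, the per-entry (stochastic) updates, and the learning rate and step count are all hypothetical choices, not taken from the post. Missing entries are stored as 0 and skipped with a mask.

```python
import random


def predict(a, b):
    """Multiply a (m x j) by b (j x n) to get the m x n estimate."""
    j, n = len(b), len(b[0])
    return [[sum(row[k] * b[k][c] for k in range(j)) for c in range(n)]
            for row in a]


def factorize(z, mask, j=2, lr=0.01, steps=5000, seed=0):
    """Fit z ~= a @ b by gradient descent on the known entries only.

    z    : m x n list of lists; missing entries may hold any value (e.g. 0)
    mask : m x n list of bools; True where z is known
    j    : inner dimension of the factorization (assumed hyperparameter)
    """
    rng = random.Random(seed)
    m, n = len(z), len(z[0])
    # Step 1: random initial factors with shapes m x j and j x n.
    a = [[rng.random() for _ in range(j)] for _ in range(m)]
    b = [[rng.random() for _ in range(n)] for _ in range(j)]
    for _ in range(steps):
        for r in range(m):
            for c in range(n):
                if not mask[r][c]:
                    continue  # unknown entries contribute no error
                # Steps 2-3: estimate one entry and compare with z.
                est = sum(a[r][k] * b[k][c] for k in range(j))
                err = est - z[r][c]
                # Step 4: gradient step for the loss 0.5 * err ** 2.
                for k in range(j):
                    a_rk = a[r][k]
                    a[r][k] -= lr * err * b[k][c]
                    b[k][c] -= lr * err * a_rk
    return a, b


if __name__ == "__main__":
    # Toy matrix; the bottom-right entry is treated as missing.
    z = [[1, 2, 3], [2, 4, 6], [3, 6, 0]]
    mask = [[True, True, True], [True, True, True], [True, True, False]]
    a, b = factorize(z, mask)
    for row in predict(a, b):
        print([round(v, 2) for v in row])
```

The estimate at the masked position is whatever the fitted factors imply there, which is how the factorization fills in missing values.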

Read post

Eigenvalue & EigenVector

December 16, 2022•1 words

...

Read post

Fraud Detection

December 16, 2022•179 words

Fraud Detection. Plaid's payment risk falls into this category. Metrics I can think of: Offline: confusion matrix, recall, NE. Online: number of fraud transactions; loss from fraud, per day or per million transactions. See here: [[TP, FN, TOC, etc.....]] Precision: Precision is calculated by dividing the true positives by everything that was predicted as positive. Recall: Recall (or True Positive Rate) is calculated by dividing the true positives by everything that should have been predicted as positive. Accuracy...
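The precision and recall definitions above can be written out directly from confusion-matrix counts. A minimal sketch; the function names and the example counts are hypothetical, and returning 0.0 when the denominator is zero is an assumed convention.

```python
def precision(tp, fp):
    """True positives divided by everything predicted positive (TP + FP)."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0


def recall(tp, fn):
    """True positives divided by everything actually positive (TP + FN)."""
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive else 0.0


if __name__ == "__main__":
    # Hypothetical counts: 8 frauds caught, 2 false alarms, 8 frauds missed.
    print(precision(tp=8, fp=2))
    print(recall(tp=8, fn=8))
```

For fraud, recall is usually the number to watch offline, since each false negative is a fraudulent transaction that went through.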

Read post

Naive Bayes

December 16, 2022•37 words

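Naive Bayes. Article: https://tinyurl.com/2k5vw3cf It is itself a translation of: https://tinyurl.com/2ga7nqoc Simply put, it uses Bayes' rule to make predictions. The training data is used to compute the prior probability, and prediction is really just computing the posterior probability. To stress what "naive" means: Being Naive. We assume that every word in a sentence is independent of the other words. This means we no longer look at the whole sentence, only at individual words. We write P(a very close game) as: P(a very close game)=P(a)×P(very)×P(close)×P(game) This assumption is very strong, but very useful. It lets the model work well with small amounts of data or with data that may be mislabeled. The next step is to apply it to what we had before: P(a very close game|Sports)=P(a|Sports)×P(very|Sports)×P(close|Sports)×P(game|Sports) Now, ...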
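The product P(a|Sports)×P(very|Sports)×P(close|Sports)×P(game|Sports) can be sketched as follows. The two training sentences and the function names are hypothetical, chosen only so that every word of the query has been seen. No smoothing is applied, so an unseen word drives the whole product to zero.

```python
from collections import Counter


def word_likelihoods(docs):
    """Estimate P(word | class) from the documents of a single class."""
    counts = Counter(w for d in docs for w in d.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def sentence_likelihood(sentence, probs):
    """Naive assumption: multiply per-word probabilities independently."""
    p = 1.0
    for w in sentence.lower().split():
        p *= probs.get(w, 0.0)  # unseen word -> probability 0 (no smoothing)
    return p


if __name__ == "__main__":
    # Hypothetical documents labelled "Sports".
    sports_docs = ["a great game", "a very close game"]
    probs = word_likelihoods(sports_docs)
    print(sentence_likelihood("a very close game", probs))
    print(sentence_likelihood("a very close match", probs))
```

To classify, the same product would be computed for each class, multiplied by that class's prior, and the largest result taken.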

Read post

Laplace Smoothing

January 12, 2021•14 words

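Background: why smooth at all? The zero-probability problem: when computing the probability of an instance, if some quantity x never appeared in the observed sample (the training set), the probability of the whole instance comes out as 0. In text classification, when a word does not appear in the training samples, that word's probability is 0, and the probability of the text, computed as a product, is also 0. This is unreasonable: we cannot arbitrarily conclude that an event has probability 0 just because it was never observed. Theoretical basis of Laplace smoothing: to solve the zero-probability problem, the French mathematician Laplace first proposed estimating the probability of unseen phenomena by adding 1, which is why additive smoothing is also called Laplace smoothing. When the training sample is large, adding 1 to the count of each component x changes the estimated probabilities negligibly, yet it conveniently and effectively avoids the zero-probability problem. In more practical terms, machine learning often has to predict events that have never occurred, and this is where Laplace's method helps. For example, below is some email data: the two left columns indicate whether the email contains certain keywords, the third column indicates whether the email is spam, and the last column is the number of such emails: From the data above it is easy to obtain the probability that an email contains "invoice" and "WeChat" and is spam (row eight of the table) (machine learning usually computes conditional probabilities; for ease of explanation only the following probability is computed here): Looking closely, you will notice that the "WeChat"-only...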
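The add-one estimate described above can be sketched in a few lines. The function name, the parameter k, and the counts in the example are hypothetical; the email table from the post is not reproduced here.

```python
def laplace_prob(count, total, num_outcomes, k=1):
    """Additive (Laplace) smoothing: (count + k) / (total + k * num_outcomes).

    count        : times this outcome was observed
    total        : total observations across all outcomes
    num_outcomes : number of distinct possible outcomes (e.g. vocabulary size)
    k            : amount added to every count; k=1 is add-one smoothing
    """
    return (count + k) / (total + k * num_outcomes)


if __name__ == "__main__":
    # Hypothetical word counts; the last word was never observed.
    counts = [2, 1, 2, 1, 1, 0]
    total = sum(counts)
    smoothed = [laplace_prob(c, total, len(counts)) for c in counts]
    print(smoothed)       # the unseen word now has a small nonzero probability
    print(sum(smoothed))  # the smoothed estimates still sum to 1
```

Adding k to every count adds k times the number of outcomes to the denominator, which is what keeps the estimates a valid probability distribution.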

Read post

Beta Distribution

January 12, 2021•5 words

This is the best explanation of the beta distribution I have seen: http://varianceexplained.org/statistics/beta_distribution_and_baseball/ Here is its translation: https://www.zhihu.com/question/30269898/answer/123261564 ...

Read post

Beta Distribution

January 10, 2021•6 words

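The beta distribution can serve as a prior. When we do not know what a probability is but have some reasonable guess about it, the beta distribution works well as a probability distribution over probabilities. https://www.zhihu.com/question/30269898/answer/123261564 ...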
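As a rough sketch of what using the beta distribution as a prior looks like: with a Beta(alpha, beta) prior on a success probability, observing successes and failures gives a Beta(alpha + successes, beta + failures) posterior. The function names and the prior parameters below are illustrative assumptions, not values stated in this post.

```python
def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def update(alpha, beta, successes, failures):
    """Posterior parameters after observing binomial data."""
    return alpha + successes, beta + failures


if __name__ == "__main__":
    # Hypothetical prior guess: a success rate somewhere around 0.27.
    prior = (81, 219)
    print(beta_mean(*prior))
    # After 100 successes in 300 trials the estimate shifts, but not all
    # the way to the raw rate of 100 / 300.
    posterior = update(*prior, successes=100, failures=200)
    print(posterior, beta_mean(*posterior))
```

The size of alpha + beta acts like a count of imagined prior observations, so a larger prior takes more data to move.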

Read post

Distribution

January 10, 2021•10 words

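While looking up the beta distribution today, I came across a question on Zhihu: what distribution do scores in Flappy Bird follow? Links to the distributions involved are below. I also found out today that Wikipedia has no built-in bookmarks, so pages cannot be saved. https://en.wikipedia.org/wiki/Beta_distribution https://en.wikipedia.org/wiki/Geometric_distribution https://en.wikipedia.org/wiki/Poisson_distribution https://en.wikipedia.org/wiki/Exponential_distribution ...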

Read post

Day 1

January 6, 2021•2 words

First day of creating bookmarks ...

Read post