Dealing with non-differentiable loss functions in ML
Let the loss function be $$f(y)=\mathbb{1}\left(h_\theta(x)\ne y\right)$$
Clearly, it is non-differentiable with respect to the weights $\theta$.
Hence, we resort to policy gradient methods, which are used in reinforcement learning.
Our objective is to minimize the expected loss, i.e. $$\min_{\theta} \; E_{y \sim P_{\theta}} \left[ f(y) \right]$$
$$\nabla_{\theta} E_{y \sim P_{\theta}}[ f(y)] = \nabla_{\theta} \int P_{\theta}(y) f(y)\,dy = \int f(y)\,\nabla_{\theta} P_{\theta}(y)\,dy = \int P_{\theta}(y) f(y) \frac{\nabla_{\theta} P_{\theta}(y) }{P_{\theta}(y)}\,dy = E_{y \sim P_{\theta}} \left[ f(y)\, \frac{\nabla_{\theta} P_{\theta}(y)}{P_{\theta}(y)} \right]$$$$= E_{y \sim P_{\theta}} \left[ f(y)\, \nabla_{\theta} \ln P_{\theta}(y) \right]$$
Note that we are no longer differentiating the loss directly; only the log-probability $\ln P_{\theta}(y)$ needs to be differentiable. This manipulation is sometimes called the log-derivative trick (it underlies the REINFORCE, or score-function, estimator used below).
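Before applying this to a classifier, here is a quick numerical sanity check of the estimator (an illustrative sketch, not part of the original derivation): for $f(y)=y$ and $y \sim \mathrm{Bernoulli}(p)$, the true gradient is $\frac{\partial}{\partial p} E[f(y)] = 1$, and the Monte Carlo average of $f(y)\,\frac{\partial}{\partial p}\ln P(y)$ should land close to it. The names p, samples, score and estimate are introduced here purely for illustration.

# Numerical check of the log-derivative trick (illustrative sketch):
# for f(y) = y and y ~ Bernoulli(p), d/dp E[f(y)] = d/dp p = 1.
import numpy as np

p = 0.3
samples = np.random.rand(100000) < p                   # y ~ Bernoulli(p), as booleans
score = np.where(samples, 1.0 / p, -1.0 / (1.0 - p))   # d/dp ln P(y)
estimate = np.mean(samples * score)                    # average of f(y) * d/dp ln P(y)
print(estimate)                                        # should be close to 1.0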
In [11]:
import numpy as np
from sklearn.metrics import accuracy_score
In [12]:
%%latex
Let us dive into the code. We will create a synthetic data set to test whether the approach works, where
$$y=\begin{cases}
      1 & x > 0.5 \\
      0 & x \leq 0.5
   \end{cases}$$
In [13]:
x = np.random.rand(200, 1)       # 200 one-dimensional inputs drawn uniformly from [0, 1)
y = (x > 0.5).astype('int32')    # labels follow the threshold rule above
In [14]:
# Predictor: models P(y = 1 | x) with a sigmoid over a linear function of x
def sigmoid(z, weights):
    temp = 1 + np.exp(-(weights[0] * z + weights[1]))
    return 1.0 / temp
$$\sigma(z)= \frac{1}{1+e^{-z}}$$
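As a quick, hypothetical sanity check of the predictor (assuming the (2, 1) weights layout used later), the output should be 0.5 when the linear term is zero and approach 1 for large positive inputs:

# Illustrative check of the sigmoid predictor; w_test is a made-up weight vector
w_test = np.array([[1.0], [0.0]])   # slope 1, bias 0
print(sigmoid(0.0, w_test))         # expect ~0.5
print(sigmoid(10.0, w_test))        # expect a value close to 1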
In [15]:
import random
In [16]:
alpha = 0.1                             # learning rate
weights = np.random.rand(2, 1) * 0.01   # small random initialisation of [w0, w1]
num_iter = 1000
minibatch_size = 8

for h in range(num_iter):
    change = 0
    # sample a minibatch of (x, y) pairs
    minibatch = np.array(random.sample(list(np.hstack([x, y])), minibatch_size))
    x_samp = minibatch[:, 0]
    y_samp = minibatch[:, 1]

    for m, l in enumerate(list(x_samp)):
        prob_positive = sigmoid(l, weights)        # probability of class 1
        ypred = np.random.rand() < prob_positive   # sample the prediction from a Bernoulli
        reward = 2 * (ypred == y_samp[m]) - 1      # +1 if correct, -1 otherwise

        # gradient of ln P(ypred) with respect to [w0, w1]
        if ypred == 1:
            grads = np.array([(1 - prob_positive) * l, (1 - prob_positive)])
        else:
            grads = np.array([-prob_positive * l, -prob_positive])

        change = change + alpha * grads * reward

    # gradient ascent on the expected reward, i.e. descent on the expected loss
    weights = weights + change / minibatch_size
    if h % 200 == 0:
        print(h)
        
# checking the accuracy
pred = []
for el in x:
    positive_prob = sigmoid(el, weights)
    if positive_prob >= 0.5:
        t = 1
    else:
        t = 0
    pred.append(t)
print('Accuracy is {}'.format(accuracy_score(y.ravel(), pred)))
#print(weights)
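Since the classifier predicts class 1 exactly when $w_0 x + w_1 \ge 0$, the learned decision boundary sits at $x = -w_1 / w_0$ (assuming the learned slope $w_0$ is positive). A short illustrative check, not in the original notebook, compares it with the true threshold of 0.5; learned_threshold is a name introduced here for illustration.

# Decision threshold implied by the trained weights (hypothetical follow-up check)
learned_threshold = (-weights[1] / weights[0]).item()
print('Learned decision threshold: {:.3f} (true threshold: 0.5)'.format(learned_threshold))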