Dealing with non-differentiable loss functions in ML
Let the loss function be $$f(y)=\mathbb{1}\left(h_\theta(x)\ne y\right)$$
Clearly, it is non-differentiable with respect to the weights $\theta$.
Hence, we resort to policy gradient methods, which are used in reinforcement learning.
Our objective is to minimize the expected loss, i.e. $$\min_{\theta} \; E_{y \sim P_{\theta}} \left[ f(y) \right]$$
$$\nabla_{\theta} E_{y \sim P_{\theta}}[ f(y)] = \nabla_{\theta} \int P_{\theta}(y) f(y)\,dy = \int f(y)\,\nabla_{\theta} P_{\theta}(y)\,dy = \int P_{\theta}(y) f(y) \frac{\nabla_{\theta} P_{\theta}(y) }{P_{\theta}(y)}\,dy = E_{y \sim P_{\theta}} \left[ f(y)\, \frac{\nabla_{\theta} P_{\theta}(y)}{P_{\theta}(y)} \right]$$$$= E_{y \sim P_{\theta}} \left[ f(y)\, \nabla_{\theta} \ln P_{\theta}(y) \right]$$
Note that we are no longer differentiating the loss directly; only the log-probability $\ln P_{\theta}(y)$ needs to be differentiable. This manipulation is sometimes called the log-derivative trick (it underlies the REINFORCE, or score-function, estimator used below).
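Before applying this to a classifier, here is a quick numerical sanity check of the estimator (an illustrative sketch, not part of the original derivation): for $f(y)=y$ and $y \sim \mathrm{Bernoulli}(p)$, the true gradient is $\frac{\partial}{\partial p} E[f(y)] = 1$, and the Monte Carlo average of $f(y)\,\frac{\partial}{\partial p}\ln P(y)$ should land close to it. The names p, samples, score and estimate are introduced here purely for illustration.

# Numerical check of the log-derivative trick (illustrative sketch):
# for f(y) = y and y ~ Bernoulli(p), d/dp E[f(y)] = d/dp p = 1.
import numpy as np

p = 0.3
samples = np.random.rand(100000) < p                   # y ~ Bernoulli(p), as booleans
score = np.where(samples, 1.0 / p, -1.0 / (1.0 - p))   # d/dp ln P(y)
estimate = np.mean(samples * score)                    # average of f(y) * d/dp ln P(y)
print(estimate)                                        # should be close to 1.0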
In [11]:
import numpy as np
from sklearn.metrics import accuracy_score
In [12]:
%%latex
Let us dive into the code. We will create a synthetic data set to test whether the approach works, where
$$y=\begin{cases}
      1 & x > 0.5 \\
      0 & x \leq 0.5
   \end{cases}$$
In [13]:
x = np.random.rand(200, 1)       # 200 one-dimensional inputs drawn uniformly from [0, 1)
y = (x > 0.5).astype('int32')    # labels follow the threshold rule above
In [14]:
# Predictor: models P(y = 1 | x) with a sigmoid over a linear function of x
def sigmoid(z, weights):
    temp = 1 + np.exp(-(weights[0] * z + weights[1]))
    return 1.0 / temp
$$\sigma(z)= \frac{1}{1+e^{-z}}$$
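As a quick, hypothetical sanity check of the predictor (assuming the (2, 1) weights layout used later), the output should be 0.5 when the linear term is zero and approach 1 for large positive inputs:

# Illustrative check of the sigmoid predictor; w_test is a made-up weight vector
w_test = np.array([[1.0], [0.0]])   # slope 1, bias 0
print(sigmoid(0.0, w_test))         # expect ~0.5
print(sigmoid(10.0, w_test))        # expect a value close to 1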
In [15]:
import random
In [16]:
alpha = 0.1                             # learning rate
weights = np.random.rand(2, 1) * 0.01   # small random initialisation of [w0, w1]
num_iter = 1000
minibatch_size = 8

for h in range(num_iter):
    change = 0
    # sample a minibatch of (x, y) pairs
    minibatch = np.array(random.sample(list(np.hstack([x, y])), minibatch_size))
    x_samp = minibatch[:, 0]
    y_samp = minibatch[:, 1]

    for m, l in enumerate(list(x_samp)):
        prob_positive = sigmoid(l, weights)        # probability of class 1
        ypred = np.random.rand() < prob_positive   # sample the prediction from a Bernoulli
        reward = 2 * (ypred == y_samp[m]) - 1      # +1 if correct, -1 otherwise

        # gradient of ln P(ypred) with respect to [w0, w1]
        if ypred == 1:
            grads = np.array([(1 - prob_positive) * l, (1 - prob_positive)])
        else:
            grads = np.array([-prob_positive * l, -prob_positive])

        change = change + alpha * grads * reward

    # gradient ascent on the expected reward, i.e. descent on the expected loss
    weights = weights + change / minibatch_size
    if h % 200 == 0:
        print(h)
        
# checking the accuracy
pred = []
for el in x:
    positive_prob = sigmoid(el, weights)
    if positive_prob >= 0.5:
        t = 1
    else:
        t = 0
    pred.append(t)
print('Accuracy is {}'.format(accuracy_score(y.ravel(), pred)))
#print(weights)
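Since the classifier predicts class 1 exactly when $w_0 x + w_1 \ge 0$, the learned decision boundary sits at $x = -w_1 / w_0$ (assuming the learned slope $w_0$ is positive). A short illustrative check, not in the original notebook, compares it with the true threshold of 0.5; learned_threshold is a name introduced here for illustration.

# Decision threshold implied by the trained weights (hypothetical follow-up check)
learned_threshold = (-weights[1] / weights[0]).item()
print('Learned decision threshold: {:.3f} (true threshold: 0.5)'.format(learned_threshold))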