KeyError: 'val_loss' when training model
I am training a model with Keras and am getting an error from a callback in the fit_generator function. I always reach the 3rd epoch and then get this error.
from keras.callbacks import TensorBoard, ModelCheckpoint, ReduceLROnPlateau, EarlyStopping

annotation_path = 'train2.txt'
log_dir = 'logs/000/'
classes_path = 'model_data/deplao_classes.txt'
anchors_path = 'model_data/yolo_anchors.txt'
class_names = get_classes(classes_path)
num_classes = len(class_names)
anchors = get_anchors(anchors_path)
input_shape = (416,416) # multiple of 32, hw
is_tiny_version = len(anchors)==6 # default setting
if is_tiny_version:
    model = create_tiny_model(input_shape, anchors, num_classes,
        freeze_body=2, weights_path='model_data/tiny_yolo_weights.h5')
else:
    model = create_model(input_shape, anchors, num_classes,
        freeze_body=2, weights_path='model_data/yolo_weights.h5') # make sure you know what you freeze

logging = TensorBoard(log_dir=log_dir)
checkpoint = ModelCheckpoint(log_dir + 'ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5',
    monitor='val_loss', save_weights_only=True, save_best_only=True, period=3)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, verbose=1)
early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1)
[error]
Traceback (most recent call last):
File "train.py", line 194, in <module>
_main()
File "train.py", line 69, in _main
callbacks=[logging, checkpoint])
File "C:\Users\ilove\AppData\Roaming\Python\Python37\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Users\ilove\AppData\Roaming\Python\Python37\lib\site-packages\keras\engine\training.py", line 1418, in fit_generator
initial_epoch=initial_epoch)
File "C:\Users\ilove\AppData\Roaming\Python\Python37\lib\site-packages\keras\engine\training_generator.py", line 251, in fit_generator
callbacks.on_epoch_end(epoch, epoch_logs)
File "C:\Users\ilove\AppData\Roaming\Python\Python37\lib\site-packages\keras\callbacks.py", line 79, in on_epoch_end
callback.on_epoch_end(epoch, logs)
File "C:\Users\ilove\AppData\Roaming\Python\Python37\lib\site-packages\keras\callbacks.py", line 429, in on_epoch_end
filepath = self.filepath.format(epoch=epoch + 1, **logs)
KeyError: 'val_loss'
Can anyone help me figure out what the problem is?
Thanks in advance for your help.
This callback runs at the end of epoch 3.

The error message is saying that there is no val_loss key in the logs dictionary when executing:

filepath = self.filepath.format(epoch=epoch + 1, **logs)

This would happen if fit_generator is called without validation_data.

I would start by simplifying the path name for the model checkpoint. It is probably enough to include the epoch in the name.
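For reference, a minimal sketch of what passing validation data to fit_generator could look like; the data_generator_wrapper helper, the lines list, and the num_train/num_val split are assumptions borrowed from a keras-yolo3-style train.py, so adapt the names to your own script:

# Sketch only: data_generator_wrapper, lines, num_train and num_val are
# assumed to exist elsewhere in train.py; adjust to your own setup.
batch_size = 32
model.fit_generator(
    data_generator_wrapper(lines[:num_train], batch_size, input_shape, anchors, num_classes),
    steps_per_epoch=max(1, num_train // batch_size),
    validation_data=data_generator_wrapper(lines[num_train:], batch_size, input_shape, anchors, num_classes),
    validation_steps=max(1, num_val // batch_size),
    epochs=50,
    initial_epoch=0,
    callbacks=[logging, checkpoint])  # val_loss will now be present in logs

With a validation generator supplied, Keras computes val_loss at the end of each epoch, so the checkpoint filename pattern can be formatted without raising KeyError.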
This answer doesn't apply to the question, but this was at the top of the Google results for keras "KeyError: 'val_loss'", so I'm going to share the solution to my problem.

The error was the same for me: when using val_loss in the checkpoint file name, I would get KeyError: 'val_loss'. My checkpointer was also monitoring this field, so even if I took the field out of the file name, I would still get this warning from the checkpointer:

WARNING:tensorflow:Can save best model only with val_loss available, skipping.

In my case, the issue was that I was upgrading from using Keras and TensorFlow 1 separately to using the Keras that comes with TensorFlow 2. The period param for ModelCheckpoint had been replaced with save_freq. I erroneously assumed that save_freq behaved the same way, so I set it to save_freq=1, thinking this would save a checkpoint every epoch. However, the docs state that an integer save_freq refers to batches, not epochs. Setting save_freq='epoch' solved the issue for me.

Note: the OP was still using period=3, so this is definitely not what was causing their problem.
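For anyone else hitting this after moving to tf.keras, here is a minimal sketch of the corrected callback under that assumption (the checkpoint path is only illustrative):

# tf.keras (TensorFlow 2): use save_freq='epoch' instead of the old period param.
from tensorflow.keras.callbacks import ModelCheckpoint

checkpoint = ModelCheckpoint(
    'checkpoints/ep{epoch:03d}-val_loss{val_loss:.3f}.h5',  # illustrative path
    monitor='val_loss',
    save_weights_only=True,
    save_best_only=True,
    save_freq='epoch')  # not save_freq=1, which would count batches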
For me, the problem was that I was trying to set initial_epoch (in model.fit) to a value other than the standard 0. I was doing so because I'm running model.fit in a loop that runs 10 epochs each cycle, then retrieves the history data, checks whether the loss has decreased, and runs model.fit again until it's satisfied.

I thought I had to update the value as I was restarting the previous model, but apparently not...
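A minimal sketch of that kind of retraining loop, assuming model and the training/validation arrays already exist; initial_epoch is simply left at its default of 0 on every call:

# Hypothetical sketch: model, x_train, y_train, x_val, y_val already exist.
best_val_loss = float('inf')
for cycle in range(10):  # at most 10 cycles of 10 epochs each
    history = model.fit(x_train, y_train,
                        validation_data=(x_val, y_val),
                        epochs=10)
    cycle_best = min(history.history['val_loss'])
    if cycle_best >= best_val_loss:  # validation loss stopped improving
        break
    best_val_loss = cycle_best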