Deep learning model that can learn image information
Given only an image-level safe/unsafe label for each image, a Class Activation Map is created using a deep learning model that can learn image information. Through this process, the deep learning model can recognize dangerous situations on the construction site. Both the encoder, which receives the image input, and the decoder, which generates text, are built from Transformer models. The encoder uses a pre-trained distilled ViT to reduce computational cost and improve performance.
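To illustrate how a Class Activation Map can be derived from image-level labels alone, the following is a minimal sketch in plain Python. It assumes the classic formulation in which the map for a class is the weighted sum of the final feature maps, using that class's classifier weights; the text does not say which variant is actually used. The function name `class_activation_map` and the toy inputs are hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W), min-max scaled to [0, 1].

    feature_maps: list of K maps, each a list of H rows of W floats.
    class_weights: list of K classifier weights for the class of interest
                   (for example, the "unsafe" class). Both are assumed inputs.
    """
    if len(feature_maps) != len(class_weights):
        raise ValueError("need exactly one weight per feature map")

    height = len(feature_maps[0])
    width = len(feature_maps[0][0])

    # Accumulate weight_k * feature_map_k at every spatial location.
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * fmap[i][j]

    # Min-max normalize so the map can be overlaid on the image as a heatmap.
    values = [v for row in cam for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [[0.0] * width for _ in range(height)]
    return [[(v - lo) / (hi - lo) for v in row] for row in cam]


# Toy example: two 2x2 feature maps and their weights for the "unsafe" class.
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
weights = [0.5, 1.0]
cam = class_activation_map(features, weights)
```

High values in the resulting map mark the regions that contributed most to the predicted class, which is how a model trained only on whole-image safe/unsafe labels can still localize the hazardous area.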