Identify the species of plant seedlings, out of 12 different species of grains and weeds, using a convolutional neural network built for a large national home plant grower.
In recent times, agriculture has been in urgent need of modernization, since the manual work required to check whether plants are growing correctly is still extensive. Despite several advances in agricultural technology, people working in the agricultural industry still need to be able to sort and recognize different plants and weeds, which costs a lot of time and effort in the long term.
Exploratory Data Analysis
- In most of the images we displayed, the plants sit on gravel with a fairly red/yellow hue, which contrasts with the green hue of the plants. However, some images were taken over a dark background. By plotting the pixel colors of all images in HSV, we marked the range of green hues that we would likely use for masking. Removing the background can sometimes speed up model training, but we risk losing information from the images that might be useful for training.
- Images of some species were taken at various points of the plant's life cycle, so we advise continuously retraining the model as more images are taken.
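The green-hue masking idea above can be sketched in NumPy as follows. This is a minimal illustration, not the notebook's code: the hue range (75° to 165°) and the saturation/value floors are hypothetical thresholds chosen to exclude reddish-yellow gravel and dark backgrounds, and `rgb_to_hsv`, `green_mask` and `remove_background` are illustrative names. The input is assumed to be an RGB `uint8` array; note that `cv2.imread` returns BGR, and OpenCV's own `cv2.cvtColor(img, cv2.COLOR_BGR2HSV)` stores 8-bit hue as 0 to 179 (degrees halved), so the thresholds must be rescaled if you use it instead.

```python
import numpy as np


def rgb_to_hsv(rgb):
    """Convert an (H, W, 3) uint8 RGB image to HSV.

    Hue is returned in degrees [0, 360); saturation and value in [0, 1].
    """
    x = rgb.astype(np.float64) / 255.0
    r, g, b = x[..., 0], x[..., 1], x[..., 2]
    maxc = x.max(axis=-1)
    minc = x.min(axis=-1)
    delta = maxc - minc

    # Guard against division by zero for grey (delta == 0) and black (maxc == 0) pixels.
    safe_delta = np.where(delta == 0, 1.0, delta)
    safe_max = np.where(maxc == 0, 1.0, maxc)

    hue = np.zeros_like(maxc)
    r_is_max = maxc == r
    g_is_max = (maxc == g) & ~r_is_max
    b_is_max = ~r_is_max & ~g_is_max
    hue[r_is_max] = ((g - b)[r_is_max] / safe_delta[r_is_max]) % 6
    hue[g_is_max] = (b - r)[g_is_max] / safe_delta[g_is_max] + 2
    hue[b_is_max] = (r - g)[b_is_max] / safe_delta[b_is_max] + 4
    hue = np.where(delta == 0, 0.0, hue * 60.0)  # grey pixels get hue 0

    sat = np.where(maxc == 0, 0.0, delta / safe_max)
    return hue, sat, maxc


def green_mask(rgb, hue_range=(75.0, 165.0), min_sat=0.25, min_val=0.15):
    """Boolean mask of 'plant-like' pixels; all thresholds are hypothetical."""
    hue, sat, val = rgb_to_hsv(rgb)
    lo, hi = hue_range
    return (hue >= lo) & (hue <= hi) & (sat >= min_sat) & (val >= min_val)


def remove_background(rgb, **kwargs):
    """Zero out every pixel outside the green mask; return the image and the mask."""
    mask = green_mask(rgb, **kwargs)
    return rgb * mask[..., None].astype(rgb.dtype), mask
```

Returning the mask alongside the masked image makes it easy to train with and without the background and compare the two, which is the trade-off described above.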
Data Pre-processing
- Most images were taken in a dim environment, so the low pixel intensity blurs edges. We improved this by adjusting contrast and brightness with the OpenCV library. Similarly, a slight change of alpha improved the visibility of the plants against the background.
- Image pixel values range from 0 to 255. Our normalization method is scaling: we divide all pixel values by 255 so that every image has values between 0 and 1.
- Our test set contains only 10% of the images in the original set. The class imbalance is obvious and becomes more pronounced in small sets like this one. We calculate class weights and pass them to the model at fit time, giving the smaller classes more influence during training.
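The three pre-processing steps above can be sketched as plain NumPy functions. This is a hedged sketch rather than the notebook's code: `adjust_contrast_brightness` applies the `alpha * pixel + beta` formula that OpenCV's `cv2.convertScaleAbs` uses, with rounding and clipping to 0..255; its default `alpha`/`beta` values are hypothetical. The class weights use the "balanced" heuristic `n_samples / (n_classes * count_c)`, the same formula as scikit-learn's `compute_class_weight("balanced", ...)`; this is an assumption, since the summary does not say how the weights were computed.

```python
import numpy as np


def adjust_contrast_brightness(img, alpha=1.3, beta=20.0):
    """new = alpha * old + beta, rounded and clipped to the uint8 range.

    Matches cv2.convertScaleAbs(img, alpha=alpha, beta=beta) whenever the
    intermediate result is non-negative (OpenCV also takes the absolute value).
    alpha > 1 stretches contrast; beta > 0 brightens. Defaults are hypothetical.
    """
    out = alpha * img.astype(np.float32) + beta
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def scale_to_unit(img):
    """Normalize uint8 pixels from [0, 255] to float32 values in [0, 1]."""
    return img.astype(np.float32) / 255.0


def balanced_class_weights(labels, num_classes):
    """Weight class c by n_samples / (num_classes * count_c).

    Classes with no samples get a finite weight (count clamped to 1); it is
    never used because no sample carries that label.
    """
    counts = np.bincount(labels, minlength=num_classes)
    weights = len(labels) / (num_classes * np.maximum(counts, 1))
    return {c: float(w) for c, w in enumerate(weights)}
```

The dictionary returned by `balanced_class_weights` has the form Keras expects for the `class_weight` argument of `Model.fit`, e.g. `model.fit(x_train, y_train, class_weight=weights)`, where `model`, `x_train` and `y_train` are placeholders.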
Model Implementation & Performance
For agriculture to prosper, it is equally important for the model to identify both useful crops and weeds, and our model achieves a good F1 score of 87.3%.
- Good identification of crops could provide useful information about their growth phases and the environmental effects on them.
- Good identification of weeds could show when the soil needs weed-control treatment and which weeds are dominant, so that the correct weed control can be applied.
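The summary reports an F1 score without stating how it was averaged over the 12 classes. The sketch below computes per-class F1 from a confusion matrix together with the two common averages: macro (unweighted mean over classes) and weighted (mean weighted by each class's support). Which of these the 87.3% corresponds to is not stated here, and `f1_scores` is an illustrative name; scikit-learn's `f1_score` with `average="macro"` or `average="weighted"` should give the same numbers when every class occurs in the data.

```python
import numpy as np


def f1_scores(y_true, y_pred, num_classes):
    """Return (per-class F1, macro F1, support-weighted F1) from integer labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    # Confusion matrix: rows are true classes, columns are predicted classes.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)

    tp = np.diag(cm).astype(np.float64)
    predicted = cm.sum(axis=0)  # TP + FP for each class
    actual = cm.sum(axis=1)     # TP + FN for each class (support)

    # Undefined ratios (empty denominators) are reported as 0.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)

    macro = f1.mean()
    weighted = (f1 * actual).sum() / actual.sum()
    return f1, macro, weighted
```

Macro F1 treats a rare weed species the same as a common crop, so it is the stricter view when classes are imbalanced; weighted F1 is pulled towards the performance on the larger classes.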
The company could deploy our model in plant nurseries equipped with cameras to monitor seedling growth and alert staff to potential weed issues. Our model predicts useful crops with higher accuracy than weeds, so it could already be deployed in scenarios where crop seedling growth is monitored. As the number of images increases, we expect weed identification to improve.
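Whether crops are recognized more accurately than weeds can be checked by pooling rows of the confusion matrix by group. In this sketch the crop and weed class indices are hypothetical placeholders (the summary does not list which of the 12 species are crops), and `same_group=True` counts a weed predicted as a different weed species as correct, which is the level of accuracy that matters for a weed alert.

```python
import numpy as np


def group_accuracy(cm, group, same_group=False):
    """Accuracy over samples whose true class is in `group`.

    cm: confusion matrix with true classes as rows, predicted classes as columns.
    same_group=False: only the exact species counts as correct.
    same_group=True: any prediction inside the same group counts as correct.
    """
    cm = np.asarray(cm)
    idx = np.asarray(sorted(group))
    total = cm[idx, :].sum()
    if total == 0:
        return float("nan")
    if same_group:
        correct = cm[np.ix_(idx, idx)].sum()
    else:
        correct = np.diag(cm)[idx].sum()
    return float(correct / total)


# Hypothetical 3-class example: class 0 is a crop, classes 1 and 2 are weeds.
CROP_CLASSES = {0}
WEED_CLASSES = {1, 2}
```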
Future Model Improvements
- Extend the hue analysis performed in the EDA section to remove backgrounds from the images and check whether that improves model performance, and to identify unrelated images that could come from a broken or misadjusted camera.
- Gather more information about weed species with a similar structure and check whether they share a similar weed-control process. If they do, combining them into one class could improve model performance, since they are almost indistinguishable to the human eye as well.
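Two of the ideas above can be prototyped with little code. The sketch below shows a label remapping that merges look-alike weed classes into a single class before retraining, and a crude check that flags images with almost no green-dominant pixels as possibly coming from a faulty camera. The merged class indices, the `margin` and the `min_green_fraction` threshold are illustrative assumptions, not values from the notebook.

```python
import numpy as np


def merge_classes(labels, num_classes, merge_groups):
    """Remap labels so every class in one merge group shares a single new index.

    Returns (new_labels, mapping); new indices stay contiguous from 0.
    """
    owner = {c: g for g, group in enumerate(merge_groups) for c in group}
    mapping, group_new_id, next_id = {}, {}, 0
    for c in range(num_classes):
        if c in owner:
            g = owner[c]
            if g not in group_new_id:  # first member of the group claims a new id
                group_new_id[g] = next_id
                next_id += 1
            mapping[c] = group_new_id[g]
        else:
            mapping[c] = next_id
            next_id += 1
    lut = np.array([mapping[c] for c in range(num_classes)])
    return lut[np.asarray(labels)], mapping


def green_fraction(rgb, margin=20):
    """Fraction of pixels whose green channel exceeds both red and blue by `margin`."""
    x = rgb.astype(np.int32)  # avoid uint8 overflow when adding the margin
    r, g, b = x[..., 0], x[..., 1], x[..., 2]
    return float(np.mean((g > r + margin) & (g > b + margin)))


def looks_unrelated(rgb, min_green_fraction=0.01, margin=20):
    """Flag an image that contains (almost) no plant-coloured pixels."""
    return green_fraction(rgb, margin) < min_green_fraction
```

For example, `merge_classes(y, 12, [{a, b}])` with two hypothetical look-alike weed indices `a` and `b` would turn the 12-class problem into an 11-class one.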
Detailed Analysis: NationalNursery-PlantClassification.ipynb
Technologies: OpenCV for image processing, TensorFlow-Keras for building the CNN model