Soil Moisture Predictive Analysis using Machine Learning

Published November 30, 2023
Soil Moisture Alert and Prediction System

The task of manually collecting soil moisture data at a specific site is laborious. This project aims to streamline this process by deploying a device in the area, allowing remote access to the data. Periodic visits are only necessary for power source recharging. This system enhances efficiency by automating data collection, providing real-time alerts, and enabling predictive analysis for better agricultural management.

Components Used:

  1. 5V / 3.3V Power Supply Module
  2. Raspberry Pi Zero
  3. Soil Moisture Sensor
  4. Potentiometer
  5. Jumper Wires
  6. 7805 Voltage Regulator

Note: This project was built for the World Energy Challenge 2023. I used a Raspberry Pi Zero for data collection and an LM7805 voltage regulator from STMicroelectronics to power the project. Both components were purchased from DigiKey, as per the contest rules.

Working:

  1. Use the potentiometer to set the sensitivity of the soil moisture sensor.
  2. Once sensitivity is set, insert the soil moisture sensor into the soil at the desired location.
  3. The Raspberry Pi reads data from the sensor every hour, storing it locally with a timestamp.
  4. When the sensor detects moisture levels below a specified point, it sends a LOW signal (0) to the Pi.
  5. Upon receiving this signal, the Pi sends an email alert to the user.
  6. Every 24 hours, the Pi sends the locally stored data to a server.
  7. The server receives and stores this data along with the previously received data.
  8. The server builds a machine learning model using the Random Forest algorithm with the entire dataset and stores the model locally.
  9. Users can load the model and use it to predict moisture content for a given date and season.
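
Steps 3 to 6 above come down to two rules: raise an alert on a LOW (0) reading, and hand off the buffered readings when the calendar day changes. The sketch below replays a few readings through that logic without any hardware. The function name simulate and the tuple layout are hypothetical, chosen only for illustration; they are not part of the project code.

```python
from datetime import datetime, timedelta


def simulate(readings):
    """Replay (timestamp, value) pairs and report alerts and daily batches.

    value is 0 when moisture is below the threshold, 1 otherwise.
    """
    alerts = []    # timestamps at which an email alert would be sent
    batches = []   # one list of readings per completed day
    buffer = []    # readings collected since the last hand-off
    current_day = None
    for stamp, value in readings:
        if current_day is not None and stamp.date() != current_day:
            # The day changed: hand the previous day's readings to the server.
            batches.append(buffer)
            buffer = []
        current_day = stamp.date()
        if value == 0:
            alerts.append(stamp)
        buffer.append((stamp, value))
    return alerts, batches, buffer


start = datetime(2023, 11, 29, 22, 0)
# Four hourly readings that cross midnight; the second one is LOW.
sample = [(start + timedelta(hours=i), v) for i, v in enumerate([1, 0, 1, 1])]
alerts, batches, pending = simulate(sample)
print(len(alerts), len(batches), len(pending))
```

Here one alert fires (the 23:00 reading), one batch of two readings is handed off at midnight, and two readings remain buffered for the next day.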

Circuit Diagram:

Circuit Diagram of Soil Moisture Alert and Prediction System

Power Supply Module:

  • Connect the power supply module to the breadboard.
  • Ensure it provides the required voltage for both the soil moisture sensor and the Raspberry Pi.

Soil Moisture Sensor:

  • Connect the + pin of the soil moisture sensor to the + pin of the potentiometer.
  • Connect the - pin of the soil moisture sensor to the - pin of the potentiometer.

Potentiometer:

  • Connect the VCC pin of the potentiometer to the power rail on the breadboard.
  • Connect the GND pin of the potentiometer to the ground rail on the breadboard.
  • Connect the DO (Digital Output) pin of the potentiometer to a breadboard row.

5V Regulator:

  • Connect the IN pin of the regulator to the power rail on the breadboard.
  • Connect the GND pin of the regulator to the ground rail on the breadboard.
  • Connect the OUT pin of the regulator to a breadboard row.

Raspberry Pi:

  • Connect the GPIO-18 pin of the Raspberry Pi to the same row as the DO pin of the potentiometer.
  • Connect the 5V pin of the Raspberry Pi to the same row as the OUT pin of the regulator.
  • Connect the GND pin of the Raspberry Pi to the ground rail on the breadboard.

Summary:

  • The power supply module and regulator provide power to the entire circuit.
  • The soil moisture sensor is connected to the potentiometer to adjust sensitivity.
  • The potentiometer is connected to the power and ground rails and provides its digital output to a breadboard row.
  • The Raspberry Pi reads the digital output from the potentiometer, gets power from the regulator, and shares a common ground with the rest of the circuit.

Code Explanation:

pi_code.py
import csv
import smtplib
import socket
import ssl
import time
import RPi.GPIO as io
from datetime import datetime, date

These lines import necessary modules. csv is used for handling CSV files, smtplib is used for sending emails, socket for socket communication, ssl for secure sockets, time for time-related operations, and RPi.GPIO for Raspberry Pi GPIO handling. datetime and date are used for working with date and time.

dataFilePath = "pi/projects/cd_project/moistureData.csv"

Specifies the file path for storing moisture data.

class MosistureDataPoint:

    def __init__(self, value: int, dateTime: datetime):
        self.value = value
        self.dateTime = dateTime

This defines the class MosistureDataPoint (the identifier is spelled this way throughout the code) to represent a single reading. It has attributes for the moisture value (value) and the timestamp (dateTime).
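
As an aside, a plain data holder like this can also be written as a dataclass, which generates the constructor and equality checks automatically. The sketch below is an alternative, not the project's code: the names Reading, taken_at, and as_row are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Reading:
    value: int          # 0 = below threshold, 1 = above
    taken_at: datetime  # when the GPIO pin was sampled

    def as_row(self):
        """Return the timestamp (formatted as in the CSV file) and the value."""
        return [self.taken_at.strftime("%Y-%m-%d %H:%M:%S"), self.value]


r = Reading(1, datetime(2023, 11, 30, 9, 0, 0))
print(r.as_row())
```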

def sendAlertEmail():
    smtpServer = "smtp.gmail.com"
    smtpServerPort = 465
    sslContext = ssl.create_default_context()
    # Headers must start at column 0, and a blank line separates them from the body.
    message = (
        "Subject: [Alert] Moisture level has dropped below set threshold!!!\n"
        "\n"
        "Sensor has detected that the moisture level has dropped below set threshold."
    )
    with smtplib.SMTP_SSL(smtpServer, smtpServerPort, context=sslContext) as server:
        # Placeholder credentials; replace them with a real account before use.
        server.login("alert@gmail.com", "password")
        server.sendmail("alert@gmail.com", "alert@gmail.com", message)

This function sends an email alert if the moisture level drops below a certain threshold. It uses Gmail's SMTP server for sending the email securely.
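
Hand-written header strings are easy to get wrong, so one option is to let the standard library's email.message.EmailMessage build the message. The sketch below only constructs the message and does not send anything. The function name build_alert and the example.com addresses are placeholders, not values from the project.

```python
from email.message import EmailMessage


def build_alert(sender, recipient):
    """Build the alert email without sending it."""
    msg = EmailMessage()
    msg["Subject"] = "[Alert] Moisture level has dropped below set threshold"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        "Sensor has detected that the moisture level has dropped below the set threshold."
    )
    return msg


alert = build_alert("alert@example.com", "farmer@example.com")
print(alert["Subject"])
```

A message built this way can be passed to server.send_message(alert) on the same SMTP_SSL connection.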

def getSeasonFromDate(date: date):
    month = date.month
    if 3 <= month <= 6:
        return "summer"
    elif 6 <= month <= 9:
        return "monsoon"
    elif 10 <= month <= 12 or 1 <= month <= 2:
        return "winter"
    else:
        return "Invalid Month"

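Defines a function getSeasonFromDate() that maps the month of a date to a season: March to June is summer, July to September is monsoon, and October to February is winter. June satisfies both of the first two conditions, but the first matching branch wins, so it is reported as summer.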
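
The same mapping can be expressed as a lookup table, which makes the treatment of June explicit. This is a sketch of an equivalent approach; SEASON_BY_MONTH and season_of are hypothetical names.

```python
from datetime import date

# Month -> season, with June assigned to summer as in the first matching branch.
SEASON_BY_MONTH = {
    **{m: "summer" for m in (3, 4, 5, 6)},
    **{m: "monsoon" for m in (7, 8, 9)},
    **{m: "winter" for m in (10, 11, 12, 1, 2)},
}


def season_of(day: date) -> str:
    """Look up the season for the month of the given date."""
    return SEASON_BY_MONTH[day.month]


print(season_of(date(2023, 6, 15)), season_of(date(2023, 11, 30)))
```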

def writeToFile(dataPoints: list[MosistureDataPoint]):
    with open(dataFilePath, mode="w", newline="") as dataFile:
        openFile = csv.writer(
            dataFile, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL
        )
        for dataPoint in dataPoints:
            openFile.writerow(
                [
                    dataPoint.dateTime.strftime("%Y-%m-%d %H:%M:%S"),
                    # date() must be called; passing the bound method would fail.
                    getSeasonFromDate(dataPoint.dateTime.date()),
                    dataPoint.value,
                ]
            )

This function writes the moisture data points to a CSV file. It includes the timestamp, season, and moisture value for each data point.
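
To check the row format without touching the file system, the same csv.writer calls can target an in-memory buffer. The sketch below assumes rows arrive as (datetime, season, value) tuples; rows_to_csv is a hypothetical helper name.

```python
import csv
import io
from datetime import datetime


def rows_to_csv(rows):
    """Serialise (datetime, season, value) tuples to CSV text, one row per reading."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for stamp, season, value in rows:
        writer.writerow([stamp.strftime("%Y-%m-%d %H:%M:%S"), season, value])
    return out.getvalue()


text = rows_to_csv([(datetime(2023, 11, 30, 9, 0), "winter", 1)])
print(text)
```

Each row holds three fields and there is no header row, which is the same shape that writeToFile() produces.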

def sendDataToServer():
    receiverIP = "192.168.29.200"
    receiverPort = 8080
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.connect((receiverIP, receiverPort))
        # Read the file once, in 1024-character chunks, and send each chunk.
        with open(dataFilePath, "r") as fi:
            data = fi.read(1024)
            while data:
                # sendall() keeps sending until the whole chunk has gone out.
                sock.sendall(data.encode())
                data = fi.read(1024)
    except (ConnectionRefusedError, TimeoutError) as e:
        print(f"Error connecting to the server: {e}")
    finally:
        sock.close()

This function sends the content of the CSV file to a server over a TCP socket connection.
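
The chunked transfer can be tried on a single machine with socket.socketpair(), which returns two already-connected sockets. The sketch below is illustrative only: send_text and receive_text are hypothetical helpers, and the sample row is made up. It shows the sender signalling end-of-stream so that the receiver's loop terminates.

```python
import socket


def send_text(sock, text, chunk_size=1024):
    """Send text in fixed-size chunks, then signal end-of-stream."""
    payload = text.encode()
    for start in range(0, len(payload), chunk_size):
        sock.sendall(payload[start:start + chunk_size])
    sock.shutdown(socket.SHUT_WR)  # the receiver's recv() will now return b""


def receive_text(sock, buffer_size=1024):
    """Read until the sender closes its side, then decode once."""
    chunks = []
    while True:
        data = sock.recv(buffer_size)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks).decode()


# Two connected sockets in one process stand in for the Pi and the server.
pi_side, server_side = socket.socketpair()
send_text(pi_side, "2023-11-30 09:00:00,winter,1\n" * 50)
received = receive_text(server_side)
pi_side.close()
server_side.close()
print(len(received.splitlines()))
```

Decoding once after all chunks have arrived avoids splitting a multi-byte character across two recv() calls, which matters if the data ever contains non-ASCII text.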

def main():
    pullRate = 3600  # seconds between readings (one hour)
    io.setmode(io.BCM)
    ipGpioPin = 18
    io.setup(ipGpioPin, io.IN)
    lastUploadDate = datetime.now().date()
    dataPoints = []
    try:
        while True:
            try:
                moistureValue = io.input(ipGpioPin)
                currDate = datetime.now().date()
                # Once the date changes, flush the previous day's readings.
                if currDate != lastUploadDate:
                    writeToFile(dataPoints)
                    sendDataToServer()
                    lastUploadDate = currDate
                    dataPoints = []
                if moistureValue == 0:
                    print("[WARNING] Moisture level has dropped below set threshold!!!")
                    sendAlertEmail()
                else:
                    print("[INFO] Moisture level above set threshold...")
                dataPoints.append(MosistureDataPoint(moistureValue, datetime.now()))
            except Exception as e:
                print(f"An error occurred: {e}")
            time.sleep(pullRate)
    finally:
        # Release the GPIO pins once, when the script exits.
        io.cleanup()


if __name__ == "__main__":
    main()

The main() function is the starting point of the script. It continuously monitors the moisture level using a GPIO pin on the Raspberry Pi. If the moisture level drops below a threshold, it triggers an email alert and logs the data. The data is then periodically sent to a server. The script runs indefinitely, and GPIO cleanup is performed in the finally block to ensure proper resource release when the script is terminated.

server_code.py:
import joblib
import socket
import pandas as pd
from datetime import datetime
from sklearn.ensemble import RandomForestClassifier

These are import statements. joblib is used for saving and loading machine learning models, socket for network communication, pandas for data manipulation and analysis, datetime for working with dates and times, and RandomForestClassifier is an ensemble machine learning model from scikit-learn.

BUFFER_SIZE = 1024
dataFilePath = "server/projects/cd_project/moistureData.csv"

BUFFER_SIZE sets the size of the buffer for receiving data over the network. dataFilePath specifies the file path where moisture data will be stored.

def buildModel():
    # The Pi sends rows without a header, so the column names are supplied here.
    df = pd.read_csv(dataFilePath, names=["datetime", "season", "moisture_good"])
    df["datetime"] = pd.to_datetime(df["datetime"])
    df["year"] = df["datetime"].dt.year
    df["month"] = df["datetime"].dt.month
    df["day"] = df["datetime"].dt.day
    df["hour"] = df["datetime"].dt.hour
    df["season"] = df["season"].map({"summer": 0, "monsoon": 1, "winter": 2})
    df = df.drop(columns=["datetime"])
    X = df[["year", "month", "day", "hour", "season"]]
    y = df["moisture_good"]
    model = RandomForestClassifier()
    model.fit(X, y)
    joblib.dump(model, "soil_moisture_model-{}.joblib".format(datetime.now().date()))

buildModel() function reads moisture data from a CSV file into a Pandas DataFrame. It then preprocesses the data by extracting year, month, day, hour, and mapping the season to numerical values. After that, it drops the original datetime column and separates the features (X) and target variable (y). A RandomForestClassifier model is then trained on this data, and the model is saved using joblib.
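
To see what a single training row looks like after preprocessing, the same transformation can be written with the standard library alone. This is a sketch for inspection, not a replacement for the pandas code; row_to_features and SEASON_CODES are hypothetical names, and the timestamp format is the one the Pi writes.

```python
from datetime import datetime

# Same numeric encoding as the season mapping in buildModel().
SEASON_CODES = {"summer": 0, "monsoon": 1, "winter": 2}


def row_to_features(timestamp, season):
    """Turn one CSV row into [year, month, day, hour, season_code]."""
    stamp = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return [stamp.year, stamp.month, stamp.day, stamp.hour, SEASON_CODES[season]]


features = row_to_features("2023-11-30 09:00:00", "winter")
print(features)
```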

def main():
    hostIP = "192.168.29.200"
    receivingPort = 8080
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((hostIP, receivingPort))
    sock.listen(1)
    try:
        while True:
            conn, addr = sock.accept()
            received = False
            # The with block closes the connection and the file even on errors.
            with conn, open(dataFilePath, "a") as fo:
                data = conn.recv(BUFFER_SIZE)
                while data:
                    fo.write(data.decode())
                    received = True
                    data = conn.recv(BUFFER_SIZE)
            # Retrain only when the client actually sent something.
            if received:
                buildModel()
    except Exception as e:
        print("An error occurred:", e)
    finally:
        sock.close()


if __name__ == "__main__":
    main()

main() function sets up a socket to listen for incoming data. When data is received, it is appended to the specified CSV file (dataFilePath). The buildModel() function is then called to retrain the machine learning model with the updated data. The script runs indefinitely, listening for new data over the network. If an exception occurs, it is printed, and the script attempts to close the socket connection in the finally block.

predict.py
import joblib
import os
import re
import pandas as pd

These are import statements. joblib is used for loading machine learning models, os for interacting with the operating system, re for regular expressions, and pandas for working with data frames.

def main():
    model_files = [
        f
        for f in os.listdir()
        if f.startswith("soil_moisture_model-") and f.endswith(".joblib")
    ]

This statement creates a list, model_files, containing the filenames in the current directory that start with "soil_moisture_model-" and end with ".joblib". These filenames represent versions of the machine learning model saved on different dates.

    dates = [re.search(r"\d{4}-\d{2}-\d{2}", model).group() for model in model_files]

This line extracts the date from each model filename using a regular expression. The extracted dates are stored in the dates list.

    latest_date = max(dates)
    latest_model_filename = f"soil_moisture_model-{latest_date}.joblib"

This determines the latest date from the extracted dates and constructs the filename of the latest model. Because the dates are in YYYY-MM-DD form, comparing them as strings gives chronological order. The filename is relative, so the script should be run from the directory that os.listdir() scanned.

    model = joblib.load(latest_model_filename)

The latest machine learning model is loaded using joblib.load().
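
The selection step can be exercised on its own, without any model files on disk. The sketch below is a slightly more defensive variant: it parses the dates, ignores files that do not match, and returns None when no model exists. The names PATTERN and latest_model are hypothetical.

```python
import re
from datetime import date

PATTERN = re.compile(r"^soil_moisture_model-(\d{4})-(\d{2})-(\d{2})\.joblib$")


def latest_model(filenames):
    """Return the filename with the most recent date, or None if none match."""
    dated = []
    for name in filenames:
        match = PATTERN.match(name)
        if match:
            year, month, day = map(int, match.groups())
            dated.append((date(year, month, day), name))
    return max(dated)[1] if dated else None


files = [
    "notes.txt",
    "soil_moisture_model-2023-11-29.joblib",
    "soil_moisture_model-2023-11-30.joblib",
]
print(latest_model(files))
```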

    year = int(input("Year: "))
    month = int(input("Month (1-12): "))
    day = int(input("Day (1-31): "))
    hour = int(input("Hour (0-23): "))
    season = int(input("Season (Summer->0 | Monsoon->1 | Winter->2): "))

Collect user input for the year, month, day, hour, and season. The hour uses the 0-23 range, matching the dt.hour values the model was trained on.

    new_data = pd.DataFrame(
        {
            "year": [year],
            "month": [month],
            "day": [day],
            "hour": [hour],
            "season": [season],
        }
    )

Create a new DataFrame new_data containing the user-inputted values.

    predictions = model.predict(new_data)

Use the loaded machine learning model to make predictions on the user-inputted data.

    # The Pi logs 0 when moisture is below the threshold and 1 otherwise.
    if predictions[0] == 0:
        print("Soil moisture content will be BELOW set threshold")
    else:
        print("Soil moisture content will be ABOVE set threshold")

Print whether the predicted soil moisture content will be above or below the set threshold based on the machine learning model's prediction.

Code
##### pi_code.py #####

import csv
import smtplib
import socket
import ssl
import time
import RPi.GPIO as io
from datetime import datetime, date

dataFilePath = "pi/projects/cd_project/moistureData.csv"


class MosistureDataPoint:
    def __init__(self, value: int, dateTime: datetime):
        self.value = value
        self.dateTime = dateTime


def sendAlertEmail():
    smtpServer = "smtp.gmail.com"
    smtpServerPort = 465
    sslContext = ssl.create_default_context()
    # Headers must start at column 0, and a blank line separates them from the body.
    message = (
        "Subject: [Alert] Moisture level has dropped below set threshold!!!\n"
        "\n"
        "Sensor has detected that the moisture level has dropped below set threshold."
    )

    with smtplib.SMTP_SSL(smtpServer, smtpServerPort, context=sslContext) as server:
        server.login("alert@gmail.com", "password")
        server.sendmail("alert@gmail.com", "alert@gmail.com", message)


def getSeasonFromDate(date: date):
    month = date.month
    if 3 <= month <= 6:
        return "summer"
    elif 6 <= month <= 9:
        return "monsoon"
    elif 10 <= month <= 12 or 1 <= month <= 2:
        return "winter"
    else:
        return "Invalid Month"


def writeToFile(dataPoints: list[MosistureDataPoint]):
    with open(dataFilePath, mode="w") as dataFile:
        openFile = csv.writer(
            dataFile, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL
        )

        for dataPoint in dataPoints:
            openFile.writerow(
                [
                    dataPoint.dateTime.strftime("%Y-%m-%d %H:%M:%S"),
                    getSeasonFromDate(dataPoint.dateTime.date()),
                    dataPoint.value,
                ]
            )


def sendDataToServer():
    receiverIP = "192.168.29.200"
    receiverPort = 8080
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    try:
        sock.connect((receiverIP, receiverPort))
        # Read the file once, in 1024-character chunks, and send each chunk.
        with open(dataFilePath, "r") as fi:
            data = fi.read(1024)
            while data:
                # sendall() keeps sending until the whole chunk has gone out.
                sock.sendall(data.encode())
                data = fi.read(1024)

    except (ConnectionRefusedError, TimeoutError) as e:
        print(f"Error connecting to the server: {e}")

    finally:
        sock.close()


def main():
    pullRate = 3600

    io.setmode(io.BCM)
    ipGpioPin = 18
    io.setup(ipGpioPin, io.IN)

    lastUploadDate = datetime.now().date()
    dataPoints = []
    try:
        while True:
            try:
                moistureValue = io.input(ipGpioPin)
                currDate = datetime.now().date()
                # Once the date changes, flush the previous day's readings.
                if currDate != lastUploadDate:
                    writeToFile(dataPoints)
                    sendDataToServer()
                    lastUploadDate = currDate
                    dataPoints = []

                if moistureValue == 0:
                    print("[WARNING] Moisture level has dropped below set threshold!!!")
                    sendAlertEmail()
                else:
                    print("[INFO] Moisture level above set threshold...")

                dataPoints.append(MosistureDataPoint(moistureValue, datetime.now()))

            except Exception as e:
                print(f"An error occurred: {e}")

            time.sleep(pullRate)

    finally:
        # Release the GPIO pins once, when the script exits.
        io.cleanup()


if __name__ == "__main__":
    main()



##### server_code.py #####

import joblib
import socket
import pandas as pd
from datetime import datetime
from sklearn.ensemble import RandomForestClassifier

BUFFER_SIZE = 1024
dataFilePath = "server/projects/cd_project/moistureData.csv"


def buildModel():
    # The Pi sends rows without a header, so the column names are supplied here.
    df = pd.read_csv(dataFilePath, names=["datetime", "season", "moisture_good"])

    df["datetime"] = pd.to_datetime(df["datetime"])
    df["year"] = df["datetime"].dt.year
    df["month"] = df["datetime"].dt.month
    df["day"] = df["datetime"].dt.day
    df["hour"] = df["datetime"].dt.hour
    df["season"] = df["season"].map({"summer": 0, "monsoon": 1, "winter": 2})
    df = df.drop(columns=["datetime"])

    X = df[["year", "month", "day", "hour", "season"]]
    y = df["moisture_good"]

    model = RandomForestClassifier()
    model.fit(X, y)

    joblib.dump(model, "soil_moisture_model-{}.joblib".format(datetime.now().date()))


def main():
    hostIP = "192.168.29.200"
    receivingPort = 8080
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((hostIP, receivingPort))

    sock.listen(1)

    try:
        while True:
            conn, addr = sock.accept()
            received = False
            # The with block closes the connection and the file even on errors.
            with conn, open(dataFilePath, "a") as fo:
                data = conn.recv(BUFFER_SIZE)
                while data:
                    fo.write(data.decode())
                    received = True
                    data = conn.recv(BUFFER_SIZE)

            # Retrain only when the client actually sent something.
            if received:
                buildModel()

    except Exception as e:
        print("An error occurred:", e)

    finally:
        sock.close()


if __name__ == "__main__":
    main()




##### predict.py #####

import joblib
import os
import re
import pandas as pd

def main():
    model_files = [
        f
        for f in os.listdir()
        if f.startswith("soil_moisture_model-") and f.endswith(".joblib")
    ]

    dates = [re.search(r"\d{4}-\d{2}-\d{2}", model).group() for model in model_files]

    latest_date = max(dates)
    # Relative name: run the script from the directory that os.listdir() scanned.
    latest_model_filename = f"soil_moisture_model-{latest_date}.joblib"

    model = joblib.load(latest_model_filename)

    print("Enter which date you want to predict:")

    year = int(input("Year: "))
    month = int(input("Month (1-12): "))
    day = int(input("Day (1-31): "))
    hour = int(input("Hour (0-23): "))
    season = int(input("Season (Summer->0 | Monsoon->1 | Winter->2): "))

    new_data = pd.DataFrame(
        {
            "year": [year],
            "month": [month],
            "day": [day],
            "hour": [hour],
            "season": [season],
        }
    )

    predictions = model.predict(new_data)

    # The Pi logs 0 when moisture is below the threshold and 1 otherwise.
    if predictions[0] == 0:
        print("Soil moisture content will be BELOW set threshold")
    else:
        print("Soil moisture content will be ABOVE set threshold")

if __name__ == "__main__":
    main()