Overview

A pandemic is an epidemic that has spread beyond a single geographical area. The number of pandemic diseases has grown over time, and the most recent is caused by a highly infectious virus from the subfamily Orthocoronavirinae of the family Coronaviridae. The first case of COrona VIrus Disease 2019 (COVID-19) was detected in Wuhan, a city in China, around 17 November 2019. Symptoms such as high fever, cough and shortness of breath usually appear within about 14 days of exposure. Within 3 months, COVID-19 had infected more than 3 lakh (300,000) people around the world. Here, we take a deep dive into predicting pandemic diseases and modelling their spread with Machine Learning models and classifiers.

In Machine Learning, many models have been built to predict epidemic as well as pandemic diseases. Here we discuss the Naive Bayes, Random Forest and AdaBoost classifiers. Others include SVM, XGBoost, Gradient Boosting and many more.

We can also estimate how far a pandemic disease will spread. Here we go through the basic compartmental models used to describe how a disease spreads among people. Machine learning lets a model learn from a dataset and predict the future course of the disease as accurately as the data allows. It can also estimate the spread of disease among individuals and so help us take the necessary precautions.

What is a Pandemic Disease?

A pandemic is an epidemic that has spread across countries or around the globe. The word itself means widespread or prevalent. Diseases that have caused pandemics or serious outbreaks include pandemic flu, smallpox, monkeypox, Nipah virus, SARS and HIV/AIDS, and we are now in the outbreak of the novel coronavirus. In principle, an outbreak of any infectious disease can grow into a pandemic.

How will we predict Pandemic Disease?

Before building our model or classifier, we have to go through these stages to predict a pandemic disease.

Retrieval of the dataset: Collecting the dataset from websites or tweet datasets.

Preprocessing the dataset: This is the most important step before building the model. Here we clean the dataset by transforming raw facts into an informative format.

Feature Extraction: This is the process of removing noisy or unwanted columns from our dataset.

Train-Test split: Here we split our dataset into training and testing data, for example in a proportion of 70% and 30% respectively.
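The stages listed here can be sketched in a few lines of plain Python. This is only a minimal illustration, not a real pipeline: the records are made up, and the names `raw_records`, `preprocess`, `select_features` and `train_test_split` are hypothetical helpers introduced for this example.

```python
import random

# Made-up records standing in for a retrieved dataset (stage 1).
# Columns: record_id, fever, cough, breathless, label. "" marks a missing value.
raw_records = [
    (1, "1", "1", "1", "positive"), (2, "1", "0", "1", "positive"),
    (3, "0", "0", "0", "negative"), (4, "1", "1", "0", "positive"),
    (5, "0", "1", "0", "negative"), (6, "", "1", "0", "negative"),
    (7, "0", "0", "1", "negative"), (8, "1", "1", "1", "positive"),
    (9, "0", "0", "0", "negative"), (10, "1", "0", "0", "negative"),
    (11, "0", "1", "1", "positive"),
]
COLUMNS = ("record_id", "fever", "cough", "breathless", "label")


def preprocess(records):
    """Stage 2: drop incomplete records and turn symptom flags into integers."""
    cleaned = []
    for record in records:
        if "" in record:
            continue  # skip records with a missing value
        row = dict(zip(COLUMNS, record))
        for name in ("fever", "cough", "breathless"):
            row[name] = int(row[name])
        cleaned.append(row)
    return cleaned


def select_features(rows, keep):
    """Stage 3: keep only the useful columns and drop the rest."""
    return [{name: row[name] for name in keep} for row in rows]


def train_test_split(rows, test_ratio=0.3, seed=0):
    """Stage 4: shuffle the rows, then split them into training and testing sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


cleaned = preprocess(raw_records)
selected = select_features(cleaned, keep=("fever", "cough", "breathless", "label"))
train_rows, test_rows = train_test_split(selected, test_ratio=0.3)
print(len(train_rows), "training rows,", len(test_rows), "testing rows")
```

One of the eleven made-up records is incomplete, so ten remain after cleaning and the 70/30 split yields 7 training rows and 3 testing rows. The `record_id` column is dropped because it carries no information about the disease.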

Types of Classifiers

After going through the stages mentioned above, we are ready to build our model. Machine learning algorithms that give a categorical or binary value as output are called classifiers, and the process is called classification. A classification algorithm learns from the labelled examples in the training set and then predicts the class of new examples.

Naive Bayes Classifier

Naive Bayes, also known as a probabilistic classifier, is one of the simplest and most basic classifiers. Let the features run from x0 to xn and the classes from c0 to cm. The model estimates the probability of each feature value occurring in each class and predicts the most likely class using Bayes' rule. It is called naive because it assumes the features are independent of one another given the class. That assumption rarely holds exactly, so its accuracy is often lower than that of the other classifiers discussed here. Even so, Naive Bayes handles multiple classes well and is a common choice for text classification.

Random Forest Classifier

Random Forest is an ensemble classifier, which means it combines a group of classifiers. It builds a number of decision trees, each from a random sample of the training dataset, lets every tree vote for a class, and predicts the class with the most votes. It tends to give good accuracy on large datasets and is often described as robust to missing values. We can use the CORONA VIRUS (COVID-19) TWEETS DATASET, which contains the live feed of COVID-19 related tweets.

AdaBoost Classifier

AdaBoost is a boosting classifier; boosting means improving a set of weak classifiers by combining them. Like Random Forest, it is an ensemble classifier that decides by voting, but the votes are weighted. After each weak classifier is trained, it is assigned a weight based on its accuracy, so more accurate classifiers have a larger say in the final prediction. Each new classifier also concentrates on the examples that earlier ones misclassified, which is why AdaBoost often gives better accuracy than Naive Bayes and can match or beat Random Forest, depending on the dataset.

These models help us predict the outbreak of epidemic or pandemic diseases like coronavirus. Once a model is trained, we obtain its predictions and evaluate the results.
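To make the Naive Bayes idea concrete, here is a minimal sketch written from scratch in Python. It assumes binary symptom features and uses add-one (Laplace) smoothing, a detail the text does not mention. The data and the names `train_naive_bayes` and `predict` are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, labels):
    """Count how often each class and each (class, feature, value) occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    for features, label in zip(samples, labels):
        for index, value in enumerate(features):
            value_counts[(label, index)][value] += 1
    return class_counts, value_counts


def predict(model, features, n_values=2):
    """Pick the class c that maximises P(c) * product of P(x_i | c)."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # Work in log space: log P(c) + sum of log P(x_i | c).
        score = math.log(count / total)
        for index, value in enumerate(features):
            seen = value_counts[(label, index)][value]
            # Add-one smoothing keeps unseen values from giving probability 0.
            score += math.log((seen + 1) / (count + n_values))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data: (fever, cough, breathless) -> class.
samples = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1),
           (0, 0, 0), (0, 1, 0), (1, 0, 0), (0, 0, 1)]
labels = ["infected"] * 4 + ["healthy"] * 4

model = train_naive_bayes(samples, labels)
print(predict(model, (1, 1, 1)))  # all three symptoms present
print(predict(model, (0, 0, 0)))  # no symptoms
```

On this toy data the first call returns `infected` and the second `healthy`. The "naive" part is the inner loop: each feature contributes its own probability independently of the others.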
Predicting a pandemic disease such as COVID-19 or pandemic flu can help individuals in a particular place take safety measures against the outbreak.
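The difference between the voting used by Random Forest and by AdaBoost can be sketched as follows. This covers only the final voting step, not the training of the trees or weak classifiers. The weight formula is the one used by the standard two-class AdaBoost algorithm, and the predictions and error rates are made-up numbers.

```python
import math
from collections import Counter


def majority_vote(predictions):
    """Random Forest style: every tree gets one equal vote."""
    return Counter(predictions).most_common(1)[0][0]


def adaboost_weight(error_rate):
    """Vote weight of a weak classifier; a lower error gives a larger weight."""
    return 0.5 * math.log((1 - error_rate) / error_rate)


def weighted_vote(predictions, error_rates):
    """AdaBoost style: add up the weights behind each class, pick the heaviest."""
    totals = Counter()
    for label, error in zip(predictions, error_rates):
        totals[label] += adaboost_weight(error)
    return max(totals, key=totals.get)


# Three hypothetical classifiers and their (made-up) training error rates.
predictions = ["outbreak", "no outbreak", "no outbreak"]
error_rates = [0.10, 0.40, 0.45]

print(majority_vote(predictions))
print(weighted_vote(predictions, error_rates))
```

With equal votes the result is `no outbreak`, because two of the three classifiers say so. With weighted votes the result is `outbreak`, because the single accurate classifier outweighs the two weak ones.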



How will we find the spread of a pandemic outbreak?

The pandemic coronavirus spreads very rapidly. It passes from person to person, mainly through respiratory droplets released when an infected person coughs or sneezes. Here we use compartmental models to describe the spread.

In these models, there are 3 main compartments:

Susceptible are those individuals who can be infected by the disease.

Infectious are those individuals who are currently infected and can pass the disease on.

Recovered are those individuals who have recovered and retained their immunity.

There are mainly 3 types of compartmental models: the SIR, SIS and SI models.





Fig 1: This graph shows the rapid growth of coronavirus cases in China. The X-axis denotes the month and the Y-axis the number of cases.



Image Credit: Nature

In the SI model, individuals move from the susceptible state to the infected state and remain infected throughout. COVID-19 is not a disease in which individuals remain infected for life, so we do not use the SI model to describe its spread.





Fig 2: Here beta is the transmission rate
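For reference, one common way to write the SI model as equations is shown below. It assumes a fixed total population N, a symbol that does not appear in the text above, with S the number of susceptible and I the number of infected individuals.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\frac{\beta S I}{N}, \\
\frac{dI}{dt} &= \frac{\beta S I}{N}, \qquad S + I = N.
\end{aligned}
```

Nothing ever leaves the infected compartment, so I can only grow until everyone is infected.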

SIS is similar to the SI model, except that infected individuals do not move to a recovered state but return to the susceptible state. This model assumes no lasting immunity, so it is also not used here for the spread of coronavirus.





Fig 3: Beta is the transmission rate and gamma the recovery rate
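Under the same assumption of a fixed total population N (a symbol introduced here for illustration), the SIS model is commonly written as:

```latex
\begin{aligned}
\frac{dS}{dt} &= -\frac{\beta S I}{N} + \gamma I, \\
\frac{dI}{dt} &= \frac{\beta S I}{N} - \gamma I, \qquad S + I = N.
\end{aligned}
```

The term \gamma I is the flow of individuals who stop being infected and go straight back to the susceptible compartment.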

In the SIR model, individuals move from the susceptible state to the infected state, and from there to the recovered state. In other words, infected individuals can recover from the disease and retain their immunity. COVID-19 is a disease from which people recover, so we can use the SIR model to describe the spread of the disease among individuals.

Fig 4: Here beta is the transmission rate and gamma the recovery rate
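The SIR model can be simulated with a few lines of Python. The sketch below uses a simple forward-Euler step and a fixed population. The function name `simulate_sir` and all parameter values are illustrative assumptions and are not fitted to real COVID-19 data.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, steps_per_day=10):
    """Integrate the SIR equations with small forward-Euler time steps.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    dt = 1.0 / steps_per_day
    history = [(s, i, r)]  # one (S, I, R) entry per day, starting at day 0
    for _day in range(days):
        for _step in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative values only: 1000 people, 1 initial case,
# transmission rate beta = 0.3 per day, recovery rate gamma = 0.1 per day.
history = simulate_sir(population=1000, initial_infected=1,
                       beta=0.3, gamma=0.1, days=160)

peak_day = max(range(len(history)), key=lambda day: history[day][1])
print("Infections peak on day", peak_day)
print("Recovered after 160 days:", round(history[-1][2]))
```

Every person who leaves one compartment enters the next, so S + I + R stays equal to the population throughout. Raising beta or lowering gamma makes the peak of infections higher and earlier.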

Conclusion

Getting nervous is not a solution; taking precautions and spreading awareness in society is. Predicting the virus makes people aware of the disease so that they can take precautionary measures and help the country fight it. Spreading models help individuals understand how the disease spreads. We have all heard the saying “prevention is better than cure”: it is better to take precautions against the disease than to look for cures. These prediction and spreading models help people understand the rate and transmission of the disease so that they can act at the earliest.
