1. COLON CANCER CLASSIFICATION (ML)
Predicting Length of Stay for Colon Cancer Patients
- The dataset comes from the University of Medicine and Pharmacy of Craiova, Romania. It contains information on the hospitalization period of 621 patients diagnosed with colorectal cancer who underwent surgery between 2007 and 2014. The dataset includes 8 attributes and a target attribute named Class (short, medium, long).
- During exploratory data analysis (EDA), I found that the dataset is imbalanced. The surgery, stage, and tumor parameters are the main attributes that influence the Class attribute.
- For data preprocessing, I used both label encoding and one-hot encoding. One-hot encoding yielded better accuracy compared to label encoding.
- I applied several algorithms: Naive Bayes (GaussianNB, BernoulliNB, MultinomialNB), Decision Trees, and ensemble algorithms (Random Forest and XGBClassifier).
- I used cross-validation to assess the models' performance. The accuracies were: GaussianNB - 0.70, BernoulliNB - 0.66, MultinomialNB - 0.67, DecisionTree - 0.67, Random Forest - 0.74, and XGBClassifier - 0.76.
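
The encoding comparison and cross-validation steps above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the data is randomly generated, the column names and category values are hypothetical, and scikit-learn's OrdinalEncoder stands in for label encoding because LabelEncoder is meant for target labels rather than feature columns. Stratified folds are an assumption, chosen because the classes are imbalanced.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import OrdinalEncoder

# Synthetic stand-in data: column names and values are hypothetical,
# not taken from the Craiova dataset.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "surgery": rng.choice(["open", "laparoscopic"], size=n),
    "stage": rng.choice(["I", "II", "III", "IV"], size=n),
    "tumor": rng.choice(["T1", "T2", "T3", "T4"], size=n),
})
# Imbalanced three-class target, mimicking the short/medium/long labels.
y = rng.choice(["short", "medium", "long"], size=n, p=[0.6, 0.3, 0.1])


def encode_label(frame):
    """Map each category to an integer code, one column per attribute."""
    return OrdinalEncoder().fit_transform(frame)


def encode_one_hot(frame):
    """Expand each category into its own 0/1 indicator column."""
    return pd.get_dummies(frame).to_numpy(dtype=float)


def mean_cv_accuracy(features, target, n_splits=5):
    """Mean accuracy over stratified folds, which preserve class ratios."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    scores = cross_val_score(model, features, target, cv=cv, scoring="accuracy")
    return scores.mean()


scores = {
    "label": mean_cv_accuracy(encode_label(X), y),
    "one_hot": mean_cv_accuracy(encode_one_hot(X), y),
}
for name, score in scores.items():
    print(f"{name} encoding: mean CV accuracy = {score:.2f}")
```

Because the features here are random noise, the two scores say nothing about which encoding is better; the sketch only shows how both encodings can be evaluated under the same folds and the same model.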

Colon Cancer project source code:
2. DATA VISUALIZATION
Data Visualization Using the Immigrants to Canada Dataset
- The dataset is taken from Kaggle. It consists of 195 records and 39 attributes covering the years 1980 to 2013. Of these attributes, 4 are object type and 35 are integer type.
- Line plot: used to plot the trend of immigrants from Haiti over the interval 1980 - 2013.
- Area plot: plotted for the top 5 immigrant countries.
- Histogram: shows the count in each bin.
- Bar chart: plotted the highest immigrant counts for the respective continents. I also observed the trend of Icelandic immigrants over the same interval.
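
The line and area plots described above can be sketched with pandas and matplotlib as shown below. This is only an illustration under stated assumptions: the table is made up, the country names ("Country A" and so on) and all numbers are placeholders rather than values from the Kaggle dataset, and the layout of one row per country with one integer column per year is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

years = list(range(1980, 2014))  # 1980 through 2013 inclusive

# Placeholder table: one row per country, one integer column per year.
# Names and numbers are hypothetical, not taken from the real dataset.
data = pd.DataFrame(
    {year: [100 + 3 * i, 50 + i, 10 + 2 * i] for i, year in enumerate(years)},
    index=["Country A", "Country B", "Country C"],
)

# Rank countries by total immigrants over the whole period.
totals = data.sum(axis=1)
top = totals.nlargest(2).index  # the project used the top 5; 2 fits this toy table

fig, (ax_line, ax_area) = plt.subplots(1, 2, figsize=(10, 4))

# Line plot: yearly trend for a single country.
data.loc["Country A"].plot(kind="line", ax=ax_line)
ax_line.set_title("Immigrants from Country A")
ax_line.set_xlabel("Year")
ax_line.set_ylabel("Number of immigrants")

# Area plot: transpose so years form the x-axis and countries form the series.
data.loc[top].T.plot(kind="area", ax=ax_area)
ax_area.set_title("Top countries by total immigrants")
ax_area.set_xlabel("Year")
ax_area.set_ylabel("Number of immigrants")

fig.tight_layout()
```

With the real data, the row label would be the country of interest (for example Haiti), and `fig.savefig(...)` or `plt.show()` would be called to view the result.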
