Kaggle Datathon 23: the datasets are available on Kaggle.

About the competition
Proper hand hygiene is one of the most effective measures to halt the spread of COVID-19 and other pathogens. Simple handwashing not only protects individuals from contracting a disease but also prevents transmission to others. That makes the following numbers concerning: 2.2 billion people worldwide lack access to safe water at home, an additional 1.37 billion lack handwashing facilities at home, and nearly two billion people rely on healthcare facilities that lack basic water services.
Climate change, population growth, and pollution are threatening the world's water resources. As the global population continues to grow, the challenge of securing sufficient water while preserving the integrity of aquatic ecosystems persists. The Pacific Institute works with stakeholders worldwide to address water resource issues and to ensure that communities and nature have the water they need to thrive, now and in the future.
Understanding water sanitation and ensuring clean water is crucial in both rural and urban areas, and one way to do this is to assess the quality of the water we consume daily. The objective of this competition is to train a machine learning model on the water quality data in the training file and use it to predict the water quality result for each row of the test dataset. For further information on the competition, including instructions on submitting predictions, see Kaggle's competition documentation at www.kaggle.com/docs/competitions.
About the data set
The dataset provided in train.csv consists of the following columns:
– id: the unique ID for each row.
– categoryA to categoryF: 6 category columns with suffixes A to F.
– featureA to featureI: 9 feature columns with suffixes A to I.
– compositionA to compositionJ: 10 composition columns with suffixes A to J.
– unit: the unit of measurement for the result values.
– result: the measure of water quality (the target variable).
The datasets can be read with the read_csv() function from the pandas module:

# code to read the dataset
import pandas as pd
train = pd.read_csv("train.csv")

Acknowledgements
We thank the European Environment Agency and The World Bank for providing this dataset.
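Building on the column list above, here is a minimal sketch of separating the inputs from the target before training. It assumes the exact column names given in the description (id, result, and the category/feature/composition prefixes); check them against the real file. The helpers load_train and column_groups are hypothetical names, and the tiny in-memory CSV holds made-up values that stand in for train.csv.

```python
import io

import pandas as pd


def load_train(path_or_buffer):
    """Read the training data and split it into inputs X and target y.

    Assumes the column names from the data description (id, result, ...).
    """
    df = pd.read_csv(path_or_buffer)
    # The target variable is the water quality measure in `result`.
    y = df["result"]
    # Drop the row identifier and the target; everything else is an input.
    X = df.drop(columns=["id", "result"])
    return X, y


def column_groups(columns):
    """Group column names by prefix: category, feature, composition."""
    return {
        prefix: [c for c in columns if c.startswith(prefix)]
        for prefix in ("category", "feature", "composition")
    }


# Made-up two-row CSV standing in for train.csv (hypothetical values).
sample = io.StringIO(
    "id,categoryA,featureA,compositionA,unit,result\n"
    "1,x,0.5,1.2,mg/l,3.0\n"
    "2,y,0.7,0.9,mg/l,4.5\n"
)
X, y = load_train(sample)
print(column_groups(X.columns))
```

With the real file, pass "train.csv" instead of the in-memory sample. The unit column is left outside the three groups because it describes how result was measured rather than being a numeric feature; how to encode it is a modelling choice.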
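Evaluation Metric
The evaluation metric for this competition is Root Mean Squared Logarithmic Error (RMSLE), calculated as

$$\mathrm{RMSLE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \log(p_i + 1) - \log(a_i + 1) \right)^2}$$

where n is the number of rows scored, p_i is the predicted result for row i, a_i is the actual result for row i, and log is the natural logarithm.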
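The metric can be computed locally to validate a model before submitting. Below is a minimal standard-library sketch, assuming the standard RMSLE definition with natural logs matches the competition's scorer. It also includes a hypothetical helper, best_constant, showing that the single constant minimising RMSLE is expm1(mean(log1p(a_i))) rather than the plain mean of the targets.

```python
import math


def rmsle(actual, predicted):
    """Root Mean Squared Logarithmic Error with natural logs.

    math.log1p(x) computes log(x + 1) accurately for small x.
    Values should be non-negative (strictly, greater than -1).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [
        (math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared) / len(squared))


def best_constant(actual):
    """Constant prediction c minimising RMSLE on `actual` (hypothetical helper).

    Minimising sum((log1p(c) - log1p(a_i)) ** 2) over c gives
    log1p(c) = mean(log1p(a_i)), i.e. c = expm1(mean(log1p(a_i))).
    """
    mean_log = sum(math.log1p(a) for a in actual) / len(actual)
    return math.expm1(mean_log)


print(rmsle([3.0, 4.5], [3.0, 4.5]))  # perfect predictions give 0.0
```

The same reasoning suggests a common design choice: train a regressor with ordinary squared error on log1p(result) and apply expm1 to its predictions, so the training loss matches the metric.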
