Evaluation of Data Clustering Accuracy using K-Means Algorithm

Authors

  • Suraya Suraya Institut Sains & Teknologi AKPRIND
  • Muhammad Sholeh Institut Sains & Teknologi AKPRIND
  • Uning Lestari Institut Sains & Teknologi AKPRIND

DOI:

https://doi.org/10.59653/ijmars.v2i01.504

Keywords:

metric, valuation, normalized, clustering, labels

Abstract

Data clustering is a data science method widely used in data analysis. It groups the records of a dataset in order to find patterns or relationships among them. This research evaluates the accuracy of data clustering with the K-Means algorithm on the wine dataset, which has 13 features describing the chemical characteristics of three types of wine. The goal of the clustering process is to obtain the best clustering evaluation metrics. The K-Means clustering results are evaluated and compared using the Davies-Bouldin index and the Silhouette score. The research steps involve data standardization, selection of the optimal number of clusters, and assessment of clustering accuracy. The research method follows KDD (Knowledge Discovery in Databases), which consists of pre-processing, transformation, model building, and model evaluation. Experimental results show that appropriate parameters and cluster initialization can improve the clustering evaluation metrics. On the normalized dataset, Davies-Bouldin indicates 2 groups and Silhouette indicates 3 groups; before normalization, Davies-Bouldin indicates 7 groups and Silhouette indicates 2 groups. In conclusion, the normalized and non-normalized datasets produce different evaluation metrics. The number of groups to choose depends on the context of the data analysis; here 3 groups are selected, which can be labelled "Superior Variety", "Intermediate Variety", and "Standard Variety".
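The workflow summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes the wine data is the 13-feature set bundled with scikit-learn, that normalization means z-score standardization (`StandardScaler`), and that candidate cluster counts run from 2 to 10. The helper names `score_cluster_counts` and `best_k`, the search range, and the random seed are hypothetical choices, so the preferred cluster counts may differ from those reported in the paper.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler


def score_cluster_counts(X, k_values, seed=0):
    """Fit K-Means for each k and return {k: (silhouette, davies_bouldin)}."""
    scores = {}
    for k in k_values:
        # n_init=10 restarts K-Means from 10 initializations and keeps the best fit.
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = (silhouette_score(X, labels), davies_bouldin_score(X, labels))
    return scores


def best_k(scores):
    """Return (k with the highest Silhouette, k with the lowest Davies-Bouldin)."""
    by_silhouette = max(scores, key=lambda k: scores[k][0])
    by_davies_bouldin = min(scores, key=lambda k: scores[k][1])
    return by_silhouette, by_davies_bouldin


# Assumption: the scikit-learn copy of the wine data (178 samples, 13 features).
# The class labels are discarded because clustering is unsupervised.
X_raw, _ = load_wine(return_X_y=True)

# Assumption: "normalized" is taken to mean z-score standardization.
X_scaled = StandardScaler().fit_transform(X_raw)

k_values = range(2, 11)  # assumed search range for the number of clusters
raw_scores = score_cluster_counts(X_raw, k_values)
scaled_scores = score_cluster_counts(X_scaled, k_values)

for name, scores in [("raw", raw_scores), ("standardized", scaled_scores)]:
    sil_k, db_k = best_k(scores)
    print(f"{name}: Silhouette prefers k={sil_k}, Davies-Bouldin prefers k={db_k}")
```

A higher Silhouette score and a lower Davies-Bouldin index both indicate more compact, better-separated clusters, which is why the two helpers use `max` and `min` respectively. The two metrics need not agree on the same k, as the abstract's results illustrate.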

Published

2023-12-21

How to Cite

Suraya, S., Sholeh, M., & Lestari, U. (2023). Evaluation of Data Clustering Accuracy using K-Means Algorithm. International Journal of Multidisciplinary Approach Research and Science, 2(01), 385–396. https://doi.org/10.59653/ijmars.v2i01.504