Membandingkan Nilai Akurasi BERT dan DistilBERT pada Dataset Twitter
Main Article Content
Abstract
The growth of digital media has been extremely rapid, making information consumption a challenging task. Social media processing aided by machine learning has been very helpful in the digital era. Sentiment analysis is a fundamental task in Natural Language Processing (NLP). As the number of social media users increases, the amount of data stored on social media platforms also grows rapidly, and many researchers are therefore conducting studies that utilize social media data. Opinion Mining (OM), or Sentiment Analysis (SA), is one of the methods used to analyze information contained in text from social media, and several previous studies have approached such prediction tasks with established Data Mining (DM) techniques. The objective of this research is to compare the accuracy values of BERT and DistilBERT. DistilBERT is a technique derived from BERT that offers greater speed while maximizing classification performance. The research findings indicate that the DistilBERT method achieved an accuracy of 97%, precision of 99%, recall of 99%, and F1-score of 99%, which is higher than BERT, which yielded an accuracy of 87%, precision of 91%, recall of 91%, and F1-score of 89%.
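To illustrate how such a comparison is typically set up, the following is a minimal sketch (not the authors' published pipeline) that fine-tunes BERT and DistilBERT with the Hugging Face Transformers Trainer on a tweet-sentiment CSV and reports the metrics quoted above. The checkpoint names, file names, column names ("text", "label"), and hyperparameters are illustrative assumptions.

import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

def compute_metrics(eval_pred):
    # Accuracy, precision, recall, and F1 on the evaluation split
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0)
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision, "recall": recall, "f1": f1}

def evaluate_checkpoint(checkpoint):
    # Assumed layout: CSV files with "text" and "label" columns
    data = load_dataset("csv", data_files={"train": "tweets_train.csv",
                                           "test": "tweets_test.csv"})
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    data = data.map(lambda batch: tokenizer(batch["text"], truncation=True,
                                            padding="max_length", max_length=128),
                    batched=True)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out-" + checkpoint.replace("/", "-"),
                               num_train_epochs=3, per_device_train_batch_size=16),
        train_dataset=data["train"],
        eval_dataset=data["test"],
        compute_metrics=compute_metrics,
    )
    trainer.train()
    return trainer.evaluate()

# Same data and training budget for both checkpoints: a direct comparison
for ckpt in ("bert-base-uncased", "distilbert-base-uncased"):
    print(ckpt, evaluate_checkpoint(ckpt))

Running both checkpoints against the same split keeps the comparison fair, so differences in the reported metrics reflect the models rather than the data.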
Article Details
How to Cite
Fajri, F., Tutuko, B., & Sukemi, S. (2022). Membandingkan Nilai Akurasi BERT dan DistilBERT pada Dataset Twitter. JUSIFO (Jurnal Sistem Informasi), 8(2), 71-80. https://doi.org/10.19109/jusifo.v8i2.13885
Section
Articles
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
References
Acheampong, F. A., Nunoo-Mensah, H., & Chen, W. (2021). Recognizing emotions from texts using an ensemble of transformer-based language models. 18th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), 161–164. https://doi.org/10.1109/ICCWAMTIP53232.2021.9674102
Adel, H., Dahou, A., Mabrouk, A., Elaziz, M. A., Kayed, M., El-Henawy, I. M., Alshathri, S., & Ali, A. A. (2022). Improving crisis events detection using DistilBERT with Hunger Games Search algorithm. Mathematics, 10(3), 447. https://doi.org/10.3390/MATH10030447
Adoma, A. F., Henry, N. M., & Chen, W. (2020). Comparative analyses of BERT, RoBERTa, DistilBERT, and XLNet for text-based emotion recognition. 17th International Computer Conference on Wavelet Active Media Technology and Information Processing, ICCWAMTIP 2020, 117–121. https://doi.org/10.1109/ICCWAMTIP51612.2020.9317379
Ayoub, J., Yang, X. J., & Zhou, F. (2021). Combat COVID-19 infodemic using explainable natural language processing models. Information Processing & Management, 58(4), 102569. https://doi.org/10.1016/J.IPM.2021.102569
Basiri, M. E., Nemati, S., Abdar, M., Asadi, S., & Acharya, U. R. (2021). A novel fusion-based deep learning model for sentiment analysis of COVID-19 tweets. Knowledge-Based Systems, 228, 107242. https://doi.org/10.1016/J.KNOSYS.2021.107242
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of NAACL-HLT 2019, 4171–4186. https://aclanthology.org/N19-1423.pdf
Do, P., & Phan, T. H. V. (2021). Developing a BERT-based triple classification model using knowledge graph embedding for question answering system. Applied Intelligence, 52(1), 636–651. https://doi.org/10.1007/S10489-021-02460-W
Dogra, V., Singh, A., Verma, S., Kavita, K., Jhanjhi, N. Z., & Talib, M. N. (2021). Analyzing DistilBERT for sentiment classification of banking financial news. Lecture Notes in Networks and Systems, 248, 501–510. https://doi.org/10.1007/978-981-16-3153-5_53/COVER
Faturrohman, F., & Rosmala, D. (2022). Analisis sentimen sosial media dengan metode bidirectional gated recurrent unit. Prosiding Diseminasi FTI. https://eproceeding.itenas.ac.id/index.php/fti/article/view/962
Gao, Z., Feng, A., Song, X., & Wu, X. (2019). Target-dependent sentiment classification with BERT. IEEE Access, 7, 154290–154299. https://doi.org/10.1109/ACCESS.2019.2946594
Geetha, M. P., & Karthika Renuka, D. (2021). Improving the performance of aspect based sentiment analysis using fine-tuned BERT Base Uncased model. International Journal of Intelligent Networks, 2, 64–69. https://doi.org/10.1016/J.IJIN.2021.06.005
Gimpel, K., Schneider, N., O'Connor, B., Das, D., Mills, D., Eisenstein, J., Heilman, M., Yogatama, D., Flanigan, J., & Smith, N. A. (2011). Part-of-speech tagging for Twitter: Annotation, features, and experiments. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 42–47. https://aclanthology.org/P11-2008.pdf
Hermanto, D. T., Setyanto, A., & Luthfi, E. T. (2021). Algoritma lstm-cnn untuk binary klasifikasi dengan word2vec pada media online. Creative Information Technology Journal, 8(1), 64–77. https://doi.org/10.24076/CITEC.2021V8I1.264
Huddar, M. G., Sannakki, S. S., & Rajpurohit, V. S. (2021). Attention-based multimodal contextual fusion for sentiment and emotion classification using bidirectional LSTM. Multimedia Tools and Applications, 80(9), 13059–13076. https://doi.org/10.1007/S11042-020-10285-X/METRICS
Joulin, A., Grave, É., Bojanowski, P., & Mikolov, T. (2017). Bag of tricks for efficient text classification. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, 427–431. https://aclanthology.org/E17-2068
Naseem, U., Razzak, I., & Eklund, P. W. (2021). A survey of pre-processing techniques to improve short-text quality: A case study on hate speech detection on Twitter. Multimedia Tools and Applications, 80(28–29), 35239–35266. https://doi.org/10.1007/S11042-020-10082-6/METRICS
Nurrohmat, M. A., & SN, A. (2019). Sentiment analysis of novel review using long short-term memory method. IJCCS (Indonesian Journal of Computing and Cybernetics Systems), 13(3), 209–218. https://doi.org/10.22146/IJCCS.41236
Pontiki, M., Galanis, D., Papageorgiou, H., Androutsopoulos, I., Manandhar, S., Al-Smadi, M., Al-Ayyoub, M., Zhao, Y., Qin, B., De Clercq, O., Hoste, V., Apidianaki, M., Tannier, X., Loukachevitch, N., Kotelnikov, E., Bel, N., Jiménez-Zafra, S. M., & Eryigit, G. (2016). Aspect based sentiment analysis. SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings, 19–30. https://doi.org/10.18653/V1/S16-1002
Preite, S. (2019). Deep question answering: A new teacher for DistilBERT [Master's thesis, University of Bologna]. https://amslaurea.unibo.it/20384/1/MasterThesisBologna.pdf
Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv. https://arxiv.org/abs/1910.01108v4
Santosa, R. D. W., Bijaksana, M. A., & Romadhony, A. (2021). Implementasi algoritma long short-term memory (lstm) untuk mendeteksi penggunaan kalimat abusive pada teks bahasa indonesia. EProceedings of Engineering, 8(1). https://openlibrarypublications.telkomuniversity.ac.id/index.php/engineering/article/view/14318