Membandingkan Nilai Akurasi BERT dan DistilBERT pada Dataset Twitter
Main Article Content
Abstract
The growth of digital media has been extremely rapid, making information consumption a challenging task. Social media processing aided by machine learning has been very helpful in the digital era. Sentiment analysis is a fundamental task in Natural Language Processing (NLP). As the number of social media users increases, the amount of data stored on social media platforms also grows rapidly, and many researchers are therefore conducting studies that utilize social media data. Opinion Mining (OM), or Sentiment Analysis (SA), is one of the methods used to analyze information contained in text from social media, and several previous studies have approached such prediction tasks with established Data Mining (DM) techniques. The objective of this research is to compare the accuracy values of BERT and DistilBERT. DistilBERT is a technique derived from BERT that offers greater speed while maximizing classification performance. The research findings indicate that the DistilBERT method achieved an accuracy of 97%, precision of 99%, recall of 99%, and F1-score of 99%, which is higher than BERT, which yielded an accuracy of 87%, precision of 91%, recall of 91%, and F1-score of 89%.
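To illustrate how such a comparison is typically set up, the following is a minimal sketch (not the authors' published pipeline) that fine-tunes BERT and DistilBERT with the Hugging Face Transformers Trainer on a tweet-sentiment CSV and reports the metrics quoted above. The checkpoint names, file names, column names ("text", "label"), and hyperparameters are illustrative assumptions.

import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

def compute_metrics(eval_pred):
    # Accuracy, precision, recall, and F1 on the evaluation split
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0)
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision, "recall": recall, "f1": f1}

def evaluate_checkpoint(checkpoint):
    # Assumed layout: CSV files with "text" and "label" columns
    data = load_dataset("csv", data_files={"train": "tweets_train.csv",
                                           "test": "tweets_test.csv"})
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    data = data.map(lambda batch: tokenizer(batch["text"], truncation=True,
                                            padding="max_length", max_length=128),
                    batched=True)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out-" + checkpoint.replace("/", "-"),
                               num_train_epochs=3, per_device_train_batch_size=16),
        train_dataset=data["train"],
        eval_dataset=data["test"],
        compute_metrics=compute_metrics,
    )
    trainer.train()
    return trainer.evaluate()

# Same data and training budget for both checkpoints: a direct comparison
for ckpt in ("bert-base-uncased", "distilbert-base-uncased"):
    print(ckpt, evaluate_checkpoint(ckpt))

Running both checkpoints against the same split keeps the comparison fair, so differences in the reported metrics reflect the models rather than the data.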
Article Details
How to Cite
Fajri, F., Tutuko, B., & Sukemi, S. (2022). Membandingkan Nilai Akurasi BERT dan DistilBERT pada Dataset Twitter. JUSIFO (Jurnal Sistem Informasi), 8(2), 71-80. https://doi.org/10.19109/jusifo.v8i2.13885
Section
Articles
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
References
Acheampong, F. A., Nunoo-Mensah, H., & Chen, W. (2021). Recognizing emotions from texts using an ensemble of transformer-based language models. 18th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), 161–164. https://doi.org/10.1109/ICCWAMTIP53232.2021.9674102
Adel, H., Dahou, A., Mabrouk, A., Elaziz, M. A., Kayed, M., El-Henawy, I. M., Alshathri, S., & Ali, A. A. (2022). Improving crisis events detection using DistilBERT with Hunger Games Search algorithm. Mathematics, 10(3), 447. https://doi.org/10.3390/MATH10030447
Adoma, A. F., Henry, N. M., & Chen, W. (2020). Comparative analyses of BERT, RoBERTa, DistilBERT, and XLNet for text-based emotion recognition. 17th International Computer Conference on Wavelet Active Media Technology and Information Processing, ICCWAMTIP 2020, 117–121. https://doi.org/10.1109/ICCWAMTIP51612.2020.9317379
Ayoub, J., Yang, X. J., & Zhou, F. (2021). Combat COVID-19 infodemic using explainable natural language processing models. Information Processing & Management, 58(4), 102569. https://doi.org/10.1016/J.IPM.2021.102569
Basiri, M. E., Nemati, S., Abdar, M., Asadi, S., & Acharya, U. R. (2021). A novel fusion-based deep learning model for sentiment analysis of COVID-19 tweets. Knowledge-Based Systems, 228, 107242. https://doi.org/10.1016/J.KNOSYS.2021.107242
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of NAACL-HLT 2019, 4171–4186. https://aclanthology.org/N19-1423.pdf
Do, P., & Phan, T. H. V. (2021). Developing a BERT-based triple classification model using knowledge graph embedding for question answering system. Applied Intelligence, 52(1), 636–651. https://doi.org/10.1007/S10489-021-02460-W
Dogra, V., Singh, A., Verma, S., Kavita, K., Jhanjhi, N. Z., & Talib, M. N. (2021). Analyzing DistilBERT for sentiment classification of banking financial news. Lecture Notes in Networks and Systems, 248, 501–510. https://doi.org/10.1007/978-981-16-3153-5_53/COVER
Faturrohman, F., & Rosmala, D. (2022). Analisis sentimen sosial media dengan metode bidirectional gated recurrent unit. Prosiding Diseminasi FTI. https://eproceeding.itenas.ac.id/index.php/fti/article/view/962
Gao, Z., Feng, A., Song, X., & Wu, X. (2019). Target-dependent sentiment classification with BERT. IEEE Access, 7, 154290–154299. https://doi.org/10.1109/ACCESS.2019.2946594
Geetha, M. P., & Karthika Renuka, D. (2021). Improving the performance of aspect based sentiment analysis using fine-tuned BERT Base Uncased model. International Journal of Intelligent Networks, 2, 64–69. https://doi.org/10.1016/J.IJIN.2021.06.005
Gimpel, K., Schneider, N., O'Connor, B., Das, D., Mills, D., Eisenstein, J., Heilman, M., Yogatama, D., Flanigan, J., & Smith, N. A. (2011). Part-of-speech tagging for Twitter: Annotation, features, and experiments. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 42–47. https://aclanthology.org/P11-2008.pdf
Hermanto, D. T., Setyanto, A., & Luthfi, E. T. (2021). Algoritma lstm-cnn untuk binary klasifikasi dengan word2vec pada media online. Creative Information Technology Journal, 8(1), 64–77. https://doi.org/10.24076/CITEC.2021V8I1.264
Huddar, M. G., Sannakki, S. S., & Rajpurohit, V. S. (2021). Attention-based multimodal contextual fusion for sentiment and emotion classification using bidirectional LSTM. Multimedia Tools and Applications, 80(9), 13059–13076. https://doi.org/10.1007/S11042-020-10285-X/METRICS
Joulin, A., Grave, É., Bojanowski, P., & Mikolov, T. (2017). Bag of tricks for efficient text classification. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, 427–431. https://aclanthology.org/E17-2068
Naseem, U., Razzak, I., & Eklund, P. W. (2021). A survey of pre-processing techniques to improve short-text quality: A case study on hate speech detection on Twitter. Multimedia Tools and Applications, 80(28–29), 35239–35266. https://doi.org/10.1007/S11042-020-10082-6/METRICS
Nurrohmat, M. A., & SN, A. (2019). Sentiment analysis of novel review using long short-term memory method. IJCCS (Indonesian Journal of Computing and Cybernetics Systems), 13(3), 209–218. https://doi.org/10.22146/IJCCS.41236
Pontiki, M., Galanis, D., Papageorgiou, H., Androutsopoulos, I., Manandhar, S., Al-Smadi, M., Al-Ayyoub, M., Zhao, Y., Qin, B., De Clercq, O., Hoste, V., Apidianaki, M., Tannier, X., Loukachevitch, N., Kotelnikov, E., Bel, N., Jiménez-Zafra, S. M., & Eryigit, G. (2016). Aspect based sentiment analysis. SemEval 2016 - 10th International Workshop on Semantic Evaluation, Proceedings, 19–30. https://doi.org/10.18653/V1/S16-1002
Preite, S. (2019). Deep question answering: A new teacher for DistilBERT [Master's thesis, University of Bologna]. https://amslaurea.unibo.it/20384/1/MasterThesisBologna.pdf
Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv. https://arxiv.org/abs/1910.01108v4
Santosa, R. D. W., Bijaksana, M. A., & Romadhony, A. (2021). Implementasi algoritma long short-term memory (lstm) untuk mendeteksi penggunaan kalimat abusive pada teks bahasa indonesia. EProceedings of Engineering, 8(1). https://openlibrarypublications.telkomuniversity.ac.id/index.php/engineering/article/view/14318