Dr. Jyoti Prakash Singh, Top 2% Scientist and Associate Professor: Sharing Insights and Innovations in Natural Language Processing.

Explore My Research Journal

Journal

Network Modeling Analysis in Health Informatics and Bioinformatics | Published: 29 March 2024

Identification of COVID-19 with CT scans using radiomics and DL-based features

Deep learning plays a crucial role in identifying COVID-19 patients from computed tomography (CT) scans by leveraging its ability to analyze vast amounts of image data and extract patterns indicative of the disease. While deep learning-based models have consistently achieved state-of-the-art performance, incorporating relevant handcrafted features alongside deep learning-based features has the potential to enhance overall performance even further. Therefore, this paper proposes a hybrid approach that combines handcrafted and deep learning features from CT scan images for accurate COVID-19 classification. Handcrafted features capturing image statistics are derived through radiomics, while deep learning features are extracted using the Xception model. Preprocessing techniques such as binary thresholding and segmentation are used to remove noise and locate the diseased area to enhance COVID-19 diagnosis. The approach is evaluated on a dataset of 2482 CT scan images and outperforms state-of-the-art techniques with an accuracy of 0.98, a positive predictive value (PPV) of 0.99, sensitivity of 0.99, specificity of 0.98, and an F1-score of 0.99. The combined use of radiomics and deep learning features makes the approach a promising tool for COVID-19 diagnosis and monitoring, offering support for clinical decision-making and potentially benefiting the diagnosis of other respiratory diseases.

Sunil Dalal
Jyoti Prakash Singh
Arvind Kumar Tiwari
Abhinav Kumar
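
The metrics reported in the abstract above (accuracy, PPV, sensitivity, specificity, F1-score) all derive from the four confusion-matrix counts. The following is a minimal sketch of those definitions; the function name `binary_metrics` and the counts are hypothetical and are not taken from the paper.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    ppv = tp / (tp + fp)            # positive predictive value (precision)
    sensitivity = tp / (tp + fn)    # recall / true-positive rate
    specificity = tn / (tn + fp)    # true-negative rate
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "accuracy": accuracy,
        "ppv": ppv,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not the paper's results).
metrics = binary_metrics(tp=90, fp=10, tn=80, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting PPV and sensitivity alongside accuracy matters when the two classes are imbalanced, since accuracy alone can hide a high false-negative rate.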

Engineering Applications of Artificial Intelligence | Published: 27 February 2024

Filtering offensive language from multilingual social media contents: A deep learning approach

In the face of uncontrolled offensive content on social media, automated detection emerges as a critical need. This paper tackles this challenge by proposing a novel approach for identifying offensive language in multilingual, code-mixed, and script-mixed settings. The study presents a novel multilingual hybrid dataset constructed by merging diverse monolingual and bilingual resources. Further, we systematically evaluate the impact of input representations (Word2Vec, Global Vectors for Word Representation (GloVe), Bidirectional Encoder Representations from Transformers (BERT), and uniform initialization) and deep learning models (Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (Bi-LSTM), Bi-LSTM-Attention, and fine-tuned BERT) on detection accuracy. Our comprehensive experiments on a dataset of 42,560 social media comments from five languages (English, Hindi, German, Tamil, and Malayalam) reveal the superiority of fine-tuned BERT. Notably, it achieves a macro-average F1-score of 0.79 for monolingual tasks and an impressive 0.86 for code-mixed and script-mixed tasks. These findings significantly advance offensive language detection methodologies and shed light on the complex dynamics of multilingual social media, paving the way for more inclusive and safer online communities.

Sunil Saumya
Abhinav Kumar
and Jyoti Prakash Singh

Journal of Computational Social Science | Published: 07 March 2024

A deep learning framework for clickbait spoiler generation and type identification

Clickbait pertains to attention-grabbing or misleading content that sacrifices accuracy for clicks. This marketing tactic is widely used to drive online traffic, but it can lead to misinformation, frustration, and a diminished user experience. Consequently, the timely identification and countering of clickbait posts is crucial. One way to counter clickbait posts is to spoil them by creating short messages that reveal their true content. This research generates short texts, called clickbait spoilers, for clickbait headlines. We have fine-tuned the Generative Pretrained Transformer 2 (GPT-2) medium model on the clickbait dataset to generate spoilers for the headlines. Since these spoilers vary from one word to multiple paragraphs, we also determine the type of each spoiler. For spoiler type identification, a Bidirectional Encoder Representations from Transformers (BERT) sentence encoder is used to generate embeddings of each sentence, followed by classification with a Support Vector Machine (SVM). The spoiler generation by GPT-2 yielded a Bilingual Evaluation Understudy (BLEU) score of 0.58, outperforming the previous state-of-the-art models. The spoiler identification model achieved a precision of 0.83, recall of 0.82, F1-score of 0.80, MCC score of 0.63, and accuracy of 0.83, surpassing previous state-of-the-art models.

Itishree Panda
Jyoti Prakash Singh
Gayadhar Pradhan

Multimedia Tools and Applications | Published: 06 February 2024

An ensemble approach to detect depression from social media platform: E-CLS

Depression, a prevalent adult symptom, can arise from various sources, including mental health conditions and social interactions. With the rise of social media, adults often share their daily experiences, potentially revealing their emotional state on social platforms, like X (formerly Twitter) and Facebook. In this study, we present Ensemble (E) of Convolutional Neural Network (C), Attention-based Long Short-Term Memory (L) Network, and Support Vector Machine (S) (E-CLS), utilizing Term Frequency-Inverse Document Frequency (TF-IDF) vectors, Global Vectors for Word Representation (GloVe) and Bidirectional Encoder Representations from Transformers (BERT) word embeddings. This model effectively identifies depressive posts. Validated with a Twitter-derived depressive dataset, E-CLS achieves an impressive F1-score of 0.91, surpassing existing machine-learning and deep-learning models by 2%. This research advances the detection of depression in social media posts, holding promise for enhanced mental health monitoring. Furthermore, our work contributes to the burgeoning field of mental health informatics by leveraging state-of-the-art techniques in natural language processing. The ensemble approach synergizes the strengths of Convolutional Neural Network (CNN) for local pattern recognition, Long Short-Term Memory (LSTM) Network for sequential context understanding, and Support Vector Machine (SVM) for robust classification. The incorporation of TF-IDF vectors and GloVe embeddings enriches feature representation, enhancing the model’s ability to discern nuanced linguistic cues associated with depression. By demonstrating superior performance over established models, E-CLS showcases its potential as a valuable tool in digital mental health interventions.

Shashank Shekher Tiwari
Rajnish Pandey
Akshay Deepak
Jyoti Prakash Singh & Sudhakar Tripathi

Applied Soft Computing | Published: 3 May 2024

Supervised weight learning-based PSO framework for single document extractive summarization

The need for automatic text summarization is natural: the huge volume of information available online has prompted widespread interest in extracting relevant information in a concise and understandable manner. Here, automated text summarization has been treated as an extractive single-document summarization problem in the proposed system. To solve this problem, a particle swarm optimisation (PSO) algorithm-based approach is suggested, with the goal of producing good summaries in terms of content coverage, informativeness, and readability. This paper introduces XSumm-PSO: a new approach that applies the PSO optimization technique in a supervised manner for extractive summarization. Further, this paper also contributes a new feature, “incorrect word”, that captures misspelled words in the candidate sentences. This feature is combined with nine existing features used by the proposed model to generate error-free summaries. As a result, the proposed XSumm-PSO framework produces superior performance, achieving improvements of +2.7%, +0.8%, and +0.8% for ROUGE-1, ROUGE-2, and ROUGE-L scores, respectively, on the DUC 2002 dataset, over state-of-the-art techniques. The corresponding improvements on the CNN/DailyMail dataset are +0.97%, +0.25%, and +0.49%. We also performed a sample t-test, showing that the proposed approach is statistically consistent across various runs.

Sangita Singh
Jyoti Prakash Singh
Akshay Deepak
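
The abstract above describes learning feature weights with PSO in a supervised manner. As a rough illustration of that idea only, the sketch below runs a basic global-best PSO to fit weights for three hypothetical sentence features against hypothetical reference scores. The feature matrix, target weights, hyperparameters, and function names are all assumptions; this is not the XSumm-PSO implementation, which uses ten features and its own objective.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, bounds=(0.0, 1.0), seed=0):
    """Basic global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical data: 4 sentences x 3 features, with reference scores
# generated from assumed "true" weights so the optimum is known.
FEATURES = [[1.0, 0.2, 0.0], [0.3, 0.9, 0.1], [0.1, 0.4, 0.8], [0.6, 0.6, 0.6]]
TRUE_W = [0.5, 0.3, 0.2]
TARGETS = [sum(w * f for w, f in zip(TRUE_W, row)) for row in FEATURES]


def squared_error(weights):
    """Sum of squared differences between weighted feature scores and targets."""
    return sum((sum(w * f for w, f in zip(weights, row)) - t) ** 2
               for row, t in zip(FEATURES, TARGETS))


best_w, best_err = pso_minimize(squared_error, dim=3)
print(best_w, best_err)
```

In a summarizer, the learned weights would score each candidate sentence, and the top-scoring sentences would form the extract.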

International Journal of System Assurance Engineering and Management | Published: 09 April 2024

Early prediction of promising expert users on community question answering sites

Community question answering (CQA) sites have become a popular medium for exchanging knowledge with other members of the community. Users can publish questions, answers, and comments on these sites. Furthermore, users of CQA sites are able to express their thoughts on a post by voting positively or negatively. People anticipate rapid and high-quality answers from these CQA sites, which are often provided by a small group of users known as experts. A large number of queries remain unanswered on these forums, emphasising the scarcity of experts. To address this problem, we present a methodology for predicting promising expert users for CQA sites. Promising experts are individuals who have just joined the community and have shown glimpses of producing high-quality content for the site. The suggested method looks at the first month of a user’s postings to determine whether or not the individual is a promising expert. The experimental findings revealed that the suggested approach accurately predicts future experts.

Pradeep Kumar Roy
Jyoti Prakash Singh

Journal of Electronic Imaging | Published: March 2024

Lightweight Image Encryption based on Composite Pseudo-Random Number Generator

With the ever-increasing use of the Internet, secure data transmission has become critical. The benefits that communication networks have provided are enormous, but they have come at the expense of data security. Privacy and security are becoming major concerns due to the rise in the use of public communication networks. Recently, various encryption schemes have been proposed, but most of them have high computational overhead and are not resistant to recent attacks. For data transmission networks, a secure and lightweight encryption (LWE) scheme has been developed. Using pseudo-random number sequences (PRNS) and DNA arithmetic, the plain-text image (PI) is confused and diffused one at a time in the proposed scheme. A composite PRNG (CPRNG) uses different sets of keys to generate PRNSs. The proposed LWE uses the CPRNG, which is composed of a chaotic map and a 256-bit linear feedback shift register. Both components can run with low computational overhead. The large key space makes the scheme highly robust and scalable. This encryption scheme has been tested on a variety of gray-level images to demonstrate its robustness and broad applicability. The test results of the evaluation are extremely promising, indicating that the scheme can provide adequate security against attacks.

Deepak Kumar
Bhaskar Mondal
Jyoti Prakash Singh
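
To make the idea of a composite generator concrete, the sketch below combines a linear feedback shift register with a chaotic (logistic) map and XORs the resulting keystream with pixel bytes. It is an illustration under stated assumptions, not the paper's scheme: it uses a 16-bit LFSR rather than the 256-bit register described above, picks the logistic map as the chaotic map, omits the DNA arithmetic and the confusion step, and all function names and seeds are hypothetical. It is not suitable for real security use.

```python
def lfsr16_bits(state: int):
    """16-bit Fibonacci LFSR (taps 16, 14, 13, 11); state must be non-zero."""
    while True:
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
        yield bit


def logistic_bytes(x: float, r: float = 3.99):
    """Logistic map x -> r*x*(1-x), quantised to one byte per iteration."""
    while True:
        x = r * x * (1.0 - x)
        yield int(x * 256) % 256


def keystream(seed_lfsr: int, seed_map: float, n: int) -> bytes:
    """Composite keystream: each byte is (8 LFSR bits) XOR (one chaotic byte)."""
    bits = lfsr16_bits(seed_lfsr)
    chaos = logistic_bytes(seed_map)
    out = bytearray()
    for _ in range(n):
        lfsr_byte = 0
        for _ in range(8):
            lfsr_byte = (lfsr_byte << 1) | next(bits)
        out.append(lfsr_byte ^ next(chaos))
    return bytes(out)


def xor_cipher(data: bytes, seed_lfsr: int, seed_map: float) -> bytes:
    """XOR the data with the keystream; applying it twice restores the input."""
    ks = keystream(seed_lfsr, seed_map, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


# Hypothetical 4x4 grayscale "image" flattened to 16 bytes.
pixels = bytes(range(0, 160, 10))
encrypted = xor_cipher(pixels, seed_lfsr=0xACE1, seed_map=0.3141)
decrypted = xor_cipher(encrypted, seed_lfsr=0xACE1, seed_map=0.3141)
print(decrypted == pixels)
```

Both generators need only shifts, XORs, and one multiplication per byte, which is the kind of low overhead the abstract attributes to the two components.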

Journal of Imaging Informatics in Medicine | Published: 11 March 2024

Classification of Lung Diseases Using an Attention-Based Modified DenseNet Model

Lung diseases represent a significant global health threat, impacting both well-being and mortality rates. Diagnostic procedures such as Computed Tomography (CT) scans and X-ray imaging play a pivotal role in identifying these conditions. X-rays, due to their easy accessibility and affordability, serve as a convenient and cost-effective option for diagnosing lung diseases. Our proposed method applies the Contrast-Limited Adaptive Histogram Equalization (CLAHE) enhancement technique to X-ray images to highlight the key feature maps related to lung diseases using DenseNet201. We have augmented the existing DenseNet201 model with a hybrid pooling and channel attention mechanism. The experimental results demonstrate the superiority of our model over well-known pre-trained models, such as VGG16, VGG19, InceptionV3, Xception, ResNet50, ResNet152, ResNet50V2, ResNet152V2, MobileNetV2, DenseNet121, DenseNet169, and DenseNet201. Our model achieves impressive accuracy, precision, recall, and F1-scores of 95.34%, 97%, 96%, and 96%, respectively. We also provide visual insights into our model’s decision-making process using Gradient-weighted Class Activation Mapping (Grad-CAM) to identify normal, pneumothorax, and atelectasis cases. The heatmaps produced by our model may help radiologists improve their diagnostic abilities and labelling processes.

Upasana Chutia
Anand Shanker Tewari
Jyoti Prakash Singh
Vikash Kumar Raj

Social Network Analysis and Mining | Published: 25 November 2023

Identification of clickbait news articles using SBERT and correlation matrix

Clickbait refers to the practice of using attention-grabbing or misleading headlines to attract readers to click on particular headlines or pieces of content. This technique often involves exaggerating claims or adding false information to attract traffic. In general, clickbait headlines are unlike standard news posts or detailed articles, where headlines are highly correlated with body content. In the proposed system, the dissimilarity between the headline and the detailed article is exploited to create features from the headline and its paragraph using sentence bidirectional encoder representations from transformers (SBERT). Since the paragraph is much longer than the headline, only k of the n sentences in the paragraph (n being the total number of sentences) are selected using a correlation matrix. The extracted features from the headline, target title, and selected sentences of the paragraph are concatenated and classified using machine learning (ML) classifiers. The proposed model was tested extensively on two real-world datasets, and the results showed that it performed better than the current state-of-the-art models. In experimental results, a support vector machine (SVM) classifier with the concatenated embedding of dissimilar sentences from the paragraph exhibited the best performance, with an accuracy of 0.84, weighted precision of 0.83, weighted recall of 0.84, and weighted F1-score of 0.82, surpassing the state-of-the-art model by 8.8% in terms of F1-score.

Supriya
Jyoti Prakash Singh
Gunjan Kumar
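
The abstract above selects a subset of k paragraph sentences based on their relationship to the headline. The sketch below shows one plausible reading of that step: rank sentences by cosine similarity to the headline embedding and keep the k least similar ones. The tiny 3-dimensional vectors stand in for SBERT embeddings, and the function names and the use of cosine similarity are assumptions; the paper's exact correlation-matrix procedure is not given in the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_dissimilar(headline_vec, sentence_vecs, k):
    """Return indices (in document order) of the k sentences least similar to the headline."""
    sims = [cosine(headline_vec, s) for s in sentence_vecs]
    order = sorted(range(len(sentence_vecs)), key=lambda i: sims[i])
    return sorted(order[:k])


# Hypothetical embeddings; real SBERT vectors have hundreds of dimensions.
headline = [1.0, 0.0, 0.0]
sentences = [
    [0.9, 0.1, 0.0],  # close to the headline
    [0.0, 1.0, 0.0],  # orthogonal to the headline
    [0.5, 0.5, 0.0],  # partly related
    [0.0, 0.0, 1.0],  # orthogonal to the headline
]
selected = select_dissimilar(headline, sentences, k=2)
print(selected)
```

The selected sentence embeddings would then be concatenated with the headline embedding and passed to a classifier such as an SVM.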

IEEE Transactions on Computational Social Systems | Published: 23 November 2023

Explainable BERT-LSTM Stacking for Sentiment Analysis of COVID-19 Vaccination

Many people have been severely affected by the COVID-19 pandemic, which has caused intense anxiety, fear, and complex feelings or emotions. People’s emotions have changed and become more complicated since coronavirus vaccinations were introduced. Sentiment analysis of COVID-19 vaccination is critical for understanding public perception, vaccine hesitancy, monitoring vaccine impact, identifying adverse reactions, making policies, and allocating resources. Some artificial intelligence (AI)-based systems have been reported in the literature to analyze the sentiment of COVID-19 vaccination. However, most of them are end-to-end models that require explanation for their results in identifying COVID-19 vaccination sentiment. An explainable AI-based model can improve decision-making, transparency, and interpretability. It can enable users to comprehend how the model makes predictions and the factors that influence the outcome. Therefore, this study proposes an explainable COVID-Twitter-BERT and LSTM (CT-BERT-LSTM) stacking model for determining people’s views on the COVID-19 vaccination. The prediction of the proposed CT-BERT-LSTM model is then examined to determine where the suggested system successfully learned the context of the tweet and where it failed to do so. The proposed CT-BERT-LSTM model performed exceptionally well and outperformed the existing models with an F1-score of 0.88 in the sentiment identification of the COVID-19 vaccination.

Abhinav Kumar
Jyoti Prakash Singh and Amit Kumar Singh

Neural Computing and Applications | Published: 15 November 2023

Deep learning-based segmentation for medical data hiding with Galois field

Data hiding is of the utmost importance for protecting the copyright of image content, given the widespread use of images in the healthcare domain. Presently, medical image security is important not only for protecting individual privacy but also for accurate diagnosis and treatment. In this paper, a deep learning-based segmentation for a medical data hiding technique with the Galois field is proposed. This technique uses a customised UNet3+ deep learning network to segment a medical image into a Region of Interest and a Non-Region of Interest. Through the proper spatial and transform-based embedding method, multiple marks are embedded into both parts of the medical image. In addition, encryption is utilised to provide additional security for protecting sensitive information when transmitted over an open channel so that the information cannot be retrieved. The extensive experimental results show that the proposed technique for medical images achieves a good balance between imperceptibility and robustness with high security. Further, the obtained results showed the superiority of our technique over state-of-the-art techniques, demonstrating that it can provide a reliable security solution for healthcare data.

P. Amrit
K. N. Singh
N. Baranwal
A. K. Singh
J. P. Singh & H. Zhou

Sustainable Computing: Informatics and Systems | Published: 10 June 2021

Cache-aware mobile data collection schedule for IoT enabled multi-rate data generator wireless netw

The energy hole problem is a major concern in the wireless sensor network (WSN) given the uneven energy depletion of the power-constrained sensor nodes. Recently, several studies have incorporated a mobile data collector (MDC) to collect data from sensor nodes, where the MDC visits a set of data collection points (DCPs) to collect data from nearby nodes. This strategy has resolved the underlying cause of unbalanced energy consumption by avoiding the long chain of multi-hop routing. However, in most of the existing protocols, sensor node information is relayed to the Base Station (BS) to nominate MDC sojourn points and the travel sequence. Such a centralized approach results in a substantial exchange of messages among sensor nodes, causing high energy consumption. Moreover, these approaches assume a uniform data generation rate at sensor nodes while ignoring the limited buffer cache capacity during the MDC path design. In this context, this paper proposes a distributed MDC sojourn point nomination method that significantly improves the message complexity and energy consumption of the WSN. The selected DCP information is relayed to the BS, where a modified Ant Colony Optimization algorithm is applied to construct the MDC traversal path. The performance of the proposed protocol is extensively compared with related existing schemes in terms of network parameters like energy efficiency, network lifetime, packet delivery ratio, and end-to-end delay. Moreover, the scalability of the proposed scheme is evaluated by varying the number of MDCs and sensor nodes in the network.

Nabajyoti Mazumdar
Amitava Nag
Jyoti Prakash Singh

IEEE Access 11, 124996 - 125010 (2023). | Published: 11 October 2023

Android Malware Detection by Correlated Real Permission Couples Using FP Growth Algorithm and Neural

In the current internet era, where mobile devices are ubiquitous and often hold sensitive personal and corporate data, Android malware analysis is crucial to protect against the increasing sophistication and prevalence of mobile-based cyberattacks. This article proposes an innovative approach to Android malware detection using real permission features extracted from Android code. This approach differs from most existing literature, which relies on declared permissions; these can differ from the real permissions an Android app actually uses. Here, step-by-step guidelines have been elaborated for reverse engineering any Android application package (APK) to extract the real permission features from the disassembled Android code. After that, the most frequent and correlated pairs of real permissions were identified using the Frequent Pattern (FP) Growth algorithm. Thereafter, the existence of those identified real permission couples was checked and fed into a multi-layered, K-fold cross-validated neural network model to predict whether an APK is malware or benignware. Simultaneously, five other traditional machine learning models have also been applied to benchmark the results. The outcome of these models is measured with various well-known metrics like Accuracy, Precision, Recall, Loss, Specificity, F1 Score, Receiver Operating Characteristic (ROC) curve, Negative Predictive Value, and Matthews Correlation Coefficient. The experimental evaluation of the proposed methodology shows better performance on the well-known Drebin dataset as well as on a customized dataset covering the last 5 years, with an accuracy of over 96%.
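
The pipeline above mines frequent permission pairs and then encodes each app by which pairs it contains. The sketch below illustrates that data flow with brute-force pair counting rather than FP-Growth itself (for itemsets of size two, both yield the same frequent pairs, but FP-Growth scales better). The permission sets, the support threshold, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {pair: support} for permission pairs meeting the support threshold."""
    counts = Counter()
    for perms in transactions:
        # Sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(perms)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def pair_features(perms, pairs):
    """Binary feature vector: 1 if both permissions of a pair are present in the app."""
    present = set(perms)
    return [1 if a in present and b in present else 0 for a, b in pairs]


# Hypothetical real-permission sets extracted from four apps.
apps = [
    {"INTERNET", "READ_SMS", "SEND_SMS"},
    {"INTERNET", "READ_SMS"},
    {"INTERNET", "CAMERA"},
    {"READ_SMS", "SEND_SMS", "INTERNET"},
]
supports = frequent_pairs(apps, min_support=0.5)
pairs = sorted(supports)
features = pair_features({"INTERNET", "READ_SMS"}, pairs)
print(pairs, features)
```

The resulting binary vectors are the kind of input that could be fed to a neural network or a traditional classifier for the malware/benignware decision.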

Journal of Computer and Electrical Engineering | Published: 24 May 2021

Trust-based sustainable load offloading protocol to reduce service delays in fog computing empowered

The proliferation of Internet of Things (IoT)-enabled applications such as smart cities, e-healthcare, industrial automation, etc., has resulted in a significant increase in the amount of data produced by these applications. The traditional two-layered cloud-based IoT framework is reported to incur a significant delay in data processing due to the large distance between IoT devices and the cloud server, making it unsuitable for time-sensitive applications. A three-layered IoT–fog–cloud framework is proposed to overcome this limitation and cater to the needs of time-sensitive applications. This paper presents a collaborative fog-node scheme that shares the load of an overloaded fog node to ensure timely service. The proposed load-sharing scheme also takes care of the security aspect by developing a trust-based load-offloading method. The efficiency of the proposed model is evaluated via a simulation study to show its effectiveness over the existing methods, with latency and service response rate as performance metrics.

Nabajyoti Mazumdar
Amitava Nag
Jyoti Prakash Singh

IEEE Transactions on Consumer Electronics | Published: 16 October 2023

Genre Effect Toward Developing a Multi-Modal Movie Recommendation System in Indian Setting

The Recommendation System (RS) plays a crucial role in various platforms as an information filtering agent. However, our literature survey found an unexplored area in which user feedback is ordinal and the movies are in Indian regional languages. Here, we have introduced a multi-head cross-attention-based recommendation system for an Indian-language multi-modal Hindi movie dataset, where user feedback falls into three classes: (i) Dislike, (ii) Like, and (iii) Neutral/Not watched. We have used the audio-video information of Hindi movie trailers from the Flickscore dataset. Besides that, we have also investigated the performance of a classification-based model with respect to two factors: (i) the GenreLike-score (GL-score), which we have formulated to match the user’s preferred genres with the genre of the movie, and (ii) different audio/video embeddings. The performance of different combinations of these factors is tested on different modalities of the dataset, and the results show that the GL-score supports preference prediction. The performance of different keyframe extraction techniques has been investigated, and modality-wise embedding processes have also been introduced here.

Prabir Mondal
Pulkit Kapoor
Siddharth Singh
Sriparna Saha
Jyoti Prakash Singh and Amit Kumar Singh

Information Technology and People | Published: 16 November 2021

Social media analytics for end-user’s expectation management in ISD projects

Purpose: This study, an exploratory research, aims to investigate social media users' expectations of information systems (IS) products that are conceived but not yet launched. It specifically analyses social media data from Twitter about forthcoming smartphones and smartwatches from Apple and Samsung, two firms known for their innovative gadgets.
Design/methodology/approach: Tweets related to the following four forthcoming IS products were retrieved from 1st January 2020 to 30th September 2020: (1) Apple iPhone 12 (6,125 tweets), (2) Apple Watch 6 (553 tweets), (3) Samsung Galaxy Z Flip 2 (923 tweets) and (4) Samsung Galaxy Watch Active 3 (207 tweets). These 7,808 tweets were analysed using a combination of the Natural Language Processing Toolkit (NLTK) and sentiment analysis (SentiWordNet).
Findings: The online community was quite vocal about topics such as design, camera and hardware specifications. For all the forthcoming gadgets, the proportion of positive tweets exceeded that of negative tweets. The most prevalent sentiment expressed in Apple-related tweets was neutral, but in Samsung-related tweets was positive. Additionally, it was found that the proportion of tweets echoing negative sentiment was lower for Apple compared with Samsung.
Originality/value: This paper is the earliest empirical work to examine the degree to which social media chatter can be used by project managers for IS development projects, specifically for the purpose of end-users' expectation management.

Snehasish Banerjee
Jyoti Prakash Singh
Yogesh Kumar Dwivedi and Nripendra Pratap Rana

ACM Transactions on Asian and Low-Resource Language Information Processing | Published: 18 September 2023

A Hybrid Deep Ranking Weighted Multi-Hashing Recommender System

In countries where there is a low availability of resources for language, businesses face the challenge of overcoming language barriers to reach their customers. One possible solution is to use collaborative filtering-based recommendation systems in their native languages. These systems employ algorithms that understand the customers’ preferences and suggest products or services in their native language. Collaborative filtering (CF) is a popular recommendation technique that simulates word-of-mouth phenomena. However, the accuracy of a CF recommendation can be affected by sparse data. In this research paper, we present a novel hybrid weighted multi-deep ranking supervised hashing (HWMDRH) approach. Our method leverages both user-based and item-based CF by merging the item-based deep ranking weighted multi-hash recommender system prediction with the user-based deep ranking weighted multi-hash recommender system prediction to generate Top-N prediction. We conducted extensive experiments on the MovieLens 1M dataset, and our results show that the proposed HWMDRH model outperforms existing models and achieves state-of-the-art performance across recall, precision, RMSE, and F1-score metrics.

Suresh Kumar
Jyoti Prakash Singh
Surya Kant
and Neha Jain

Journal of Electronic Imaging, Vol. 32, Issue 5, 053019 (September 2023). | Published: 20 September 2023

Blockchain-based authenticable (k, n) multi-secret image sharing scheme

We aim to devise a technique to share k secret images among n participants (n ≥ k) in such a way that only k or more participants can collaborate to recover the original k secret images. The proposed approach relies on the development of a spanning tree from a complete graph G for the share image generation, in which each of the k pixels retrieved from the k secret images is considered to be one of the k vertices of G. Each spanning tree of G is represented by a unique sequence of numbers known as a Prüfer sequence, and each participant is assigned a random Prüfer sequence. The Prüfer sequence of a participant is used to generate the share images for that participant. The share images are kept in a blockchain using the InterPlanetary File System. This scheme is suitable for protecting sensitive images through distributed storage while eliminating single points of failure. Because the blockchain is an immutable decentralized public record, anyone can verify their shares as well as the shares of other participants, thereby preventing cheating by any participant. The scheme provides lossless reconstruction of the k secret images, making it well suited to military applications, medical applications, and other applications in which lossless recovery is a crucial element. Extensive experimental analysis confirms that the proposed scheme is secure, cheat-proof, and lossless.

Puja Sarkar
Amitava Nag
Jyoti Prakash Singh
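
The scheme above depends on the one-to-one correspondence between Prüfer sequences and labelled spanning trees. The sketch below shows the standard decoding of a Prüfer sequence into tree edges, with vertices labelled from 0; the function name `prufer_to_tree` is hypothetical, and how the paper maps trees to share pixels is not covered here.

```python
def prufer_to_tree(seq):
    """Decode a Prüfer sequence over labels 0..n-1 into the edges of its labelled tree."""
    n = len(seq) + 2
    # Each vertex's degree is 1 plus the number of times it appears in the sequence.
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        # Attach the smallest-labelled remaining leaf to the current sequence entry.
        leaf = min(u for u in range(n) if degree[u] == 1)
        edges.append((leaf, v))
        degree[leaf] -= 1
        degree[v] -= 1
    # Exactly two vertices of degree 1 remain; they form the final edge.
    u, w = [x for x in range(n) if degree[x] == 1]
    edges.append((u, w))
    return edges


tree_edges = prufer_to_tree([3, 3, 3, 4])
print(tree_edges)  # → [(0, 3), (1, 3), (2, 3), (3, 4), (4, 5)]
```

A sequence of length n − 2 always decodes to a tree with n vertices and n − 1 edges, which is why a random sequence can serve as a compact, unique identifier for a spanning tree.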

Soft Computing | Published: 24 April 2021

Bilingual Cyber-aggression Detection on social media using LSTM Autoencoder

Cyber-aggression is an offensive behaviour attacking people based on race, ethnicity, religion, gender, sexual orientation and other traits. It has become a major issue plaguing the online social media. In this research, we have developed a deep learning-based model to identify different levels of aggression (direct, indirect and no aggression) in a social media post in a bilingual scenario. The model is an autoencoder built using the LSTM network and trained with non-aggressive comments only. Any aggressive comment (direct or indirect) will be regarded as an anomaly to the system and will be marked as Overtly (direct) or Covertly (indirect) aggressive comment depending on the reconstruction loss by the autoencoder. The validation results on the dataset from two popular social media sites: Facebook and Twitter with bilingual (English and Hindi) data outperformed the current state-of-the-art models with improvements of more than 11% on the test sets of the English dataset and more than 6% on the test sets of the Hindi dataset.

Kirti Kumari
Jyoti Prakash Singh
Yogesh Kumar Dwivedi
and Nripendra Pratap Rana

Multimedia Systems | Published: 13 April 2021

Multi-modal Cyber-aggression detection with feature optimization by Firefly algorithm

Aggressive comments containing offensive images and inappropriate gesture signs together with textual comments have grown exponentially in the recent past on social media. This aggressive content affects victims negatively, causing fear, stress, sleeping problems, and even suicide in some cases. Since social media content is unmoderated, a technical solution that automatically flags such content by considering text and images together is highly needed. This article presents a model based on deep learning and binary firefly optimization to classify social media posts into high-aggressive, medium-aggressive, and non-aggressive classes. The proposed model considers both text and images together to evaluate the aggression level of a post. In this model, the image features of the posts are extracted using a pre-trained VGG-16 model, whereas the textual features are extracted in parallel using a three-layered convolutional neural network. The image and text features are then combined to get a hybrid feature set, which is further optimized using a binary firefly optimization algorithm. With features optimized by the binary firefly algorithm, our proposed model improves the weighted F1-score by 11% over non-optimized features.

Kirti Kumari
Jyoti Prakash Singh

Engineering Applications of Artificial Intelligence | Published: 12 September 2023

Review helpfulness prediction on e-commerce websites: A comprehensive survey

This comprehensive survey investigates methodologies and factors utilized for predicting review helpfulness on e-commerce websites. Analyzing 132 research publications from the past 17 years, four primary determinants come to light: textual contents, non-textual contents, reviewer-related factors, and product-related factors. Review length, readability, entropy, sentiments, review rating, product description features, and customer question-answer features emerge as influential indicators. The study revealed a shift from statistical processes to machine learning and neural learning approaches in recent years due to their superior performance in predicting review helpfulness. The survey findings open up promising avenues for future research. Key directions include addressing the challenges posed by duplicate reviews, ensuring review-rating consistency, and leveraging helpful reviews in the development of chatbot systems for e-commerce websites. Additionally, exploring the impact of social media sentiment on product recommendations presents intriguing possibilities. This survey provides valuable insights for researchers and practitioners in the realm of review helpfulness prediction on e-commerce websites.

Sunil Saumya
Pradeep Kumar Roy
Jyoti Prakash Singh

Applied Soft Computing | Published: 30 March 2021

Hybrid attention-based Long Short-Term Memory network for sarcasm identification

Sentiment analysis of people’s opinion is used in a lot of business and decision-making scenarios. Although social media is an informal medium in which to express one’s opinions, it is being used in many business and decision-making scenarios now. Social media posts contain a lot of sarcastic statements that affect the automatic extraction of the correct sentiment of the post, as sarcasm can flip the overall polarity of the sentence. Sarcasm is a bitterly cutting form of irony intended to be unpleasant to somebody or to make fun of them. Therefore, identifying sarcastic statements from the users’ posts has become an important task to extract the actual sentiments from informal statements regarding an event or a person. In this work, we propose a hybrid attention-based Long Short Term Memory (HA-LSTM) network to identify sarcastic statements. This HA-LSTM network is different from the existing LSTM model, as the proposed HA-LSTM network combines 16 different linguistic features in its hidden layers. The proposed HA-LSTM network is validated with three benchmark datasets. The combination of 16 different linguistic features improves the performance of the model in comparison with other state-of-the-art models, with an improvement of up to 2% in terms of F1-score on three different gold standard datasets.

Rajnish Pandey
Abhinav Kumar
Jyoti Prakash Singh
Sudhakar Tripathi

IEEE/ACM Transactions on Computational Biology and Bioinformatics | Published: 01 May-June 2023

A Comprehensive Survey of Deep Learning Techniques in Protein Function Prediction

Protein function prediction is a major challenge in the field of bioinformatics which aims at predicting the functions performed by a known protein. Many protein data forms like protein sequences, protein structures, protein-protein interaction networks, and micro-array data representations are being used to predict functions. During the past few decades, abundant protein sequence data has been generated using high throughput techniques making them a suitable candidate for predicting protein functions using deep learning techniques. Many such advanced techniques have been proposed so far. It becomes necessary to comprehend all these works in a survey to provide a systematic view of all the techniques along with the chronology in which the techniques have advanced. This survey provides comprehensive details of the latest methodologies, their pros and cons as well as predictive accuracy, and a new direction in terms of interpretability of the predictive models needed to be ventured by protein function prediction systems.

Computers and Electronics in Agriculture | Published: 28 February 2023

A meta-learning framework for recommending CNN models for plant disease identification tasks

Plant diseases are a major threat to food security and economic prosperity around the globe. Deep learning models based on Convolutional Neural Networks (CNN) have shown promising results in dealing with plant disease detection tasks. However, according to the No Free Lunch Theorem, no single model is suitable for all cases. Moreover, the vast diversity of plant diseases makes model selection by exhaustive search time- and resource-intensive. This work proposes a meta-learning-based framework that recommends top-n suitable models for an unseen plant disease detection dataset using the prior evaluations of benchmark models on plant disease detection tasks. Rank-Biased Overlap (RBO) is used to evaluate the efficacy of the proposed framework by comparing the actual rankings with the predicted rankings. Extensive comparative experiments are carried out with different configurations of meta-extractors and meta-learners. The results obtained demonstrate that the probe network trained for 10 epochs (termed as “intermediate stage”) along with standard deviation as meta-extractor and Support Vector Regressor as the meta-learner outperforms the rest with average RBO scores of 0.76, 0.73 and 0.75 for Top-5, Top-3 and Top-1 recommendations, respectively. Overall, this paper presents a viable substitute for the exhaustive search process carried out for choosing the best deep learning model for a plant disease detection scenario, leading to better resource utilization and a faster implementation procedure.

Sahil Verma
Prabhat Kumar
Jyoti Prakash Singh
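
As an illustration of the Rank-Biased Overlap measure mentioned in the abstract above, the following is a minimal sketch of the truncated form of RBO for two rankings without duplicates. The function name, the persistence value p = 0.9, and the optional normalization are assumptions made for this example; the abstract does not state which RBO variant or parameter the authors used.

```python
def rank_biased_overlap(ranking_a, ranking_b, p=0.9, normalize=False):
    """Truncated rank-biased overlap of two duplicate-free rankings.

    At each depth d the agreement is |A[:d] & B[:d]| / d, and the
    agreements are combined with geometrically decaying weights p**(d-1).
    """
    depth = min(len(ranking_a), len(ranking_b))
    if depth == 0:
        return 0.0
    total = 0.0
    for d in range(1, depth + 1):
        overlap = len(set(ranking_a[:d]) & set(ranking_b[:d]))
        total += (p ** (d - 1)) * overlap / d
    score = (1 - p) * total
    if normalize:
        # Assumption for this sketch: rescale so identical rankings score 1.
        score /= 1 - p ** depth
    return score


# Hypothetical model rankings, used only to exercise the function.
actual = ["ResNet50", "VGG16", "Xception", "DenseNet121", "MobileNetV2"]
predicted = ["ResNet50", "Xception", "VGG16", "MobileNetV2", "DenseNet121"]
print(rank_biased_overlap(actual, predicted, normalize=True))
```

Without normalization, two identical rankings of length k score 1 - p^k rather than 1, which is why the sketch offers the rescaled variant as an option.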

Future Generation Computer Systems | Published: 13 January 2021

Multi-modal Aggression Identification Using Convolutional Neural Network and Binary Particle Swarm O

Aggressive posts containing symbolic and offensive images and inappropriate gestures, along with provocative textual comments, are growing exponentially on social media with the availability of inexpensive data services. These posts have numerous negative impacts on the reader and need an immediate technical solution to filter out aggressive comments. This paper presents a model based on a Convolutional Neural Network (CNN) and Binary Particle Swarm Optimization (BPSO) to classify social media posts containing images with associated textual comments into non-aggressive, medium-aggressive and high-aggressive classes. A dataset containing symbolic images and the corresponding textual comments was created to validate the proposed model. The framework employs a pre-trained VGG-16 to extract the image features and a three-layered CNN to extract the textual features in parallel. The hybrid feature set obtained by concatenating the image and the text features was optimized using the BPSO algorithm to extract the more relevant features. The proposed model with optimized features and a Random Forest classifier achieves a weighted F1-score of 0.74, an improvement of around 3% over unoptimized features.

Kirti Kumari
Jyoti Prakash Singh
Yogesh Kumar Dwivedi
and Nripendra Pratap Rana
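
The feature optimization step described in the abstract above can be pictured with a small binary PSO sketch. This is the textbook sigmoid-transfer variant of binary particle swarm optimization, not necessarily the authors' configuration; the swarm size, coefficients, and the toy fitness function are all assumptions for illustration. In the paper's setting the fitness would presumably be a classifier score computed on the selected features.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=20, n_iters=50,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `fitness` over 0/1 feature masks with binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][j]
                     + c1 * r1 * (pbest[i][j] - pos[i][j])
                     + c2 * r2 * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # keep the sigmoid unsaturated
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob_one else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Hypothetical stand-in for a classifier score: reward three "relevant"
# feature indices and penalize every other selected feature.
RELEVANT = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(mask[j] for j in RELEVANT)
    extras = sum(mask) - hits
    return hits - 0.5 * extras


mask, score = binary_pso(toy_fitness, n_features=8)
print(mask, score)
```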

Multimedia Tools and Applications | Published: 16 February 2021

An Efficient Verifiable (t; n) Secret Image Sharing Scheme with Ultra-Light Shares

A secret sharing scheme partitions a secret into a set of shares and distributes them among the eligible participants, with each participant receiving one share of the secret. The sharing technique allows any qualified subset of participants to recover the secret. In (t,n)-threshold secret sharing schemes, the secret is distributed among n participants in the form of shares, such that every participant holds exactly one share. An individual share reveals nothing about the secret. Any subset of participants of size t or more (t ≤ n) can combine their shares and compute the secret, while any subset of size < t is not able to do so. This paper proposes a verifiable (t,n)-threshold secret image sharing (VSIS) scheme. In the proposed scheme, a secret image is shared among n participants such that if t or more (t ≤ n) participants collaborate, the secret image can be computed successfully, whereas fewer than t participants learn nothing. The scheme makes use of polynomial-based secret sharing and XOR operations to construct the shares and recover the secret image. Our scheme’s main advantage is that it presents the public shares as integer numbers (not image matrices produced in previous SIS schemes), much smaller than the secret image. It also generates a public share-image of the same size as the secret image. Thus, the public shares can be efficiently transferred over the public network and efficiently stored in memory. The scheme applies to both grayscale and color images. The use of Elliptic Curve Cryptography (ECC) enables the participants to choose their own secret shadows and compute the pseudo shares (integer numbers) independently. Hence the entire communications can take place safely on public channels. The pseudo shares are verifiable to the participants as well as the combiner. The combination of small public shares and the elliptic curve cryptosystem makes this scheme ideal for resource-constrained devices. In contrast, the public share-image can be safely stored with a Cloud Service Provider (CSP).

Arup Kumar Chattopadhyay
Amitava Nag and Jyoti Prakash Singh
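
The (t,n)-threshold idea in the abstract above rests on polynomial-based secret sharing. The sketch below shows only that classical building block (Shamir-style sharing of a single pixel value over a small prime field) and leaves out the paper's XOR step, ECC-based pseudo shares, and verification. The prime 257 and the function names are assumptions for this example, and `pow(x, -1, p)` requires Python 3.8 or later.

```python
import random

PRIME = 257  # smallest prime above 255, so one 8-bit pixel value fits in the field


def make_shares(secret, t, n, rng=None):
    """Split `secret` into n shares; any t of them recover it."""
    if not (1 <= t <= n < PRIME and 0 <= secret < PRIME):
        raise ValueError("need 1 <= t <= n < PRIME and 0 <= secret < PRIME")
    rng = rng or random.SystemRandom()
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(t - 1)]

    def evaluate(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, n + 1)]


def recover(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = make_shares(secret=200, t=3, n=5)
print(recover(shares[:3]))  # any three shares reconstruct the pixel value
```

With fewer than t shares the interpolation still returns some field element, but every candidate secret remains equally consistent with the shares held, which is the property the abstract refers to when it says an individual share reveals nothing.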

IETE Journal of Research | Published: 27 Feb 2023

A Unified Lightweight CNN-based Model for Disease Detection and Identification in Corn, Rice, and Wh

Plant diseases are a significant threat to global food security since they directly affect the quality of crops, leading to a decline in agricultural productivity. Several researchers have employed crop-specific deep learning models based on convolutional neural networks (CNN) to identify plant diseases with better accuracy and faster implementation. However, the use of crop-specific models is unreasonable considering the resource-constrained devices and digital literacy rate of farmers. This work proposes a single light-weight CNN model for disease identification in three major crops, namely, Corn, Rice, and Wheat. The proposed model uses convolution layers of variable sizes at the same level to accurately detect the diseases with various sizes of the infected area. The experimentation results reveal that the proposed model outperforms several benchmark CNN models, namely, VGG16, VGG19, ResNet50, ResNet152, ResNet50V2, ResNet152V2, MobileNetV2, DenseNet121, DenseNet201, InceptionV3, and Xception, to achieve an accuracy of 84.4% while using just 387,340 parameters. Moreover, the proposed model validates its efficacy as a multi-functional tool by classifying healthy and infected categories of each crop individually, obtaining accuracies of 99.74%, 82.67%, and 97.5% for Corn, Rice, and Wheat, respectively. The better performance values and light-weight nature of the proposed model make it a viable choice for real-time crop disease detection, even in resource-constrained environments.

Sahil Verma
Prabhat Kumar and Jyoti Prakash Singh

Circuit Systems and Signal Processing | Published: 27 October 2020

A pitch and noise robust keyword spotting system using SMAC features with prosody modification

A keyword spotting (KWS) system spots keywords in a continuous speech signal with the aid of a computer. A variety of strategies have been suggested in the literature to detect keywords in adults’ speech effectively. However, only a limited number of studies have been reported for KWS in children’s speech. Due to differences in physiological properties, the pitch and speaking rate of children differ from those of adults. Consequently, KWS model parameters trained on adults’ speech data yield poor performance on children’s speech. In this paper, we have developed a KWS system for spotting keywords from children’s speech using models trained on adults’ speech. The proposed approach uses spectral moment time–frequency distribution augmented by low-order cepstral (SMAC) as the front-end feature. The mismatches due to differences in pitch and speaking rate of children and adult speakers are further mitigated by data-augmented training using explicit pitch and speaking rate modifications. The experimental findings presented in this paper show that the SMAC feature offers significantly better output for both clean and noisy test conditions than the conventional Mel frequency cepstral coefficients.

Karabi Maity
Gayadhar Pradhan
and Jyoti Prakash Singh

IEEE Transactions on Computational Social Systems | Published: 06 February 2023

Autoencoder-Based Feature Extraction for Identifying Hate Speech Spreaders in Social Media

Hate speech on social media has become a big problem, making regular users very upset and giving victims depression and suicidal thoughts. Early identification of the users spreading this type of hate speech may be a better solution, allowing hate speech to be stopped at source. In this article, we attempt to identify these hate speech spreaders by finding a representation for each user. Each user’s comments are aggregated and fed to an auto-encoder to train it. The encoder part of the auto-encoder is used to get an encoded vector for each user. The encoded vector is used with different machine learning (ML) classifiers to determine if a user is spreading hate speech. The proposed model was tested using the dataset released by the PAN 2021 (https://pan.webis.de/data.html) hate speech spreader profiling competition in English and Spanish. The experimental results show that a support vector machine (SVM) with encoded vectors as features outperforms existing models with an accuracy of 92% for both the English and Spanish datasets. The proposed feature extraction technique is found to be equally effective at identifying fake news spreaders on the fake news datasets provided by PAN 2020, yielding accuracy values of 95% and 83% for English and Spanish, respectively.

Information Systems Frontiers | Published: 12 August 2020

Attention-based LSTM network for rumor veracity estimation of tweets

Twitter has become a fertile place for rumors, as information can spread to a large number of people immediately. Rumors can mislead public opinion, weaken social order, decrease the legitimacy of government, and lead to a significant threat to social stability. Therefore, timely detection and debunking of rumors are urgently needed. In this work, we proposed an Attention-based Long-Short Term Memory (LSTM) network that uses tweet text with thirteen different linguistic and user features to distinguish rumor and non-rumor tweets. The performance of the proposed Attention-based LSTM model is compared with several conventional machine learning and deep learning models. The proposed Attention-based LSTM model achieved an F1-score of 0.88 in classifying rumor and non-rumor tweets, which is better than the state-of-the-art results. The proposed system can reduce the impact of rumors on society, reduce the loss of life and money, and build users’ trust in social media platforms.

Jyoti Prakash Singh
Abhinav Kumar
Nripendra Pratap Rana
Yogesh Kumar Dwivedi

Computers & Industrial Engineering, 176 (2023), 108941 | Published: 26 December 2022

Fuzzy Logic Based IoMT Framework for COVID-19 Patient Monitoring

Smart healthcare is an integral part of a smart city, which provides real-time and intelligent remote monitoring and tracking services to patients and elderly persons. In the era of an extraordinary public health crisis due to the spread of the novel coronavirus (2019-nCoV), which caused the deaths of millions and affected a multitude of people worldwide in different ways, the role of smart healthcare has become indispensable. Any modern method that allows for speedy and efficient monitoring of COVID-19-affected patients could be highly beneficial to medical staff. Several smart-healthcare systems based on the Internet of Medical Things (IoMT) have attracted worldwide interest in their growing technical assistance in health services, notably in predicting, identifying and preventing, and their remote surveillance of most infectious diseases. In this paper, a real-time health monitoring system for COVID-19 patients based on edge computing and fuzzy logic technique is proposed. The proposed model makes use of the IoMT architecture to collect real-time biological data (or health information) from the patients to monitor and analyze the health conditions of the infected patients and generates alert messages that are transmitted to the concerned parties such as relatives, medical staff and doctors to provide appropriate treatment in a timely fashion. The health data are collected through sensors attached to the patients and transmitted to the edge devices and cloud storage for further processing. The collected data are analyzed through fuzzy logic in edge devices to efficiently identify the risk status (such as low risk, moderate risk and high risk) of the COVID-19 patients in real time. The proposed system is also associated with a mobile app that enables the continuous monitoring of the health status of the patients. Moreover, once alerted by the system about the high risk status of a patient, a doctor can fetch all the health records of the patient for a specified period, which can be utilized for a detailed clinical diagnosis.

Subir Panja
Arup Kumar Chattopadhyay
Amitava Nag
Jyoti Prakash Singh
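
To make the fuzzy-logic risk classification in the abstract above more concrete, here is a minimal sketch of how two sensor readings might be mapped to low, moderate, or high risk. The choice of inputs (oxygen saturation and body temperature), the membership breakpoints, and the three rules are hypothetical values chosen purely for illustration; they are not taken from the paper and are not clinical guidance.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on [b, c], linear ramps between."""
    if x < a or x > d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def risk_status(spo2, temp_c):
    """Return (label, rule strengths) for one reading. All thresholds are hypothetical."""
    spo2_low = trapezoid(spo2, 0, 0, 90, 94)
    spo2_normal = trapezoid(spo2, 90, 94, 100, 100)
    temp_normal = trapezoid(temp_c, 30, 30, 37.0, 38.0)
    temp_high = trapezoid(temp_c, 37.0, 38.0, 45, 45)

    # Fuzzy AND is min, fuzzy OR is max.
    strengths = {
        "low risk": min(spo2_normal, temp_normal),
        "moderate risk": max(min(spo2_low, temp_normal),
                             min(spo2_normal, temp_high)),
        "high risk": min(spo2_low, temp_high),
    }
    label = max(strengths, key=strengths.get)
    return label, strengths


print(risk_status(98, 36.5))
print(risk_status(88, 36.5))
print(risk_status(85, 39.5))
```

Rules this small are cheap enough to evaluate on an edge device for every incoming reading, which is in line with the edge-side analysis the abstract describes.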

Multimedia Tools and Applications | Published: 07 July 2020

A Verifiable Multi-Secret Image Sharing Scheme using XOR and Hash Functions

In a secret image sharing (SIS) scheme, a dealer (the data owner or a trusted third-party) encodes a secret image into some share images and distributes them among some participants such that each participant receives exactly one share. Most of the secret image sharing schemes assume a trusted dealer and participants. They do not use any verification of the share images while they are presented for secret reconstruction. In this article, we propose a verifiable multi-secret image sharing scheme using Boolean operations and a secure hash function. We consider n secret images for sharing and convert each secret image to a complete noisy image by using a secure hash function, XOR operations, and a specially designed pseudo-random image-matrix generator function. Then, we use XOR operations to generate the share images. The hash function calls are chained in a unique way to enable reconstruction and verification at a low cost for the secret images. The use of hash function also ensures that secrecy of share images and secret images remains consistent. The experimental results and security analysis prove that the scheme is secure and verifiable.

Arup Kumar Chattopadhyay
Amitava Nag
Jyoti Prakash Singh and Amit Kumar Singh
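
The XOR-and-hash idea in the abstract above can be illustrated with a much simpler (n, n) construction: n-1 random shares plus one share that XORs them with the secret, with a SHA-256 digest per share for verification at reconstruction time. This is a simplified stand-in, not the authors' scheme, which handles multiple secret images and chains its hash calls in its own way; the function names are assumptions for this example.

```python
import hashlib
import secrets
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split(secret, n):
    """Split `secret` (bytes) into n XOR shares and return (shares, digests)."""
    if n < 2:
        raise ValueError("need at least two shares")
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    # Last share = secret XOR all random shares, so XOR of all n is the secret.
    shares.append(reduce(xor_bytes, shares, secret))
    digests = [hashlib.sha256(s).hexdigest() for s in shares]
    return shares, digests


def combine(shares, digests):
    """Verify each presented share against its digest, then XOR them together."""
    for index, (share, digest) in enumerate(zip(shares, digests)):
        if hashlib.sha256(share).hexdigest() != digest:
            raise ValueError(f"share {index} failed verification")
    return reduce(xor_bytes, shares)


pixels = bytes(range(16))  # stand-in for flattened image data
shares, digests = split(pixels, n=4)
assert combine(shares, digests) == pixels
```

Each share on its own is uniformly random, so it looks like a noisy image, and a tampered share is rejected before reconstruction.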

IEEE Transactions on Computational Social Systems (2022) | Published: 1 February 2024

Deep Neural Networks for Location Reference Identification From Bilingual Disaster-Related Tweets

Twitter is increasingly being used during disasters to communicate with authorities, ascertain the ground reality, and coordinate real-time rescue and recovery activities. Geographical location information about users and events is critical in these scenarios. Geotagged tweets are extremely infrequent, and other location fields, such as user location and place name, are unreliable. The extraction of geographical information from tweet text is limited by the fact that individuals frequently publish multilingual tweets that contain numerous grammatical and spelling errors, as well as nonstandard acronyms. As a result, determining the geographical location of the tweet is a challenging problem. This article presents a technique based on deep neural networks for extracting geographical references mentioned in bilingual tweets. Several deep learning-based models, including convolutional neural networks (CNNs), long short-term memory (LSTM), bidirectional LSTM (Bi-LSTM), and attention-based Bi-LSTM, are implemented on real-world English and Hindi language tweets to determine their suitability for extracting location references. The proposed CNN, along with a conditional random field at the last layer, is found to perform better than other models, with an F1-score of 0.858. The findings of this study can aid in early event detection, pinpointing the area of devastation and victims, real-time traffic management, and a number of other location-based applications. The suggested system’s code and trained model may be obtained at https://github.com/Abhinavkmr/Bi-lingual-location-reference-identification.git.

Electronic Commerce Research | Published: 13 May 2020

Spam Review Detection Using LSTM Autoencoder: An Unsupervised Approach

Reviews of online products or services are becoming a major factor in users’ purchasing decisions. The popularity and influence of online reviews attract spammers who intend to elevate their products or services by writing positive reviews for them and to lower the business of others by writing negative reviews. Traditionally, the spam review identification task is seen as a two-class classification problem. The classification approach requires a labelled dataset to train a model for the environment it is working on. The unavailability of a labelled dataset is a major limitation of the classification approach. To overcome this problem, we propose an unsupervised learning model combining long short-term memory (LSTM) networks and an autoencoder (LSTM-autoencoder) to distinguish spam reviews from real reviews. The model is trained to learn the patterns of real reviews from the reviews’ textual details without any labels. The experimental results show that our model is able to separate real and spam reviews with good accuracy.

Transactions on Emerging Telecommunications Technologies | Published: 25 February 2020

Identification of Cyberbullying on Multi-modal Social Media Posts using Genetic Algorithm

Cyberbullying is one of the detrimental effects that social media is facing nowadays. With the increasing use of photo sharing and text comments, the severity of cyberbullying has increased manyfold. Automated tools to detect these events have become necessary to make this platform healthy and secure. Sometimes innocent-looking images and text also convey bullying messages when posted together. So, separate systems for processing text and images may not work properly to identify all cases of cyberbullying. In this research, we have tried to extract combined features of text and images to identify different cases of cyberbullying. We used a pre-trained VGG-16 network and a convolutional neural network to extract the features from images and text, respectively. These features are further optimized using a genetic algorithm to increase the efficiency of the whole system. Our proposed model is validated with a dataset containing text and images and achieves an F1-score of 78%, which shows an improvement of 9% over earlier reported results on the same dataset.

Journal of Intelligent Information Systems | Published: 10 October 2022

BERT-LSTM model for sarcasm detection in code-mixed social media post

Sarcasm is the acerbic use of words to mock someone or something, mostly in a satirical way. Scandal or mockery is used harshly, often crudely and contemptuously, for destructive purposes in sarcasm. Extracting the actual sentiment of a sentence in code-mixed language is complex because of the unavailability of sufficient clues for sarcasm. In this work, we proposed a model consisting of Bidirectional Encoder Representations from Transformers (BERT) stacked with Long Short Term Memory (LSTM) (BERT-LSTM). A pre-trained BERT model is used to create embeddings for the code-mixed dataset. These embedding vectors were used by an LSTM network consisting of a single layer to identify the nature of a sentence, i.e., sarcastic or non-sarcastic. The experiments show that the proposed BERT-LSTM model detects sarcastic sentences more effectively compared to other models on the code-mixed dataset, with an improvement of up to 6% in terms of F1-score.

Annals of Operations Research | Published: 16 January 2020

A deep multi-modal neural network for informative Twitter content classification during emergencies

People start posting tweets containing texts, images, and videos as soon as a disaster hits an area. The analysis of these disaster-related tweet texts, images, and videos can help humanitarian response organizations in better decision-making and prioritizing their tasks. Finding the informative content that can help in decision-making out of the massive volume of Twitter content is a difficult task and requires a system to filter out the informative content. In this paper, we present a multi-modal approach to identify disaster-related informative content from Twitter streams using text and images together. Our approach is based on long short-term memory and VGG-16 networks that show significant improvement in the performance, as evident from the validation results on seven different disaster-related datasets. The F1-score ranged from 0.74 to 0.93 when tweet texts and images were used together, whereas with tweet text only it ranged from 0.61 to 0.92. From this result, it is evident that the proposed multi-modal system performs significantly well in identifying disaster-related informative social media content.

Abhinav Kumar
Jyoti Prakash Singh
Yogesh Kumar Dwivedi
Nripendra Pratap Rana

Current Pharmaceutical Biotechnology | Published: 05 September 2022

Artificial Intelligence Accelerating Drug Discovery and Development

Drug discovery and development are critical processes that enable the treatment of a wide variety of health-related problems. These are time-consuming, tedious, complicated, and costly processes. Numerous difficulties arise throughout the entire process of drug discovery, from design to testing. Corona Virus Disease 2019 (COVID-19) has recently posed a significant threat to global public health. SARS-CoV-2 and its variants are rapidly spreading in humans due to their high transmission rate. To effectively treat COVID-19, potential drugs and vaccines must be developed quickly. The advancement of artificial intelligence has shifted the focus of drug development away from traditional methods and toward bioinformatics tools. Computer-aided drug design techniques have demonstrated tremendous utility in dealing with massive amounts of biological data and developing efficient algorithms. Artificial intelligence enables more effective approaches to complex problems associated with drug discovery and development through the use of machine learning. Artificial intelligence-based technologies improve the pharmaceutical industry's ability to discover effective drugs. This review summarizes significant challenges encountered during the drug discovery and development processes, as well as the applications of artificial intelligence-based methods to overcome those obstacles in order to provide effective solutions to health problems. This may provide additional insight into the mechanism of action, resulting in the development of vaccines and potent substitutes for repurposed drugs that can be used to treat not only COVID-19 but also other ailments.

Anushree Tripathi
Krishna Misra
Richa Dhanuka and Jyoti Prakash Singh

Microsystem Technologies | Published: 01 January 2020

An efficient load-balanced stable multi-path routing for Mobile Ad-Hoc Network

This research aims to identify stable neighbors in a mobile ad-hoc network to create a stable multi-path route for different mobility patterns. The article also addresses how to schedule the data packets over those multiple paths so as to balance the load across the paths and transmit all the packets in minimum transmission time. The stable neighbors are chosen through a recurrent neural network, which uses the previous neighborhood information as an input and predicts whether a node will be a neighbor in the next instance or not. We also framed a methodology to distribute the data packets across multiple paths based on their path length from source to destination. A simulation of the network model with two mobility models, Random Waypoint and Gauss-Markov, shows that the accuracy of the recurrent neural network-based stable node prediction is around 95%. Both the analytical model and the simulation show that our proposed algorithm takes comparatively less time to transmit the same number of packets from a source to a destination due to better scheduling across multiple paths. Simulation results also demonstrate that, compared to other similar multi-path routing protocols, our proposed algorithm yields a higher packet-delivery ratio and a lower route recovery time.

Arindrajit Pal
Paramartha Dutta
Amlan Chakrabarti
Jyoti Prakash Singh
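
The abstract above says that packets are distributed across paths based on path length but does not give the rule. One plausible reading, sketched here purely as an assumption, is to give each path a share of packets inversely proportional to its hop count, so that shorter paths carry more traffic. The function name and the largest-remainder rounding are choices made for this example, not details from the paper.

```python
def split_packets(total_packets, path_lengths):
    """Allocate packets to paths in inverse proportion to path length (hops).

    Assumption for this sketch: weight_i = 1 / length_i. Fractional quotas
    are rounded with the largest-remainder method so the counts sum to the total.
    """
    if total_packets < 0 or not path_lengths or min(path_lengths) <= 0:
        raise ValueError("need a non-negative total and positive path lengths")
    weights = [1.0 / length for length in path_lengths]
    weight_sum = sum(weights)
    quotas = [total_packets * w / weight_sum for w in weights]
    allocation = [int(q) for q in quotas]
    leftover = total_packets - sum(allocation)
    # Hand the remaining packets to the paths with the largest fractional parts.
    by_remainder = sorted(range(len(quotas)),
                          key=lambda i: quotas[i] - allocation[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        allocation[i] += 1
    return allocation


# Three hypothetical paths of 2, 4 and 4 hops sharing 100 packets.
print(split_packets(100, [2, 4, 4]))
```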

Computational Intelligence and Neuroscience | Published: 10 October 2022

A Deep Ranking Weighted Multihashing Recommender System for Item Recommendation

Collaborative filtering (CF) techniques are used in recommender systems to provide users with specialised recommendations on social websites and in e-commerce. But they suffer from sparsity and cold start problems (CSP) and fail to interpret why they recommend a new item. A novel deep ranking weighted multihash recommender (DRWMR) system is designed to suppress sparsity and CSP. The proposed DRWMR system contains two stages: the neighbours’ formation and recommendation phases. Initially, the data is fed to the deep convolutional neural network (CNN). The significant features are extracted from the CNN. The CNN contains an additional layer; the hash code is generated by minimising pairwise ranking loss and classification loss. A weight is then assigned to different hash tables and hash bits for a recommendation. Then, the similarity between users is obtained based on the weighted Hamming distance; the similarity between users helps to form the neighbourhood for the active user. Finally, the rating for unknown items can be obtained by taking the weighted average rating of the neighbourhood, and a list of the top n items can be produced. The effectiveness and accuracy of the proposed DRWMR system are tested on the MovieLens 100K dataset and compared with the existing methods. Based on the evaluation results, the proposed DRWMR system gives a precision of 0.16, a recall of 0.08, an F1 measure of 0.101, a root mean squared error (RMSE) of 0.73, and a mean absolute error (MAE) of 0.57.

Suresh Kumar
Jyoti Prakash Singh
Vinay Kumar Jain
Avinab Marahatta
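
The neighbourhood formation step in the abstract above relies on a weighted Hamming distance between users' hash codes. The following minimal sketch shows that distance and a nearest-neighbour lookup built on it. The per-bit weights, the user codes, and the function names are hypothetical; how the DRWMR system actually learns its bit and table weights is not described in the abstract.

```python
def weighted_hamming(code_a, code_b, weights):
    """Sum the weights of the bit positions where two hash codes differ."""
    if not (len(code_a) == len(code_b) == len(weights)):
        raise ValueError("codes and weights must have the same length")
    return sum(w for a, b, w in zip(code_a, code_b, weights) if a != b)


def nearest_neighbours(active_code, user_codes, weights, k=2):
    """Return the k user ids whose codes are closest to the active user's code."""
    ranked = sorted(
        user_codes,
        key=lambda uid: weighted_hamming(active_code, user_codes[uid], weights),
    )
    return ranked[:k]


# Hypothetical 4-bit hash codes and bit weights.
weights = [0.5, 0.2, 0.2, 0.1]
users = {
    "u1": [1, 0, 1, 1],
    "u2": [1, 1, 0, 1],
    "u3": [0, 0, 1, 1],
}
active = [1, 0, 1, 0]
print(nearest_neighbours(active, users, weights, k=2))
```

Weighting the bits lets a mismatch in an informative bit count for more than a mismatch in a noisy one, which plain Hamming distance cannot express.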

Soft Computing | Published: 27 November 2019

Towards Cyberbullying-free social media in smart cities: a unified multi-modal approach

Smart cities are shifting the presence of people from the physical world to the cyber world (cyberspace). Along with the facilities for societies, the troubles of the physical world, such as bullying, aggression and hate speech, are also establishing an emphatic presence in cyberspace. This paper aims to mine social media posts to identify bullying comments containing text as well as images. In this paper, we have proposed a unified representation of text and image together to eliminate the need for separate learning modules for image and text. A single-layer Convolutional Neural Network model is used with the unified representation. The major finding of this research is that text represented as an image is a better model to encode the information. We also found that a single-layer Convolutional Neural Network gives better results with a two-dimensional representation. In the current scenario, we have used three layers of text and three layers of a colour image to represent the input, which gives a recall of 74% for the bullying class with one layer of Convolutional Neural Network.

Kirti Kumari
Jyoti Prakash Singh
Yogesh Kumar Dwivedi
and Nripendra Pratap Rana

Neural Computing and Applications | Published: 07 November 2019

Predicting closed questions on community question answering sites using convolution neural network

Community question answering sites receive a huge number of questions and answers every day. It has been observed that a number of questions among them are marked as closed by the site moderators. Such questions increase the overhead of the moderators and also create user dissatisfaction. This paper aims to predict whether a newly posted question would be marked as closed in the future or not and also give a tentative reason for being closed. Two models, (1) a baseline model based on traditional machine learning techniques and (2) deep learning models such as convolutional neural network (CNN) and long short-term memory (LSTM) network, are used to classify a question into one of five classes: (1) open, (2) off-topic, (3) not a real question, (4) not constructive and (5) too localized. The baseline model requires handcrafted features and hence does not preserve semantics. However, CNN and LSTM networks are capable of preserving the semantics of a question’s words and extracting the hidden features from the textual content using multiple hidden layers. The LSTM network performs better compared to CNN and traditional machine learning models. The proposed model can be used as an initial filter to screen closed questions at the time of posting, which reduces the overhead of site moderators. To the best of our knowledge, this is the first work that predicts the closed question along with the reason the question will be closed. This helps the questioner to modify the question before posting. The experimental results with the dataset of Stack Overflow prove the effectiveness of the proposed model.

Pradeep Kumar Roy
and Jyoti Prakash Singh

Future Generation Computer Systems | Published: 4 September 2019

Deep Learning to Filter SMS Spam

The popularity of short message service (SMS) has been growing over the last decade. For businesses, these text messages are more effective than even emails. This is because while 98% of mobile users read their SMS by the end of the day, about 80% of the emails remain unopened. The popularity of SMS has also given rise to SMS Spam, which refers to any irrelevant text messages delivered using mobile networks. They are severely annoying to users. Most existing research that has attempted to filter SMS Spam has relied on manually identified features. Extending the current literature, this paper uses deep learning to classify Spam and Not-Spam text messages. Specifically, Convolutional Neural Network and Long Short-Term Memory models were employed. The proposed models were based on text data only, and self-extracted the feature set. On a benchmark dataset consisting of 747 Spam and 4,827 Not-Spam text messages, a remarkable accuracy of 99.44% was achieved.

Pradeep Kumar Roy
Jyoti Prakash Singh and Snehasish Banerjee

Journal of Information Science | Published: October 13, 2022

Is this question going to be closed? Answering question closibility on Stack Exchange

Community question answering sites (CQAs) are often flooded with questions that are never answered. To cope with the problem, experienced users of Stack Exchange are now allowed to mark newly posted questions as closed if they are of poor quality. Once closed, a question is no longer eligible to receive answers. However, identifying and closing subpar questions takes time. Therefore, the purpose of this article is to develop a supervised machine learning system that predicts question closibility, the possibility of a newly posted question to be eventually closed. Building on extant research on CQA question quality, the supervised machine learning system uses 17 features that were grouped into four categories, namely, asker features, community features, question content features and textual features. The performance of the developed system was tested on questions posted on Stack Exchange from 11 randomly chosen topics. The classification performance was generally promising and outperformed the baseline. Most of the measures of precision, recall, F1-score and area under the receiver operating characteristic curve (AUC) were above 0.90 irrespective of the topic of questions. By conceptualising question closibility, the article extends previous CQA research on question quality. Unlike previous studies, which were mostly limited to programming-related questions from Stack Overflow, this one empirically tests question closibility on questions from 11 randomly selected topics. The set of features used for classification offers a framework of question closibility that is not only more comprehensive but also more parsimonious compared with prior works.

Pradeep Kumar Roy
Jyoti Prakash Singh and Snehasish Banerjee

Multimedia Tools and Applications | Published: 07 June 2019

An efficient Boolean-based Multi-secret Image Sharing Scheme

The purpose of this paper is to develop an algorithm for sharing k secret images among n participants in such a way that each participant gets a single share image that encodes all k images. Any qualified subgroup of t (t ≤ n) of those n participants can reconstruct the k_i-th secret image only by combining their share images, and only if they are qualified to reconstruct that image. Most of the existing literature solves this problem for the cases where t = 2 or t = n, making it a very restrictive scheme. In this article, we aim to design a multi-secret image sharing scheme based on the XOR operation where t is not restricted to be 2 or n. We have used n random matrices of the same size as the secret image as private shares to generate r share images (where r is the number of qualified subgroups) as public shares using XOR operations. The proposed scheme is computationally lightweight and lossless because it uses the XOR operation only. It does not involve any pixel expansion. The experimental results, with a very low correlation coefficient between share and secret images, confirm that a share image does not reveal anything about the secret image. The scheme is secure against differential attack, as confirmed by a high value of the Number of Pixels Change Rate (NPCR). The current proposal is based on a general access structure, and hence any secret image can be reconstructed by a qualified group of t or more shares where t need not be 2 or n only.

Amitava Nag
Jyoti Prakash Singh
Amit Kumar Singh
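
The XOR masking idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes images are flattened to byte strings, that each participant holds one random private share, and that the public share for one qualified group is the secret XORed with that group's private shares. The function names are hypothetical.

```python
import secrets
from functools import reduce


def xor_bytes(a, b):
    """Pixel-wise XOR of two equally sized images (flattened to bytes)."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_private_shares(n, size):
    """One random matrix per participant, same size as the secret image."""
    return [secrets.token_bytes(size) for _ in range(n)]


def group_mask(private, group):
    """XOR together the private shares of every member of a qualified group."""
    return reduce(xor_bytes, (private[i] for i in group))


def make_public_share(secret, private, group):
    """Assumed construction: hide the secret under the group's combined mask."""
    return xor_bytes(secret, group_mask(private, group))


def reconstruct(public, private, group):
    """XOR is its own inverse, so the same mask recovers the secret losslessly."""
    return xor_bytes(public, group_mask(private, group))


# Toy 4x4 "image" and a qualified group of three out of four participants.
secret_image = bytes(range(16))
private_shares = make_private_shares(n=4, size=len(secret_image))
qualified = (0, 2, 3)
public_share = make_public_share(secret_image, private_shares, qualified)
recovered = reconstruct(public_share, private_shares, qualified)
```

Because every step is a byte-wise XOR, the shares have the same size as the secret, which mirrors the "no pixel expansion" property the abstract mentions.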

Journal of Ambient Intelligence and Humanized Computing | Published: 13 January 2023

Secure data authentication and access control protocol for industrial healthcare system

Because of the recent COVID-19 epidemic, the Internet-of-Medical-Things (IoMT) has gained significant impetus as a way to diagnose patients remotely, regulate medical equipment, and track quarantined patients via smart electronic devices installed at the patient’s end. Nevertheless, the IoMT confronts various security and privacy issues, such as entity authentication, confidentiality, and integrity of health-related data, among others, rendering this technology vulnerable to different attacks. To address these concerns, a number of security procedures based on traditional cryptographic approaches, such as discrete logarithm and integer factorization problems, have been developed. All of these protocols, however, are vulnerable to quantum attacks. In this context, this paper presents a data authentication and access control protocol for IoMT systems that can withstand quantum attacks. A comprehensive formal security assessment demonstrates that the proposed algorithm can endure both current and future threats. It also outperforms other related techniques in terms of computation, transmission, and key storage overheads.

Dayasagar Gupta
Nabajyoti Mazumdar
Amitava Nag
and Jyoti Prakash Singh

Research on Biomedical Engineering, 38.3 (2022) | Published: 02 August 2022

Classification of diabetic macular edema severity using deep learning technique

Purpose: Diabetic macular edema (DME) is a kind of hard exudate lesion seen near the macular region of the retina. DME causes visual loss and may result in complete blindness; early identification and treatment may be able to cure this. Identification of DME at an early stage is a challenging and error-prone task. To address this issue, the article presents a methodology that uses the notion of transfer learning to identify cases of DME from retinal fundus images. Methods: A pre-trained DenseNet121 is used in this technique to extract a useful set of feature vectors from the fundus images, which are then fed into a few additional fully connected layers and then into the classification layer to classify DME instances. A total of 577 fundus training images from 3 DME classes were used to train the proposed model, and 103 fundus testing images were used to verify the proposed model for classifying them into one of the three DME cases. Results: The suggested model is trained and tested on the Indian Diabetic Retinopathy Image Dataset (IDRiD). With the test images, the results demonstrate that the proposed model outperformed the state-of-the-art models presented in “Diabetic Retinopathy – Segmentation and Grading Challenge” held at ISBI-2018 with an accuracy of 86.4%. Conclusion: The proposed model diagnoses DME at an early stage for timely treatment and helps to reduce the workload of ophthalmologists.

Amit Kumar
Anand Shanker Tewari and Jyoti Prakash Singh

Information Systems Frontiers | Published: 15 July 2022

Multi-Channel Convolutional Neural Network for the Identification of Eyewitness Tweets of Disaster

During a disaster, a large number of disaster-related social media posts are widely disseminated. Only a small percentage of disaster-related information is posted by eyewitnesses. The post of a disaster eyewitness offers an accurate depiction of the disaster. Therefore, the information posted by eyewitnesses is preferred over other sources of information, as it is more effective at helping organize rescue and relief operations and potentially saving lives. In this work, we propose a multi-channel convolutional neural network (MCNN) that uses three different word-embedding vectors together to classify disaster-related tweets into eyewitness, non-eyewitness, and don’t know classes. We compared the performance of the proposed MCNN with several attention-based deep-learning models and with other deep-learning and conventional machine-learning models such as recurrent neural network, gated recurrent unit, long short-term memory, convolutional neural network, logistic regression, support vector machine, and gradient boosting. The proposed MCNN achieved F1-scores of 0.84, 0.88, 0.84, and 0.86 on four disaster-related datasets of floods, earthquakes, hurricanes, and wildfires, respectively. The experimental results show that training the MCNN model with different word embeddings together performs better than the conventional machine-learning models and several other deep-learning models.

Abhinav Kumar
Jyoti Prakash Singh
Nripendra P. Rana & Yogesh K. Dwivedi

Soft Computing | Published: 23 February 2019

Predicting the helpfulness score of online reviews using convolutional neural network

Smart cities aim to provide an infrastructure to their citizens that reduces both their time and effort. An example of such an infrastructure is electronic shopping. Electronic shopping has become popular with many customers, as it is easier to judge the quality of a product based on the review information. The purpose of this study is to predict the most helpful online product review out of the several thousand reviews available for a product using review representation learning. The prediction is done using a two-layered convolutional neural network model. The review texts are embedded into low-dimensional vectors using a pre-trained model. To learn the best features of the review text, three filters are used to learn tri-gram, four-gram, and five-gram features of the text. The proposed approach is found to be better than existing machine-learning-based models that use hand-crafted features. The very low value of mean squared error confirms the prediction accuracy of the proposed method. The proposed method can be easily applied to any kind of review, as the features are calculated only from the review text and not from other domain knowledge. The proposed model helps in predicting the helpfulness score of a new review as soon as it is posted on the product review page.

Sunil Saumya
Jyoti Prakash Singh
and Yogesh Kumar Dwivedi

Journal of Grid Computing | Published: 16 June 2022

Randomized Convolutional Neural Network Architecture for Eyewitness Tweet Identification During Disa

During a disaster, Twitter is flooded with disaster-related information. Among the huge volume of disaster-related Twitter posts, a fraction is posted by eyewitnesses of the disaster. The post of an eyewitness contains an authentic description of the disaster. Therefore, eyewitness posts are preferred over all other sources of information for understanding the ground reality of the disaster. In this work, we have used a convolutional neural network (CNN) with randomly initialized weights to extract features from the textual contents of the tweets and proposed three different random neural network-based models. The features extracted from the untrained random convolutional neural network (RCNN) are passed through a trainable dense neural network (DNN), echo state network (ESN), or extreme learning machine (ELM) to identify eyewitness tweets. The proposed system is validated with hurricane, earthquake, flood, and wildfire datasets. In extensive experiments with the three random neural network-based models (RCNN-DNN, RCNN-ESN, and RCNN-ELM) and other machine learning and deep learning models such as KNN, Naive Bayes, Decision Tree, Convolutional Neural Network, and Dense Neural Network, the RCNN-DNN model outperformed all the other models. The RCNN-DNN model achieved weighted F1-scores of 0.79, 0.86, 0.79, and 0.85 for hurricane, earthquake, flood, and wildfire, respectively.

Abhinav Kumar
Jyoti Prakash Singh & Amit Kumar Singh

Annals of Operations Research | Published: 19 May 2017

Event classification and location prediction from tweets during disaster

Social media is a platform to express one’s view in real time. This real time nature of social media makes it an attractive tool for disaster management, as both victims and officials can put their problems and solutions at the same place in real time. We investigate the Twitter post in a flood related disaster and propose an algorithm to identify victims asking for help. The developed system takes tweets as inputs and categorizes them into high or low priority tweets. User location of high priority tweets with no location information is predicted based on historical locations of the users using the Markov model. The system is working well, with its classification accuracy of 81%, and location prediction accuracy of 87%. The present system can be extended for use in other natural disaster situations, such as earthquake, tsunami, etc., as well as man-made disasters such as riots, terrorist attacks etc. The present system is first of its kind, aimed at helping victims during disasters based on their tweets.

Jyoti Prakash Singh
Yogesh Kumar Dwivedi
Nripendra Pratap Rana
Abhinav Kumar
and Kawaljeet Kaur Kapoor
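
The location-prediction step described in the abstract above can be sketched as a first-order Markov chain over a user's historical locations. The abstract does not state the order of the model or how ties are broken, so both are assumptions here, and the place names are purely illustrative.

```python
from collections import Counter, defaultdict


def fit_transitions(history):
    """Count location-to-location transitions in a chronological history."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(history, history[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, current):
    """Return the most frequent successor of `current`, or None if unseen."""
    if current not in counts:
        return None
    return counts[current].most_common(1)[0][0]


# Hypothetical history of locations attached to a user's earlier tweets.
history = ["Patna", "Delhi", "Patna", "Delhi", "Patna", "Gaya"]
transitions = fit_transitions(history)
guess = predict_next(transitions, "Patna")
```

A tweet without location information would then be assigned the predicted successor of the user's last known location.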

International Journal of Interactive Multimedia and Artificial Intelligence | Published: 3 October 2022

Modeling sub-band information through discrete wavelet transform to improve intelligibility assessme

The speech signal within a sub-band varies at a fine level depending on the type and level of dysarthria. The Mel-frequency filterbank used in the computation of cepstral coefficients smooths out this fine-level information in the higher frequency regions due to the larger bandwidth of the filters. To capture the sub-band information, in this paper, a four-level discrete wavelet transform (DWT) decomposition is first performed to decompose the input speech signal into approximation and detail coefficients at each level. For a particular input speech signal, five speech signals representing different sub-bands are then reconstructed using the inverse DWT (IDWT). The log filterbank energies are computed by analyzing the short-term discrete Fourier transform magnitude spectra of each reconstructed speech signal using a 30-channel Mel-filterbank. For each analysis frame, the log filterbank energies obtained across all reconstructed speech signals are pooled together, and a discrete cosine transform is performed to obtain the cepstral feature, here termed the discrete wavelet transform reconstructed (DWTR)-Mel frequency cepstral coefficient (MFCC). The i-vector based dysarthric level assessment system developed on the Universal Access speech corpus shows that the proposed DWTR-MFCC feature outperforms the conventional MFCC and several other cepstral features reported for a similar task. The use of DWTR-MFCC improves the detection accuracy rate (DAR) of the dysarthric level assessment system in the text and speaker-independent test case to 60.094% from the 56.646% MFCC baseline. Further analysis of the confusion matrices shows that confusion among different dysarthric classes is quite different for MFCC and DWTR-MFCC features. Motivated by this observation, a two-stage classification approach employing the discriminating power of both kinds of features is proposed to improve the overall performance of the developed dysarthric level assessment system. The two-stage classification scheme further improves the DAR to 65.813% in the text and speaker-independent test case.

Laxmi Priya Sahu
Gayadhar Pradhan
Jyoti Prakash Singh
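
The decomposition step in the abstract above (a four-level DWT followed by reconstruction of five sub-band signals through the inverse DWT) can be sketched in plain Python. The abstract does not name the wavelet, so the Haar wavelet used here is an assumption chosen for brevity, and all function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One analysis level: split x into approximation and detail halves."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """One synthesis level: rebuild the sample pairs from approx and detail."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def decompose(x, levels=4):
    """Return [A_L, D_L, ..., D_1]; len(x) must be divisible by 2**levels."""
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return [approx] + details[::-1]


def reconstruct(coeffs):
    """Inverse DWT from the coefficient list produced by decompose()."""
    approx = coeffs[0]
    for d in coeffs[1:]:
        approx = haar_inverse_step(approx, d)
    return approx


def subband_signals(x, levels=4):
    """Rebuild one time-domain signal per sub-band by zeroing the other bands."""
    coeffs = decompose(x, levels)
    signals = []
    for keep in range(len(coeffs)):
        masked = [c if i == keep else [0.0] * len(c) for i, c in enumerate(coeffs)]
        signals.append(reconstruct(masked))
    return signals


# Toy signal: 32 samples, so four levels leave a 2-sample approximation.
signal = [math.sin(0.3 * i) for i in range(32)]
bands = subband_signals(signal)
```

Each of the five reconstructed signals has the same length as the input, and because the transform is linear they sum back to the original signal. In the pipeline the abstract describes, each one would then be analysed separately with the Mel-filterbank.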

Wireless Personal Communications | Published: 03 October 2018

Biogeographic-based temporal prediction of link stability in mobile ad hoc networks

A set of moving nodes communicating with each other without any infrastructure is considered a mobile ad hoc network (MANET). Stability is a big problem with this type of network due to its variable location and variable speed with respect to time. As a result, link failure is a big problem in MANET. When the link fails, the network faces high packet drop and higher delay in delivery of the packets due to a new routing setup in most cases. In this paper, we have proposed a method to frame up a stable link network using a temporal data analysis model. In this model, we first analyzed the mobility and position of neighbor nodes with respect to each node from the temporal snapshot of the network. The statistical model ARMA (Auto Regressive Moving Average) is used for predicting the stable neighbors of each node in a future time frame. These stable neighbors can be used for creating a link between different nodes. The combination between different nodes builds a path between the source and destination. We applied a BBO (Biogeographic-based optimization) technique to estimate parameters relevant to the optimal path from source to destination nodes. This optimal link offers a stable and reliable connection for the remaining lifetime of the data transfer for the said network.

Arindrajit Pal
Paramartha Dutta
Amlan Chakrabarti
Jyoti Prakash Singh
and Shayak Sadhu

International Journal of Disaster Risk Reduction | Published: 30 October 2018

Location reference identification from tweets during emergencies: A deep learning approach

Twitter has recently been used during crises to communicate with officials and support rescue and relief operations in real time. The geographical location information of the event, as well as of the users, is vitally important in such scenarios. The identification of geographic location is a challenging task, as the location information fields, such as the user location and place name of tweets, are not reliable. The extraction of location information from tweet text is difficult as it contains a lot of non-standard English, grammatical errors, spelling mistakes, non-standard abbreviations, and so on. This research aims to extract location words used in the tweet using a Convolutional Neural Network (CNN) based model. We achieved an exact matching score of 0.929, a Hamming loss of 0.002, and an F1-score of 0.96 for the tweets related to the earthquake. Our model was able to extract even three- to four-word long location references, which is also evident from the exact matching score of over 92%. The findings of this paper can help in early event localization, emergency situations, real-time road traffic management, localized advertisement, and in various location-based services.

Abhinav Kumar
Jyoti Prakash Singh
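
The two sequence-level metrics reported in the abstract above can be illustrated with a small sketch. It assumes each tweet is labelled token by token (1 for a location word, 0 otherwise) and uses the standard definitions of exact match and Hamming loss; the paper's exact evaluation protocol, for example any padding to a fixed length, is not given in the abstract.

```python
def exact_match(gold, pred):
    """Fraction of tweets whose whole predicted label sequence is correct."""
    hits = sum(1 for g, p in zip(gold, pred) if g == p)
    return hits / len(gold)


def hamming_loss(gold, pred):
    """Fraction of individual token labels that are predicted incorrectly."""
    wrong = sum(gi != pi for g, p in zip(gold, pred) for gi, pi in zip(g, p))
    total = sum(len(g) for g in gold)
    return wrong / total


# Hypothetical labels for three four-token tweets.
gold_labels = [[0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 0]]
pred_labels = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0]]
em = exact_match(gold_labels, pred_labels)
hl = hamming_loss(gold_labels, pred_labels)
```

Exact match is the stricter of the two: a single wrong token makes the whole tweet count as a miss, while Hamming loss only penalises that one token.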

Journal of Biomedical and Health Informatics | Published: 29 March 2022

A Semi-Supervised Autoencoder-Based Approach for Protein Function Prediction

After the development of next-generation sequencing techniques, protein sequences are abundantly available. Determining the functional characteristics of these proteins is costly and time-consuming. The gap between the number of protein sequences and their corresponding functions is continuously increasing. Advanced machine-learning methods have stepped up to fill this gap. In this work, an advanced deep-learning-based approach is proposed for protein function prediction using protein sequences. A set of autoencoders is trained in a semi-supervised manner with protein sequences. Each autoencoder corresponds to a single protein function only. In particular, 932 autoencoders corresponding to 932 biological processes and 585 autoencoders corresponding to 585 molecular functions are trained separately. Reconstruction losses of each protein sample for every autoencoder are used as a feature to classify these sequences into their corresponding functions. The proposed model is tested on test protein samples and achieves promising results. This method can be easily extended to predict any number of functions having an ample amount of supporting protein sequences. All relevant codes, data and trained models are available at https://github.com/richadhanuka/PFP-Autoencoders

Richa Dhanuka
Anushree Tripathi and Jyoti Prakash Singh

International Journal of Information Management | Published: 24 May 2018

Identifying Reputation Collectors in Community Question Answering (CQA) Sites: Exploring the Dark

This research aims to identify users who post, and encourage others to post, low-quality and duplicate content on community question answering sites. The good guys, called Caretakers, and the bad guys, called Reputation Collectors, are characterised by their behaviour, answering pattern and reputation points. The proposed system is developed and analysed over a publicly available Stack Exchange data dump. A graph-based methodology is employed to derive the characteristics of Reputation Collectors and Caretakers. Results reveal that Reputation Collectors are the primary sources of low-quality answers as well as answers to duplicate questions posted on the site. The Caretakers answer a limited number of challenging questions and earn maximum reputation from those questions, whereas Reputation Collectors answer many low-quality and duplicate questions to gain reputation points. We have developed algorithms to identify the Caretakers and Reputation Collectors of the site. Our analysis finds that 1.05% of Reputation Collectors post 18.88% of low-quality answers. This study extends previous research by identifying the Reputation Collectors and how they collect their reputation points.

Pradeep Kumar Roy
Jyoti Prakash Singh
Abdullah Mohammad Baabdullah
Hatice Kizgin
Nripendra Pratap Rana

Wireless Personal Communications | Published: 18 April 2018

An Energy Efficient Protocol to Mitigate Hot Spot Problem Using Unequal Clustering in WSN

In multihop scenarios, the sensor nodes nearer to the base station (BS) are overloaded because they handle their own data as well as the information obtained from faraway nodes. This induces a higher energy depletion rate in nodes near the BS, causing early death of these nodes and resulting in the hot spot/energy hole problem in wireless sensor networks (WSNs). This paper proposes a novel strategy using unequal fixed grid-based clusters along with a mobile data mule for data collection from the cluster head (CH). A CH is selected in such a manner that the cumulative transmission distance for member nodes within the cluster is minimum. The paper attempts to optimize the values of the CH change time or round number (f) and also establishes a relationship between clusters of different sizes by using a factor (r), as they play an important role in the overall performance improvement of the WSN. Integrating a mobile data mule in the protocol enhances its efficiency in handling the hot spot problem and makes it more energy efficient. Two different WSN scenarios have been considered based on the movement pattern of the data mule. The results obtained through simulation in both scenarios demonstrate the success of our scheme in terms of energy efficiency, load balancing and network lifetime as compared to the existing protocols. The paper also provides a balanced trade-off between delay and high overheads by using a single mule with a simple predefined path. It also minimizes the hot spot problem, as it sustains more than 3000 rounds, which is far better than the existing methods.

Sunil Kumar Singh
Prabhat Kumar
and Jyoti Prakash Singh
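
The cluster-head selection rule stated in the abstract above (choose the node that minimises the cumulative transmission distance for the member nodes) can be sketched as follows. This covers only that distance criterion; the coordinates are hypothetical, and any further criteria the protocol may use, such as residual energy, are not modelled.

```python
import math


def select_cluster_head(nodes):
    """Return the index of the node with the smallest total distance to all members."""

    def total_distance(i):
        return sum(math.dist(nodes[i], other) for other in nodes)

    return min(range(len(nodes)), key=total_distance)


# Hypothetical (x, y) positions of the sensor nodes inside one grid cell.
cluster = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
head = select_cluster_head(cluster)
```

With ties, `min` keeps the first candidate it sees, so a real protocol would need an explicit tie-breaking rule.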

IETE Journal of Research | Published: 21 Mar 2022

Feature Extraction to Filter Out Low-Quality Answers from Social Question Answering Sites

Social Question Answering sites (SQAs) are online platforms that allow Internet users to ask questions and obtain answers from others in the community. SQAs have been marred by the problem of low-quality answers. Worryingly, answer quality on SQAs has been reported to be following a downward trajectory in recent years. To this end, existing research has predominantly focused on finding the best answer, or identifying high-quality answers among the available responses. However, such scholarly efforts have not reduced the volume of low-quality answers on SQAs. Therefore, the goal of this research is to extract features in order to weed out low-quality answers as soon as they are posted on SQAs. Data from Stack Exchange was used to carry out the investigation. Informed by the literature, 26 features were extracted. Thereafter, machine learning algorithms were implemented that could correctly identify 85% to 96% of low-quality answers. The key contribution of this research is the development of a system to detect subpar answers on the fly at the time of posting. It is intended to be used as an early warning system that warns users about answer quality at the point of posting.

Pradeep Kumar Roy
Zishan Ahmed
Jyoti Prakash Singh
Snehasish Banerjee

CAAI Transactions on Intelligence Technology (2022) | Published: 04 May 2022

Analysis of community question-answering issues via machine learning and deep learning: State-of-the

Over the last couple of decades, community question-answering sites (CQAs) have been a topic of much academic interest. Scholars have often leveraged traditional machine learning (ML) and deep learning (DL) to explore the ever-growing volume of content that CQAs engender. To clarify the current state of the CQA literature that has used ML and DL, this paper reports a systematic literature review. The goal is to summarise and synthesise the major themes of CQA research related to (i) questions, (ii) answers and (iii) users. The final review included 133 articles. Dominant research themes include question quality, answer quality, and expert identification. In terms of dataset, some of the most widely studied platforms include Yahoo! Answers, Stack Exchange and Stack Overflow. The scope of most articles was confined to just one platform with few cross-platform investigations. Articles with ML outnumber those with DL. Nonetheless, the use of DL in CQA research is on an upward trajectory. A number of research directions are proposed.

Pradeep Kumar Roy
Sunil Saumya
Jyoti Prakash Singh
Snehasish Banerjee
Adnan Gutub

Electronic Commerce Research and Applications | Published: 29 March 2018

Ranking online consumer reviews

Product reviews are posted online by the hundreds and thousands for popular products. Handling such a large volume of continuously generated online content is a challenging task for buyers, sellers and researchers. The purpose of this study is to rank the overwhelming number of reviews using their predicted helpfulness scores. The helpfulness score is predicted using features extracted from review text, product description, and customer question-answer data of a product using the random-forest classifier and gradient boosting regressor. The system classifies reviews into low or high quality with the random-forest classifier. The helpfulness scores of the high-quality reviews are only predicted using the gradient boosting regressor. The helpfulness scores of the low-quality reviews are not calculated because they are never going to be in the top k reviews. They are just added at the end of the review list to the review-listing website. The proposed system provides fair review placement on review listing pages and makes all high-quality reviews visible to customers on the top. The experimental results on data from two popular Indian e-commerce websites validate our claim, as 3–4 newer high-quality reviews are placed in the top ten reviews along with 5–6 older reviews based on review helpfulness. Our findings indicate that inclusion of features from product description data and customer question-answer data improves the prediction accuracy of the helpfulness score.

Sunil Saumya
Jyoti Prakash Singh
Abdullah Mohammed Baabdullah
Nripendra Pratap Rana
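
The two-stage ranking described in the abstract above can be sketched with stand-in models. In the paper the quality filter is a random-forest classifier and the scorer is a gradient boosting regressor; here both are replaced by hypothetical callables, so only the control flow is illustrated.

```python
def rank_reviews(reviews, is_high_quality, predict_helpfulness):
    """Score only high-quality reviews; append low-quality ones unscored at the end."""
    high, low = [], []
    for review in reviews:
        (high if is_high_quality(review) else low).append(review)
    high.sort(key=predict_helpfulness, reverse=True)
    return high + low


# Hypothetical stand-ins for the trained classifier and regressor.
def toy_quality_filter(review):
    return len(review["text"].split()) >= 5


def toy_helpfulness(review):
    return review["score"]


reviews = [
    {"id": "a", "text": "bad", "score": 0.9},
    {"id": "b", "text": "battery lasts two full days easily", "score": 0.4},
    {"id": "c", "text": "the screen is sharp and bright outdoors", "score": 0.7},
]
ranked_ids = [r["id"] for r in rank_reviews(reviews, toy_quality_filter, toy_helpfulness)]
```

Skipping the regressor for low-quality reviews follows the abstract's reasoning that such reviews will never reach the top k positions.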

Computers and Electrical Engineering | Published: 29 March 2022

A buffer-aware dynamic UAV trajectory design for data collection in resource-constrained IoT framewo

The emergence of unmanned aerial vehicle (UAV)-enabled technology in the Internet of Things (IoT) era leads to a significant reduction in data collection delays when accumulating sensory data from ground IoT nodes (INs). As a flying data collector, the UAV hovers at a limited number of Access Points (APs) to collect data, outperforming ground data collectors in terms of transmission energy consumption, data delivery reliability, and timeliness. However, the INs have a finite amount of buffer capacity to store the data that must be collected before they overflow. As a result, the data gathering route for UAVs should be adaptable to INs’ buffer deadline in order to minimize data loss. In this paper, a buffer-aware dynamic UAV trajectory design protocol is proposed for data collection from resource-constrained INs. A distributed AP nomination strategy is proposed in order to reduce UAV hovering latency. Furthermore, using machine learning approaches, a modified ant colony optimization algorithm is constructed to minimize the data loss penalty due to buffer overflow. Finally, the performance of the proposed scheme is evaluated against several state-of-the-art protocols with regards to parameters such as data loss penalty, packet delivery ratio, and network lifetime.

Nabajyoti Mazumdar
Saugata Roy
Amitava Nag
Jyoti Prakash Singh

Journal of Business Research | Published: 10 August 2016

Predicting the “helpfulness” of online consumer reviews

Online shopping is increasingly becoming people's first choice when shopping, as it is very convenient to choose products based on their reviews. Even for moderately popular products, there are thousands of reviews constantly being posted on e-commerce sites. Such a large volume of data constantly being generated can be considered as a big data challenge for both online businesses and consumers. That makes it difficult for buyers to go through all the reviews to make purchase decisions. In this research, we have developed models based on machine learning that can predict the helpfulness of the consumer reviews using several textual features such as polarity, subjectivity, entropy, and reading ease. The model will automatically assign helpfulness values to an initial review as soon as it is posted on the website so that the review gets a fair chance of being viewed by other buyers. The results of this study will help buyers to write better reviews and thereby assist other buyers in making their purchase decisions, as well as help businesses to improve their websites.

Jyoti Prakash Singh
Seda Irani
Nripendra Pratap Rana
Yogesh Kumar Dwivedi
Sunil Saumya
and Pradeep Kumar Roy
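
One of the textual features named in the abstract above, entropy, can be illustrated with a short sketch. The abstract does not say whether entropy is computed over words or characters, so the word-level Shannon entropy below is an assumption, and the whitespace tokenisation is deliberately naive.

```python
import math
from collections import Counter


def word_entropy(text):
    """Shannon entropy (in bits) of the word distribution of a review."""
    words = text.lower().split()
    if not words:
        return 0.0
    total = len(words)
    counts = Counter(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


repetitive = word_entropy("good good good good")
varied = word_entropy("good bad good bad")
```

A review that repeats one word carries no information under this measure, while a more varied vocabulary raises the score.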

IEEE Xplore | Published: 14 February 2017

A Survey on Successors of LEACH Protocol

Even after 16 years of existence, the low energy adaptive clustering hierarchy (LEACH) protocol is still gaining the attention of the research community working in the area of wireless sensor networks (WSNs). This itself shows the importance of this protocol. Researchers have come up with various and diverse modifications of the LEACH protocol. Successors of the LEACH protocol are now available for single-hop to multi-hop scenarios. Extensive work has already been done related to LEACH, and it is a good idea for a new researcher in the field of WSN to go through LEACH and its variants over the years. This paper surveys the variants of LEACH routing protocols proposed so far and discusses their enhancements and operation. This survey classifies all the protocols into two sections, namely single-hop communication and multi-hop communication, based on data transmission from the cluster head to the base station. A comparative analysis using nine different parameters, such as energy efficiency, overhead, scalability, complexity, and so on, has been provided in chronological fashion. The article also discusses the strong and weak points of each variant of LEACH. Finally, the paper concludes with suggestions on future research domains in the area of WSN.

Sunil Kumar Singh
Prabhat Kumar
and Jyoti Prakash Singh

Computers and Electrical Engineering | Published: 29 March 2022

A buffer-aware dynamic UAV trajectory design for data collection in resource-constrained IoT framewo

Nabajyoti Mazumdar
Saugata Roy
Amitava Nag
Jyoti Prakash Singh

Egyptian Informatics Journal | Published: 7 February 2015

Path length prediction in MANET under AODV routing: Comparative analysis of ARIMA and MLP model

A mobile ad hoc network (MANET) is an infrastructure-less collection of mobile nodes which can communicate with each other through single hop or multi-hop techniques. The hop count, also known as path length, plays a crucial role in packet delivery, routing load, delay, etc. The path length between a source-destination pair of nodes depends upon factors such as the mobility patterns of nodes, routing algorithm, transmission range, etc. In this article, we have tried to predict the path length between a source-destination pair in a MANET using Autoregressive Integrated Moving Average (ARIMA) and multilayer perceptron (MLP) models. The path length data are collected from MANETs using three different mobility models, namely (i) Manhattan Grid Mobility Model (MHG), (ii) Random Way Point mobility model (RWP) and (iii) Reference Point Group Mobility Model (RPGM). This paper evaluates the predictive accuracy in forecasting the path length between source and destination nodes for Ad hoc On-Demand Distance Vector (AODV) routing in MANET using the ARIMA model and MLP. It is found that neural networks can be used to forecast path length between mobile nodes more effectively than the statistical model, and the MLP-based neural network models are found to be better forecasters than the ARIMA model.

Arindrajit Pal
Jyoti Prakash Singh
and Paramartha Dutta

IT Professional | Published: 06 May 2022

COVID-19 Fake News Detection Using Ensemble-Based Deep Learning Model

Fake news on various medicines, foods, and vaccinations relating to the COVID-19 pandemic has increased dramatically. These fake news reports lead individuals to believe in false and sometimes harmful claims and stories, and they also influence people’s vaccination opinions. Immediately detecting COVID-19 false news can help to reduce the spread of fear, confusion, and potential health risks among citizens. An ensemble-based deep learning model for detecting COVID-19-related fake news on Twitter is proposed in this article. CT-BERT, BERTweet, and RoBERTa are three different models that are fine-tuned on COVID-19-linked text data to separate fake and authentic news. In addition, the proposed ensemble-based model is compared to a variety of standard machine learning and deep learning models. In the detection of COVID-19 fake news from Twitter, the proposed ensemble-based deep learning model achieved state-of-the-art performance with a weighted F1-score of 0.99.

International Journal of Communication Networks and Distributed | Published: October 26, 2020

Finding Location of Fake and Phantom Source for Source Location Privacy in Wireless Sensor Network

Wireless sensor networks (WSNs) have various advantages over wired networks and hence have been an important area of research for years. Several issues of WSNs, such as energy, delay, routing, and security, have been addressed rigorously in past research. However, the issue of privacy is still a grey zone. In privacy, the issue of source location privacy (SLP) is critical for targeting and monitoring applications. Past works addressing the issue of SLP have used variants of fake and phantom sources, establishing the use of fake and phantom sources as a promising resource for SLP. Amid these, the selection of the proper location of fake and phantom sources is still an issue which significantly impacts the privacy as well as other constraints of the WSN. To fill this gap, we propose an algorithm to select the location of fake and phantom sources in the network in order to provide a better privacy and energy trade-off in the network. Our approach uses the concept of geometry for selecting the location of fake and phantom sources. The experimental results confirm that the proposed algorithm gives a promising trade-off between privacy and other constraints such as energy and traffic.

Rimjhim
Pradeep Kumar Roy and Jyoti Prakash Singh

Multimedia Tools and Applications | Published: 08 January 2022

A lightweight image encryption scheme based on chaos and diffusion circuit

Internet of Things (IoT) devices are being deployed in almost all aspects of human life, from smart homes, health monitoring, and smart metering to smart garbage collection and industrial applications. These devices sense and collect data from the environment and send it to other high-power computing devices called fog nodes or to the cloud. One of the major challenges in this process is secure communication of data, as IoT devices have limited processing power, memory, and energy. This paper proposes a lightweight encryption technique for images using chaotic maps and diffusion circuits. The chaotic maps are used to control the generation of random number sequences which are used for permutation and substitution of the pixel values in images. Both permutation and substitution of the pixel values are done in only one scan of the image, reducing the time complexity. The substitution operations are simple bit-wise operations, reducing the computational overhead. The scheme is tested by several statistical and security tests to ensure its strength against attacks.
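
The idea of combining a chaos-driven permutation with a bit-wise substitution in a single pass can be sketched as follows. This is not the scheme from the paper: the choice of the logistic map, the parameters `x0` and `r`, the XOR substitution, and the function names are all assumptions made for illustration, and the paper's diffusion circuit is not reproduced. The sketch is a toy and should not be treated as secure.

```python
def logistic_sequence(x0, r, n):
    """Generate n values of the logistic map x -> r * x * (1 - x)."""
    xs = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs


def _keys(n, x0, r):
    """Derive a permutation and a byte keystream from one chaotic sequence."""
    xs = logistic_sequence(x0, r, n)
    perm = sorted(range(n), key=lambda i: xs[i])  # ranking of chaotic values
    key = [int(x * 256) % 256 for x in xs]        # one key byte per pixel
    return perm, key


def encrypt(pixels, x0=0.3141, r=3.99):
    """Permute and substitute 8-bit pixel values in a single pass."""
    perm, key = _keys(len(pixels), x0, r)
    # Output pixel i takes the input pixel at perm[i] (permutation)
    # and is XORed with key[i] (bit-wise substitution).
    return [pixels[perm[i]] ^ key[i] for i in range(len(pixels))]


def decrypt(cipher, x0=0.3141, r=3.99):
    """Invert encrypt() using the same (hypothetical) secret parameters."""
    perm, key = _keys(len(cipher), x0, r)
    plain = [0] * len(cipher)
    for i in range(len(cipher)):
        plain[perm[i]] = cipher[i] ^ key[i]
    return plain
```

Here `pixels` is a flattened list of 8-bit values; calling `decrypt(encrypt(pixels))` with the same `x0` and `r` returns the original list, because the receiver regenerates the same chaotic sequence from the shared parameters.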

CSI Transaction on ICT | Published: 15 May 2018

Detection of spam reviews: a sentiment analysis approach

Electronic shopping is highly influenced by online reviews posted by customers about product quality. Some fraudulent pretenders consider this an opportunity to write spam reviews to upgrade or degrade a product’s reputation. Hence, detection of those reviews is essential for preserving the interests of users. To date, a number of studies have been proposed to detect spam reviews and to provide genuine resources for customers and business persons. However, we found a few limitations in existing supervised approaches. First, most of the supervised approaches have used manual labelling of reviews into spam and non-spam. However, due to the identical appearance of reviews, manual labelling cannot be considered authentic. Second, the scarcity of spam reviews leads to a data imbalance problem. Third, computing similarities among reviews naturally needs expensive computation. In this work, we propose a novel and robust spam review detection system which efficiently employs the following three features: (i) sentiments of the review and its comments, (ii) content-based factor, and (iii) rating deviation. To address the aforementioned limitations, we investigated all these features only for a suspicious review list, in which only those reviews that received comments from peer users are kept. The proposed system achieved an F-score of 91%. The proposed system can be a great asset in spam detection as it can be used as a stand-alone system to purify product review datasets.

Sunil Saumya
Jyoti Prakash Singh

Multimedia Tools and Applications | Published: 08 January 2022

COVID-19 and cyberbullying: deep ensemble model to identify cyberbullying from code-switched languag

The World Health Organization (WHO) declared the novel coronavirus a global pandemic following the exponential spread of COVID-19 in the past months, reaching over 100 million cases and resulting in approximately 3 million deaths worldwide. Amid this pandemic, identification of cyberbullying in posts or comments on social media platforms has become an evolving area of research. In multilingual societies like India, code-switched texts comprise the majority of the Internet. Identifying online bullying in code-switched text is more challenging than in monolingual cases. As a first step towards enabling the development of approaches for cyberbullying detection, we developed a new code-switched dataset, collected from Twitter utterances annotated with binary labels. To demonstrate the utility of the proposed dataset, we build different machine learning (Support Vector Machine & Logistic Regression) and deep learning (Multilayer Perceptron, Convolutional Neural Network, BiLSTM, BERT) algorithms to detect cyberbullying in English-Hindi (En-Hi) code-switched text. Our proposed model integrates different hand-crafted features and is enriched by sequential and semantic patterns generated by different state-of-the-art deep neural network models. Initial experimental results of the proposed deep ensemble model on our code-switched data reveal that our approach yields state-of-the-art results, i.e., 0.93 in terms of macro-averaged F1 score. The dataset and code of the present study will be made publicly available on the paper’s companion repository [https://github.com/95sayanta/COVID-19-and-Cyberbullying].

Sayanta Paul
Sriparna Saha and Jyoti Prakash Singh

Computational Biology and Chemistry | Published: 16 October 2021

Protein Function Prediction using Functional Inter-Relationship

With the growth of high throughput sequencing techniques, the generation of protein sequences has become fast and cheap, leading to a huge increase in the number of known proteins. However, it is challenging to identify the functions being performed by these newly discovered proteins. Machine learning techniques have improved traditional methods’ efficiency by suggesting relevant functions but fail to perform well when the number of functions to be predicted becomes large. In this work, we propose a machine learning-based approach to predict a huge set of protein functions that uses the inter-relationships between functions to improve the model’s predictability. These inter-relationships of functions are used to reduce the redundancy caused by highly correlated functions. The proposed model is trained on the reduced set of non-redundant functions, which limits the ambiguity caused by inter-related functions. Here, we use two statistical approaches, (1) Pearson’s correlation coefficient and (2) Jaccard similarity coefficient, as measures of correlation to remove redundant functions. To have a fair evaluation of the proposed model, we recreate our original function set by inverse transforming the reduced set using the two proposed approaches: Direct mapping and Ensemble approach. The model is tested using different feature sets and function sets of biological processes and molecular functions to get promising results on the DeepGO and CAFA3 datasets. The proposed model is able to predict specific functions for the test data which were unpredictable by other compared methods. The experimental models, code and other relevant data are available at https://github.com/richadhanuka/PFP-using-Functional-interrelationship.
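
The redundancy-reduction step described above can be sketched with the Jaccard similarity coefficient. The greedy selection rule, the threshold value, and the names `jaccard` and `reduce_functions` are assumptions for this example and are not taken from the authors' repository; the returned mapping only loosely mirrors the idea of direct mapping back to the original function set.

```python
def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets of protein IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reduce_functions(annotations, threshold=0.9):
    """Greedily drop functions that are near-duplicates of a kept function.

    annotations: dict mapping a function label to the set of proteins
    annotated with it (hypothetical input format).
    Returns (kept, mapping), where mapping sends each removed function to
    the kept function that represents it.
    """
    kept = []
    mapping = {}
    for func, proteins in annotations.items():
        for rep in kept:
            if jaccard(proteins, annotations[rep]) >= threshold:
                mapping[func] = rep  # func is redundant with rep
                break
        else:
            kept.append(func)  # no similar representative found
    return kept, mapping
```

A classifier would then be trained on the labels in `kept` only, and a prediction for a kept function could be copied to every removed function that maps to it.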

Richa Dhanuka
and Jyoti Prakash Singh

Global Journal of Flexible Systems Management | Published: 07 November 2017

Finding and Ranking High-Quality Answers in Community Question Answering Sites

Community Question Answering (CQA) sites have become a very popular place to ask questions and give answers to a large community of users on the Internet. Stack Exchange is one of the popular CQA sites where a large amount of content is posted every day in the form of questions, answers and comments. The answers on Stack Exchange are listed by their recent occurrences, time of posting or votes obtained by peer users under three tabs called active, oldest and votes, respectively. The votes tab is the default setting on the site and is also the preferred tab of users because answers under this tab are voted as good answers by other users. The problem of voting-based sorting is that new answers which are yet to receive any vote are placed at the bottom in the votes tab. A new answer may be of sufficiently high quality to be placed at the top, but having no or fewer votes (because of later posting) keeps it at the bottom. We introduce a new tab called the promising answers tab where answers are listed based on their usefulness, which is calculated by our proposed system using classification and regression models. Several textual features of answers and users' reputation are used as features to predict the usefulness of the answers. The results are validated with good values of precision, recall, F1-score, area under the receiver operating characteristic curve (AUC) and root mean squared error. We also compare the top ten answers predicted by our system to the actual top ten answers based on votes and find that they are in high agreement.

Pradeep Kumar Roy
Zishan Ahmad
Jyoti Prakash Singh
Mohammad Abdallah Ali Alryalat
Nripendra Pratap Rana
and Yogesh Kumar Dwivedi

Sustainable Cities and Society | Published: 22 September 2021

Disaster related social media content processing for sustainable cities

The current study offers a hybrid convolutional neural network (CNN) model that filters relevant posts and categorises them into several humanitarian classifications using both character and word embeddings of textual content. The distinct embeddings for words and characters are used as input to the CNN model’s various channels. Hurricane, flood, and wildfire datasets are used to validate the proposed model. The model performed similarly across all datasets, with the F1-score ranging from 0.66 to 0.71. Because it uses existing social media posts and may be used as a layer with any social media, the model provides a sustainable solution for disaster analysis. With domain-specific training, the suggested approach can also be used to locate useful information in other domains such as traffic accidents and civil unrest.

Pradeep Kumar Roy
Abhinav Kumar
Jyoti Prakash Singh
Yogesh Kumar Dwivedi
Nripendra Pratap Rana and Ramakrishnan Raman

International Journal of Wireless Information Networks | Published: 10 August 2011

The Temporal Effect of Mobility on Path Length in MANET

An ad hoc network consists of a set of identical nodes that move freely and independently and communicate among themselves via wireless links. The most interesting feature of this network is that it does not require any existing infrastructure or central administration and hence is very suitable for temporary communication links in an emergency situation. This flexibility, however, is achieved at a price of communication uncertainty induced due to frequent topology changes. In this article, we have tried to identify the system dynamics using the proven concepts of time series modeling. Here, we have analyzed the variation of path length between a particular source-destination pair of nodes over a fixed area for different mobility patterns under different routing algorithms. We have considered four different mobility models: (i) Gauss-Markov mobility model, (ii) Manhattan Grid mobility model, (iii) Random Way Point mobility model and (iv) Reference Point Group mobility model. The routing protocols under which we carried out our experiments are (i) Ad hoc On demand Distance Vector routing (AODV), (ii) Destination Sequenced Distance Vector routing (DSDV) and (iii) Dynamic Source Routing (DSR). The path length between two particular nodes behaves as a random variable for all mobility models and all routing algorithms. The pattern of path length for every combination of mobility model and routing protocol can be well modeled as an autoregressive model of order p, i.e. AR(p). The order p is estimated, and it is found that most of them are of order unity only. We also calculate the average path length for all mobility models and for all routing algorithms.
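
Since most of the fitted models are reported to be of order one, a minimal sketch of fitting an AR(1) model x[t] = c + phi * x[t-1] + e[t] by ordinary least squares is given below. The estimation procedure used in the article is not stated in the abstract, so least squares and the function name `fit_ar1` are assumptions made for illustration.

```python
def fit_ar1(series):
    """Least-squares estimate of (c, phi) in x[t] = c + phi * x[t-1] + e[t]."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    prev = series[:-1]   # x[t-1]
    curr = series[1:]    # x[t]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    var = sum((a - mean_prev) ** 2 for a in prev)
    if var == 0:
        raise ValueError("series is constant; phi is not identifiable")
    cov = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    phi = cov / var
    c = mean_curr - phi * mean_prev
    return c, phi
```

Given a sequence of sampled path lengths, `fit_ar1(path_lengths)` returns the intercept and the lag-one coefficient; choosing the order p itself would need an additional criterion, which this sketch does not cover.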

International Journal of Information Security and Privacy | Published: 31 August 2023

Localization in Wireless Sensor Networks Using Soft Computing Approach

A wireless sensor network (WSN) is formed by a large number of low-cost sensors. In order to exchange information, sensor nodes communicate in an ad hoc manner. The acquired information is useful only when the location of sensors is known. Using GPS-aided devices in each sensor makes sensors more costly and energy hungry. Hence, finding the location of nodes in WSNs becomes a major issue. In this paper, the authors propose a combination of range-based and range-free localization schemes. In their scheme, for finding the distance, they use received signal strength indication (RSSI), which is a range-based center of gravity technique. For finding the location of non-anchor nodes, the authors assign weights to anchor and non-anchor nodes based on received signal strength. The weights, which are assigned to anchor and non-anchor nodes, are designed by a fuzzy logic system (FLS).
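
The center of gravity idea can be sketched as a weighted centroid of the anchor positions. In the paper the weights come from a fuzzy logic system; the sketch below swaps in a much simpler rule, converting RSSI in dBm to linear power, purely as an assumption for illustration. The function name and input format are also hypothetical.

```python
def weighted_centroid(anchors):
    """Estimate a node position from anchors given as (x, y, rssi_dbm).

    Assumption: the weight of an anchor is its received power on a linear
    scale, 10 ** (rssi_dbm / 10), so stronger signals pull the estimate
    closer. This stands in for the fuzzy-logic weights of the paper.
    """
    if not anchors:
        raise ValueError("at least one anchor is required")
    weights = [10 ** (rssi / 10.0) for _, _, rssi in anchors]
    total = sum(weights)
    x = sum(w * a[0] for w, a in zip(weights, anchors)) / total
    y = sum(w * a[1] for w, a in zip(weights, anchors)) / total
    return x, y
```

With equal RSSI from all anchors the estimate is the plain centroid; a stronger reading from one anchor moves the estimate toward that anchor.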

Sunil Kumar Singh
Prabhat Kumar
and Jyoti Prakash Singh

Journal of Computing and Information Technology | Published: 23 November 2023

Temporal modeling of link characteristic in mobile ad hoc network

An ad hoc network consists of a set of identical nodes that move freely and independently and communicate among themselves via wireless links. The most interesting feature of this network is that it does not require any existing infrastructure or central administration and hence is very suitable for temporary communication links in an emergency situation. This flexibility, however, is achieved at a price of communication uncertainty induced due to frequent topology changes. In this article, we have tried to identify the system dynamics using the proven concepts of time series modeling. Here, we have analyzed the variation of link utilization between any two particular nodes over a fixed area for different mobility patterns under different routing algorithms. We have considered four different mobility models: (i) Gauss-Markov mobility model, (ii) Manhattan Grid mobility model, (iii) Random Way Point mobility model and (iv) Reference Point Group mobility model. The routing protocols under which we carried out our experiments are (i) Ad hoc On demand Distance Vector routing (AODV), (ii) Destination Sequenced Distance Vector routing (DSDV) and (iii) Dynamic Source Routing (DSR). The value of link load between two particular nodes behaves as a random variable for any mobility pattern under a routing algorithm. The pattern of link load for every combination of mobility model and routing protocol can be well modeled as an autoregressive model of order p, i.e. AR(p). The order p is estimated, and it is found that most of them are of order 1 only.

Jyoti Prakash Singh
and Paramartha Dutta

Journal of Computing and Information Technology | Published: 08 July 2023

Temporal modeling of node mobility in mobile ad hoc network

An ad hoc network consists of a set of identical nodes that move freely and independently and communicate via wireless links. The most interesting feature of this network is that it does not require any predefined infrastructure or central administration and hence it is very suitable for establishing temporary communication links in emergency situations. This flexibility, however, is achieved at the price of communication link uncertainties due to frequent topology changes. In this article we describe the system dynamics using the proven concept of time series modeling. Specifically, we analyze variations of the number of neighbor nodes of a particular node over a geographical area and for a given total number of nodes, assuming different values of (i) the speeds of nodes, (ii) the transmission powers, (iii) sampling periods and (iv) different mobility patterns. We consider three different mobility models: (i) Gaussian mobility model, (ii) random walk mobility model and (iii) random way point mobility model. The number of neighbor nodes of a particular node behaves as a random variable for any mobility pattern. Through our analysis we find that the variation of the number of neighbor nodes can be well modeled by an autoregressive AR$(p)$ model. The values of $p$ evaluated for different scenarios are found to be in the range between $1$ and $5$. Moreover, we also investigate the relationship between the speed and the time of measurements, and the transmission range of a specific node under various mobility patterns.

Jyoti Prakash Singh
and Paramartha Dutta

International Journal of Electronic Government Research (IJEGR) | Published: 2024

An Empirical Exploration of E-agriculture System Acceptance, Satisfaction, and Usage

The Indian economy relies heavily on agriculture, and an e-agriculture portal is a digital tool for sustainable development. It enables government and farmers to exchange information and share resources with anyone. However, many farmers are unaware of it, negatively impacting production and supply chains. The paper uses the Unified Theory of Acceptance and Use of Technology to analyze the intention to use an e-agriculture portal. The study used empirical methods to assess how citizen satisfaction concerns affect the purpose of using an e-agriculture portal. Data were collected from 294 rural area farmers and further analyzed using partial least squares structural equation modelling. The study's main findings are that (1) performance expectancy has the most significant impact, while facilitating conditions have the least impact, and experience and habit do not significantly affect e-agriculture portal usage; and (2) using an e-agriculture portal positively affects citizen satisfaction, citizen engagement, trust in government, and trust in technology.

Santosh Kumar Roy
Jyoti Prakash Singh
Kumod Kumar
Khalid H. M. Alhamzi
