Table 1: Structural predicates and generative rules for the linguistic capabilities of sentiment analysis.
Table 2: Structural predicates and generative rules for the linguistic capabilities of hate speech detection.
The slur and profanity placeholders in LC1-LC4 are collections of terms that express slurs and profanity.
The identity placeholder in LC11-LC12 is a list of names used to describe social groups.
In this work, we reuse these terms from HateCheck.
Baselines
Capability Testing Baselines
ALiCT is evaluated against the state-of-the-art linguistic capability testing approaches for sentiment analysis and hate speech detection, as follows:
Given the test cases generated by ALiCT and the capability testing baselines, we evaluate the models listed in Table 3:
Table 3: The NLP models used in our evaluation.
Evaluation of the expansion phase of ALiCT
The test-case diversity provided by the expansion phase of ALiCT is also compared against that of one syntax-based approach (MT-NLP) and three adversarial approaches (Alzantot-attack, BERT-Attack, and SememePSO-attack), as follows:
Figure 1: Results of Self-BLEU (left) and Syntactic diversity (right) of ALiCT and capability-based testing baselines for sentiment analysis and hate speech detection.
Using only ALiCT seed sentences is denoted as ALiCT, and using all ALiCT sentences (seeds plus expansions) as ALiCT+EXP.
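Self-BLEU, used in Figures 1 and 2, scores each generated sentence with BLEU against all other sentences in the same set and averages the results, so lower values indicate a more diverse test suite. The sketch below is a simplified, stdlib-only stand-in (clipped n-gram precision up to bigrams with a brevity penalty), not the exact BLEU configuration used in the evaluation:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, references, max_n=2):
    """Simplified BLEU: geometric mean of clipped n-gram precisions, with brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        # Clip each hypothesis n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, cnt in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], cnt)
        clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in hyp_counts.items())
        precisions.append(clipped / max(sum(hyp_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty against the reference closest in length.
    closest_len = min((abs(len(r) - len(hypothesis)), len(r)) for r in references)[1]
    bp = 1.0 if len(hypothesis) >= closest_len else math.exp(1 - closest_len / len(hypothesis))
    return bp * geo_mean

def self_bleu(corpus, max_n=2):
    """Average BLEU of each sentence against all others; lower means more diverse."""
    scores = [bleu(sent, corpus[:i] + corpus[i + 1:], max_n)
              for i, sent in enumerate(corpus)]
    return sum(scores) / len(scores)

corpus = [s.split() for s in [
    "the movie was great",
    "the movie was terrible",
    "i loved every minute of it",
]]
print(self_bleu(corpus))
```

A set of identical sentences scores 1.0, while sentences sharing no n-grams drive the score toward 0, which is why lower Self-BLEU in Figure 1 corresponds to higher diversity.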
Figure 2: Results of Self-BLEU (left) and Syntactic diversity (right) between the original sentences of the capability-based testing baselines and the ALiCT-generated sentences derived from those originals.
Table 4: Comparison results against MT-NLP.
Table 5: Comparison results against adversarial attacks.
Figure 3: Neuron coverage results of ALiCT and CHECKLIST.
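Neuron coverage, reported in Figure 3, measures the fraction of a model's neurons that are activated above a threshold by at least one test input. A minimal sketch of the metric on a toy fully connected network (the network, weights, and threshold here are illustrative assumptions, not the models or settings from the evaluation):

```python
import random

def relu(x):
    return max(0.0, x)

def forward(weights, x):
    """Run a toy fully connected ReLU network; return all hidden-neuron activations."""
    activations = []
    layer_in = x
    for W in weights:  # W is a list of per-neuron weight vectors
        layer_out = [relu(sum(w_i * a for w_i, a in zip(w, layer_in))) for w in W]
        activations.extend(layer_out)
        layer_in = layer_out
    return activations

def neuron_coverage(weights, inputs, threshold=0.0):
    """Fraction of neurons activated above `threshold` by at least one input."""
    n_neurons = sum(len(W) for W in weights)
    covered = set()
    for x in inputs:
        for idx, act in enumerate(forward(weights, x)):
            if act > threshold:
                covered.add(idx)
    return len(covered) / n_neurons

random.seed(0)
# Toy 2-layer network: 3 inputs -> 4 hidden -> 2 hidden.
weights = [
    [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)],
    [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)],
]
tests = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(10)]
print(neuron_coverage(weights, tests))
```

Coverage is monotone in the test set: adding inputs can only activate more neurons, which is why a more diverse test suite tends to reach higher coverage.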
Table 6: Examples for text generation compared with the syntax-based and adversarial generation baselines.
RQ2: Effectiveness
Table 7: Results of BERT-base, RoBERTa-base, and DistilBERT-base sentiment analysis models on ALiCT test cases using all seeds. The BERT-base, RoBERTa-base, and DistilBERT-base models are denoted as BERT, RoBERTa, and dstBERT, respectively.
Table 8: Results of dehate-BERT and twitter-RoBERTa hate speech detection models on ALiCT test cases using all seeds. The dehate-BERT and twitter-RoBERTa models are denoted as BERT and RoBERTa, respectively.