Download (v1.0, annotated N-triples)
We provide a sample dataset containing malicious triples that could be used to evaluate the resilience of Linked Data applications or to train spam filters.
The dataset is a polluted version of a fraction of the Billion Triple Challenge 2012 Dataset. More specifically, we chose the 1-hop expansion “Timbl crawl”, a crawl seeded with Tim Berners-Lee’s FOAF profile, and applied the spam vectors described in the paper.
The resulting dataset contains approximately 16k triples, of which spam triples account for 4%. The dataset includes samples of Content contamination vectors, Link poisoning vectors and Non-triple-based attacks (Malicious subclassing only):
- rdfs:label properties associated with akt:Organisation entities
- bibo:Quote has been associated with every foaf:Person in the dataset
- dcterms:subject triples (about computer science publications)
- owl:Class and rdf:Property have been associated with malicious rdfs:labels
- foaf:depiction triples pointing to replica watches (one triple for each foaf:Person)
- owl:sameAs, one for each akt:Organisation
- foaf:Persons have been cloned, and the clones are associated with the same foaf:homepage, which is considered an IFP (inverse functional property)
- rdfs:seeAlso data:text/html URI for each akt:Organisation
- foaf:Person and akt:Organisation have been defined as subclasses of "GraviaBuyer" and "GraviaSupplier", respectively.
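
As a rough illustration of how an application might screen for a few of the vectors listed above, the following Python sketch flags three patterns in plain N-Triples lines: rdfs:seeAlso objects that use the data: scheme, rdfs:subClassOf statements whose subject is a well-known vocabulary class, and foaf:homepage values shared by more than one subject (which an IFP-aware reasoner would merge). It is only a sketch under stated assumptions: the sample triples, the example.org IRIs (including the namespace given to GraviaBuyer), the function names and the flag labels are all hypothetical, the regular expression handles IRI-only triples rather than full N-Triples, and nothing here relies on the annotation format of the downloadable dataset.

```python
import re
from collections import defaultdict

# Real vocabulary IRIs for the predicates and class being checked.
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
FOAF_HOMEPAGE = "http://xmlns.com/foaf/0.1/homepage"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# Simplified pattern: subject, predicate and object must all be IRIs.
# Literals and blank nodes are skipped; a full N-Triples parser handles them.
IRI_TRIPLE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$")


def parse_iri_triples(lines):
    """Yield (subject, predicate, object) for lines matching IRI_TRIPLE."""
    for line in lines:
        match = IRI_TRIPLE.match(line)
        if match:
            yield match.groups()


def flag_suspicious(lines, protected_classes=(FOAF_PERSON,)):
    """Return a list of (label, IRI) pairs for triples worth a closer look."""
    flags = []
    homepage_owners = defaultdict(set)
    for s, p, o in parse_iri_triples(lines):
        if p == RDFS_SEE_ALSO and o.lower().startswith("data:"):
            # Inline payload instead of a dereferenceable document.
            flags.append(("data-uri-seealso", s))
        elif p == RDFS_SUBCLASS_OF and s in protected_classes:
            # A third party re-parenting a shared vocabulary class.
            flags.append(("vocabulary-subclassing", s))
        elif p == FOAF_HOMEPAGE:
            homepage_owners[o].add(s)
    # foaf:homepage is inverse functional, so sharing implies identity.
    for homepage, owners in sorted(homepage_owners.items()):
        if len(owners) > 1:
            flags.append(("shared-homepage", homepage))
    return flags


# Hypothetical sample triples, not taken from the dataset.
sample = [
    "<http://example.org/org/1> <http://www.w3.org/2000/01/rdf-schema#seeAlso> <data:text/html,%3Cp%3Ead%3C%2Fp%3E> .",
    "<http://xmlns.com/foaf/0.1/Person> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://example.org/spam#GraviaBuyer> .",
    "<http://example.org/people/alice> <http://xmlns.com/foaf/0.1/homepage> <http://example.org/~alice> .",
    "<http://example.org/people/alice-clone> <http://xmlns.com/foaf/0.1/homepage> <http://example.org/~alice> .",
    "<http://example.org/people/bob> <http://xmlns.com/foaf/0.1/homepage> <http://example.org/~bob> .",
]

flags = flag_suspicious(sample)
for label, iri in flags:
    print(label, iri)
```

A shared homepage is not proof of spam, since legitimate data can describe the same person under several IRIs, so these flags are best treated as candidates for review rather than as a filter.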