We provide a sample dataset containing malicious triples that could be used to evaluate the resilience of Linked Data applications or to train spam filters.
The dataset is a polluted version of a fraction of the Billion Triple Challenge 2012 Dataset. More specifically, we chose the 1-hop expansion “Timbl crawl”, a crawl seeded with Tim Berners-Lee’s FOAF profile, and applied the spam vectors described in the paper. The resulting dataset contains approximately 16k triples, of which spam triples account for 4%. The dataset includes samples of Content contamination vectors, Link poisoning vectors and Non-triple-based attacks (Malicious subclassing only).
rdfs:label properties associated to
bibo:Quote has been associated to all
foaf:Person of the dataset
dcterms:subject triples (about computer science publications)
rdf:Property have been associated to malicious
foaf:depiction triples pointing to replica watches (one triple for each
owl:sameAs, one for each
foaf:Persons cloned and clones are associated with the same
foaf:homepage, considered an IFP.
rdfs:seeAlso data:text/html URI for each
akt:Organisation have been defined as subclasses of "GraviaBuyer" and "GraviaSupplier", respectively.
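
As a rough illustration of how an application might screen for two of the vectors listed above (clones that share a foaf:homepage value, and rdfs:seeAlso links that point to data: URIs), here is a minimal Python sketch. It assumes the triples have already been parsed into (subject, predicate, object) string tuples. The function names and the example URIs are hypothetical and are not taken from the dataset or the paper.

```python
from collections import defaultdict

FOAF_HOMEPAGE = "http://xmlns.com/foaf/0.1/homepage"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def shared_homepages(triples):
    """Map each foaf:homepage value to its subjects, keeping only values
    shared by more than one subject (candidates for IFP-based merging)."""
    subjects_by_homepage = defaultdict(set)
    for s, p, o in triples:
        if p == FOAF_HOMEPAGE:
            subjects_by_homepage[o].add(s)
    return {h: subs for h, subs in subjects_by_homepage.items() if len(subs) > 1}


def data_uri_links(triples):
    """Return the rdfs:seeAlso triples whose object is a data: URI."""
    return [
        (s, p, o)
        for s, p, o in triples
        if p == RDFS_SEE_ALSO and o.lower().startswith("data:")
    ]


# Hypothetical example triples, invented for illustration only.
sample = [
    ("http://example.org/alice", FOAF_HOMEPAGE, "http://example.org/home"),
    ("http://spam.example/clone", FOAF_HOMEPAGE, "http://example.org/home"),
    ("http://example.org/bob", FOAF_HOMEPAGE, "http://example.org/bob-home"),
    ("http://example.org/bob", RDFS_SEE_ALSO, "data:text/html,<p>ad</p>"),
]

suspicious_identities = shared_homepages(sample)
suspicious_links = data_uri_links(sample)
```

A shared homepage is only a hint, since legitimate resources can share one too. The sketch marks candidates for review rather than classifying triples as spam.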