@prefix dct: <http://purl.org/dc/terms/> .
@prefix orcid: <https://orcid.org/> .
@prefix this: .
@prefix sub: .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix pav: <http://purl.org/pav/> .
@prefix np: <http://www.nanopub.org/nschema#> .
@prefix doco: <http://purl.org/spar/doco/> .
@prefix c4o: <http://purl.org/spar/c4o/> .

sub:Head {
  this: np:hasAssertion sub:assertion ;
    np:hasProvenance sub:provenance ;
    np:hasPublicationInfo sub:pubinfo ;
    a np:Nanopublication .
}

sub:assertion {
  sub:paragraph c4o:hasContent "Wikipedia dumps are packaged as XML documents and contain text formatted according to the MediaWiki markup syntax, with templates to be transcluded. Hence, a pre-processing step is required to obtain a raw text representation of the dump. To achieve this, we leverage the WikiExtractor, a third-party tool that retains the text and expands the templates of a Wikipedia XML dump, while discarding other data such as tables, references, and images. We note that the tool is not completely robust with respect to template expansion. This drawback is expected for two reasons: first, new templates are constantly defined, thus requiring regular maintenance of the tool; second, Wikipedia editors do not always comply with the specifications of the templates they include. Therefore, we could not obtain a fully cleaned Wikipedia plain-text corpus and noticed gaps in its content, probably due to template expansion failures. Nevertheless, we argue that the loss of information is not significant and can be neglected, despite the cost in recall. From the entire Italian Wikipedia corpus, we slice the use case subset by querying the Italian DBpedia chapter for the Wikipedia article IDs of relevant entities." ;
    a doco:Paragraph .
}

sub:provenance {
  sub:assertion prov:hadPrimarySource ;
    prov:wasAttributedTo orcid:0000-0002-5456-7964 .
}

sub:pubinfo {
  this: dct:created "2019-11-10T18:05:11+01:00"^^xsd:dateTime ;
    pav:createdBy orcid:0000-0002-7114-6459 .
}
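
# The asserted paragraph describes a two-step pipeline: (1) expand templates and strip
# markup from a Wikipedia XML dump with WikiExtractor, then (2) slice the use case
# subset by asking the Italian DBpedia chapter for the Wikipedia article IDs of the
# relevant entities. Below is a minimal, hedged sketch of step (2) only, kept as
# comments so this file remains valid TriG; the endpoint URL, the dbo:SoccerPlayer
# class (a stand-in for whatever class defines the relevant entities), and the
# SPARQLWrapper-based approach are assumptions, not part of the asserted paragraph.
#
#   from SPARQLWrapper import SPARQLWrapper, JSON
#
#   # Public SPARQL endpoint of the Italian DBpedia chapter (assumed).
#   sparql = SPARQLWrapper("http://it.dbpedia.org/sparql")
#   sparql.setQuery("""
#       PREFIX dbo: <http://dbpedia.org/ontology/>
#       SELECT ?id WHERE {
#           ?article a dbo:SoccerPlayer ;   # hypothetical class for "relevant entities"
#                    dbo:wikiPageID ?id .   # Wikipedia article ID exposed by DBpedia
#       }
#   """)
#   sparql.setReturnFormat(JSON)
#   bindings = sparql.query().convert()["results"]["bindings"]
#
#   # Article IDs used to slice the plain-text corpus produced by WikiExtractor.
#   page_ids = {int(b["id"]["value"]) for b in bindings}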