http://purl.org/np/RAjyLA7-iEjl-Gbtz8AnROdFEAkzLvMXmH7OHk4N5lZrU#Head
http://purl.org/np/RAjyLA7-iEjl-Gbtz8AnROdFEAkzLvMXmH7OHk4N5lZrU
http://www.nanopub.org/nschema#hasAssertion
http://purl.org/np/RAjyLA7-iEjl-Gbtz8AnROdFEAkzLvMXmH7OHk4N5lZrU#assertion
http://purl.org/np/RAjyLA7-iEjl-Gbtz8AnROdFEAkzLvMXmH7OHk4N5lZrU
http://www.nanopub.org/nschema#hasProvenance
http://purl.org/np/RAjyLA7-iEjl-Gbtz8AnROdFEAkzLvMXmH7OHk4N5lZrU#provenance
http://purl.org/np/RAjyLA7-iEjl-Gbtz8AnROdFEAkzLvMXmH7OHk4N5lZrU
http://www.nanopub.org/nschema#hasPublicationInfo
http://purl.org/np/RAjyLA7-iEjl-Gbtz8AnROdFEAkzLvMXmH7OHk4N5lZrU#pubinfo
http://purl.org/np/RAjyLA7-iEjl-Gbtz8AnROdFEAkzLvMXmH7OHk4N5lZrU
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://www.nanopub.org/nschema#Nanopublication
http://purl.org/np/RAjyLA7-iEjl-Gbtz8AnROdFEAkzLvMXmH7OHk4N5lZrU#assertion
http://purl.org/np/RAjyLA7-iEjl-Gbtz8AnROdFEAkzLvMXmH7OHk4N5lZrU#paragraph
http://purl.org/spar/c4o/hasContent
Wikipedia dumps are packaged as XML documents and contain text formatted according to the MediaWiki markup syntax, with templates to be transcluded. Hence, a pre-processing step is required to obtain a raw-text representation of the dump. To achieve this, we leverage WIKIEXTRACTOR, a third-party tool that retains the text and expands the templates of a Wikipedia XML dump, while discarding other data such as tables, references, and images. We note that the tool is not completely robust with respect to template expansion. This drawback is expected for two reasons: first, new templates are constantly defined, thus requiring regular maintenance of the tool; second, Wikipedia editors do not always comply with the specifications of the templates they include. Therefore, we could not obtain a fully clean Wikipedia plain-text corpus, and we noticed gaps in its content, probably due to template expansion failures. Nevertheless, we argue that the loss of information is not significant and can be neglected despite the recall cost. From the entire Italian Wikipedia corpus, we slice the use case subset by querying the ITALIAN DBPEDIA CHAPTER for the Wikipedia article IDs of relevant entities.
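The two-step pipeline described in the paragraph above (extract plain text with WikiExtractor, then select the use case subset by article ID via DBpedia) can be sketched as follows. This is a minimal, hedged illustration: the WikiExtractor flags, the DBpedia class URI, and the dump filename are assumptions for the example, not the authors' exact configuration.

```python
# Sketch of the corpus-slicing pipeline described in the assertion.
# Step 1 (shown as a command string): run WikiExtractor on the Italian
# Wikipedia XML dump to obtain plain text with templates expanded.
# The flags and dump filename below are illustrative assumptions.
extractor_cmd = (
    "python WikiExtractor.py "
    "--output extracted/ "
    "itwiki-latest-pages-articles.xml.bz2"
)

# Step 2: build a SPARQL query for the Italian DBpedia chapter that
# retrieves the Wikipedia article IDs of entities of a relevant class.
# The class URI passed in is a placeholder; the dbo:wikiPageID property
# is DBpedia's standard link between an entity and its article ID.
def build_sparql(entity_class: str) -> str:
    return f"""
    SELECT ?article_id WHERE {{
      ?entity a <{entity_class}> ;
              <http://dbpedia.org/ontology/wikiPageID> ?article_id .
    }}
    """

# Example: entities of a hypothetical class of interest.
query = build_sparql("http://dbpedia.org/ontology/SoccerPlayer")
```

The returned article IDs can then be matched against the WikiExtractor output to keep only the pages belonging to the use case subset.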
http://purl.org/np/RAjyLA7-iEjl-Gbtz8AnROdFEAkzLvMXmH7OHk4N5lZrU#paragraph
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
http://purl.org/spar/doco/Paragraph
http://purl.org/np/RAjyLA7-iEjl-Gbtz8AnROdFEAkzLvMXmH7OHk4N5lZrU#provenance
http://purl.org/np/RAjyLA7-iEjl-Gbtz8AnROdFEAkzLvMXmH7OHk4N5lZrU#assertion
http://www.w3.org/ns/prov#hadPrimarySource
http://dx.doi.org/10.3233/SW-170269
http://purl.org/np/RAjyLA7-iEjl-Gbtz8AnROdFEAkzLvMXmH7OHk4N5lZrU#assertion
http://www.w3.org/ns/prov#wasAttributedTo
https://orcid.org/0000-0002-5456-7964
http://purl.org/np/RAjyLA7-iEjl-Gbtz8AnROdFEAkzLvMXmH7OHk4N5lZrU#pubinfo
http://purl.org/np/RAjyLA7-iEjl-Gbtz8AnROdFEAkzLvMXmH7OHk4N5lZrU
http://purl.org/dc/terms/created
2019-11-10T18:05:11+01:00
http://purl.org/np/RAjyLA7-iEjl-Gbtz8AnROdFEAkzLvMXmH7OHk4N5lZrU
http://purl.org/pav/createdBy
https://orcid.org/0000-0002-7114-6459