Abstract. ROSSIO Infrastructure is building an open-access and free platform that aims to aggregate, organise and connect digital resources related to Social Sciences, Arts and Humanities held by Portuguese educational and cultural institutions. This paper presents the ROSSIO infrastructure, the institutions involved, its main goals and the services it will provide, such as a discovery portal, exhibitions, collections and a virtual research environment. Underlying these services is a metadata aggregation approach that brings into ROSSIO the metadata on digital objects from the providing institutions. The aggregated dataset is transformed into linked data and enriched with entities from controlled vocabularies defined by ROSSIO. We detail this process, including the applications employed and how they interoperate. Finally, we reflect on the potential of these services for the public dissemination of science, taking into account the FAIR principles.
Corresponding author: email@example.com
In 2016, the PARTHENOS Project1 defined Research Infrastructures as “complex agglomerations of knowledge, data, people, and services that bring together diverse resources for a wide user base and make these resources (re)usable and available for an appropriately long term in order to support research (either individual or collaborative) and share the results of that research2”. Although this quote synthesizes some of the general features usually found in research infrastructures, the scope and aims of the concept are still evolving, with some organizations valuing certain aspects more than others. For instance, the European Strategy Forum on Research Infrastructures (ESFRI)3 appears to place more emphasis on data sharing and data preservation throughout a research infrastructure's lifecycle, as shown by the European Roadmap for Research Infrastructures (2006, 2018) [1-2]. On the other hand, the American Council of Learned Societies Commission on Cyberinfrastructure for the Humanities and Social Sciences highlights the contribution of infrastructures to collaborative research and to building networks and communities, as its report “Our Cultural Commonwealth” (2006) shows [3]. Nevertheless, the reports from these institutions point out that infrastructures, as well as the contents and services they provide, can vary greatly depending on, among other factors, the scientific area they serve. According to these institutions, three types of infrastructures can be distinguished: I) “single-sited”, whose facilities are concentrated in a single geographical location; II) “distributed”, located in different geographical poles; and III) “digital”, with a strong and large technological component.
Since 2006, the European Strategy Forum on Research Infrastructures and its ESFRI Roadmap (e.g. 2016, 2018) have been promoting a strategy to develop and consolidate national and transnational research infrastructures in European Union countries and to foster collaboration between these institutions, which has in turn influenced national science policies. Aiming to integrate Portuguese institutions within this context, the Foundation for Science and Technology (FCT) created the National Roadmap for Research Infrastructures of Strategic Interest (RNIE 2014-2020) in 2013, with the objective of mapping and evaluating Portuguese research infrastructures. Initially, it consisted of 40 infrastructures, including ROSSIO Infrastructure: Social Sciences, Arts and Humanities4. In 2020, the number expanded to 56 infrastructures, of which only seven belong to Social Sciences, Arts and Humanities (SSAH) [4]. These infrastructures collaborate with European counterparts, helping to create and develop international networks. For example, Social Sciences DataLab5 is the Portuguese node of SHARE ERIC (Survey of Health, Ageing and Retirement in Europe)6, and ROSSIO has the same role in the Digital Research Infrastructure for the Arts and Humanities (DARIAH7).
ROSSIO Infrastructure is a research infrastructure coordinated by the NOVA School of Social Sciences and Humanities (NOVA University Lisbon). It integrates six Portuguese cultural institutions: Arquivo Municipal de Lisboa (Lisbon Municipal Archive), Cinemateca Portuguesa (Portuguese Film Archive), Biblioteca de Arte da Fundação Calouste Gulbenkian (Calouste Gulbenkian Art Library), Teatro Nacional D. Maria II (National Theatre D. Maria II), Direção Geral do Património Cultural (Directorate-General for Cultural Heritage) and Direção Geral do Livro, dos Arquivos e das Bibliotecas (Directorate-General for Books, Archives and Libraries). ROSSIO also includes other content providers, such as ARQUIVO.pt (the Portuguese web archive) and the Diplomatic Institute of the Portuguese Ministry of Foreign Affairs. It is inspired by the best practices of other international research infrastructures related to SSAH, such as the Digital Public Library of America (DPLA), Historiana and Trove.
ROSSIO Infrastructure has five major objectives: I) to aggregate, organise, connect, contextualize and provide free and open access to digital resources related to SSAH held by the aforementioned Portuguese educational and cultural institutions (providing, in some cases, the funds needed for the archival treatment and digitization of sources8); II) to promote high-quality research in SSAH, stimulating new agendas and debates; III) to generate synergies and connect individuals and institutions in order to promote scientific innovation and the dissemination of cultural heritage; IV) to contribute to the internationalization of SSAH studies, giving researchers from all over the world more transparent access to content in Portuguese, following the best international practices of other research infrastructures and the FAIR data principles (findability, accessibility, interoperability and reusability); V) to build a sustainable network between academic and non-academic communities to better respond to societal challenges.
This paper presents ROSSIO and reflects on how its services will contribute to changing, promoting and developing quality research, collaborative work and the dissemination of knowledge. It is divided into two parts. The first presents the metadata aggregation approach, the applications that ROSSIO will employ, their potential users, and how the applications work together to support these users' work. The second focuses on the services provided by the platform, namely a discovery portal, exhibitions, collections and a virtual research environment (VRE).
Digital resources of interest for research in SSAH are dispersed over a large number of academic and cultural heritage institutions, which brings challenges to the discoverability and usage of such resources. An often-used approach, and the one applied by ROSSIO, is metadata aggregation, where a central organisation takes the role of facilitating the discovery and use of the resources by collecting their associated metadata. Based on these aggregated datasets of metadata, ROSSIO is in a position to further promote the usage of the digital resources by means that cannot be efficiently undertaken by each providing institution in isolation.
The technological approach to metadata aggregation applied by ROSSIO is based on the OAI-PMH protocol. This protocol was designed in 1999 [5] to address shortcomings in scholarly communication by providing a technical interoperability solution for the discovery of e-prints through metadata aggregation. The cultural heritage domain also embraced OAI-PMH, since discovery of cultural heritage digital resources was only feasible if based on metadata rather than full text [6]. OAI-PMH is nowadays widely deployed in academic and cultural heritage institutions to support cooperative networks such as Europeana and the Digital Public Library of America.
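To make the harvesting step concrete, the following Python sketch shows how a harvester might page through an OAI-PMH ListRecords response, following resumption tokens until the repository signals completion with an empty token. It is an illustrative sketch, not the code of ROSSIO's Metadata Harvesting and Ingestion Application: it assumes the oai_dc metadata format, skips records marked as deleted, and the function names (`harvest`, `parse_page`, `http_fetch`) and the endpoint URL are hypothetical. Network access is isolated in an injectable `fetch` function so that the paging logic can be exercised offline.

```python
"""Illustrative OAI-PMH ListRecords harvester (a sketch, not ROSSIO's code)."""
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def http_fetch(url: str) -> str:
    """Default fetcher: GET the URL and decode the response as UTF-8."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_page(xml_text: str) -> tuple[list[dict], Optional[str]]:
    """Return the non-deleted records of one ListRecords page and the
    resumptionToken for the next page (None when the list is complete)."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", NS)
    if error is not None:
        if error.get("code") == "noRecordsMatch":
            return [], None  # an empty result set is not a failure
        raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        header = rec.find("oai:header", NS)
        if header is None or header.get("status") == "deleted":
            continue  # deleted records carry no metadata
        fields: dict[str, list[str]] = {}
        # metadata -> oai_dc:dc -> dc:title, dc:date, ...
        for element in rec.iterfind("oai:metadata/*/*", NS):
            name = element.tag.split("}")[-1]  # "{namespace}title" -> "title"
            fields.setdefault(name, []).append((element.text or "").strip())
        records.append({
            "id": header.findtext("oai:identifier", namespaces=NS),
            "fields": fields,
        })
    # An empty <resumptionToken/> (or no token at all) marks the last page.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS).strip()
    return records, token or None


def harvest(base_url: str, metadata_prefix: str = "oai_dc",
            fetch: Callable[[str], str] = http_fetch) -> Iterator[dict]:
    """Yield every non-deleted record, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        records, token = parse_page(fetch(base_url + "?" + urllib.parse.urlencode(params)))
        yield from records
        if token is None:
            return
        # resumptionToken is an exclusive argument: only the verb accompanies it.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

In use, `for record in harvest("https://example.org/oai"): ...` would iterate over every record exposed by a (hypothetical) provider endpoint, with each record's Dublin Core values grouped by element name.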
The metadata aggregated by ROSSIO is processed centrally by several systems that provide access and search functionalities over it, which are then used by the VRE and by the digital exhibitions and collections applications. ROSSIO's systems also publish, according to the FAIR principles, these aggregated datasets and other datasets created by researchers while using the infrastructure. Figure 1 presents the applications that form the ROSSIO Infrastructure, how they are related, which users they interact with, and which of them interoperate with external systems.
The application architecture considers three general types of users (actors):
Data Manager and Curator – This type of user is responsible for operating the metadata ingestion process and for publishing datasets, using, respectively, the Metadata Harvesting and Ingestion Application and the Public Repository (Dataverse).
Vocabulary Manager – This type of user represents terminologists and information specialists that create and maintain the SKOS vocabularies used in ROSSIO Infrastructure.
End-user – This type of user corresponds to researchers, teachers, students and the general public, who use the VRE and the digital exhibitions and collections applications (these services are discussed in Section 3).
The architecture comprises the following applications:
Metadata Harvesting and Ingestion Application – This application allows the harvesting of the datasets from data providing institutions via OAI-PMH. This application also implements the ingestion process for the datasets, storing the datasets in ROSSIO’s internal repository, creating their search indexes and publishing them in ROSSIO’s public repository for datasets.
Internal Repository – This repository stores the collections from the data providers in a way that allows fast access to individual records. It is also responsible for assigning identifiers to the metadata records, in the form of linked data URIs.
Data Normalization and Enrichment Application – This application is used during the ingestion process to enrich the metadata aggregated from the data providers. It matches particular field values from the metadata with the entities and concepts that are part of a ROSSIO vocabulary. These links support the semantic searching in ROSSIO via its vocabularies. This application also applies data normalization for values of date and language properties.
Public Repository (Dataverse) – The Dataverse software9 is used for publishing datasets to the public. ROSSIO will publish in this repository both the datasets aggregated from the data providers and the datasets created by researchers while using ROSSIO applications such as the VRE. Dataverse assigns identifiers to the datasets, which form the basis of the datasets' linked data URIs.
Linked Data Resolution Application – This application provides access to metadata according to the specifications and best practices for linked data. It is responsible for processing access requests to URIs in the namespaces defined by ROSSIO. To obtain the data needed to respond to URI requests, this application uses the Internal Repository (for metadata about individual cultural and scientific items), the Public Repository (for metadata about datasets) and the Thesaurus RDF Triple Store (for entities and concepts of the ROSSIO Thesaurus).
Search Engine (Apache Solr) – This application indexes the aggregated metadata and provides search functionality across the metadata of all cultural and scientific items. The search index is maintained by the Data Manager and Curator via the Metadata Harvesting and Ingestion Application. The Search Engine has a search schema designed to support the search requirements of the VRE and of the digital exhibitions and collections applications. Since the Search Engine is an installation of Apache Solr10, the other applications send their search requests via the Solr API.
Vocabulary Editor (VocBench) – This application supports the creation and maintenance of the controlled vocabularies used within the ROSSIO infrastructure. It is an installation of the VocBench11 application and is used by the Vocabulary Manager.
Vocabularies Publication Application (Skosmos) – This application publishes the vocabularies created in the ROSSIO Infrastructure, both for consultation by human users and through a SPARQL endpoint for applications. The SPARQL endpoint is publicly available, so the vocabularies can be used by internal and external applications alike. This application is an installation of the Skosmos software12 and an Apache Fuseki triplestore13. The publication process is carried out by the Vocabulary Manager, who exports the vocabularies from the Vocabulary Editor and imports them into the Vocabularies Publication Application.
Fig. 1. The application architecture of ROSSIO Infrastructure.
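As an illustration of how an application such as the discovery portal could send a request to the Search Engine through the Solr API, the sketch below builds a query for Solr's standard `/select` request handler, combining a free-text query with filter queries (`fq`) and facet fields. The host, the core name (`rossio`) and all field names (`title`, `description`, `subject_label`, `provider`, `type`, `subject_uri`) are assumptions made for this example; ROSSIO's actual search schema is not described here.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical Solr host and core name; the field names below are also assumptions.
SOLR_SELECT = "http://localhost:8983/solr/rossio/select"


def solr_phrase(value: str) -> str:
    """Quote a value so URIs and multi-word labels are matched as one phrase."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_search_url(text, filters=None, facets=(), rows=20, start=0):
    """Build a /select request: edismax over text fields, plus filters and facets."""
    params = [
        ("q", text),
        ("defType", "edismax"),                       # user-friendly query parser
        ("qf", "title^2 description subject_label"),  # boost matches in titles
        ("rows", str(rows)),
        ("start", str(start)),
        ("wt", "json"),
    ]
    for field, value in (filters or {}).items():
        # Filter queries restrict the result set without affecting relevance scores.
        params.append(("fq", f"{field}:{solr_phrase(value)}"))
    if facets:
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facets)
    return SOLR_SELECT + "?" + urllib.parse.urlencode(params)


def search(text, **options):
    """Send the request and return (total hits, first page of documents)."""
    with urllib.request.urlopen(build_search_url(text, **options)) as response:
        body = json.load(response)
    return body["response"]["numFound"], body["response"]["docs"]
```

Keeping the URL construction separate from the HTTP call makes it easy to inspect or log the exact request that an application sends to Solr.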
During the initial operation of the ROSSIO infrastructure, the metadata harvested from data providers will follow a simple data model based on the 15 elements of the Dublin Core Metadata Element Set. Nevertheless, ROSSIO's applications are being implemented to support a richer data model, which consists of a profile of the Europeana Data Model (EDM). This EDM application profile, named EDM-DRD, was defined in 2017 by a working group of representatives from Portuguese academic and cultural heritage institutions. This data model allows ROSSIO to represent the administrative metadata required for its operation, as well as the enriched metadata created during the ingestion process. In the future, we expect EDM-DRD to be implemented by data providers, allowing ROSSIO to operate with high-quality metadata that will benefit its services for researchers.
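The difference between the two data models can be sketched in code. EDM separates the described object (`edm:ProvidedCHO`) from an `ore:Aggregation` that carries administrative metadata such as the data provider and the landing page (`edm:isShownAt`). The Python sketch below lifts a flat Dublin Core record into that structure, emitting triples with prefixed names. It is a hypothetical simplification: the base URI and the rule that treats the first HTTP identifier as the landing page are our assumptions, and it does not reproduce the actual EDM-DRD profile.

```python
from urllib.parse import quote

# The 15 elements of the Dublin Core Metadata Element Set.
DCMES = {"contributor", "coverage", "creator", "date", "description", "format",
         "identifier", "language", "publisher", "relation", "rights", "source",
         "subject", "title", "type"}

BASE = "https://example.org/rossio/item/"  # hypothetical URI namespace


def dc_to_edm(local_id, dc, data_provider_uri):
    """Map {dc element: [values]} to EDM-style (subject, predicate, object) triples."""
    cho = BASE + quote(local_id, safe="")
    agg = cho + "#aggregation"
    triples = [
        (cho, "rdf:type", "edm:ProvidedCHO"),
        (agg, "rdf:type", "ore:Aggregation"),
        (agg, "edm:aggregatedCHO", cho),
        (agg, "edm:dataProvider", data_provider_uri),
    ]
    shown_at = None
    for element, values in dc.items():
        if element not in DCMES:
            raise ValueError(f"not a Dublin Core element: {element}")
        for value in values:
            # Assumption: the first HTTP(S) identifier is the provider's landing page.
            if (element == "identifier" and shown_at is None
                    and value.startswith(("http://", "https://"))):
                shown_at = value
                triples.append((agg, "edm:isShownAt", value))
            else:
                # Descriptive metadata stays on the cultural heritage object itself.
                triples.append((cho, f"dc:{element}", value))
    return triples
```

Keeping descriptive and administrative statements on separate resources is what lets enriched or provider-specific metadata be added later without altering the description of the object itself.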
As mentioned above, metadata normalization and enrichment are supported by controlled vocabularies that are published as linked open data by the ROSSIO Infrastructure. At this time, the following vocabularies are being developed:
ROSSIO Thesaurus. This vocabulary consists of terms, i.e. designations of topics or general concepts. The development process of this vocabulary is described in [7].
ROSSIO Agents. This vocabulary includes names of persons and organizations for information organization within the platform. For example, it identifies every data provider of ROSSIO, whose URIs are then used in metadata descriptions at the dataset level.
ROSSIO Places. This vocabulary consists of toponyms, including names of geopolitical entities, areas and geographical features that may be of relevance for information organization in the platform.
ROSSIO Periods. This vocabulary includes names for geological, historical, cultural or artistic periods for information organization in the platform.
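A simple form of vocabulary-based enrichment is exact label matching: field values are compared with the labels of vocabulary concepts after case-folding and removing diacritics, and a link is created only when a value identifies a single concept. The Python sketch below illustrates this under our own assumptions (the concept URIs are hypothetical, and ambiguous values are simply left unlinked for manual review); the matching rules actually used by ROSSIO's Data Normalization and Enrichment Application may differ.

```python
import unicodedata


def normalize(label: str) -> str:
    """Case-fold, strip diacritics and collapse whitespace ("Ásia " -> "asia")."""
    decomposed = unicodedata.normalize("NFKD", label.casefold())
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.split())


def build_label_index(concepts: dict[str, list[str]]) -> dict[str, set[str]]:
    """Index concept URIs by every normalized label (Portuguese, English, alternative)."""
    index: dict[str, set[str]] = {}
    for uri, labels in concepts.items():
        for label in labels:
            index.setdefault(normalize(label), set()).add(uri)
    return index


def enrich(values: list[str], index: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Link each value to a concept URI when it matches exactly one concept."""
    links = []
    for value in values:
        uris = index.get(normalize(value), set())
        if len(uris) == 1:  # unmatched or ambiguous values stay unlinked
            links.append((value, next(iter(uris))))
    return links
```

Indexing both Portuguese and English labels lets the same concept be found regardless of the language in which a provider described an item.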
The ROSSIO vocabularies are being modelled in SKOS [8], a W3C recommendation for thesauri and other knowledge organization systems in the Semantic Web. In addition to SKOS, the vocabularies reuse elements from other widely used ontologies:
BIBFRAME.14 The vocabularies reuse BIBFRAME classes for representing the entity types that are present in each vocabulary, namely Topic, Person, Organization, Place and Temporal. This facilitates both the internal organization of the ROSSIO vocabularies and their reusability as linked data.
Getty Vocabulary Program ontology.15 The vocabularies employ properties of this ontology to further specify the types of agents and places through concepts in the ROSSIO Thesaurus. For example, Portugal in ROSSIO Places can thus be linked to the Countries concept in ROSSIO Thesaurus.
ISO 25964 SKOS extension.16 This model consists of mappings of elements from the ISO 25964 data model for thesauri [9] to SKOS. Its ThesaurusArray class is employed in the vocabularies to represent collections of concepts at the same hierarchical level. This allows representing so-called ‘guide terms’ (e.g. <People by occupation>), which are a common modelling choice in thesauri and terminologies.
Schema.org.17 Properties of this ontology are reused for modelling the birth and death dates of people in ROSSIO Agents.
WGS84 Geo Positioning ontology.18 ROSSIO Places employs properties of this ontology for modelling latitude and longitude coordinates, which may be used by geolocation applications.
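To illustrate how these elements combine, the sketch below serializes a single concept of ROSSIO Places as N-Triples, typing it both as `skos:Concept` and as a BIBFRAME `Place`, with Portuguese and English preferred labels and WGS84 coordinates. It uses only the Python standard library. The ROSSIO URIs are hypothetical placeholders, the coordinates of Lisbon are approximate, and typing the coordinates as `xsd:decimal` is our assumption rather than a documented ROSSIO choice.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
BF = "http://id.loc.gov/ontologies/bibframe/"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def iri(uri: str) -> str:
    return f"<{uri}>"


def literal(text: str, lang: str = "", datatype: str = "") -> str:
    """N-Triples literal with escaping, plus an optional language tag or datatype."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    if lang:
        return f'"{escaped}"@{lang}'
    if datatype:
        return f'"{escaped}"^^{iri(datatype)}'
    return f'"{escaped}"'


def place_concept(uri, scheme, labels, lat, lon):
    """Serialize one place concept as N-Triples (one triple per line)."""
    s = iri(uri)
    triples = [
        (s, iri(RDF_TYPE), iri(SKOS + "Concept")),
        (s, iri(RDF_TYPE), iri(BF + "Place")),  # BIBFRAME entity type
        (s, iri(SKOS + "inScheme"), iri(scheme)),
    ]
    for lang, text in labels.items():  # e.g. {"pt": ..., "en": ...}
        triples.append((s, iri(SKOS + "prefLabel"), literal(text, lang=lang)))
    # Assumption: coordinates typed as xsd:decimal with six decimal places.
    triples.append((s, iri(GEO + "lat"), literal(f"{lat:.6f}", datatype=XSD_DECIMAL)))
    triples.append((s, iri(GEO + "long"), literal(f"{lon:.6f}", datatype=XSD_DECIMAL)))
    return "".join(f"{subj} {pred} {obj} .\n" for subj, pred, obj in triples)
```

For instance, `place_concept("https://example.org/rossio/places/lisboa", "https://example.org/rossio/places", {"pt": "Lisboa", "en": "Lisbon"}, 38.7223, -9.1393)` yields seven triples that a triplestore such as Fuseki could load.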
The development of the ROSSIO vocabularies leverages existing structured and unstructured vocabulary resources, including lists of index terms provided by members of the ROSSIO consortium, and reuses sections of established SSAH thesauri such as the Getty’s Art and Architecture Thesaurus19. As a minimum requirement, the concepts included in the ROSSIO vocabularies are identified by Portuguese and English labels, whose form generally follows the conventions of thesauri for information retrieval [9].
As linked data resources, the ROSSIO vocabularies must include links to external resources identified by URIs. This is achieved by declaring mapping properties between concepts in the ROSSIO vocabularies and external knowledge organization systems. Concepts in ROSSIO Thesaurus and ROSSIO Periods are being mapped to Getty’s Art and Architecture Thesaurus, either manually or semi-automatically through alignment tools for linked data resources. The ROSSIO Thesaurus is also aligned with the Backbone Thesaurus,20 a meta-thesaurus for the humanities published by DARIAH-EU. Finally, ROSSIO Agents is aligned with VIAF (Virtual International Authority File),21 while ROSSIO Places is aligned both with GeoNames22 and Getty Thesaurus of Geographic Names23.
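Semi-automatic alignment typically means that a tool proposes candidate mappings which a terminologist then accepts or rejects. The Python sketch below generates such candidates by comparing labels with `difflib.SequenceMatcher`, suggesting `skos:exactMatch` for identical labels and `skos:closeMatch` above a similarity threshold. It is a deliberately naive stand-in for the dedicated alignment tools mentioned above, not one of them; the 0.85 threshold and the URIs are arbitrary assumptions, and identical labels do not guarantee that two concepts are equivalent, so every candidate still needs human review.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two labels (0.0 to 1.0)."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def alignment_candidates(local, external, threshold=0.85):
    """Propose, for each local concept, its most similar external concept.

    local, external: {uri: label}. Returns (local_uri, external_uri, score,
    suggested_property) tuples, best scores first, for a terminologist to review.
    """
    candidates = []
    for local_uri, local_label in local.items():
        scored = [(similarity(local_label, ext_label), ext_uri)
                  for ext_uri, ext_label in external.items()]
        if not scored:
            continue
        score, ext_uri = max(scored)
        if score < threshold:
            continue  # nothing similar enough: leave for manual mapping
        prop = "skos:exactMatch" if score == 1.0 else "skos:closeMatch"
        candidates.append((local_uri, ext_uri, round(score, 3), prop))
    return sorted(candidates, key=lambda c: c[2], reverse=True)
```

A production workflow would also compare alternative labels, languages and hierarchical context before proposing a mapping.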
The metadata aggregation process and the controlled vocabularies are the pillars on which ROSSIO Infrastructure will build its platform. The platform will offer several information and communication technology (ICT) tools, commonly defined as devices, applications and systems that allow different agents, such as individuals and organizations, to interact digitally: a discovery portal, a VRE, and digital exhibitions and collections.
The discovery portal will allow users to search the digital resources (e.g. documents, videos, audio, photos, among others) held by the different heritage institutions, providing simple and advanced search options. In the latter case, searches will be more precise and based on the controlled vocabularies, with filters that bring users more directly to the desired results. As with other similar initiatives, such as DPLA, this will be particularly important for the research community [10]. On the one hand, researchers are used to conducting more advanced searches and need tools that help them refine the results obtained. On the other hand, this search model will allow them to reduce the time spent searching and increase their research capacity. Furthermore, it will help them open new lines of reflection and interpretation on the patterns, trends and links between the aggregated resources. In the case of ROSSIO, the discovery portal will give access, within the same platform, to digital sources dispersed across different heritage institutions and to scientific outputs produced at NOVA University. The discovery portal is the core of the ROSSIO platform, since all other products and services depend on its proper implementation. Therefore, the development of simple and advanced search, based on ontologies and controlled vocabularies, is vital for interoperability between systems and platforms and for the dissemination of archival collections to SSAH experts and the general public.
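One way controlled vocabularies can make advanced search more precise is query expansion: a search for a thesaurus concept also retrieves items indexed with its narrower concepts. The Python sketch below collects a concept and all of its transitively narrower concepts (following `skos:narrower` links, with a guard against cycles) and turns them into a Solr filter query. The field name `subject_uri` and the concept identifiers are hypothetical, and we do not claim that the discovery portal implements expansion in exactly this way.

```python
from collections import deque


def expand_concept(uri: str, narrower: dict[str, list[str]]) -> list[str]:
    """Return the concept plus all transitively narrower concepts, breadth-first.

    `narrower` maps a concept URI to the URIs of its skos:narrower concepts.
    The `seen` set guards against cycles in badly formed hierarchies.
    """
    seen = {uri}
    order = [uri]
    queue = deque([uri])
    while queue:
        current = queue.popleft()
        for child in narrower.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def concept_filter(field: str, uris: list[str]) -> str:
    """Build a Solr/Lucene filter query matching any of the given concept URIs."""
    return f"{field}:(" + " OR ".join(f'"{u}"' for u in uris) + ")"
```

The result of `concept_filter("subject_uri", expand_concept(uri, narrower))` could then be passed to Solr as an `fq` parameter, so that a filter on a broad topic also returns items described with more specific terms.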
The platform will also provide exhibitions and digital collections. According to authors such as Martin R. Kalfatovic [11], Chee Khoon Leong [12-13], Maria Teresa Natale [14] and Angeliki Antoniou [15], digital exhibitions are activities that use hypermedia (information presented in the form of text, graphics, audio and video) to develop a given subject, drawing on a diverse set of digital objects arranged according to a predetermined narrative and potentially accessible to a wider and geographically dispersed audience. The inclusion of a contextualizing narrative distinguishes digital exhibitions from other similar initiatives, such as image galleries. Digital collections (not to be confused with digitized sources) are similar to digital exhibitions, with small differences. Considering the work developed by other infrastructures (e.g. DPLA, Europeana, Trove, CulturaItalia) and cultural institutions (e.g. British Library, Gallica), the digital collections made available by ROSSIO will be small exhibitions targeted at specific audiences, such as students, teachers or professionals in tourism or the cultural industries. These digital exhibitions and collections will draw on the documentation aggregated and connected within the platform and help promote its intrinsic and extrinsic value.
The VRE is designed as a web-based working environment that enhances research and facilitates the sharing of ROSSIO digital resources. Although this feature is open to anyone who accesses ROSSIO, it is being developed with specific communities in mind, such as researchers, teachers, and school and university students. Following principles of technical interoperability (the use of open-source software and the adoption of standards such as the OAI-PMH protocol), sustainability, security and ease of use, the VRE is an indispensable tool for user-friendly research infrastructures. Its collaborative character is particularly relevant, enabling dialogue and cooperation between different interlocutors in the scientific community [16, 17, 18].
In 2020, the pandemic reinforced the importance of e-infrastructures and platforms, as well as the urgency of making content available in free and open access, following internationally defined methodologies (FAIR principles). It revealed the fundamental and indispensable role of these infrastructures in overcoming the constraints caused by lockdowns and the growing infodemic, ensuring the community's access to scientific knowledge [19-20]. As these realities are likely to strengthen in the coming years, the ROSSIO platform will take on an important role in the aggregation, dissemination, curation and study of resources related to SSAH held by different cultural, educational and diplomatic Portuguese institutions, as well as in their integration into international networks.
The discovery portal will make it possible to search the resources of several cultural, educational and diplomatic institutions at the same time and to refine the searches carried out, thus increasing the number of views of the available resources. It will highlight both valuable documents of Portuguese history (e.g. the Medieval Royal Chancelleries; the first Portuguese videos) and collections and sources related to global history (e.g. UNESCO Memory of the World Programme). The exhibitions and digital collections will present these digital resources to a wider audience, contributing to social inclusion in scientific knowledge and enhancing the digital literacy of society. They will encourage users to search the discovery portal for digital resources to learn more about the objects displayed, and to become acquainted with related academic research projects in development. These exhibitions and collections can also promote collaborative practices that strengthen community participation in the production of scientific knowledge.
The VRE is an additional tool: a personal web-based workspace that will provide all the necessary information on the digital objects made available by the platform, as well as a means of fostering a community of practice and enabling collaborative initiatives. Although VREs have traditionally been associated with scientific work, ROSSIO's VRE is intended to benefit other target communities, such as education (students and teachers) and the cultural and tourism industries (tourist guides and museums), following the example of other international infrastructures (e.g. Historiana). For teachers and students, it may help bring them closer to scientific and cultural institutions, promoting dynamic and interactive learning spaces where hands-on initiatives are encouraged [21-22].
Within ROSSIO, the development of controlled vocabularies is expected to enable the normalization and semantic enrichment of the metadata aggregated and produced in the platform. The publication of the ROSSIO vocabularies as linked open data complies with the FAIR principles and is a relevant contribution to the Portuguese section of the linguistic linked open data cloud, which remains underrepresented in terms of number of resources. For example, of the more than 100,000 resources listed in LingHub, a directory of linked data language resources, only 96 are in Portuguese24.
Furthermore, the deployment of applications for managing and publishing SKOS vocabularies is expected to facilitate the collaborative development of domain-specific or institution-specific controlled vocabularies by members of the ROSSIO consortium, with ROSSIO functioning as a hub for information organization activities in SSAH and in the Portuguese language.
The platform and its services will hopefully contribute to developing and promoting the best international practices in Portuguese cultural and academic institutions, focusing on state-of-the-art procedures for safeguarding documentation and for its subsequent connection, enrichment and dissemination to the general public through digital platforms and infrastructures. The experience acquired in building the platform will also encourage and facilitate the entry of new content providers in the future, from central to local Portuguese heritage institutions. A good example is the recent incorporation of the Diplomatic Institute of the Portuguese Ministry of Foreign Affairs as a content provider (2021). ROSSIO could also be an asset for local and generally small cultural institutions (e.g. municipal historical archives), since many of them lack the technical skills and funding needed to aggregate and semantically enrich their resources on digital platforms.
Thus, keeping in mind Tim Sherratt's words regarding platforms [10], the ROSSIO platform intends to be a relevant digital tool for unlocking, sharing and exploring Portuguese cultural heritage.
[1] European Strategy Forum on Research Infrastructures (2016). European Roadmap for Research Infrastructures Report 2016. Luxembourg: Office for Official Publications of the European Communities. https://www.esfri.eu/sites/default/files/esfri_roadmap_2006_en.pdf
[2] European Strategy Forum on Research Infrastructures (2018). European Roadmap for Research Infrastructures Report 2018. Luxembourg: Office for Official Publications of the European Communities. http://roadmap2018.esfri.eu/
[3] ACLS Commission on Cyberinfrastructure (2006). Our Cultural Commonwealth: The Report of the American Council of Learned Societies Commission on Cyberinfrastructure for the Humanities and Social Sciences.
[4] Foundation for Science and Technology (2020). Portuguese Roadmap of Research Infrastructures – 2020 Update. Lisbon: FCT. https://www.fct.pt/apoios/equipamento/roteiro/index.phtml.en
[5] Lagoze, C., Van de Sompel, H., Nelson, M., & Warner, S. (2002). The Open Archives Initiative Protocol for Metadata Harvesting, Version 2.0. http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm
[6] Van de Sompel, H., & Nelson, M. (2015). “Reminiscing About 15 Years of Interoperability Efforts”. D-Lib Magazine, 21(11/12). doi:10.1045/november2015-vandesompel. http://www.dlib.org/dlib/november15/vandesompel/11vandesompel.html
[7] Almeida, B., Freire, N., & Monteiro, D. (2021). “The Development of the ROSSIO Thesaurus: Supporting Content Discovery and Management in a Research Infrastructure”. In D. Dosso, S. Ferilli, P. Manghi, A. Poggi, G. Serra, & G. Silvello (Eds.), Proceedings of the 17th Italian Research Conference on Digital Libraries (pp. 138–146). Aachen: CEUR-WS. http://ceur-ws.org/Vol-2816/
[8] Miles, A., & Bechhofer, S. (2009). SKOS Simple Knowledge Organization System Reference. http://www.w3.org/TR/skos-reference
[9] ISO 25964-1 (2011). Information and documentation – Thesauri and interoperability with other vocabularies – Part 1: Thesauri for information retrieval. Geneva: ISO.
[10] Sherratt, T. (2013). From portals to platforms – building new frameworks for user engagement (pp. 1-9). Paper presented at LIANZA 2013, Hamilton, New Zealand. https://doi.org/10.5281/zenodo.3563238
[11] Kalfatovic, M. R. (2002). Creating a Winning Online Exhibition: A Guide for Libraries, Archives, and Museums. Chicago/London: American Library Association.
[12] Khoon, L. C., Chennupati, R. K., & Foo, S. (2003). “The design and development of an online exhibition for heritage information awareness in Singapore”. Program, 37(2), pp. 85-93.
[13] Khoon, L. C., & Chennupati, R. K. (2014). “Design and development of Web-based Online Exhibitions”. DESIDOC Journal of Library & Information Technology, 32(2), pp. 97-102.
[14] Natale, M. T., Fernández, S., & López, M. (Eds.) (2012). Handbook on virtual exhibitions and virtual performances. Tivoli (Roma): Officine Grafiche Tiburtine.
[15] Antoniou, A., Lepouras, G. L., & Vassilakis, C. (2013). “Methodology for Design of Online Exhibitions”. DESIDOC Journal of Library & Information Technology, 33(3), pp. 158-167.
[16] Candela, L., Castelli, D., & Pagano, P. (2013). “Virtual Research Environments: an overview and a research agenda”. Data Science Journal, 12, pp. 75-81.
[17] Zhou, J. et al. (2020). “Building Science Gateways for Humanities”. In: Practice and Experience in Advanced Research Computing (PEARC’20). New York: Association for Computing Machinery, pp. 327-332. https://doi.org/10.1145/3311790.3396628
[18] Carusi, A., & Reimer, T. (2010). Virtual Research Environment Collaborative Landscape Study: A JISC funded project. [Bristol]: JISC.
[19] Osório, A. J. (2020). “Reflexões sobre tecnologia e educação em tempo de pandemia”. In: A Universidade do Minho em tempos de pandemia II. Minho: UMinho Editora, pp. 212-224. https://doi.org/10.21814/uminho.ed.24.9
[20] Rodrigues, E. (2020). “A pandemia e a emergência da ciência aberta”. In: A Universidade do Minho em tempos de pandemia II. Minho: UMinho Editora, pp. 263-294. https://doi.org/10.21814/uminho.ed.24.12
[21] Elrayies, G. M. (2017). “Flipped Learning as a Paradigm Shift in Architectural Education”. International Education Studies, 10(1), pp. 93-108.
[22] Ahmed, H. O. K. (2016). “Flipped Learning As A New Educational Paradigm: An Analytical Critical Study”. European Scientific Journal, 12(10), pp. 417-444.