ORTOLANG Deposit and sharing

Speech and Language Data Repository (SLDR/ORTOLANG)

Frequently Asked Questions

1. What is SLDR?

SLDR (Speech and Language Data Repository) is a repository of oral language and multimodal data. Since 2015, it has had the status of a CLARIN-C center.
Together with the CNRTL and the Nanterre-Orléans center, it is one of the components of ORTOLANG, the French national platform for sharing language data.

2. Which services are provided?

Together with its partners, SLDR offers a service covering all stages of data preservation and dissemination:
- Assistance in data formatting;
- Web interface for the deposit;
- Description of the data with the Dublin Core and OLAC metadata standards;
- Creation and management of persistent identifiers;
- Secure storage;
- Long-term preservation in conjunction with CINES.
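
To give an idea of what a Dublin Core description looks like, here is a minimal sketch in Python. It is only an illustration, not the SLDR deposit form or its export format: the wrapper element "record", the function name and all field values are assumptions made for the example. Only the Dublin Core element namespace is taken from the standard.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Serialize a dict of Dublin Core elements to an XML string.

    The <record> wrapper is a hypothetical container chosen for this
    sketch; a real repository may wrap the elements differently.
    """
    root = ET.Element("record")
    for name, value in fields.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values, for illustration only.
example = dublin_core_record({
    "title": "Sample interview corpus",
    "type": "Sound",
    "language": "fra",
})
print(example)
```

OLAC builds on Dublin Core with refinements specific to language resources, so a real record would usually carry more detail than this.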

3. What can you do on SLDR?

Mainly the following: deposit data, generate metadata, view the descriptive records and, depending on access rights, download the files.

4. What kinds of data can you deposit?

Audio and video recordings, raw data, ...
Lexicons, knowledge bases, annotations, transcriptions, ...
Tools for analysis and data enrichment.
SLDR also lets you group these different objects into collections.

5. Where are the data stored?

The data are now stored on INIST servers in Nancy, where the ORTOLANG platform is hosted. Objects that have been sent for long-term preservation are stored at CINES.
SLDR also makes a double backup of the data it manages.

6. Who can deposit data on SLDR?

Any registered user, whether an individual or an institution. SLDR recognizes two types of users: those in public research, and those in other sectors, including industry and commerce.
Nationality is not taken into account.

7. How do you deposit your data?

You must first create an account on the SLDR site.
Once the administrators have validated your account, you can create a metadata record via the "Deposit / Edit" link in the menu.
Once you have completed the required fields of the form, you can fill in the others gradually.
The data themselves are deposited in a second step. Depending on their size, we can ingest the data via FTP transfer or from an external disk.

8. Do you have to organize your data in a particular way before the deposit?

SLDR accepts objects with no limit on volume or on directory structure. You can therefore deposit the data in their original file organization, which will form the download page.
The SLDR team will help you select formats for long-term preservation and choose a presentation that makes it easier for other users to reuse the data.

9. What is a PID and what is it good for?

SLDR has its own mechanism for allocating persistent identifiers (PIDs). These identifiers are assigned to an object and to any or all of its files.
As components of the URLs of objects and files, these identifiers give depositors and users a persistent way to find data regardless of their location or version.
In particular, PIDs offer a persistent reference for all citations.
In addition, the mechanism set up by SLDR relies on semi-deterministic IDs: an ID does not consist of an arbitrary alphanumeric string but of predictable elements. This makes it easier to use the PID in citations, even before all the data have been received.

10. Is it possible to update objects or create versions?

Data and metadata can always be updated as long as the object has not been sent for long-term preservation.
The long-term preservation process implies that an object is finalized: any subsequent change results in the creation of a new version. However, files placed in a folder named "§doc" can still be modified even after the long-term archiving process.
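
The rule above can be summarized by a small Python sketch. It is not the SLDR implementation: the class, its attributes and the function name are hypothetical, and the sketch assumes that a file counts as being "in" the "§doc" folder when "§doc" appears anywhere in its path.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

DOC_FOLDER = "§doc"  # folder name given in the answer above


@dataclass
class DepositedObject:
    """Hypothetical stand-in for a deposited object (not the SLDR data model)."""
    version: int = 1
    sent_to_preservation: bool = False


def can_modify_in_place(obj: DepositedObject, file_path: str) -> bool:
    """Return True if the file may be changed without creating a new version."""
    if not obj.sent_to_preservation:
        # Before long-term preservation, data and metadata stay editable.
        return True
    # Afterwards, only files under a "§doc" folder remain modifiable
    # (assumption: the folder may sit at any level of the path).
    return DOC_FOLDER in PurePosixPath(file_path).parts


draft = DepositedObject()
archived = DepositedObject(sent_to_preservation=True)
print(can_modify_in_place(draft, "audio/session1.wav"))
print(can_modify_in_place(archived, "audio/session1.wav"))
print(can_modify_in_place(archived, "§doc/readme.txt"))
```

Any change for which this function returns False would have to go through a new version of the object.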

11. What is long-term preservation?

It is a special storage process carried out by CINES (Montpellier) to preserve data for a long time (approx. 30 years).
In particular, it requires the use of formats that are likely to remain accessible in the long run (that is, mostly open and free formats).
In the ORTOLANG project, long-term preservation will be reserved for data that are of particular cultural interest and cannot be reproduced.

12. How are access rights managed?

Access rights management on SLDR is based on the French “Code du Patrimoine”, since it can be applied to public scientific archives.
The “Code du Patrimoine” provides for the free dissemination of public archives but also recognizes some derogations. In particular, oral data are considered personal data and are therefore protected.
Free dissemination of data on SLDR thus depends on a review of the content, on the permissions signed by the speakers and also on the file formats.
The status assigned to the data in this way determines the specificity and granularity of the access rights.

13. Can you add licenses to your data?

For free-access objects, corresponding to the AR038 derogation (see the previous question), it is possible to attach a license specifying conditions of use.
One can use free licenses such as Creative Commons, GNU, etc.
For its part, SLDR automatically presents its own license for objects under filtered access. These objects are visible only to registered users, after they have accepted the SLDR license. Filtered access also makes it possible to inform depositors of downloads of their data.

14. What are the different kinds of access to the files?

From most open to most restricted:
- "Public" files, under the AR038 derogation, are accessible to everyone and can carry a free license. Access to these objects can in some cases be filtered: users must then identify themselves and accept the SLDR license.
- Other objects, especially those under the AR048 derogation protecting personal data, have more limited access:
  - In all cases, they require identification and acceptance of the SLDR license;
  - They can also be reserved for researchers;
  - SLDR also lets depositors create "privileged access" for certain institutions or individuals;
  - Finally, specific rights can be granted to specific individuals.
- A last category, secret files, is accessible only to the depositor and the administrator (e.g. to hold permissions that have not yet been anonymized).
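
As a rough illustration of how these access levels combine, here is a Python sketch. It is a simplified model under stated assumptions, not the platform's actual logic: all class, attribute and function names are hypothetical, privileged access and individual rights are merged into a single set of granted users, that set is assumed to override the researchers-only restriction, and the depositor and administrator are assumed to have access to every file.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class Access(Enum):
    PUBLIC = "public"          # open to everyone
    FILTERED = "filtered"      # identification + SLDR license
    RESTRICTED = "restricted"  # e.g. protected personal data
    SECRET = "secret"          # depositor and administrator only


@dataclass
class User:
    name: str
    accepted_license: bool = False
    is_researcher: bool = False
    is_admin: bool = False


@dataclass
class FileEntry:
    access: Access
    depositor: str
    researchers_only: bool = False
    granted_users: Set[str] = field(default_factory=set)


def can_download(user: Optional[User], f: FileEntry) -> bool:
    """Decide access; `user` is None for an unidentified visitor."""
    if f.access is Access.PUBLIC:
        return True
    if user is None:
        return False  # every other level requires identification
    if user.is_admin or user.name == f.depositor:
        return True   # assumption: these two roles always have access
    if f.access is Access.SECRET:
        return False
    if not user.accepted_license:
        return False  # filtered and restricted files need the license
    if f.access is Access.FILTERED:
        return True
    # RESTRICTED: explicit grants first, then the researcher restriction.
    if user.name in f.granted_users:
        return True
    return user.is_researcher or not f.researchers_only


reader = User("reader", accepted_license=True)
corpus = FileEntry(Access.RESTRICTED, depositor="owner", researchers_only=True)
print(can_download(reader, corpus))
```

In this model the example user is refused because the file is reserved for researchers; adding "reader" to `granted_users` would grant access.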