Data integrity checks

OpenAtlas puts great emphasis on the data quality. Even if the responsibility for the quality of the information entered ultimately lies with the individual projects, avoidance of inconsistent on a technical level are important during the development of the application. It is therefore not possible e.g. to enter the start date of an event after the end date of the same event

Nevertheless mistakes happen, not only on the application level but also when importing data from other projects or deleting files outside of the application, etc. Therefore, functions to check possible inconsistencies were implemented and are described in detail below.

Orphans

Orphans

Here unlinked data are shown. These so called orphans could be artifacts from imports or results of bugs and can be deleted. If orphans seem to appear regularly (without data imports or known bugs) please report this issue to the OpenAtlas team e.g. via the OpenAtlas redmine.

Type

The types listed here were created but have no sub types or associated data. They might have been pre-installed before teams have started entering information or were created and then never used.

Missing files

File entities without a corresponding file are listed here. This is (most likely) caused by the deletion of files from the dataset.

Orphaned files

Files without a corresponding file entity are listed here.

Orphaned IIIF files

IIIF files without a corresponding entity are listed here.

Orphaned annotations

Annotations that are linked to an entity, but file and entity themselves are not linked are listed here. There are three options to proceed:

  • Relink entity: Adds a link between file and entity

  • Remove entity: Removes the entity from the annotation

  • Delete annotation: Deletes the whole annotation

Orphaned subunits

Subunits without a link to the level above, e.g. a feature with no connection to a place.

Circular dependencies

It can be checked if an entity is linked to itself. This could happen, for example, if a person has been entered as is married to themself or a type has itself as super. It shouldn’t be possible to create circular dependencies within the application but nevertheless it’s useful to check the database as this can happen through data imports or bugs. If you find circular dependencies in your dataset regularly with a previous report or known bugs, please report the issue e.g. via the OpenAtlas redmine.

Dates

In this view various results of invalid or inconsistent dates are shown.

Invalid dates

In this tab invalid date combinations are shown, for example dates of begins That are later than end dates. These issues should be fixed, otherwise the user interface won’t allow to update these entities.

Invalid involvement dates

Incompatible dates for involvements are shown. Example: A person participated in an event for longer than the event lasted.

Invalid preceding dates

Here, incompatible dates for chained events are listed. Example: A preceding event starts after the succeeding event.

Invalid sub dates

This tab shows incompatible dates for hierarchical events. Example: A sub event begins before the super event began.

Check similar names

This test will search for similar entity names. Depending on selection and data volume this might take some time. The following options are given:

  • Classes - select the class you want to search for similar names

  • Ratio - select how similar the names should be, 100 is the default and means 100% identical

The function uses the fuzzywuzzy package which uses the Levenshtein Distance.