Data mining functionalities
| Id | Type | Name | Description | Average score of interest (range 0 to 1) |
|---|---|---|---|---|
| 1 | preana | import_data | Functionality to import raw data from a file system into the tool | 0.95 |
| 2 | preana | index | Document extraction and indexing functionality to retrieve metadata from raw file system data | 1.00 |
| 3 | preana | import_plan | Functionality to import a file plan into the tool | 0.95 |
| 4 | preana | import_records | Functionality to import records (ISO 15489) into the tool | 0.95 |
| 5 | ana | content_anonym | Functionality to anonymize sensitive content (names, emails) | 0.30 |
| 6 | ana | meta_sign | Functionality to recognize digital signatures from a file's metadata | 0.75 |
| 7 | ana | file_list_id_title | File path analysis functionality: breaks the path down into “title-identifier” pairs when possible. This makes it possible to detect whether a classification framework is in place and, if so, to link records (ISO 15489) folders to the corresponding file plan section | 0.95 |
| 8 | ana | content_ner | Named-entity recognition (NER) functionality to identify dates, names, locations and emails in text content | 0.80 |
| 9 | ana | content_class_image | Image classification functionality to detect handwritten signatures, official stamps, etc. | 0.75 |
| 10 | ana | content_class_text | Text classification functionality to identify certain types of content, such as minutes, copyright notices, etc. | 0.85 |
| 11 | ana | content_detect_lang | Language detection functionality | 0.65 |
| 12 | ana | content_summary | Automatic summarization functionality | 0.60 |
| 13 | ana | content_readability | Functionality to assign a readability score (from easy to hard to read), such as Flesch-Kincaid or Gunning Fog | 0.30 |
| 14 | ana | content_link_record_plan | Functionality to create a link between the file plan and the records | 0.80 |
| 15 | ana | combo | Functionality to compose combinations of data metrics | 0.95 |
| 16 | ana | metric_archiv | Functionality to compute archival metrics | 0.90 |
| 17 | ana | ocr | Optical character recognition (OCR) functionality for scanned text documents, so that all the text mining approaches can be applied to them | 1.00 |
| 18 | ana | trans_image | Functionality to automatically describe an image | 0.65 |
| 19 | ana | trans_sound | Speech-to-text functionality for audio and video content | 0.75 |
| 20 | ana | name_rules | Functionality to detect file and directory naming rules | 0.80 |
| 21 | stat | count | Functionality to count any metric over a set of documents | 0.90 |
| 22 | stat | words | Functionality to extract frequent and relevant words | 0.80 |
| 23 | stat | time | Functionality to show the number of documents created and/or modified over time | 0.95 |
| 24 | stat | size | Functionality to compute the total size of a group of documents | 0.85 |
| 25 | search | engine | Functionality to search for any word in the text and metadata of the documents | 1.00 |
| 26 | search | filter | Functionality to filter searches by any metric of the tool | 1.00 |
| 27 | search | sim | Functionality to search a collection for documents similar to a given document or group of documents | 0.90 |
| 28 | search | cluster | Functionality to group the documents returned by a query into clusters (unsupervised classification). Could be used to identify groups of documents without any prior information | 0.80 |
| 29 | learnauto | text_gen | Text generation functionality: can generate titles and records file descriptions when they are missing | 0.75 |
| 30 | learnauto | class | Classification functionality to fill in metrics that may be missing, for example by proposing a final state, a type of sampling, etc. | 0.75 |
| 31 | admin | sample | Functionality to prepare a sample for archival purposes | 0.90 |
| 32 | admin | profile | Profile management functionality, offering a more personalized tool based on user preferences | 0.90 |
| 33 | admin | combo | Functionality to manage metric combinations and their mapping to the archival model | 0.95 |
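
Part of `content_anonym` can be prototyped with pattern matching alone. The minimal sketch below masks email addresses with a deliberately simple regular expression (it does not accept every address that RFC 5322 allows) and replaces names supplied by the caller, for example names already known from the metadata. Finding *unknown* names would require an NER model (`content_ner`) and is not attempted here; the function `anonymize` and its placeholders are hypothetical illustrations, not part of an existing tool.

```python
import re

# Deliberately simple email pattern: it covers common forms such as
# jane.doe@example.org but not every address allowed by RFC 5322.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def anonymize(text, known_names=(), email_placeholder="[EMAIL]", name_placeholder="[NAME]"):
    """Mask email addresses and a caller-supplied list of names in `text`.

    Hypothetical helper: `known_names` is assumed to come from elsewhere
    (e.g. author metadata); detecting unknown names needs an NER model.
    """
    text = EMAIL_RE.sub(email_placeholder, text)
    # Replace longer names first so that "Jane Doe" is masked before "Jane".
    for name in sorted(known_names, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        # A function replacement avoids backslash processing of the placeholder.
        text = re.sub(pattern, lambda _m: name_placeholder, text)
    return text


print(anonymize("Contact Jane Doe at jane.doe@example.org.", known_names=["Jane Doe"]))
# Contact [NAME] at [EMAIL].
```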
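
The path analysis described for `file_list_id_title` can be sketched with a regular expression once a naming convention is assumed. The sketch below assumes, hypothetically, that classified folders are named with an identifier made of an optional short letter prefix and dot-separated digit groups (`01`, `01.02`, `A3.1`), followed by `-`, `_` or a space, then a title. The helper names and the 80 % threshold used to decide that a classification framework is in place are illustrative assumptions, not values from the source.

```python
import re
from pathlib import PurePosixPath

# Hypothetical convention: "<identifier><separator><title>", where the identifier
# is an optional 0-3 letter prefix followed by dot-separated digit groups.
SEGMENT_RE = re.compile(r"^(?P<id>[A-Za-z]{0,3}\d+(?:\.\d+)*)\s*[-_ ]\s*(?P<title>\S.*)$")


def split_segment(segment):
    """Return (identifier, title) for one folder name, or None if it does not match."""
    m = SEGMENT_RE.match(segment)
    if m is None:
        return None
    return m.group("id"), m.group("title").strip()


def analyse_path(path):
    """Split every folder of a file path; the file name itself is ignored."""
    folders = [p for p in PurePosixPath(path).parent.parts if p != "/"]
    return [split_segment(f) for f in folders]


def looks_classified(paths, threshold=0.8):
    """Heuristic: assume a classification framework is in place when at least
    `threshold` of all folder names follow the identifier-title convention."""
    results = [r for p in paths for r in analyse_path(p)]
    if not results:
        return False
    matched = sum(r is not None for r in results)
    return matched / len(results) >= threshold


print(analyse_path("01_Administration/01.02 - Budget/report.pdf"))
# [('01', 'Administration'), ('01.02', 'Budget')]
```

The extracted identifiers (`01.02`) are what could then be matched against the identifiers of the imported file plan to link folders to plan sections.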
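
A very small sketch of the idea behind `content_detect_lang`: count how many tokens of a text belong to short stop-word lists and pick the language with the most hits. The two ten-word lists (English and French) and the function name are illustrative assumptions; a real tool would more likely rely on an established language-identification library or model, and very short texts are easily misclassified by this approach.

```python
import re

# Illustrative stop-word lists (assumption): real lists are much longer.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "in", "is", "that", "for", "with", "on"},
    "fr": {"le", "la", "les", "et", "des", "du", "un", "une", "est", "pour"},
}


def detect_language(text):
    """Return the language whose stop words occur most often, or None if none occur."""
    tokens = re.findall(r"\w+", text.lower())
    scores = {lang: sum(t in words for t in tokens) for lang, words in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(detect_language("Le procès-verbal de la réunion et les annexes"))
# fr
```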
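
The readability scores named for `content_readability` are closed-form formulas over word, sentence and syllable counts: Flesch reading ease = 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words); Flesch-Kincaid grade level = 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59; Gunning Fog = 0.4 × (words / sentences + 100 × complex words / words). The sketch below computes them for English text under simplifying assumptions: sentences are split on `.`, `!` and `?`, syllables are estimated from vowel groups, and a "complex word" is simply any word of three or more estimated syllables (the original Fog index also excludes proper nouns and some suffixed forms). Other languages, such as French, use adapted formulas.

```python
import re

WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def count_syllables(word):
    """Rough English syllable estimate: count groups of consecutive vowels,
    drop a silent final 'e' (but not '-le'), and never return less than 1."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def readability(text):
    """Return Flesch reading ease, Flesch-Kincaid grade and a simplified Gunning Fog."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = WORD_RE.findall(text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = [count_syllables(w) for w in words]
    wps = len(words) / len(sentences)  # average words per sentence
    spw = sum(syllables) / len(words)  # average syllables per word
    complex_ratio = sum(s >= 3 for s in syllables) / len(words)
    return {
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "flesch_kincaid_grade": 0.39 * wps + 11.8 * spw - 15.59,
        "gunning_fog": 0.4 * (wps + 100 * complex_ratio),
    }


scores = readability("The cat sat on the mat.")
print({k: round(v, 1) for k, v in scores.items()})
```

Higher reading ease means easier text, whereas the two grade-style scores increase with difficulty, so a tool exposing them should state which direction is "easy".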
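
One possible approach to `name_rules`, sketched here as an assumption rather than a prescribed method, is to abstract each file name into a pattern (every digit becomes `9`, every run of letters becomes `A`, separators and the extension are kept) and then report the patterns shared by a large share of the files. The 20 % default threshold and the helper names are hypothetical.

```python
import re
from collections import Counter
from pathlib import PurePath


def name_pattern(filename):
    """Abstract a file name: digits -> '9', letter runs -> 'A', separators kept,
    extension kept (lower-cased)."""
    p = PurePath(filename)
    pattern = re.sub(r"\d", "9", p.stem)
    pattern = re.sub(r"[^\W\d_]+", "A", pattern)  # [^\W\d_] matches letters only
    return pattern + p.suffix.lower()


def dominant_patterns(filenames, min_share=0.2):
    """Return (pattern, share) pairs for patterns used by at least `min_share` of the files."""
    counts = Counter(name_pattern(f) for f in filenames)
    total = sum(counts.values())
    return [(pat, n / total) for pat, n in counts.most_common() if n / total >= min_share]


names = ["2020-03-01_minutes.pdf", "2021-11-15_minutes.pdf", "2019-01-07_report.pdf", "scan.jpg"]
print(dominant_patterns(names, min_share=0.5))
# [('9999-99-99_A.pdf', 0.75)]
```

The same function can be applied to directory names to detect folder naming rules.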
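
The `stat` functionalities `count`, `time` and `size` can be illustrated on a plain file system. The sketch below walks a folder with `os.walk` and aggregates file count, total size in bytes and the number of files per month of last modification (UTC). It uses the modification time only, because creation time is not exposed consistently across platforms (on Unix, `st_ctime` is the time of the last metadata change, not of creation). The function names are hypothetical.

```python
import os
from collections import Counter
from datetime import datetime, timezone


def scan(root):
    """Yield (path, size_in_bytes, mtime) for each file found under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # e.g. broken symbolic link or file removed during the scan
            yield path, st.st_size, st.st_mtime


def summarize(records):
    """Compute the file count, total size and number of files modified per month (UTC)."""
    count = 0
    total_size = 0
    per_month = Counter()
    for _path, size, mtime in records:
        count += 1
        total_size += size
        month = datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y-%m")
        per_month[month] += 1
    return {"count": count, "total_size": total_size, "per_month": dict(sorted(per_month.items()))}


# On a real folder: summarize(scan("/path/to/archive"))
print(summarize([("a.pdf", 5, 1577836800), ("b.pdf", 3, 1583020800)]))
# {'count': 2, 'total_size': 8, 'per_month': {'2020-01': 1, '2020-03': 1}}
```

Separating `scan` from `summarize` lets the same aggregation run on any group of documents, for example the result of a search or filter.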
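
For `sim`, a common baseline (assumed here; the source does not name a method) is TF-IDF weighting with cosine similarity. The sketch below builds sparse TF-IDF vectors in pure Python with a smoothed IDF, ln((1 + n) / (1 + df)) + 1, and supports a query made of one document or a group of documents by summing their vectors. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for counts in tokenized for term in counts)  # document frequency
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}     # smoothed IDF
    return [{t: tf * idf[t] for t, tf in counts.items()} for counts in tokenized]


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(docs, query_indices, k=3):
    """Rank the other documents by similarity to one document or a group of documents."""
    vecs = tfidf_vectors(docs)
    query = Counter()
    for i in query_indices:
        query.update(vecs[i])  # summing the vectors represents the whole group
    scores = [(i, cosine(query, v)) for i, v in enumerate(vecs) if i not in query_indices]
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores[:k]


docs = ["budget report 2020", "annual budget report", "meeting minutes", "minutes of the board meeting"]
print([i for i, _score in most_similar(docs, [0], k=1)])
# [1]
```

On large collections the same idea is usually served by the search engine's own similarity features or by dense embeddings, but the ranking logic stays the same.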
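
The table does not specify which sampling methods `sample` should support. As an assumption, the sketch below shows two common ones: simple random sampling with a recorded seed, so that the same sample can be reproduced and documented, and systematic sampling, which keeps every k-th item. The function names are hypothetical.

```python
import random


def random_sample(items, size, seed=None):
    """Simple random sample without replacement; the same seed gives the same sample."""
    rng = random.Random(seed)
    return rng.sample(list(items), size)


def systematic_sample(items, step, start=0):
    """Systematic sample: keep every `step`-th item, beginning at index `start`."""
    if step < 1:
        raise ValueError("step must be at least 1")
    return list(items)[start::step]


files = [f"doc_{i:03d}.pdf" for i in range(20)]
print(systematic_sample(files, 5))
# ['doc_000.pdf', 'doc_005.pdf', 'doc_010.pdf', 'doc_015.pdf']
print(random_sample(files, 3, seed=42) == random_sample(files, 3, seed=42))
# True
```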