Semantic Analyser - smart text search

Semantic Analyser is an open source tool that uses artificial intelligence techniques to process natural language. It can be used to analyse Slovenian texts (e.g. legal documents), to determine which concepts are key to understanding their content, and to identify which of these concepts are missing from a given vocabulary or dictionary (e.g. the vocabulary of basic concepts of public administration). The prototype was developed in collaboration with the Faculty of Computer and Information Science at the University of Ljubljana and consists of a set of building blocks for the Orange software system.

The semantic analyser scans all the texts and extracts characteristic concepts from them. It measures how closely texts are related by the concepts they share, forms groups on that basis, and assigns each text to a group. The characteristic concepts of each group give a quick overview of the content covered in a collection. A graphical representation shows which group a text belongs to and so helps you find texts that deal with related topics.

Alternatively, you can describe the content you are looking for with a set of terms and find texts that contain these terms, as well as texts that contain terms you did not mention but that are close in meaning (e.g. synonyms, hyponyms, hypernyms). The characteristic terms become a link between documents, which makes it possible to discover interdependencies and contextual links between documents in one or more text collections (e.g. finding the laws most relevant to a selected proposal from a collection of proposals for measures).

The tool also allows terms to be selected from the Central Vocabulary (https://nio.gov.si/nio/asset/centralni+besednjak?lang=en), which then serve as links between documents in the same way. For example, it is possible to find the legal documents that refer to a certain source of information in the vocabulary (register, record, list).
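The steps described above (extracting characteristic terms, relating texts by the terms they share, grouping them, and searching by a set of terms) can be illustrated with a small sketch. This is not the prototype's code and does not use Orange. It assumes TF-IDF weighting, cosine similarity, and a simple threshold-based grouping as stand-ins for whatever methods the tool actually applies. The sample texts, the stop-word list, the threshold value, and all function names are hypothetical. The sketch matches only literal terms, so finding synonyms, hyponyms, or hypernyms would need additional resources such as word embeddings or a thesaurus.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption); real Slovenian text would
# also need lemmatisation, which this sketch leaves out.
STOP_WORDS = {"and", "the", "of", "for"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[^\W\d_]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, plus the idf table."""
    token_lists = [tokenize(d) for d in docs]
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    idf = {t: math.log(len(docs) / df) for t, df in doc_freq.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()}
               for tokens in token_lists]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either one is empty."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def group_documents(vectors, threshold=0.2):
    """Group documents that are linked, directly or indirectly, by similarity >= threshold."""
    unseen = set(range(len(vectors)))
    groups = []
    while unseen:
        start = min(unseen)
        unseen.remove(start)
        stack, group = [start], [start]
        while stack:
            i = stack.pop()
            linked = [j for j in unseen if cosine(vectors[i], vectors[j]) >= threshold]
            for j in linked:
                unseen.remove(j)
            stack.extend(linked)
            group.extend(linked)
        groups.append(sorted(group))
    return groups


def group_terms(vectors, group, top_n=3):
    """Terms with the highest summed weight across the documents of one group."""
    totals = Counter()
    for i in group:
        totals.update(vectors[i])
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


def search(query, vectors, idf):
    """Indices of documents sharing terms with the query, most similar first."""
    q = {t: idf[t] for t in tokenize(query) if t in idf}
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return [i for s, i in sorted(scores, key=lambda si: (-si[0], si[1])) if s > 0]


# Hypothetical mini-collection, in English for readability.
docs = [
    "income tax law and the tax register",
    "tax office record of income tax payers",
    "school curriculum for pupil grades",
    "school record of pupil grades",
]

vectors, idf = tfidf_vectors(docs)
groups = group_documents(vectors)
for g in groups:
    print(g, group_terms(vectors, g))
print(search("pupil grades", vectors, idf))
```

With these sample texts the two tax documents end up in one group and the two school documents in another, even though "record" appears in both groups: a single shared term is too weak a link to pass the threshold.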

Already prepared text collections (laws, proposals to government, etc.) that can be used, analysed and cross-referenced for related content are available at the following link: http://file.biolab.si/text-semantics/data/
A prototype with open source code and materials can be found at the following link: https://github.com/biolab/text-semantics

In the future we plan to build on the existing prototype and develop an application that will be freely available online.
For any further information on the Semantic Analyser tool, please contact: miha.jesenko@gov.si.


Interoperability level
Semantic interoperability
Interoperability sublevel
Semantic assets
ADMS type of interoperability
Service Description
Owner institution
FRI UL, MJU, Revelo
Restrictions of use
Compliance with EU assets
Creative Commons Attribution 4.0 International (CC BY 4.0)
