DatagramDB

An implementation of a main-memory database supporting the General Semistructured Model with a Graph Grammar-driven Query Language

DatagramDB implements the Generalised Semistructured Model (GSM), which represents relational, graph, and semistructured data (Bergami, 2018) as well as time series (Bergami & Zegadło, 2023) in a uniform way. This was made possible by generalising KnoBAB’s data model to better support a generic object-oriented data representation (Bergami et al., 2024).

Representing both data and indexing structures for time series in DatagramDB (Bergami & Zegadło, 2023).
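To make the idea of a uniform representation more concrete, the following is a minimal Python sketch and not DatagramDB's actual storage layout. It assumes that every piece of data is an object carrying labels, string values, and named containment relations towards other objects. The names GSMObject, add, and children are hypothetical and introduced only for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class GSMObject:
    """Hypothetical object record: labels, values, and named containments."""
    id: int
    labels: list[str] = field(default_factory=list)
    values: list[str] = field(default_factory=list)
    # containment relation name -> ids of the contained objects
    contains: dict[str, list[int]] = field(default_factory=dict)


def add(db, labels, values=(), **contains):
    """Create a new object in db and return its id."""
    oid = len(db)
    db[oid] = GSMObject(oid, list(labels), list(values),
                        {k: list(v) for k, v in contains.items()})
    return oid


def children(db, oid, relation):
    """Return the objects contained by oid under the given relation."""
    return [db[c] for c in db[oid].contains.get(relation, [])]


db: dict[int, GSMObject] = {}

# Relational row Person(name='Alice', age=30): one containment per attribute.
alice = add(db, ["Person"],
            name=[add(db, ["name"], ["Alice"])],
            age=[add(db, ["age"], ["30"])])

# Graph edge Alice -knows-> Bob: a containment named after the edge label.
bob = add(db, ["Person"], ["Bob"])
db[alice].contains.setdefault("knows", []).append(bob)

# Time series: a series object containing its ordered samples.
samples = [add(db, ["sample"], [str(v)]) for v in (0.1, 0.4, 0.9)]
series = add(db, ["TimeSeries"], samples=samples)
```

In this sketch the same children lookup answers an attribute access on a row, a neighbour traversal on a graph, and a scan over the samples of a series, which is the kind of uniformity the model aims for.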

By exploiting a declarative and expressive query language that leverages key concepts from Graph Grammars, we can take sentences parsed into dependency graphs and rewrite them into a syntax-invariant representation. This key technology enables the full logical representation of human-language sentences as advocated by the LaSSI project. This solution outperforms graph databases implementing common graph query languages, which motivates the need for our system when processing multiple sentences at a time (Bergami et al., 2024).
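As a rough illustration of rewriting dependency graphs into a syntax-invariant form, the sketch below applies two toy rules of the shape L→R to graphs encoded as (head, relation, dependent) triples. It is written in Python and is not DatagramDB's query language. The relation names (nsubj, dobj, nsubjpass, agent) follow Stanford-style dependency labels, and the nodes are assumed to be already lemmatised. RULES and rewrite are hypothetical names.

```python
# Each rule names the relation that yields the source of the rewritten edge
# and the relation that yields its target; both must share the same head verb.
RULES = [
    ("nsubj", "dobj"),       # active voice: subject acts on direct object
    ("agent", "nsubjpass"),  # passive voice: agent acts on passive subject
]


def rewrite(edges):
    """Rewrite (head, relation, dependent) triples into (source, verb, target)."""
    out = set()
    for src_rel, dst_rel in RULES:
        # Left-hand side L: a verb with both required outgoing relations.
        src = {h: d for h, r, d in edges if r == src_rel}
        dst = {h: d for h, r, d in edges if r == dst_rel}
        # Right-hand side R: a single edge labelled with the verb.
        for verb in src.keys() & dst.keys():
            out.add((src[verb], verb, dst[verb]))
    return sorted(out)


# "Alice plays chess" and "Chess is played by Alice", with lemmatised nodes.
active = [("play", "nsubj", "Alice"), ("play", "dobj", "chess")]
passive = [("play", "nsubjpass", "chess"), ("play", "agent", "Alice")]
```

Both inputs rewrite to the same single edge from Alice to chess labelled play, so the output no longer depends on whether the sentence was phrased in the active or the passive voice.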

The project’s wiki provides a full description of the query language and of the possible ways to set up the project.

References

2024

Matching and Rewriting Rules in Object-Oriented Databases

Giacomo Bergami, Oliver Robert Fox, and Graham Morgan

Mathematics, 2024


Graph query languages such as Cypher are widely adopted to match and retrieve data in a graph representation, due to their ability to retrieve and transform information. Even though the most natural way to match and transform information is through rewriting rules, those are scarcely or partially adopted in graph query languages. Their inability to do so has a major impact on the subsequent way the information is structured, as it might then appear more natural to provide major constraints over the data representation to fix the way the information should be represented. On the other hand, recent works are starting to move towards the opposite direction, as the provision of a truly general semistructured model (GSM) allows to both represent all the available data formats (Network-Based, Relational, and Semistructured) as well as support a holistic query language expressing all major queries in such languages. In this paper, we show that the usage of GSM enables the definition of a general rewriting mechanism which can be expressed in current graph query languages only at the cost of adhering the query to the specificity of the underlying data representation. We formalise the proposed query language in terms declarative graph rewriting mechanisms described as a set of production rules L→R while both providing restriction to the characterisation of L, and extending it to support structural graph nesting operations, useful to aggregate similar information around an entry-point of interest. We further achieve our declarative requirements by determining the order in which the data should be rewritten and multiple rules should be applied while ensuring the application of such updates on the GSM database is persisted in subsequent rewriting calls. We discuss how GSM, by fully supporting index-based data representation, allows for a better physical model implementation leveraging the benefits of columnar database storage. 
Preliminary benchmarks show the scalability of this proposed implementation in comparison with state-of-the-art implementations.
@article{math12172677,
  author = {Bergami, Giacomo and Fox, Oliver Robert and Morgan, Graham},
  title = {Matching and Rewriting Rules in Object-Oriented Databases},
  journal = {Mathematics},
  volume = {12},
  year = {2024},
  issue = {17},
  number = {2677},
  url = {https://www.mdpi.com/2227-7390/12/17/2677},
  issn = {2227-7390},
  doi = {10.3390/math12172677},
  dimensions = {true},
}

2023

Towards a Generalised Semistructured Data Model and Query Language

Giacomo Bergami and Wiktor Zegadło

SIGWEB Newsl., Aug 2023


Although current efforts are all aimed at re-defining new ways to harness old data representations, possibly with new schema features, the challenges still open provide evidence of the need for a "diametrically opposite" approach: in fact, all information generated in real contexts is to be understood lacking of any form of schema, where the schema associated with such data is only determined a posteriori based on either a specific application context, or from some data’s facets of interest. This solution should still enable recommendation systems to manipulate the aforementioned data semantically. After providing evidence of these limitations from current literature, we propose a new Generalized Semistructured data Model that makes possible queries expressible in any data representation through a Generalised Semistructured Query Language, both relying upon script v2.0 as a MetaModel language manipulating types as terms as well as allowing structural aggregation functions.
@article{zegadlo,
  author = {Bergami, Giacomo and Zegad\l{}o, Wiktor},
  title = {Towards a Generalised Semistructured Data Model and Query Language},
  year = {2023},
  issue_date = {Summer 2023},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  volume = {2023},
  number = {Summer},
  issn = {1931-1745},
  url = {https://doi.org/10.1145/3609429.3609433},
  doi = {10.1145/3609429.3609433},
  journal = {SIGWEB Newsl.},
  month = aug,
  articleno = {4},
  numpages = {22},
  dimensions = {true},
}

2018

A new Nested Graph Model for Data Integration

Giacomo Bergami

Alma Mater Studiorum – Università di Bologna, Apr 2018


Despite graph data gained increasing interest in several fields, no data model suitable for both querying and integrating differently structured graph and (semi)structured data has been currently conceived. The lack of operators allowing combinations of (multiple) graphs in current graph query languages (graph joins), and on graph data structure allowing neither data integration nor nested multidimensional representations (graph nesting) are a possible motivation. In order to make such data integration possible, this thesis proposes a novel model (General Semistructured data Model) allowing the representation of both graphs and arbitrarily nested contents (e.g., one node can be contained by more than just one parent node), thus allowing the definition of a nested graph model, where both vertices and edges may include (overlapping) graphs. We provide two graph joins algorithms (Graph Conjunctive Equijoin Algorithm and Graph Conjunctive Less-equal Algorithm) and one graph nesting algorithm (Two HOp Separated Patterns). Their evaluation on top of our secondary memory representation showed the inefficiency of existing query languages’ query plan on top of their respective data models (relational, graph and document-oriented). In all three algorithms, the enhancement was possible by using an adjacency list graph representation, thus reducing the cost of joining the vertices with their respective outgoing (or ingoing) edges, and by associating hash values to both vertices and edges. As a secondary outcome of this thesis, a general data integration scenario is provided where both graph data and other semistructured and structured data could be represented and integrated into the General Semistructured data Model. A new query language outlines the feasibility of this approach (General Semistructured Query Language) over the former data model, also allowing to express both graph joins and graph nestings. 
This language is also capable of representing both traversal and data manipulation operators.
@phdthesis{amsdottorato8348,
  author = {Bergami, Giacomo},
  year = {2018},
  month = apr,
  title = {A new Nested Graph Model for Data Integration},
  school = {Alma Mater Studiorum -- Università di Bologna},
  url = {http://amsdottorato.unibo.it/8348/},
  keywords = {Graph Join, Graph Nesting, Nested Graph, Property Graph, General Semistructured Data Model, GSQL},
  dimensions = {true},
  tex = {https://github.com/gyankos/PhDThesis-Latex}
}