v4.0.0 #252

Merged
merged 32 commits into from
Aug 7, 2024

@bigerl (Member) commented Jun 10, 2024

No description provided.

frensing and others added 24 commits September 23, 2022 21:55
fix accepting all 200-299 status codes on http responses
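For illustration, here is a minimal Java sketch of a check that accepts the whole 2xx range instead of a single status code. StatusCodes and isSuccess are hypothetical names, not Iguana's API. The sketch assumes the worker has access to the numeric HTTP status code.

```java
// Hypothetical helper (not Iguana's code): treats every 2xx status as success.
final class StatusCodes {
    private StatusCodes() {}

    /** Returns true for every status code in the inclusive range 200-299. */
    static boolean isSuccess(int statusCode) {
        return statusCode >= 200 && statusCode <= 299;
    }
}
```

With this check, responses such as 204 No Content count as successful, while 199 and 300 do not.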
* fix encoding in post request body

* add neutral query with special characters

* add usage of StandardCharsets

* get workerType from class shorthand

* remove workerType constructor parameter
* fix FolderQuerySourceTest

* FolderQuerySource now returns queries sorted by file name. Tests were adjusted.

Co-authored-by: Nick <[email protected]>

* Disable occasionally failing test 

---------

Co-authored-by: Alexander Bigerl <[email protected]>
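The following is a minimal sketch of encoding a POST request body with an explicit charset. It assumes the form-encoded (application/x-www-form-urlencoded) variant of the SPARQL Protocol. FormBody and encodeQuery are hypothetical names, and the actual fix in this commit may differ.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: build a form-encoded body for a SPARQL query.
final class FormBody {
    private FormBody() {}

    /** Encodes the query as "query=..." using UTF-8, never the platform default charset. */
    static byte[] encodeQuery(String sparql) {
        String body = "query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        // The encoded string is pure ASCII, but converting it with an explicit
        // charset avoids any dependence on the JVM's default encoding.
        return body.getBytes(StandardCharsets.UTF_8);
    }
}
```

Because StandardCharsets.UTF_8 is passed explicitly, a query containing a character such as ä is always encoded as %C3%A4, regardless of the JVM's default charset.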
* SPARQLProtocolWorker is a draft for a better, more reliable worker that is tailored towards SPARQL Protocol. Each worker uses a single HttpClient and handles work completion conditions itself.

* Add workerId and ExecutionStats to SPARQLProtocolWorker

Refactored SPARQLProtocolWorker to record workerId and execution stats for each worker. WorkerId was added to uniquely identify each worker. An ExecutionStats inner class was created to track start time, duration, HTTP status code, content length, number of bindings, and number of solutions for each worker's task.

* Refactor SPARQLProtocolWorker to handle query streams

This commit changes the query-building mechanism in SPARQLProtocolWorker.java from StringBuilder to InputStream. The goal is to support processing of large queries and to reduce the overhead of using a String for the queryID. Queries are now read directly from the QueryHandler's data stream, and several HTTP request methods were modified to accommodate this change. The refactor also adds a new method to QueryHandler that returns a 'QueryHandle' record, which is a container for the index and the InputStream of a query.

* Add streaming support for handling large queries

Introduced InputStream support in QueryList and QuerySource to handle large queries more efficiently. IndexedQueryReader, QuerySource, QueryHandler, and several other classes were changed to accommodate the new streaming feature. Previously, all queries were loaded into memory, which could cause an OutOfMemoryError for large queries. Whether queries are actually streamed to the client still depends on the SPARQL worker used.

* Refactored BigByteArrayOutputStream

* Hashing and large response body support for SPARQLProtocolWorker

* remove dangling javadoc comment

* Scaffold ResponseBodyProcessor. This class keeps track of already handled responses to avoid repeated processing. It uses a concurrent hash map to store the responses identified by unique keys. This approach aims to improve the efficiency of handling response bodies in multi-threaded scenarios.

* Use an unsynchronized ByteArrayOutputStream for BigByteArrayInputStream/BigByteArrayOutputStream and completely rewrite BigByteArrayInputStream. This should significantly increase the performance of both streams.

* Add Language Processor and SparqlJsonResultCountingParser

Implemented the AbstractLanguageProcessor interface to process InputStreams. A new SAX Parser (SaxSparqlJsonResultCountingParser) was introduced for SPARQL JSON results, returning solutions, bound values, and variables.

* Completed ResponseBodyProcessor and integrated it into SPARQLProtocolWorker

* Worker integration and removal of a lot of code

* small fixes

* changes to the SPARQLProtocolWorker

* delegated executeQuery method
* reuse bbaos if not consumed
* removed assert for non-differing content-length header value and actual content length
* better logging for malformed url

* Add basic logging for Suite class

* remove JUnit 4 and add surefire plugin

The surefire plugin is used for better control over the available system resources for the test, because the BigByteArrayStream tests can take a lot of them.

* update iguana-schema.json

* Update config file validation and change suiteID generation

This also removes some unused, redundant code. The suiteID has also been changed to a string that consists of an epoch timestamp in seconds and the hash code of the configuration file.

* Remove CLIProcessManager.java

* Update schema file and re-enable tests

The validation function has also been made public, for better testing.

* Remove test files for IndexQueryReader

See issue #214.

* Add start and end-time for each worker.

Adjusted the test as well and integrated it into the StresstestResultProcessor and Storages.

* Remove unused dependencies

* Document possible problem with the SPARQLProtocolWorker and the connected client

---------

Co-authored-by: Alexander Bigerl <[email protected]>
Co-authored-by: Alexander Bigerl <[email protected]>
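The ResponseBodyProcessor scaffolded in this commit avoids processing the same response twice by tracking already-handled keys in a concurrent hash map. The following is a minimal sketch of that idea. It assumes that a response is identified by its content length together with a hash of its body. DedupProcessor and ResponseKey are hypothetical names, not Iguana's classes.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical key: content length plus a hash of the body (an assumption).
record ResponseKey(long contentLength, long hash) {}

// Hypothetical sketch of "process each distinct response only once".
final class DedupProcessor<K, B> {
    // Concurrent set backed by a ConcurrentHashMap; add() is atomic.
    private final Set<K> seen = ConcurrentHashMap.newKeySet();
    private final Consumer<B> handler;

    DedupProcessor(Consumer<B> handler) {
        this.handler = handler;
    }

    /**
     * Runs the handler only for the first body submitted under a given key.
     * Returns true if this call triggered processing.
     */
    boolean add(K key, B body) {
        if (!seen.add(key)) {
            return false; // already handled (or being handled) by another thread
        }
        handler.accept(body);
        return true;
    }
}
```

Because Set.add on a set created by ConcurrentHashMap.newKeySet() is atomic, exactly one of several workers that receive an identical response concurrently runs the handler.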
The wrong file was referenced, and one old instance of the schema file had not been updated.
* Add the DurationLiteral class which implements RDFDataType

* Fix wrong Supplier import

* Add missing Test annotation

* Cache QuerySource hashCode

* Remove outdated TODO comment

* Fix duration uri

* Cleanup

* Fix the conversion to a duration that only contains seconds

* Change the assertions of the failing RDFFileStorageTest

* Fix comment

* Update src/main/java/org/aksw/iguana/commons/time/DurationLiteral.java

Co-authored-by: Alexander Bigerl <[email protected]>

* Check parameters for QuerySource and QueryList constructor

* Remove unused comment

* Additional parameter checking and adjust tests

* Revert some parameter checks

* Fix test assertions

* Remove unused method

* Change duration to dayTimeDuration

---------

Co-authored-by: Alexander Bigerl <[email protected]>
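The following sketch converts a java.time.Duration into the lexical form of an xsd:dayTimeDuration literal, which is also a valid xsd:duration. It assumes that the RDF library accepts the lexical string as-is. XsdDurations is a hypothetical helper, not the DurationLiteral class added in this commit.

```java
import java.time.Duration;

// Hypothetical helper (not Iguana's DurationLiteral): renders a Duration
// in XSD duration lexical form.
final class XsdDurations {
    private XsdDurations() {}

    static String toLexical(Duration d) {
        // Duration.toString() already yields ISO-8601 forms like "PT1.5S" or "PT2H",
        // but it puts the sign on each component ("PT-1.5S"), whereas XSD
        // requires a single leading sign ("-PT1.5S").
        if (d.isNegative()) {
            return "-" + d.abs().toString();
        }
        return d.toString();
    }
}
```

A seconds-only duration such as Duration.ofMillis(1500) renders as PT1.5S. Only negative values need special handling.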
* Add more logging messages

* Fix log4j2 configuration

* Implement apache HTTP client

* Implement apache HTTP async client 5

* Fix timeout

* Fixes

* Fix hashing bug

* Fix conversion of byte stream to string

* Implement POST request streaming

* Disable the storing and hashing of responses when the parseResults parameter in the config is false

* Move utility classes

* StreamEntityProducer can send fixed-sized data and is reproducible now

* Make QueryHandler return stream supplier and info about query being cached

* Change RequestFactory behavior

* cached queries will be sent with fixed-size requests
* requests of cached queries will be cached as well (addresses #223)

* Cleanup

* Preload requests

* Fix IDE warnings

* Fix tests

* Remove unneeded test class

* Add Javadocs

* Change requests

* Move the RequestFactory to a separate class and add comments

* Add comments from overridden methods

* Lower maximum capacity while reading response
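The RequestFactory changes in this commit cache the requests of cached queries and preload them. The following is a minimal, JDK-only sketch of that caching pattern. CachedRequestBodies, bodyFor, and preload are hypothetical names. The sketch caches plain byte arrays and leaves out the HTTP client entirely.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

// Hypothetical sketch: build each query's request body at most once.
final class CachedRequestBodies {
    private final Map<Integer, byte[]> cache = new ConcurrentHashMap<>();
    private final IntFunction<byte[]> builder;

    CachedRequestBodies(IntFunction<byte[]> builder) {
        this.builder = builder;
    }

    /** Returns the (shared, do-not-mutate) body for a query index, building it at most once. */
    byte[] bodyFor(int queryIndex) {
        return cache.computeIfAbsent(queryIndex, i -> builder.apply(i));
    }

    /** The body is fully materialized, so its exact length is known up front. */
    long contentLength(int queryIndex) {
        return bodyFor(queryIndex).length;
    }

    /** Eagerly builds bodies, e.g. before the timed part of a benchmark starts. */
    void preload(int queryCount) {
        for (int i = 0; i < queryCount; i++) {
            bodyFor(i);
        }
    }
}
```

Because a cached body is fully materialized, its length is known before sending. This is what makes fixed-size requests possible for cached queries.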
@bigerl changed the title from Develop to v4.0.0 Jun 10, 2024
nck-mlcnv and others added 5 commits June 13, 2024 15:05
* Add ResponseBodyProcessor timeout

* Update documentation

Co-authored-by: Alexander Bigerl <[email protected]>
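One common way to bound response-body processing time is to run the processing as a task and wait on its Future with a timeout. The following sketch assumes that approach. TimedProcessing and runWithTimeout are hypothetical names, and this commit may implement the timeout differently.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: bound how long processing of one response may take.
final class TimedProcessing {
    private TimedProcessing() {}

    /**
     * Submits the task and waits at most the given timeout.
     * Returns true if it finished in time; otherwise cancels it (with interruption)
     * and returns false. Exceptions thrown by the task surface as ExecutionException.
     */
    static boolean runWithTimeout(ExecutorService executor, Runnable task, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException {
        Future<?> future = executor.submit(task);
        try {
            future.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true);
            return false;
        }
    }
}
```

Future.cancel(true) only interrupts the task. Processing code that never checks for interruption keeps running in the background.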

* Add the GraalVM native-maven-plugin for ahead-of-time compilation

* Switch to Logback implementation of SLF4J, as Log4j2 is not supported with GraalVM

* Update native-maven-plugin version

* Native-image builder optimizations

* Remove pre-made graalvm config

* Update native profile

* Catch exceptions inside TriplestoreStorage

* Reset workerId after warmup

* Update native image plugin configuration

* Add scripts for working with native images

* Remove spring

* Rename directory

* Add test workflow

* Fix permissions

* Remove periods

* Fix script

* Fix workflow

* Update workflow

* Test directory upload

* Update workflows

* Update Test Workflow

* Fix workflow

* Another fix

* Rename job

* Remove test workflow

* Make workerID go out of scope

* Add comment for registering LanguageProcessors

* Clean up logging config

* Fix deploy workflow

* Disable unsupported tests

* Update pom.xml to automatically generate configuration files for native image

* Update workflows

* Update documentation

* Fix symlink

* Add CPU microarchitectures

* Add CPU microarchitectures 2

* Update generate-config.sh

* Fix unstable tests

* Fix regex cleanup

* Enable long running tests on environment variable

* Increase the thread count for the apache http client

* Disable reuse of bbaos and create bbaos of optimal size when possible

* Try to fix something

* Debug logging

* Debug logging 2

* Attempt to fix something

* Attempt to fix something 2

* Attempt to fix something 3

* Attempt to fix something 4

* Attempt to fix something 5

* Attempt to fix something 6

* Make thread dump

* Make thread dump 2

* Attempt to fix something 7

* Attempt to fix something 8

* Attempt to fix something 9

* Attempt to fix something 10

* Attempt to fix something 11

* Finetuning test

* Finetuning test 2

* Cleanup httpclient configuration

* Cleanup tests

* Disable compressed references by default

This option needs to be set before compilation; it allows the heap to use more than 32 GB.

* Remove test configurations

* Re-enable configurations and decrease timeout in tests

* Add workaround for failing tests

* Adjust test configurations

* Adjust test configurations 2

* Adjust test configurations 3

* Adjust test configurations 4

* Revert "Adjust test configurations 4"

This reverts commit 9bf8cc8.

* Shorten http client configuration

* Add ByteArrayListOutputStream and ByteArrayListInputStream

* Update SPARQLProtocolWorker to use ByteArrayListOutputStream when response body has unknown length

* Fix bad merge conflict resolution

* Fix size calculation in ByteArrayListOutputStream

* Add test + fix for ByteArrayListInputStream

* Add test for ByteArrayListOutputStream

* Change single log message

* Update exception handling in TriplestoreStorage

* Add execution parameter to configuration generation

* Fix dry-run parameter

* Add comment in TriplestoreStorage

* Change behavior of ByteArrayListInputStream

* Add comments and access modifiers

* Update src/main/java/org/aksw/iguana/cc/storage/impl/TriplestoreStorage.java

Co-authored-by: Alexander Bigerl <[email protected]>

* Update GitHub workflow

---------

Co-authored-by: Alexander Bigerl <[email protected]>
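This commit uses ByteArrayListOutputStream when a response body has an unknown length. The following sketch shows the general technique. It assumes the stream keeps data in a list of fixed-size byte-array chunks, so the buffer grows without copying and without being limited to a single array. ChunkedOutputStream is a hypothetical name, not Iguana's class.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.SequenceInputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of a chunked output stream for bodies of unknown length.
final class ChunkedOutputStream extends OutputStream {
    private final int chunkSize;
    private final List<byte[]> chunks = new ArrayList<>();
    private byte[] current; // chunk currently being filled
    private int pos;        // write position inside the current chunk
    private long size;      // total number of bytes written

    ChunkedOutputStream(int chunkSize) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        this.chunkSize = chunkSize;
    }

    @Override
    public void write(int b) {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) {
        Objects.checkFromIndexSize(off, len, b.length);
        while (len > 0) {
            // Start a new chunk when none exists yet or the current one is full.
            if (current == null || pos == chunkSize) {
                current = new byte[chunkSize];
                chunks.add(current);
                pos = 0;
            }
            int n = Math.min(len, chunkSize - pos);
            System.arraycopy(b, off, current, pos, n);
            pos += n;
            off += n;
            len -= n;
            size += n;
        }
    }

    long size() {
        return size;
    }

    /** Returns a stream over the written bytes without copying them into one array. */
    InputStream toInputStream() {
        List<InputStream> parts = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i++) {
            // Every chunk is full except possibly the last one.
            int length = (i == chunks.size() - 1) ? pos : chunkSize;
            parts.add(new ByteArrayInputStream(chunks.get(i), 0, length));
        }
        return new SequenceInputStream(Collections.enumeration(parts));
    }
}
```

Compared with a single growing array, the chunk list avoids repeated copying when the buffer grows, at the cost of a little bookkeeping when reading the data back.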
* Update GitHub workflow

* Fix another workflow too

* Cancel last jobs and activate native test
* Fix result datatypes

* Rename method

* Update iguana.owl

* Remove unused properties

* Add missing relations

* Properly add query ids to the result data

* Remove unused properties

* Fix tests

* Fix ontology

* Replace intersections with unions

* Add disjoints

* Fix more things

* Generate schema file with protege

* Save as OWL/XML file

* Change ontology name

* Switch from xsd:dayTimeDuration to xsd:duration

* Change file extension

* Remove unused prefixes
* Remove old file

* Minor doc change

* Update html documentation page

* Remove unused files

* Move file

* Change image link

* Update mkdocs.yml

* Update docs deployment

* Add deployment test

* Change trigger

* Change python action

* Change python action 2

* Add ontology deployment

* Fix ontology deployment

* Fix test

* Fix test 2

* Fix test 3

* Fix test 4

* Fix test 5

* Fix test 6

* Fix test 7

* Fix test 8

* Fix test 9

* Fix test 10

* Remove test workflow

* Fix python setup

* Test release files

* Fix workflow

* Fix workflow 2

* Fix workflow 3

* Remove test workflow
@nck-mlcnv self-requested a review August 7, 2024 13:35

@nck-mlcnv (Contributor) left a comment

looks good 👍

@bigerl merged commit 9dc1b50 into main Aug 7, 2024
5 checks passed