HTTP worker refactoring #221
Merged
…at is tailored towards SPARQL Protocol. Each worker uses a single HttpClient and handles work completion conditions itself.
Refactored SPARQLProtocolWorker to record workerId and execution stats for each worker. WorkerId was added to uniquely identify each worker. An ExecutionStats inner class was created to track start time, duration, HTTP status code, content length, number of bindings, and number of solutions for each worker's task.
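A minimal sketch of what such per-worker stats could look like, assuming a nested record is used. The class name `SketchWorker`, the method names, and all field names and types are assumptions for illustration, not the code from this PR.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names and types are assumptions, not the code from this PR.
final class SketchWorker {

    /** One entry per executed query, holding the values listed in the commit message. */
    record ExecutionStats(
            Instant startTime,
            Duration duration,
            int httpStatusCode,
            long contentLength,
            long numberOfBindings,
            long numberOfSolutions) {

        /** True for any 2xx status code. */
        boolean successful() {
            return httpStatusCode >= 200 && httpStatusCode < 300;
        }
    }

    private final long workerId; // uniquely identifies this worker
    private final List<ExecutionStats> stats = new ArrayList<>();

    SketchWorker(long workerId) {
        this.workerId = workerId;
    }

    long workerId() {
        return workerId;
    }

    void addStats(ExecutionStats entry) {
        stats.add(entry);
    }

    /** Returns an immutable snapshot of the stats recorded so far. */
    List<ExecutionStats> stats() {
        return List.copyOf(stats);
    }
}
```

Keeping the stats immutable means a result processor can read them after the worker has finished without extra synchronization.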
This commit changes the query building mechanism within SPARQLProtocolWorker.java, shifting from StringBuilder to InputStream, to support processing of large queries and to reduce the overhead of using a String for the queryID. Queries are now read directly from the QueryHandler's data stream, and a number of HTTP request methods were modified to accommodate this change. The refactor also adds a new method to QueryHandler that returns a 'QueryHandle' record, a container for the index and InputStream of a query.
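As a rough illustration of the 'QueryHandle' idea, here is a sketch assuming the record carries an integer index and an InputStream. Only the name `QueryHandle` comes from the commit message; `SketchQueryHandler`, `nextQueryStream`, and the in-memory query list are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch; apart from the name QueryHandle, everything here is an assumption.
record QueryHandle(int index, InputStream query) {}

final class SketchQueryHandler {
    // Stand-in for a real query source, which could stream from a file instead.
    private final List<String> queries;
    private int next = 0;

    SketchQueryHandler(List<String> queries) {
        if (queries.isEmpty()) {
            throw new IllegalArgumentException("at least one query is required");
        }
        this.queries = List.copyOf(queries);
    }

    /** Returns the next query as a stream together with its index, cycling through the list. */
    synchronized QueryHandle nextQueryStream() {
        int index = next;
        next = (next + 1) % queries.size();
        byte[] bytes = queries.get(index).getBytes(StandardCharsets.UTF_8);
        return new QueryHandle(index, new ByteArrayInputStream(bytes));
    }
}
```

The caller identifies the query by its integer index and hands the stream to the HTTP request body, so the query text never has to be materialized as a String on the worker side.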
Introduced InputStream support in the QueryList and QuerySource to handle large queries more efficiently. Changes have been made to IndexedQueryReader, QuerySource, QueryHandler, and several other classes to accommodate the new streaming feature. Previously, all queries were loaded into memory, which could cause an OutOfMemoryError for large queries. Whether queries are actually streamed to the client still depends on the SPARQL worker used.
…dled responses to avoid repeated processing. It uses a concurrent hash map to store the responses identified by unique keys. This approach aims to improve the efficiency of handling response bodies in multi-threaded scenarios.
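The idea of storing handled responses in a concurrent hash map can be sketched as follows. This is an assumption-laden illustration: the class `SketchResponseCache`, the generic key type, and the method names are hypothetical and not taken from this PR.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch; class, key type and method names are assumptions.
final class SketchResponseCache<K, R> {
    private final ConcurrentHashMap<K, R> processed = new ConcurrentHashMap<>();
    private final AtomicInteger processingRuns = new AtomicInteger();

    /**
     * Processes the response identified by key at most once.
     * Later calls with the same key return the stored result without re-processing.
     */
    R processOnce(K key, Function<K, R> processor) {
        // computeIfAbsent is atomic on ConcurrentHashMap, so concurrent callers
        // with the same key do not trigger the processor twice.
        return processed.computeIfAbsent(key, k -> {
            processingRuns.incrementAndGet();
            return processor.apply(k);
        });
    }

    /** Number of times a processor was actually invoked. */
    int processingRuns() {
        return processingRuns.get();
    }
}
```

A key could, for example, combine a query index with a hash of the response body, so that identical responses to the same query are only parsed once.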
…ayOutputStream and complete rewrite of BigByteArrayInputStream. This should increase the performance of both streams significantly.
Implemented the AbstractLanguageProcessor interface to process InputStreams. A new SAX Parser (SaxSparqlJsonResultCountingParser) was introduced for SPARQL JSON results, returning solutions, bound values, and variables.
* delegated executeQuery method
* reuse bbaos if not consumed
* removed assert for non-differing content-length header value and actual content length
* better logging for malformed url
# Conflicts:
# pom.xml
# src/main/java/org/aksw/iguana/cc/config/IguanaConfig.java
# src/main/java/org/aksw/iguana/cc/config/elements/ConnectionConfig.java
# src/main/java/org/aksw/iguana/cc/config/elements/DatasetConfig.java
# src/main/java/org/aksw/iguana/cc/config/elements/MetricConfig.java
# src/main/java/org/aksw/iguana/cc/config/elements/StorageConfig.java
# src/main/java/org/aksw/iguana/cc/config/elements/TaskConfig.java
# src/main/java/org/aksw/iguana/cc/controller/TaskController.java
# src/main/java/org/aksw/iguana/cc/lang/AbstractLanguageProcessor.java
# src/main/java/org/aksw/iguana/cc/lang/QueryWrapper.java
# src/main/java/org/aksw/iguana/cc/lang/impl/RDFLanguageProcessor.java
# src/main/java/org/aksw/iguana/cc/lang/impl/SPARQLLanguageProcessor.java
# src/main/java/org/aksw/iguana/cc/model/QueryExecutionStats.java
# src/main/java/org/aksw/iguana/cc/query/handler/QueryHandler.java
# src/main/java/org/aksw/iguana/cc/tasks/AbstractTask.java
# src/main/java/org/aksw/iguana/cc/tasks/Task.java
# src/main/java/org/aksw/iguana/cc/tasks/TaskManager.java
# src/main/java/org/aksw/iguana/cc/tasks/stresstest/Stresstest.java
# src/main/java/org/aksw/iguana/cc/tasks/stresstest/storage/impl/NTFileStorage.java
# src/main/java/org/aksw/iguana/cc/tasks/stresstest/storage/impl/RDFFileStorage.java
# src/main/java/org/aksw/iguana/cc/tasks/stresstest/storage/impl/TriplestoreStorage.java
# src/main/java/org/aksw/iguana/cc/worker/AbstractWorker.java
# src/main/java/org/aksw/iguana/cc/worker/Worker.java
# src/main/java/org/aksw/iguana/cc/worker/impl/CLIInputFileWorker.java
# src/main/java/org/aksw/iguana/cc/worker/impl/CLIInputPrefixWorker.java
# src/main/java/org/aksw/iguana/cc/worker/impl/CLIInputWorker.java
# src/main/java/org/aksw/iguana/cc/worker/impl/CLIWorker.java
# src/main/java/org/aksw/iguana/cc/worker/impl/HttpGetWorker.java
# src/main/java/org/aksw/iguana/cc/worker/impl/HttpPostWorker.java
# src/main/java/org/aksw/iguana/cc/worker/impl/HttpWorker.java
# src/main/java/org/aksw/iguana/cc/worker/impl/MultipleCLIInputWorker.java
# src/main/java/org/aksw/iguana/cc/worker/impl/UPDATEWorker.java
# src/main/java/org/aksw/iguana/rp/storage/TripleBasedStorage.java
# src/test/java/org/aksw/iguana/cc/config/WorkflowTest.java
# src/test/java/org/aksw/iguana/cc/lang/SPARQLLanguageProcessorTest.java
# src/test/java/org/aksw/iguana/cc/tasks/storage/impl/NTFileStorageTest.java
# src/test/java/org/aksw/iguana/cc/tasks/storage/impl/TriplestoreStorageTest.java
# src/test/java/org/aksw/iguana/cc/tasks/stresstest/StresstestTest.java
# src/test/java/org/aksw/iguana/cc/worker/HTTPWorkerTest.java
# src/test/java/org/aksw/iguana/cc/worker/MockupWorker.java
# src/test/java/org/aksw/iguana/cc/worker/UPDATEWorkerTest.java
# src/test/java/org/aksw/iguana/cc/worker/impl/CLIWorkersTests.java
# src/test/java/org/aksw/iguana/cc/worker/impl/HttpPostWorkerTest.java
# src/test/resources/controller_test.properties
* this commit also moved some packages
* also updates CSVStorage
* adds Storable interface
* also delegate the deserializer class for the QueryHandler to the QueryHandler itself
This was linked to issues on Nov 4, 2023
Closed
Adjusted the test as well and integrated it into the StresstestResultProcessor and Storages.
The implemented method searches for every class that has been annotated with ContentType and maps the annotation's value to the class. This is done using the Spring Framework.
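The mapping step can be sketched in plain Java as below. Note the swap: the commit relies on the Spring Framework to find annotated classes, whereas this sketch takes an explicit list of candidate classes so that it runs without dependencies. The shape of the `ContentType` annotation (a single `value` string), the two processor classes, and `SketchProcessorRegistry` are assumptions.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the annotation's shape and the processor classes are assumptions.
@Retention(RetentionPolicy.RUNTIME) // must be visible to reflection at runtime
@Target(ElementType.TYPE)
@interface ContentType {
    String value();
}

@ContentType("application/sparql-results+json")
final class SketchJsonProcessor {}

@ContentType("text/csv")
final class SketchCsvProcessor {}

final class SketchProcessorRegistry {
    /** Maps each annotated class's content type to the class; unannotated classes are skipped. */
    static Map<String, Class<?>> byContentType(List<Class<?>> candidates) {
        Map<String, Class<?>> result = new HashMap<>();
        for (Class<?> candidate : candidates) {
            ContentType annotation = candidate.getAnnotation(ContentType.class);
            if (annotation != null) {
                result.put(annotation.value(), candidate);
            }
        }
        return result;
    }
}
```

With classpath scanning in place of the explicit list, a new processor only needs the annotation to be picked up for its content type.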
I accidentally enabled the BigByteArrayStream tests, and it looks like they cause some issues on the GitHub system. 🤔
nck-mlcnv added the breaking change label on Nov 6, 2023
breaking change: Changes that cause changes in the config file, output format, ontology or commandline interface.
bigerl requested changes on Nov 9, 2023
Some minor things.
src/main/java/org/aksw/iguana/cc/metrics/ModelWritingMetric.java
src/main/java/org/aksw/iguana/cc/tasks/impl/StresstestResultProcessor.java
src/main/java/org/aksw/iguana/cc/tasks/impl/StresstestResultProcessor.java
bigerl approved these changes on Nov 13, 2023
SPARQLProtocolWorker is a draft for a better, more reliable worker that is tailored towards SPARQL Protocol. Each worker uses a single HttpClient and handles work completion conditions itself.
It also covers sending and receiving HTTP request and response bodies that exceed 2 GB.
This PR gives an idea of what the internals of such a worker could look like. It doesn't provide a full implementation, and the code is not yet used within IGUANA.
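Bodies above 2 GB need special handling in Java because arrays are indexed by int, so a single byte[] cannot hold more than Integer.MAX_VALUE bytes. One common way around this is to spread the data over several fixed-size chunks and track the total size as a long. The sketch below illustrates that technique only; `SketchChunkedOutputStream` and its methods are hypothetical and are not the BigByteArrayOutputStream from this PR.

```java
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of chunked storage; not the BigByteArrayOutputStream from this PR.
final class SketchChunkedOutputStream extends OutputStream {
    private final int chunkSize;
    private final List<byte[]> chunks = new ArrayList<>();
    private long size = 0; // long, so the total may exceed Integer.MAX_VALUE

    SketchChunkedOutputStream(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    @Override
    public void write(int b) {
        int offset = (int) (size % chunkSize);
        if (offset == 0) {
            chunks.add(new byte[chunkSize]); // no chunk yet, or the last one is full
        }
        chunks.get(chunks.size() - 1)[offset] = (byte) b;
        size++;
    }

    @Override
    public void write(byte[] src, int off, int len) {
        Objects.checkFromIndexSize(off, len, src.length);
        while (len > 0) {
            int offset = (int) (size % chunkSize);
            if (offset == 0) {
                chunks.add(new byte[chunkSize]);
            }
            // Copy as much as fits into the current chunk, then continue with the rest.
            int n = Math.min(len, chunkSize - offset);
            System.arraycopy(src, off, chunks.get(chunks.size() - 1), offset, n);
            size += n;
            off += n;
            len -= n;
        }
    }

    long size() {
        return size;
    }

    /** Random access by a long position, which a plain byte[] cannot offer. */
    byte byteAt(long position) {
        if (position < 0 || position >= size) {
            throw new IndexOutOfBoundsException("position " + position + ", size " + size);
        }
        return chunks.get((int) (position / chunkSize))[(int) (position % chunkSize)];
    }
}
```

In practice the chunk size would be large (hundreds of megabytes or more); a tiny chunk size such as 4 is only useful for exercising the chunk boundaries in a test.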
TODOs:
* Implement LanguageProcessors for xml, csv and tsv SPARQL results -> move to separate issue
* Port LanguageProcessor for RDF results -> move to separate issue
* Adjust Stresstest to start a ResponseBodyProcessor per unique (QuerySource, LanguageProcessor)
* Document adjusted behavior
* Use LSQ to decide if a query is an update query; if yes, use the update endpoint -> move to separate issue

Future improvements: