Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Computer Vision (CV) functions #27

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -6,4 +6,4 @@ bin/
.gradle/
.idea/
out/

.vscode/
1 change: 1 addition & 0 deletions build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,7 @@ allprojects {
mavenBom "io.awspring.cloud:spring-cloud-aws-dependencies:$springCloudAwsVersion"
mavenBom "org.springframework.boot:spring-boot-dependencies:$springBootVersion"
mavenBom "org.springframework.cloud:spring-cloud-dependencies:$springCloudVersion"
mavenBom "ai.djl:bom:$djlVersion"
}
}
}
Expand Down
2 changes: 2 additions & 0 deletions dependencies.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@ ext {
springCloudAwsVersion = '3.0.4'

debeziumVersion = '2.5.0.Final'
djlVersion = '0.26.0'
djlSpringVersion = '0.26'

springIntegrationAws = 'org.springframework.integration:spring-integration-aws:3.0.5'
ftpserverCore = 'org.apache.ftpserver:ftpserver-core:1.2.0'
Expand Down
216 changes: 216 additions & 0 deletions function/spring-computer-vision-function/README.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,216 @@
= Computer Vision Functions

This module provides functional interface to perform common Computer Vision tasks such as Image Classification, Object Detection, Instance and Semantic Segmentation, Pose Estimation an more.

It leverages the https://docs.djl.ai/index.html[Deep Java Library] (DJL) to enable Java developers to harness the power of deep learning.
DJL serves as a bridge between the rich ecosystem of Java programming and the cutting-edge capabilities of deep learning.
DJL provides integration with popular deep learning frameworks like `TensorFlow`, `PyTorch`, and `MXNet`, as well as support for a variety of pre-trained models using `ONNX Runtime`.

== Beans for injection

This module exposes auto-configurations for the following beans:

* `Function<Message<byte[]>, Message<byte[]>> objectDetection` - Offering `Object Detection` for finding all instances of objects from a known set of categories in an image and `Instance Segmentation` for finding all instances of objects from a known set of categories in an image and drawing a mask on each instance.
* `Function<Message<byte[]>, Message<byte[]>> imageClassifications` - The `Image Classification` task assigns a label to an image from a set of categories.
* `Function<Message<byte[]>, Message<byte[]>> semanticSegmentation` - `Semantic Segmentation` refers to the task of detecting objects of various classes at pixel level.
It colors the pixels based on the objects detected in that space.
* `Function<Message<byte[]>, Message<byte[]>> poseEstimation` - `Pose Estimation` refers to the task of detecting human figures in images and videos, and estimating the pose of the bodies.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure where my comment is lost, but I'll repeat it.

It is better for end-user just to auto-wire a single function and leave the rest of hard logic up to internal routing.
You still can have those Function instances internally, but better to just expose only a single function for auto-configuration and so.

I would go with Spring Integration router, but probably a simple switch by some configuration property would be enough.

The point is that we pursue here a microservices architecture therefore a single auto-configured service from this module is what everyone would expect in their projects.

Any arguments that we have to leave it as a number of functions ?

Or... We might go with all those beans for functions since I see they are conditional, but let's see if we can make them with the same bean name in the end! So, target project would benefit from the same @Autowire without any paradox of choice for name.


Each of them are conditional by specific configuration properties.

[%autowidth]
|===
|Bean |Activation Properties

|objectDetection
|djl.output-class=ai.djl.modality.cv.output.DetectedObjects

|imageClassifications
|djl.output-class=ai.djl.modality.Classifications

|semanticSegmentation
|djl.output-class=ai.djl.modality.cv.output.CategoryMask

|poseEstimation
|djl.output-class=ai.djl.modality.cv.output.Joints

|===

Once injected, you can use the `apply` method of the `Function` to invoke it and get the result.

All functions take and return a `Message<byte[]>`.
The input message payload contains the image bytes to be processed.
The output message payload contains the original or the augmented image after the processing.
The `computer.vision.function.augment-enabled` property controls whether the augmented image is returned or not.
Defaults to `true`.

== Configuration Options

[%autowidth]
|===
|Property |Description

|djl.application-type
|Defines the CV application task to be performed. Currently supported values are `OBJECT_DETECTION`, `IMAGE_CLASSIFICATION`, `INSTANCE_SEGMENTATION`, `SEMANTIC_SEGMENTATION` and `POSE_ESTIMATION`.

|djl.input-class
|Define input data type, a model may accept multiple input data type. Currently only the `ai.djl.modality.cv.Image` is supported.

|djl.output-class
|Define output data type, a model may generate different outputs. Supported output classes are `ai.djl.modality.cv.output.DetectedObjects`, `ai.djl.modality.cv.output.CategoryMask`, `ai.djl.modality.Classifications`, `ai.djl.modality.cv.output.Joints` .

|djl.urls
|Model repository URLs. Multiple may be supplied to search for models. Specifying a single URL can be used to load a specific model. Can be specified as comma delimited field or as an array in the configuration file.
Current supported archive formats: `zip`, `tar`, `tar.gz`, `tgz`, `tar.z`.

Supported URL schemes: `file://` - load a model from local directory or archive file., `http(s)://` - load a model from an archive file from web server, `jar://` - load a model from an archive file in the class path, `djl://` - load a model from the model zoo, `s3://` - load a model from S3 bucket (requires djl aws extension), `hdfs://` - load a model from HDFS file system (requires djl hadoop extension)

|djl.model-filter
| https://github.com/deepjavalibrary/djl/tree/master/model-zoo#how-to-find-a-pre-trained-model-in-the-model-zoo[Model Filters] used to lookup a model from model zoo .

|djl.group-id
|Defines the `groupId` of the model to be loaded from the zoo.

|djl.model-artifact-id
|Defines the `artifactId` of the model to be loaded from the zoo.

|djl.model-name
|(Optional) Defines the modelName of the model to be loaded.
Leave it empty if you want to load the latest version of the model.
Use "saved_model" for TensorFlow saved models.

|djl.engine
| Name of teh https://docs.djl.ai/docs/engine.html[Engine] to use https://docs.djl.ai/docs/engine.html#supported-engines[Supported engine names].

|djl.translator-factory
| https://javadoc.io/doc/ai.djl/api/latest/ai/djl/translate/Translator.html[Translator] provides model pre-processing and postprocessing functionality. Multiple https://javadoc.io/doc/ai.djl/api/latest/ai/djl/modality/cv/translator/package-summary.html[translators] are provided for different models, but you can implement your own translator if needed (see []). The translator-factory property allow to specify the translator to be used with the model.

|computer.vision.function.output-header-name
|Name of the header that contains the JSON payload computed by the functions.

|computer.vision.function.augment-enabled
|Enable image augmentation (false by default).

|===

=== Example Configurations

All computer vision examples use the following Java code snippet to invoke the function:

[source,Java]
----
@SpringBootApplication
public class TfObjectDetectionBootApp implements CommandLineRunner {

@Autowired
private Function<Message<byte[]>, Message<byte[]>> cvFunction;

@Override
public void run(String... args) throws Exception {
byte[] inputImage = new ClassPathResource("Image URI").getInputStream().readAllBytes();

Message<byte[]> outputMessage = cvFunction.apply(
MessageBuilder.withPayload(inputImage).build());

// Augmented output image.
byte[] outputImage = outputMessage.getPayload();

// JSON payload with the detected objects and their bounding boxes.
String jsonBoundingBoxes = outputMessage.getHeader("cvjson", String.class);
}

public static void main(String[] args) {
SpringApplication.run(TfObjectDetectionBootApp.class);
}
}
----

==== Object Detection (TensorFlow)

You can leverage any of the existing [TensorFlow models]. Just compy the url of the model archive as djl.urls property and set the djl.translator-factory to `org.springframework.cloud.fn.computer.vision.translator.TensorflowSavedModelObjectDetectionTranslatorFactory`.

----
computer.vision.function.augment-enabled=true
djl.application-type=OBJECT_DETECTION
djl.input-class=ai.djl.modality.cv.Image
djl.output-class=ai.djl.modality.cv.output.DetectedObjects
djl.engine=TensorFlow
djl.urls=http://download.tensorflow.org/models/object_detection/tf2/20200711/faster_rcnn_inception_resnet_v2_1024x1024_coco17_tpu-8.tar.gz
djl.model-name=saved_model
djl.translator-factory=org.springframework.cloud.fn.computer.vision.translator.TensorflowSavedModelObjectDetectionTranslatorFactory
djl.arguments.threshold=0.3
----

==== Object Detection (Yolo v8)

You can use the same Java snipped above, just change the configuration to use the Yolo v8 model:

----
computer.vision.function.augment-enabled=true
djl.application-type=OBJECT_DETECTION
djl.input-class=ai.djl.modality.cv.Image
djl.output-class=ai.djl.modality.cv.output.DetectedObjects
djl.engine=OnnxRuntime
djl.urls=djl://ai.djl.onnxruntime/yolov8n
djl.translator-factory=ai.djl.modality.cv.translator.YoloV8TranslatorFactory
djl.arguments.threshold=0.3
djl.arguments.width=640
djl.arguments.height=640
djl.arguments.resize=true
djl.arguments.toTensor=true
djl.arguments.applyRatio=true
djl.arguments.maxBox=1000
----

==== Instance Segmentation

Same Java code snipped but with the following configuration:

----
computer.vision.function.augment-enabled=true
djl.application-type=INSTANCE_SEGMENTATION
djl.input-class=ai.djl.modality.cv.Image
djl.output-class=ai.djl.modality.cv.output.DetectedObjects
djl.arguments.threshold=0.3

djl.model-filter.backbone=resnet18
djl.model-filter.flavor=v1b
djl.model-filter.dataset=coco
----

Note that here we didn't specify the model to be used, but used the model-filter to find a compatible model from the model zoo.

==== Semantic Segmentation

Same Java code snipped but with the following configuration:

----
computer.vision.function.augment-enabled=true
djl.application-type=SEMANTIC_SEGMENTATION
djl.input-class=ai.djl.modality.cv.Image
djl.output-class=ai.djl.modality.cv.output.CategoryMask
djl.arguments.threshold=0.3

djl.urls=https://mlrepo.djl.ai/model/cv/semantic_segmentation/ai/djl/pytorch/deeplabv3/0.0.1/deeplabv3.zip
djl.translator-factory=ai.djl.modality.cv.translator.SemanticSegmentationTranslatorFactory
djl.engine=PyTorch
----

==== Image Classification

----
djl.application-type=IMAGE_CLASSIFICATION
djl.input-class=ai.djl.modality.cv.Image
djl.output-class=ai.djl.modality.Classifications
djl.arguments.threshold=0.3
djl.engine=MXNet
----

== Tests

See this link:src/test/java/org/springframework/cloud/fn/computer/vision/ComputerVisionFunctionConfigurationTests.java[test suite] for examples of how this function is used.

The link:src/test/java/org/springframework/cloud/fn/computer/vision/JsonHelperTests.java[JsonHelperTests] validates the JSON serialization and deserialization of the `ComputerVisionFunctionConfiguration` class values object classes.

== Other usage

7 changes: 7 additions & 0 deletions function/spring-computer-vision-function/build.gradle
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
dependencies {
api "ai.djl.spring:djl-spring-boot-starter-autoconfigure:$djlSpringVersion"
api "ai.djl.spring:djl-spring-boot-starter-tensorflow-auto:$djlSpringVersion"
api "ai.djl.spring:djl-spring-boot-starter-pytorch-auto:$djlSpringVersion"
api "ai.djl.spring:djl-spring-boot-starter-mxnet-auto:$djlSpringVersion"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't they have a BOM for all of these dependencies?

The Dependa Bot is not going to understand this module config: we have to go the way as it is done in the dependencies.gradle, for example, for the springIntegrationAws property.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently only mavenBom "ai.djl:bom:$djlVersion" exist not for the spring boot auto-configuraitons.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chances that we can raise a GH issue against that project, so they'll address it this or other way?

runtimeOnly "ai.djl.onnxruntime:onnxruntime-engine"
}
Loading