chore(glue-alpha): fix typos, inconsistencies, docs #33047

Merged (4 commits) on Jan 22, 2025
89 changes: 38 additions & 51 deletions packages/@aws-cdk/aws-glue-alpha/README.md
@@ -24,19 +24,6 @@ service that makes it easier to discover, prepare, move, and integrate data
from multiple sources for analytics, machine learning (ML), and application
development.

Without an L2 construct, developers define Glue data sources, connections,
jobs, and workflows for their data and ETL solutions via the AWS console,
the AWS CLI, and Infrastructure as Code tools like CloudFormation and the
CDK. However, there are several challenges to defining Glue resources at
scale that an L2 construct can resolve. First, developers must reference
documentation to determine the valid combinations of job type, Glue version,
worker type, language versions, and other parameters that are required for specific
job types. Additionally, developers must already know or look up the
networking constraints for data source connections, and there is ambiguity
around how to securely store secrets for JDBC connections. Finally,
developers want prescriptive guidance via best practice defaults for
throughput parameters like number of workers and batching.
Contributor Author


This simply is not consistent with the voice of READMEs in the rest of the modules. We do not reference anything beyond how the current L2 constructs can be used. The rest of the README needs work, but this particular paragraph isn't necessary.


The Glue L2 construct has convenience methods working backwards from common
use cases and sets required parameters to defaults that align with recommended
best practices for each job type. It also provides customers with a balance
@@ -122,25 +109,25 @@ declare const stack: cdk.Stack;
declare const role: iam.IRole;
declare const script: glue.Code;
new glue.PySparkEtlJob(stack, 'PySparkETLJob', {
  jobName: 'PySparkETLJobCustomName',
  description: 'This is a description',
  role,
  script,
  glueVersion: glue.GlueVersion.V3_0,
  continuousLogging: { enabled: false },
  workerType: glue.WorkerType.G_2X,
  maxConcurrentRuns: 100,
  timeout: cdk.Duration.hours(2),
  connections: [glue.Connection.fromConnectionName(stack, 'Connection', 'connectionName')],
  securityConfiguration: glue.SecurityConfiguration.fromSecurityConfigurationName(stack, 'SecurityConfig', 'securityConfigName'),
  tags: {
    FirstTagName: 'FirstTagValue',
    SecondTagName: 'SecondTagValue',
    XTagName: 'XTagValue',
  },
  numberOfWorkers: 2,
  maxRetries: 2,
});
```

**Streaming Jobs**
@@ -369,11 +356,11 @@ declare const stack: cdk.Stack;
declare const role: iam.IRole;
declare const script: glue.Code;
new glue.PySparkEtlJob(stack, 'PySparkETLJob', {
  role,
  script,
  jobName: 'PySparkETLJob',
  jobRunQueuingEnabled: true
});
```

### Uploading scripts from the CDK app repository to S3
@@ -679,20 +666,20 @@ If you have a table with a large number of partitions that grows over time, cons
```ts
declare const myDatabase: glue.Database;
new glue.S3Table(this, 'MyTable', {
  database: myDatabase,
  columns: [{
    name: 'col1',
    type: glue.Schema.STRING,
  }],
  partitionKeys: [{
    name: 'year',
    type: glue.Schema.SMALL_INT,
  }, {
    name: 'month',
    type: glue.Schema.SMALL_INT,
  }],
  dataFormat: glue.DataFormat.JSON,
  enablePartitionFiltering: true,
});
```
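The `enablePartitionFiltering` example above partitions the table by `year` and `month`. As a conceptual sketch of why that helps, assuming the common Hive-style `key=value/` S3 layout: a query that filters on partition keys only needs the matching prefixes rather than every partition. The `Partition` type and the `partitionPrefix`/`prunePartitions` helpers are hypothetical and are not part of the Glue or CDK APIs, nor a description of how Glue or Athena implement filtering.

```typescript
// Hypothetical types and helpers for illustration only; not a Glue or CDK API.
interface Partition {
  year: number;
  month: number;
}

// Builds a Hive-style S3 prefix, e.g. "year=2024/month=3/".
function partitionPrefix(p: Partition): string {
  return `year=${p.year}/month=${p.month}/`;
}

// Keeps only the partitions that satisfy a predicate on the partition keys,
// i.e. the prefixes a query filtered on those keys would actually need to read.
function prunePartitions(
  partitions: Partition[],
  predicate: (p: Partition) => boolean,
): string[] {
  return partitions.filter(predicate).map(partitionPrefix);
}

const all: Partition[] = [
  { year: 2023, month: 12 },
  { year: 2024, month: 1 },
  { year: 2024, month: 2 },
];

// Only the 2024 partitions survive the filter.
const prefixes = prunePartitions(all, (p) => p.year === 2024);
console.log(prefixes);
```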

1 change: 0 additions & 1 deletion packages/@aws-cdk/aws-glue-alpha/lib/code.ts
@@ -10,7 +10,6 @@ import * as constructs from 'constructs';
* Represents a Glue Job's Code assets (an asset can be a scripts, a jar, a python file or any other file).
*/
export abstract class Code {

/**
* Job code as an S3 object.
* @param bucket The S3 bucket
1 change: 0 additions & 1 deletion packages/@aws-cdk/aws-glue-alpha/lib/connection.ts
@@ -12,7 +12,6 @@ import { CfnConnection } from 'aws-cdk-lib/aws-glue';
* @see https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-glue-connection-connectioninput.html#cfn-glue-connection-connectioninput-connectiontype
*/
export class ConnectionType {

/**
* Designates a connection to a database through Java Database Connectivity (JDBC).
*/
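`ConnectionType` is declared as an enum-like class rather than a TypeScript `enum`. A minimal sketch of that pattern, assuming each predefined value is a static instance wrapping a string and that the constructor is public; `ConnectionTypeSketch` and the `'CUSTOM_TYPE'` value are hypothetical, not the actual glue-alpha source. A public constructor is what lets callers pass a value the library has not enumerated yet.

```typescript
// Hypothetical sketch of the enum-like class pattern; not the glue-alpha source.
class ConnectionTypeSketch {
  // Predefined values are static instances of the class itself.
  public static readonly JDBC = new ConnectionTypeSketch('JDBC');
  public static readonly KAFKA = new ConnectionTypeSketch('KAFKA');
  public static readonly NETWORK = new ConnectionTypeSketch('NETWORK');

  // The string that would be rendered into the CloudFormation property.
  public readonly name: string;

  // A public constructor lets callers supply values not listed above.
  constructor(name: string) {
    this.name = name;
  }

  public toString(): string {
    return this.name;
  }
}

const builtIn = ConnectionTypeSketch.JDBC;
const custom = new ConnectionTypeSketch('CUSTOM_TYPE'); // hypothetical value
console.log(`${builtIn}`, `${custom}`); // prints "JDBC CUSTOM_TYPE"
```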
1 change: 0 additions & 1 deletion packages/@aws-cdk/aws-glue-alpha/lib/constants.ts
@@ -254,7 +254,6 @@ export enum JobType {
* The number of AWS Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory.
*/
export enum MaxCapacity {

/**
* DPU value of 1/16th
*/
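Per the `MaxCapacity` doc comment above, one DPU is 4 vCPUs plus 16 GB of memory, so a fractional capacity scales both linearly. A small worked sketch of that arithmetic; `dpuResources` is a hypothetical helper, not part of the module.

```typescript
// One DPU = 4 vCPUs + 16 GB of memory (from the MaxCapacity doc comment).
const VCPUS_PER_DPU = 4;
const MEMORY_GB_PER_DPU = 16;

interface DpuResources {
  vcpus: number;
  memoryGb: number;
}

// Hypothetical helper: converts a DPU count into compute and memory.
function dpuResources(dpu: number): DpuResources {
  return { vcpus: dpu * VCPUS_PER_DPU, memoryGb: dpu * MEMORY_GB_PER_DPU };
}

// 1/16 DPU works out to a quarter of a vCPU and 1 GB of memory.
const sixteenth = dpuResources(1 / 16);
console.log(sixteenth); // { vcpus: 0.25, memoryGb: 1 }
```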
1 change: 0 additions & 1 deletion packages/@aws-cdk/aws-glue-alpha/lib/database.ts
@@ -56,7 +56,6 @@ export interface DatabaseProps {
* A Glue database.
*/
export class Database extends Resource implements IDatabase {

public static fromDatabaseArn(scope: Construct, id: string, databaseArn: string): IDatabase {
const stack = Stack.of(scope);
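`fromDatabaseArn` imports a database from its ARN, which for Glue has the form `arn:partition:glue:region:account-id:database/database-name`. In construct code you would normally use `Arn.split` from `aws-cdk-lib`; the dependency-free sketch below only illustrates the string layout, assumes a well-formed ARN with concrete values, and uses a hypothetical `databaseNameFromArn` helper.

```typescript
// Hypothetical helper: extracts the database name from a Glue database ARN.
// Assumes the form arn:<partition>:glue:<region>:<account>:database/<name>.
function databaseNameFromArn(arn: string): string {
  const parts = arn.split(':');
  if (parts.length < 6 || parts[0] !== 'arn' || parts[2] !== 'glue') {
    throw new Error(`Not a Glue ARN: ${arn}`);
  }
  // Everything after the fifth colon is the resource, e.g. "database/sales_db".
  const resource = parts.slice(5).join(':');
  const prefix = 'database/';
  if (!resource.startsWith(prefix)) {
    throw new Error(`Not a Glue database ARN: ${arn}`);
  }
  return resource.slice(prefix.length);
}

console.log(databaseNameFromArn('arn:aws:glue:us-east-1:123456789012:database/sales_db'));
// prints "sales_db"
```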

60 changes: 30 additions & 30 deletions packages/@aws-cdk/aws-glue-alpha/lib/jobs/job.ts
@@ -131,7 +131,6 @@ export interface ContinuousLoggingProps {
* event-driven flow using the job.
*/
export abstract class JobBase extends cdk.Resource implements IJob {

public abstract readonly jobArn: string;
public abstract readonly jobName: string;
public abstract readonly grantPrincipal: iam.IPrincipal;
@@ -264,16 +263,13 @@ export abstract class JobBase extends cdk.Resource implements IJob {
*
* @param id construct id.
* @param jobState the job state.
* @private
*/
private metricJobStateRule(id: string, jobState: JobState): events.Rule {
return this.node.tryFindChild(id) as events.Rule ?? this.onStateChange(id, jobState);
}

/**
* Returns the job arn
* @param scope
* @param jobName
*/
protected buildJobArn(scope: constructs.Construct, jobName: string) : string {
return cdk.Stack.of(scope).formatArn({
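The `buildJobArn` hunk above ends at its `cdk.Stack.of(scope).formatArn({` call. A Glue job ARN has the form `arn:partition:glue:region:account-id:job/job-name`. The dependency-free sketch below shows that string for concrete values; `ArnEnv` and `formatGlueJobArn` are hypothetical names. In a real stack the partition, region, and account are often unresolved tokens until deployment, which is a good reason for the construct to go through `formatArn` instead of plain string concatenation.

```typescript
// Hypothetical environment shape with concrete (non-token) values.
interface ArnEnv {
  partition: string;
  region: string;
  account: string;
}

// Hypothetical helper mirroring the Glue job ARN layout:
// arn:<partition>:glue:<region>:<account>:job/<jobName>
function formatGlueJobArn(env: ArnEnv, jobName: string): string {
  return `arn:${env.partition}:glue:${env.region}:${env.account}:job/${jobName}`;
}

const arn = formatGlueJobArn(
  { partition: 'aws', region: 'us-east-1', account: '123456789012' },
  'PySparkETLJobCustomName',
);
console.log(arn); // arn:aws:glue:us-east-1:123456789012:job/PySparkETLJobCustomName
```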
@@ -308,13 +304,12 @@ export interface JobImportAttributes {
* JobProperties will be used to create new Glue Jobs using this L2 Construct.
*/
export interface JobProperties {

/**
* Script Code Location (required)
* Script to run when the Glue job executes. Can be uploaded
* from the local directory structure using fromAsset
* or referenced via S3 location using fromBucket
**/
* Script Code Location (required)
* Script to run when the Glue job executes. Can be uploaded
* from the local directory structure using fromAsset
* or referenced via S3 location using fromBucket
*/
readonly script: Code;

/**
@@ -326,26 +321,29 @@ export interface JobProperties {
* and be granted sufficient permissions.
*
* @see https://docs.aws.amazon.com/glue/latest/dg/getting-started-access.html
**/
*/
readonly role: iam.IRole;

/**
* Name of the Glue job (optional)
* Developer-specified name of the Glue job
*
* @default - a name is automatically generated
**/
*/
readonly jobName?: string;

/**
* Description (optional)
* Developer-specified description of the Glue job
*
* @default - no value
**/
*/
readonly description?: string;

/**
* Number of Workers (optional)
* Number of workers for Glue to use during job execution
*
* @default 10
*/
readonly numberOfWorkers?: number;
@@ -354,8 +352,9 @@ export interface JobProperties {
* Worker Type (optional)
* Type of Worker for Glue to use during job execution
* Enum options: Standard, G_1X, G_2X, G_025X. G_4X, G_8X, Z_2X
* @default G_1X
**/
*
* @default WorkerType.G_1X
*/
readonly workerType?: WorkerType;

/**
@@ -366,7 +365,7 @@ export interface JobProperties {
* you can specify is controlled by a service limit.
*
* @default 1
**/
*/
readonly maxConcurrentRuns?: number;

/**
@@ -377,7 +376,7 @@ export interface JobProperties {
* @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
* for a list of reserved parameters
* @default - no arguments
**/
*/
readonly defaultArguments?: { [key: string]: string };

/**
@@ -386,53 +385,58 @@ export interface JobProperties {
* Connections are used to connect to other AWS Service or resources within a VPC.
*
* @default [] - no connections are added to the job
**/
*/
readonly connections?: IConnection[];

/**
* Max Retries (optional)
* Maximum number of retry attempts Glue performs if the job fails
*
* @default 0
**/
*/
readonly maxRetries?: number;

/**
* Timeout (optional)
* The maximum time that a job run can consume resources before it is
* terminated and enters TIMEOUT status. Specified in minutes.
*
* @default 2880 (2 days for non-streaming)
*
**/
*/
readonly timeout?: cdk.Duration;

/**
* Security Configuration (optional)
* Defines the encryption options for the Glue job
*
* @default - no security configuration.
**/
*/
readonly securityConfiguration?: ISecurityConfiguration;

/**
* Tags (optional)
* A list of key:value pairs of tags to apply to this Glue job resourcex
* A list of key:value pairs of tags to apply to this Glue job resources
*
* @default {} - no tags
**/
*/
readonly tags?: { [key: string]: string };

/**
* Glue Version
* The version of Glue to use to execute this job
*
* @default 3.0 for ETL
**/
*/
readonly glueVersion?: GlueVersion;

/**
* Enables the collection of metrics for job profiling.
*
* @default - no profiling metrics emitted.
*
* @see `--enable-metrics` at https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
**/
* @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
*/
readonly enableProfilingMetrics? :boolean;

/**
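The `@default` tags in the `JobProperties` hunks above document what an omitted property means: 10 workers, 1 concurrent run, 0 retries, and a 2880-minute (2-day) timeout. A sketch of that resolution, assuming these documented values are the ones applied (this diff does not show where the defaults are filled in); `JobSizing` and `resolveSizing` are hypothetical, and the timeout is modelled in plain minutes instead of a `cdk.Duration`.

```typescript
// Hypothetical subset of JobProperties; timeout is in minutes here,
// whereas the real property is a cdk.Duration.
interface JobSizing {
  numberOfWorkers?: number;
  maxConcurrentRuns?: number;
  maxRetries?: number;
  timeoutMinutes?: number;
}

// Fills in the documented @default values for anything left undefined.
function resolveSizing(props: JobSizing): Required<JobSizing> {
  return {
    numberOfWorkers: props.numberOfWorkers ?? 10,
    maxConcurrentRuns: props.maxConcurrentRuns ?? 1,
    maxRetries: props.maxRetries ?? 0,
    timeoutMinutes: props.timeoutMinutes ?? 2880, // 2 days
  };
}

// Values taken from the README example: 2 workers, 100 concurrent runs, 2 retries, 2 hours.
const resolved = resolveSizing({
  numberOfWorkers: 2,
  maxConcurrentRuns: 100,
  maxRetries: 2,
  timeoutMinutes: 2 * 60,
});
console.log(resolved.timeoutMinutes); // 120

// With no props at all, every field falls back to its documented default.
console.log(resolveSizing({}));
```

Using `??` rather than `||` matters here: an explicit `numberOfWorkers: 0` would be kept as `0` instead of being silently replaced by the default.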
@@ -444,15 +448,13 @@ export interface JobProperties {
* @see https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
**/
readonly continuousLogging?: ContinuousLoggingProps;

}

/**
* A Glue Job.
* @resource AWS::Glue::Job
*/
export abstract class Job extends JobBase {

/**
* Identifies an existing Glue Job from a subset of attributes that can
* be referenced from within another Stack or Construct.
@@ -500,7 +502,6 @@ export abstract class Job extends JobBase {
* @returns String containing the args for the continuous logging command
*/
public setupContinuousLogging(role: iam.IRole, props: ContinuousLoggingProps | undefined) : any {

// If the developer has explicitly disabled continuous logging return no args
if (props && !props.enabled) {
return {};
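The `setupContinuousLogging` hunk above only shows the first branch: when continuous logging is explicitly disabled, no job arguments are returned. A dependency-free sketch of the full branching, assuming that otherwise logging is switched on through the standard Glue special parameters `--enable-continuous-cloudwatch-log`, `--enable-continuous-log-filter`, `--continuous-log-logGroup`, and `--continuous-log-logStreamPrefix`. `LoggingSketchProps` and `continuousLoggingArgs` are hypothetical; the sketch takes a plain log group name instead of an `ILogGroup` and performs no IAM grant on the role.

```typescript
// Simplified, hypothetical props; the real ContinuousLoggingProps takes an ILogGroup.
interface LoggingSketchProps {
  enabled: boolean;
  quiet?: boolean;
  logGroupName?: string;
  logStreamPrefix?: string;
}

function continuousLoggingArgs(props?: LoggingSketchProps): Record<string, string> {
  // Explicitly disabled: contribute no job arguments at all.
  if (props && !props.enabled) {
    return {};
  }
  // Otherwise (including when props are omitted) enable continuous logging.
  const args: Record<string, string> = {
    '--enable-continuous-cloudwatch-log': 'true',
  };
  if (props?.quiet) {
    args['--enable-continuous-log-filter'] = 'true';
  }
  if (props?.logGroupName) {
    args['--continuous-log-logGroup'] = props.logGroupName;
  }
  if (props?.logStreamPrefix) {
    args['--continuous-log-logStreamPrefix'] = props.logStreamPrefix;
  }
  return args;
}

console.log(continuousLoggingArgs({ enabled: false })); // {}
console.log(continuousLoggingArgs()); // { '--enable-continuous-cloudwatch-log': 'true' }
```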
@@ -536,7 +537,6 @@ export abstract class Job extends JobBase {
const s3Location = code.bind(this, this.role).s3Location;
return `s3://${s3Location.bucketName}/${s3Location.objectKey}`;
}

}

/**
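The last `job.ts` hunk turns a bound `Code` asset's `s3Location` into an `s3://bucket/key` URI. A dependency-free sketch of that formatting, assuming the location carries only `bucketName` and `objectKey` (the `aws-cdk-lib` S3 `Location` type also has an optional `objectVersion`, which this sketch ignores); `S3LocationSketch` and `toS3Uri` are hypothetical names.

```typescript
// Minimal, hypothetical stand-in for an S3 location: bucket plus object key.
interface S3LocationSketch {
  bucketName: string;
  objectKey: string;
}

// Builds the same s3://<bucket>/<key> string as the template literal in the hunk.
function toS3Uri(location: S3LocationSketch): string {
  return `s3://${location.bucketName}/${location.objectKey}`;
}

console.log(toS3Uri({ bucketName: 'my-assets', objectKey: 'scripts/job.py' }));
// prints "s3://my-assets/scripts/job.py"
```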