Merge pull request #90 from aws-samples/feature/customer_label
Add: Another call category item, CUST
rstrahan authored Mar 5, 2022
2 parents 4fd093d + c2eb9bf commit 144174e
Showing 6 changed files with 71 additions and 12 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -5,6 +5,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]
- Added call filename regex for CUST metadata, meant to be used as a customer identifier.

## [0.1.3] - 2022-01-14
### Fixed
18 changes: 15 additions & 3 deletions pca-main.template
@@ -245,7 +245,7 @@ Parameters:
Regular Expression (regex) used to parse call GUID from audio filenames.
The GUID value must be matched using one or more parenthesized groups in the regex.
Example: the regex '_GUID_(.*?)_' parses
the filename: AutoRepairs1_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav
the filename: AutoRepairs1_CUST_12345_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav
to extract the GUID value '2a602c1a-4ca3-4d37-a933-444d575c0222'.

FilenameAgentRegex:
@@ -255,9 +255,19 @@ Parameters:
Regular Expression (regex) used to parse call AGENT from audio filenames.
The AGENT value must be matched using one or more parenthesized groups in the regex.
Example: the regex '_AGENT_(.*?)_' parses
the filename: AutoRepairs1_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav
the filename: AutoRepairs1_CUST_12345_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav
to extract the AGENT value 'BobS'.


FilenameCustRegex:
Type: String
Default: '_CUST_(.*?)_'
Description: >
Regular Expression (regex) used to parse Customer from audio filenames.
The customer id value must be matched using one or more parenthesized groups in the regex.
Example: the regex '_CUST_(.*?)_' parses
the filename: AutoRepairs1_CUST_12345_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav
to extract the Customer value '12345'.

EnableTranscriptKendraSearch:
Type: String
Default: 'No'
@@ -336,6 +346,7 @@ Metadata:
- FilenameDatetimeFieldMap
- FilenameGUIDRegex
- FilenameAgentRegex
- FilenameCustRegex
- Label:
default: Transcription
Parameters:
@@ -630,6 +641,7 @@ Resources:
FilenameDatetimeFieldMap: !Ref FilenameDatetimeFieldMap
FilenameGUIDRegex: !Ref FilenameGUIDRegex
FilenameAgentRegex: !Ref FilenameAgentRegex
FilenameCustRegex: !Ref FilenameCustRegex
KendraIndexId:
!If
- ShouldEnableKendraSearch
2 changes: 1 addition & 1 deletion pca-server/cfn/lib/glue-database.template
@@ -42,7 +42,7 @@ Resources:
StorageDescriptor:
Columns:
- Name: conversationanalytics
Type: struct<guid:string,agent:string,conversationtime:string,conversationlocation:string,processtime:string,languagecode:string,duration:string,speakerlabels:array<struct<speaker:string,displaytext:string>>,sentimenttrends:struct<spk_0:struct<sentimentscore:double,sentimentperquarter:array<struct<quarter:int,score:double,beginoffsetsecs:double,endoffsetsecs:double>>,sentimentchange:double>,spk_1:struct<sentimentscore:double,sentimentperquarter:array<struct<quarter:int,score:double,beginoffsetsecs:double,endoffsetsecs:double>>,sentimentchange:double>>,speakertime:struct<spk_0:struct<totaltimesecs:double>,spk_1:struct<totaltimesecs:double>,nontalktime:struct<instances:array<struct<beginoffsetsecs:double,endoffsetsecs:double,durationsecs:double>>,totaltimesecs:double>>,categoriesdetected:array<string>,issuesdetected:array<struct<text:string,beginoffset:int,endoffset:int>>,customentities:array<struct<name:string,instances:int,values:array<string>>>,sourceinformation:array<struct<transcribejobinfo:struct<transcribeapitype:string,completiontime:string,mediaformat:string,mediasampleratehertz:int,mediaoriginaluri:string,averagewordconfidence:double,mediafileuri:string,transcriptionjobname:string,channelidentification:int>>>,entityrecognizername:string>
Type: struct<guid:string,agent:string,cust:string,conversationtime:string,conversationlocation:string,processtime:string,languagecode:string,duration:string,speakerlabels:array<struct<speaker:string,displaytext:string>>,sentimenttrends:struct<spk_0:struct<sentimentscore:double,sentimentperquarter:array<struct<quarter:int,score:double,beginoffsetsecs:double,endoffsetsecs:double>>,sentimentchange:double>,spk_1:struct<sentimentscore:double,sentimentperquarter:array<struct<quarter:int,score:double,beginoffsetsecs:double,endoffsetsecs:double>>,sentimentchange:double>>,speakertime:struct<spk_0:struct<totaltimesecs:double>,spk_1:struct<totaltimesecs:double>,nontalktime:struct<instances:array<struct<beginoffsetsecs:double,endoffsetsecs:double,durationsecs:double>>,totaltimesecs:double>>,categoriesdetected:array<string>,issuesdetected:array<struct<text:string,beginoffset:int,endoffset:int>>,customentities:array<struct<name:string,instances:int,values:array<string>>>,sourceinformation:array<struct<transcribejobinfo:struct<transcribeapitype:string,completiontime:string,mediaformat:string,mediasampleratehertz:int,mediaoriginaluri:string,averagewordconfidence:double,mediafileuri:string,transcriptionjobname:string,channelidentification:int>>>,entityrecognizername:string>
- Name: speechsegments
Type: array<struct<segmentstarttime:double,segmentendtime:double,segmentspeaker:string,segmentinterruption:boolean,originaltext:string,displaytext:string,textedited:int,loudnessscores:array<double>,sentimentispositive:int,sentimentisnegative:int,sentimentscore:double,basesentimentscores:struct<positive:double,negative:double,neutral:double>,entitiesdetected:array<struct<score:double,type:string,text:string,beginoffset:int,endoffset:int>>,categoriesdetected:array<string>,followoncategories:array<string>,issuesdetected:array<struct<text:string,beginoffset:int,endoffset:int>>,wordconfidence:array<struct<text:string,confidence:double,starttime:double,endtime:double>>>>
Compressed: false
26 changes: 24 additions & 2 deletions pca-server/src/pca/pca-aws-sf-process-turn-by-turn.py
@@ -68,6 +68,7 @@ def __init__(self, min_sentiment_pos, min_sentiment_neg, custom_entity_endpoint)
self.comprehendLanguageCode = ""
self.guid = ""
self.agent = ""
self.cust = ""
self.conversationTime = ""
self.conversationLocation = ""
self.speechSegmentList = []
@@ -212,6 +213,7 @@ def create_output_conversation_analytics(self):
# but to default this to the current processing time.
resultsHeaderInfo["GUID"] = self.guid
resultsHeaderInfo["Agent"] = self.agent
resultsHeaderInfo["Cust"] = self.cust
resultsHeaderInfo["ConversationTime"] = self.conversationTime
resultsHeaderInfo["ConversationLocation"] = self.conversationLocation
resultsHeaderInfo["ProcessTime"] = str(datetime.now())
@@ -1271,7 +1273,7 @@ def set_guid(self, filename):
Tries to parse a GUID for the call from the filename using a configurable Regular Expression.
The GUID value must be matched using one or more parenthesized groups in the regex.
Example: the regex '_GUID_(.*?)_' parses
the filename: AutoRepairs1_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav
the filename: AutoRepairs1_CUST_12345_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav
to extract the GUID value '2a602c1a-4ca3-4d37-a933-444d575c0222'.
'''
regex = cf.appConfig[cf.CONF_FILENAME_GUID_REGEX]
@@ -1290,7 +1292,7 @@ def set_agent(self, filename):
Tries to parse an Agent name or ID from the filename using a configurable Regular Expression.
The AGENT value must be matched using one or more parenthesized groups in the regex.
Example: the regex '_AGENT_(.*?)_' parses
the filename: AutoRepairs1_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav
the filename: AutoRepairs1_CUST_12345_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav
to extract the Agent value 'BobS'.
'''
regex = cf.appConfig[cf.CONF_FILENAME_AGENT_REGEX]
@@ -1304,6 +1306,25 @@ def set_agent(self, filename):
agent='None'
self.agent = agent

    def set_cust(self, filename):
        '''
        Tries to parse a Customer ID from the filename using a configurable Regular Expression.
        The customer ID value must be matched using one or more parenthesized groups in the regex.
        Example: the regex '_CUST_(.*?)_' parses
        the filename: AutoRepairs1_CUST_12345_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav
        to extract the Customer value '12345'.
        '''
        regex = cf.appConfig[cf.CONF_FILENAME_CUST_REGEX]
        print(f"INFO: Parsing CUST from filename '{filename}' using regex: '{regex}'.")
        try:
            match = re.search(regex, filename)
            cust = " ".join(match.groups()) or 'None'
            print(f"INFO: Parsed CUST: '{cust}'")
        except Exception:
            # Reached when the regex does not match (match is None) or is invalid
            print(f"WARNING: Unable to parse CUST name/ID from filename {filename}, using regex: '{regex}'. Defaulting to 'None'.")
            cust = 'None'
        self.cust = cust

def load_simple_entity_string_map(self):
"""
Loads in any defined simple entity map for later use - this must be a CSV file, but it will be defined
@@ -1463,6 +1484,7 @@ def parse_transcribe_file(self, sf_event):
# Parse Call GUID, Agent Name/ID and Customer ID from filename if possible
self.set_guid(job_name)
self.set_agent(job_name)
self.set_cust(job_name)

# Work out the conversation time and set the language code
self.calculate_transcribe_conversation_time(job_name)
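
The parsing done by `set_cust` can be tried outside the pipeline with a minimal sketch like the one below. The helper name `parse_filename_field` and the constant `DEFAULT_CUST_REGEX` are hypothetical and not part of the commit; only the default regex and the sample filename come from the diff. Unlike the committed method, this sketch skips optional groups that did not participate in the match instead of relying on the exception handler.

```python
import re

# Default taken from the FilenameCustRegex parameter in the diff.
DEFAULT_CUST_REGEX = r"_CUST_(.*?)_"


def parse_filename_field(filename: str, regex: str, default: str = "None") -> str:
    """Return the space-joined capture groups of `regex` found in `filename`.

    Hypothetical helper: falls back to `default` when the regex is invalid,
    does not match, or captures only empty text.
    """
    try:
        match = re.search(regex, filename)
    except re.error:
        # Invalid pattern supplied through configuration.
        return default
    if match is None:
        return default
    # groups() holds None for optional groups that did not participate.
    value = " ".join(group for group in match.groups() if group)
    return value or default


if __name__ == "__main__":
    sample = (
        "AutoRepairs1_CUST_12345_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222"
        "_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav"
    )
    print(parse_filename_field(sample, DEFAULT_CUST_REGEX))
```

Note that the default pattern requires an underscore after the customer value, so a filename that ends with `_CUST_12345.wav` would not match and would fall back to `'None'`.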
5 changes: 3 additions & 2 deletions pca-server/src/pca/pcaconfiguration.py
@@ -36,6 +36,7 @@
CONF_FILENAME_DATETIME_FIELDMAP = "FilenameDatetimeFieldMap"
CONF_FILENAME_GUID_REGEX = "FilenameGUIDRegex"
CONF_FILENAME_AGENT_REGEX = "FilenameAgentRegex"
CONF_FILENAME_CUST_REGEX = "FilenameCustRegex"
CONF_FILTER_MODE = "VocabFilterMode"
CONF_FILTER_NAME = "VocabFilterName"
CONF_KENDRA_INDEX_ID = "KendraIndexId"
@@ -122,8 +123,8 @@ def loadConfiguration():
CONF_FILTER_MODE, CONF_FILTER_NAME,
CONF_FILENAME_DATETIME_REGEX, CONF_FILENAME_DATETIME_FIELDMAP,
CONF_FILENAME_GUID_REGEX, CONF_FILENAME_AGENT_REGEX,
CONF_KENDRA_INDEX_ID])
fullParamList4 = ssm.get_parameters(Names=[CONF_WEB_URI, CONF_TRANSCRIBE_API, CONF_REDACTION_TRANSCRIPT,
CONF_FILENAME_CUST_REGEX])
fullParamList4 = ssm.get_parameters(Names=[CONF_KENDRA_INDEX_ID, CONF_WEB_URI, CONF_TRANSCRIBE_API, CONF_REDACTION_TRANSCRIPT,
CONF_REDACTION_AUDIO])

# Extract our parameters into our config
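
The hunk above moves `CONF_KENDRA_INDEX_ID` from the third `get_parameters` call into the fourth when `CONF_FILENAME_CUST_REGEX` is added. A likely reason (an assumption; the commit does not state it) is that the SSM `GetParameters` API accepts at most 10 names per request, so each hand-written list has to stay within that size. The sketch below shows one way to batch names automatically. `chunk_names`, `load_parameters`, and `SSM_MAX_NAMES_PER_CALL` are hypothetical names, and the fetch function is passed in so the example runs without AWS access.

```python
from typing import Callable, Dict, Iterable, List

# Assumption: SSM GetParameters accepts at most 10 names per request.
SSM_MAX_NAMES_PER_CALL = 10


def chunk_names(names: Iterable[str], size: int = SSM_MAX_NAMES_PER_CALL) -> List[List[str]]:
    """Split parameter names into consecutive batches of at most `size`."""
    items = list(names)
    return [items[i:i + size] for i in range(0, len(items), size)]


def load_parameters(names: Iterable[str], get_parameters: Callable[..., dict]) -> Dict[str, str]:
    """Fetch all `names` in batches and merge the results into one dict.

    `get_parameters` is expected to accept a `Names` keyword argument and
    return a dict with a "Parameters" list of {"Name": ..., "Value": ...}.
    """
    values: Dict[str, str] = {}
    for batch in chunk_names(names):
        response = get_parameters(Names=batch)
        for param in response["Parameters"]:
            values[param["Name"]] = param["Value"]
    return values


if __name__ == "__main__":
    # Stand-in for a real SSM client, used only for demonstration.
    def fake_get_parameters(Names):
        return {"Parameters": [{"Name": n, "Value": f"value-of-{n}"} for n in Names]}

    loaded = load_parameters([f"Param{i}" for i in range(12)], fake_get_parameters)
    print(len(loaded))
```

With boto3, the second argument could be `boto3.client("ssm").get_parameters`. With batching of this kind, adding another configuration key would not require rebalancing the lists by hand.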
31 changes: 27 additions & 4 deletions pca-ssm/cfn/ssm.template
@@ -200,7 +200,7 @@ Parameters:
Regular Expression (regex) used to parse call GUID from audio filenames.
The GUID value must be matched using a parenthesized group in the regex.
Example: the regex '_GUID_(.*?)_' parses
the filename: AutoRepairs1_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav
the filename: AutoRepairs1_CUST_12345_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav
to extract the GUID value '2a602c1a-4ca3-4d37-a933-444d575c0222'.

FilenameAgentRegex:
@@ -210,9 +210,19 @@ Parameters:
Regular Expression (regex) used to parse call AGENT from audio filenames.
The AGENT value must be matched using a parenthesized group in the regex.
Example: the regex '_AGENT_(.*?)_' parses
the filename: AutoRepairs1_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav
the filename: AutoRepairs1_CUST_12345_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav
to extract the AGENT value 'BobS'.

FilenameCustRegex:
Type: String
Default: '_CUST_(.*?)_'
Description: >
Regular Expression (regex) used to parse Customer from audio filenames.
The customer id value must be matched using one or more parenthesized groups in the regex.
Example: the regex '_CUST_(.*?)_' parses
the filename: AutoRepairs1_CUST_12345_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav
to extract the Customer value '12345'.

KendraIndexId:
Type: String
Description: Kendra Index ID, or empty string if call transcription indexing is not enabled
@@ -520,7 +530,7 @@ Resources:
Regular Expression (regex) used to parse call GUID from audio filenames.
The GUID value must be matched using one or more parenthesized groups in the regex.
Example: the regex '_GUID_(.*?)_' parses
the filename: AutoRepairs1_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav
the filename: AutoRepairs1_CUST_12345_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav
to extract the GUID value '2a602c1a-4ca3-4d37-a933-444d575c0222'.

FilenameAgentRegexParameter:
@@ -533,9 +543,22 @@ Resources:
Regular Expression (regex) used to parse call AGENT from audio filenames.
The AGENT value must be matched using one or more parenthesized groups in the regex.
Example: the regex '_AGENT_(.*?)_' parses
the filename: AutoRepairs1_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav
the filename: AutoRepairs1_CUST_12345_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav
to extract the AGENT value 'BobS'.

FilenameCustRegexParameter:
Type: "AWS::SSM::Parameter"
Properties:
Name: FilenameCustRegex
Type: String
Value: !Ref FilenameCustRegex
Description: >
Regular Expression (regex) used to parse Customer from audio filenames.
The customer id value must be matched using one or more parenthesized groups in the regex.
Example: the regex '_CUST_(.*?)_' parses
the filename: AutoRepairs1_CUST_12345_GUID_2a602c1a-4ca3-4d37-a933-444d575c0222_AGENT_BobS_DATETIME_07.55.51.067-09-16-2021.wav
to extract the Customer value '12345'.

KendraIndexIdParameter:
Type: "AWS::SSM::Parameter"
Properties: