[Kendra] sharepoint datasource (AZURE_AD Authentication) fails in data_source_sync_job #4267

Open
ssmails opened this issue Sep 11, 2024 · 8 comments
Assignees
Labels
bug This issue is a confirmed bug. kendra p2 This is a standard priority issue response-requested Waiting on additional information or feedback. service-api This issue is caused by the service API, not the SDK implementation.

Comments

@ssmails

ssmails commented Sep 11, 2024

Describe the bug

[Kendra]

  • SharePoint data source (AZURE_AD authentication), create data source: I am able to create the data source using boto3.
  • The data source created above fails to sync when started with boto3 start_data_source_sync_job(). The call succeeds, but the sync fails without reporting any errors.

I want to understand whether the AZURE_AD authentication option is supported by the boto3 versions below.
boto3==1.35.16
botocore==1.35.16

The documentation states that it supports only HTTP_BASIC | OAUTH2, per https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kendra/client/create_data_source.html

'AuthenticationType': 'HTTP_BASIC'|'OAUTH2',

But when I run the code with an invalid value, the error message also lists AZURE_AD as a valid option.

Value at "configuration.sharePointConfiguration.authenticationType" failed to satisfy constraint: Member must satisfy enum value set: [AZURE_AD, HTTP_BASIC, OAUTH2]

If it is supported, why is the sync failing?
Code snippets to reproduce the failure are included below.

Expected Behavior

If AZURE_AD authentication is supported, the sync should succeed.
Manually creating the data source and syncing it from the Kendra UI with AZURE_AD authentication works as expected.

Current Behavior

[Kendra]

  • SharePoint data source (AZURE_AD authentication), create data source: I am able to create the data source using boto3.
  • The data source created above fails to sync when started with boto3 start_data_source_sync_job(). The call succeeds, but the sync fails without reporting any errors.

Reproduction Steps

# Method of the reporter's adapter class; logger and KendraAdapterException are defined elsewhere in that module.
def create_new_data_source_sharepoint(self, index_id: str):
    """
    Creates a new Kendra data source based on the configuration provided in the YAML file.

    Returns:
        str: The ID of the created data source, or the ID of the existing data source if one already exists.
    """

    data_source_config = self.config['data_source']
    logger.info("config=%s", data_source_config)
    # logger.info("auth=%s", data_source_config['configuration']['SharePointConfiguration']['AuthenticationType'])

    try:
        # using hardcoded sharepoint config
        response = self.kendra.create_data_source(
            RoleArn='correct role ARN that works via Kendra UI',
            Name=data_source_config['name'],
            IndexId='precreated Kendra index ID',
            Type='SHAREPOINT',
            Configuration={
                'SharePointConfiguration': {
                    'SharePointVersion': 'SHAREPOINT_ONLINE',
                    'Urls': [
                        'sharepoint site that works via Kendra UI'
                    ],
                    'AuthenticationType': 'AZURE_AD',
                    'SecretArn': 'correct ARN for azure ad, which works via Kendra UI',
                }
            }
        )

        data_source_id = response['Id']
        logger.info(f"Data source created with ID: {data_source_id}")

        return data_source_id
    except Exception as e:
        logger.error(f"Error creating data source: {str(e)}")
        raise KendraAdapterException(f"Error creating data source: {str(e)}") from e
import time

# Method of the reporter's adapter class; logger and KendraAdapterException are defined elsewhere in that module.
def start_ingestion_sharepoint(self, index_id, data_source_id):
    try:
        # Wait until the index is ACTIVE before starting the sync.
        while True:
            status = self.kendra.describe_index(Id=index_id)['Status']
            if status == 'ACTIVE':
                logger.info(f"Index {index_id} is active")
                break
            logger.info(f"Waiting for index {index_id} to become active. Current status: {status}")
            time.sleep(30)

        # This call only starts the job; it returns before the sync finishes.
        response = self.kendra.start_data_source_sync_job(
            Id=data_source_id,
            IndexId=index_id
        )

        sync_job_id = response['ExecutionId']
        logger.info(f"Data source sync job started with ID: {sync_job_id}")
    except Exception as e:
        logger.error(f"Error starting data source sync job: {str(e)}")
        raise KendraAdapterException(f"Error starting data source sync job: {str(e)}") from e
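
Because start_data_source_sync_job() returns as soon as the job is started, a failed sync does not raise an exception in the snippet above. The following is a minimal sketch, not part of the original report, of how the job outcome could be polled with the Kendra list_data_source_sync_jobs call. The helper name wait_for_sync_job, the polling defaults, and the set of statuses treated as terminal are assumptions made for illustration; it also assumes the job of interest appears on the first page of results.

```python
import time

# Assumption: these statuses are treated as "the job is finished".
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "ABORTED", "INCOMPLETE"}


def wait_for_sync_job(kendra, index_id, data_source_id, execution_id,
                      poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll the sync job history until the given job reaches a terminal status.

    `kendra` is expected to be a boto3 Kendra client (or an object with the
    same list_data_source_sync_jobs method). Returns the job entry from the
    history, which may carry ErrorCode / ErrorMessage for failed jobs.
    """
    for _ in range(max_polls):
        response = kendra.list_data_source_sync_jobs(
            Id=data_source_id,
            IndexId=index_id,
        )
        # Only the first page of History is inspected (assumption).
        for job in response.get("History", []):
            if (job.get("ExecutionId") == execution_id
                    and job.get("Status") in TERMINAL_STATUSES):
                return job
        sleep(poll_seconds)
    raise TimeoutError(
        f"Sync job {execution_id} did not finish after {max_polls} polls"
    )
```

Called after start_data_source_sync_job() with the returned ExecutionId, the returned entry can be logged; for a FAILED job the ErrorMessage field, when present, may explain a failure that otherwise looks silent.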

Possible Solution

No response

Additional Information/Context

No response

SDK version used

1.35.16

Environment details (OS name and version, etc.)

Mac OS

@ssmails ssmails added bug This issue is a confirmed bug. needs-triage This issue or PR still needs to be triaged. labels Sep 11, 2024
@tim-finnigan tim-finnigan self-assigned this Sep 12, 2024
@tim-finnigan
Contributor

Thanks for reaching out. The create_data_source command makes a call to the underlying CreateDataSource API, so this is an issue with the Kendra service API rather than the SDK.

I can confirm that for AuthenticationType I get the error message Member must satisfy enum value set: [AZURE_AD, HTTP_BASIC, OAUTH2] if specifying any other value. But this does not match the documentation of what is supported ('AuthenticationType': 'HTTP_BASIC'|'OAUTH2'). I'll reach out to the Kendra service team for more information and will share any updates here.
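
To compare the documented values with what a locally installed SDK knows about, the service model bundled with botocore can be inspected. This is a rough sketch, assuming a boto3 Kendra client is passed in; the helper name sharepoint_auth_types is hypothetical. As far as I know, botocore does not validate enum values on the client side, so the error quoted above comes from the service, and the list returned here can differ from it.

```python
def sharepoint_auth_types(client):
    """Return the AuthenticationType enum values listed in the local service model.

    `client` is expected to be a boto3 Kendra client; the values come from the
    botocore model installed locally, not from the live service.
    """
    operation = client.meta.service_model.operation_model("CreateDataSource")
    configuration = operation.input_shape.members["Configuration"]
    sharepoint = configuration.members["SharePointConfiguration"]
    return list(sharepoint.members["AuthenticationType"].enum)
```

With boto3 installed, this could be called as sharepoint_auth_types(boto3.client("kendra")) and the result compared against the enum set in the service error message.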

@tim-finnigan tim-finnigan added service-api This issue is caused by the service API, not the SDK implementation. p2 This is a standard priority issue kendra and removed needs-triage This issue or PR still needs to be triaged. labels Sep 12, 2024
@ssmails
Author

ssmails commented Sep 13, 2024

Thanks @tim-finnigan
Wondering if there's any update on this issue?
Also, note that creating the data source and syncing via the Kendra UI works for me with SharePoint Online V2.0. Could it be that boto3 is using V1.0 and not V2.0 for SharePoint?

@tim-finnigan
Contributor

Thanks for following up — I'm still waiting for more information from the Kendra team but will share any updates in this issue.

@tim-finnigan
Contributor

It looks like the issue is that you need to specify v2 in your template: can you try configuring the v2 connector configuration? See documentation below:

@tim-finnigan tim-finnigan added the response-requested Waiting on additional information or feedback. label Oct 7, 2024
@ssmails
Author

ssmails commented Oct 8, 2024

I did not use the Template configuration object.
I used create_data_source with Type=SHAREPOINT, as documented at https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kendra/client/create_data_source.html

response = client.create_data_source(
    Name='string',
    IndexId='string',
    Type='SHAREPOINT'

and SharePointConfiguration with type as below.

'SharePointConfiguration': {
    'SharePointVersion': 'SHAREPOINT_ONLINE',
    ....

If these are no longer supported and it is expected to use the TemplateConfiguration, the boto3 docs should clearly state that, with a working example.

@github-actions github-actions bot removed the response-requested Waiting on additional information or feedback. label Oct 9, 2024
@tim-finnigan
Contributor

Thanks for following up, I'm trying to get more information from the Kendra service team regarding the documentation and recommended guidance here.

@ssmails
Author

ssmails commented Oct 18, 2024

@tim-finnigan, I was able to figure out a way to work around the Kendra bugs and get a working version of SharePoint with TemplateConfiguration.
If there is a place where you keep examples/docs, I would be happy to contribute.
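
The working configuration was not posted in this thread. For readers who land here, the following is only a sketch of the general call shape for a template-based data source: Type='TEMPLATE' with the connector settings passed under Configuration['TemplateConfiguration']['Template']. The helper name create_template_data_source is hypothetical, and the contents of the template dict (the SharePoint v2 connector schema) are deliberately left to the caller, since they must follow the schema in the Kendra documentation.

```python
def create_template_data_source(kendra, index_id, name, role_arn, template):
    """Create a Kendra data source from a connector template.

    `kendra` is expected to be a boto3 Kendra client. `template` is a dict
    that follows the connector's template schema from the Kendra docs; its
    keys are not defined or checked here (assumption: the caller supplies a
    valid SharePoint v2 template).
    """
    response = kendra.create_data_source(
        Name=name,
        IndexId=index_id,
        Type="TEMPLATE",
        RoleArn=role_arn,
        Configuration={
            "TemplateConfiguration": {
                "Template": template,
            }
        },
    )
    return response["Id"]
```

Keeping the template as a caller-supplied dict means the same helper could be reused for other template-based connectors, with the schema details living in configuration rather than code.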

@tim-finnigan
Contributor

Thanks for following up again. Boto3-specific examples could also be added here: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/examples.html, or here in the Kendra Developer Guide: https://docs.aws.amazon.com/kendra/latest/dg/gs-python.html. What is the documentation change that you were planning to submit?

@tim-finnigan tim-finnigan added the response-requested Waiting on additional information or feedback. label Nov 4, 2024