You can use the AWS SDK for Python (boto3) to list all objects and keys (prefixes) in an Amazon S3 bucket. The same method can also list all objects (files) under a specific key prefix (folder).
Step 1: Install and configure boto3 in your system
https://cloudaffaire.com/how-to-install-python-boto3-sdk-for-aws/
https://cloudaffaire.com/how-to-configure-python-boto3-sdk-for-aws/
https://pypi.org/project/argparse/
Step 2: Create a python script to list all objects with prefix in S3
```
## Create a python script to list all objects in an S3 bucket
cat << 'EOF' > list_objects.py
import argparse
import sys

import boto3
from botocore.exceptions import ClientError

parser = argparse.ArgumentParser(description='List all objects with prefix')
parser.add_argument('--bucket_name', type=str, required=True, help='bucket name')
parser.add_argument('--prefix', type=str, default='/', help='prefix')
parser.add_argument('--delimiter', type=str, default='/', help='delimiter')
parser.add_argument('--start_after', type=str, default='', help='start after key')
args = parser.parse_args()

s3_paginator = boto3.client('s3').get_paginator('list_objects_v2')

def list_objects(bucket_name, prefix, delimiter, start_after):
    # The delimiter only normalizes the prefix; it is not sent to S3,
    # so every key under the prefix is listed.
    # Drop a leading delimiter, since S3 keys do not normally start with one.
    if delimiter and prefix.startswith(delimiter):
        prefix = prefix[len(delimiter):]
    # For a folder-style prefix, start listing after the folder key itself.
    if delimiter and prefix.endswith(delimiter):
        start_after = start_after or prefix
    for page in s3_paginator.paginate(Bucket=bucket_name, Prefix=prefix, StartAfter=start_after):
        for content in page.get('Contents', ()):
            print(content['Key'])

try:
    list_objects(args.bucket_name, args.prefix, args.delimiter, args.start_after)
except ClientError as e:
    # Report AWS-side failures (missing bucket, access denied) without a traceback.
    sys.exit(f'Failed to list objects: {e}')
EOF
```
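The prefix handling in the script is easier to follow as a standalone function. The sketch below restates that logic without any AWS calls; the function name `normalize_listing_args` is hypothetical, and the guard against an empty delimiter is an assumption added for safety.

```python
def normalize_listing_args(prefix="/", delimiter="/", start_after=""):
    """Return the (prefix, start_after) pair the script would send to S3."""
    # Strip one leading delimiter: "/targetDir" becomes "targetDir".
    if delimiter and prefix.startswith(delimiter):
        prefix = prefix[len(delimiter):]
    # For a folder-style prefix such as "targetDir/", begin after the
    # folder key itself unless the caller supplied a start key.
    if delimiter and prefix.endswith(delimiter):
        start_after = start_after or prefix
    return prefix, start_after


print(normalize_listing_args())               # default "/" lists the whole bucket
print(normalize_listing_args("/targetDir"))
print(normalize_listing_args("targetDir/"))
```

Because S3 returns keys that sort after `StartAfter`, setting it to the folder-style prefix skips a zero-byte placeholder object with that exact key, if one exists.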
Step 3: Execute the script to list all files and folders in an S3 bucket
```
## List all objects of an S3 bucket
python3 list_objects.py --bucket_name cloudaffaire
python3 list_objects.py --bucket_name cloudaffaire --prefix targetDir
```
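The commands above print every key under the prefix, because the script does not send a delimiter to S3. To list only the immediate "folders", one option is to pass `Delimiter` to the same `list_objects_v2` paginator and read `CommonPrefixes` from each page. The following is a minimal sketch, not part of the original script: `list_folders`, `StubPaginator` and `example-bucket` are hypothetical names, and the stub stands in for a real paginator so the example runs without AWS credentials.

```python
def list_folders(paginator, bucket_name, prefix="", delimiter="/"):
    """Return the immediate sub-prefixes ("folders") under prefix."""
    folders = []
    pages = paginator.paginate(Bucket=bucket_name, Prefix=prefix, Delimiter=delimiter)
    for page in pages:
        # With a Delimiter, S3 groups deeper keys into CommonPrefixes.
        for common in page.get("CommonPrefixes", []):
            folders.append(common["Prefix"])
    return folders


class StubPaginator:
    """Offline stand-in for a boto3 paginator (illustration only)."""

    def __init__(self, pages):
        self.pages = pages
        self.last_call = None

    def paginate(self, **kwargs):
        self.last_call = kwargs  # record the arguments for inspection
        return iter(self.pages)


stub = StubPaginator([
    {"CommonPrefixes": [{"Prefix": "targetDir/a/"}, {"Prefix": "targetDir/b/"}]},
    {"Contents": [{"Key": "targetDir/readme.txt"}]},
])
folders = list_folders(stub, "example-bucket", prefix="targetDir/")
print(folders)
```

Against a real bucket, pass `boto3.client('s3').get_paginator('list_objects_v2')` in place of the stub.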
Note: The script returns every matching object because it uses a paginator. A single list_objects_v2 call returns at most 1,000 keys, and the paginator keeps requesting pages until the listing is complete. If the bucket holds millions or billions of objects, modify the script (for example, narrow the prefix or cap the number of keys returned) so that the run does not overload the system executing it.
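One way to bound the work on a very large bucket is to yield keys lazily and cap the listing through the paginator's `PaginationConfig` (`MaxItems` and `PageSize`). This is a sketch under stated assumptions rather than a change the original script makes: `iter_keys`, `StubPaginator` and `example-bucket` are hypothetical, and the stub only records its arguments; it does not enforce the limits the way a real boto3 paginator would.

```python
def iter_keys(paginator, bucket_name, prefix="", max_items=None, page_size=1000):
    """Yield keys one at a time instead of holding the whole listing in memory."""
    config = {"PageSize": page_size}
    if max_items is not None:
        config["MaxItems"] = max_items  # stop after this many keys in total
    pages = paginator.paginate(
        Bucket=bucket_name, Prefix=prefix, PaginationConfig=config
    )
    for page in pages:
        for content in page.get("Contents", []):
            yield content["Key"]


class StubPaginator:
    """Offline stand-in for a boto3 paginator (illustration only)."""

    def __init__(self, pages):
        self.pages = pages
        self.last_call = None

    def paginate(self, **kwargs):
        self.last_call = kwargs  # record the arguments; limits are not applied
        return iter(self.pages)


stub = StubPaginator([
    {"Contents": [{"Key": "a.txt"}, {"Key": "b.txt"}]},
    {"Contents": [{"Key": "c.txt"}]},
])
keys = list(iter_keys(stub, "example-bucket", max_items=3, page_size=2))
print(keys)
```

With a real client, pass `boto3.client('s3').get_paginator('list_objects_v2')` as the first argument and process each key as it arrives.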