Question:
Is it possible to loop through the files/keys in an Amazon S3 bucket, read the contents, and count the number of lines using Python?
For example:

1. My bucket: "my-bucket-name"
2. File/Key: "test.txt"

I need to loop through the file "test.txt" and count the number of lines in the raw file.
Sample Code:
```python
for bucket in conn.get_all_buckets():
    if bucket.name == "my-bucket-name":
        for file in bucket.list():
            # need to count the number of lines in each file and print to a log
            pass
```
Answer:
Using boto3
you can do the following:
```python
import boto3

# create the S3 resource
s3 = boto3.resource('s3')

# get the file object
obj = s3.Object('bucket_name', 'key')

# read the file contents into memory (as bytes)
file_contents = obj.get()["Body"].read()

# count the newline characters to get the number of lines
print(file_contents.count(b'\n'))
```
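One caveat with counting newline characters: a file whose last line has no trailing newline will report one fewer line than you might expect. A quick local illustration (no S3 involved):

```python
data_with_newline = b"a\nb\nc\n"   # trailing newline
data_without = b"a\nb\nc"          # no trailing newline

print(data_with_newline.count(b"\n"))  # 3
print(data_without.count(b"\n"))       # 2, although there are 3 lines
print(len(data_without.splitlines()))  # 3 -- splitlines() counts the final line too
```

If your files may lack a trailing newline, `len(file_contents.splitlines())` is the safer count.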
If you want to do this for all objects in a bucket, you can use the following code snippet:
```python
bucket = s3.Bucket('bucket_name')
for obj in bucket.objects.all():
    file_contents = obj.get()["Body"].read()
    print(file_contents.count(b'\n'))
```
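Note that `read()` loads each whole object into memory. For large files, a chunked counter that works with any binary stream exposing `read(size)` (which includes boto3's `StreamingBody`) may be preferable. A minimal sketch; the helper name and chunk size are my own choices, and `io.BytesIO` stands in for `obj.get()["Body"]` here:

```python
import io

def count_lines(stream, chunk_size=1024 * 1024):
    """Count newline characters by reading a binary stream in fixed-size chunks."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return total
        total += chunk.count(b"\n")

# local stand-in for an S3 object body
print(count_lines(io.BytesIO(b"line 1\nline 2\nline 3\n")))  # 3
```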
For more functionality, see the boto3 documentation: http://boto3.readthedocs.io/en/latest/reference/services/s3.html#object
Update: (Using boto 2)
```python
import boto

# establish connection
s3 = boto.connect_s3()

# get the bucket
bucket = s3.get_bucket('bucket_name')

# list objects at a given prefix
for key in bucket.list(prefix='key'):
    # get the file contents
    file_contents = key.get_contents_as_string()
    # count the newline characters to get the number of lines
    print(file_contents.count('\n'))
```