Question:
I am attempting to upload my cleaned data (split into folds with k-fold) to S3 so that I can use SageMaker to build a model on it (since SageMaker expects the training and test data to be in S3). However, whenever I attempt to upload the CSV to S3, the code runs without errors but I don't see the file in S3.
I have tried changing which folder I access in SageMaker and uploading different types of files, none of which works. In addition, I have tried the approaches in similar Stack Overflow posts without success.
Also note that I am able to upload my CSV to S3 manually, just not programmatically from SageMaker.
The code below is what I currently have for uploading to S3; I copied it directly from the AWS documentation on uploading files with SageMaker.
import io
import csv
import boto3

#key = "{}/{}/examples".format(prefix, data_partition_name)
#url = 's3n://{}/{}'.format(bucket, key)

name = boto3.Session().resource('s3').Bucket('nc-demo-sagemaker').name
print(name)

boto3.Session().resource('s3').Bucket('nc-demo-sagemaker').upload_file('train', '/')

print('Done writing to {}'.format('sagemaker bucket'))
I expect that when I run this snippet, the training and test data are uploaded to the folder I want, ready for use in creating SageMaker models.
Answer:
I always upload files from a SageMaker notebook instance to S3 using the code below. It uploads the entire contents of the specified folder to S3; alternatively, you can point it at a single file, as shown after the snippet.
import sagemaker

s3_path_to_data = sagemaker.Session().upload_data(bucket='my_awesome_bucket',
                                                  path='local/path/data/train',
                                                  key_prefix='my_crazy_project_name/data/train')
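upload_data returns the S3 URI of the uploaded data, so you can pass the result straight to your estimator or training job. To upload a single file instead of a whole folder, point path at the file itself. A minimal sketch, assuming a train.csv in that folder (the file, bucket and prefix names are placeholders):

import sagemaker

# Upload a single csv instead of a directory; the file, bucket and prefix
# names below are placeholders -- substitute your own.
s3_path_to_file = sagemaker.Session().upload_data(path='local/path/data/train/train.csv',
                                                  bucket='my_awesome_bucket',
                                                  key_prefix='my_crazy_project_name/data/train')
print(s3_path_to_file)  # e.g. s3://my_awesome_bucket/my_crazy_project_name/data/train/train.csv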
I hope this helps!
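As an aside, if you want to keep using boto3 directly, the likely reason your original snippet appears to do nothing is the call upload_file('train', '/'): the second argument is the object key (the full path of the object inside the bucket), so '/' is not a destination you will find when browsing the bucket. A minimal sketch, assuming a local train.csv and a data/train/ prefix (both placeholders):

import boto3

# upload_file takes (local_filename, key); the key is the object's full path
# inside the bucket. 'train.csv' and the prefix below are placeholders.
bucket = boto3.Session().resource('s3').Bucket('nc-demo-sagemaker')
bucket.upload_file('train.csv', 'data/train/train.csv')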