Question:
I’m trying to understand how mount works. I have an S3 bucket named myB, and a folder in it called test. I did a mount using
    var AwsBucketName = "myB"
    val MountName = "myB"
My question is: does this create a link between the S3 bucket myB and Databricks, and can Databricks then access all the files in it, including the files under the test folder? (Or, if I mount using var AwsBucketName = "myB/test", does it link Databricks only to the test folder and not to any files outside that folder?)
If so, how do I list the files in the test folder, read one of them, or count() the rows of a CSV file in Scala? I ran display(dbutils.fs.ls("/mnt/myB")) and it shows only the test folder, not the files in it. I'm quite new to this. Many thanks for your help!
Answer:
From the Databricks documentation:
    // Replace with your values
    val AccessKey = "YOUR_ACCESS_KEY"
    // Encode the Secret Key as that can contain "/"
    val SecretKey = "YOUR_SECRET_KEY".replace("/", "%2F")
    val AwsBucketName = "MY_BUCKET"
    val MountName = "MOUNT_NAME"

    dbutils.fs.mount(s"s3a://$AccessKey:$SecretKey@$AwsBucketName", s"/mnt/$MountName")
    display(dbutils.fs.ls(s"/mnt/$MountName"))
If you cannot see files in your mounted directory, you may have created a directory under /mnt that is not a link to the S3 bucket. If so, try deleting the directory (dbutils.fs.rm) and remounting using the code sample above. Note that you will need your AWS credentials (AccessKey and SecretKey above). If you don’t know them, ask your AWS account admin for them.
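On the listing part of the question: dbutils.fs.ls is not recursive, so listing /mnt/myB shows only the top-level entries (here, the test folder). To see what is inside test, list that path explicitly. The following is a minimal sketch rather than a definitive recipe. It assumes a Databricks Scala notebook (where dbutils, display, and spark are predefined), a bucket already mounted at /mnt/myB, and a runtime new enough to provide the spark session. The file name data.csv and the header option are hypothetical and stand in for whatever is actually in your folder.

```scala
// Assumption: running in a Databricks Scala notebook, where `dbutils`,
// `display`, and `spark` are provided by the environment.
// Assumption: the bucket is already mounted at /mnt/myB.

// dbutils.fs.ls lists only the immediate children of a path,
// so point it at the sub-folder to see the files inside it.
display(dbutils.fs.ls("/mnt/myB/test"))

// Hypothetical file name: replace with a file that exists in your folder.
val csvPath = "/mnt/myB/test/data.csv"

// Read the CSV into a DataFrame.
// Assumption: the first row is a header; drop this option if it is not.
val df = spark.read
  .option("header", "true")
  .csv(csvPath)

// Count the rows in the file.
println(df.count())
```

Mounting the whole bucket should make everything under it reachable through /mnt/myB, including /mnt/myB/test. Mounting a sub-path such as myB/test instead would be expected to expose only that folder under the mount point, but that is worth confirming against the Databricks documentation for your runtime.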