{"id":3749,"date":"2020-03-24T16:17:48","date_gmt":"2020-03-24T22:17:48","guid":{"rendered":"http:\/\/benincosa.com\/?p=3749"},"modified":"2020-03-24T16:21:42","modified_gmt":"2020-03-24T22:21:42","slug":"compressing-python-dictionary-objects-before-storing-in-json-s3-files","status":"publish","type":"post","link":"https:\/\/benincosa.com\/?p=3749","title":{"rendered":"Compressing Python dictionary objects before storing in JSON S3 files."},"content":{"rendered":"\n<p>Here&#8217;s a quick little script I wrote since I need to test uploading files into S3.  In this case the generated file is 78 bytes; uncompressed, the JSON is 170 bytes.  I wrote it because I have to upload large amounts of data in JSON form into S3.  Saving space in S3 results in pretty great cost savings.  Here is the code: <\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/usr\/bin\/env python3\nimport json\nfrom gzip import GzipFile\nfrom io import BytesIO\n\nimport boto3\n\ndata = {\n    \"foo1\": \"bar\",\n    \"harry1\": \"salad\",\n    \"foo2\": \"bar\",\n    \"harry2\": \"salad\",\n    \"foo3\": \"bar\",\n    \"harry3\": \"salad\",\n    \"foo4\": \"bar\",\n    \"harry4\": \"salad\",\n    \"foo5\": \"bar\",\n    \"harry5\": \"salad\",\n}\n\n# Gzip the JSON-encoded dictionary into an in-memory buffer (level 9 = smallest output).\ngz_body = BytesIO()\nwith GzipFile(fileobj=gz_body, mode='wb', compresslevel=9) as gz:\n    gz.write(json.dumps(data).encode('utf-8'))\n\ns3 = boto3.resource('s3')\nbucket = \"&lt;your datalake bucket>\"\ns3_key = \"test\/test.gz\"\ntry:\n    # Upload the compressed bytes as the object body.\n    s3.Object(bucket, s3_key).put(Body=gz_body.getvalue())\nexcept Exception as e:\n    print(\"Error: \", e)<\/code><\/pre>\n\n\n\n<p>This code takes a Python dictionary, serializes it to JSON, gzips it, and uploads it to s3:\/\/&lt;your datalake bucket>\/test\/test.gz.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Here&#8217;s a quick little script I wrote since I need to test uploading files into S3. In this case the generated file is 78 bytes; uncompressed, the JSON is 170 bytes. 
I wrote it because I have to upload large amounts of data in JSON form into S3. Saving space in S3 results&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[462],"tags":[1010,932,935,466,895],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/posts\/3749"}],"collection":[{"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/benincosa.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3749"}],"version-history":[{"count":1,"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/posts\/3749\/revisions"}],"predecessor-version":[{"id":3750,"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/posts\/3749\/revisions\/3750"}],"wp:attachment":[{"href":"https:\/\/benincosa.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3749"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/benincosa.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3749"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/benincosa.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3749"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}