Django AWS S3 enable file_overwrite only for some files

Here’s a conundrum: what if I want most files in S3 never to be overwritten, but some specific files to keep their exact names? If we enable file_overwrite, every file keeps its name and replaces any existing file with the same name; if we disable it, every file whose name is already taken gets renamed. For example, a parent entity PE1 might have a main image that we always want to be named PE1-main_image.jpg. How can we apply that logic only to specific images? The answer is very simple.

First, while processing the upload, give the file the name you want to keep (e.g. PE1-main_image.jpg).
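One way to do the renaming is Django's upload_to callable, which receives the model instance and the original filename and returns the storage path. The sketch below is hypothetical: it assumes the instance exposes the parent's code as parent_code (e.g. "PE1") and that images live under images/; adapt both to your own models.

```python
import os


def main_image_upload_to(instance, filename):
    """Hypothetical upload_to callable: name a main image after its parent entity.

    Assumes instance.parent_code holds the parent's code, e.g. "PE1".
    """
    # keep the original extension, normalised to lower case
    _, ext = os.path.splitext(filename)
    # "images/" is an assumed upload folder
    return 'images/{}-main_image{}'.format(instance.parent_code, ext.lower())


# Usage on a model field (hypothetical model):
# image_file = models.ImageField(upload_to=main_image_upload_to)
```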

Then, in your custom storage, set file_overwrite = False and override the get_available_name method:

from storages.backends.s3boto3 import S3Boto3Storage
from storages.utils import clean_name, get_available_overwrite_name

from web.utils import name_meets_condition  # your own rule for names to keep


class MediaStorage(S3Boto3Storage):
    location = 'media'
    default_acl = 'public-read'
    file_overwrite = False

    def get_available_name(self, name, max_length=None):
        name = clean_name(name)
        if name_meets_condition(name):  # e.g. name.startswith('PE')
            # keep the exact name: an existing file with this name is overwritten
            return get_available_overwrite_name(name, max_length)

        # any other file: default behaviour, a new unique name if this one is taken
        return super().get_available_name(name, max_length)
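name_meets_condition is left to you. As an illustration only, here is a hypothetical version that matches names following the PE1-main_image.jpg convention. Since the name passed to get_available_name usually includes the upload path, it checks only the base name.

```python
import os
import re

# Hypothetical convention: "<parent code>-main_image.<ext>", e.g. "PE1-main_image.jpg"
MAIN_IMAGE_RE = re.compile(r'^PE\d+-main_image\.(jpe?g|png)$', re.IGNORECASE)


def name_meets_condition(name):
    """Return True if the file should keep its exact name (and overwrite)."""
    # compare only the file name, not the folders before it
    return MAIN_IMAGE_RE.match(os.path.basename(name)) is not None
```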

Preview HEIC thumbnails when the frontend tools won’t

Since the iPhone 8, a new image format has been around: HEIC. Unfortunately, many frontend tools don’t support HEIC files. Some workarounds exist, but here’s how we solved it using a backend trick.

Prerequisites: pyheif, PIL

The featured image of this post comes from https://www.macworld.co.uk/feature/iphone/what-is-heic-3660408/, which is also a good place to learn more about HEIC/HEIF files.

In frontend

Set up your frontend tool to fall back to a base64 img src. If the uploaded image is supported natively, use the base64 encoding provided by the frontend tool. Otherwise, make a backend call to convert the .HEIC file to JPEG:

function convert_heic_to_jpeg(image_data, imageId, fallback) {
  // Natively supported formats: use the base64 data provided by the frontend tool
  if (image_data.type !== "image/heic") {
    $(imageId).attr('src', fallback);
    return;
  }

  // HEIC: send the file to the backend for conversion
  var fd = new FormData();
  fd.append('file', image_data);

  $.ajax({
    url: '/ajax/convert-heic-to-jpeg/',  // whatever your backend URL for conversion is
    type: 'post',
    data: fd,
    contentType: false,
    processData: false,
    success: function(data) {
      // data['result'] is a "data:image/jpeg;base64,..." string
      $(imageId).attr('src', data['result']);
    }
  });
}

In backend

Configure the URL in routes.py. Write a view that converts the HEIC file to JPEG, encodes the JPEG data as base64 and returns it as a JsonResponse.

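import base64
import io

import pyheif
from django.contrib.auth.decorators import login_required
from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt
from PIL import Image


@csrf_exempt
@login_required
def convert_heic_to_jpeg(request):
    # Decode the uploaded HEIC file
    heic_img = pyheif.read(request.FILES['file'])
    # Build a PIL image from the raw data; passing the stride keeps rows
    # aligned when it differs from width * bytes per pixel
    pil_image = Image.frombytes(
        heic_img.mode, heic_img.size, heic_img.data,
        "raw", heic_img.mode, heic_img.stride,
    )

    # JPEG has no alpha channel, so convert to RGB before saving
    output = io.BytesIO()
    pil_image.convert('RGB').save(output, format="jpeg")

    # Return a data URL the frontend can use directly as an img src
    return JsonResponse({
        'result': "data:image/jpeg;base64," + base64.b64encode(
            output.getvalue()
        ).decode(),
    }, status=200)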
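The frontend decides based on the MIME type reported by the browser. If you'd rather not trust it on the backend, you can sniff the file header before calling pyheif.read. This is a sketch under assumptions: HEIF files are ISO-BMFF containers whose first box is ftyp, with the major brand in bytes 8 to 12. The brand list below is a common subset, not an exhaustive one, and since mif1 is the generic HEIF brand, other HEIF-based files may match too.

```python
# Major brands commonly used by HEIC/HEIF files (a common subset, not exhaustive)
HEIF_BRANDS = {b'heic', b'heix', b'hevc', b'hevx', b'mif1', b'msf1'}


def looks_like_heic(head):
    """Guess whether a file is HEIC/HEIF from its first 12 bytes.

    Bytes 4-8 hold the type of the first box (b'ftyp'), bytes 8-12 the major brand.
    """
    return (
        len(head) >= 12
        and head[4:8] == b'ftyp'
        and head[8:12] in HEIF_BRANDS
    )
```

In the view, something like head = upload.read(12) followed by upload.seek(0) would read the header and rewind the file before it is decoded.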

Django – Upload image from b64 data to AWS

The previous post covered optimising images before uploading them to AWS S3 with Django/Python. But many frontend editors only give us the image as base64 (b64) data, not as a regular file upload. So how do we create the file from scratch?

Requirements: upload an image created in a frontend editor to S3, i.e. upload a b64-encoded image to S3 using Django

Prerequisites: django-storages, PIL

import base64
import io

from django.core.files.base import ContentFile
from PIL import Image

from web.models import MyImageClass


def create_from_b64_data(b64_data, parent_entity):
    # b64_data is a data URL: "data:image/png;base64,iVBORw0..."
    _prefix, encoded = b64_data.split(',', 1)
    img_data = base64.b64decode(encoded)

    # Let PIL detect the actual image format (PNG, JPEG, ...)
    pil_image = Image.open(io.BytesIO(img_data))
    ext = pil_image.format.lower()

    # format_filename is your own naming helper: customise the filename as you wish
    new_file = ContentFile(img_data, name=format_filename(parent_entity, ext))
    # content_type lets the custom storage recognise the file as an image
    new_file.content_type = 'image/{}'.format(ext)

    image_data = {
        'parent_entity': parent_entity,
        'image_file': new_file,
        # ... other attributes
    }

    image = MyImageClass(**image_data)
    image.save()  # see previous post for handling in storage.py

    return image
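The function above trusts its input. If the data comes straight from the browser, a slightly stricter parser may help. This is a sketch that only accepts the data:<mime>;base64,<payload> form (data URLs with extra parameters are rejected) and raises ValueError for anything else.

```python
import base64
import re

# data:<mime type>;base64,<payload>
DATA_URL_RE = re.compile(
    r'^data:(?P<mime>[\w.+-]+/[\w.+-]+);base64,(?P<payload>.*)$', re.DOTALL
)


def parse_b64_data_url(data_url):
    """Return (mime_type, raw_bytes) for a base64 data URL.

    Raises ValueError if the string is not in that form; an invalid
    base64 payload raises binascii.Error, which is a ValueError subclass.
    """
    match = DATA_URL_RE.match(data_url)
    if match is None:
        raise ValueError('not a base64 data URL')
    raw = base64.b64decode(match.group('payload'), validate=True)
    return match.group('mime'), raw
```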

Django – Optimise images before upload to AWS

Image uploads are problematic to store if the files are too big. Our requirements in this case were:

  • convert uploaded images to JPG before uploading them to the S3 server
  • resize images homothetically (i.e. preserving the aspect ratio) before uploading them
  • reduce image quality, and with it the file size, before uploading
  • additionally, create some thumbnails of the uploaded image with a specific naming. I am not going into details about the thumbnail implementation, but I have kept some remnants of it in the code as an illustration that you can do more processing on these images before uploading.
Lena – the most famous face of image optimisation

Prerequisites:

django-storages, PIL

Settings:

In settings.py, add DEFAULT_FILE_STORAGE = 'web.utils.storage.MediaStorage' (Django 4.2 deprecated this setting in favour of STORAGES, and Django 5.1 removed it).

Create the file storage.py under web/utils (the location can differ, but it has to match the DEFAULT_FILE_STORAGE setting).
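On Django 4.2 and later, the equivalent configuration goes through the STORAGES setting. A sketch, assuming you keep Django's default static files backend: once STORAGES is defined, the staticfiles entry has to be listed explicitly.

```python
# settings.py (Django 4.2+): STORAGES replaces DEFAULT_FILE_STORAGE
STORAGES = {
    # used for FileField/ImageField uploads
    "default": {
        "BACKEND": "web.utils.storage.MediaStorage",
    },
    # Django's default static files backend, listed explicitly
    "staticfiles": {
        "BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage",
    },
}
```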

In storage.py

import io
import os

from storages.backends.s3boto3 import S3Boto3Storage

from web.utils.images import open_or_convert, homothetical_transformation


class MediaStorage(S3Boto3Storage):
    location = 'media'
    default_acl = 'public-read'
    file_overwrite = False

    def _save(self, name, content):
        # Only process uploads that declare an image content type
        content_type = getattr(content, 'content_type', None) or ''
        if content_type.startswith('image/'):
            return self.generate_thumbnails(name, content)
        return super()._save(name, content)

    def _save_image(self, picture, filename, quality=100):
        # Encode the PIL image as JPEG in memory, then write it to S3
        sfile = io.BytesIO()
        picture.save(sfile, format='jpeg', quality=quality)

        with self.open(filename, 'wb') as fh:
            fh.write(sfile.getvalue())

    def generate_thumbnails(self, name, content):
        root, ext = os.path.splitext(name)
        pic = open_or_convert(content, ext.lower())

        # thumbnails or any other extra images to be saved
        # thumbnail = ... e.g. PIL processing to create a square thumbnail
        # filename = ... e.g. thumbnail naming rules
        # self._save_image(thumbnail, filename)

        # main image (homothetical thumbnail), always stored as .jpg
        # note: this name did not go through get_available_name, so an
        # existing .jpg with the same name gets replaced
        name = root + '.jpg'
        main_image = homothetical_transformation(pic)
        self._save_image(main_image, name, 85)

        return name

In web/utils/images.py

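from PIL import Image, ImageOps


def open_or_convert(content, ext):
    # Placeholder: open the upload with PIL (ext is not used in this simple version)
    return Image.open(content)


def homothetical_transformation(pic, new_width=750):
    result = pic.copy()
    # Apply the EXIF orientation first, so width and height are the displayed ones
    result = ImageOps.exif_transpose(result)
    result = convert_to_rgb(result)

    # Shrink to new_width, keeping the aspect ratio (thumbnail never enlarges)
    new_height = int(new_width * result.height / result.width)
    result.thumbnail((new_width, new_height))

    return result


def convert_to_rgb(image):
    # Palette images may carry transparency: expand them to RGBA first
    if image.mode == 'P':
        image = image.convert('RGBA')

    # Modes without an alpha channel are returned unchanged
    if image.mode not in ['RGBA', 'LA']:
        return image

    # Paste onto a white background, using the alpha channel as the mask
    background = Image.new("RGB", image.size, (255, 255, 255))
    background.paste(image, mask=image.getchannel('A'))  # better handling for transparent images

    return background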

Transparency handling is fully explained here.
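To see what homothetical_transformation does to the dimensions, here is the arithmetic on its own: a sketch that computes the target size for a 750 px wide box while keeping the aspect ratio. Like PIL's thumbnail, which only ever shrinks images, it leaves narrower images untouched; PIL's own rounding may differ from this one by a pixel.

```python
def homothetic_size(width, height, max_width=750):
    """Target (width, height) when shrinking to max_width with the same aspect ratio.

    Images already narrower than max_width are not enlarged.
    """
    if width <= max_width:
        return width, height
    # scale both sides by the same factor: max_width / width
    return max_width, round(height * max_width / width)
```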

Number readability with underscores in Python 3

Here is a neat trick for Python numeric values: you can improve readability of large numbers by using underscores as separators, like this:

Screenshot taken from repl.it: https://repl.it/languages/python3

Python ignores the underscores when interpreting the numeric value, but for the programmer the number is much easier on the eye. And yes, it works for floats too. Here is the official link with more examples.
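A small example of the same idea (Python 3.6+). Underscores may only sit between digits (or right after a base prefix such as 0b), so 1__000 and 1000_ are syntax errors, and _1000 is simply a variable name. The format specification also accepts _ and , as thousands separators when printing.

```python
# Underscores group digits; Python ignores them when reading the literal
budget = 1_250_000
rate = 0.000_125           # works for floats too
mask = 0b_1111_0000        # and for binary/hex literals

print(budget == 1250000)   # True

# The same grouping is available when formatting output
print(f"{budget:_}")       # 1_250_000
print(f"{budget:,}")       # 1,250,000
```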

PS: You are always free to use scientific notation for large numbers, especially those with trailing zeroes, which is pretty straightforward to read. However, I find the underscore notation very useful for accounting and financial apps, where large numbers are common but rarely written in scientific form.

Django REST Framework – Programmatically get related entity and serializer

Image: infinite realities nexus point by TellOfVisions

Let’s say we have an entity nexus point, i.e. one model that can relate to many different entities. One use case is a Note model that can be attached to Users, Events, Projects etc. Adding one foreign key per related model works up to a point, but since all but one of these foreign keys will always be None, let’s implement a model where the related entity is retrieved programmatically.

from django.db import models


class Note(Date):
    ...
    entity_type = models.CharField(max_length=64)  # related model name, e.g. 'User'
    entity_id = models.IntegerField()  # primary key of the related object

To get the Notes related to a User, we simply filter with Note.objects.filter(entity_type='User', entity_id=user.id), and similarly for other entities.

But, when retrieving the Note from a REST API, I want the related entity properly serialized, not just the type and id fields. With Django REST Framework, this only takes a few lines. In serializers.py:

from django.apps import apps
from rest_framework import serializers


class NoteSerializer(serializers.ModelSerializer):
    ...
    entity = serializers.SerializerMethodField()

    def get_entity(self, obj):
        # 'app' is the label of the app that defines the related models
        model_class = apps.get_model('app', obj.entity_type)
        instance = model_class.objects.get(pk=obj.entity_id)
        # assumes "ModelNameSerializer" is the name of the serializer class in the current file
        serializer_class = globals()[obj.entity_type + 'Serializer']
        return serializer_class(instance).data

    class Meta:
        model = Note
        fields = (..., 'entity', ...)

Declaring entity as a SerializerMethodField and defining a matching method with the get_ prefix (get_entity) makes DRF put that method’s return value under the entity key of the NoteSerializer output. Inside it, we simply retrieve the related object and serialize it with the matching ModelNameSerializer. This means a User will always be serialized as a User, an Event as an Event and a Project as a Project.
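The same dispatch can be sketched without Django. This framework-free version is illustrative only: plain dicts stand in for the tables and serializers, and an explicit registry replaces the globals() lookup, so only registered entity types can be serialized. In DRF, such a registry could map names to serializer classes instead, e.g. {'User': UserSerializer}.

```python
from dataclasses import dataclass


@dataclass
class Note:
    text: str
    entity_type: str  # model name, e.g. "User"
    entity_id: int


# Hypothetical in-memory stand-ins for the database tables
USERS = {1: {"id": 1, "username": "ana"}}
EVENTS = {7: {"id": 7, "title": "Launch"}}

# Explicit registry: entity type -> (lookup function, serializer function)
REGISTRY = {
    "User": (USERS.get, lambda u: {"id": u["id"], "username": u["username"]}),
    "Event": (EVENTS.get, lambda e: {"id": e["id"], "title": e["title"]}),
}


def serialize_note(note):
    # pick the lookup and serializer that belong to this note's entity type
    lookup, serialize = REGISTRY[note.entity_type]
    related = lookup(note.entity_id)
    return {
        "text": note.text,
        "entity_type": note.entity_type,
        "entity": serialize(related),
    }
```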

I’ve spent a couple of hours looking for this solution, so I hope it helps in similar use cases.

Sprintf is your friend

Usain Bolt, famous sprinter. If he were a programmer, he’d definitely be a sprintf-er.

I’ve seen many developers steer clear of the sprintf function. In this short article I will try to convince you that sprintf helps in many cases, using mainly PHP as an example (but I talk about other languages as well).

Natural readability

The following return statements are equivalent.

https://gist.github.com/calina-c/9fae80512167d4afe7a3ff679c4f96ea

However, by the time you’ve finished reading the second one, you’re probably wondering about correct spacing, coding standards regarding line length and the meaning of life itself.
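For comparison, the same idea in Python, whose printf-style % operator is its closest equivalent to sprintf. The message is a made-up example:

```python
name, count = "Ana", 3

# Concatenation: the sentence is chopped up by quotes, pluses and conversions
concatenated = "Hello, " + name + "! You have " + str(count) + " new messages."

# printf-style: the sentence reads as one template, the values come after it
formatted = "Hello, %s! You have %d new messages." % (name, count)

print(concatenated == formatted)  # True
```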

Conversions and number formatting

The following return statements are equivalent:

https://gist.github.com/calina-c/67180e4a3ba43a122c3d03d5565318b7

However, the latter requires developers to know the number_format function and its parameter list. PHP at least converts numbers to strings implicitly, but Python, for example, disallows the following:

return "The number is " + 2

because it doesn’t automatically convert 2 to the string "2". Instead, the following solutions work:

https://gist.github.com/calina-c/5e8c965962bc0240eab516cb7a180670

Considering my first point about readability and how the two solutions scale to more numbers, I will let you decide which is better.
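In Python, the options look like this (the values are illustrative): an explicit str() conversion, or printf-style formatting, which also handles number formatting without a separate function.

```python
count = 2

# Concatenating a str and an int raises TypeError: there is no implicit conversion
try:
    bad = "The number is " + count
    raised = False
except TypeError:
    raised = True

explicit = "The number is " + str(count)   # convert by hand
printf_style = "The number is %d" % count  # sprintf-like
print(explicit)
print(printf_style)

# Number formatting: two decimals, optionally with thousands separators
total = 1234567.891
print("Total: %.2f" % total)           # Total: 1234567.89
print("Total: {:,.2f}".format(total))  # Total: 1,234,567.89
```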

Maintainability of conditionals

Consider the following use case. Upon creating or updating a model in our application, we want to show a message: “Item was created” or “Item was updated.” Here are two solutions, separated by a line of comment characters (“#”). I am using the ternary operator because it’s more concise:

https://gist.github.com/calina-c/5f21c6260b4c4c14aaf099b61d7e20fa

The first solution contains text duplication, which means it is easy to forget to edit both messages. The second option is better because it does not duplicate the text, so modifying the message in just this one place will suffice.
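The second approach in Python, with a conditional expression picking only the word that changes (a minimal sketch; the function name is made up):

```python
def save_message(created):
    # one template, so the two messages can't drift apart
    return "Item was %s." % ("created" if created else "updated")
```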

The manageable .gitignore file

How do we keep a readable and manageable .gitignore in a team of many members? The following solution takes advantage of the 3 different files where ignored paths may be stored:

Use the project’s .gitignore for:

Vendor modules, logs, upload folders and environment configuration files. They should be ignored because they are either not relevant, or otherwise automatically managed and updated on other repository clones.

Use your global .gitignore for:

OS or IDE-specific indexing and description files. Your co-workers with a Windows system shouldn’t be concerned with your Mac-related settings (.DS_Store), just as users of Atom shouldn’t care about the .idea folder automatically created by IntelliJ. Adding these to the project’s .gitignore will only crowd the file unnecessarily.

The location of your global ignore file is set with core.excludesFile in ~/.gitconfig; if that is not set, Git uses ~/.config/git/ignore.

Use the repository exclude file (.git/info/exclude) for:

Custom files you use in your own workflow that shouldn’t be shared with other clones of the repository, e.g. project-specific files like documentation-todos.txt, or language- or framework-specific patterns like *.pyc.

Need more help?

Read more and find help generating a .gitignore:

https://git-scm.com/docs/gitignore

Git checkout to previous branch

I’m embarrassed to have only just found out about this cool gimmick. On the command line, after you cd to a new path, you can always return to the previous one with cd -, saving precious time otherwise spent retyping it.

In Git, there’s the exact same use case: switching between branches with long names can be exhausting for any lazy-typing developer, and sometimes you just need to go back, nothing fancier. My motto is “think more, type less”, so I now use (you’ve guessed it) git checkout - to switch to the previous branch. A few microseconds a day stack up!

So remember: git checkout - to switch to the previous branch.

Credits: http://stackoverflow.com/a/7207542/2887012

MySQL UPDATE based on data from different tables

Just like my previous article on the meaning of COALESCE, the following problem is one of those brain teasers that keeps developers busy for a while, frantically searching for answers, trying and re-trying various approaches and then… well, after solving, it seems stupidly simple. This recipe will definitely help those of you tackling delicate SQL data migrations. So, here goes:

Assume your database is going through the following changes: You have a Users table, containing first and last names. You also have Accounts, which up until now only had an AccountNumber (IBAN) and an OwnerId. So now there’s this new column, called Account Nickname, where people will be able to input a friendlier name for their accounts, which is irrelevant to the bank but meaningful to the user. For the moment, you want to populate all Accounts to contain “(FirstName) (LastName)’s Bank Account”.

In some SQL flavours, you can do an UPDATE directly from a virtual table. However, in MySQL it’s a bit different:

UPDATE `accounts`, `users`
SET `accounts`.`nickname` = CONCAT(`users`.`first_name`, ' ', `users`.`last_name`, '''s Bank Account')
WHERE `accounts`.`user_id` = `users`.`id`;

Note how both tables involved are listed in the first part of the UPDATE statement, and what you’d usually consider JOIN criteria goes inside the WHERE clause. Neat, huh? Credits for the idea: http://stackoverflow.com/a/12748310/2887012.
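To try the idea locally, here is the same migration against an in-memory SQLite database through Python's sqlite3 module. SQLite is a different dialect: it does not accept MySQL's multi-table UPDATE syntax, so this sketch uses a correlated subquery instead and concatenates with ||. The table contents are made up; an account without a matching user would end up with a NULL nickname.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, user_id INTEGER, nickname TEXT);
    INSERT INTO users VALUES (1, 'Ana', 'Pop'), (2, 'Dan', 'Ionescu');
    INSERT INTO accounts (id, user_id) VALUES (10, 1), (11, 2), (12, 1);
""")

# Each account's nickname is computed from its owner's row
conn.execute("""
    UPDATE accounts
    SET nickname = (
        SELECT users.first_name || ' ' || users.last_name || '''s Bank Account'
        FROM users
        WHERE users.id = accounts.user_id
    )
""")

rows = conn.execute("SELECT id, nickname FROM accounts ORDER BY id").fetchall()
print(rows)
```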