Usage

View Decorator

If you have an existing Django view that you want to render as a PDF, you can use the view_as_pdf decorator:

from django.http import HttpResponse

from alliance_platform.pdf.decorators import view_as_pdf

@view_as_pdf()
def my_view(request):
    return HttpResponse('Some content')

Here, an existing function view is wrapped with the decorator. By default, the decorator renders the view as a PDF only if the request has a query parameter named pdf whose value evaluates to true using strtobool(), e.g. http://foo/pdf-view?pdf=true.

You can easily override this behaviour by specifying your own query parameter name to look for:

@view_as_pdf(query_param_test='my-custom-parameter')
def my_view(request):
    return HttpResponse('Some content')

This returns a PDF if the request looks like http://foo/pdf-view?my-custom-parameter=true.
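The query-parameter check described above can be sketched with the standard library alone. This is not the decorator's actual implementation: should_render_pdf is a hypothetical helper, strtobool is re-implemented here on the model of the old distutils function (which was removed in Python 3.12), and treating an unrecognised value as "do not render" is an assumption.

```python
from urllib.parse import parse_qs

_TRUE_VALUES = {"y", "yes", "t", "true", "on", "1"}
_FALSE_VALUES = {"n", "no", "f", "false", "off", "0"}


def strtobool(value: str) -> bool:
    """Modelled on distutils.util.strtobool; raises ValueError on unknown input."""
    normalized = value.lower()
    if normalized in _TRUE_VALUES:
        return True
    if normalized in _FALSE_VALUES:
        return False
    raise ValueError(f"invalid truth value {value!r}")


def should_render_pdf(query_string: str, param: str = "pdf") -> bool:
    """Hypothetical helper: decide whether a query string asks for a PDF."""
    values = parse_qs(query_string).get(param)
    if not values:
        # Parameter missing (or blank): render the view normally.
        return False
    try:
        return strtobool(values[-1])
    except ValueError:
        # Assumption: an unrecognised value means "do not render as PDF".
        return False
```

With this sketch, should_render_pdf("pdf=true") and should_render_pdf("my-custom-parameter=yes", param="my-custom-parameter") both return True, while a missing parameter returns False.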

You can also pass a custom function that takes the request and returns whether to render as a PDF, based on some other condition:

# Django exposes the X-Custom-Header request header in META as HTTP_X_CUSTOM_HEADER
@view_as_pdf(query_param_test=lambda r: r.META.get('HTTP_X_CUSTOM_HEADER'))
def my_view(request):
    return HttpResponse('Some content')

When a page is rendered, it will usually initiate network requests for additional resources (images, JavaScript files, AJAX requests). These are usually handled directly by this app: static files and media files are served directly from the filesystem, and Django requests are processed manually through the basic Django stack (i.e. these requests don't trigger actual HTTP requests to your web server, which avoids potential deadlocks on a loaded web server).

For requests handled through the Django stack, every request generated by chromium (including the request for the initial page render view) is supplied with headers extracted from the incoming Django HttpRequest META dictionary, unless you control this explicitly with the request_headers and extra_headers options described below. Note that META contains more than just the HTTP headers, e.g. SERVER_NAME, DJANGO_SETTINGS_MODULE and the process environment variables. Only keys whose values are of type str are extracted from META and set on the internal request object. Importantly, the HTTP_COOKIE header is passed through, so the session (and therefore the logged-in user) is preserved on these requests.
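The "only str values" rule can be illustrated with a small sketch. The function name extract_str_values is hypothetical and is not part of this app; the sample META contents are made up for illustration.

```python
def extract_str_values(meta: dict) -> dict:
    """Keep only the META entries whose values are plain strings."""
    return {key: value for key, value in meta.items() if isinstance(value, str)}


# A made-up META dictionary mixing HTTP headers, server info and WSGI internals.
meta = {
    "HTTP_COOKIE": "sessionid=abc123",
    "SERVER_NAME": "localhost",
    "wsgi.multithread": True,   # bool, dropped
    "wsgi.version": (1, 0),     # tuple, dropped
}

headers = extract_str_values(meta)
```

Here headers keeps HTTP_COOKIE and SERVER_NAME and drops the two wsgi.* entries, which is why the session cookie survives while non-string WSGI objects do not.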

However, if you want your rendered pages to be allowed to fetch from arbitrary network locations, you can pass pass_through=True to the decorator. When this is True, we first attempt to handle the request internally and, if that is not possible, allow chromium to handle the request as normal. When it is False (the default), we raise an exception for any unhandled request. Note that pass_through is generally discouraged: slow network requests to other web servers may cause your PDF generation to time out.

Here is an example that requires pass_through in order to render an image from an external location:

@view_as_pdf(pass_through=True)
def pdf_view(request):
    return HttpResponse(
        '<html><body><img src="https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png" /></body></html>')

Here is a more complex example that renders a template view as a PDF and overrides the URL path that chromium sees when rendering that view (this lets you choose, for example, which route a single-page app renders):

from django.urls import path

urlpatterns = [
    # path() is used here because url() was removed in Django 4.0
    path(
        'pdf-route',
        view_as_pdf(optional=False, url_path='/admin/users/')(
            django_site.views.FrontendView.as_view(
                site=admin_site,
                basename='admin',
                entry_point='admin',
            )
        )
    ),
]

In the example above, we set optional=False so that this view always renders as a PDF, and we use the url_path argument so that, when the single-page app renders, the URL it sees is /admin/users/. This triggers the rendering of that particular route when the JavaScript on the page is executed.

If you need to set extra headers on the requests that are triggered during rendering you can do so with extra_headers:

@view_as_pdf(extra_headers={'X-Custom-Header': 'something'})
def pdf_view(request):
    return HttpResponse('content')

If you need to make sure only an explicit set of headers are present on these requests, use request_headers:

@view_as_pdf(request_headers={'X-Custom-Header': 'something'})
def pdf_view(request):
    return HttpResponse('content')
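The difference between the two options can be sketched as follows. This is an illustration only: build_headers is a hypothetical function, and the assumption that extra_headers is layered on top of whichever base set is in use is not confirmed by this document.

```python
from typing import Optional


def build_headers(
    incoming: dict,
    request_headers: Optional[dict] = None,
    extra_headers: Optional[dict] = None,
) -> dict:
    """Hypothetical sketch of how the two header options might combine."""
    # request_headers, when given, replaces the headers taken from the incoming request.
    base = dict(incoming if request_headers is None else request_headers)
    # Assumption: extra_headers is added on top of the base set.
    base.update(extra_headers or {})
    return base


incoming = {"HTTP_COOKIE": "sessionid=abc123", "SERVER_NAME": "localhost"}

added = build_headers(incoming, extra_headers={"X-Custom-Header": "something"})
explicit = build_headers(incoming, request_headers={"X-Custom-Header": "something"})
```

In this sketch, added still carries the session cookie alongside the custom header, whereas explicit contains the custom header only, so the rendered requests would no longer be tied to the logged-in user's session.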

Manual Rendering

The other main function provided by this app allows rendering html directly to pdf:

from alliance_platform.pdf.render import render_pdf

render_pdf(html='<html><body>Content here</body></html>', pass_through=True)

This function can render either the provided HTML (which may trigger additional network requests, handled as described above) or a provided URL:

render_pdf(url='https://www.google.com', pass_through=True, request_handlers=[])

Here we make sure that there are no request_handlers (see below), so that all requests are handled by chromium (since the source url is external).

You can use this to render a PDF and then send it as an attachment:

pdf = render_pdf(
    # Pass the URL by keyword, as in the example above, so it cannot be mistaken for html
    url=request.build_absolute_uri(reverse("demo_app:restaurant_menu", kwargs={"pk": self.object.pk})),
    request_headers=extract_request_headers(request),
)
message = EmailMessage(
    f"Menu for {self.object.brand.name} {self.object.name}",
    "Here is the menu you requested",
    "demo@examplesystem.com",
    [email],
)
message.attach("menu.pdf", pdf, "application/pdf")
message.send()

This example renders from a URL and passes the headers from the current request.

Request Handlers

The network requests that are triggered during rendering are typically handled internally by this app (except when pass_through is set, as mentioned above). The code that is responsible for handling these requests is in request_handlers. By default, the following request handlers are used by render_pdf():

[
    StaticHttpRequestHandler(),
    MediaHttpRequestHandler(),
    DjangoRequestHandler(),
    # The default includes domains set in the `alliance_platform.pdf_WHITELIST_DOMAINS`
    # When DEBUG is enabled also includes the Vite dev server URL
    WhitelistDomainRequestHandler(whitelist_domains)
]

WhitelistDomainRequestHandler: allows any requests to the specified domains. This is useful for known external resources, e.g. external CSS, fonts, or images loaded from S3 that sit outside the media directory.
StaticHttpRequestHandler: serves Django static file assets through direct filesystem lookup
MediaHttpRequestHandler: serves Django media file assets through direct filesystem lookup
DjangoRequestHandler: processes any requests that are for the same url as the initial page url (the url parameter to render_pdf()) as Django requests (runs them through the Django stack).

Note that these handlers are attempted in the order provided, so order can be important.

When pass_through=True, this is handled internally by adding a PassThroughRequestHandler to the list of handlers, in the last position.
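The "first handler that can deal with the request wins" behaviour, including the pass-through fallback, can be sketched like this. All class and function names below are hypothetical stand-ins and do not mirror the real handler classes or their method signatures; the responses are plain strings purely for illustration.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


class UnhandledRequest(Exception):
    """Raised when no handler in the chain accepts a URL."""


class SketchHandler:
    def handle(self, url: str) -> Optional[str]:
        """Return a response, or None to let the next handler try."""
        raise NotImplementedError


class SketchWhitelistHandler(SketchHandler):
    def __init__(self, domains: Iterable[str]):
        self.domains = frozenset(domains)

    def handle(self, url: str) -> Optional[str]:
        # Accept the request only when its host is on the whitelist.
        if urlsplit(url).hostname in self.domains:
            return f"fetched {url}"
        return None


class SketchPassThroughHandler(SketchHandler):
    def handle(self, url: str) -> Optional[str]:
        # Accepts everything, which is why it has to come last.
        return f"passed through {url}"


def dispatch(url: str, handlers: list, pass_through: bool = False) -> str:
    chain = list(handlers)
    if pass_through:
        chain.append(SketchPassThroughHandler())
    for handler in chain:  # handlers are tried in the order given
        response = handler.handle(url)
        if response is not None:
            return response
    raise UnhandledRequest(url)
```

Because the pass-through stand-in accepts every URL, placing it anywhere but last would hide the handlers after it, which is the reason order matters in the real list as well.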

There is also a CustomRequestHandler available that can be used to return arbitrary responses to specific urls:

handler = CustomRequestHandler({
  'https://foo.com/bar': {
    'body': b'Some response content',
    'headers': {
        'X-Custom-Header': 'blah',
    }
  }
})

Internally, when render_pdf() is passed an html argument, it handles this using CustomRequestHandler.

If you need to create your own request handler, be aware that since the pdf render script normally runs as a sub-process, the base RequestHandler class contains methods for serialize() and deserialize(), which are used when passing these handlers to the sub-process. See CustomRequestHandler for how this is used.
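As a rough illustration of that serialize/deserialize round trip, the sketch below uses a hypothetical stand-alone class and JSON as the wire format. The real RequestHandler base class, its method signatures and its serialization format may differ, so check CustomRequestHandler before relying on any of these details.

```python
import json


class FixedResponseHandler:
    """Hypothetical handler that returns canned bodies for specific URLs."""

    def __init__(self, responses: dict):
        self.responses = responses

    def serialize(self) -> str:
        # Only plain, JSON-friendly state can cross the process boundary.
        return json.dumps({"responses": self.responses})

    @classmethod
    def deserialize(cls, payload: str) -> "FixedResponseHandler":
        # Rebuild an equivalent handler from the serialized state.
        return cls(**json.loads(payload))


original = FixedResponseHandler({"https://foo.com/bar": "Some response content"})
restored = FixedResponseHandler.deserialize(original.serialize())
```

The restored object is a new instance that carries the same data, so anything the handler needs at request time has to be part of the serialized state rather than held in open connections, closures or module globals.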

General Notes

Process model

The default rendering mode runs pyppeteer in a sub-process, because pyppeteer by default installs signal handlers to clean up chromium processes and therefore expects to run in the main thread. Since all request handling is done in the sub-process, request handlers are serialized in the calling process and deserialized in the sub-process. Be aware that if you create your own request handler subclass, it will execute in a different environment (different process, different Python interpreter) from the calling environment; in particular, global variables will generally not be available.
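The effect of the process boundary on global variables can be demonstrated with the standard library alone. This sketch is unrelated to how the render script is actually launched; CACHE, CHILD_SCRIPT and run_in_subprocess are hypothetical names used only to show that a fresh interpreter sees the data you pass explicitly and nothing else.

```python
import json
import subprocess
import sys

# Module-level state that exists only in the calling process.
CACHE = {"token": "set-in-parent"}

CHILD_SCRIPT = """
import json, sys
payload = json.load(sys.stdin)
# The parent's CACHE global does not exist in this fresh interpreter.
print(json.dumps({"has_cache": "CACHE" in globals(), "payload": payload}))
"""


def run_in_subprocess(payload: dict) -> dict:
    """Run the child script in a new interpreter, passing data via stdin."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


result = run_in_subprocess({"token": CACHE["token"]})
```

The child reports that it has no CACHE global but does receive the token, because the token was passed across explicitly as serialized data.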

A second rendering mode exists for some advanced use cases, where the pyppeteer render script runs in the calling process (the run_as_subproccess argument to render_pdf() controls this). This mode is currently only used to enable testing; avoid it in normal use unless you know what you are doing.

Knowing when the rendered page is ‘finished’

The pdf render script is configured by default to consider a page ‘finished’ when two conditions are met:

when there have been no open network connections for at least 500 ms
and window.__PAGE_RENDERING_FINISHED is truthy (see pageRenderingObserver.ts)

For this second condition, it will wait (by default) up to 10 seconds for this to become true. After that time (if the condition is not met) it will continue to render to PDF regardless.

You have some options in view_as_pdf() and render_pdf() for controlling this second condition:

turn it off: page_done_flag=None
change it to some other variable: page_done_flag="window.MY_FLAG"
change it to an arbitrary JS expression: page_done_flag="window.MY_FLAG === 4.5"
change the maximum time spent waiting on this flag: page_done_timeout_msecs=2000 (wait up to 2 seconds)
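The "wait for the flag, but give up after a timeout and render anyway" behaviour can be sketched as a simple polling loop. This is an illustration in plain Python and not the render script itself: wait_for_page_done, its check callable (standing in for evaluating the JS expression in the page) and the polling interval are all assumptions.

```python
import time
from typing import Callable, Optional


def wait_for_page_done(
    check: Optional[Callable[[], bool]],
    timeout_msecs: int = 10_000,
    poll_msecs: int = 50,
) -> bool:
    """Return True if the flag became truthy, False if the wait timed out."""
    if check is None:
        # Flag checking is turned off: nothing to wait for.
        return True
    deadline = time.monotonic() + timeout_msecs / 1000
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(poll_msecs / 1000)
    # Timed out: the caller carries on and renders the PDF regardless.
    return False


# Usage: a stand-in flag that only becomes truthy on the third poll.
polls = {"count": 0}


def flag_after_three_polls() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


finished = wait_for_page_done(flag_after_three_polls, timeout_msecs=2000, poll_msecs=10)
```

A False result corresponds to the case described above where the timeout expires and rendering continues regardless, so a page that never sets its flag costs the full timeout on every render.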