ABCpdf Classic ASP Reference Guide - HTML / CSS Rendering

HTML / CSS Rendering

Intro

ABCpdf fully supports HTML and CSS.

You can render individual pages of HTML using the AddImageUrl method.

You can page HTML over multiple PDF pages using the AddImageUrl method in combination with the AddImageToChain method.

ABCpdf allows you to treat HTML like any other media, so you can even page your HTML across multiple columns of multiple pages of your PDF.

Method

HTML was designed to specify the meaning of document content and leave the precise rendering and layout up to the browser. PDF was designed to specify the appearance of a document and ignore the meaning of the document content. HTML and PDF are fundamentally different.

HTML is being changed to allow greater control over the appearance of a document, and PDF is being changed to allow the meaning of a document to be better represented. However, the fact that the two specifications are based on diametrically opposing concepts does mean that it can be difficult to convert between the two.

ABCpdf uses the Microsoft Internet Explorer HTML engine to parse and preprocess the HTML for insertion into your PDF. This provides an extremely accurate rendition of the HTML.

However, this method is not suitable for every situation. Interaction with the rendering engine is only available via COM, and while ABCpdf takes great care to isolate itself from COM dependencies, rendering a page of HTML is not as fast as building an equivalent page using native ABCpdf rendering methods.

Cache

ABCpdf holds a cache of recently requested URLs, and pages expire from this cache only after about five minutes.

This considerably speeds up many common operations. However, if you wish to bypass the cache, set the DisableCache parameter to true when you call AddImageUrl or AddImageHtml.

Occasionally, you may find that your page is being cached elsewhere, and this can happen in many places. For example, Windows sometimes caches individual page resources, and proxy servers may cache entire pages.

If you want to be totally sure that your URLs are rendered afresh each time, you need to vary the URL. For example:

http://www.microsoft.com/?dummy=1
http://www.microsoft.com/?dummy=2
http://www.microsoft.com/?dummy=3

These will all render the same page (www.microsoft.com), but because the URL varies, you can be sure that each one is rendered afresh.
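If you generate such URLs in code, the following is a minimal sketch of the idea. It is written in Python for brevity, and your ASP page would do the equivalent in VBScript. The helper name `cache_busting_url` is hypothetical, and the parameter name `dummy` simply follows the example above. Using a random UUID instead of a counter is our own choice, made so that values do not repeat after a server restart. The sketch assumes the target server ignores the extra parameter.

```python
import uuid
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def cache_busting_url(url: str, param: str = "dummy") -> str:
    """Return `url` with a query parameter whose value is unique per call.

    Existing query parameters are kept; any old value of `param` is replaced,
    so the URL never repeats and no cache along the way can match it.
    """
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, uuid.uuid4().hex))  # fresh value on every call
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/",
                       urlencode(query), parts.fragment))
```

Calling `cache_busting_url("http://www.microsoft.com/")` twice gives two different URLs of the form `http://www.microsoft.com/?dummy=<32 hex digits>`.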

Caveats

You can render any page you can supply a URL for.

When you render a page, ABCpdf has to reload it. You, as a client, are viewing the page from your own machine, whereas ABCpdf runs on the server and therefore operates in a different session.

So you cannot generally rely on cookies, session state or form submission in your page. The page must depend only on the URL you supply.

If you have to rely on session state, you could use cookieless sessions (which will give you a URL for your session). Alternatively, you could save the session information under a unique ID, pass the ID via the URL, and pick up the information in your server-side code.
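The second approach can be sketched as follows. The example is in Python for brevity, and everything in it is hypothetical: the function names, the in-memory store, and the `id` query parameter. In a real site, the store would need to be shared by all requests, for example a database table. A per-process dictionary is used here only to keep the sketch self-contained.

```python
import uuid
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical store standing in for a database table shared by all requests.
_saved_state: dict = {}


def save_state_for_render(state: dict) -> str:
    """Copy the session data under a new unique ID and return that ID."""
    key = uuid.uuid4().hex
    _saved_state[key] = dict(state)
    return key


def render_url(base_url: str, key: str) -> str:
    """Build the URL handed to ABCpdf; only the ID travels in the query string."""
    return f"{base_url}?{urlencode({'id': key})}"


def load_state_from_url(url: str) -> dict:
    """Run by the page when ABCpdf requests it: read the ID, fetch the data."""
    key = parse_qs(urlsplit(url).query)["id"][0]
    return _saved_state[key]
```

The page that calls ABCpdf runs `save_state_for_render` and then `render_url`. The page being rendered recovers its data with `load_state_from_url` and never touches the session.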

If you are using authentication, provide a logon name and password in the ABCpdf HTML Options. Problems that appear to be related to SSL or HTTPS connections are often authentication issues, solved simply by providing a user name and password.

Size

Screen resolution is typically 96 dpi, so when you view an HTML page on your monitor, Windows displays it at 96 dpi.

PDF, by contrast, uses 72 units (points) per inch. Because of this disparity, HTML appears larger in PDF documents than it does on screen.

You will need to apply a scale of 72/96 (0.75) to compensate for this if you want both to appear the same size.

For example, if you render a web page with a Width parameter of 800, you will need to set the width of your Rect to 600 (800 × 0.75) if you want both to appear the same size.
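The arithmetic is simple enough to express as a one-line helper. The following is a Python sketch, and the name `rect_width_for` is hypothetical. It assumes the typical 96 dpi screen described above.

```python
SCREEN_DPI = 96  # typical screen resolution
PDF_DPI = 72     # PDF units: 72 points per inch


def rect_width_for(render_width: float) -> float:
    """Width in points to give Doc.Rect so that HTML rendered at
    `render_width` pixels appears the same size as on a 96 dpi screen."""
    return render_width * PDF_DPI / SCREEN_DPI


print(rect_width_for(800))   # 600.0
print(rect_width_for(1024))  # 768.0
```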

DPI

PDF documents are predominantly vector-based. As such, they do not really have a dpi because they are resolution-independent. The only raster-based portions of a PDF are images.

Most elements of HTML (text, lines) are vector-based, so they are resolution-independent.

The resolution at which images in your web pages are rendered is complicated. Suppose you have a 300-pixel-square image referenced by an img tag. If the width of your Doc.Rect is the same as the width you pass to AddImageUrl, the image will be rendered at 72 dpi. However, if you change the ratio between these two values, the image will be scaled, and its resolution will change accordingly.

Likewise, if your 300-pixel-square image is in an img tag with a width and height of 150, its resolution will be doubled.
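Both effects combine multiplicatively. The Python sketch below shows the calculation. The function is hypothetical and not part of ABCpdf. It assumes that the rendered page is scaled uniformly to fit the width of Doc.Rect, and that the image is drawn at the size given in its img tag.

```python
def effective_image_dpi(render_width: float, rect_width: float,
                        image_px: float, displayed_px: float) -> float:
    """Approximate resolution of an HTML image once placed in the PDF.

    render_width: Width value passed to AddImageUrl (CSS pixels)
    rect_width:   width of Doc.Rect (points, 72 per inch)
    image_px:     natural width of the image file (pixels)
    displayed_px: width given in the img tag (CSS pixels)
    """
    points_per_css_px = rect_width / render_width
    image_px_per_css_px = image_px / displayed_px
    # An inch is 72 points; divide it by the points one image pixel occupies.
    return 72 / points_per_css_px * image_px_per_css_px


print(effective_image_dpi(800, 800, 300, 300))  # 72.0  (equal widths)
print(effective_image_dpi(800, 800, 300, 150))  # 144.0 (img tag halves size)
print(effective_image_dpi(800, 600, 300, 300))  # 96.0  (0.75 scale from above)
```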

Breaks

ABCpdf uses a sophisticated set of heuristics to determine where to break pages. For greater control over page breaking, you can use the page-break-before, page-break-after and page-break-inside CSS styles.

You must ensure that the element to which you apply your page breaking style is visible. For example:

<div style="page-break-before:always">&nbsp;</div>

... will break, but ...

<div style="page-break-before:always"></div>

... will not.

Useful Tip: Debugging page break styles.

Sometimes, your page breaks don't work the way you think they should. Because page break styles are invisible, it can be very difficult to tell whether you have applied them correctly. One simple solution is to debug your HTML using a visible style.

For example, when you apply your "page-break-inside: avoid" style, apply a right border style at the same time. That way, you can see exactly where your elements are. If the borders don't appear in the right places, then you know there's something wrong with your HTML.

Snapshot

You may wish to take a snapshot of the current URL.

In many circumstances, you should be able to derive a URL for the current page using the value of the SERVER_NAME, URL and QUERY_STRING Server Variables. You should be able to derive a URL for the previous page using the HTTP_REFERER (sic) Server Variable.
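The following sketch shows how these pieces combine. It is written in Python for brevity, with the server variables passed in as a plain dictionary. The function names are hypothetical. The sketch assumes plain HTTP on the default port, because the text above names only SERVER_NAME, URL and QUERY_STRING. If your site uses HTTPS or a non-standard port, you would need to adjust the scheme and host.

```python
from typing import Optional


def current_page_url(server_vars: dict, scheme: str = "http") -> str:
    """Rebuild the current page's URL from SERVER_NAME, URL and QUERY_STRING."""
    url = f"{scheme}://{server_vars['SERVER_NAME']}{server_vars['URL']}"
    query = server_vars.get("QUERY_STRING", "")
    return f"{url}?{query}" if query else url


def previous_page_url(server_vars: dict) -> Optional[str]:
    """The referring page, if the browser sent one (the header is optional)."""
    return server_vars.get("HTTP_REFERER") or None


env = {"SERVER_NAME": "www.example.com", "URL": "/reports/view.asp",
       "QUERY_STRING": "id=42"}
print(current_page_url(env))  # http://www.example.com/reports/view.asp?id=42
```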

When you perform this kind of operation, be careful not to call ABCpdf recursively. If you do, you will get into a hall-of-mirrors situation, and the software will not be able to return a sensible image.
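One way to avoid recursion is to mark the URL you hand to ABCpdf, and have the page skip PDF generation when it sees that marker. The Python sketch below illustrates this approach. The parameter name `pdfsnapshot` and all of the function names are hypothetical. The `render` callback stands in for whatever code calls ABCpdf.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qs

MARKER = "pdfsnapshot"  # hypothetical guard parameter


def snapshot_target(url: str) -> str:
    """Add the guard parameter to the URL that will be rendered."""
    parts = urlsplit(url)
    query = f"{parts.query}&{MARKER}=1" if parts.query else f"{MARKER}=1"
    return urlunsplit(parts._replace(query=query))


def is_snapshot_request(query_string: str) -> bool:
    """True when this request came from the renderer, not from a user."""
    return MARKER in parse_qs(query_string)


def handle_request(query_string: str, current_url: str, render):
    if is_snapshot_request(query_string):
        return "html"  # serve the page normally; never call the renderer here
    return render(snapshot_target(current_url))
```

Because the marked request takes the first branch, the renderer is invoked at most once for each user request.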