Konva – what I learned building rtx – a rich text editor for HTML5 canvas

So I spent some time building a rich text editor as a side project. Here’s some of what I learned which I’m sharing so you can avoid the bear traps and subsequent pain!

Why do it?

Good question. There are a ton of good WYSIWYG HTML editors out there. They’ve been around since at least the turn of the century and they’re so good that unless you were certifiable you probably wouldn’t want to try to build one today.

But the HTML5 canvas is different. Partly because it’s a niche, and partly because the use cases are mostly specific to diagrams, charts, etc., there is a lot less natural ‘fit’ or ‘need’. But there are some use cases that could benefit, and anyway, it was a challenge.

What did you learn?

Lol – where to start. Actually my starting point is probably worth a mention. I’ve been working with 2D graphics software as part of my day job for a lot of years. It’s not been full-on and it’s not been continuous, but I’ve built up what I consider a solid understanding of most aspects. (Composition operations make my head hurt and my eyeballs spin like fruit machine wheels though!)

Specifically, I’ve had to get fairly deep into how text works, at first on the Windows API (thanks Charles Petzold and Feng Yuan), then in PostScript and PDF, and now the Web.

So in the words of Bryan Mills, “I have a very particular set of skills”. Added to that, I have found Konva to be a very useful library for working with the canvas.

To me, Konva is in the Goldilocks Zone of open source libraries, meaning it is alive, generally stable, not prone to going off in weird and unnecessary directions, and not heading for calamitous breaking changes any time soon.

Because of this I was able to produce rtx as a custom Konva control with only one other lib – Graphemer, which is needed for complex multi-character Unicode support (see blog post JavaScript – working with Emoji’s or the café problem). If you adopt rtx, that means only one dependency other than Konva, and you wouldn’t be looking at it if you didn’t already use Konva, so having only one additional dependency is already a small win.

Since rtx augments a custom shape, we get the benefit of all of Konva’s capabilities such as rotation, scaling, the Konva transformer, grouping, etc.

Enough of the Oscars speech!

Ok, ok, I get it. So here in no specific order are some aspects of what I learned.

It’s all about Rectangles

Underlying what you are about to read below is the principle that text is all about rectangles. Think about this paragraph. Each letter in each word has its own small rectangular space in which it sits. Each word has a rectangle surrounding the letters, each line is a rectangle composed of those word rects, and the paragraph is a collection of line rects.
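To make the nesting concrete, here is a minimal sketch of how character rects could roll up into a word rect. The `unionRects` helper and the `{ x, y, width, height }` shape are my own illustration, not rtx internals.

```javascript
// Hypothetical helper: the smallest rectangle that encloses a list of
// { x, y, width, height } rects. Returns null for an empty list.
function unionRects(rects) {
  if (rects.length === 0) return null;
  let left = Infinity;
  let top = Infinity;
  let right = -Infinity;
  let bottom = -Infinity;
  for (const r of rects) {
    left = Math.min(left, r.x);
    top = Math.min(top, r.y);
    right = Math.max(right, r.x + r.width);
    bottom = Math.max(bottom, r.y + r.height);
  }
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// Three character rects sitting side by side make up one word rect.
const charRects = [
  { x: 0, y: 0, width: 8, height: 16 },
  { x: 8, y: 0, width: 6, height: 16 },
  { x: 14, y: 0, width: 9, height: 16 },
];
const wordRect = unionRects(charRects);
```

The same helper applied to word rects would give a line rect, and applied to line rects would give the paragraph rect.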

All of that position and size information is held and managed within rtx to produce what you see. It can also be output if your needs require it.

Text measuring

This is quite tricky. The canvas API gives us TextMetrics which we can use to get information about the rectangular size of a given string with given font characteristics applied. The tricky parts are that this alone does not give us everything we need – we have to use some CSS techniques to get accurate inter-line spacing, and we have to consider kerning or special positioning of specific character pairs like AW. We also have to apply some performance techniques because font-measuring is expensive.
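As a rough sketch of two of those ideas, caching measurements and deriving a kerning adjustment for a character pair, something like the following could work. `createMeasurer` and `kerningAdjust` are hypothetical names, and this is not how rtx necessarily does it. The only canvas calls used are `ctx.font` and `ctx.measureText`, with the `width`, `actualBoundingBoxAscent` and `actualBoundingBoxDescent` fields of TextMetrics.

```javascript
// Hypothetical measurer: wraps a 2D context and caches results per font + text,
// because repeated calls to measureText are comparatively expensive.
function createMeasurer(ctx) {
  const cache = new Map();
  return function measure(text, font) {
    const key = font + "\u0000" + text;
    let metrics = cache.get(key);
    if (metrics === undefined) {
      ctx.font = font;
      const tm = ctx.measureText(text);
      metrics = {
        width: tm.width,
        ascent: tm.actualBoundingBoxAscent,
        descent: tm.actualBoundingBoxDescent,
      };
      cache.set(key, metrics);
    }
    return metrics;
  };
}

// Kerning for a pair is the difference between measuring the pair together
// and measuring each character on its own (negative means "pull closer").
function kerningAdjust(measure, left, right, font) {
  const pair = measure(left + right, font).width;
  return pair - measure(left, font).width - measure(right, font).width;
}
```

In a browser the context would come from something like `document.createElement("canvas").getContext("2d")`. The inter-line spacing part mentioned above needs a separate CSS-based technique that this sketch does not attempt.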

Out of all that we get around 20 pieces of information including the outer rectangle occupied by each character, and its internal spacing for baseline, ascender and descender distances, etc.

It’s worth noting that whilst the canvas can output text (obviously!), it does not provide any support for underline or strikeout, which are produced by code within rtx.
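For a flavour of what “produced by code” means here, this is a minimal sketch of working out where an underline or strikeout line would go for a run of text. The offsets (an eighth of the font size below the baseline, a quarter above it) and the thickness rule are assumptions I picked for illustration, not the values rtx uses.

```javascript
// Hypothetical geometry for a decoration line under or through a run of text.
// x and baselineY locate the start of the run; width is its measured width.
function decorationLine(kind, x, baselineY, width, fontSize) {
  const offset = kind === "underline" ? fontSize / 8 : -fontSize / 4;
  return {
    x1: x,
    x2: x + width,
    y: baselineY + offset,
    thickness: Math.max(1, fontSize / 16),
  };
}

// Draws the line with standard canvas 2D path calls.
function drawDecoration(ctx, line, color) {
  ctx.beginPath();
  ctx.lineWidth = line.thickness;
  ctx.strokeStyle = color;
  ctx.moveTo(line.x1, line.y);
  ctx.lineTo(line.x2, line.y);
  ctx.stroke();
}
```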

Text layout

You might not believe it but this is the easiest part of the task. We have to read a stream of text and calculate where to place each character, by now knowing its characteristics from the step above. Sure, you have to detect when the line is filled, cope with different font sizes while keeping the baseline solid, handle alignment and justification, and deal with special line processing like bullets or padding. But on the whole it’s an achievable challenge.
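Here is a minimal sketch of the core of that loop: greedy line filling with a shared baseline per line. It assumes the words have already been measured, and it works on whole words where the real thing works per character. Alignment, justification and bullets are left out, and the `layoutWords` name and data shapes are mine.

```javascript
// Hypothetical greedy layout. words: [{ text, width, ascent, descent }].
// Returns lines with each word's x offset and a baseline y per line.
function layoutWords(words, maxWidth, spaceWidth) {
  const lines = [];
  let line = { words: [], width: 0, ascent: 0, descent: 0 };
  for (const w of words) {
    const gap = line.words.length > 0 ? spaceWidth : 0;
    // Line is full: close it and start a new one (a lone over-wide word stays put).
    if (line.words.length > 0 && line.width + gap + w.width > maxWidth) {
      lines.push(line);
      line = { words: [], width: 0, ascent: 0, descent: 0 };
    }
    const x = line.words.length > 0 ? line.width + spaceWidth : 0;
    line.words.push({ text: w.text, x });
    line.width = x + w.width;
    // The tallest ascent and deepest descent on the line set its height.
    line.ascent = Math.max(line.ascent, w.ascent);
    line.descent = Math.max(line.descent, w.descent);
  }
  if (line.words.length > 0) lines.push(line);

  // Second pass: every word on a line shares one baseline, whatever its font size.
  let top = 0;
  for (const l of lines) {
    l.baseline = top + l.ascent;
    top += l.ascent + l.descent;
  }
  return lines;
}
```

Mixing a 24px-ascent word into a line of 12px-ascent words pushes the whole line’s baseline down, which is what keeps the text sitting level.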

Visualizing the rectangles

Keyboard handling – word jumps, text selection, etc

Wow, who knew. We sit there all day bashing the keys on VS Code, WebStorm, Notepad, or whatever other code editors we use, without realizing what’s happening under the hood.

A minute ago we were happy that we had laid out our text giving a reasonably good looking appearance complete with mixed fonts, alignment, etc. Now the mood takes a dive as we realize this isn’t just about getting the output right – we have to handle a caret (no not ‘carrot’, the text insert point thing!) and move it sensibly when the user either touches or clicks the text, or even worse when they use the keyboard arrow keys. And OMG, how to handle that thing where the caret knows its x-position when you key up and down the lines. And then there’s the selection combinations of shift-arrow, and the word jumping combo of control-arrow and even the shift-control-arrow for word selection. And while we’re on that subject, there’s double-click word selection and click-drag text selection too.

Example keyboard caret navigation

Urgh, too many ands!!!

The point is, there’s a hell of a lot going on in the interactions of keyboard, mouse, and the text we see in front of us, and the rtx MVP needs most of it.

To make all that even possible we have to know exactly where every character is located on the rtx control – they each have their own little rectangle in 2D space. And we need to develop an algorithm to cover how the caret should move based on all those events. Ok (I hear you say) – that’s easy, knowing all the character rects means that we can park the caret on the left side of the current character. And right-arrow means switch location to the same position on the rect of the next character. Simple! But, I say, what happens at the end of the line? Try it in your editor and notice that at the end of the line there’s an extra caret stop position. Hmmmm, not quite so simple. In fact it is quite complex. It took me a few go-arounds, and I still think there is room for improvement.
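A minimal sketch of the “extra stop” idea, plus the remembered x-position for up and down arrows, might look like this. The function names and the rect shape are hypothetical, and the real algorithm has more cases than this.

```javascript
// Hypothetical: a line of n character rects has n + 1 caret stops, one at the
// left edge of each character plus one at the right edge of the last character.
function caretStops(lineCharRects) {
  const stops = lineCharRects.map((r) => r.x);
  if (lineCharRects.length > 0) {
    const last = lineCharRects[lineCharRects.length - 1];
    stops.push(last.x + last.width);
  } else {
    stops.push(0); // an empty line still needs somewhere to park the caret
  }
  return stops;
}

// For up/down arrows: find the stop on the target line closest to the x the
// caret "remembers" from where the vertical move started.
function nearestStopIndex(stops, rememberedX) {
  let best = 0;
  for (let i = 1; i < stops.length; i++) {
    if (Math.abs(stops[i] - rememberedX) < Math.abs(stops[best] - rememberedX)) {
      best = i;
    }
  }
  return best;
}
```

The remembered x should only be updated on horizontal moves and clicks, not on vertical ones, otherwise the caret drifts as it passes through short lines.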

And as for mouse click-drag selection! That one involves all kinds of edge cases for when the click is not ‘on’ the text itself. Click above the text, click below the text, click outside on the left or right – they all have different connotations. And we take it all for granted because we are all so familiar with the modern editor experience.
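One possible policy for those off-text clicks is to clamp: a click above the text goes to the first line, below goes to the last, and left or right of a line snaps to its first or last caret stop. That is only an assumption for this sketch, and other editors (and possibly rtx) choose differently. The `{ bottom, stops }` line shape is made up for illustration.

```javascript
// Hypothetical hit test. lines: [{ bottom, stops }] in top-to-bottom order,
// where stops are the caret x positions on that line.
function hitTestCaret(lines, px, py) {
  // Pick the first line whose bottom edge is below the pointer; clicks above
  // the text land on line 0, clicks below it land on the last line.
  let lineIndex = lines.length - 1;
  for (let i = 0; i < lines.length; i++) {
    if (py < lines[i].bottom) {
      lineIndex = i;
      break;
    }
  }
  // Snap to the closest caret stop; clicks left or right of the line clamp
  // naturally to the first or last stop.
  const stops = lines[lineIndex].stops;
  let stopIndex = 0;
  for (let i = 1; i < stops.length; i++) {
    if (Math.abs(stops[i] - px) < Math.abs(stops[stopIndex] - px)) {
      stopIndex = i;
    }
  }
  return { lineIndex, stopIndex };
}
```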


Cool – we got something working for arrow key and mouse navigation. We can move the caret where it needs to go. Phew. Oh, hang on, what about when the user wants to select a character or word so they can cut/copy/paste/change font or size or color? Now we need to handle the visual indication for the current selection, and react to whatever the user wants to do.

Actually selection was pretty easy after setting up the mechanism for handling caret positioning and movement. There are a few spooky cases like when the user hits the left or right arrow keys the caret ‘comes out’ at the same side of the selection. But overall it is straightforward.
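On my reading of that spooky case, a plain left arrow collapses the selection to its left end and a plain right arrow to its right end, rather than moving a further character. A minimal sketch, assuming a selection stored as `anchor` and `focus` character indexes (my naming, not rtx’s):

```javascript
// Hypothetical handling of an unmodified ArrowLeft / ArrowRight key press.
// sel: { anchor, focus } as caret indexes into the text; returns the new state.
function applyArrowKey(sel, key, textLength) {
  const start = Math.min(sel.anchor, sel.focus);
  const end = Math.max(sel.anchor, sel.focus);
  let caret;
  if (start !== end) {
    // Something is selected: the caret comes out at the matching edge.
    caret = key === "ArrowLeft" ? start : end;
  } else if (key === "ArrowLeft") {
    caret = Math.max(0, start - 1);
  } else {
    caret = Math.min(textLength, end + 1);
  }
  return { anchor: caret, focus: caret };
}
```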

The caret flash

Search the Konva docs for ‘flash’ and even Algolia can’t help you. But the standard metaphor of a flashing caret is expected as it draws our eye to its location easily in a potential screen full of dense text.

The caret at work

We know that the HTML5 canvas does not have an object model. If canvas had a DOM like HTML we could just flip the visibility of an absolutely-positioned pipe or similar, and we would have a flashing caret. But it doesn’t, and even though Konva gives us a kind of object DOM, there’s still an issue because using Konva, we can’t easily isolate a part of the canvas and update only that piece. This can be done in pure JS & canvas by setting a clipping area around the caret, but I’m using Konva!

Here’s the nub – the essence of the flashing mechanism is simply a setTimeout function that shows or hides a line – nothing specifically challenging there. But each time that happens, because it changes the custom shape’s visual contents, it forces ALL of the Konva shapes to redraw. If you think about it this is sensible for Konva to do – the shape that just changed could be overlapping other shapes.

If the caret flashes once a second, that means you get an automatic redraw of the entire canvas contents once a second, on top of whatever your app actually needs. Rtx has an optimization in place for this that stops the entire refresh, but again it’s something I want to revisit.
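The timer side of this is small enough to sketch. The version below uses `setInterval` rather than `setTimeout` for brevity and hands the actual drawing to a `redraw` callback, so it says nothing about how rtx avoids the full refresh. One option I would consider, purely as an assumption, is to have the callback redraw only a dedicated caret layer, since each Konva layer is backed by its own canvas.

```javascript
// Hypothetical blinker: flips a visible flag on a timer and tells the caller
// to redraw. What "redraw" touches is up to the caller.
function createCaretBlinker(redraw, intervalMs = 500) {
  let visible = true;
  let timer = null;

  function toggle() {
    visible = !visible;
    redraw(visible);
  }

  return {
    toggle,
    isVisible: () => visible,
    start() {
      if (timer === null) timer = setInterval(toggle, intervalMs);
    },
    stop() {
      if (timer !== null) {
        clearInterval(timer);
        timer = null;
      }
      // Leave the caret showing whenever blinking is paused.
      visible = true;
      redraw(true);
    },
  };
}
```

Calling `stop()` and then `start()` on every key press is a common trick to keep the caret solid while the user is typing.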


Every modern UI needs an undo-redo mechanism. In previous development projects I’ve concentrated on getting something visual out on the canvas and only later gone through an expensive refactor to add this feature. We can all add a shape on the stage and make it draggable or scalable. To make it undo-redo-able needs more introspection.

In a nutshell, every time you write code to add a shape to the stage you need that code to be in a function into which you pass the parameters of the shape. And when you call it you also save those characteristics to the undo-stack. Then you add buttons and code to ‘play back’ the parameters you stored in each step saved in the stack. You now have the essence of an undo process, and going back up the stack of changes gives you the redo process.

The details of how you execute each step of the stack might be more complex, but it works.
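A minimal sketch of that idea, using a plain array as a stand-in for the Konva stage so the example stays self-contained. The `createHistory` and `addShapeCommand` names are made up for illustration.

```javascript
// Hypothetical undo/redo history: each step knows how to apply and reverse itself.
function createHistory() {
  const undoStack = [];
  const redoStack = [];
  return {
    run(command) {
      command.execute();
      undoStack.push(command);
      redoStack.length = 0; // a fresh action invalidates anything redo-able
    },
    undo() {
      const command = undoStack.pop();
      if (!command) return false;
      command.undo();
      redoStack.push(command);
      return true;
    },
    redo() {
      const command = redoStack.pop();
      if (!command) return false;
      command.execute();
      undoStack.push(command);
      return true;
    },
  };
}

// Stand-in for the stage: adding a shape is a function of its parameters,
// and those same parameters are what the history step holds on to.
const shapes = [];
function addShapeCommand(params) {
  return {
    execute() { shapes.push(params); },
    undo() { shapes.splice(shapes.indexOf(params), 1); },
  };
}

const editHistory = createHistory();
editHistory.run(addShapeCommand({ type: "rect", x: 10, y: 10 }));
editHistory.run(addShapeCommand({ type: "circle", x: 50, y: 50 }));
editHistory.undo(); // the circle is removed
editHistory.redo(); // and put back
```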

The challenge for a text editor that treats every character as a discrete entity that can have different characteristics from its neighbors is the sheer volume of information that can be generated in the undo stack. I initially assumed I would use array manipulation – keeping each character object in an array representing the characters from the beginning to the end of the document. But I quickly switched to a chain-based approach when I realized that insert and deletion operations, though entirely possible, were becoming increasingly complicated to handle purely with an array.

Illustration of character chain for a word before and after deletion of 2 characters.

The chain-based approach also allows for erased characters to stay in-place within the character stream with the remapping of pointers to previous and next characters giving good performance and reducing the overhead of inserting and removing array entries in large volume. When deleting or inserting, all we need to store in the undo stack are the previous and next chars and their pointers before and after the change which is a massive reduction in volume and therefore memory consumption.
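The chain idea can be sketched as a doubly linked list where deleting a run only rewires the two neighbours, and the removed characters keep their own pointers so an undo can splice them straight back in. All names here are hypothetical, and the sketch splits the string by code point, whereas rtx uses Graphemer to split by grapheme.

```javascript
// Hypothetical character chain with a sentinel head node.
function buildChain(text) {
  const head = { ch: null, prev: null, next: null };
  let tail = head;
  for (const ch of text) {
    const node = { ch, prev: tail, next: null };
    tail.next = node;
    tail = node;
  }
  return head;
}

// Unlink the run first..last. The removed nodes are left untouched; only the
// neighbours are rewired. The returned record is all the undo stack needs.
function unlinkRange(first, last) {
  const record = { first, last, before: first.prev, after: last.next };
  record.before.next = record.after; // before always exists thanks to the sentinel
  if (record.after) record.after.prev = record.before;
  return record;
}

// Undo: point the neighbours back at the removed run.
function relinkRange(record) {
  record.before.next = record.first;
  if (record.after) record.after.prev = record.last;
}

function chainToString(head) {
  let out = "";
  for (let node = head.next; node; node = node.next) out += node.ch;
  return out;
}
```

Deleting two characters from a long document therefore stores four references in the undo record, however many characters the document holds.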


Talking of performance, internally there are many small calculations happening each time the canvas is refreshed. Because of Konva’s excellent caching feature, we can cache the rtx in which case it gets a big performance boost, but will need to be fully refreshed on any zoom operation.


The design goal for rtx was to produce a control that works by augmenting a Konva custom shape. Rich text editors are commonly seen with a toolbar full of buttons and dropdowns that follow the usual editor-control paradigm. In my case, though I provide a primitive version of this in the demo code, I do not provide a production-quality version of that toolbar. Instead I assume that the developer using the rtx control will expect to have full power over how the rtx is driven without suffering my opinion of how a toolbar should look.

For changes to the text and related items, the rtx is controlled by an events API. The developer’s code sends it an event message and it applies the change as requested. We might be adding bold to a selection, aligning the text, or pasting some. The mechanism is always the same, though the contents of the message we send vary depending on the exact action we want to apply.

A sample API message for enabling bold on the current character is shown below. The configuration object contains the name of the item being targeted, the attribute, and the new value.

    {
      name: "font",
      attr: "fontWeight",
      value: "bold"
    }

In terms of feedback, the developer can register callbacks to receive information about the last event, and the characteristics of the current character or text selection. The latter is useful for modifying the information in the UI, such as the font size or name indicators.
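To show the shape of that message-in, callback-out loop, here is a toy stand-in for the control. Everything in it (`createEditorStub`, `send`, `onFeedback`, the feedback payload) is invented for illustration and is not the rtx API. Only the `name` / `attr` / `value` message fields come from the sample above.

```javascript
// Hypothetical stand-in for the control: accepts event messages and reports
// the resulting state to any registered callbacks.
function createEditorStub() {
  const style = { fontWeight: "normal", fontSize: 16 };
  const listeners = [];
  return {
    onFeedback(callback) {
      listeners.push(callback);
    },
    send(message) {
      if (message.name === "font") {
        style[message.attr] = message.value;
      }
      // Tell the host UI what just happened and what the current style is,
      // so it can update its own bold button, font size box, and so on.
      for (const callback of listeners) {
        callback({ lastEvent: message, style: { ...style } });
      }
    },
  };
}

const editorStub = createEditorStub();
let lastFeedback = null;
editorStub.onFeedback((info) => { lastFeedback = info; });
editorStub.send({ name: "font", attr: "fontWeight", value: "bold" });
```

The appeal of this arrangement is that the toolbar is just another sender of messages, so the host app can swap in whatever UI it likes.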

Example of selection feedback shown in the browser.

Negatives / gaps

There are a few points to note regarding issues:

  • Accessibility – screen readers and similar devices have been developed to read HTML structure and sound out what they see. They can be very sophisticated and understand the subtlety and flow of the HTML document. The rtx is entirely pixel-based, and has no structure outside of the canvas. Work must be done to enable accessibility.
  • Compactness of transport – currently a text definition for use by rtx requires a bulky JSON object describing the various ‘styles’ in use in the text, and the text itself as a string. The styles are referenced onto their appropriate characters by index position of the character in the string. This needs improvement to make the rtx a better web citizen that is able to share data in a more ‘standard’ and readable HTML markup with CSS classes for compactness. The current approach is usable but cumbersome.
  • Cut & paste – work needs to be done on cut & paste from office documents and other sources that will require intervention to sanitize the text information within. Cut & paste within the rtx itself also needs review.


I’ve explained the motivation for producing a rich text, WYSIWYG editor for the canvas using Konva. We’ve looked at some of the internals, the hurdles and their solutions. There’s a ton more lower level details and techniques that I shall forgo for fear of the audience falling asleep. I want to extend my thanks to all the folks on the Konva Discord who threw in suggestions and encouragement along the way. It’s not at v1.0 yet, but it’s getting nearer.

Thanks for reading.

VW. March 2023

Photo credit: Bruno Martins and unsplash.com
