Is Microsoft Word LegalTech?



One of the advantages of word processing is the ability to conveniently think as you type, to format, correct, and crucially, handle large amounts of information. Questions of whether that has detracted from quality of thought aside, could you imagine being a corporate lawyer using a typewriter? Maybe the generation of lawyers before mine can. I have never done corporate work, but imagining the memos I have written being done on a typewriter instead is enough to bring on a long, lonely shudder. With that, we have a basic definition of what LegalTech involves: any technology which improves, or is assumed to improve, the practice of law.

The upshot of this convenience is a massive uptick in the production of information, and, in turn, in the amount of processing and understanding that such information demands (the two processes being distinct things: we all know the feeling of having read an entire page without comprehending anything). In short: more work. And too much information. Information, especially of the noisy variety, as opposed to signals, is like sugar, and everyone knows the effects of taking too much sugar – cavities are the least of your worries. We have not yet adopted the tools to deal with the second-order effects of information overload.

That was what that first wave of LegalTech wrought on legal practice, even as word processing brought unprecedented marvels to numerous text-based industries. Microsoft Word evidently is a form of LegalTech, but we need even more LegalTech. What form should it take? I don’t have a citation for this, but I would hazard a guess that the length of contracts, especially corporate ones, has gone up exponentially. If I find any study demonstrating this, I’ll be sure to post again. The same can be said for any legal product, whether it is intermediate – produced for internal consumption in the course of putting together the final product – or the final product that goes out to the client. And the same for e-mails. And the same for meetings. Presentations. Conferences. Journal articles. Cases. The list goes on. Copy-paste and tons of reading will not cut it.

Before going on, one thing needs to be clear: the relationship between complexity, complicatedness, quantity and quality of legal products can be unclear. Complexity and complicatedness are different things. My own take is that complexity is where one observes emergent behaviour: when something is greater than the sum of its often simple parts. Areas of law can be complex; simple changes to rules may give rise to totally unpredictable behaviour or changes. But the law can also be complicated; procedural rules can be labyrinthine and complicated without being complex – if you know the rules, you know the rules, to put things colloquially. Predictability is a function of how familiar you are with the rules. How these relate to quantity and quality is yet another issue. Lawyers, for the most part, treasure conciseness (or concision, depending on which camp you belong to) and precision. But you can be concise and precise in a complicated way, without getting to the heart of a complex issue. A complex contract may be an ingenious arrangement that avails parties of possibilities within a regulatory environment that would otherwise not have been exploited, or one that reveals its ingenuity as its simple elements adapt elegantly to myriad situations. But that is not the same as a complicated contract – a complicated contract just needs time and effort to work through. It’s more likely to evoke tedium.

Why is this important? Long and complicated tasks take up time; it’s a question of allocating scarce resources. Most legal work is long, or complicated, or both. If you treat lawyers as mechanical, expendable units of production, this is not a problem. If you treat lawyers as fundamentally human, it’s a problem. A huge one. And one the legal industry, and society, is continuing to pay dues for. What resources remain for the often simple but complex questions? Like how do we increase access to justice?

LegalTech promises to be able to solve this, including increasing access to justice[I]. Coupling the narrow intelligence of AI, the wiles of fiercely intelligent programmers and computer scientists, with the knowledge and experience of well-versed legal practitioners is a recipe for innovation and improving legal practice. I have my doubts about whether this is completely the case. There are pockets of this happening, and certainly, economic figures to show both the tremendous potential of the industry, and the profits currently accruing. What I really want to see is for the public space to be revolutionised. I don’t mean an evolution in legal practice; I mean a revolution.

You could say that I want to be put out of a job to some extent, at least as far as lawyering as we know it now is concerned. The practice of law needs to advance in that manner, not only to realise its potential for the public good, but also because not advancing so would severely detract from the public good.

As it is in law, so it is more generally in life: we have not developed effective means to live well in a digital and information-overloaded world[II]. I don’t mean well in the self-help, hokey, airy-fairy, go to Bali, conduct geographical arbitrage, practise some yoga and live free sort of way – not everyone can or wants to, amongst other important socioeconomic reasons. I mean it in the way that thinkers, both academically inclined and not, have thought about for millennia. Aristotle called it eudaimonia, and he had a lot to say about it. A lot of moral philosophers have a lot to say about it too. Explaining it would be beyond the purpose of this post, but let’s just say recent movements, exemplified by Elizabeth Gilbert’s Eat, Pray, Love, discussions on work-life balance, Jenny Odell’s How To Do Nothing, and on ‘unplugging’ are all attempts at the question. Actually, The Good Place, and its resident moral philosopher’s YouTube features are pretty good places to start.

Again, there are pockets of this: DocAssemble, A2J, Law for Good, LegalHackers and its constituent chapters – all very important and crucial grassroots initiatives, and people thinking about how best to allow other people to navigate the legal and information landscape. I think we need to go even further. If you believe in LegalTech, I am preaching to the choir. We need to go beyond the choir; we need to go beyond the hymns it is singing.

It’s been almost 5 years since I read Richard Susskind’s Tomorrow’s Lawyers, and I felt this sheer sense of falling off a cliff, and behind a curve that I hadn’t even started on (I had just finished law school). I still think the intuition behind that is true. In the spirit of that, I thought that it’s time to do my part for increasing awareness: natural legal language processing.

More specifically, I thought it would be interesting to cover rudimentary contract comparison using Natural Language Processing (NLP) techniques (one way of putting it: AI applied to human language). Imagine that you had any number of new contractual templates to compare to a baseline that you or your organisation already has. They may differ in the following ways: i) the clauses might be arranged in different themes; ii) the same obligation, while identical in substance, might be split across different clauses; iii) the new clauses might cover more ground (cover a greater range of risks, for instance); or iv) the new clauses might cover even less ground. This is not exhaustive.

Several tools are available to you. Carry out a document comparison using Microsoft Word; see it in tracked changes. Line up the clauses, and compare. Achieve familiarity with all the clauses.

Using NLP, what might be useful is this:

  • Step 1: Line up your ‘baseline’.
  • Step 2: Line up all the clauses, from each new template, that are most similar in meaning to each clause in your baseline.
  • Step 3: Note if they fall below a certain similarity threshold.
  • Step 4: Keep a list of the next most similar clauses, so you have a starting point for which language to start looking at.
  • Step 5: Generate a Microsoft Word document, with a table allowing you to compare all of these.
  • Step 6: Profit. I mean in an access to justice way, as naive as that may sound.
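
Steps 1 to 4 can be sketched roughly as follows. This is only a minimal sketch under loud assumptions: plain word counts stand in for real embeddings from a language model, and the function names (`embed`, `cosine`, `match_clauses`), the 0.5 threshold and the sample clauses are all made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Stand-in 'embedding': lowercase word counts (an assumption, not a language model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match_clauses(baseline, template, threshold=0.5, top_n=2):
    """For each baseline clause, rank the template clauses by similarity."""
    if not template:
        raise ValueError("template must contain at least one clause")
    template_vecs = [(clause, embed(clause)) for clause in template]
    report = []
    for clause in baseline:                                   # Step 1: walk the baseline
        vec = embed(clause)
        ranked = sorted(((cosine(vec, tv), c) for c, tv in template_vecs), reverse=True)
        best_score, best_clause = ranked[0]                   # Step 2: most similar clause
        report.append({
            "baseline": clause,
            "best_match": best_clause,
            "score": best_score,
            "below_threshold": best_score < threshold,        # Step 3: flag weak matches
            "runners_up": [c for _, c in ranked[1:top_n + 1]],  # Step 4: next most similar
        })
    return report


# Hypothetical clauses, invented for this example.
baseline = [
    "The Supplier shall indemnify the Customer against all losses.",
    "This Agreement is governed by the laws of Singapore.",
]
template = [
    "This Agreement shall be governed by the laws of Singapore.",
    "The Supplier shall indemnify the Customer against any losses and claims.",
    "Each party shall keep the other party's information confidential.",
]

report = match_clauses(baseline, template)
for row in report:
    print(round(row["score"], 2), "|", row["baseline"], "->", row["best_match"])
```

Swapping `embed` for vectors produced by an actual language model would leave the rest of the matching logic unchanged, which is the main reason for keeping the two apart in this sketch.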

Of course, there is more work to be done. The tables might need to be adjusted. Or the headers are not quite right. This would still have taken out a lot of the initial grunt-work.

The heavy-lifting here is done by language modelling. Language modelling is the concept that computers can use various indicators to understand how language is used, or is meant to be used. Some programs might look at the occurrence of a particular character from the end of a sentence. Others may look at the occurrence of words surrounding a particular word. Others may be given the same problem as you used to get in primary school: the cloze passage. And since it’s a computer, instead of getting it to read just left-to-right, have it read right-to-left as well. It’s very interesting stuff, and has made huge strides in recent years. For a plain language article about it, check this out.
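
To make one of those indicators concrete, here is a toy sketch of counting the words that surround a particular word. It is not how any particular language model works; the function name `context_counts`, the window size of 2 and the sample sentences are assumptions for illustration only.

```python
import re
from collections import Counter


def context_counts(text, target, window=2):
    """Count the words appearing within `window` words either side of `target`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            start = max(0, i - window)
            # Words just before and just after the target, excluding the target itself.
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


# Hypothetical sample text, invented for this example.
text = (
    "The tenant shall pay the rent monthly. "
    "The tenant shall keep the premises clean. "
    "The landlord shall repair the roof."
)
counts = context_counts(text, "tenant")
print(counts.most_common(3))
```

Counts like these are one crude way for a program to learn that ‘tenant’ tends to keep company with ‘shall’, without anyone telling it what a tenant is.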

A computer can then ‘measure’ a given word, sentence, or document using several indicators. The computer gives whatever building block you’re using a numeric measure for each of these indicators (together, these numbers are called ‘embeddings’). And using yet another number, you can summarise how close the measures for one chunk of text are to those for another chunk of text (this number is called cosine similarity).
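
As a rough illustration, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below uses tiny three-number ‘embeddings’ that I have made up; real embeddings have hundreds of dimensions and come from a language model, and the variable names here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors of numbers."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Convention chosen here: an all-zero vector is similar to nothing.
    return dot / (norm_a * norm_b)


# Made-up 'embeddings' for three clauses (assumptions for illustration only).
indemnity_a = [0.9, 0.1, 0.3]
indemnity_b = [0.8, 0.2, 0.4]
governing_law = [0.1, 0.9, 0.2]

print(cosine_similarity(indemnity_a, indemnity_b))    # close to 1: similar direction
print(cosine_similarity(indemnity_a, governing_law))  # much lower: different direction
```

A score near 1 means the two chunks of text point in nearly the same direction in this numeric space; a score near 0 means they have little in common.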

If that interests you, all you need is: a .csv (comma-separated values) file containing granular information about each clause across your baselines and templates, e.g. clause number, what header it comes under, the clause title, the substance, etc.; Python; and time to just experiment.
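
For a sense of what such a file might look like, here is a minimal sketch that reads one using Python’s built-in `csv` module. The column names (`source`, `clause_number`, `header`, `title`, `text`) and the sample rows are my own assumptions for illustration; use whatever layout suits your templates.

```python
import csv
import io
from collections import defaultdict

# Hypothetical file contents, held in a string so the example is self-contained.
SAMPLE_CSV = """source,clause_number,header,title,text
baseline,1.1,General,Governing law,This Agreement is governed by the laws of Singapore.
template_a,9.2,Miscellaneous,Applicable law,This Agreement shall be governed by Singapore law.
template_a,4.1,Liability,Indemnity,"The Supplier shall indemnify the Customer against losses, claims and costs."
"""


def load_clauses(file_obj):
    """Group clause rows by the document ('source') they come from."""
    clauses = defaultdict(list)
    for row in csv.DictReader(file_obj):
        clauses[row["source"]].append(row)
    return dict(clauses)


clauses_by_source = load_clauses(io.StringIO(SAMPLE_CSV))
for source, rows in clauses_by_source.items():
    print(source, [row["title"] for row in rows])
```

With a real file, `open("clauses.csv", newline="", encoding="utf-8")` (the file name is hypothetical) would take the place of `io.StringIO(SAMPLE_CSV)`. Note that a clause containing commas needs to be wrapped in double quotes, as in the last sample row.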

I will post the Python notebook and sample file in the next post. In the meantime, please do get in touch if you would like to discuss this, or let me know of any errors!

[updated 9 April 2020]

Thank you to Alex Woon, Rene Jeyaraj, Luke Wu, Jerrold Soh, and Stella Cao for comments, insightful discussions, and/or reading drafts of this post. Errors here are mine, and mine only.

References

I Both surprisingly and unsurprisingly, access to justice is hallowed but significantly under-resourced compared to its corporate cousins. This is lamentable, but solvable.
II Talking about means, let’s talk about financial ones. It’s interesting that the sophistication of financial instruments far outstrips the sophistication of our understanding as to how best to live harmoniously with the world we actually live in. For instance, I have wondered about the question of what value the stock market reflects. If the value is not based on some real, tangible thing that we can take from the earth, but we are given more money based on our adding some ‘value’ to the economy, should we be able to extract more from the Earth? Another way to frame the question is to replace ‘sophistication’ with either ‘complexity’ or ‘complicatedness’, and to think about how that changes the contours of the answers. As yet another side-note, it is interesting to observe who benefits most from increasing the sophistication of financial instruments, and who benefits the least from it, at the moment.