Why I love the law (and lawyers)
It has always interested me how the job of a lawyer mirrors that of a software engineer. On the face of it they seem entirely different professions: the sharply dressed, serious explainers of a complex system of rules on the one hand, and the ill-groomed, carefree implementers of a complex system of rules on the other.
But set aside the stereotypes we currently apply to each group and poke under the surface, and the two are actually very similar! Software, like the law, runs on a set of rules and commands, and each has its own language that is largely impenetrable to outsiders. To excel in either you need a keen mind that can cut to the heart of a difficult problem and come up with a solution that satisfies a number of requirements. Logic, the ability to follow long trains of interrelated ideas and sheer bloody-minded determination are all key characteristics.
So when it comes to replicating the mind of a lawyer, sometimes it takes someone who is close but distant to do the best job.
We have been trying to understand the legal process for years at Lexical Labs, and in this whitepaper we hope to explain how we see the basic tasks of the legal trade when it comes to contract review and negotiation, and how they can be modelled and automated.
As a summary, there are several key parts to reviewing a contract:
Let’s get into the details of how computers can be made to do these things:
When a lawyer reads a contract, they read it at multiple levels.
It is interesting to understand how a computer which does not know anything about a contract could understand similarity between clauses that have no actual textual overlap. This is achieved by the mechanism of clustering. Clustering groups similar items together, e.g. if you had a bucket of balls you could group them together by size or colour or material. Tennis balls are of a certain size and tend to be green, and have a felt surface – but you can also get giant novelty tennis balls, and purple tennis balls. Footballs and volleyballs tend to be of a similar size, and are differentiated by their stitching.
The more balls you have, the more detail you can go into: if you only have 10 balls you might just care about the kind of ball you have, but if you have 100,000 balls you may well be more discerning about the qualities of said balls, for example grouping tennis balls into ones designed for clay vs grass courts.
Similarly, you can group sentences together using different methods. A simple attempt might be to group sentences whose words overlap. More complex approaches might still use words but also look at their grammatical role (whether they are subjects or objects of a sentence, or part of a subordinate clause). However, this won’t help when synonyms are used. Like a human unsure of a word, a computer can consult a thesaurus (e.g. WordNet) to see whether two words are similar or not.
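The simple word-overlap grouping described above can be sketched in a few lines. This is only an illustration: the function names, the 0.4 threshold and the example sentences are assumptions made for the sketch, and a real system would use a more robust clustering method.

```python
import re

def tokens(sentence):
    """Lower-case a sentence and return the set of words it contains."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def jaccard(a, b):
    """Share of words two sentences have in common, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def group_by_overlap(sentences, threshold=0.4):
    """Greedy grouping: a sentence joins the first group whose first
    member it overlaps with enough, otherwise it starts a new group.
    The threshold is an arbitrary value chosen for illustration."""
    groups = []
    for s in sentences:
        for g in groups:
            if jaccard(s, g[0]) >= threshold:
                g.append(s)
                break
        else:
            groups.append([s])
    return groups

# Hypothetical example sentences
sentences = [
    "The Supplier shall keep all information confidential.",
    "The Supplier shall keep all data confidential.",
    "Payment is due within 30 days of invoice.",
]
groups = group_by_overlap(sentences)
```

Here the two confidentiality sentences share most of their words and land in one group, while the payment sentence starts its own. Swap "confidential" for "secret" in one of them and the overlap drops, which is exactly the synonym problem noted above.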
More recent technological breakthroughs such as Word Vectors collapse a word into a long series of numbers. Words that have similar numbers are considered to be closely related, e.g. “king” and “queen” have similar numbers, as do “son” and “daughter”.
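To make "similar numbers" concrete, a common way to compare two word vectors is cosine similarity. The sketch below uses tiny three-number vectors that are invented purely for illustration; real word vectors have hundreds of dimensions and are learned from large amounts of text.

```python
import math

# Toy vectors, made up for this example (NOT real trained word vectors)
TOY_VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "invoice": [0.1, 0.0, 0.9],
}

def cosine(u, v):
    """Cosine similarity: close to 1.0 when two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

royal = cosine(TOY_VECTORS["king"], TOY_VECTORS["queen"])
unrelated = cosine(TOY_VECTORS["king"], TOY_VECTORS["invoice"])
```

With these made-up numbers, `royal` comes out far higher than `unrelated`, which is the behaviour you would hope for from real vectors too.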
Using this approach you could group the sentences from a number of similar documents into groups of similar clauses. You could even attempt to name these groups by using the most distinctive words in each group (the idea being that if a word is used in one group but not another it should be a good signifier of the group’s lexical intent).
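One rough way to name groups by their most distinctive words is to keep only the words that appear in one group and in no other, then take the most frequent of those. The helper names and example groups below are hypothetical, and a weighting scheme such as TF-IDF would be a more forgiving alternative.

```python
import re
from collections import Counter

def words(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z]+", text.lower())

def name_groups(groups, top_n=2):
    """For each group, return the most frequent words that occur in
    that group and in no other group."""
    counts = [Counter(w for s in g for w in words(s)) for g in groups]
    names = []
    for i, c in enumerate(counts):
        others = set()
        for j, other in enumerate(counts):
            if j != i:
                others.update(other)
        unique = [(w, n) for w, n in c.items() if w not in others]
        # Most frequent first; ties broken alphabetically for stable output
        unique.sort(key=lambda wn: (-wn[1], wn[0]))
        names.append([w for w, _ in unique[:top_n]])
    return names

# Hypothetical groups of already-clustered sentences
groups = [
    ["The Supplier shall keep all information confidential.",
     "Each party shall keep the data confidential."],
    ["The Customer shall pay each invoice within 30 days.",
     "Payment of the invoice is due in 30 days."],
]
names = name_groups(groups)
```

Common words such as "the" and "shall" appear in both groups and are dropped, leaving words like "confidential" and "invoice" as candidate labels.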
Then when you see new documents you could look at each sentence and see what group it most resembles and thus “classify” a document.
If you have some legal nous, you might be inclined to apply some of it to the problem directly. Instead of having a computer make groups, you could make them. You could find a sentence in a contract that is a key area of import, and repeat the process in a few other documents. You could then tell your computer that these sentences all achieve a similar purpose and tell it that if it sees another one, it should apply some lawyer-assigned label.
This is known as “training a classifier”. Many kinds of classifiers exist and do things with increasing levels of subtlety. A basic one might just see if the words are similar without regard for the order (similar to our basic clustering algorithm above). The state of the art now looks at the order of each word in relation to every other word and the part of speech of the words, and uses Word Vectors to calculate similarity of meaning.
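As an illustration of the basic, order-ignoring kind of classifier, here is a minimal naive Bayes sketch trained on a handful of lawyer-labelled sentences. The class name, labels and training sentences are all invented for the example; it is not a description of any particular product's classifier, and a real one would need far more examples.

```python
import math
import re
from collections import Counter, defaultdict

def words(text):
    """Split text into lower-case words."""
    return re.findall(r"[a-z]+", text.lower())

class BagOfWordsClassifier:
    """Tiny multinomial naive Bayes classifier; word order is ignored."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of examples
        self.vocab = set()

    def train(self, examples):
        """examples is a list of (sentence, lawyer-assigned label) pairs."""
        for text, label in examples:
            self.label_counts[label] += 1
            for w in words(text):
                self.word_counts[label][w] += 1
                self.vocab.add(w)

    def predict(self, text):
        """Return the label whose training sentences best match the words."""
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words(text):
                # Add-one smoothing so unseen words do not zero the score
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical lawyer-labelled training sentences
clf = BagOfWordsClassifier()
clf.train([
    ("The Supplier shall keep all information confidential.", "confidentiality"),
    ("Neither party shall disclose confidential information.", "confidentiality"),
    ("The Customer shall pay each invoice within 30 days.", "payment"),
    ("Payment of fees is due on receipt of invoice.", "payment"),
])
label = clf.predict("The Recipient must not disclose any confidential information.")
```

The new sentence shares "disclose", "confidential" and "information" with the confidentiality examples, so it receives that label even though it matches none of them exactly.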
Making it real
So let’s say we now comprehend what clauses are in a contract. We can now ask whether each one is in our favour or not. Certain types of clauses may be unfavourable just by their existence and we may want them removed. Other clauses we might always want included. Most of the time though there will be some level of tolerance, and it must be determined whether a clause is favourable or not.
We may have trained our classifiers to find clauses only when they are unfavourable to us. This is a less than optimal method for transferability. A clause may be intolerable only on one kind of contract, or with one category of vendors, so if we have trained a classifier to trigger only when the clause is unfavourable, we won’t know whether the clause exists at all. So it’s better (and easier!) to train a classifier to find a kind of clause generally, then specify in another way whether it is actually acceptable or not.
A clause may be unacceptable for a number of reasons: it may be well drafted but a time period in it may be too long or a liability cap too high. It might be fine as an obligation for a vendor but not for a customer. It could be good (or bad) if there is another clause also in the contract. It could be a problem depending upon the scope of a referenced defined term. It may happen that it is agreeable as a right, but not an obligation.
The solution then needs to be matched to the particular issue you have. A time period can be extracted from the contract and you can set up acceptable bounds for it. Through Natural Language Processing techniques you can pick up who is giving the obligation and relate them to the parties of the agreement (whether it is by party name or the role of the party). A legal tech expert (who straddles the worlds of legal knowledge and AI technology) can then easily put these rules together behind a set of issues that describe the requirements of the user.
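The time-period rule mentioned above might look something like the following sketch. The regular expression, the rough unit conversions and the function names are assumptions for illustration; it only handles periods written with digits (e.g. "90 days"), not ones spelled out in words.

```python
import re

# Rough conversions to days; an assumption for illustration only
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}

PERIOD = re.compile(r"(\d+)\s*(day|week|month|year)s?", re.IGNORECASE)

def extract_periods_in_days(clause):
    """Find every 'N days/weeks/months/years' in a clause, as days."""
    return [int(n) * DAYS_PER_UNIT[unit.lower()]
            for n, unit in PERIOD.findall(clause)]

def check_period(clause, max_days):
    """Compare the longest period in the clause with an acceptable bound."""
    periods = extract_periods_in_days(clause)
    if not periods:
        return "no period found"
    return "acceptable" if max(periods) <= max_days else "too long"

# Hypothetical policy: payment terms of up to 60 days are acceptable
result = check_period("Payment is due within 90 days of invoice.", max_days=60)
```

The same pattern (extract a value, compare it with configured bounds) would apply to amounts such as liability caps, with a different extraction rule.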
Finally, once you understand that there is a problem with a part of a contract, how do you then amend the contract in a mutually acceptable way? There are a few ways to do this, from simple process-based methods to approaches using modern AI.
The simplest – and least technical – way is to have a pre-prepared set of favoured language and acceptable fallback positions. When an issue is triggered, you can then explain what the precise problem is and provide text that the user would then need to customise for the contract.
A slightly more intelligent route would be to make this text configurable so that key terms or amounts could be manually or automatically synchronised so they match the contract.
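A minimal sketch of configurable fallback text, using Python's standard `string.Template`. The clause wording and the placeholder names (`party`, `days`) are hypothetical; in practice the values would be extracted from the contract under review.

```python
from string import Template

# Hypothetical fallback wording with placeholders for contract-specific terms
FALLBACK = Template(
    "The $party shall pay all undisputed invoices within $days days of receipt."
)

def fill_fallback(template, contract_terms):
    """Substitute terms taken from the contract into the fallback text.
    Raises KeyError if a placeholder has no value, so gaps are not missed."""
    return template.substitute(contract_terms)

suggestion = fill_fallback(FALLBACK, {"party": "Customer", "days": "45"})
```

Using `substitute` rather than `safe_substitute` is a deliberate choice here: a missing term fails loudly instead of leaving a stray placeholder in text that may be sent to a counterparty.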
Given a bank of already negotiated contracts, you could mine this data to identify commonly negotiated clauses; if a clause resembles the original text from a negotiated contract, the negotiated outcome can then be offered as a suggestion.
More recent technological developments, such as generative models like GPT from OpenAI, let you train a model on all of your previously worded clauses and ask it to generate a given position. It’s not perfect for generating complete legalese yet (it is much better at free-wheeling prose), but it is a glimpse into the future.
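Mining a contract bank for suggestions could be sketched as a nearest-match lookup. The example below uses the standard library's `difflib.SequenceMatcher` for character-level similarity; the bank contents, the 0.8 threshold and the function name are assumptions, and a production system would more likely compare clauses by meaning rather than by characters.

```python
from difflib import SequenceMatcher

# Hypothetical bank of (original clause, negotiated outcome) pairs
BANK = [
    ("Payment is due within 90 days of invoice.",
     "Payment is due within 30 days of invoice."),
    ("The Supplier's liability is unlimited.",
     "The Supplier's liability is capped at the fees paid."),
]

def suggest(clause, bank, min_similarity=0.8):
    """Return the negotiated outcome of the most similar original clause,
    or None if nothing in the bank is similar enough."""
    best_outcome, best_score = None, 0.0
    for original, outcome in bank:
        score = SequenceMatcher(None, clause.lower(), original.lower()).ratio()
        if score > best_score:
            best_outcome, best_score = outcome, score
    return best_outcome if best_score >= min_similarity else None

suggestion = suggest("Payment is due within 60 days of invoice.", BANK)
```

A clause that resembles nothing in the bank returns `None`, so the user is not shown a suggestion drawn from an unrelated negotiation.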
As you can see, reviewing a contract is quite an involved process, but there are approaches to emulate each step, and combining them could give you a lawyer in a bottle. Of course, a lawyer isn’t just someone who reviews contracts, but someone who also understands what the general position of a client is, what is special about a specific circumstance, and has a wealth of experience to guide and protect their clients from pitfalls.
We can use an understanding of the contract review process to do the heavy lifting on much of the standard legal work, and allow lawyers (and other people in an organisation who regularly deal with contracts) to achieve their ends in a fraction of the time.