Text and data mining

TDM, copyright and AI

Source data - copyright

Source of the data
Unless you produced the data yourself, it will be protected by copyright law, so you need either permission from the rights holders or your use must fall within a copyright exception. This assumes that you are not using Public Domain materials. (See sidebar.)
Under UK law there is a copyright exception that enables TDM for non-commercial research.

The key points are:

  • If you have lawful access to a work you may copy it to carry out TDM.
  • Contract terms aimed at preventing researchers copying in order to perform text and data mining are unenforceable.
  • The research may be funded by a commercial partner but the purpose must not be commercial.
  • The copied work(s) must not be made available to anyone else without authorisation from the rights holder. 
  • You must cite the source of the data, unless making this acknowledgement would be impractical.

N.B. The term non-commercial applies to the purpose – not the person or institution carrying out the project. 
There were plans to extend the exception to allow TDM for any purpose, but these plans were withdrawn in March 2023, and the UK government is struggling to pass further legislation. Watch this space!