Aerial view of cargo ship and cargo containers in harbour (source: iStock by Getty Images)

Our research and development work

While most of our research and development (R&D) work is confidential and covered by NDAs with our clients, some of it has been released into the public domain for society’s benefit.

A Compendium of Data Sources for Data Science, Machine Learning, and Artificial Intelligence

AUTHORS: Paul Bilokon (Imperial College London, Thalesians Marine Ltd), Oleksandr Bilokon (Thalesians Marine Ltd), Saeed Amen (Cuemacro, Thalesians Ltd)

DATE: 12 September, 2023

ABSTRACT: Recent advances in data science, machine learning, and artificial intelligence, such as the emergence of large language models, are leading to an increasing demand for data that can be processed by such models. While data sources are application-specific, and it is impossible to produce an exhaustive list of such data sources, it seems that a comprehensive, rather than complete, list would still benefit data scientists and machine learning experts of all levels of seniority. The goal of this publication is to provide just such an (inevitably incomplete) list – or compendium – of data sources across multiple areas of application, including finance and economics, legal (laws and regulations), life sciences (medicine and drug discovery), news sentiment and social media, retail and ecommerce, satellite imagery, shipping and logistics, and sports.

Keywords: Artificial intelligence, AI, machine learning, ML, data science, datasets, data, alternative data

Available on SSRN:

Oleksandr Bilokon (left) paints the eyes on the lions as part of an opening ceremony in Shanghai.

Interested in collaborating?

Are you interested in collaborating? Please contact