Could you tell us a bit more about that data mapping project? What's the scope of it? What does it cover, and so on?
It rather morphed throughout the project. At the beginning, we were aiming to look at just a particular region – the North Sea region, and its context within Europe, for content and application creation. We started by looking at the film, television and games sectors but we were also trying to look at all of the creative industries as much as possible. It grew as we realised that the big challenge is that there's a lack of concurrency between the physical and the digital representations of companies. So it wasn't as easy as saying, well, just give me the phone book of companies and businesses for the North Sea Region, Europe. It doesn't exist. We had to move to a more global scale to be able to look at the European area we're interested in. So the scope ended up being to generate a global map of content creators and, more specifically, developers of video games and immersive media.
And how did you approach the whole data mapping exercise? Where did you even start?
Well, we knew it was going to be a challenge. It was initially planned as a desk research project to search for and identify companies, their locations and their activities. But it became immediately apparent that we really needed a more ‘big data’ approach. So where we started was with building some data scraping tools to collect product information from existing storefronts, including for example the games industry.
We did initially start looking at film and television as well. However, there was a challenge there around the closed market for that data, and it wasn't easily accessible. The large data providers like IMDb, for example, stopped making their data readily available except for large fees, upwards of tens of thousands of Euros. Their definition of academic research appears to be extremely limited. We were also trying to create a more standardised approach that we could use on multiple different sites to get a more complete view rather than just looking at one single data provider.
This sounds like an absolutely huge task. Beyond new charges for data, there must have been some real challenges along the way?
Yes, there were several challenges. We went through about three iterations of the data collection system, initially thinking that it was possible to build a nice and simple system. Then we realized we needed to add more components to it and had to rebuild it several times. We ended up building a sort of containerized system on the back of Docker, which gives you a sort of scalable architecture for database and web application design. We were able to turn that into a scalable system. The major problem we were coming across was how to collect the data and, at the same time, where to get our data from.
We did contact a range of ‘storefront’ providers in addition to IMDb, such as the immersive system and headset companies Oculus and HTC, and providers of stores for games and VR (virtual reality). We asked them for permission to collect data from their sites. Basically, we had to go through each provider's end user license agreement on their website to check whether they explicitly tell you no, you can't collect things. So that was the first challenge to overcome: was it okay to even collect their data?
Once we got past that, we built a custom scraper for each website that would allow us to collect the data from it and put it into a massive database that we could then organize a bit better. The next challenge was, in effect, turning products into the actual companies behind them - real world companies that actually existed.
There was a secondary challenge there, looking at how we aligned company names and any registration information with real life companies and their company registrations in the relevant countries. We made use of an open source project called Open Corporates, who are based in London, UK. They very helpfully provided us with a portal to a worldwide equivalent of UK Companies House. This meant we were connected up with data for countries such as Denmark and Germany through the various official state company registration systems, with a view to accessing data.
However, Germany was one of the ‘problem countries’ because they don't appear to provide their data openly, unlike other countries in the North Sea Region. For example, in the UK you can type in a company number or name online and the system will point you towards their registration and information like their accounts. In Germany, we believe due to GDPR, you can't do that. So we do actually have a big blank area on our map that says, we know these companies exist in Germany, but we don't know very much about them, because the state does not provide that level of transparency on companies.
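To make the alignment step concrete, here is a minimal sketch of one way storefront names could be matched against registry records. The interview does not say how the matching was actually done; the suffix list, the 0.85 threshold, the `registry` records and the function names are all assumptions made for illustration, and the fuzzy comparison uses Python's standard `difflib` rather than anything from Open Corporates.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal-form words to ignore when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "gmbh", "aps", "ab", "bv", "inc", "llc"}

def normalize(name):
    """Lowercase, replace punctuation with spaces, drop trailing legal-form words."""
    words = re.sub(r"[^a-z0-9 ]+", " ", name.lower()).split()
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)

def best_match(storefront_name, registry, threshold=0.85):
    """Return (record, score) for the closest registry name, or None.

    The threshold is an assumed cut-off, not a value from the project.
    """
    target = normalize(storefront_name)
    best = None
    for record in registry:
        score = SequenceMatcher(None, target, normalize(record["name"])).ratio()
        if score >= threshold and (best is None or score > best[1]):
            best = (record, score)
    return best

# Made-up registry records standing in for official registration data.
registry = [
    {"name": "Example Games Ltd.", "jurisdiction": "gb", "number": "00000001"},
    {"name": "Beispiel Studio GmbH", "jurisdiction": "de", "number": "HRB 0000"},
]

match = best_match("EXAMPLE GAMES", registry)
no_match = best_match("Completely Different", registry)
```

A match found this way is still only a best guess, so in practice each candidate would need checking against the registration details before being treated as the same company.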
It sounds absolutely amazing that Open Corporates, the company you mentioned in London, helped to facilitate access to data. It sounds absolutely brilliant, and like it should have been a hugely expensive service. But presumably, they were quite amenable to the project.
Yes, they operate on a share alike policy. So if you're using their data for research, for the sort of betterment of everyone else, then as long as you share what data you've got with them, they're happy to provide access. Otherwise, it would have cost us many, many thousands of Euros to purchase the data sets that we were using.
Fast forward, as it were: how can the data now be seen, and in what different formats?
Creating a large database has been useful in the sense that we know the data is there. But how do we actually make that into something useful? We were trying to look at new approaches to making these datasets publicly available. At the time when we were looking at this project, the new kid on the block was mobile virtual reality. And we were looking at how we could turn this massively complicated data into something that, with very little understanding, you could actually manipulate to gain some kind of business insight into how companies work together, how big companies are, when they existed, how many products they produce and so on, without this information being overtly available, such as on websites.
So we actually built a virtual reality visualization tool. In fact we built several iterations of it, as we always do, where you could query that database live. You could say, for example, show me all of the companies that worked on this game, and it would give you ‘planetary style blocks’ with wires running between them to say, all these worked together. You could then grab one of those companies and say, show me who else this company has worked with.
You can take an adventure through their relations and see who else they've worked with. You can also start to build a sort of web of collaboration and really start to see which companies are almost like the linchpin of the industry, because everybody needs them.
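The kind of query described here can be sketched as a small collaboration graph. This is an illustration under stated assumptions, not the project's implementation: the `credits` data and company names are made up, and counting distinct collaborators is just one simple stand-in for spotting a "linchpin" company.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical credits: product title -> companies that worked on it.
credits = {
    "Game A": ["Studio One", "Port House", "Audio Lab"],
    "Game B": ["Studio Two", "Port House"],
    "Game C": ["Studio Three", "Port House", "Audio Lab"],
}

def build_graph(credits):
    """Link every pair of companies credited on the same product."""
    graph = defaultdict(set)
    for companies in credits.values():
        for a, b in combinations(sorted(set(companies)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def collaborators(graph, company):
    """'Show me who else this company has worked with.'"""
    return sorted(graph.get(company, set()))

def rank_by_connections(graph):
    """Order companies by number of distinct collaborators, most first."""
    return sorted(graph, key=lambda c: (-len(graph[c]), c))

graph = build_graph(credits)
partners = collaborators(graph, "Port House")
ranking = rank_by_connections(graph)
```

In this toy data the hypothetical "Port House" comes out on top because every studio depends on it, which is the pattern the visualization was meant to make visible at a glance.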
Some of our preliminary work didn't make it into the final mapping. But in that preliminary work we found, for example, lighting companies in the film industry that don't really get thought about very often as being a crux of the industry. But absolutely every company relied on them. So if there was a problem with that company, you'd see a massive problem in the industry.
In games, challenges and bottlenecks were more around localization and platform service providers who move things from a Mac to a Windows PC, or from a PC to Linux. You realize that actually they're highly specialist. They do that one important job and they're a limited pool.
We tried lots of different visualizations to try and make it accessible. And we took it to various fairs, to let people have a go with it and see what they thought.
Especially because it was visual and interactive, it received an excellent reception from all types of people all the way from HR to technicians as they were easily able to see and understand all kinds of relationships between companies and identify opportunities.
So you must have been creating a really positive impression and getting feedback on such a helpful tool that you can use in real time to explore creative, technical and business relationships.
Yes, many people were really positive about it, because, even though it was very data driven, it didn't feel like you were looking at a spreadsheet. Even though we technically could have, you know, reported exactly the same information in a spreadsheet and said, there you go, that's the data. Instead, by visualizing it, they were able to consume the information much quicker and understand the relationship. So even if I'd given you a list and said, well, this company works with these others, and this one works with these others, by visualizing it, we were able to show almost instantly the connections and see the value between the various companies.
So, do you think it would be possible to make this experience dynamic in terms of data gathering and populating the application and how people interact with that data?
Yes, that would be a next stage. If we were going to develop it further, we would look at how you could introduce dynamic querying. So a way in which a person who is viewing it can say: “Well, actually, my focus isn't on products. My focus is on countries, for example. I'm interested to see where in the country these things are.” There's certainly some scope to add that bolt-on to the existing design. The database is built in a way that is scalable. So the advantage we have is that, because we've got this sort of centralized data store and then these apps which view it, theoretically you can design whatever app you like on top of that data and view it in a different way. So you could have the everyday person's view of it, but then you could have a professional tool that is used in a much more formal setting that would still access the same data.
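As a rough illustration of the "one data store, many views" idea, the sketch below puts two different views on top of a single database. Everything in it is assumed for the example: the in-memory SQLite database, the table and column names, and the sample rows. The interview does not describe the project's actual schema or database engine.

```python
import sqlite3

# A single shared store; schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE credit (product TEXT, company_id INTEGER REFERENCES company(id));
""")
conn.executemany("INSERT INTO company VALUES (?, ?, ?)", [
    (1, "Studio One", "DK"), (2, "Port House", "GB"), (3, "Audio Lab", "GB"),
])
conn.executemany("INSERT INTO credit VALUES (?, ?)", [
    ("Game A", 1), ("Game A", 2), ("Game B", 2), ("Game B", 3),
])

def companies_by_product(conn):
    """Product-focused view: which companies worked on each product."""
    rows = conn.execute(
        "SELECT credit.product, company.name FROM credit "
        "JOIN company ON company.id = credit.company_id "
        "ORDER BY credit.product, company.name")
    view = {}
    for product, name in rows:
        view.setdefault(product, []).append(name)
    return view

def companies_by_country(conn):
    """Country-focused view over the same data: company count per country."""
    rows = conn.execute(
        "SELECT country, COUNT(*) FROM company GROUP BY country ORDER BY country")
    return dict(rows)

by_product = companies_by_product(conn)
by_country = companies_by_country(conn)
```

Both functions read the same tables, so a new way of looking at the data means adding another query or app rather than collecting or storing the data again.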
What kind of resources would that take? Are we talking a vast team?
That depends on how many things you wanted to build. But it's something that's achievable with a small to medium sized team, I'd say, based on this project.
And just coming full circle, again, to the transnational point that we were discussing at the start, what has been your impression? You mentioned it depends on the sector. But broadly, looking at top level factors for sectors such as immersive, screen or games, you were talking about certain differences between those in the kind of scope for people to collaborate, particularly transnationally. From looking at that data, have you drawn any conclusions or insights so far?
From what we've seen, the interesting thing about the games industry and immersive is that they're generally working using the same tools and tend not to ‘play nicely’ across borders or even collaborate within a given country very often. So when we did some initial scoping, looking at the film industry, for example, you had these large productions where there might be 50 companies working on a single production, whereas with a game, even the biggest game might only have five companies working on it. And they would tend to be very local and not across borders, preferring to be geographically close. And I suspect some of that is because of the security requirements of working together in these industries. They generally work behind the restrictions of non-disclosure agreements (NDAs). They keep everything close to their chest and, for understandable commercial reasons, don't really want to share very much. And so, because of that culture, they limit the companies they work with to trusted partners based on relationships established over time, or they just buy companies with relevant skills.
You've also hit on another interesting point. If you contrast screen, particularly transnational co-productions, with, say, games, it's very much financially driven. If those companies didn't have access to, say, tax incentives, and maybe the co-production match finance that they can get because they're working together, they wouldn't necessarily have the same incentive to work together. Whereas I don't think there are similar regimes in place for, let's say, games or for immersive. And this is the basis for the culture of getting companies to engage with applications like service delivery, training and marketing.
And just on some final points, were you getting the impression from this particular data gathering project that you were seeing, let's say, SMEs who are not in the entertainment space? Were you getting the impression that any of these companies are servicing non-entertainment sectors?
I think there's a kind of tokenism that happens with new technology. So with, for example, immersive and VR, we saw quite a lot of companies that were commissioning work, and then therefore being listed as producers of it. So you might, for example, see IKEA, the retailer and furniture manufacturer, shown as a producer of multiple apps, including an augmented reality app. That is them actually engaging with the new technology. Whereas you might find other companies who have a game or an app just as a kind of an added bonus, it's not really part of their core business. But it means that they end up being listed as part of the immersive industry.
I think one of the other challenges we came across is that a lot of immersive and VR and games development is still very freelance and not formalised as company production. So one of the challenges for mapping is that you don't actually get 100% of the companies. We might have a list of 100,000 companies. But by the time we've validated that, you might be somewhere closer to 40,000.
So a key challenge is that there is no link between virtual companies and real companies. For example, I can start five domains tomorrow, on the web, whatever they are, and I'm not obliged to link them to my company at all. I don't have to put any kind of official tag or number or anything on there to show that that is actually a trading representation of my business. That makes the exercise of mapping those businesses even harder, because even if you had the best search engine in the world, you have no way of saying that a given website definitely belongs to a given company. So the data on it doesn't necessarily help. You can best guess, which is all you can do with these kinds of big data exercises. But because there's no global digital registration of companies, tomorrow I could start a new game company and never register it. And it could be on Steam or whatever store we like. And it could make a lot of money. And yet it would be missed in these kinds of mapping exercises, unless you ignore the fact that it doesn't have a real life presence.
So actually, trying to gather data in this field is really challenging, because there is no connection between the legal world and the virtual world, just as you say. Any final thoughts?
The main thing I would really be interested in seeing is, say, a European initiative, because it's unlikely we'd get a global initiative, to actually have a standardized company identification number that would be required to be displayed on websites that belong to that company.
That would facilitate so many other things, especially being able to understand different sectors and truly map them and determine what their needs are going forward and where there are some opportunities as well.