Who is embracing new technology like Large Language Models, and why does it matter?

November 16, 2023

OpenAI, the company that created the user interface (UI), i.e. ChatGPT, for everyday people to use Large Language Models (LLMs), took another step last week in disrupting the world by introducing a new version of ChatGPT with which you can build your own ChatGPT, without any coding. They also added the possibility to use other foundation models, like DALL·E 3, to create visuals. OpenAI has done a marvellous job democratising the somewhat mysterious Data Science and Machine Learning field so that more people can create and enjoy the productivity leap it can bring.

There’s a consensus forming that this is a bigger change for the world than the internet, and the speed of development is crazy: this time last year there was no publicly available ChatGPT. The first version was released on November 30th 2022, and it became the fastest-growing consumer application in history. LLMs themselves have been around longer: Google has Bard, Meta has Llama, and there are open-source ones too. But this development isn’t only about the performance of the models, which is improving rapidly; it’s more about the ease of use.

As a Data Scientist myself, I have acted as the gatekeeper, selecting and using models in different applications. I also choose what data is used to train them and where the models are used. I have made sure that the data represents, at least in the ballpark, the phenomenon we want to model, used the necessary disclaimers, and used appropriate models to do the job. With years of expertise, I have been able to create a process to make sure that the outcome is both desired and representative. Democratisation of all this is great, but it seems to me that we have handed everyone powerful tools with no manual on how to use them wisely and safely. People are used to hand saws, and now they have a powerful electric one, and it can be hazardous.

I have seen applications where one can create nudes from anyone's picture, I have seen the pope wearing a rap outfit, I have seen child pornography created with pictures taken from Instagram, and I have seen real-feeling scams made with audio. I have seen adorable things too, like ChatGPT telling stories to one’s daughter while illustrating them, and useful ones, like Swedish news translated into English automatically. There are a lot of great things one can accomplish, but the ease of use enables some serious illicit uses as well.

Should everyone embrace this development? My fear is that, once again, a small group of people will shape the future. They are most likely highly educated people, people who are more into technology: risk-takers, early adopters. That, of course, is what usually happens with new technology. Let's say we accept that. But should we also accept, comply with, and reinforce the worldview this small bunch represents?

Mind you, this worldview is the basis of the foundation models, i.e. the base LLMs. For example, the training data for the GPT-3.5 model, according to ChatGPT, “comes from a wide range of sources on the internet”: mostly English content, mostly from the Western world. I fear, and partly know, that it is a world of tech bros, a winner-takes-all world. But it is the world we are living in, they say. We are, but do we want to? I don’t, and I’m actively making an effort to change it.

I’m also all for automating stuff, and I hate mundane tasks. I was keen to let ChatGPT write this piece as well, but I wanted to create something new for my fellow humans, not let the model repeat what has been written before. If and when we start to generate most of our text, and do it without explicitly defining the tone or words used, we repeat what is the basis of those foundation models, i.e. this world we are living in. Slowly, most text becomes average; most text sounds the same and carries the same biases and inclusivity problems that we hold now (or more precisely, held up to April 2023 with the new GPT-4 model).

Another question is: who has the resources to embrace this new technology? Well-educated people, well-equipped and well-performing companies, techies, engineering-minded people who like to build stuff. People who have time and are not doing chores. They have the resources to do things for fun and to test out the limits and possibilities. I’d say that most of the companies and individuals who probably need the productivity leap the most lack those resources. We could create services for non-native speakers in Finland with automatic translations, build accessible services for the elderly, and automate some of doctors' transcription work. These are just some examples. For these to happen, we need non-technical leaders to understand the possibilities and allocate the resources. We need people and services to educate the late bloomers in tech. We need a nationwide plan to include all citizens from various backgrounds.

So I call on all the leaders in the public, health, social, and other sectors to seek resources and funding, come up with real value-adding use cases, and talk with people who are (at least almost) up to speed with the development. I’ll chip in and promise to share ideas with you pro bono.

To the current government: please hear that society needs the resources and funding for this massive change right now, if we want to keep the values we foster in the Nordics, and if we want to steer this development in a direction that serves all citizens, not only those who are already well off.

Try to push the limits of your imagination with GPT or other models and UIs, but you’ll soon notice how costly the queries are. Then it is time to let the experts step in and create a robust pipeline of data, (open-source) models, and outputs, as we are accustomed to, without compromising any of your data or business logic.

Eikku