Three different things there. The openness and transparency of models is a bit of a sliding scale, and fully open ones (open in all aspects) are rare - there are a lot of close tries and even more marketing “open”. We’ve had a couple of threads where these have been discussed. But that’s just the models.

A system or a service that uses a model (offers it for use) is another thing, and those are the ones that may gather data (a good model can sit inside a bad system). A model doesn’t actively learn at the moment of use; the system gathers data, and that data may later be used to train a new version of the model. That is also why a “bad model in a good system” is possible: you can set up your own system (own server, own computer) where a black-box model is sandboxed. Due to the black-box nature of the models (they are so complex), it’s possible - although I haven’t come across reports of it - that a model could try to use system weaknesses (or just normal features) to send data to a third party.
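To make the model/system distinction concrete, here is a minimal sketch (all names are hypothetical, not from any real service) of why the *system* is where data gathering lives: the model itself is a stateless function, while the service around it decides what gets logged and kept for later training.

```python
# Hypothetical sketch: a frozen model wrapped by a logging service.
from datetime import datetime, timezone

def model_generate(prompt: str) -> str:
    """Stand-in for a frozen model: no state, nothing is learned here."""
    return f"(model output for: {prompt!r})"

class Service:
    """The 'system' around the model - this is where data gathering happens."""
    def __init__(self, log_path: str = "prompts.log"):
        self.log_path = log_path

    def handle(self, prompt: str) -> str:
        # The service, not the model, records the prompt; logs like this
        # can later be used to train a new version of the model.
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(f"{datetime.now(timezone.utc).isoformat()}\t{prompt}\n")
        return model_generate(prompt)

if __name__ == "__main__":
    print(Service().handle("Is my data safe?"))
```

Swap the logging for nothing (or run the whole thing on an isolated machine) and you have the “bad model in a good system” setup: the same model, but the surroundings no longer leak anything.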
Not a straight answer, I know. I wouldn’t recommend any one AI service as such - it depends on your risk profile and on the data you feed the service: do you trust it with that data? Running your own system (like Ollama, or maybe Alpaca) may be fairly safe, but the catch is the limited resources and model size - is what you can run locally sufficient, measured against your security and privacy needs? For a truly secure combination, you’d need a trusted and tested good model, an auditable and safe system, and a set of security features to monitor and limit the model, the data, the user and the network.
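For the “own system” route, here is a minimal sketch of querying a locally hosted model through Ollama’s REST API. It assumes Ollama is running on its default port 11434 and that a model (I use “llama3” here as a placeholder, any pulled model works) is available. The point is that the prompt only ever travels to localhost.

```python
# Minimal local-inference sketch against Ollama's /api/generate endpoint.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def ask_local_model(prompt: str, model: str = "llama3") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one complete JSON response instead of a stream
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask_local_model("Why does local inference help privacy?"))
```

Nothing here talks to a third party, which is exactly the trade-off mentioned above: full control over the data path, but you’re limited to whatever model your hardware can actually run.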