Should the server be unresponsive, please notify me by clicking the button below. You may leave a note if you wish, but it is optional. Notes are public, so don't post any sensitive information!
If it's a regular issue that just needs a reboot, don't worry about explaining. Just click Report, then kick back and wait, please. Issues reported between 21:00 and 06:00 will most likely be resolved as soon as I'm awake and checking my email, so please be patient outside office hours.
If you look below and see an issue that has been reported but is still waiting on an action, please feel free either to wait for me to reboot the server or to hit Report again to add another notification.
For anything else, please feel free to email me directly.
All times are in GMT
IMPORTANT! PLEASE READ THE RENDER SETTINGS INFORMATION BELOW!
Jobs suddenly failing on 9090? Please check you're using the newest version of DS, since it seems the 3090's won't render anything with DS 4.12. If you can't update for now, please use 7070 instead.
Iray Section Planes
These will cause the server to drop to the CPU regardless of what's in the scene, so please don't use them. The script reboots the server whenever it drops, so the server basically gets stuck in a reboot loop forever.
The server has seen an uptick in use over the last few weeks, and I think we need to start being a bit more mindful about render settings and use in general.
The Iray server software is a fickle beast and it's easy to get into a situation where the server is loaded up with 20+ jobs, the cache file hits 100 GB+ and it just grinds to a halt with no other option than to delete the cache (and subsequent jobs) and start over.
I'm hyper aware that this can cause annoyance - so before people start cancelling out of frustration, I'd like to see what we can do collectively to try and smooth these bumps out.
There are a number of users who submit jobs and either remove the render limits or set them impossibly high.
For those unaware, Iray uses these settings to determine when to stop rendering. It stops as soon as it reaches the iteration count, the time limit or the quality target, whichever of those are set.
So if you have a large image set to 15000 iterations with no time limit and no quality target, that image will render until the iteration limit is met, regardless of whether that takes 20 minutes or 20 hours. Obviously, this is an easy way to bog the server down, particularly when it's busy.
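To make the stop rule concrete, here is a minimal Python sketch. It is only an illustration, not Iray's actual code: the function and parameter names are made up, quality is treated as a simple number for the sake of the example, and `None` stands in for a limit that has been removed or disabled.

```python
def should_stop(iterations_done, seconds_elapsed, quality_reached,
                max_iterations=None, max_seconds=None, target_quality=None):
    """Return True once any enabled limit is reached.

    Hypothetical helper for illustration only; a limit of None means "disabled".
    """
    if max_iterations is not None and iterations_done >= max_iterations:
        return True
    if max_seconds is not None and seconds_elapsed >= max_seconds:
        return True
    if target_quality is not None and quality_reached >= target_quality:
        return True
    return False


# Only an iteration limit is set, so 20 hours of rendering changes nothing
# until the 15000th iteration arrives.
twenty_hours = 20 * 60 * 60
print(should_stop(14999, twenty_hours, 0.0, max_iterations=15000))  # False
print(should_stop(15000, twenty_hours, 0.0, max_iterations=15000))  # True

# With a one-hour time limit as well, the same job stops at 3600 seconds.
print(should_stop(2000, 3600, 0.0, max_iterations=15000, max_seconds=3600))  # True
```

With every limit left at `None` the function never returns True, which is the "renders forever" case.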
This also causes a potential problem if the job drops from GPU to CPU rendering: the render becomes glacial and essentially locks the server up until I either notice or am alerted. This then creates a backlog, and the cache issue comes into play as well.
So can I please suggest we look to be more efficient with these. I appreciate some shots will need more samples, but huge renders with quality and time settings disabled are a little overkill. Ideally I'd like to see everyone set a time limit of an hour (so 3600 seconds), a quality of 1 and 15000 iterations. If these settings produce noisy renders, then adjust, but pay attention to the results: if the render stopped after 20 mins because it hit the iteration limit, then just increase that. Please don't just max everything out.
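If it helps, the suggested baseline can be written down as a quick self-check before uploading. This is just a sketch: `review_settings` is a hypothetical helper, not part of DS or Iray, and it uses `None` to mean a limit has been disabled. It only flags missing limits, since raising a limit after looking at the results is fine.

```python
# Suggested baseline from the text: one hour, quality 1, 15000 iterations.
SUGGESTED_SECONDS = 3600
SUGGESTED_QUALITY = 1
SUGGESTED_ITERATIONS = 15000


def review_settings(max_seconds, quality, max_iterations):
    """Return a list of warnings for one job (hypothetical helper).

    None means the corresponding limit is disabled.
    """
    warnings = []
    if max_seconds is None:
        warnings.append(f"no time limit; suggested is {SUGGESTED_SECONDS} seconds")
    if quality is None:
        warnings.append(f"quality disabled; suggested is {SUGGESTED_QUALITY}")
    if max_iterations is None:
        warnings.append(f"no iteration limit; suggested is {SUGGESTED_ITERATIONS}")
    if len(warnings) == 3:
        warnings.append("nothing will ever stop this render")
    return warnings


# The baseline itself passes cleanly.
print(review_settings(3600, 1, 15000))  # []

# The problem case: only an iteration limit, everything else disabled.
for warning in review_settings(None, None, 15000):
    print(warning)
```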
Also please consider your render resolution - a lot are now being rendered at 3000+ pixels, which is ok if you keep the other settings manageable. I do it myself, and then resample down, but now it just seems everything is getting set to max: huge scenes, huge resolutions.. and it's impacting everyone else.
Hopefully keeping the server moving a bit more will mean I can reboot and clear the cache when it's quiet as well.
Please be mindful of others - some people use it more than others, so if you see 1 job behind your 5+, please consider just setting your priority to 101+ to let others go first (but again, bear in mind that if you load up a lot of renders, they could be culled in a cache clear out).
News and updates!
June 2nd 2021
It's regularly quite busy now, and there are a few things I'm seeing which I think might help if we can try to be a little more proactive.
The cache is an ongoing issue. The cache file is a single file which stores the job data, and it gets slower the larger it gets. So I always try to keep it under 100 GB and clear the cache when the server is empty. If the server bogs down, then I must clear it regardless, along with any jobs that are waiting on there (sorry about that..)
One thing I see a lot of is people uploading a job to a server that is busy, then uploading the same (or other jobs) to the second server. This causes a few problems - firstly it doubles the cache data on the other server (since you're uploading the same data twice) plus you're then using both servers at the same time.
So can I ask, if you upload a job to one server, please stick to that server for all your jobs (at least in that 'session'). Deleting the job and moving it over doesn't help the cache size, it merely inflates both needlessly. Sticking to one server also gives people a fair chance that if one is busy, they can use the other.. rather than being stuck behind the same users who have jobs on the other one as well.
Please check the server queue before upload too. If you see 20 jobs on the server, please consider just uploading one or two of your own until the queue settles. If you see a job with CLEARING CACHE that means the cache file is getting big and I will clear it soon, so upload at your own risk. The server gets slower the bigger the cache file, so it will only compound the issue if you then submit jobs which make the cache bigger, and then even slower.
As the server gets busier, I'm politely asking people to not upload in big batches. If everyone uploads 10 images, it gets messy real fast.. so please consider fewer than that to keep everything moving.
If your renders are taking hours instead of minutes, please consider setting the priority to 101 so other people who have only one or two images to render can do so without waiting forever. Conversely, if you're uploading quicker renders, please don't upload loads of them in one go (as above). I appreciate people pay for higher tiers, but please don't block the server out with all your renders. If you're not in a rush, either wait to upload or set the priority a little higher please.
Check the Queue beforehand
Please watch the server. If you upload a job and it kills one server, please do not then upload it to the other one and take that offline too. I appreciate those who do flag when a server is offline, but bear in mind I am in the UK, so if it goes down overnight it will potentially be the morning before it is fixed.
I get that it can be frustrating, but I have no control over the server software. If it locks up, there is nothing I can do but reboot it or clear the cache.
I'm happy to continue to put any money the servers generate back into them, but it's really all for nothing if the settings are cranked up and then the server gets locked up – then we just have a server with 3 x 3090's just sitting there. Conversely, I don't have the time to be forever rebooting or sorting out issues with it.
I appreciate you taking the time to read this. Any comments, please just shout.
April 26th 2021
9090 now uses 3 x 3090's. Please don't crank up the settings.. but rather enjoy the faster speeds.
March 9th 2021
I figured I would let you know what I've been working on.
9090 is now using a pair of 3090's. I'm seeing some jobs flat out fail, but they work on 7070... I believe it's something to do with the 3090's only supporting the newer version of Iray, and as such only DS 4.15+ seems to work properly on there.
Dropping to CPU! This is a massive pain, which I have spent a while looking into. It seems to be just a matter of fact that swapping jobs can cause the machine to drop to CPU. It's doing it on the 3090's, and I know it's not a case of running out of RAM. Yesterday, I tested Iray Server on Linux.. and it still did it, so I'm pretty sure it's not limited to the OS version.
So, I have been working on a custom script which monitors the machine. If it sees the CPU usage increase, it will reboot and restart the server automatically (it also checks whether anyone is currently uploading a job). This has been deployed on 7070 only at the moment while I test it, as there are some issues I'd like to try and address (for example, if a job will literally only render on the CPU, the server will go into an infinite loop of rebooting). So I need to build in some error handling etc., but it's a start. Once this is working, it should provide an automatic way of ensuring the server only drops to the CPU when it has to, and make the server both more efficient and less demanding. If you see the server act wonky, please just drop me a message on here so I can look into it.
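For the curious, the decision logic described above could look something like the sketch below. It is not the actual script: the function name, the 80% CPU threshold and the cap of three reboots are all assumptions picked for illustration, and the cap is just one possible form the missing error handling could take.

```python
def decide_action(cpu_percent, upload_in_progress, recent_reboots,
                  cpu_threshold=80.0, max_reboots=3):
    """Pick what a watchdog should do next (hypothetical helper).

    Returns one of: "ok", "wait", "reboot", "alert".
    """
    if cpu_percent < cpu_threshold:
        # CPU usage is low, so the render is presumably still on the GPUs.
        return "ok"
    if upload_in_progress:
        # Don't reboot in the middle of someone's upload; check again later.
        return "wait"
    if recent_reboots >= max_reboots:
        # Rebooting hasn't helped, so this job may only render on the CPU.
        # Stop looping and flag it for a human instead.
        return "alert"
    return "reboot"


print(decide_action(12.0, False, 0))  # ok
print(decide_action(95.0, True, 0))   # wait
print(decide_action(95.0, False, 1))  # reboot
print(decide_action(95.0, False, 3))  # alert
```

Keeping the decision separate from the code that actually measures CPU usage or triggers the reboot makes the rules easy to test without touching a live server.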
|Current Date Time||Note|