In my case I was shooting with 2 different cameras. The sound was out of sync with both.
Following Zoom's logic, all of us have defective cameras. Also, in my understanding, drop frame is only a way of counting (labeling) frames and does not affect real-time recording.
Also, the second camera in my scenario was shooting in non-drop-frame mode.
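To illustrate the counting point, here is a rough Python sketch. It assumes a 29.97 fps (30000/1001) NTSC frame rate, which is my assumption for the example and not something taken from either camera's settings, and the function names are made up for illustration. It shows that drop frame and non-drop frame put different labels on the same frame, while the real elapsed time for that frame is identical.

```python
# Sketch only: assumes 29.97 fps (30000/1001) NTSC video.
# All function names here are hypothetical, chosen for illustration.

def elapsed_seconds(frame: int) -> float:
    """Real time elapsed at a given frame count; independent of the timecode mode."""
    return frame * 1001 / 30000


def non_drop_label(frame: int) -> str:
    """Non-drop-frame label: counts 30 frame numbers per 'second', never skips any."""
    ff = frame % 30
    ss = (frame // 30) % 60
    mm = (frame // 1800) % 60
    hh = frame // 108000
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def drop_label(frame: int) -> str:
    """Drop-frame label: skips frame NUMBERS 00 and 01 at the start of every
    minute except each tenth minute. No actual frames are discarded."""
    frames_per_10min = 17982  # 10 * 60 * 30 minus 9 minutes * 2 skipped numbers
    frames_per_min = 1798     # 60 * 30 minus 2 skipped numbers
    tens, rem = divmod(frame, frames_per_10min)
    skipped = 18 * tens
    if rem > 2:
        skipped += 2 * ((rem - 2) // frames_per_min)
    return non_drop_label(frame + skipped).replace(":", ";", 3)[:8] + ";" + \
        f"{(frame + skipped) % 30:02d}"


if __name__ == "__main__":
    frame = 1800  # the same physical frame in both modes
    print(non_drop_label(frame))   # 00:01:00:00
    print(drop_label(frame))       # 00:01:00;02
    print(elapsed_seconds(frame))  # same real time whichever label is used
```

The only thing that changes between the two modes is the label attached to a frame, so by itself the drop-frame setting should not make audio drift against picture.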
Forgive my jumping to conclusions, but it does seem to me that Zoom is saying "I'm doing it right while the rest of the world is wrong." Perhaps with some prodding they can come up with a better answer?

