I'm not sure who came up with the notion of setting tone to your "average" audio level. But that ain't how we use tone in master control, the place where all TV signals start their trip to viewers. Tone is assumed to be at 0 dB and will be as loud as the loudest sound on the tape. Master control operators are charged with ensuring that the audio and video portions of the RF signal do not interfere with each other. (If you don't understand how they can do that, I can explain it again.)
It is assumed professionals already know what dynamic range they want for their program, so tone is at the same level as the maximum sound that occurs during playback. If you set your tone to -12 dB or -6 dB, master control operators will boost your audio so that tone sits at 0 dB for playback, which works fine as long as you never exceed the level you set. The general rule of keeping your audio bouncing between -12 dB and -6 dB is there to provide some headroom for louder sounds. Say you have an interview with several people, and when any one of them is speaking normally, they are bouncing between -12 dB and -6 dB. When anyone speaks up, or if everyone begins laughing, your audio levels can jump up to 0 dB without causing RF signal problems. But ANYTHING over 0 dB will cause distortion in TV playback.
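To put numbers on that, here is a rough Python sketch of the arithmetic. The function name and the sample levels are my own illustration, not anything master control actually runs. It only assumes what I described above: the operator applies whatever gain brings your tone up to 0 dB, and everything else on the tape moves up by the same amount.

```python
def playback_levels(tone_db, program_peaks_db, reference_db=0.0):
    """Return (recorded, played_back, distorts) for each program peak.

    Assumes master control applies one fixed gain that lifts the
    recorded tone to the reference level (0 dB here), and that
    anything above the reference distorts on playback.
    """
    gain_db = reference_db - tone_db  # e.g. tone at -12 dB gets +12 dB
    results = []
    for peak in program_peaks_db:
        played = peak + gain_db
        results.append((peak, played, played > reference_db))
    return results


# Hypothetical tape: tone recorded at -12 dB, peaks at -18, -12 and -6 dB.
for recorded, played, distorts in playback_levels(-12.0, [-18.0, -12.0, -6.0]):
    flag = "DISTORTS" if distorts else "ok"
    print(f"recorded {recorded:+.1f} dB -> plays at {played:+.1f} dB ({flag})")
```

With tone at -12 dB, the -6 dB peak lands at +6 dB after the boost, which is exactly the case where your program exceeded the level you set with tone.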
Audio-only recordings and movies may deal with peak audio in a different manner. In sound recording or amplified performances, your audio can bounce above 0 dB with little to no distortion, so audio recordings are adjusted differently than audio/video recordings. While I'm sure that there have been changes since video recording switched from analogue to digital, it seems unlikely that professional practices have radically changed since I used to work full-time in master control. Back when HiFi audio was introduced, the audio signal was laid between the tracks of the video signal (low-fi audio continued to reside on the edge of the tape). If the audio was too loud, it would encroach upon the actual video signal and cause video and audio distortion on the source tape. To prevent distortion, professional video productions would place an audio limiter/compressor between the mixer and the record deck. But film productions never put the audio and video signals in contact with each other. Audio stays on tape until a final print is made, so distortion occurs only when it is overdriven in recording or, more often, when overdriven to speakers.
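For anyone wondering what that limiter/compressor between the mixer and the record deck is doing to the levels, here is a minimal Python sketch. The threshold, ratio, and function name are assumptions I picked for illustration, not the settings of any particular unit: above the threshold the level rises more slowly, and nothing is allowed past the 0 dB ceiling.

```python
def limit_db(level_db, threshold_db=-6.0, ratio=4.0, ceiling_db=0.0):
    """Compress levels above threshold_db, then hard-limit at ceiling_db.

    threshold_db and ratio are illustrative values only. With a 4:1
    ratio, every 4 dB over the threshold comes out as 1 dB over it.
    """
    if level_db > threshold_db:
        level_db = threshold_db + (level_db - threshold_db) / ratio
    # The limiter stage: never let anything past the ceiling.
    return min(level_db, ceiling_db)


# Normal speech passes untouched; a loud laugh is pulled back under 0 dB.
for level in (-12.0, 2.0, 30.0):
    print(f"in {level:+.1f} dB -> out {limit_db(level):+.1f} dB")
```

A real unit also has attack and release times that shape how fast it reacts; this sketch only covers the static level math.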
So while I can't speak to film production practices, industry convention in TV is that tone should be recorded at 0 dB, to correspond to the loudest sound in the program. SMPTE color bars are used in conjunction with a vectorscope and a waveform monitor to adjust the time base corrector (TBC) output, which should make the playback signal look exactly like the video signal you saw when editing, provided you viewed your program on a properly adjusted monitor. Film playback does not require a color/luminance/contrast reference signal, as there are no adjustments made to the projected image.
I hope this clears up the confusion on this issue.