On the NES, when you pull the trigger, the game blanks the screen to black for a frame and then draws a white square where the target is.
Inside the gun is a photodiode, which conducts electricity when light shines on it. If the gun is aimed at the target, the white square lights up the photodiode, and the NES interprets that as a hit.
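To make the hit check concrete, here is a rough sketch in Python. It assumes the game samples the photodiode once during an all-black frame and then once per target while that target's white square is on screen. The function name and the list-of-readings interface are made up for illustration; a real game reads the sensor from a controller port, not from a list.

```python
from typing import Optional, Sequence


def detect_hit(sensor_lit: Sequence[bool]) -> Optional[int]:
    """Return the index of the target that was hit, or None for a miss.

    sensor_lit[0] is the photodiode reading during the all-black frame.
    sensor_lit[i] for i >= 1 is the reading while the white square for
    target i-1 is displayed (hypothetical layout, for illustration only).
    """
    # Seeing light during the black frame means the gun is pointed at
    # something bright that is not the target square, so reject it.
    if not sensor_lit or sensor_lit[0]:
        return None
    # The first white-square frame that lights the photodiode is the hit.
    for target_index, lit in enumerate(sensor_lit[1:]):
        if lit:
            return target_index
    return None
```

With one target, detect_hit([False, True]) reports a hit on target 0, and detect_hit([False, False]) reports a miss.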
The reason this doesn't work on modern TVs is that modern displays process the signal to make it look better, and then they upscale the video to fit the screen. This can take anywhere from 20 ms to 80 ms.
This is a problem because the NES expects the black frame to be displayed almost instantly, which at 60 frames per second means within one frame, or about 16.7 ms (although in practice it would need to be closer to 8 ms).
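The arithmetic behind those numbers can be sketched in a few lines of Python. The 60 Hz frame rate and the 8 ms, 20 ms and 80 ms figures come from the text above; the function names are just illustrative, and the "must arrive within one frame" rule is a simplification of what the console actually tolerates.

```python
FRAME_RATE_HZ = 60  # frame rate quoted in the text


def frame_time_ms(frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 / frame_rate_hz


def frames_of_lag(display_lag_ms: float, frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """How many frames late the picture shows up on screen."""
    return display_lag_ms / frame_time_ms(frame_rate_hz)


def arrives_in_time(display_lag_ms: float, frame_rate_hz: float = FRAME_RATE_HZ) -> bool:
    """Simplified rule: the frame must appear before the next one starts."""
    return display_lag_ms < frame_time_ms(frame_rate_hz)


if __name__ == "__main__":
    for lag_ms in (8, 20, 80):
        print(f"{lag_ms} ms lag = {frames_of_lag(lag_ms):.1f} frames, "
              f"in time: {arrives_in_time(lag_ms)}")
```

So a display with 20 ms of processing delay is already more than one frame late, and one with 80 ms is nearly five frames late, by which time the console has long since stopped looking for the square.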
There are other technologies, though, like light pens, which watch for the electron beam as it scans across the screen.
(By timing how long it takes from the start of the frame until the electron beam reaches the light pen's position, the computer or console can work out where the pen is on the screen.)
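Here is a minimal sketch of that timing-to-position conversion in Python. The scanline timings are approximate NTSC figures and the 256x240 resolution is an assumption; real hardware usually latches a counter in the video chip instead of doing this in software, and the timer here is assumed to start at the beginning of the first visible line.

```python
from typing import Optional, Tuple

LINE_TIME_US = 63.5    # approximate duration of one NTSC scanline (assumed)
ACTIVE_TIME_US = 52.6  # approximate visible part of each scanline (assumed)


def pen_position(elapsed_us: float,
                 width_px: int = 256,
                 height_px: int = 240) -> Optional[Tuple[int, int]]:
    """Convert time since the first visible line began into (x, y) pixels.

    Returns None if the beam would be in blanking or past the last
    visible line at that moment.
    """
    # Whole scanlines elapsed give the row; the remainder gives the
    # distance travelled along the current line.
    line, offset_us = divmod(elapsed_us, LINE_TIME_US)
    if line >= height_px or offset_us >= ACTIVE_TIME_US:
        return None
    x = int(offset_us / ACTIVE_TIME_US * width_px)
    return x, int(line)
```

For example, a pulse seen 13 microseconds into the 101st visible line (6363 microseconds after the start) lands about a quarter of the way across that row.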
The only light gun technologies that work with modern displays are infrared systems, like the Wii's, and camera systems, like those used in modern VR.