Deciding factors
One word provides the answer: EXPERIENCE.
Special features (low power consumption, for example)
Cost of goods in production.
If short development time has higher priority than cost of goods, then go for the more muscular controller. You can do cost optimization later as a separate stage. Premature cost optimization often spoils projects.
If you are already in the cost optimization stage, then you start from a working product. The early non-optimized version acts as a prototype for the optimized version. The early version flushes out the requirements.
Before we start working on the project, how do we know what size and how powerful a microcontroller the project needs?
You can't know. And that's a big problem. If you choose a microcontroller that's too small, you might run out of resources (memory/pins/registers/other features) and these resources can be hard to accurately estimate until it's too late. But if you choose a microcontroller that's too large, you are paying for resources you aren't using, and that adds to the system cost.
You can sometimes make an educated guess at how much of the various resources your application will need. If you were doing FFT calculations, for example, you could calculate the exact amount of memory the samples will require. But it's harder to reliably determine how much object code space the software will require until after it's been written.
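As a rough illustration of that kind of up-front estimate, here is a minimal sketch that computes the RAM needed for an FFT sample buffer. The sample format (`complex_q15`, two 16-bit fixed-point parts) and the function name `fft_buffer_bytes` are assumptions made for this example, not something a particular library defines. It counts only the sample buffer; twiddle tables, stack, and other variables would come on top.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample format: one complex value stored as two
   16-bit fixed-point parts (4 bytes per sample). */
typedef struct {
    int16_t re;
    int16_t im;
} complex_q15;

/* Bytes of RAM needed to hold n_points complex samples for an
   in-place FFT. Only the sample buffer is counted. */
size_t fft_buffer_bytes(size_t n_points)
{
    /* Assumption: a radix-2 FFT, so the length must be a power of two. */
    assert(n_points != 0 && (n_points & (n_points - 1)) == 0);
    return n_points * sizeof(complex_q15);
}
```

With this format a 256-point FFT needs 1024 bytes for its samples, which already rules out parts with 1 KB of RAM or less once the stack and other variables are added.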
A good hedge against running out of code memory is to select a microcontroller family that is scalable across several different price/performance points. This is a big selling point of the ARM, Microchip PIC, Atmel AVR, and other scalable micros. As you develop your software, you can move from a larger development system to a smaller target system, keeping the final end-product cost down.
Scalable means that all of the microcontrollers in that family have (mostly) the same instruction set and (mostly) the same registers, so software written for one will work on another in the same family. (Microchip PIC code will not run on an ARM, but if you learn the PIC14 it's easy to move down to the PIC12 or up to the PIC16.)
If you're only building a one-off prototype, rather than a full production product line, your best option is to stick with a development system that's a bit larger than you think you need, so that you have room to grow.
A big factor is the size of your most-often-used variables. If you are using mostly 8-bit variables, and you only need to access 8-bit ports, then you can probably get by with an 8-bit MCU.
However, if you have a lot of 16-bit variables, and even just a few 32-bit longs, then you are going to need to look at a 16-bit (or even 32-bit) MCU, since accessing 16-bit variables with an 8-bit MCU takes a fair amount of code.
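To give a feel for that extra code, here is a minimal sketch in portable C that adds two 16-bit values using only 8-bit operations, roughly the sequence a compiler has to generate for an 8-bit core (typically an add followed by an add-with-carry instruction). The function name `add16_bytewise` is made up for this illustration.

```c
#include <stdint.h>

/* Adds two 16-bit values using only 8-bit operations, the way an
   8-bit ALU has to: low bytes first, then high bytes plus the carry. */
uint16_t add16_bytewise(uint16_t a, uint16_t b)
{
    uint8_t a_lo = (uint8_t)(a & 0xFFu);
    uint8_t a_hi = (uint8_t)(a >> 8);
    uint8_t b_lo = (uint8_t)(b & 0xFFu);
    uint8_t b_hi = (uint8_t)(b >> 8);

    uint8_t sum_lo = (uint8_t)(a_lo + b_lo);
    /* If the low-byte sum wrapped around, there was a carry out. */
    uint8_t carry  = (uint8_t)(sum_lo < a_lo);
    uint8_t sum_hi = (uint8_t)(a_hi + b_hi + carry);

    return (uint16_t)(((uint16_t)sum_hi << 8) | sum_lo);
}
```

A 32-bit addition needs four such byte steps with the carry passed along each time, and every load and store of the variable is likewise split into byte-sized pieces.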
And if you are going to use any floating point variables, I strongly suggest a 16-bit or 32-bit MCU.
Even if you don't have the time to write much of the code you will need to use, I suggest writing some of it beforehand and compiling for whatever microcontrollers you might be using. Most microcontroller manufacturers have free versions of their compilers, perhaps limited by the size of the output file, or by limiting the amount of optimization.
If you're already familiar with microcontrollers, then do some research on whether an 8-bit uC is sufficient or not. It depends on the complexity of the project, whether it will be extended later (like going from a monochrome LCD to a color LCD), whether it is meant for the long term, whether it needs low power consumption or low memory usage, and so on.
I had to do some research about which ARM uC I had to choose and also which compiler (based on price and popularity). In my situation it was important to check the availability of peripherals.
32-bit MCUs are usually faster, partly due to a higher clock frequency, but also due to their capabilities or some 'tricks' they can do.
A 32-bit MCU can add two 32-bit integers in one go, while an 8-bit MCU has to do this byte by byte. Floating-point numbers need at least 16 bits and their math is more complex, which means lots of work for the 8-bit MCU. A 32-bit MCU may do the math in hardware, so once again, in one go. One of the tricks may be to do four additions of 8-bit variables at once: load four 8-bit values into one register, and add an 8-bit value to each of them.
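The four-additions-at-once trick can be sketched in portable C as follows. Some 32-bit cores provide a dedicated instruction for this; the version below is only an illustration of the idea using ordinary 32-bit operations, and the name `add_4x8` is invented for the example. Each 8-bit lane wraps around on overflow instead of carrying into its neighbour.

```c
#include <stdint.h>

/* Adds four packed 8-bit lanes held in one 32-bit word. Each lane
   wraps modulo 256; no carry leaks into the neighbouring lane. */
uint32_t add_4x8(uint32_t a, uint32_t b)
{
    const uint32_t low7 = 0x7F7F7F7Fu;  /* lower 7 bits of every lane */
    const uint32_t top  = 0x80808080u;  /* top bit of every lane */

    /* Add the lower 7 bits of each lane; any carry lands in that
       lane's own top bit and goes no further. */
    uint32_t sum = (a & low7) + (b & low7);

    /* Fold in the original top bits with XOR, which adds them
       without producing a carry out of the lane. */
    return sum ^ ((a ^ b) & top);
}
```

For example, `add_4x8(0x01020304, 0x10203040)` yields `0x11223344`, and adding `0x00000001` to `0x000000FF` yields `0x00000000` because the lowest lane wraps without disturbing the lane above it.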
But that's not all. Also have a careful look at the hardware peripherals, which are not necessarily more powerful on 32-bit MCUs!