Thank you for your response. Regarding your question, I think you almost got it right.

The DWT of any signal produces two sets of coefficients: approximation coefficients, which correspond to the lower frequencies, and detail coefficients, which correspond to the higher frequencies. Let's call these cA1 and cD1.

The second pass is done on the approximation coefficients, so the sub-band containing the lower frequencies is split into approximation and detail coefficients again. Let's call them cA2 and cD2.

This is done up to the maximum decomposition level n.
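To make the cascade concrete, here is a minimal sketch in plain Python using the Haar wavelet, where the low-pass step is a scaled pairwise sum and the high-pass step is a scaled pairwise difference, each keeping one value per pair of samples. The names haar_dwt and haar_wavedec are my own for this illustration and are not part of any library; the sketch assumes an even-length input at every level and ignores the boundary handling that longer wavelets need.

```python
import math


def haar_dwt(signal):
    """One DWT pass with the Haar wavelet on an even-length sequence.

    Returns (approximation, detail): scaled pairwise sums (low-pass)
    and scaled pairwise differences (high-pass), each half as long.
    """
    if len(signal) % 2 != 0:
        raise ValueError("this sketch only handles even-length input")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_wavedec(signal, level):
    """Repeat haar_dwt on the approximation coefficients `level` times.

    Returns the coefficients ordered as [cA_n, cD_n, ..., cD_2, cD_1].
    """
    details = []
    approx = list(signal)
    for _ in range(level):
        approx, detail = haar_dwt(approx)  # only the approximation is split again
        details.append(detail)             # collected as cD1, cD2, ...
    return [approx] + details[::-1]


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
cA2, cD2, cD1 = haar_wavedec(signal, level=2)
print("cA2:", cA2)
print("cD2:", cD2)
print("cD1:", cD1)
```

For this input, cA2 comes out (up to rounding) as [16, 12]: each value is the sum of four neighbouring samples divided by 2, which is why it only tracks the slow trend of the signal.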

As you can see in the documentation of pywt.wavedec(), the final output is [cAn, cDn, cDn-1, …, cD2, cD1].
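If it helps, the ordering can be checked with a small sketch like the one below. It assumes PyWavelets and NumPy are installed; the random test signal, the choice of "db1" and the two-level depth are arbitrary choices of mine for the illustration.

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
signal = rng.standard_normal(64)

# Two-level decomposition in a single call: [cA2, cD2, cD1].
coeffs = pywt.wavedec(signal, "db1", level=2)
cA2, cD2, cD1 = coeffs

# The same thing by hand: one pass on the signal, a second pass on cA1.
cA1, cD1_manual = pywt.dwt(signal, "db1")
cA2_manual, cD2_manual = pywt.dwt(cA1, "db1")

same = (
    np.allclose(cA2, cA2_manual)
    and np.allclose(cD2, cD2_manual)
    and np.allclose(cD1, cD1_manual)
)
print("lengths:", [len(c) for c in coeffs])
print("manual cascade matches wavedec:", same)

# Largest useful level for this signal length and filter length.
max_level = pywt.dwt_max_level(len(signal), pywt.Wavelet("db1").dec_len)
print("maximum decomposition level:", max_level)
```

Each level halves the number of coefficients here, so the list starts with the shortest arrays (the coarsest level) and ends with the longest one, cD1.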

If you want, you can also have a look at this simple example of the DWT.

This is really helpful, thanks!

I do not quite grasp how the DWT algorithm works (maybe that comes from a lack of understanding of what a filter bank actually does). Would you say that the first pass of the DWT performs a convolution between the signal and the wavelet, and would the result be the approximation coefficients? In that case, how do you get the detail coefficients? Then the second pass does the same on the approximation coefficients, and so on. Is that right?

In section 3.2, why do you add peaks_x and peaks_y?

Thanks for the comment!

Regarding mph, you didn’t get anything wrong: it is one of the arguments of the peak-finding method. This method can be applied to any signal, whether it is time-dependent or frequency-dependent. I am simply disregarding any peaks below a certain percentage of the maximum amplitude in the signal, since they are more likely to be the result of noise than actual information.
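As a rough illustration of that thresholding idea, here is a plain-Python sketch. It is not the peak-finding function used in the post: peaks_above and min_fraction are hypothetical names, and the local-maximum test is deliberately simplistic.

```python
def peaks_above(values, min_fraction):
    """Return indices of local maxima that reach at least
    min_fraction * max(values). Hypothetical helper for illustration."""
    mph = min_fraction * max(values)  # minimum peak height
    peaks = []
    for i in range(1, len(values) - 1):
        is_local_max = values[i - 1] < values[i] >= values[i + 1]
        if is_local_max and values[i] >= mph:
            peaks.append(i)
    return peaks


amplitudes = [0.0, 0.2, 0.1, 3.0, 0.5, 0.3, 0.4, 0.2, 1.5, 0.1]
all_peaks = peaks_above(amplitudes, 0.0)     # no threshold: every local maximum
strong_peaks = peaks_above(amplitudes, 0.2)  # keep peaks >= 20% of the maximum
print(all_peaks)
print(strong_peaks)
```

With the 20% threshold, the small bumps at 0.2 and 0.4 are dropped and only the peaks at 3.0 and 1.5 remain.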

Regarding normalization: I have never seen normalizing (scaling to [0, 1]) the input values of a (gradient boosting) classifier improve the accuracy, so I don’t think it is necessary here. Usually it is more of a recommendation.

I would pay more attention to normalizing if the input values had very different scales (for example, one column with values on the order of 1E-3 and another with values on the order of 1E6) and the distance between points mattered to the algorithm (as in k-nearest neighbours).
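A small sketch of why scale matters for distance-based methods, in plain Python. The three rows are made-up numbers and min_max_scale is a hypothetical helper written for this illustration. Tree-based models split on per-feature thresholds, which is one reason rescaling tends not to matter for them, whereas a Euclidean distance is dominated by whichever column has the largest numbers.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_max_scale(rows):
    """Scale each column of a list of rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Column 0 is on the order of 1e-3, column 1 on the order of 1e6.
rows = [
    [0.001, 1_000_000.0],
    [0.009, 1_000_100.0],  # far from row 0 in column 0, close in column 1
    [0.001, 1_500_000.0],  # identical to row 0 in column 0, far in column 1
]

raw_d01 = euclidean(rows[0], rows[1])
raw_d02 = euclidean(rows[0], rows[2])

scaled = min_max_scale(rows)
scaled_d01 = euclidean(scaled[0], scaled[1])
scaled_d02 = euclidean(scaled[0], scaled[2])

print("raw distances:   ", raw_d01, raw_d02)
print("scaled distances:", scaled_d01, scaled_d02)
```

On the raw values, the difference in column 0 contributes essentially nothing to the distance. After scaling, both columns carry comparable weight, which is what a k-nearest-neighbours model would need.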

Another question I have concerns the normalization of the data. As far as I’ve seen, you did not perform any kind of data normalization. Is the dataset already normalized/standardized, or is there a particular reason you left that out? If not, what kind of standardization would you recommend?

Best regards and keep up the good work!