The Two Sigma competition dataset is an HDF5 (.h5) file. It can be read into a pandas DataFrame with 1,710,756 rows and 111 columns.
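A minimal loading sketch follows. The file name "train.h5" and the HDF5 key "train" are assumptions, not taken from the text; check the keys in your copy with pd.HDFStore if they differ. Reading HDF5 through pandas also requires the optional PyTables package ("tables").

```python
import pandas as pd


def load_dataset(path: str = "train.h5", key: str = "train") -> pd.DataFrame:
    """Read the competition HDF5 file into a DataFrame.

    Assumption: the data is stored under the key "train". To list the keys
    actually present, run:
        with pd.HDFStore(path, mode="r") as store:
            print(store.keys())
    """
    # pd.read_hdf needs the optional PyTables dependency to be installed.
    return pd.read_hdf(path, key=key)


# Usage (assumes the file is available locally):
# df = load_dataset()
# print(df.shape)  # the text above reports 1,710,756 rows x 111 columns
```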
Each data point is identified by the combination of two attributes, id and timestamp; neither is unique on its own. The id identifies a financial security, and the timestamp indicates the time of the quote (Y).
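One way to confirm that id and timestamp only identify rows when taken together is to check uniqueness of each column and of the pair. This is a sketch that assumes the columns are named "id" and "timestamp"; the toy frame below is made up for illustration and is not competition data.

```python
import pandas as pd


def check_composite_key(df: pd.DataFrame, cols=("id", "timestamp")) -> dict:
    """Report whether each column alone, and the columns together, uniquely identify rows."""
    cols = list(cols)
    report = {col: df[col].is_unique for col in cols}
    # duplicated() flags every repeat of a (id, timestamp) pair after its first occurrence
    report["+".join(cols)] = not df.duplicated(subset=cols).any()
    return report


# Toy frame: each id repeats across timestamps, each timestamp repeats across ids
toy = pd.DataFrame({
    "id": [10, 10, 11, 11],
    "timestamp": [0, 1, 0, 1],
    "y": [0.01, -0.02, 0.00, 0.03],
})
print(check_composite_key(toy))
# {'id': False, 'timestamp': False, 'id+timestamp': True}
```

On the full dataset, the expected result per the description is False for each column alone and True for the pair.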
The dataset contains 1,424 distinct ids and 1,813 distinct timestamps. In general, data points are sampled at a fixed timestamp interval of 750.
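The distinct counts can be reproduced with nunique, and grouping by timestamp shows how many securities are quoted at each time point. This is a sketch under the same column-name assumption ("id", "timestamp"); the helper name summarize_panel and the toy frame are hypothetical.

```python
import pandas as pd


def summarize_panel(df: pd.DataFrame) -> dict:
    """Count distinct securities and time points, and the range of rows per time point."""
    rows_per_ts = df.groupby("timestamp").size()
    return {
        "n_ids": int(df["id"].nunique()),
        "n_timestamps": int(df["timestamp"].nunique()),
        "min_rows_per_timestamp": int(rows_per_ts.min()),
        "max_rows_per_timestamp": int(rows_per_ts.max()),
    }


toy = pd.DataFrame({"id": [10, 11, 10, 11, 12], "timestamp": [0, 0, 1, 1, 1]})
print(summarize_panel(toy))
# {'n_ids': 3, 'n_timestamps': 2, 'min_rows_per_timestamp': 2, 'max_rows_per_timestamp': 3}
```

On the full data, n_ids and n_timestamps should come out as 1,424 and 1,813 according to the description above.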