Abstract: [Background] The relationship between gut microbiota and human health has attracted much attention and has become a popular research area. [Objective] To characterize the gut microbiota of obese people based on the American Gut Project, and to provide a theoretical basis for microbiota-based intervention in obesity by constructing machine learning models that predict obesity status. [Methods] A total of 1 665 normal samples (18.530) obese samples were downloaded from the website of the American Gut Project (AGP). The Wilcoxon rank-sum test was used to compare alpha-diversity between the obese and normal groups. In addition, logistic regression was performed to examine the association between gut microbiota alpha-diversity and obesity. For beta-diversity, we performed principal component analysis (PCA) to explore differences in gut microbiota structure between the obese and normal groups. For the phylogenetic profiles, we used the Wilcoxon rank-sum test to detect taxa that differed significantly between the two groups. PICRUSt was used to predict pathways from the 16S rRNA gene sequences, and the Wilcoxon rank-sum test was then used to detect pathways that differed significantly between the two groups. We performed correlation analysis to relate these significantly different pathways to the genera. Finally, we used the scikit-learn package in Python to construct machine learning models and used the AUC value as the criterion for evaluating the performance of each model. [Results] The Wilcoxon rank-sum test showed a decreasing trend in alpha-diversity in the obese population compared with the healthy population. In addition, logistic regression confirmed the association between alpha-diversity and obesity status.
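The alpha-diversity comparison described in the Methods can be illustrated with a minimal sketch. It assumes a Shannon-type index as the alpha-diversity measure and uses synthetic values, not AGP data; the group sizes, means and variable names are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.stats import ranksums
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic alpha-diversity values (NOT AGP data). The obese group is drawn
# with a slightly lower mean to mimic a reduced alpha-diversity.
n_normal, n_obese = 300, 100  # hypothetical group sizes
shannon_normal = rng.normal(loc=5.0, scale=0.8, size=n_normal)
shannon_obese = rng.normal(loc=4.6, scale=0.8, size=n_obese)

# Wilcoxon rank-sum test: do the two groups differ in alpha-diversity?
statistic, p_value = ranksums(shannon_obese, shannon_normal)

# Logistic regression of obesity status (1 = obese) on alpha-diversity.
X = np.concatenate([shannon_normal, shannon_obese]).reshape(-1, 1)
y = np.concatenate([np.zeros(n_normal, dtype=int), np.ones(n_obese, dtype=int)])
model = LogisticRegression().fit(X, y)

# A negative coefficient means higher diversity goes with lower odds of obesity.
coef = model.coef_[0, 0]
print(f"rank-sum statistic = {statistic:.2f}, p = {p_value:.3g}, coef = {coef:.3f}")
```

A negative rank-sum statistic here indicates that the first group passed to `ranksums` (the synthetic obese group) tends to have lower values than the second.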
For beta-diversity, PCA based on three beta-diversity distance matrices (Weighted UniFrac, Unweighted UniFrac and Bray-Curtis) showed no significant difference in gut microbiota structure between the two groups. At the phylum level, a high relative abundance of Bacteroidetes and a low relative abundance of Firmicutes were observed in the obese group. In addition, a total of 57 genera differed significantly between the two groups according to the Wilcoxon rank-sum test. The genus Ruminococcus increased in the obese group, whereas Prevotella, Akkermansia and Methanobacteriales decreased. All pathways predicted by PICRUSt were compared between the two groups with the Wilcoxon rank-sum test, and a total of 63 significantly different pathways were observed. Among the models, the gradient boosting decision tree (GBDT) performed best, with an AUC of 0.769 and a test precision of 0.725. [Conclusion] This study characterized the gut microbiota of the obese population based on a large-scale data set. It also constructed machine learning models based on gut microbiota to predict obesity status, which provides new ideas and a theoretical basis for personalized medicine and diet.
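The model evaluation step can be sketched along the following lines. This is only an illustration under stated assumptions: the feature table is synthetic (generated with `make_classification`, not AGP data), the class balance, split ratio and default hyperparameters are hypothetical, and the abstract does not specify how the study's models were configured.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a samples x genera abundance table (NOT AGP data);
# label 1 plays the role of "obese", label 0 the role of "normal".
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8,
    weights=[0.7, 0.3], random_state=0,
)

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Gradient boosting classifier with default hyperparameters (an assumption).
clf = GradientBoostingClassifier(random_state=0)
clf.fit(X_train, y_train)

# AUC is computed from predicted probabilities, precision from hard labels.
obese_prob = clf.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, obese_prob)
precision = precision_score(y_test, clf.predict(X_test))
print(f"AUC = {auc:.3f}, test precision = {precision:.3f}")
```

Comparing several classifiers would amount to repeating the fit and scoring steps for each model on the same split and ranking them by AUC.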