Web Mining Using Machine Learning: A Survey

2023-01-18 04:09:33

Abstract: With the rapid growth of information on the Web, the Web is becoming a major source of information for everyone. As the number of users increases, organizations publish their data online on a daily basis. The Web contains both structured and unstructured information, which together offer comprehensive insight into Web data. Extracting knowledge manually is difficult because of the enormous volume and dynamic nature of Web data, so accurate information extraction techniques are necessary to fully utilize the Web as an information resource. . .

Text mining: Text mining technology lets you analyze text data from the web, comment fields, books, and other text-based sources to discover previously unseen insights. It uses machine learning or natural language processing techniques to organize documents (email, blogs, Twitter feeds, questionnaires, conflict information, etc.). It helps you analyze large amounts of information and discover new topics and relationships between terms.

The algorithms underlying machine learning are largely based on statistics, and machine learning is closely related to data mining. An algorithm tries to find patterns in the data in order to classify, predict, or reveal significant trends. Machine learning is useful only when enough data is available and that data has been prepared. As a toy example, consider evaluating the strength of a password based on its length, whether it contains digits, and whether it contains special characters. Assume also that you have a list of passwords and their strengths. Simply handing the raw text of each password to a learning algorithm and expecting it to work out what makes a password strong is unlikely to work.
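
The toy password example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the labelled passwords are invented, and the nearest-centroid classifier is a stand-in chosen for brevity, not a method named in the text. The point it shows is the preparation step, where each raw password is turned into the three features mentioned above before any learning happens.

```python
import string

def password_features(password: str) -> list[float]:
    """Map a raw password to the three features named in the text."""
    return [
        float(len(password)),                                     # length
        float(any(ch.isdigit() for ch in password)),              # contains a digit
        float(any(ch in string.punctuation for ch in password)),  # contains a special character
    ]

# Hypothetical labelled data (1 = strong, 0 = weak), invented for illustration.
TRAINING = [
    ("password", 0),
    ("abc", 0),
    ("letmein1", 0),
    ("Tr0ub4dor&3x!", 1),
    ("c0rrect-H0rse-9", 1),
    ("x7$Kq!92mZ#p", 1),
]

def fit_centroids(samples):
    """Average the feature vectors of each class (nearest-centroid 'training')."""
    sums, counts = {}, {}
    for password, label in samples:
        features = password_features(password)
        previous = sums.get(label, [0.0] * len(features))
        sums[label] = [p + f for p, f in zip(previous, features)]
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}

def predict(centroids, password: str) -> int:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    features = password_features(password)

    def squared_distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))

centroids = fit_centroids(TRAINING)
print(predict(centroids, "hunter2"))        # short, no special character
print(predict(centroids, "G7#kd92!Lm@qz"))  # long, with digits and special characters
```

With so little data the result mostly reflects the hand-picked features, which is consistent with the remark that machine learning needs enough prepared data to be useful.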

Consider Web search. Web search depends on machine learning and artificial intelligence, and it is hard to imagine anyone launching a web search engine that does not use some form of machine learning or data science. Nobody would start a web search engine that uses only TF-IDF for ranking. As things stand, Google and Facebook have essentially unlimited computing power and more expertise than anyone else. The problem is that this gives them a long-term structural advantage. Are we left picking up the bread crumbs that fall from the capitalist's table? What do we like? Do we really have what we need to compete in these areas?
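
For reference, the TF-IDF baseline mentioned above can be sketched as follows. This is a minimal illustration, not a description of how any real search engine ranks results. It assumes whitespace tokenization, term frequency normalized by document length, and idf = log(N / df). The function name `tf_idf_rank` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def tf_idf_rank(query: str, documents: list[str]) -> list[int]:
    """Return document indices ordered by descending TF-IDF score for the query."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scored = []
    for index, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in counts:
                continue
            tf = counts[term] / len(tokens)          # relative term frequency
            idf = math.log(n_docs / doc_freq[term])  # rarer terms weigh more
            score += tf * idf
        scored.append((score, index))

    # Highest score first; ties keep the original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored]

docs = [
    "web mining extracts knowledge from web data",
    "text mining analyzes documents",
    "machine learning finds patterns in data",
]
print(tf_idf_rank("web mining", docs))
```

The score depends only on word counts, so it ignores links, user behaviour, and meaning, which is why the paragraph treats TF-IDF alone as insufficient for a modern search engine.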