Methods for Collecting Data

2023-05-09 07:21:38

Data can be collected in many ways, and the choice depends first on what the researchers want to know. There are many techniques for collecting social data, such as surveys, sampling, intensive interviews, focus groups, field research, available materials, and experiments. The two that I found most useful and widely used are field research and experiments. Although they are quite different, both do a lot to help sociologists collect data and understand groups of people.

In Part 1 we discussed some basic ways to process categorical data, such as one-hot encoding and feature hashing. Both of these methods produce a very sparse, high-dimensional representation of the data. This article introduces more sophisticated approaches designed to represent categories in far fewer dimensions. Part 1 found that the random forest (RF) model is not well suited to every encoding, although it works well with ordinal encoding. So we try to give each category a numerical representation that uses only a few columns, encoded so that similar categories end up close together. The easiest way is to replace each category with the number of times it appears in the dataset (often called count or frequency encoding). Thus, if New York and New Jersey are both large places, they will probably both appear many times in our dataset, and the model can recognize that they are similar.
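As a minimal sketch of the count encoding described above, assuming pandas is available: the `city` column and its values are made up for illustration and do not come from the original dataset. One-hot encoding is shown next to it only to contrast the number of columns each approach produces.

```python
import pandas as pd

# Hypothetical toy data; the column name and values are illustrative only.
df = pd.DataFrame(
    {"city": ["New York", "New Jersey", "New York", "Boise", "New Jersey", "New York"]}
)

# One-hot encoding: one column per distinct category (wide and sparse).
one_hot = pd.get_dummies(df["city"])

# Count encoding: replace each category with how often it occurs,
# so the whole feature fits in a single numeric column.
counts = df["city"].value_counts()
df["city_count"] = df["city"].map(counts)

print(one_hot.shape)
print(df)
```

One caveat of this encoding is that two unrelated categories with the same count become indistinguishable, so it tends to work best when frequency itself carries signal.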

Using categorical data in machine learning with Python: from dummy variables to deep category embedding and Cat2vec - part 2

How do you handle missing or corrupted data in a dataset? You can find the missing or corrupted values and either delete the affected rows or columns, or replace the values with something else. Pandas has two very useful methods for this, isnull() and dropna(). They help you find rows or columns with missing or corrupted data and drop them. If you want to fill the invalid values with a placeholder (such as 0), you can use the fillna() method.

How do you conduct exploratory data analysis (EDA)? The goal of EDA is to gather some insight from the data, i.e. to get information, before applying a predictive model. I prefer to do EDA from coarse to fine. First get some high-level, global insight. Check for imbalanced classes. Look at the mean and variance of each class. Look at the first few rows to see what the data actually looks like.
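The steps above can be sketched roughly as follows, assuming pandas and NumPy are installed. The DataFrame, its column names (`height`, `weight`, `label`) and its values are hypothetical and exist only to make the example runnable.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing values (NaN).
df = pd.DataFrame(
    {
        "height": [1.6, np.nan, 1.8, 1.7, np.nan, 1.9],
        "weight": [60.0, 72.0, np.nan, 65.0, 80.0, 90.0],
        "label": ["a", "a", "a", "a", "b", "b"],
    }
)

# Find missing values: count of NaNs per column.
missing_per_column = df.isnull().sum()

# Option 1: drop every row that contains at least one missing value.
dropped = df.dropna()

# Option 2: keep all rows and fill missing values with a placeholder (here 0).
filled = df.fillna(0)

# Coarse-to-fine EDA: class balance, per-class mean and variance, first rows.
class_counts = df["label"].value_counts()
class_stats = df.groupby("label")[["height", "weight"]].agg(["mean", "var"])

print(missing_per_column)
print(class_counts)
print(class_stats)
print(df.head())
```

Whether to drop or fill depends on the data: dropping loses rows, while a placeholder such as 0 can distort means and variances, which is why the per-class statistics here are computed on the unfilled frame.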

Once the data set has been cleaned, the next step is exploratory data analysis (EDA). EDA is an open-ended process of finding out what the data can tell us: we use it to find patterns, relationships, or anomalies that inform the subsequent analysis. Although a great many methods can be used in EDA, one of the most effective starting tools is the pair plot (also called a scatterplot matrix). A pair plot lets you see both the distribution of individual variables and the relationships between pairs of variables. Pair plots are a good way to identify trends for follow-up analysis. Fortunately, they are easy to implement in Python!
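One possible way to draw such a pair plot is pandas' own `scatter_matrix` helper, sketched below on synthetic data; the column names `x`, `y` and `z` are invented for illustration. This assumes pandas, NumPy and matplotlib are installed, and it is only one option: seaborn's `pairplot` function produces a similar figure.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

# Synthetic data: y depends on x, z is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
df = pd.DataFrame(
    {
        "x": x,
        "y": 2 * x + rng.normal(scale=0.5, size=100),
        "z": rng.normal(size=100),
    }
)

# Histograms of each variable on the diagonal, pairwise scatter plots elsewhere.
axes = scatter_matrix(df, diagonal="hist", figsize=(6, 6))
```

The call returns a grid of matplotlib axes, one per pair of columns. To view or keep the figure, call `savefig` on `axes[0, 0].figure`, or switch to an interactive backend and use `matplotlib.pyplot.show()`.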