A Controlled Output of the LSTM Network in Some Text Generating Problem

by IVAN PYSHNOGRAIEV, IGOR MOSEICH

Abstract. An approach to generating structured texts is presented. The Long Short-Term Memory (LSTM) network is used as the basic element. A common LSTM network does not take into account the specifics of such texts. The restrictive situations, which define additional rules for generation, are outlined. A control unit is therefore proposed for correcting the states and outputs of the network according to the needs of this problem. It consists of a switching module for choosing the type of correction, neural networks for attention mechanisms, modules for specific decoding, etc. In future research, a control unit that covers the largest number of restrictive situations will be formalized and built.

Keywords. Text generation, neural network, policy, LSTM network, machine learning, control input, restrictive situation.

INTRODUCTION

Neural networks have many applications nowadays. Mathematical modeling in different areas [1, 2], speech and facial expression recognition [3], classification [4], text translation [5] and other tasks are carried out using this tool. A large number of scientific papers are also devoted to improving the efficiency of the networks [6, 7].

Different networks show good results in different cases. One such type is the LSTM (Long Short-Term Memory) network [8, 9]. It is a special kind of recurrent neural network capable of learning long-term dependencies. This is very important when the network’s output depends not only on the present input but also on the previous ones. It influences the structure of the network, its training and so on.

In our study we consider the text generation problem, in which the previous letters or even words must be taken into account to obtain a correct network output [10]. For example, Y. Choi and co-authors used such a neural network for storytelling [11]. Therefore, we have chosen LSTM networks and adapted them for our purposes.

We noticed that there are many texts with a very similar, strict structure, for example terms of service, privacy policies on customers’ websites, data policies, etc. Almost every company needs them. But a common LSTM network does not take into account their specifics (differences in organization titles, the length and structure of sentences, the use of different contexts, and others).

So, we define the problem as creating a set of different neural networks and programming modules for:

- generating different policies;

- working with income tax filing;

- generating letters for different typical cases (immigration, relationships at work, property issues);

- etc.

That is why it is necessary to create an approach to the mentioned problem. In this paper we try to state the problem and formulate a general approach to solving it.

PROBLEM STATEMENT

Let us consider the LSTM network from [9]. The network is shown schematically in figure 1.

According to [2], the output and state are calculated using the following formulas:
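
For reference, the standard LSTM update as described in [9] can be written as follows. This is an assumed reconstruction in common notation, not necessarily the authors' exact formulas: x_t is the input, h_t the output, C_t the cell state, \sigma the logistic sigmoid, \odot the element-wise product, and W and b the trainable weights and biases.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
```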

It is the simplest example of an LSTM network, but in our case this does not affect the further reasoning.

As mentioned above, we are dealing with strictly structured texts. That is why we sometimes need to get the network’s output in a specific form. So far we have identified the following cases:

- inserting specific words or phrases into certain parts of sentences;

- limiting the lengths of sentences and paragraphs;

- using specific terms and contexts;

- the need to use some words more often;

- etc.

Let us call these cases types of restrictive situations.

The standard network shown in fig. 1 and its well-known modifications do not have a mechanism for handling these situations. To solve this problem, a dedicated module is needed that corrects the network’s output and state. In this paper we present an approach to constructing such a control unit (CU).
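
The idea of a module that intercepts each generation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the network is reduced to a hypothetical step function that proposes the next token and returns a new state, and the control unit is a hook that may override both. The names `generate`, `limit_sentence_length` and `dummy_step` are invented for this example, and the restrictive situation shown is the sentence-length limit.

```python
from typing import Callable, List, Tuple

State = List[float]
# Hypothetical interfaces: a network step and a control hook.
StepFn = Callable[[str, State], Tuple[str, State]]
ControlFn = Callable[[List[str], str, State], Tuple[str, State]]


def generate(step: StepFn, control: ControlFn, start: str,
             state: State, max_tokens: int) -> List[str]:
    """Generate tokens, letting the control hook correct every step."""
    tokens = [start]
    for _ in range(max_tokens):
        proposed, state = step(tokens[-1], state)
        # The control unit may replace the proposed token and the state.
        token, state = control(tokens, proposed, state)
        tokens.append(token)
    return tokens


def limit_sentence_length(max_words: int, end_token: str = ".") -> ControlFn:
    """Build a control hook that ends a sentence after max_words tokens."""
    def control(history: List[str], proposed: str,
                state: State) -> Tuple[str, State]:
        words = 0
        for token in reversed(history):  # count tokens since the last end
            if token == end_token:
                break
            words += 1
        if words >= max_words and proposed != end_token:
            return end_token, state  # override the output, keep the state
        return proposed, state
    return control


def dummy_step(token: str, state: State) -> Tuple[str, State]:
    """Stand-in for a trained LSTM: always proposes the same word."""
    return "word", state


print(generate(dummy_step, limit_sentence_length(3), "word", [0.0], 7))
```

In this sketch the hook leaves the state untouched; a fuller control unit could also return a corrected state, which is why the state is part of the hook's signature.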

CONTROLLED OUTPUT APPROACH FOR TEXT GENERATING

Let us consider in more detail two cases of different restrictive situations.

When we prepare the input texts for training the network, it is necessary to pay more attention to special words: the titles of companies and organizations, the names of persons, the job titles of personnel, etc. In this case we proceed in two stages. The first is to encode such terms as separate values. In the second stage, when we get the network output, our control unit should replace those codes with the values that relate to the party for whom the text is generated. This decoding module should be pre-programmed for certain contexts.
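
The two stages can be illustrated with a small sketch. It assumes that special terms are known in advance and that codes are plain placeholder strings such as `<ORG>`; the function names, the placeholder format and the sample company names are hypothetical and only serve the example.

```python
import re
from typing import Dict


def encode_terms(text: str, term_codes: Dict[str, str]) -> str:
    """Stage 1: replace each special term with its code before training."""
    # Longest terms first, so a shorter term never splits a longer one.
    for term in sorted(term_codes, key=len, reverse=True):
        text = text.replace(term, term_codes[term])
    return text


def decode_terms(text: str, code_values: Dict[str, str]) -> str:
    """Stage 2: replace codes in the network output with target values."""
    if not code_values:
        return text
    codes = sorted(code_values, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(code) for code in codes))
    return pattern.sub(lambda match: code_values[match.group(0)], text)


training_text = "Acme Corp collects data. Contact John Smith at Acme Corp."
encoded = encode_terms(
    training_text, {"Acme Corp": "<ORG>", "John Smith": "<PERSON>"}
)
print(encoded)

# Here the encoded text stands in for the network output.
print(decode_terms(encoded, {"<ORG>": "Globex Ltd", "<PERSON>": "Jane Doe"}))
```

The decoding table is the part that would be prepared separately for each client and context.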

When we need to use some terms more often, an attention mechanism can help. Many algorithms for it already exist [12, 13]. What they have in common is the use of a separate neural network which “knows” what words we prefer in a certain case. But it should be different for each subject. Let us call this part of the CU the attention module.
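
The effect of such a module on the output can be shown with a deliberately simplified stand-in. Instead of a separate trained network, the sketch below uses a fixed preference table that is added to the word scores before the softmax, so it is a plain score bias rather than an attention mechanism from [12, 13]. The function names and the sample words are assumptions made for illustration.

```python
import math
from typing import Dict


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw word scores into probabilities."""
    peak = max(scores.values())  # subtract the maximum for stability
    exps = {word: math.exp(s - peak) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


def apply_preferences(scores: Dict[str, float],
                      preferences: Dict[str, float]) -> Dict[str, float]:
    """Add a subject-specific bonus to the scores of preferred words."""
    return {word: s + preferences.get(word, 0.0)
            for word, s in scores.items()}


# Hypothetical network scores for the next word.
scores = {"data": 0.0, "information": 0.0, "records": -1.0}
# Hypothetical preference table for one subject (e.g. a data policy).
preferences = {"data": 1.0}

before = softmax(scores)
after = softmax(apply_preferences(scores, preferences))
print(round(before["data"], 3), round(after["data"], 3))
```

A separate table (or, in the paper's proposal, a separate network) per subject keeps the base LSTM unchanged while shifting its word choice.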

The conceptual scheme of the proposed control unit is presented in figure 3. It helps to adapt the LSTM network for our purposes.
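
One possible reading of the switching part of the control unit is a dispatcher that detects which restrictive situation applies and hands the proposed output to the matching correction module. The sketch below is an assumption-laden illustration: the `Situation` labels, the `ControlUnit` class, the detection rules and the sample values are all hypothetical, and state correction is left out for brevity.

```python
from enum import Enum, auto
from typing import Callable, Dict, List, Optional


class Situation(Enum):
    """Hypothetical labels for two restrictive situations."""
    TERM_CODE = auto()          # the network emitted an encoded special term
    SENTENCE_TOO_LONG = auto()  # the current sentence exceeded its limit


Detector = Callable[[List[str], str], Optional[Situation]]
Correction = Callable[[List[str], str], str]


class ControlUnit:
    """Switching module: detect the situation, then apply its correction."""

    def __init__(self, detect: Detector,
                 corrections: Dict[Situation, Correction]) -> None:
        self.detect = detect
        self.corrections = corrections

    def correct(self, history: List[str], proposed: str) -> str:
        situation = self.detect(history, proposed)
        if situation is None:
            return proposed  # no restrictive situation: pass through
        return self.corrections[situation](history, proposed)


def detect(history: List[str], proposed: str) -> Optional[Situation]:
    """Toy rules; history is assumed to hold the current sentence only."""
    if proposed.startswith("<") and proposed.endswith(">"):
        return Situation.TERM_CODE
    if len(history) >= 5 and proposed != ".":
        return Situation.SENTENCE_TOO_LONG
    return None


unit = ControlUnit(detect, {
    Situation.TERM_CODE: lambda history, proposed: "Globex Ltd",
    Situation.SENTENCE_TOO_LONG: lambda history, proposed: ".",
})

print(unit.correct(["we"], "<ORG>"))
print(unit.correct(["we", "collect", "and", "store", "your"], "and"))
print(unit.correct(["we"], "collect"))
```

New restrictive situations would then be added by registering another label and correction module, without changing the network itself.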

CONCLUSIONS

In this paper we propose an approach that makes it possible to improve LSTM networks for generating some types of structured texts. We have outlined different cases of restrictive situations and have already tested this approach on several simple types of them. The control unit is used for interrupting the network’s work and correcting its state and output. It consists of separate neural networks and other modules for different types of restrictive situations. In future research we want to formalize and build a control unit that covers the largest number of restrictive situations.

REFERENCES

1. TÜMER, A.E. and AKKUŞ, A., 2018. Forecasting Gross Domestic Product per Capita Using Artificial Neural Networks with Non-Economical Parameters. Physica A: Statistical Mechanics and its Applications, 512, pp. 468–473.

2. ROSSI, M. and RENZI, M., 2018. A general methodology for performance prediction of pumps-as-turbines using Artificial Neural Networks. Renewable Energy, 128, pp. 265–274.

3. LIU, Y., YUAN, X., GONG, X., XIE, Z., FANG, F. and LUO, Z., 2018. Conditional convolution neural network enhanced random forest for facial expression recognition. Pattern Recognition, 84, pp. 251–261.

4. CETINIC, E., LIPIC, T. and GRGIC, S., 2018. Fine-tuning Convolutional Neural Networks for fine art classification. Expert Systems with Applications, 114, pp. 107–118.

5. CHO, K., COURVILLE, A. and BENGIO, Y., 2015. Describing Multimedia Content using Attention-based Encoder-Decoder Networks, CoRR, abs/1507.01053, available at: http://arxiv.org/abs/1507.01053

6. SEO, J., YU, J., LEE, J. and CHOI, K., 2016. A new approach to binarizing neural networks, ISOCC 2016 - International SoC Design Conference: Smart SoC for Intelligent Things 2016, pp. 77–78.

7. YIN, Z., KONG, D., SHAO, G., NING, X., JIN, W. and WANG, J.-., 2016. A-optimal convolutional neural network. Neural Computing and Applications, 1, pp. 1–10.

8. HOCHREITER, S. and SCHMIDHUBER, J., 1997. Long short-term memory. Neural Computation, 9(8), pp. 1735–1780.

9. Understanding LSTM Networks. Colah’s blog. Available at: http://colah.github.io/posts/2015-08-Understanding-LSTMs

10. The Unreasonable Effectiveness of Recurrent Neural Networks. Andrej Karpathy blog. Available at: http://karpathy.github.io/2015/05/21/rnn-effectiveness

11. CHOI, Y., KIM, S. and LEE, J.-., 2016. Recurrent Neural Network for Storytelling, Proceedings - 2016 Joint 8th International Conference on Soft Computing and Intelligent Systems and 2016 17th International Symposium on Advanced Intelligent Systems, SCIS-ISIS 2016, pp. 841–845.

12. HORI, C., HORI, T., LEE, T.-., ZHANG, Z., HARSHAM, B., HERSHEY, J.R., MARKS, T.K. and SUMI, K., 2017. Attention-Based Multimodal Fusion for Video Description, Proceedings of the IEEE International Conference on Computer Vision 2017, pp. 4203–4212.

13. VASWANI, A., SHAZEER, N., PARMAR, N., et al., 2017. Attention Is All You Need, CoRR, abs/1706.03762, available at: http://arxiv.org/abs/1706.03762
